[![SVG Banners](https://svg-banners.vercel.app/api?type=rainbow&text1=FunClip%20%20🥒&width=800&height=210)](https://github.com/Akshay090/svg-banners)
### <p align="center">「[简体中文](./README_zh.md) | English」</p>

**<p align="center"> ⚡ Open-source, accurate and easy-to-use video clipping tool </p>**
**<p align="center"> 🧠 Explore LLM-based video clipping with FunClip </p>**

<p align="center"> <img src="docs/images/interface.jpg" width=444/></p>

<p align="center" class="trendshift">
<a href="https://trendshift.io/repositories/10126" target="_blank"><img src="https://trendshift.io/api/badge/repositories/10126" alt="alibaba-damo-academy%2FFunClip | Trendshift" style="width: 250px; height: 55px;" width="300" height="55"/></a>
</p>

<div align="center">
<h4>
<a href="#What's New"> What's New </a>
<a href="#On Going"> On Going </a>
<a href="#Install"> Install </a>
<a href="#Usage"> Usage </a>
<a href="#Community"> Community </a>
</h4>
</div>

**FunClip** is a fully open-source, locally deployed automated video clipping tool. It leverages Alibaba TONGYI speech lab's open-source [FunASR](https://github.com/alibaba-damo-academy/FunASR) Paraformer series models to perform speech recognition on videos. Users can then freely select text segments or speakers from the recognition results and click the clip button to obtain the video clips corresponding to the selected segments (quick experience: [Modelscope⭐](https://modelscope.cn/studios/iic/funasr_app_clipvideo/summary) [HuggingFace🤗](https://huggingface.co/spaces/R1ckShi/FunClip)).
## Highlights🎨

- 🔥Try AI clipping with LLMs in FunClip now.
- FunClip integrates Alibaba's open-source industrial-grade model [Paraformer-Large](https://modelscope.cn/models/iic/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), one of the best-performing open-source Chinese ASR models available, with over 13 million downloads on ModelScope. It can also accurately predict timestamps in an integrated manner.
- FunClip incorporates the hotword customization feature of [SeACo-Paraformer](https://modelscope.cn/models/iic/speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), allowing users to specify entity words, names, etc. as hotwords during ASR to improve recognition results.
- FunClip integrates the [CAM++](https://modelscope.cn/models/iic/speech_campplus_sv_zh-cn_16k-common/summary) speaker recognition model, enabling users to use an automatically recognized speaker ID as the clipping target, to clip segments from a specific speaker.
- All functionality is provided through a Gradio interface, offering simple installation and ease of use; FunClip can also be deployed on a server and accessed via a browser.
- FunClip supports free multi-segment clipping and automatically returns full-video SRT subtitles as well as SRT subtitles for the target segments, offering a simple and convenient experience.

<a name="What's New"></a>
## What's New🚀
- 2024/06/12 FunClip now supports recognizing and clipping English audio files. Run `python funclip/launch.py -l en` to try it.
- 🔥2024/05/13 FunClip v2.0.0 supports smart clipping with large language models, integrating models from the Qwen series, GPT series, etc., with default prompts provided. You can also explore and share tips for prompt design. The usage is as follows:
  1. After recognition, select the name of the large language model and configure your own API key;
  2. Click the 'LLM Inference' button, and FunClip will automatically combine two prompts with the video's SRT subtitles;
  3. Click the 'AI Clip' button, and FunClip will extract the timestamps for clipping from the LLM output of the previous step;
  4. You can try changing the prompt to leverage the capabilities of the large language model and get the results you want;
- 2024/05/09 FunClip updated to v1.1.0, including the following updates and fixes:
  - Support configuring the output file directory, saving ASR intermediate results and intermediate files from video clipping;
  - UI upgrade (see the guide picture below): video and audio clipping are now on the same page, and button positions were adjusted;
  - Fixed a bug introduced by a FunASR interface upgrade that caused some serious clipping errors;
  - Support configuring different start and end time offsets for each paragraph;
  - Code updates, etc.;
- 2024/03/06 Fixed bugs in using FunClip from the command line.
- 2024/02/28 [FunASR](https://github.com/alibaba-damo-academy/FunASR) was updated to version 1.0; FunClip uses FunASR 1.0 and SeACo-Paraformer to conduct ASR with hotword customization.
- 2023/10/17 Fixed a bug in choosing multiple periods, which used to return a video of the wrong length.
- 2023/10/10 FunClip now supports recognition with speaker diarization: choose 'Yes' under 'Recognize Speakers' and you will get recognition results with a speaker ID for each sentence. You can then clip out the periods of one or more speakers (e.g. 'spk0' or 'spk0#spk3').
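
Speaker-based clipping boils down to filtering the recognized segments by speaker ID. A minimal sketch of that idea (illustrative only, not FunClip's internal code; the `Segment` type and the handling of the `'#'`-joined multi-speaker syntax are assumptions based on the description above):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str
    spk: str      # speaker ID assigned by diarization, e.g. 'spk0'

def segments_for_speakers(segments, spk_expr):
    """Keep segments whose speaker matches 'spk0' or a union like 'spk0#spk3'."""
    wanted = set(spk_expr.split("#"))
    return [s for s in segments if s.spk in wanted]

segs = [
    Segment(0.0, 2.5, "hello", "spk0"),
    Segment(2.5, 4.0, "hi there", "spk1"),
    Segment(4.0, 6.0, "goodbye", "spk3"),
]
print([s.text for s in segments_for_speakers(segs, "spk0#spk3")])  # ['hello', 'goodbye']
```

The kept segments' `(start, end)` pairs are then what the clipper cuts from the source video.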

<a name="On Going"></a>
## On Going🌵

- [x] FunClip will support the Whisper model for English users (ASR using Whisper with timestamps requires massive GPU memory; we support timestamp prediction for vanilla Paraformer in FunASR to achieve this).
- [x] FunClip will further explore the abilities of large language model based AI clipping; discussion of prompt design, clipping, etc. is welcome.
- [ ] Reverse period selection while clipping.
- [ ] Removing silent periods.

<a name="Install"></a>
## Install🔨

### Python env install

FunClip's basic functions rely only on a Python environment.
```shell
# clone the funclip repo
git clone https://github.com/alibaba-damo-academy/FunClip.git
cd FunClip
# install Python requirements
pip install -r ./requirements.txt
```

### imagemagick install (Optional)

If you want to clip video files with embedded subtitles:

1. ffmpeg and imagemagick are required

- On Ubuntu
```shell
apt-get -y update && apt-get -y install ffmpeg imagemagick
sed -i 's/none/read,write/g' /etc/ImageMagick-6/policy.xml
```
- On MacOS
```shell
brew install imagemagick
sed -i 's/none/read,write/g' /usr/local/Cellar/imagemagick/7.1.1-8_1/etc/ImageMagick-7/policy.xml
```
- On Windows

Download and install imagemagick from https://imagemagick.org/script/download.php#windows

Find your Python install path and change `IMAGEMAGICK_BINARY` to your imagemagick install path in the file `site-packages\moviepy\config_defaults.py`.
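
For example, the relevant line in `config_defaults.py` would look like this (the install path below is only a placeholder; point it at wherever your installer placed `magick.exe`):

```python
# site-packages/moviepy/config_defaults.py
IMAGEMAGICK_BINARY = r"C:\Program Files\ImageMagick-7.1.1-Q16\magick.exe"
```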
2. Download the font file to funclip/font

```shell
wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ClipVideo/STHeitiMedium.ttc -O font/STHeitiMedium.ttc
```
<a name="Usage"></a>
## Use FunClip

### A. Use FunClip as a local Gradio service
You can set up your own FunClip service, the same as the [Modelscope Space](https://modelscope.cn/studios/iic/funasr_app_clipvideo/summary), as follows:
```shell
python funclip/launch.py
# '-l en' to recognize English audio
# '-p xxx' to set the port number
# '-s True' to make the service publicly accessible
```
Then visit ```localhost:7860``` and you will get a Gradio service like the one below; you can use FunClip following these steps:

- Step 1: Upload your video file (or try the example videos below)
- Step 2: Copy the text segments you need into 'Text to Clip'
- Step 3: Adjust subtitle settings (if needed)
- Step 4: Click 'Clip' or 'Clip and Generate Subtitles'

<img src="docs/images/guide.jpg"/>

Follow the guide below to explore LLM-based clipping:

<img src="docs/images/LLM_guide.png" width=360/>
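
Internally, the 'AI Clip' step amounts to pulling timestamp ranges out of the LLM's free-text answer and handing them to the clipper. A minimal sketch of that idea (illustrative only; the SRT-style `HH:MM:SS,mmm --> HH:MM:SS,mmm` pattern is an assumption, not necessarily the exact format FunClip parses):

```python
import re

# SRT-style range, e.g. '00:00:01,500 --> 00:00:04,000'
SRT_RANGE = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")

def to_seconds(ts: str) -> float:
    """Convert an SRT timestamp 'HH:MM:SS,mmm' to seconds."""
    hms, ms = ts.split(",")
    h, m, s = map(int, hms.split(":"))
    return h * 3600 + m * 60 + s + int(ms) / 1000.0

def extract_ranges(llm_answer: str):
    """Return all (start, end) pairs, in seconds, mentioned in the LLM's answer."""
    return [(to_seconds(a), to_seconds(b)) for a, b in SRT_RANGE.findall(llm_answer)]

answer = "Keep 00:00:01,500 --> 00:00:04,000 and 00:01:00,000 --> 00:01:02,250."
print(extract_ranges(answer))  # [(1.5, 4.0), (60.0, 62.25)]
```

Because the prompt asks the model to answer with subtitle timestamps, a simple pattern match like this is enough to recover clip boundaries regardless of the surrounding prose.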
### B. Experience FunClip in Modelscope

[FunClip@Modelscope Space⭐](https://modelscope.cn/studios/iic/funasr_app_clipvideo/summary)

[FunClip@HuggingFace Space🤗](https://huggingface.co/spaces/R1ckShi/FunClip)

### C. Use FunClip from the command line

FunClip supports recognition and clipping via commands:
shixian.shi's avatar
shixian.shi committed
```shell
维石's avatar
维石 committed
# step1: Recognize
维石's avatar
维石 committed
python funclip/videoclipper.py --stage 1 \
shixian.shi's avatar
shixian.shi committed
                       --file examples/2022云栖大会_片段.mp4 \
                       --output_dir ./output
维石's avatar
维石 committed
# now you can find recognition results and entire SRT file in ./output/
# step2: Clip
维石's avatar
维石 committed
python funclip/videoclipper.py --stage 2 \
shixian.shi's avatar
shixian.shi committed
                       --file examples/2022云栖大会_片段.mp4 \
                       --output_dir ./output \
                       --dest_text '我们把它跟乡村振兴去结合起来,利用我们的设计的能力' \
                       --start_ost 0 \
                       --end_ost 100 \
                       --output_file './output/res.mp4'
```
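
The `--start_ost`/`--end_ost` flags shift the clip boundaries relative to the timestamps of the matched text. A small sketch of that arithmetic, assuming the offsets are in milliseconds as in the example above (illustrative only, not FunClip's actual implementation):

```python
def apply_offsets(start_s: float, end_s: float, start_ost_ms: int = 0, end_ost_ms: int = 0):
    """Shift a (start, end) pair, given in seconds, by millisecond offsets.
    A negative start offset begins the clip earlier; a positive end offset
    extends it. The start is clamped so it never goes below zero."""
    new_start = max(0.0, start_s + start_ost_ms / 1000.0)
    new_end = end_s + end_ost_ms / 1000.0
    return new_start, new_end

# a segment recognized at 10.0s-20.0s, started 500 ms early and extended 250 ms
print(apply_offsets(10.0, 20.0, -500, 250))  # (9.5, 20.25)
```

Per-paragraph offsets (supported since v1.1.0) apply this same adjustment independently to each selected segment.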

<a name="Community"></a>
## Community Communication🍟

FunClip was first open-sourced by the FunASR team; any useful PR is welcome.

You can also scan the following DingTalk or WeChat group QR code to join the community group for communication.

|                           DingTalk group                            |                     WeChat group                      |
|:-------------------------------------------------------------------:|:-----------------------------------------------------:|
| <div align="left"><img src="docs/images/dingding.png" width="250"/> | <img src="docs/images/wechat.png" width="215"/></div> |

## Find Speech Models in FunASR

[FunASR](https://github.com/alibaba-damo-academy/FunASR) hopes to build a bridge between academic research and industrial applications of speech recognition. By supporting the training and finetuning of industrial-grade speech recognition models released on ModelScope, researchers and developers can conduct research on and production of speech recognition models more conveniently, promoting the development of the speech recognition ecosystem. ASR for Fun!

📚FunASR Paper: <a href="https://arxiv.org/abs/2305.11013"><img src="https://img.shields.io/badge/Arxiv-2305.11013-orange"></a>

📚SeACo-Paraformer Paper: <a href="https://arxiv.org/abs/2308.03266"><img src="https://img.shields.io/badge/Arxiv-2308.03266-orange"></a>

🌟Support FunASR: <a href='https://github.com/alibaba-damo-academy/FunASR/stargazers'><img src='https://img.shields.io/github/stars/alibaba-damo-academy/FunASR.svg?style=social'></a>