FunASR Chinese Speech Recognition | pyVideoTrans-Open Source Video Translation Tool -pyvideotrans.com github.com/jianchang512/pyvideotrans

FunASR Chinese Speech Recognition

FunASR is an open-source speech recognition toolkit from Alibaba. It outperforms the Whisper series in Chinese speech scenarios. The video translation software already supports it over HTTP through the zh_recogn and SenseVoice projects: deploy the corresponding integrated package for zh_recogn or SenseVoice, start it, and enter its API address in the video translation software.

However, many users find this setup confusing. Starting with v2.97, FunASR is therefore integrated directly into the video translation software. You no longer need to deploy and start the zh_recogn and SenseVoice projects separately; simply select FunASR Chinese Recognition within the software.

Select FunASR Chinese in Speech Recognition

After selecting FunASR Chinese Recognition in the speech recognition settings, you can choose either the paraformer-zh model or the SenseVoiceSmall model. paraformer-zh is recommended, as it offers better performance and speed than SenseVoiceSmall.
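For readers curious what the two model choices correspond to outside the software, here is a minimal Python sketch that calls the funasr package directly. It is not pyVideoTrans's own code. The helper names (MODEL_PRESETS, build_model_kwargs, transcribe) are hypothetical, and pairing each model with the fsmn-vad and ct-punc companion models is an assumption based on common FunASR usage, not something this page states.

```python
# Hypothetical presets mapping each menu choice to funasr.AutoModel arguments.
# The VAD/punctuation companions are an assumption, not pyVideoTrans's settings.
MODEL_PRESETS = {
    "paraformer-zh": {
        "model": "paraformer-zh",
        "vad_model": "fsmn-vad",   # splits long audio into speech segments
        "punc_model": "ct-punc",   # restores punctuation in the transcript
    },
    "SenseVoiceSmall": {
        "model": "iic/SenseVoiceSmall",
        "vad_model": "fsmn-vad",
    },
}


def build_model_kwargs(choice="paraformer-zh"):
    """Return a copy of the AutoModel keyword arguments for the chosen model."""
    if choice not in MODEL_PRESETS:
        raise ValueError(f"unknown model choice: {choice!r}")
    return dict(MODEL_PRESETS[choice])


def transcribe(audio_path, choice="paraformer-zh"):
    """Transcribe one audio file and return the recognized text segments."""
    # Imported lazily so the helpers above work without funasr installed.
    from funasr import AutoModel

    model = AutoModel(**build_model_kwargs(choice))
    results = model.generate(input=audio_path)
    # generate() returns a list of dicts; each carries the text under "text".
    return [item["text"] for item in results]


# Example (requires funasr and downloads the model on first use):
# print(transcribe("speech.wav", "paraformer-zh"))
```

Inside the video translation software none of this is needed; choosing the model in the speech recognition settings is enough.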

Downloading the FunASR Chinese Recognition Model Online for the First Time

To avoid an excessively large package size, the FunASR models are not included in the software package. The first time you use FunASR, the models are downloaded automatically from modelscope.cn and saved in the hub subfolder of the models folder under the software directory. Depending on your network conditions, this may take anywhere from a few minutes to tens of minutes. As long as no red error messages appear, please wait patiently for the download to complete.
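If you want to confirm that the download is making progress or has finished, a small script can inspect the folder described above. The following is a sketch only, using the Python standard library. The function names are hypothetical, and it assumes the models/hub layout given in this section. It reports what is on disk and cannot tell whether a model is complete.

```python
from pathlib import Path


def downloaded_model_files(software_dir):
    """List every file under <software_dir>/models/hub (empty if it is missing)."""
    hub = Path(software_dir) / "models" / "hub"
    if not hub.is_dir():
        return []
    return sorted(p for p in hub.rglob("*") if p.is_file())


def summarize_models(software_dir):
    """Return (file_count, total_size_in_megabytes) for the hub folder."""
    files = downloaded_model_files(software_dir)
    total_bytes = sum(p.stat().st_size for p in files)
    return len(files), total_bytes / (1024 * 1024)


# Example: run from the software directory.
# count, size_mb = summarize_models(".")
# print(f"{count} files, {size_mb:.1f} MB downloaded so far")
```

Running it a few times during the first download should show the total size growing; a count of 0 means nothing has been saved yet.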