ModelScope Usage
ModelScope is an open-source model-as-a-service platform backed by Alibaba, which provides flexible and convenient model applications for users in academia and industry. For specific usage and the available open-source models, please refer to ModelScope. In the speech domain, we provide autoregressive/non-autoregressive speech recognition, speech pre-training, punctuation prediction, and other models for users.
Overall Introduction
We provide the usage of different models under the `egs_modelscope` directory, which supports directly employing our provided models for inference, as well as finetuning the models we provide as pre-trained initial models. Next, we introduce the scripts provided in the `egs_modelscope/asr/paraformer/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch` directory, including `infer.py`, `finetune.py` and `infer_after_finetune.py`. The corresponding functions are as follows:
- `infer.py`: perform inference on the specified dataset based on our provided model
- `finetune.py`: employ our provided model as the initial model for finetuning
- `infer_after_finetune.py`: perform inference on the specified dataset based on the finetuned model
Inference
We provide `infer.py` to perform inference. Based on this file, users can run inference on the specified dataset with our provided model and obtain the corresponding recognition results. If the transcript is given, the CER will be calculated at the same time. Before performing inference, users can set the following parameters to modify the inference configuration:
- `data_dir`: dataset directory. The directory should contain the wav list file `wav.scp` and the transcript file `text` (optional). For the format of these two files, please refer to the instructions in Quick Start. If the `text` file exists, the CER will be calculated accordingly; otherwise it will be skipped.
- `output_dir`: the directory for saving the inference results
- `batch_size`: batch size during inference
- `ctc_weight`: some models contain a CTC module; users can set this parameter to specify the weight of the CTC module during inference
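The `wav.scp` and `text` files both follow the common Kaldi-style two-column layout, where each line starts with an utterance ID followed by a wav path or a transcript. As a minimal sketch (the `load_scp` helper and the two-column assumption are illustrative, not part of the toolkit), such a file can be read like this:

```python
from pathlib import Path

def load_scp(path):
    """Parse a two-column file (utterance ID, then value) into a dict.
    The same layout is assumed for both wav.scp and text."""
    entries = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first whitespace: ID on the left, value on the right.
        utt_id, _, value = line.partition(" ")
        entries[utt_id] = value.strip()
    return entries
```

For example, a `wav.scp` line `utt1 /data/a.wav` would yield the entry `{"utt1": "/data/a.wav"}`.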
In addition to directly setting parameters in `infer.py`, users can also manually set the parameters in the `decoding.yaml` file in the model download directory to modify the inference configuration.
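The CER reported when a `text` file is present is, in essence, the character-level edit distance between the reference transcript and the hypothesis, divided by the reference length. The following is a minimal illustration of that computation, not the toolkit's actual scoring code:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edit operations / reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)
```

For instance, `cer("abcd", "abed")` gives 0.25, since one substitution is needed over four reference characters.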
Finetuning
We provide `finetune.py` to perform finetuning. Based on this file, users can finetune our provided model as the initial model on the specified dataset to achieve better performance in the specified domain. Before finetuning, users can set the following parameters to modify the finetuning configuration:
- `data_path`: dataset directory. This directory should contain the `train` directory for saving the training set and the `dev` directory for saving the validation set. Each directory needs to contain the wav list file `wav.scp` and the transcript file `text`.
- `output_dir`: the directory for saving the finetuning results
- `dataset_type`: for a small dataset, set it as `small`; for a dataset larger than 1000 hours, set it as `large`
- `batch_bins`: batch size. If `dataset_type` is set as `small`, the unit of `batch_bins` is the number of fbank feature frames; if `dataset_type` is set as `large`, the unit of `batch_bins` is milliseconds.
- `max_epoch`: the maximum number of training epochs
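Because the unit of `batch_bins` changes with `dataset_type`, it can help to sanity-check the value against the amount of audio you want in one batch. A rough converter is sketched below; the helper name and the 10 ms fbank frame shift are assumptions (10 ms is a common default, but check your model's feature config):

```python
def batch_bins_for_audio(total_seconds, dataset_type, frame_shift_ms=10):
    """Convert a target amount of audio per batch into batch_bins units.
    For dataset_type == "small", batch_bins counts fbank feature frames
    (one frame per frame_shift_ms of audio); for "large", it counts
    milliseconds of audio."""
    if dataset_type == "small":
        return int(total_seconds * 1000 / frame_shift_ms)
    if dataset_type == "large":
        return int(total_seconds * 1000)
    raise ValueError("dataset_type must be 'small' or 'large'")
```

So 60 seconds of audio per batch corresponds to `batch_bins=6000` frames in `small` mode but `batch_bins=60000` milliseconds in `large` mode.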
The following parameters can also be set. However, if there is no special requirement, users can ignore them and use the default values we provide:
- `accum_grad`: the number of gradient accumulation steps
- `keep_nbest_models`: select the `keep_nbest_models` models with the best performance and average their parameters to obtain a better model
- `optim`: set the optimizer
- `lr`: set the learning rate
- `scheduler`: set the learning rate adjustment strategy
- `scheduler_conf`: set the related parameters of the learning rate adjustment strategy
- `specaug`: set the spectral augmentation
- `specaug_conf`: set the related parameters of the spectral augmentation
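The model averaging behind `keep_nbest_models` is element-wise: each parameter of the averaged model is the mean of that parameter across the selected checkpoints. A minimal sketch of the arithmetic follows; real checkpoints hold tensors rather than plain lists of floats, and the helper name is illustrative:

```python
def average_checkpoints(state_dicts):
    """Average several checkpoints element-wise, as done when combining
    the keep_nbest_models best models into one. Each checkpoint here is
    a dict mapping parameter names to lists of floats."""
    n = len(state_dicts)
    avg = {}
    for name in state_dicts[0]:
        params = [sd[name] for sd in state_dicts]
        # Mean of each position across all checkpoints.
        avg[name] = [sum(vals) / n for vals in zip(*params)]
    return avg
```

Averaging the two checkpoints `{"w": [1.0, 2.0]}` and `{"w": [3.0, 4.0]}` yields `{"w": [2.0, 3.0]}`.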
In addition to directly setting parameters in `finetune.py`, users can also manually set the parameters in the `finetune.yaml` file in the model download directory to modify the finetuning configuration.
Inference after Finetuning
We provide `infer_after_finetune.py` to perform inference with a model finetuned by users. Based on this file, users can run inference on the specified dataset with the finetuned model and obtain the corresponding recognition results. If the transcript is given, the CER will be calculated at the same time. Before performing inference, users can set the following parameters to modify the inference configuration:
- `data_dir`: dataset directory. The directory should contain the wav list file `wav.scp` and the transcript file `text` (optional). If the `text` file exists, the CER will be calculated accordingly; otherwise it will be skipped.
- `output_dir`: the directory for saving the inference results
- `batch_size`: batch size during inference
- `ctc_weight`: some models contain a CTC module; users can set this parameter to specify the weight of the CTC module during inference
- `decoding_model_name`: set the name of the model used for inference
The following parameters can also be set. However, if there is no special requirement, users can ignore them and use the default values we provide:
- `modelscope_model_name`: the initial model name used for finetuning
- `required_files`: the files required for inference via the modelscope interface
Announcements
Some models may have other specific parameters during the finetuning and inference. The usages of these parameters can be found in the README.md
file in the corresponding directory.