Generating an AI Singer with the Open-Source Library so-vits-svc

This topic describes how to generate an AI singer end to end in Alibaba Cloud DSW, based on the open-source library so-vits-svc.

Background information

Driven by the wave of artificial intelligence, virtual humans are becoming increasingly lifelike, and more and more of them are being developed and deployed across the Internet. Cloning human voices with AI is no longer confined to the screen: as a combination of technical innovation and artistic creation, the AI singer has become a key that opens the door between the virtual and human worlds. This topic describes how to generate one. A demo of the result:

  • Target voice:

  • Original song:

  • Voice-converted result:

Prepare the environment and resources

  • Create a workspace. For details, see Create a Workspace.

  • Create a DSW instance with the following key parameter settings. For details, see Create and Manage DSW Instances.

    • Region and zone: For this practice, we recommend one of the following four regions: China North 2 (Beijing), China East 2 (Shanghai), China East 1 (Hangzhou), or China South 1 (Shenzhen). Downloading the model data in later steps is faster in these four regions.

    • Instance type: ecs.gn6v-c8g1.2xlarge.

    • Image: In the official images, select stable-diffusion-webui-env:pytorch1.13-gpu-py310-cu117-ubuntu22.04.

Step 1: Open the tutorial file in DSW

  1. Go to the PAI-DSW development environment.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to operate on.

    3. In the upper-left corner of the page, select the region where the service is deployed.

    4. In the left-side navigation pane, choose Model Development and Training > Interactive Modeling (DSW).

    5. Optional: In the search box on the Interactive Modeling (DSW) page, enter the instance name or a keyword to search for the instance.

    6. Click Open in the Actions column of the instance that you want to open.

  2. On the Launcher page of the Notebook tab, click DSW Gallery under Tool in the Quick Start section to open the DSW Gallery page.

  3. On the DSW Gallery page, search for the tutorial How to Generate an "AI Singer" and click Open in DSW on the tutorial card.

    This automatically downloads the resources and tutorial file required by this tutorial to your DSW instance, and opens the tutorial file when the download is complete.

Step 2: Run the tutorial file

In the opened tutorial file ai_singer.ipynb, you can read the tutorial text and run the command for each step directly in the notebook. After a step's command finishes successfully, run the next step's command in sequence. The steps in this tutorial and the result of each step are as follows.

  1. Download the so-vits-svc source code and install the dependencies.

    1. Clone the open-source code.

      Sample output:

      Cloning into 'so-vits-svc'...
      remote: Enumerating objects: 3801, done.
      remote: Total 3801 (delta 0), reused 0 (delta 0), pack-reused 3801
      Receiving objects: 100% (3801/3801), 10.70 MiB | 29.38 MiB/s, done.
      Resolving deltas: 100% (2392/2392), done.
      Note: switching to '8aeeb10'.
      
      You are in 'detached HEAD' state. You can look around, make experimental
      changes and commit them, and you can discard any commits you make in this
      state without impacting any branches by switching back to a branch.
      
      If you want to create a new branch to retain commits you create, you may
      do so (now or later) by using -c with the switch command. Example:
      
        git switch -c <new-branch-name>
      
      Or undo this operation with:
      
        git switch -
      
      Turn off this advice by setting config variable advice.detachedHead to false
      
      HEAD is now at 8aeeb10 feat(preprocess): skip hidden files with prefix `.`
    2. Install the dependencies.

      [Note] The ERROR and WARNING messages in the output can be ignored.

      Sample output:

      Looking in indexes: https://mirrors.cloud.aliyuncs.com/pypi/simple
      Collecting ffmpeg-python
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/d7/0c/56be52741f75bad4dc6555991fabd2e07b432d333da82c11ad701123888a/ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
      Collecting Flask
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/fd/56/26f0be8adc2b4257df20c1c4260ddd0aa396cf8e75d90ab2f7ff99bc34f9/flask-2.3.3-py3-none-any.whl (96 kB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.1/96.1 kB 15.5 MB/s eta 0:00:00
      Collecting Flask_Cors
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/10/69/1e6cfb87117568a9de088c32d6258219e9d1ff7c131abf74249ef2031279/Flask_Cors-4.0.0-py2.py3-none-any.whl (14 kB)
      Requirement already satisfied: gradio>=3.7.0 in /usr/local/lib/python3.10/dist-packages (from -r ./so-vits-svc/requirements.txt (line 4)) (3.16.2)
      Collecting numpy==1.23.5
        Downloading https://mirrors.cloud.aliyuncs.com/pypi/packages/e4/f3/679b3a042a127de0d7c84874913c3e23bb84646eb3bc6ecab3f8c872edc9/numpy-1.23.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.1/17.1 MB 17.0 MB/s eta 0:00:00
      ......
      ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
      xformers 0.0.16rc425 requires torch==1.13.1, but you have torch 2.0.1 which is incompatible.
      torchvision 0.14.1+cu117 requires torch==1.13.1, but you have torch 2.0.1 which is incompatible.
      Successfully installed Flask-2.3.3 Flask_Cors-4.0.0 SoundFile-0.12.1 Werkzeug-2.3.7 antlr4-python3-runtime-4.8 audioread-3.0.0 bitarray-2.8.1 blinker-1.6.2 certifi-2023.7.22 colorama-0.4.6 cython-3.0.2 edge_tts-6.1.8 einops-0.6.1 fairseq-0.12.2 faiss-cpu-1.7.4 ffmpeg-python-0.2.0 hydra-core-1.0.7 itsdangerous-2.1.2 joblib-1.3.2 langdetect-1.0.9 librosa-0.9.1 local_attention-1.8.6 loguru-0.7.0 numpy-1.23.5 omegaconf-2.0.6 onnx-1.14.1 onnxoptimizer-0.3.13 onnxsim-0.4.33 pooch-1.7.0 portalocker-2.7.0 praat-parselmouth-0.4.3 pynvml-11.5.0 pyworld-0.3.4 resampy-0.4.2 sacrebleu-2.3.1 scikit-learn-1.3.0 scikit-maad-1.4.0 scipy-1.10.0 tabulate-0.9.0 tensorboardX-2.6.2.2 threadpoolctl-3.2.0 torch-2.0.1 torchaudio-2.0.2 torchcrepe-0.0.21
      WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
      
      [notice] A new release of pip is available: 23.0.1 -> 23.2.1
      ......
      Setting up liblilv-0-0:amd64 (0.24.12-2) ...
      Setting up libopenmpt0:amd64 (0.6.1-1) ...
      Setting up libpulse0:amd64 (1:15.99.1+dfsg1-1ubuntu2.1) ...
      Setting up libpango-1.0-0:amd64 (1.50.6+ds-2ubuntu1) ...
      Setting up libopenal1:amd64 (1:1.19.1-2build3) ...
      Setting up libswresample3:amd64 (7:4.4.2-0ubuntu0.22.04.1) ...
      Setting up libpangoft2-1.0-0:amd64 (1.50.6+ds-2ubuntu1) ...
      Setting up libsdl2-2.0-0:amd64 (2.0.20+dfsg-2ubuntu1.22.04.1) ...
      Setting up libpangocairo-1.0-0:amd64 (1.50.6+ds-2ubuntu1) ...
      Setting up libsphinxbase3:amd64 (0.8+5prealpha+1-13build1) ...
      Setting up librsvg2-2:amd64 (2.52.5+dfsg-3ubuntu0.2) ...
      Setting up libpocketsphinx3:amd64 (0.8.0+real5prealpha+1-14ubuntu1) ...
      Setting up libdecor-0-plugin-1-cairo:amd64 (0.1.0-3build1) ...
      Setting up librsvg2-common:amd64 (2.52.5+dfsg-3ubuntu0.2) ...
      Setting up libavcodec58:amd64 (7:4.4.2-0ubuntu0.22.04.1) ...
      Setting up libchromaprint1:amd64 (1.5.1-2) ...
      Setting up libavformat58:amd64 (7:4.4.2-0ubuntu0.22.04.1) ...
      Setting up libavfilter7:amd64 (7:4.4.2-0ubuntu0.22.04.1) ...
      Setting up libavdevice58:amd64 (7:4.4.2-0ubuntu0.22.04.1) ...
      Setting up ffmpeg (7:4.4.2-0ubuntu0.22.04.1) ...
      Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
      Processing triggers for libgdk-pixbuf-2.0-0:amd64 (2.42.8+dfsg-1ubuntu0.2) ...
  2. Download the pretrained models.

    1. Download the speech encoder model.

      Sample output:

      --2023-08-30 08:40:25--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/pretrained_models/hubert_base.pt
      Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
      Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 189507909 (181M) [application/octet-stream]
      Saving to: ‘./so-vits-svc/pretrain/checkpoint_best_legacy_500.pt’
      
      ./so-vits-svc/pretr 100%[===================>] 180.73M  2.78MB/s    in 28s     
      
      2023-08-30 08:40:54 (6.40 MB/s) - ‘./so-vits-svc/pretrain/checkpoint_best_legacy_500.pt’ saved [189507909/189507909]
    2. Download the pretrained generator (G) and discriminator (D) models.

      Sample output:

      --2023-08-30 08:43:45--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/pretrained_models/ms903%3Asovits4.0-768vec-layer12/clean_D_320000.pth
      Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
      Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 187027770 (178M) [application/octet-stream]
      Saving to: ‘./so-vits-svc/logs/44k/D_0.pth’
      
      ./so-vits-svc/logs/ 100%[===================>] 178.36M  15.8MB/s    in 12s     
      
      2023-08-30 08:43:57 (15.5 MB/s) - ‘./so-vits-svc/logs/44k/D_0.pth’ saved [187027770/187027770]
      
      --2023-08-30 08:43:57--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/pretrained_models/ms903%3Asovits4.0-768vec-layer12/clean_G_320000.pth
      Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
      Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 209268661 (200M) [application/octet-stream]
      Saving to: ‘./so-vits-svc/logs/44k/G_0.pth’
      
      ./so-vits-svc/logs/ 100%[===================>] 199.57M  20.6MB/s    in 10s     
      
      2023-08-30 08:44:07 (20.0 MB/s) - ‘./so-vits-svc/logs/44k/G_0.pth’ saved [209268661/209268661]
  3. Download the training data.

    You can download the training data prepared by PAI directly, or download data yourself and clean it by following the appendix in the tutorial text.

    Sample output:

    --2023-08-30 08:44:24--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/data/thchs30-C12.tar.gz
    Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
    Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 58648074 (56M) [application/gzip]
    Saving to: ‘thchs30-C12.tar.gz’
    
    thchs30-C12.tar.gz  100%[===================>]  55.93M  14.2MB/s    in 4.0s    
    
    2023-08-30 08:44:28 (14.0 MB/s) - ‘thchs30-C12.tar.gz’ saved [58648074/58648074]
    
    ./
    ./C12_569.wav
    ./C12_520.wav
    ./C12_724.wav
    ./C12_626.wav
    ./C12_559.wav
    ./C12_583.wav
    ......
    ./C12_687.wav
    ./C12_534.wav
    ./C12_745.wav
    ./C12_684.wav
    ./C12_738.wav
    ./C12_657.wav
    ./C12_523.wav
    ./C12_625.wav

    The downloaded sample data is organized as follows. Training on multiple speakers' voices is supported.

    dataset_raw
    ├───speaker1(C12)
    │   ├───xxx1.wav
    │   ├───...
    │   └───xxxn.wav
    ├───speaker2 (optional)
    │    ├───xxx1.wav
    │    ├───...
    │    └───xxxn.wav
    ├───speakerN (optional)
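The layout above is what so-vits-svc's preprocessing expects: one subdirectory per speaker under dataset_raw, each holding that speaker's .wav files. As a minimal sanity check before preprocessing, you could count the clips per speaker (list_speakers is a hypothetical helper, not part of the repository):

```python
from pathlib import Path

def list_speakers(dataset_raw: str) -> dict:
    """Return {speaker_name: number_of_wav_files} for a so-vits-svc raw dataset.

    Each immediate subdirectory of dataset_raw is treated as one speaker,
    and only the .wav files directly inside it are counted.
    """
    root = Path(dataset_raw)
    return {
        d.name: len(list(d.glob("*.wav")))
        for d in sorted(root.iterdir())
        if d.is_dir()
    }
```

For the sample data above, this should report a single speaker C12 with its .wav clips.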
  4. Preprocess the training data.

    1. Resample the data.

      Sample output:

      CPU count: 8
      dataset_raw/C12
      resampling: ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:27
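Conceptually, this step converts every raw clip to the project's 44.1 kHz training sample rate. The repository's resampling script uses librosa; the sketch below (resample_wav is a hypothetical, stdlib-only approximation using linear interpolation) only illustrates the idea:

```python
import struct
import wave

def resample_wav(src: str, dst: str, target_rate: int = 44100) -> None:
    """Resample a 16-bit mono WAV file by linear interpolation.

    Illustrative only: so-vits-svc performs this with librosa, which also
    handles stereo downmixing and loudness normalization.
    """
    with wave.open(src, "rb") as r:
        assert r.getnchannels() == 1 and r.getsampwidth() == 2
        rate = r.getframerate()
        n = r.getnframes()
        samples = struct.unpack(f"<{n}h", r.readframes(n))
    ratio = rate / target_rate          # input samples per output sample
    n_out = int(len(samples) / ratio)
    out = []
    for i in range(n_out):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]
        out.append(int(a + (b - a) * frac))  # linear interpolation
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(target_rate)
        w.writeframes(struct.pack(f"<{n_out}h", *out))
```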
    2. Split the data into a training set and a validation set, and generate the configuration files.

      Sample output:

      100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 124.91it/s]
      2023-08-30 08:46:12.683 | INFO     | __main__:<module>:74 - Writing ./filelists/train.txt
      100%|████████████████████████████████████| 248/248 [00:00<00:00, 1516308.15it/s]
      2023-08-30 08:46:12.684 | INFO     | __main__:<module>:80 - Writing ./filelists/val.txt
      100%|██████████████████████████████████████████| 2/2 [00:00<00:00, 79137.81it/s]
      2023-08-30 08:46:12.691 | INFO     | __main__:<module>:115 - Writing to configs/config.json
      2023-08-30 08:46:12.691 | INFO     | __main__:<module>:118 - Writing to configs/diffusion.yaml
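The split shown in the log (248 training files and 2 validation files out of 250) can be sketched as follows. split_filelists is a hypothetical simplification of the project's file-list script, which additionally writes configs/config.json and configs/diffusion.yaml:

```python
import random
from pathlib import Path

def split_filelists(dataset_dir: str, filelist_dir: str,
                    n_val: int = 2, seed: int = 1234) -> tuple:
    """Shuffle the preprocessed wavs and hold out n_val files for validation.

    Writes filelists/train.txt and filelists/val.txt, one wav path per line,
    and returns (train_count, val_count).
    """
    wavs = sorted(str(p) for p in Path(dataset_dir).rglob("*.wav"))
    random.Random(seed).shuffle(wavs)       # deterministic shuffle
    val, train = wavs[:n_val], wavs[n_val:]
    out = Path(filelist_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "train.txt").write_text("\n".join(train) + "\n")
    (out / "val.txt").write_text("\n".join(val) + "\n")
    return len(train), len(val)
```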
    3. Generate the audio feature data and save it to the ./so-vits-svc/dataset/44k/C12 directory.

      Sample output:

      vec768l12
      2023-08-30 08:46:37.346 | INFO     | __main__:<module>:152 - Using device: 
      2023-08-30 08:46:37.346 | INFO     | __main__:<module>:153 - Using SpeechEncoder: vec768l12
      2023-08-30 08:46:37.346 | INFO     | __main__:<module>:154 - Using extractor: dio
      2023-08-30 08:46:37.346 | INFO     | __main__:<module>:155 - Using diff Mode: False
        0%|                                                     | 0/1 [00:00<?, ?it/s]2023-08-30 08:46:40.577 | INFO     | __mp_main__:process_batch:107 - Loading speech encoder for content...
      2023-08-30 08:46:40.596 | INFO     | __mp_main__:process_batch:113 - Rank 1 uses device cuda:0
      WARNING:xformers:WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
          PyTorch 1.13.1+cu117 with CUDA 1107 (you have 2.0.1+cu117)
          Python  3.10.9 (you have 3.10.6)
        Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
        Memory-efficient attention, SwiGLU, sparse and more won't be available.
        Set XFORMERS_MORE_DETAILS=1 for more details
      load model(s) from pretrain/checkpoint_best_legacy_500.pt
      2023-08-30 08:46:46.644 | INFO     | __mp_main__:process_batch:115 - Loaded speech encoder for rank 1
      100%|█████████████████████████████████████████| 250/250 [02:43<00:00,  1.53it/s]
      100%|████████████████████████████████████████████| 1/1 [02:53<00:00, 173.02s/it]
  5. Train the model (optional).

    [Note] Because model training takes a long time, you can skip this step and run inference directly with the model file prepared by PAI.

    For better results, we recommend changing the epochs parameter to 1000. Each epoch takes about 20 to 30 seconds to train, so training lasts roughly 500 minutes in total.

    Sample output:

    INFO:44k:{'train': {'log_interval': 200, 'eval_interval': 800, 'seed': 1234, 'epochs': 1500, 'learning_rate': 0.0001, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 6, 'fp16_run': False, 'half_type': 'fp16', 'lr_decay': 0.999875, 'segment_size': 10240, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'use_sr': True, 'max_speclen': 512, 'port': '8001', 'keep_ckpts': 3, 'all_in_mem': False, 'vol_aug': False}, 'data': {'training_files': 'filelists/train.txt', 'validation_files': 'filelists/val.txt', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': 22050, 'unit_interpolate_mode': 'nearest'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4, 4], 'n_layers_q': 3, 'n_flow_layer': 4, 'use_spectral_norm': False, 'gin_channels': 768, 'ssl_dim': 768, 'n_speakers': 1, 'vocoder_name': 'nsf-hifigan', 'speech_encoder': 'vec768l12', 'speaker_embedding': False, 'vol_embedding': False, 'use_depthwise_conv': False, 'flow_share_parameter': False, 'use_automatic_f0_prediction': True}, 'spk': {'C12': 0}, 'model_dir': './logs/44k'}
    ./logs/44k/G_0.pth
    emb_g.weight is not in the checkpoint,please check your checkpoint.If you're using pretrain model,just ignore this warning.
    INFO:44k:emb_g.weight is not in the checkpoint
    load 
    INFO:44k:Loaded checkpoint './logs/44k/G_0.pth' (iteration 0)
    ./logs/44k/D_0.pth
    load 
    INFO:44k:Loaded checkpoint './logs/44k/D_0.pth' (iteration 0)
    ./logs/44k/D_0.pth
    ......
    INFO:44k:Train Epoch: 990 [17%]
    INFO:44k:Losses: [2.3169736862182617, 2.2942988872528076, 9.555232048034668, 14.556828498840332, 0.6244402527809143], step: 62600, lr: 8.299526322416852e-05, reference_loss: 29.347774505615234
    INFO:44k:====> Epoch: 990, cost 23.75 s
    INFO:44k:====> Epoch: 991, cost 22.81 s
    INFO:44k:====> Epoch: 992, cost 22.70 s
    INFO:44k:====> Epoch: 993, cost 22.99 s
    INFO:44k:Train Epoch: 994 [93%]
    INFO:44k:Losses: [2.5843334197998047, 2.4109506607055664, 8.15036392211914, 12.917271614074707, 0.6071179509162903], step: 62800, lr: 8.295377337271398e-05, reference_loss: 26.6700382232666
    INFO:44k:====> Epoch: 994, cost 23.86 s
    INFO:44k:====> Epoch: 995, cost 21.87 s
    INFO:44k:====> Epoch: 996, cost 23.03 s
    INFO:44k:====> Epoch: 997, cost 22.81 s
    INFO:44k:====> Epoch: 998, cost 23.05 s
    INFO:44k:Train Epoch: 999 [69%]
    INFO:44k:Losses: [2.552673816680908, 2.0296831130981445, 3.976914405822754, 13.161809921264648, 0.2755252420902252], step: 63000, lr: 8.290194022426301e-05, reference_loss: 21.996606826782227
    INFO:44k:====> Epoch: 999, cost 24.08 s
    INFO:44k:====> Epoch: 1000, cost 22.81 s
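The epochs change recommended above amounts to editing the train section of configs/config.json before launching training. A minimal sketch, assuming the key layout shown in the hyperparameter dump at the top of the training log (set_epochs is a hypothetical helper):

```python
import json
from pathlib import Path

def set_epochs(config_path: str, epochs: int = 1000) -> None:
    """Set train.epochs in a so-vits-svc config.json in place.

    The key layout ({"train": {"epochs": ...}}) matches the hyperparameters
    printed at the start of the training log.
    """
    path = Path(config_path)
    cfg = json.loads(path.read_text())
    cfg["train"]["epochs"] = epochs
    path.write_text(json.dumps(cfg, indent=2))
```

Other keys under "train", such as batch_size or learning_rate, could be adjusted the same way.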

Step 3: Perform model inference

After you complete the preceding operations, you have finished training the AI singer model. You can run offline inference with the model file trained in the preceding steps or with the model file prepared by PAI. By default, inference results are saved to the ./results directory. Continue by running the steps in the inference section of the tutorial file. The steps and the result of each step are as follows.

  1. (Optional) Download the model file prepared by PAI and save it as ./so-vits-svc/logs/G_8800_8gpus.pth.

    [Note] If you use the model file trained in the preceding steps for offline inference, you can skip this step.

    --2023-08-30 08:50:10--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/models/C12/G_8800.pth
    Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
    Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 627897375 (599M) [application/octet-stream]
    Saving to: ‘logs/G_8800_8gpus.pth’
    
    logs/G_8800_8gpus.p 100%[===================>] 598.81M  13.8MB/s    in 45s     
    
    2023-08-30 08:50:55 (13.3 MB/s) - ‘logs/G_8800_8gpus.pth’ saved [627897375/627897375]
  2. Download the test data and save it to the ./raw directory. This tutorial uses data already separated with UVR5 as the test data. Offline inference requires clean vocal data, so if you prepare your own test data, clean it by following the appendix in the tutorial text. The inference data must be placed in the ./raw directory.

    --2023-08-30 08:51:48--  http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/projects/so-vits-svc/data/one.tar.gz
    Resolving pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)... 39.98.1.111
    Connecting to pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com (pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com)|39.98.1.111|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 15195943 (14M) [application/gzip]
    Saving to: ‘./raw/one.tar.gz’
    
    one.tar.gz          100%[===================>]  14.49M  12.5MB/s    in 1.2s    
    
    2023-08-30 08:51:50 (12.5 MB/s) - ‘./raw/one.tar.gz’ saved [15195943/15195943]
    
    one/
    one/1_one_(Instrumental).wav
    one/1_one_(Vocals).wav
    one/one.mp3
    one/1_1_one_(Vocals)_(Vocals).wav
    one/1_1_one_(Vocals)_(Instrumental).wav
  3. Convert the vocals to the voice of speaker C12.

    load 
    WARNING:xformers:WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
        PyTorch 1.13.1+cu117 with CUDA 1107 (you have 2.0.1+cu117)
        Python  3.10.9 (you have 3.10.6)
      Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
      Memory-efficient attention, SwiGLU, sparse and more won't be available.
      Set XFORMERS_MORE_DETAILS=1 for more details
    load model(s) from pretrain/checkpoint_best_legacy_500.pt
    #=====segment start, 7.76s======
    vits use time:0.8072702884674072
    #=====segment start, 6.62s======
    vits use time:0.11305761337280273
    #=====segment start, 6.76s======
    vits use time:0.11228108406066895
    #=====segment start, 6.98s======
    vits use time:0.11324000358581543
    #=====segment start, 0.005s======
    jump empty segment
  4. Load the converted audio.

  5. Merge the vocals and the accompaniment.

    Export successfully!
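The final merge combines the converted vocals with the instrumental track. The notebook does this with an audio toolchain such as ffmpeg; purely as an illustration of the idea, a stdlib-only sketch (mix_wavs is a hypothetical helper for 16-bit mono WAVs at the same sample rate) could sum the two waveforms with clipping:

```python
import struct
import wave

def mix_wavs(vocal_path: str, accomp_path: str, out_path: str) -> None:
    """Mix two 16-bit mono WAV files by summing samples with clipping.

    Illustrative only: real pipelines mix with ffmpeg or a library such
    as pydub, which also handle stereo, resampling, and gain control.
    """
    def read(path):
        with wave.open(path, "rb") as r:
            assert r.getnchannels() == 1 and r.getsampwidth() == 2
            n = r.getnframes()
            return r.getframerate(), struct.unpack(f"<{n}h", r.readframes(n))

    rate_a, a = read(vocal_path)
    rate_b, b = read(accomp_path)
    assert rate_a == rate_b, "sample rates must match"
    n = max(len(a), len(b))
    mixed = [
        max(-32768, min(32767,
            (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)))
        for i in range(n)
    ]
    with wave.open(out_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate_a)
        w.writeframes(struct.pack(f"<{n}h", *mixed))
```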