Supported domain / task: audio / ttsv2 (speech synthesis).
CosyVoice speech synthesis is built on Tongyi Lab's generative speech model (CosyVoice). Backed by a large-scale pretrained language model, it deeply integrates text understanding with speech generation, so it can accurately parse and interpret all kinds of text and render it as natural, human-like speech. It supports streaming input and streaming output for text-to-speech.
Besides the traditional interaction of "submit a text, receive the audio (optionally streamed)", CosyVoice also provides a fully streaming mode of "stream text in, stream audio out", which can synthesize text in real time as it is generated by an LLM.
Prerequisites
You have activated the service and obtained an API key: see Obtain an API-KEY.
You have installed the latest SDK: see Install DashScope SDK.
Synchronous invocation
Submit a single speech synthesis task without registering a callback (no streaming of intermediate results); the complete result is returned all at once when synthesis finishes.
Request example
The following example shows how to use the synchronous interface to call the CosyVoice speech model with the voice longxiaochun (龍小淳), synthesize the text "今天天氣怎么樣" into MP3 audio at a 22050 Hz sample rate, and save it to a file named output.mp3.
Replace your-dashscope-api-key in the example with your own API key before running the code.
The synchronous interface blocks the current thread until synthesis completes or an error occurs.
# coding=utf-8
import dashscope
from dashscope.audio.tts_v2 import *

# Replace your-dashscope-api-key with your own API key.
dashscope.api_key = "your-dashscope-api-key"
model = "cosyvoice-v1"
voice = "longxiaochun"

synthesizer = SpeechSynthesizer(model=model, voice=voice)
audio = synthesizer.call("今天天氣怎么樣?")
print('requestId: ', synthesizer.get_last_request_id())
with open('output.mp3', 'wb') as f:
    f.write(audio)
package SpeechSynthesisDemo;

import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

public class Tts2File {
    /**
     * Replace your-dashscope-api-key with your own API key.
     */
    private static String apikey = "your-dashscope-api-key";
    private static String model = "cosyvoice-v1";
    private static String voice = "longxiaochun";

    public static void synthesizeToFile() {
        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        .apiKey(apikey)
                        .model(model)
                        .voice(voice)
                        .build();
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, null);
        ByteBuffer audio = synthesizer.call("今天天氣怎么樣?");
        File file = new File("output.mp3");
        System.out.print("requestId: " + synthesizer.getLastRequestId());
        try (FileOutputStream fos = new FileOutputStream(file)) {
            fos.write(audio.array());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        synthesizeToFile();
        System.exit(0);
    }
}
Request parameter description
Parameter | Type | Required | Default | Description |
model | string | Yes | None | The model to use for synthesis (set to cosyvoice-v1). |
voice | string | Yes | None | The voice (timbre) to use for synthesis. For more information, see the voice list. |
text | string | Yes | None | The text to synthesize. |
format | AudioFormat | No | The default sample rate and audio format of the selected voice, as given in the model list. | The encoding format of the synthesized audio (see the AudioFormat enum for the supported formats). |
volume | int | No | 50 | Volume of the synthesized audio, in the range 0~100. |
speech_rate | double | No | 1.0 | Speech rate of the synthesized audio, in the range 0.5~2. |
pitch_rate | double | No | 1.0 | Pitch of the synthesized audio, in the range 0.5~2. |
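The official examples below only set model and voice. As a sketch of how the optional parameters in this table might be supplied through the Java SDK builder: the volume, speechRate, and pitchRate builder methods are assumptions inferred from the parameter names above and are not confirmed by this document; only apiKey, model, voice, and format appear in the examples.

SpeechSynthesisParam param =
        SpeechSynthesisParam.builder()
                .apiKey("your-dashscope-api-key")
                .model("cosyvoice-v1")
                .voice("longxiaochun")
                // Encoding format; this enum constant also appears in the streaming example below.
                .format(SpeechSynthesisAudioFormat.PCM_22050HZ_MONO_16BIT)
                .volume(50)       // volume, 0~100, default 50 (assumed builder method)
                .speechRate(1.0f) // speech rate, 0.5~2, default 1.0 (assumed builder method)
                .pitchRate(1.0f)  // pitch, 0.5~2, default 1.0 (assumed builder method)
                .build();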
Response description
The response is the synthesized binary audio data.
Interface details
"""
Speech synthesis.
If callback is set, the audio will be returned in real-time through the on_event interface.
Otherwise, this function blocks until all audio is received and then returns the complete audio data.
Parameters:
-----------
text: str
utf-8 encoded text
return: bytes
If a callback is not set during initialization, the complete audio is returned as the function's return value. Otherwise, the return value is null.
"""
def call(self, text:str):
/**
 * Speech synthesis.<br>
 * If a callback is set, the audio will be returned in real time through the onEvent interface.<br>
 * Otherwise, this function blocks until all audio is received and then returns the complete audio data.
 *
 * @param text utf-8 encoded text
 * @return If a callback was not set during initialization, the complete audio is returned as the
 *     function's return value. Otherwise, the return value is null.
 */
public ByteBuffer call(String text)
Asynchronous invocation
Submit a single speech synthesis task and stream the intermediate results; the synthesis result is delivered incrementally through the callback functions of a ResultCallback.
Invocation example
The following example shows how to use the asynchronous interface to call the CosyVoice speech model with the voice longxiaochun (longxiaochun) and synthesize the text "今天天氣怎么樣" into MP3 audio at a 22050 Hz sample rate.
Replace your-dashscope-api-key in the example with your own API key before running the code. The asynchronous interface does not block the current thread; listen for the onComplete event to know when all audio has been received.
# coding=utf-8
import dashscope
from dashscope.audio.tts_v2 import *

# Replace your-dashscope-api-key with your own API key.
dashscope.api_key = "your-dashscope-api-key"
model = "cosyvoice-v1"
voice = "longxiaochun"


class Callback(ResultCallback):
    def on_open(self):
        self.file = open("output.mp3", "wb")
        print("websocket is open.")

    def on_complete(self):
        print("speech synthesis task completed successfully.")

    def on_error(self, message: str):
        print(f"speech synthesis task failed, {message}")

    def on_close(self):
        print("websocket is closed.")
        self.file.close()

    def on_event(self, message):
        print(f"recv speech synthesis message {message}")

    def on_data(self, data: bytes) -> None:
        print("audio result length:", len(data))
        self.file.write(data)


callback = Callback()

synthesizer = SpeechSynthesizer(
    model=model,
    voice=voice,
    callback=callback,
)

synthesizer.call("今天天氣怎么樣?")
print('requestId: ', synthesizer.get_last_request_id())
package com.alibaba.dashscope;

import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.ResultCallback;
import java.util.concurrent.CountDownLatch;

public class StreamInputTtsPlayableDemo {
    /**
     * Replace your-dashscope-api-key with your own API key.
     */
    private static String apikey = "your-dashscope-api-key";
    private static String model = "cosyvoice-v1";
    private static String voice = "longxiaochun";

    public static void streamAudioDataToSpeaker() {
        CountDownLatch latch = new CountDownLatch(1);
        // Configure the callbacks.
        ResultCallback<SpeechSynthesisResult> callback =
                new ResultCallback<SpeechSynthesisResult>() {
                    @Override
                    public void onEvent(SpeechSynthesisResult result) {
                        System.out.println("Received message: " + result);
                        if (result.getAudioFrame() != null) {
                            // TODO: process the audio frame
                            System.out.println("Received audio");
                        }
                    }

                    @Override
                    public void onComplete() {
                        System.out.println("Received Complete");
                        latch.countDown();
                    }

                    @Override
                    public void onError(Exception e) {
                        System.out.println("Received error: " + e.toString());
                        latch.countDown();
                    }
                };
        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        .apiKey(apikey)
                        .model(model)
                        .voice(voice)
                        .build();
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, callback);
        // With a callback set, the call method does not block the current thread.
        synthesizer.call("今天天氣怎么樣?");
        System.out.print("requestId: " + synthesizer.getLastRequestId());
        // Wait for synthesis to complete.
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        streamAudioDataToSpeaker();
        System.exit(0);
    }
}
Request parameter description
The parameters are the same as for synchronous invocation. When a callback is set at initialization, the call function becomes an asynchronous interface and immediately returns null; the audio is delivered in real time through the callback. For details, see the request parameter description above.
Response description
The data is carried by the SpeechSynthesisResult object passed to the on_event callback, which provides the following member method for retrieving it:
Member method | Signature | Description |
getAudioFrame | ByteBuffer getAudioFrame() | Returns one incremental binary audio frame of the streaming synthesis; may be null. |
The call function itself returns no data.
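The examples above leave the actual frame handling as a TODO. One possible way to consume the frames, as a minimal sketch in the style of the asynchronous example above: the file-writing logic is illustrative glue, not SDK API, and additionally needs java.io.FileOutputStream, java.io.IOException, and java.nio.ByteBuffer imports.

// Sketch: a callback that appends every streamed frame to output.mp3.
// The enclosing method must handle the FileNotFoundException from FileOutputStream.
FileOutputStream fos = new FileOutputStream("output.mp3");
ResultCallback<SpeechSynthesisResult> fileWriter =
        new ResultCallback<SpeechSynthesisResult>() {
            @Override
            public void onEvent(SpeechSynthesisResult result) {
                ByteBuffer frame = result.getAudioFrame();
                if (frame != null) {
                    byte[] bytes = new byte[frame.remaining()];
                    frame.get(bytes); // copy the frame into a plain byte array
                    try {
                        fos.write(bytes);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            }

            @Override
            public void onComplete() {
                try {
                    fos.close(); // all audio received; flush and close the file
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }

            @Override
            public void onError(Exception e) {
                try {
                    fos.close();
                } catch (IOException ignored) {
                }
            }
        };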
Streaming input invocation
Invocation example
Submit text in multiple pieces within a single speech synthesis task and receive the synthesis result in real time through the callbacks.
The following example shows how to use the streaming interface to call the CosyVoice speech synthesis model with the voice longxiaochun, send the text in several pieces, synthesize PCM audio at a 22050 Hz sample rate, and play it in real time with an audio player.
# coding=utf-8
#
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import time
import pyaudio
import dashscope
from dashscope.audio.tts_v2 import *

# Replace your-dashscope-api-key with your own API key.
dashscope.api_key = "your-dashscope-api-key"
model = "cosyvoice-v1"
voice = "longxiaochun"


class Callback(ResultCallback):
    _player = None
    _stream = None

    def on_open(self):
        print("websocket is open.")
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=22050, output=True
        )

    def on_complete(self):
        print("speech synthesis task completed successfully.")

    def on_error(self, message: str):
        print(f"speech synthesis task failed, {message}")

    def on_close(self):
        print("websocket is closed.")
        # Stop the player.
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()

    def on_event(self, message):
        print(f"recv speech synthesis message {message}")

    def on_data(self, data: bytes) -> None:
        print("audio result length:", len(data))
        self._stream.write(data)


callback = Callback()

test_text = [
    "流式文本語音合成SDK,",
    "可以將輸入的文本",
    "合成為語音二進制數據,",
    "相比于非流式語音合成,",
    "流式合成的優勢在于實時性",
    "更強。用戶在輸入文本的同時",
    "可以聽到接近同步的語音輸出,",
    "極大地提升了交互體驗,",
    "減少了用戶等待時間。",
    "適用于調用大規模",
    "語言模型(LLM),以",
    "流式輸入文本的方式",
    "進行語音合成的場景。",
]

synthesizer = SpeechSynthesizer(
    model=model,
    voice=voice,
    format=AudioFormat.PCM_22050HZ_MONO_16BIT,
    callback=callback,
)

for text in test_text:
    synthesizer.streaming_call(text)
    time.sleep(0.5)
synthesizer.streaming_complete()
print('requestId: ', synthesizer.get_last_request_id())
package com.alibaba.dashscope;

import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.ResultCallback;
import java.util.concurrent.CountDownLatch;

public class StreamInputTtsPlayableDemo {
    private static String[] textArray = {"流式文本語音合成SDK,",
            "可以將輸入的文本", "合成為語音二進制數據,", "相比于非流式語音合成,",
            "流式合成的優勢在于實時性", "更強。用戶在輸入文本的同時",
            "可以聽到接近同步的語音輸出,", "極大地提升了交互體驗,",
            "減少了用戶等待時間。", "適用于調用大規模", "語言模型(LLM),以",
            "流式輸入文本的方式", "進行語音合成的場景。"};
    /**
     * Replace your-dashscope-api-key with your own API key.
     */
    private static String apikey = "your-dashscope-api-key";
    private static String model = "cosyvoice-v1";
    private static String voice = "longxiaochun";

    public static void streamAudioDataToSpeaker() {
        CountDownLatch latch = new CountDownLatch(1);
        // Configure the callbacks.
        ResultCallback<SpeechSynthesisResult> callback =
                new ResultCallback<SpeechSynthesisResult>() {
                    @Override
                    public void onEvent(SpeechSynthesisResult result) {
                        System.out.println("Received message: " + result);
                        if (result.getAudioFrame() != null) {
                            // TODO: process the audio frame
                            System.out.println("Received audio");
                        }
                    }

                    @Override
                    public void onComplete() {
                        System.out.println("Received Complete");
                        latch.countDown();
                    }

                    @Override
                    public void onError(Exception e) {
                        System.out.println("Received error: " + e.toString());
                        latch.countDown();
                    }
                };
        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        .apiKey(apikey)
                        .model(model)
                        .voice(voice)
                        .format(SpeechSynthesisAudioFormat
                                .PCM_22050HZ_MONO_16BIT) // use PCM or MP3 for streaming synthesis
                        .build();
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, callback);
        // streamingCall does not block the current thread.
        for (String text : textArray) {
            synthesizer.streamingCall(text);
        }
        synthesizer.streamingComplete();
        System.out.print("requestId: " + synthesizer.getLastRequestId());
        // Wait for synthesis to complete.
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        streamAudioDataToSpeaker();
        System.exit(0);
    }
}
Interface details
Send text
""" Streaming input mode: You can call the stream_call function multiple times to send text. A session will be created on the first call. The session ends after calling streaming_complete. Parameters: ----------- text: str utf-8 encoded text """ def streaming_call(self, String text):
/**
 * Streaming input mode: you can call the streamingCall function multiple times to send text.
 * A session is created on the first call, and the session ends after streamingComplete is called.
 *
 * @param text utf-8 encoded text
 */
public void streamingCall(String text)
Finish the task stream synchronously
""" Synchronously stop the streaming input speech synthesis task. Wait for all remaining synthesized audio before returning Parameters: ----------- complete_timeout_millis: int Throws TimeoutError exception if it times out. """ def streaming_complete(self, complete_timeout_millis=10000):
/**
 * Synchronously stop the streaming-input speech synthesis task; waits for all remaining
 * synthesized audio before returning. If it does not complete within 10 seconds, a timeout
 * occurs and a TimeoutError exception is thrown.
 */
public void streamingComplete()

/**
 * Synchronously stop the streaming-input speech synthesis task; waits for all remaining
 * synthesized audio before returning.
 *
 * @param completeTimeoutMillis The timeout for the wait. Throws a TimeoutError exception if it times out.
 */
public void streamingComplete(long completeTimeoutMillis)
Finish the task stream asynchronously
""" Asynchronously stop the streaming input speech synthesis task, returns immediately. You need to listen and handle the STREAM_INPUT_TTS_EVENT_SYNTHESIS_COMPLETE event in the on_event callback. Do not destroy the object and callback before this event. """ def async_streaming_complete(self):
/**
 * Asynchronously stop the streaming-input speech synthesis task; returns immediately.
 * You need to listen for and handle the STREAM_INPUT_TTS_EVENT_SYNTHESIS_COMPLETE event
 * in the onEvent callback. Do not destroy the synthesizer or the callback before this event.
 */
public void asyncStreamingComplete()
Cancel the current task
""" Immediately terminate the streaming input speech synthesis task and discard any remaining audio that is not yet delivered. """ def streaming_cancel(self):
/**
 * Immediately terminate the streaming-input speech synthesis task and discard any
 * remaining audio that has not yet been delivered.
 */
public void streamingCancel()
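To show where streamingCancel fits, here is a minimal sketch that aborts an in-flight streaming task when sending fails partway. It uses only the streamingCall, streamingCancel, and streamingComplete methods documented above, and assumes synthesizer and textArray are set up as in the streaming example.

// Sketch: cancel an in-flight streaming task if sending is interrupted.
try {
    for (String text : textArray) {
        synthesizer.streamingCall(text);
    }
    synthesizer.streamingComplete();
} catch (RuntimeException e) {
    // Abort the task and discard any audio not yet delivered, instead of draining it.
    synthesizer.streamingCancel();
    throw e;
}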
Invocation via Flowable
The Java SDK additionally provides Flowable-based streaming invocation for speech synthesis. After the Flowable's onComplete() fires, the complete result can be retrieved via the SpeechSynthesizer object's getAudioData().
Non-streaming input invocation example
The following example uses the Flowable's blockingForEach interface to obtain, in a blocking manner, each streamed SpeechSynthesisResult message.
package com.alibaba.dashscope;

import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.exception.NoApiKeyException;

public class StreamInputTtsPlayableDemo {
    /**
     * Replace your-dashscope-api-key with your own API key.
     */
    private static String apikey = "your-dashscope-api-key";
    private static String model = "cosyvoice-v1";
    private static String voice = "longxiaochun";

    public static void streamAudioDataToSpeaker() throws NoApiKeyException {
        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        .apiKey(apikey)
                        .model(model)
                        .voice(voice)
                        .build();
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, null);
        synthesizer.callAsFlowable("今天天氣怎么樣?").blockingForEach(result -> {
            System.out.println("Received message: " + result);
            if (result.getAudioFrame() != null) {
                // TODO: process the audio frame
                System.out.println("Received audio");
            }
        });
    }

    public static void main(String[] args) throws NoApiKeyException {
        streamAudioDataToSpeaker();
        System.exit(0);
    }
}
Interface details
/**
* Stream output speech synthesis using Flowable features (non-streaming input)
* @param text Text to be synthesized
* @return The output event stream, including real-time audio
* @throws ApiException
* @throws NoApiKeyException
*/
public Flowable<SpeechSynthesisResult> callAsFlowable(String text)
throws ApiException, NoApiKeyException
Streaming input invocation example
The following example passes the text stream in as a Flowable input parameter and, on the returned Flowable, uses the blockingForEach interface to obtain each streamed SpeechSynthesisResult message in a blocking manner.
package com.alibaba.dashscope;

import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.exception.NoApiKeyException;
import io.reactivex.BackpressureStrategy;
import io.reactivex.Flowable;

public class StreamInputTtsPlayableDemo {
    private static String[] textArray = {"流式文本語音合成SDK,",
            "可以將輸入的文本", "合成為語音二進制數據,", "相比于非流式語音合成,",
            "流式合成的優勢在于實時性", "更強。用戶在輸入文本的同時",
            "可以聽到接近同步的語音輸出,", "極大地提升了交互體驗,",
            "減少了用戶等待時間。", "適用于調用大規模", "語言模型(LLM),以",
            "流式輸入文本的方式", "進行語音合成的場景。"};
    /**
     * Replace your-dashscope-api-key with your own API key.
     */
    private static String apikey = "your-dashscope-api-key";
    private static String model = "cosyvoice-v1";
    private static String voice = "longxiaochun";

    public static void streamAudioDataToSpeaker() throws NoApiKeyException {
        // Simulate streaming input.
        Flowable<String> textSource = Flowable.create(emitter -> {
            new Thread(() -> {
                for (int i = 0; i < textArray.length; i++) {
                    emitter.onNext(textArray[i]);
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                }
                emitter.onComplete();
            }).start();
        }, BackpressureStrategy.BUFFER);
        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        .apiKey(apikey)
                        .model(model)
                        .voice(voice)
                        .build();
        SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, null);
        synthesizer.streamingCallAsFlowable(textSource).blockingForEach(result -> {
            if (result.getAudioFrame() != null) {
                // TODO: send the audio frame to a player
                System.out.println(
                        "audio result length: " + result.getAudioFrame().capacity());
            }
        });
    }

    public static void main(String[] args) throws NoApiKeyException {
        streamAudioDataToSpeaker();
        System.exit(0);
    }
}
Response description
This interface delivers streaming results mainly through the returned Flowable<SpeechSynthesisResult>. After all streamed data has been returned, the complete synthesis result can also be retrieved through the corresponding SpeechSynthesizer object's getAudioData. For how to use Flowable, see the rxjava API documentation.
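As a sketch of that second path, assuming getAudioData() returns the complete audio as a ByteBuffer, mirroring the return type of call(); the document does not spell out its signature:

// Drain the stream first; frames could also be played here in real time.
synthesizer.callAsFlowable("今天天氣怎么樣?").blockingForEach(result -> { });
// Assumed: getAudioData() returns the full synthesized audio as a ByteBuffer (unverified).
ByteBuffer fullAudio = synthesizer.getAudioData();
try (FileOutputStream fos = new FileOutputStream("output.mp3")) {
    fos.write(fullAudio.array());
} catch (IOException e) {
    throw new RuntimeException(e);
}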
Interface details
/**
* Stream input and output speech synthesis using Flowable features
* @param textStream The text stream to be synthesized
* @return The output event stream, including real-time audio
* @throws ApiException
* @throws NoApiKeyException
*/
public Flowable<SpeechSynthesisResult> streamingCallAsFlowable(
Flowable<String> textStream)