Audio Guidelines
When sending audio to the Voysis Query API, the format and quality of the audio are important. The Query API currently supports only PCM data, and the audio must conform to the following restrictions:
Format | Raw PCM, a WAV file containing raw PCM, or a WebM container containing raw PCM. |
Sample Rate | 48,000Hz, 44,100Hz or 16,000Hz. |
Encoding | Signed integer or floating point. |
Bits Per Sample | 16 (for signed integer) or 32 (for floating point) |
Endianness | Little- or big-endian (little-endian is strongly recommended) |
Channels | 1 |
Audio that does not match these restrictions will not be processed correctly and may be rejected.
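As an illustration of the restrictions above, a client could check its audio parameters before opening a query. This is a minimal sketch, not part of any Voysis library: the function name `check_audio_format` and the constant names are hypothetical, and the encoding labels reuse the `signed-int` and `float` values from the audio/pcm parameters.

```python
# Values taken from the restrictions table; names are illustrative only.
SUPPORTED_RATES = {48000, 44100, 16000}
# Bits per sample required for each supported encoding.
BITS_FOR_ENCODING = {"signed-int": 16, "float": 32}


def check_audio_format(rate, bits, encoding, channels):
    """Return a list of problems; an empty list means the format looks acceptable."""
    problems = []
    if rate not in SUPPORTED_RATES:
        problems.append(f"unsupported sample rate: {rate}")
    if encoding not in BITS_FOR_ENCODING:
        problems.append(f"unsupported encoding: {encoding}")
    elif bits != BITS_FOR_ENCODING[encoding]:
        expected = BITS_FOR_ENCODING[encoding]
        problems.append(f"{encoding} audio must use {expected} bits per sample, got {bits}")
    if channels != 1:
        problems.append(f"only mono audio is supported, got {channels} channels")
    return problems
```

Calling `check_audio_format(16000, 16, "signed-int", 1)` returns an empty list, while a 22,050Hz stereo stream would be reported with one problem per violated restriction.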
MIME Type
When creating a query via the Query API, the client is required to specify the MIME type of the audio that will be streamed so that it can be decoded correctly.
- For the REST API, the MIME type should be sent in the Content-Type header.
- For the WebSocket API, the MIME type should be sent as mimeType in the audioQuery object.
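The two placements can be sketched as follows. Only the `Content-Type` header name and the `mimeType` field of the `audioQuery` object come from the text above; the variable names are hypothetical, and the rest of the WebSocket message envelope is deliberately omitted because it is not described here.

```python
import json

mime_type = "audio/pcm;rate=16000"

# REST: the MIME type travels in the Content-Type header of the audio request.
rest_headers = {"Content-Type": mime_type}

# WebSocket: the MIME type is the mimeType field of the audioQuery object.
# Only this fragment is shown; the surrounding message structure is not assumed.
ws_fragment = json.dumps({"audioQuery": {"mimeType": mime_type}})
```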
The Query API supports three MIME types.
audio/pcm
To stream raw samples to the Query API without any form of container, use the audio/pcm MIME type. The parameters of the MIME type must indicate the low-level format of the audio.
rate | the sample rate of the audio in Hz |
bits | the number of bits per sample. Must be either 16 (for signed integer encoded PCM) or 32 (for floating point encoded PCM) |
encoding | the encoding of the PCM samples, either signed-int or float |
channels | The number of channels. Only single channel (mono) audio is currently supported. |
big-endian | true to indicate the audio is encoded in big-endian format, false if the audio is little-endian. Big-endian audio is not supported in all circumstances, so Voysis recommends always delivering little-endian audio. |
Any of these parameters can be omitted from the MIME type, in which case it takes its default value. The defaults are:
rate | 16000 |
bits | 16 |
encoding | signed-int |
channels | 1 |
big-endian | false |
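To make the default handling concrete, the sketch below resolves an audio/pcm MIME type into a full set of parameters on the client side. It illustrates the rules in the two tables above and is not the service's actual parser; `parse_pcm_mime` and `PCM_DEFAULTS` are hypothetical names.

```python
# Defaults copied from the table above.
PCM_DEFAULTS = {
    "rate": 16000,
    "bits": 16,
    "encoding": "signed-int",
    "channels": 1,
    "big-endian": False,
}


def parse_pcm_mime(mime_type):
    """Resolve an audio/pcm MIME type into all five parameters, applying defaults."""
    base, *raw_params = [part.strip() for part in mime_type.split(";")]
    if base.lower() != "audio/pcm":
        raise ValueError(f"not an audio/pcm MIME type: {mime_type!r}")
    params = dict(PCM_DEFAULTS)
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        name, sep, value = raw.partition("=")
        name, value = name.strip(), value.strip()
        if not sep or name not in PCM_DEFAULTS:
            raise ValueError(f"unknown or malformed parameter: {raw!r}")
        if name == "big-endian":
            params[name] = value.lower() == "true"
        elif name == "encoding":
            params[name] = value
        else:
            params[name] = int(value)  # rate, bits and channels are integers
    return params
```

With this sketch, `parse_pcm_mime("audio/pcm")` yields exactly the defaults, and `parse_pcm_mime("audio/pcm;rate=48000")` changes only the sample rate.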
Some examples of the raw audio/pcm MIME type:
# Defaults: 16kHz, 16-bit signed-int, 1 channel, little-endian
Content-Type: audio/pcm
# Override the sample rate to 48kHz
Content-Type: audio/pcm;rate=48000
# Specify that the audio is 32-bit floating point PCM
Content-Type: audio/pcm;rate=48000;encoding=float;bits=32
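Strings like the examples above can also be generated rather than written by hand. The following is a minimal sketch under the assumption that parameter order is not significant; `pcm_mime_type` is a hypothetical helper, and it omits any parameter that equals its documented default.

```python
# Defaults copied from the audio/pcm defaults table.
DEFAULTS = {
    "rate": 16000,
    "bits": 16,
    "encoding": "signed-int",
    "channels": 1,
    "big-endian": False,
}


def pcm_mime_type(**params):
    """Build an audio/pcm MIME type, leaving out parameters equal to their defaults."""
    parts = ["audio/pcm"]
    for name, default in DEFAULTS.items():
        key = name.replace("-", "_")  # big-endian is passed as big_endian
        value = params.get(key, default)
        if value != default:
            # Booleans are written in lowercase (true/false), as in the parameter table.
            text = str(value).lower() if isinstance(value, bool) else str(value)
            parts.append(f"{name}={text}")
    return ";".join(parts)
```

For instance, `pcm_mime_type(rate=48000)` produces `audio/pcm;rate=48000`. Parameters are emitted in table order, so a 32-bit float stream comes out as `audio/pcm;rate=48000;bits=32;encoding=float`, which carries the same parameters as the last example above in a different order.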
audio/wav
Audio may be streamed as a valid WAV file. If the MIME type is set to audio/wav, all audio parameters (rate, bits per sample, encoding, channels and endianness) are decoded from the WAV header.
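To show which values live in a WAV header, the sketch below uses Python's standard `wave` module to write a short in-memory file and read its header back. It illustrates the container only and says nothing about how the Query API decodes it; note that the `wave` module handles integer PCM, so floating point WAV files would need a different tool.

```python
import io
import wave

# Write one second of 16kHz, 16-bit mono silence into an in-memory WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)  # bytes per sample, so 2 means 16 bits
    writer.setframerate(16000)
    writer.writeframes(b"\x00\x00" * 16000)

# Read the header back to see the parameters a decoder would find there.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    header = {
        "rate": reader.getframerate(),
        "bits": reader.getsampwidth() * 8,
        "channels": reader.getnchannels(),
    }
```

Because these values travel inside the file itself, the MIME type needs no parameters in this case.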
audio/webm
Audio may also be streamed in a WebM container. However, decoding Opus or Vorbis streams is not supported; only raw PCM data embedded within the WebM container is supported. While this format is not officially supported by the WebM specification, it is possible to create audio in this format from the Chrome/Chromium browser using the WebRTC APIs. Voysis' JavaScript client library uses this format when running in a Chrome-based browser.