Audio Guidelines

When sending audio to the Voysis Query APIs, the format and quality of the audio are important. The Query API currently supports only PCM data, and the audio must conform to the following restrictions:

Format: Raw PCM, WAV containing raw PCM, or webm container containing raw PCM.
Sample Rate: 48,000 Hz, 44,100 Hz or 16,000 Hz.
Encoding: Signed integer or floating point.
Bits Per Sample: 16 (for signed integer) or 32 (for floating point).
Endian: Little- or big-endian (though little-endian is strongly recommended).
Channels: 1

Audio that does not match these restrictions will not be processed correctly and may be rejected.
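As a rough illustration, a client could check its capture settings against the restrictions above before streaming. The sketch below is not part of any Voysis library: the AudioFormat class and validate function are hypothetical names, and the encoding labels signed-int and float are borrowed from the audio/pcm MIME parameters.

```python
from dataclasses import dataclass
from typing import List

# Values taken from the restrictions listed above.
SUPPORTED_RATES = {48000, 44100, 16000}
# Bits per sample required by each encoding.
BITS_FOR_ENCODING = {"signed-int": 16, "float": 32}


@dataclass(frozen=True)
class AudioFormat:
    """Hypothetical container for the low-level format of a PCM stream."""
    rate: int = 16000
    bits: int = 16
    encoding: str = "signed-int"
    channels: int = 1
    big_endian: bool = False


def validate(fmt: AudioFormat) -> List[str]:
    """Return a list of problems; an empty list means the format conforms."""
    problems = []
    if fmt.rate not in SUPPORTED_RATES:
        problems.append(f"unsupported sample rate: {fmt.rate}")
    expected_bits = BITS_FOR_ENCODING.get(fmt.encoding)
    if expected_bits is None:
        problems.append(f"unsupported encoding: {fmt.encoding}")
    elif fmt.bits != expected_bits:
        problems.append(
            f"{fmt.encoding} requires {expected_bits} bits per sample, got {fmt.bits}"
        )
    if fmt.channels != 1:
        problems.append(f"only mono audio is supported, got {fmt.channels} channels")
    return problems
```

Big-endian audio is allowed by the restrictions, so the sketch does not flag it, although little-endian remains the recommended choice.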

MIME Type

When creating a query via the Query API, the client is required to specify the MIME type of the audio that will be streamed so that it can be decoded correctly.

  • For the REST API, the MIME type should be sent in the Content-Type header.
  • For the WebSocket API, the MIME type should be sent as mimeType in the audioQuery object.
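The two placements can be sketched as follows. Both helper names are hypothetical, and the sketch shows only where the MIME type goes: any other headers, and any other fields of the audioQuery object or of the message that carries it, are deliberately left out because they are not described here.

```python
def rest_headers(mime_type: str) -> dict:
    """REST API: the MIME type travels in the Content-Type header."""
    return {"Content-Type": mime_type}


def audio_query_fields(mime_type: str) -> dict:
    """WebSocket API: the MIME type is the mimeType field of the audioQuery object.

    Only that one field is shown; the rest of the object is omitted.
    """
    return {"mimeType": mime_type}
```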

The Query API supports three MIME types: audio/pcm, audio/wav and audio/webm.

audio/pcm

To stream raw samples to the Query API without any form of container, use the audio/pcm MIME type. The parameters of the MIME type must indicate the low-level format of the audio.

rate: The sample rate of the audio in Hz.
bits: The number of bits per sample. Must be either 16 (for signed integer encoded PCM) or 32 (for floating point encoded PCM).
encoding: The encoding of the PCM samples, either signed-int or float.
channels: The number of channels. Only single channel (mono) audio is currently supported.
big-endian: true to indicate the audio is encoded in big-endian format, false if the audio is little-endian. Note that big-endian audio is not supported in all circumstances, so Voysis recommends always delivering little-endian audio.

Any of these parameters can be omitted from the MIME type, in which case it takes its default value. The defaults are:

rate: 16000
bits: 16
encoding: signed-int
channels: 1
big-endian: false
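To make the defaulting rule concrete, here is a minimal sketch of how an audio/pcm MIME type could be resolved into a full set of parameters. It illustrates the rule only and is not the Query API's actual parser; parse_pcm_mime is a hypothetical name, and rejecting unknown parameters is an assumption of this sketch.

```python
# Defaults as listed above, kept as strings until the final conversion.
DEFAULTS = {
    "rate": "16000",
    "bits": "16",
    "encoding": "signed-int",
    "channels": "1",
    "big-endian": "false",
}


def parse_pcm_mime(mime: str) -> dict:
    """Resolve an audio/pcm MIME type, filling omitted parameters with defaults."""
    base, *params = [part.strip() for part in mime.split(";")]
    if base.lower() != "audio/pcm":
        raise ValueError(f"not an audio/pcm MIME type: {mime!r}")
    resolved = dict(DEFAULTS)
    for param in params:
        if not param:
            continue  # tolerate a trailing semicolon
        key, sep, value = param.partition("=")
        key = key.strip().lower()
        if not sep or key not in DEFAULTS:
            # Assumption: unknown or malformed parameters are treated as errors.
            raise ValueError(f"unknown or malformed parameter: {param!r}")
        resolved[key] = value.strip()
    return {
        "rate": int(resolved["rate"]),
        "bits": int(resolved["bits"]),
        "encoding": resolved["encoding"],
        "channels": int(resolved["channels"]),
        "big-endian": resolved["big-endian"].lower() == "true",
    }
```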

Some examples of the raw audio/pcm MIME type:

# Default to 16 kHz, 16-bit signed-int, 1 channel, little-endian
Content-Type: audio/pcm

# Override the sample rate to 48 kHz
Content-Type: audio/pcm;rate=48000

# Specify that the audio is 32-bit floating point PCM
Content-Type: audio/pcm;rate=48000;encoding=float;bits=32
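A client could also generate these strings rather than writing them by hand. The helper below is a hypothetical sketch, not part of any Voysis client library. It emits only the parameters that differ from the documented defaults, and the parameter order is simply chosen to match the examples above.

```python
# Documented defaults for the audio/pcm parameters.
PCM_DEFAULTS = {
    "rate": 16000,
    "encoding": "signed-int",
    "bits": 16,
    "channels": 1,
    "big-endian": False,
}


def pcm_mime_type(rate=16000, encoding="signed-int", bits=16,
                  channels=1, big_endian=False) -> str:
    """Build an audio/pcm MIME type, omitting parameters left at their default."""
    given = {
        "rate": rate,
        "encoding": encoding,
        "bits": bits,
        "channels": channels,
        "big-endian": big_endian,
    }
    parts = ["audio/pcm"]
    for name, value in given.items():
        if value == PCM_DEFAULTS[name]:
            continue  # default value, no need to send it
        # Booleans are written as lowercase true/false.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        parts.append(f"{name}={text}")
    return ";".join(parts)


print(pcm_mime_type())
print(pcm_mime_type(rate=48000))
print(pcm_mime_type(rate=48000, encoding="float", bits=32))
```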

audio/wav

Audio may be streamed as a valid WAV file. If the MIME type is set to audio/wav, all audio parameters (rate, bits per sample, encoding, channels and endianness) are decoded from the WAV header.
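Because the parameters come from the header, it can be worth inspecting a WAV file locally before streaming it. The sketch below uses Python's standard wave module; check_wav is a hypothetical helper. Note that the wave module reads integer PCM only and raises wave.Error for floating point WAV files, so this check covers just the 16-bit signed integer case.

```python
import wave

# Sample rates accepted by the Query API.
SUPPORTED_RATES = {48000, 44100, 16000}


def check_wav(source) -> dict:
    """Read format fields from a WAV header.

    `source` is a file path or a binary file-like object positioned at the
    start of the WAV data.
    """
    with wave.open(source, "rb") as wav:
        info = {
            "rate": wav.getframerate(),
            "bits": wav.getsampwidth() * 8,  # sample width is reported in bytes
            "channels": wav.getnchannels(),
        }
    # Integer PCM must be 16-bit mono at a supported sample rate.
    info["ok"] = (
        info["rate"] in SUPPORTED_RATES
        and info["bits"] == 16
        and info["channels"] == 1
    )
    return info
```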

audio/webm

Audio may also be streamed as a webm container. However, decoding Opus or Vorbis streams is not supported; only raw PCM data embedded within the webm container is supported. Although the webm specification does not officially support such a format, it is possible to create audio in this format from the Chrome/Chromium browser using the WebRTC APIs. Voysis' JavaScript client library uses this format when executing in a Chrome-based browser.