General Concepts

This page describes items and concepts that are common across all the Voysis query APIs.

Queries

Queries are the entities that are used to organise interactions with a user.

What is a Query?

A query represents an individual interaction your user has with the Voysis AI. Each time the user searches for something, a new query is created. A query is one of two types: audio or text.

Properties

  • locale: The locale of the query, expressed as an IETF language tag (for example, en-US).
  • userId: A value that uniquely identifies a specific user. This value must be supplied by the client and should remain unique across different sessions and, where possible, app reinstalls or browser data clearances. This allows Voysis to store audio and retrain the AI so that results improve over time. It is also used to meet data protection requirements.
    • The userId should not contain personally identifiable information. A good example of a userId is the numeric primary key of a user record. A bad example is an e-mail address (as this is personally identifiable). However, a one-way hash of an e-mail address would be acceptable.
    • The Authorization system contains a similar field called uid. Note that these fields are not directly linked. You may use the same value for both of these fields, but it is not a requirement.
  • queryType: This specifies what type of query is being performed. Currently audio and text are the supported values.
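
As an illustration of the properties above, the following minimal sketch derives a userId from an e-mail address with a one-way hash and assembles the three properties into a dictionary. The property names (locale, userId, queryType) come from this page; the helper names make_user_id and build_query_properties, the choice of SHA-256, and the dictionary layout are assumptions made for the example and are not part of the Voysis API.

```python
import hashlib

# Query types named on this page as currently supported.
SUPPORTED_QUERY_TYPES = ("audio", "text")


def make_user_id(email: str) -> str:
    """Derive a userId from an e-mail address using a one-way hash.

    Hypothetical helper: SHA-256 is an assumption; the page only asks
    for a one-way hash so that the raw address is never sent.
    """
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def build_query_properties(user_id: str, locale: str = "en-US",
                           query_type: str = "audio") -> dict:
    """Collect the common query properties into one dictionary.

    Hypothetical helper: the real request layout may differ.
    """
    if query_type not in SUPPORTED_QUERY_TYPES:
        raise ValueError(f"unsupported queryType: {query_type!r}")
    return {"locale": locale, "userId": user_id, "queryType": query_type}


if __name__ == "__main__":
    user_id = make_user_id("jane.doe@example.com")
    print(build_query_properties(user_id, locale="en-US", query_type="text"))
```

Normalising the address before hashing keeps the userId stable when the same user types their e-mail with different capitalisation or stray whitespace.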

Audio Profile Identifier

An audio profile ID uniquely identifies a combination of device and user. This value is used to create user/device-specific acoustic model profiles, which are used over time to improve the accuracy of the system as the user interacts with the service.

This value must be generated by the client and ideally used for the lifetime of that client. It should be a 128-bit UUID.
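
One way a client might satisfy this is to generate a random UUID once and store it locally, so that the same value is reused for the lifetime of the installation. The sketch below is an assumption-laden example: the function name load_or_create_audio_profile_id and the use of a plain text file for storage are hypothetical, and a real client would use whatever persistent storage its platform provides.

```python
import uuid
from pathlib import Path


def load_or_create_audio_profile_id(path: Path) -> str:
    """Return the stored audio profile ID, creating one on first use.

    Hypothetical helper: file-based storage is an assumption.
    """
    if path.exists():
        stored = path.read_text(encoding="utf-8").strip()
        try:
            # Reuse the stored value only if it parses as a UUID.
            return str(uuid.UUID(stored))
        except ValueError:
            pass  # Corrupt or empty file: fall through and regenerate.

    new_id = str(uuid.uuid4())  # Random 128-bit UUID.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(new_id, encoding="utf-8")
    return new_id


if __name__ == "__main__":
    print(load_or_create_audio_profile_id(Path("audio_profile_id.txt")))
```

Calling the function repeatedly with the same path returns the same ID, which is what allows the acoustic profile to accumulate over time.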

Voice Activity Detection (VAD)

Voice Activity Detection is a feature of the Voysis AI platform that detects when a user starts and stops speaking.

By default this feature is active when using the WebSocket API. After a user has initiated a new query, typically by manually interacting with a client app, the Voysis system will automatically detect when they have finished speaking, removing the need for the user to have a second manual interaction to indicate they are finished speaking. The Voysis AI will notify the client app that VAD has activated so that it can stop streaming further audio data, and all received audio will be processed.
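
The client-side behaviour described above can be sketched as a loop that keeps sending audio chunks until it is told that VAD has activated. This is only an outline under stated assumptions: stream_until_vad, the send callback, and the vad_stop event are hypothetical names, and the way the VAD notification actually arrives (and is turned into the event) depends on the WebSocket API and is not shown here.

```python
import threading
from typing import Callable, Iterable


def stream_until_vad(chunks: Iterable[bytes],
                     send: Callable[[bytes], None],
                     vad_stop: threading.Event) -> int:
    """Send audio chunks until the VAD notification has been received.

    Hypothetical helper: `send` stands in for whatever writes audio to
    the connection, and `vad_stop` is set by the code that handles the
    server's VAD notification. Returns the number of chunks sent.
    """
    sent = 0
    for chunk in chunks:
        if vad_stop.is_set():
            break  # Server detected end of speech: stop streaming.
        send(chunk)
        sent += 1
    return sent


if __name__ == "__main__":
    stop = threading.Event()
    received = []

    def fake_send(chunk: bytes) -> None:
        received.append(chunk)
        if len(received) == 2:
            stop.set()  # Simulate the VAD notification arriving.

    count = stream_until_vad([b"a", b"b", b"c", b"d"], fake_send, stop)
    print(count, received)
```

Using an event lets the code that reads server messages run on a separate thread from the code that captures and sends audio.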

VAD can be disabled on a per-query basis when using the WebSocket API (see X-Voysis-Ignore-Vad). For technical reasons, VAD is unavailable when using the REST API, so the user must perform a second interaction at the client to indicate that they have finished speaking.

Wakeword

Wakeword is a piece of functionality whereby a user can initiate an interaction with a device using their voice rather than a touch-based interaction such as tapping a button. The user must first say a unique "wakeword" phrase which the device recognises as its instruction to begin an interaction, typically providing some kind of visual or aural feedback to indicate it is ready.