Interface LiveClientRealtimeInput

User input that is sent in real time.

This is different from LiveClientContent in a few ways:

  • Can be sent continuously without interrupting model generation.
  • If data is interleaved across LiveClientContent and LiveClientRealtimeInput, the server attempts to optimize for the best response, but there are no guarantees.
  • The end of a turn is not explicitly specified; it is derived from user activity (for example, end of speech).
  • Even before the end of the turn, the data is processed incrementally so that the model can start responding quickly.
  • Is always assumed to be the user's input (cannot be used to populate conversation history).

interface LiveClientRealtimeInput {
    activityEnd?: ActivityEnd;
    activityStart?: ActivityStart;
    audio?: Blob;
    audioStreamEnd?: boolean;
    mediaChunks?: Blob[];
    text?: string;
    video?: Blob;
}
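
As a rough illustration of the shape declared above, the following sketch builds one message per modality. The types are local stand-ins so the snippet compiles without the SDK: MediaBlob is a hypothetical name for the SDK's Blob (renamed to avoid the DOM's global Blob), the helper functions are illustrative only, and the MIME types are assumptions that should be checked against what the target model accepts.

```typescript
// Local stand-ins for the SDK types; not imported from any package.
interface MediaBlob {
  data?: string; // base64-encoded bytes
  mimeType?: string;
}

interface RealtimeInput {
  activityEnd?: Record<string, never>;
  activityStart?: Record<string, never>;
  audio?: MediaBlob;
  audioStreamEnd?: boolean;
  mediaChunks?: MediaBlob[];
  text?: string;
  video?: MediaBlob;
}

// Assumed MIME types, chosen for illustration only.
const AUDIO_MIME = "audio/pcm;rate=16000";
const VIDEO_MIME = "image/jpeg";

// Hypothetical helpers: each wraps already base64-encoded data in a message.
function audioInput(base64Pcm: string): RealtimeInput {
  return { audio: { data: base64Pcm, mimeType: AUDIO_MIME } };
}

function videoInput(base64Frame: string): RealtimeInput {
  return { video: { data: base64Frame, mimeType: VIDEO_MIME } };
}

function textInput(text: string): RealtimeInput {
  return { text };
}

// None of the messages carries an end-of-turn flag; per the notes above,
// the end of a turn is derived from user activity.
const messages: RealtimeInput[] = [
  audioInput("AAAA"),
  videoInput("/9j/"),
  textInput("What am I looking at?"),
];
```

In a real client, each element of messages would be passed to whatever send call the session object exposes; that call is deliberately left out here.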

Properties

activityEnd?: ActivityEnd

Marks the end of user activity.

activityStart?: ActivityStart

Marks the start of user activity.
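
The two markers can be pictured as brackets around a run of input. The sketch below assumes a session in which the client signals activity itself, and it assumes both marker types are empty objects; neither point is stated on this page. bracketActivity, MediaBlob, and the MIME type are hypothetical names and values used only for illustration.

```typescript
// Local stand-ins for the SDK types; only the fields used here are declared.
interface MediaBlob {
  data?: string; // base64-encoded bytes
  mimeType?: string;
}

interface RealtimeInput {
  activityStart?: Record<string, never>;
  activityEnd?: Record<string, never>;
  audio?: MediaBlob;
}

// Hypothetical helper: wraps a run of audio chunks in start/end markers.
function bracketActivity(base64Chunks: string[]): RealtimeInput[] {
  const audio = base64Chunks.map(
    (data): RealtimeInput => ({
      audio: { data, mimeType: "audio/pcm;rate=16000" }, // assumed MIME type
    }),
  );
  return [{ activityStart: {} }, ...audio, { activityEnd: {} }];
}

const turn = bracketActivity(["AAAA", "BBBB"]);
```

Each marker travels in its own message, so the receiver sees a start marker, the audio chunks in order, and then an end marker.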

audio?: Blob

The realtime audio input stream.

audioStreamEnd?: boolean

Indicates that the audio stream has ended, e.g. because the microphone was turned off.

This should only be sent when automatic activity detection is enabled (which is the default).

The client can reopen the stream by sending an audio message.
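
One way to keep track of this on the client is a small state holder that emits audioStreamEnd only when a stream is actually open, and treats the next audio chunk as reopening it. MicStream and its methods are hypothetical and not part of the SDK; the MIME type is likewise an assumption.

```typescript
// Local stand-ins for the SDK types; only the fields used here are declared.
interface MediaBlob {
  data?: string; // base64-encoded bytes
  mimeType?: string;
}

interface RealtimeInput {
  audio?: MediaBlob;
  audioStreamEnd?: boolean;
}

// Hypothetical helper that tracks whether the audio stream is open.
class MicStream {
  private open = false;

  // Sending any audio chunk opens (or reopens) the stream.
  chunk(base64Pcm: string): RealtimeInput {
    this.open = true;
    return { audio: { data: base64Pcm, mimeType: "audio/pcm;rate=16000" } };
  }

  // Returns an end-of-stream message, or null if no stream is open.
  micOff(): RealtimeInput | null {
    if (!this.open) {
      return null;
    }
    this.open = false;
    return { audioStreamEnd: true };
  }

  isOpen(): boolean {
    return this.open;
  }
}

const mic = new MicStream();
const beforeAnyAudio = mic.micOff(); // nothing to end yet
const firstChunk = mic.chunk("AAAA"); // opens the stream
const ended = mic.micOff(); // closes it
const reopened = mic.chunk("BBBB"); // reopens it
```

Returning null instead of a redundant end message is a design choice of this sketch, not a documented requirement.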

mediaChunks?: Blob[]

Inline byte data for media input.

text?: string

The realtime text input stream.

video?: Blob

The realtime video input stream.