Speech client v2, my working OK example #3578

@sorokinvj

Description

Hey guys, I am not sure where to put this, but I just want to share my working implementation of the speech client v2 and some thoughts on migrating from v1 to v2.
The official docs unfortunately provide only Python code, and there is not much info apart from this repo and the example I used to migrate: transcribeStreaming.v2.js

My SDK version is:

"@google-cloud/speech": "^6.0.1",

In my code I first initialize the service as:

const service = await createGoogleService({ language, send }) // note the await: the factory returns a Promise

and then call service.transcribeAudio(data) whenever new audio arrives from the frontend, which uses

const mediaRecorder = new MediaRecorder(audioStream, { mimeType: 'audio/webm;codecs=opus' }); // it's the default param
mediaRecorder.ondataavailable = (event: BlobEvent) => {
  // ... send the event.data to the backend
};

thus an audio chunk is just a browser Blob object.
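
On the backend, each Blob arrives as a binary frame. Here is a minimal sketch of the receiving side, assuming the ws library; the port, import path, and the send stub are my placeholders, and createGoogleService is the service defined below:

import { WebSocketServer } from 'ws';
import { createGoogleService } from './googleService'; // hypothetical path to the service below

// Stub for the xstate Sender the service expects; wire this to your own machine.
const send = (event: { type: string; data: string }) => console.log(event);

const wss = new WebSocketServer({ port: 8080 });

wss.on('connection', async socket => {
  const service = await createGoogleService({ language: 'en-US', send });

  socket.on('message', (message: Buffer) => {
    // ws delivers binary frames as Buffers, so each MediaRecorder chunk
    // can be passed straight to the transcriber.
    service.transcribeAudio(message);
  });

  socket.on('close', () => service.stop());
});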

My service:

import { logger } from '../../logger';
import { getText, transformGoogleResponse } from './utils';
import { v2 as speech } from '@google-cloud/speech';
import { StreamingRecognizeResponse } from './google.types';
import { TranscriptionService } from '../transcription.types';
import { MachineEvent } from '../../websocket/websocket.types';
import { Sender } from 'xstate';
import { parseErrorMessage } from '../../../utils';
import { findRecognizerByLanguageCode } from './recognizers';

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });

      const recognizer = findRecognizerByLanguageCode(language).name;

      const configRequest = {
        recognizer,
        streamingConfig: {
          config: {
            autoDecodingConfig: {},
          },
          streamingFeatures: {
            enableVoiceActivityEvents: true,
            interimResults: false,
          },
        },
      };

      logger.info('Creating Google service with recognizer:', recognizer);

      const recognizeStream = client
        ._streamingRecognize()
        .on('error', error => {
          logger.error('Error on "error" in recognizeStream', error);
          send({ type: 'ERROR', data: parseErrorMessage(error) });
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.speechEventType === 'SPEECH_ACTIVITY_END') {
            send({ type: 'SPEECH_END', data: 'SPEECH_END' });
          }
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          logger.warn('Google recognizeStream ended');
        });

      // v2 requires the config request to be the very first message on the
      // stream, sent exactly once before any audio.
      let configSent = false;

      const transcribeAudio = (audioData: Buffer) => {
        if (!configSent) {
          recognizeStream.write(configRequest);
          configSent = true;
        }
        // Note the object notation: audio chunks must be wrapped as { audio }.
        recognizeStream.write({ audio: audioData });
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };

      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};
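
The findRecognizerByLanguageCode helper is not part of the SDK; mine is just a lookup table mapping language codes to full recognizer resource names. A minimal sketch (the project and recognizer IDs are placeholders, not real values):

// recognizers.ts — hypothetical sketch; fill in your own project and recognizer IDs.
type RecognizerEntry = { languageCode: string; name: string };

const RECOGNIZERS: RecognizerEntry[] = [
  {
    languageCode: 'en-US',
    name: 'projects/YOUR_PROJECT/locations/global/recognizers/YOUR_RECOGNIZER_ID',
  },
];

export const findRecognizerByLanguageCode = (languageCode: string): RecognizerEntry => {
  const entry = RECOGNIZERS.find(r => r.languageCode === languageCode);
  if (!entry) {
    throw new Error(`No recognizer configured for language: ${languageCode}`);
  }
  return entry;
};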

Migration considerations

  1. To use v2 you need to create a recognizer; I did it with this function:
import { v2 } from '@google-cloud/speech';

/**
 * Creates a new recognizer.
 *
 * @param {string} projectId - The ID of the Google Cloud project.
 * @param {string} location - The location for the recognizer.
 * @param {string} recognizerId - The ID for the new recognizer.
 * @param {string} languageCode - The language code for the recognizer.
 * @returns {Promise<object>} The created recognizer.
 * @throws Will throw an error if the recognizer creation fails.
 */
export const createRecognizer = async (
  projectId: string,
  location: string,
  recognizerId: string,
  languageCode: string
) => {
  const client = new v2.SpeechClient({
    keyFilename: 'assistant-demo.json',
  });

  const request = {
    parent: `projects/${projectId}/locations/${location}`,
    recognizer: {
      languageCodes: [languageCode],
      model: 'latest_long',
      // Add any additional configuration here
    },
    recognizerId,
  };

  try {
    console.log('Creating recognizer...', request);
    const [operation] = await client.createRecognizer(request);
    const [recognizer] = await operation.promise();
    return recognizer;
  } catch (error) {
    console.error('Failed to create recognizer:', error);
    throw error;
  }
};
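
A one-off call (inside an async context) then looks something like this; the project, location, and recognizer IDs below are placeholders, not real values:

const recognizer = await createRecognizer(
  'my-gcp-project', // projectId (placeholder)
  'global',         // location (placeholder; must match the endpoint you stream against)
  'my-recognizer',  // recognizerId (placeholder)
  'en-US'           // languageCode
);
console.log('Created recognizer:', recognizer?.name);
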
  2. The config object should now be sent as the first message on the stream, immediately before any audio. So if you previously did recognizingClient.write(audioData), you should now do recognizingClient.write(newConfigWithRecognizer) (but only once!) and then recognizingClient.write({ audio: audioData }) <<< notice the object notation. See the sketch after this list.
  3. The config object itself has changed to:
public streamingConfig?: (google.cloud.speech.v2.IStreamingRecognitionConfig|null);

/** Properties of a StreamingRecognitionConfig. */
interface IStreamingRecognitionConfig {

  /** StreamingRecognitionConfig config */
  config?: (google.cloud.speech.v2.IRecognitionConfig|null);

  /** StreamingRecognitionConfig configMask */
  configMask?: (google.protobuf.IFieldMask|null);

  /** StreamingRecognitionConfig streamingFeatures */
  streamingFeatures?: (google.cloud.speech.v2.IStreamingRecognitionFeatures|null);
}
  4. When instantiating the streaming client, use _streamingRecognize() (this is likely to change).
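
Putting items 2-4 together, the bare write order is the following (a sketch: configRequest is the same shape as in my service above, and the chunk variables are placeholders):

const stream = client._streamingRecognize();

// First write: the config request, exactly once, before any audio.
stream.write(configRequest);

// Every subsequent write wraps the audio chunk in an object.
stream.write({ audio: firstChunk });
stream.write({ audio: nextChunk });

// Close the stream when the audio source is done.
stream.end();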
