Language Detection in Google Cloud function always returns 'en-IN' even if I specify alternate languages

Question

Asked by Akshay Kumar on 27 September 2023 at 11:35 · 155 views

I'm trying to detect the language of an audio file using Google's SpeechClient. Even when the audio is in Hindi or Tamil, the detected language is always 'en-IN'.

Any help would be appreciated.

Below is the code:

  const { SpeechClient } = require('@google-cloud/speech').v1p1beta1;
  const speechClient = new SpeechClient();

  // Get the audio file from Google Cloud Storage.
  const audioFile = `gs://${event.bucket}/${event.name}`;

  const config = {
    encoding: 'LINEAR16',
    languageCode: 'en-IN',
    alternativeLanguageCodes: ['hi-IN', 'ta-IN'],
    enableAutomaticPunctuation: true,
    model: 'default'
  };

  const request = {
    config,
    audio: {
      uri: audioFile,
    },
  };

  const [response] = await speechClient.recognize(request);
  const detectedLanguage = response.results[0].languageCode;

I uploaded audio files in Hindi and Tamil and expected detectedLanguage to be 'hi-IN' or 'ta-IN' to match the audio, but I always get 'en-IN'.

There is 1 answer

**Ravishankar Choudhary** · Answer 1 · 2023-09-27T11:52:40+00:00

Your configuration sets the languageCode field to en-IN. This makes English the primary language for transcription, with hi-IN and ta-IN offered only as alternative candidates.

One thing to try is removing the languageCode field from your configuration object, so that the service has to choose among the remaining languages. Be aware, though, that the RecognitionConfig reference documents languageCode as a required field, so the request may be rejected with an invalid-argument error. If that happens, keep the field, set it to the language you expect most often (for example hi-IN), and list the others in alternativeLanguageCodes.

Here is the updated code:

const { SpeechClient } = require('@google-cloud/speech').v1p1beta1;
const speechClient = new SpeechClient();

// Get the audio file from Google Cloud Storage.
const audioFile = `gs://${event.bucket}/${event.name}`;

const config = {
  encoding: 'LINEAR16',
  alternativeLanguageCodes: ['hi-IN', 'ta-IN'],
  enableAutomaticPunctuation: true,
  model: 'default'
};

const request = {
  config,
  audio: {
    uri: audioFile,
  },
};

const [response] = await speechClient.recognize(request);
const detectedLanguage = response.results[0].languageCode;

With this updated code, Google should be able to correctly detect the language of your Hindi and Tamil audio files.
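
If you end up experimenting with different primary and alternative languages, it may help to build the configuration through a small validating function. The following is only a sketch: buildRecognitionConfig is a hypothetical helper, not part of @google-cloud/speech, and it assumes (from the API reference as I understand it) that languageCode is required and that at most three alternative languages are accepted.

```javascript
// Hypothetical helper (not part of @google-cloud/speech): builds a
// RecognitionConfig object and checks the language fields up front.
const MAX_ALTERNATIVE_LANGUAGES = 3; // assumed limit; check the API reference

function buildRecognitionConfig(primaryLanguage, alternativeLanguages = []) {
  if (typeof primaryLanguage !== 'string' || primaryLanguage.length === 0) {
    throw new Error('languageCode is required');
  }
  // Drop duplicates and the primary language itself from the alternatives.
  const alternatives = [...new Set(alternativeLanguages)].filter(
    (code) => code !== primaryLanguage
  );
  if (alternatives.length > MAX_ALTERNATIVE_LANGUAGES) {
    throw new Error(
      `at most ${MAX_ALTERNATIVE_LANGUAGES} alternative languages are allowed`
    );
  }
  // The remaining fields mirror the configuration used in the question.
  return {
    encoding: 'LINEAR16',
    languageCode: primaryLanguage,
    alternativeLanguageCodes: alternatives,
    enableAutomaticPunctuation: true,
    model: 'default',
  };
}

// Example: Hindi as the primary language, Tamil and Indian English as alternatives.
const exampleConfig = buildRecognitionConfig('hi-IN', ['ta-IN', 'en-IN', 'hi-IN']);
console.log(exampleConfig.languageCode, exampleConfig.alternativeLanguageCodes);
```

The returned object can be passed as the config property of the request shown above. Swapping the primary language between runs is a quick way to see whether the detected language simply follows languageCode.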

Here are some additional tips for improving the accuracy of language detection:

- Use high-quality audio files. The better the quality of the audio, the easier it will be for Google to detect the language.
- Avoid using noisy or cluttered audio backgrounds. Background noise can make it difficult for Google to accurately detect the language.
- Speak clearly and at a moderate pace. This will help Google to better understand the words that you are saying.
- If you are speaking multiple languages in the same audio file, try to separate them by pauses. This will help Google to switch between languages more easily.
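
On the point about several languages in one file: the code above reads only response.results[0].languageCode, which throws if results is empty and ignores any later results. A minimal sketch of a more defensive read follows. summarizeDetectedLanguages is a hypothetical helper, and sampleResponse is a hand-written stand-in that assumes each result carries a languageCode field, as in the code above.

```javascript
// Hypothetical helper: counts the languageCode reported on each result of a
// recognize() response instead of trusting results[0] alone.
function summarizeDetectedLanguages(response) {
  const counts = {};
  const results = (response && response.results) || [];
  for (const result of results) {
    if (result.languageCode) {
      counts[result.languageCode] = (counts[result.languageCode] || 0) + 1;
    }
  }
  // Rank languages by how many results reported them, most frequent first.
  const ranked = Object.entries(counts).sort((a, b) => b[1] - a[1]);
  return {
    counts,
    dominant: ranked.length > 0 ? ranked[0][0] : null,
  };
}

// Hand-written stand-in for a response; a real one would come from
// speechClient.recognize(request).
const sampleResponse = {
  results: [
    { languageCode: 'hi-IN', alternatives: [{ transcript: '...' }] },
    { languageCode: 'ta-IN', alternatives: [{ transcript: '...' }] },
    { languageCode: 'hi-IN', alternatives: [{ transcript: '...' }] },
  ],
};

const summary = summarizeDetectedLanguages(sampleResponse);
console.log(summary.dominant); // 'hi-IN'
console.log(summary.counts);
```

An empty or missing results array yields a dominant value of null, which the calling code can treat as "no language detected" rather than crashing.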
