How to convert voice to text in JavaScript

By FoxLearn 2/19/2025 8:31:29 AM
To convert voice to text in JavaScript, you can use the Web Speech API's webkitSpeechRecognition interface, which is built into Chrome and a few other modern browsers.

The Web Speech API, which was introduced in late 2012, enables web developers to integrate speech input and text-to-speech output capabilities within a browser. These features are typically not available through traditional speech recognition or screen reader software.

One of the key advantages of this API is its attention to user privacy: before any website can access the user's microphone, the user must grant explicit permission.

How to use the webkitSpeechRecognition API to convert voice input to text?

You can use the interimTranscript and finalTranscript variables to display the recognized text in real time:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Voice to Text</title>
</head>
<body>
  <h1>Voice to Text Converter</h1>
  <button onclick="startRecognition()">Start Recognition</button>
  <button onclick="stopRecognition()">Stop Recognition</button>
  <div id="output">
    <p><strong>Recognized Text:</strong></p>
    <p id="transcript"></p>
  </div>

  <script>
    // Initialize the speech recognition object
    const recognition = new webkitSpeechRecognition();
    recognition.continuous = true; // Keep recognizing until stopped
    recognition.interimResults = true; // Show real-time results
    recognition.lang = "en-US"; // Set the language (e.g., "en-US" for English)

    // Accumulates finalized text across result events; in continuous mode
    // each event only reports results from event.resultIndex onward
    let finalTranscript = '';

    // When speech is recognized
    recognition.onresult = function(event) {
      let interimTranscript = '';

      for (let i = event.resultIndex; i < event.results.length; ++i) {
        if (event.results[i].isFinal) {
          finalTranscript += event.results[i][0].transcript;
        } else {
          interimTranscript += event.results[i][0].transcript;
        }
      }

      // Show the finalized text followed by the in-progress text
      document.getElementById('transcript').innerText = finalTranscript + interimTranscript;
    };

    // Start recognition
    function startRecognition() {
      recognition.start();
    }

    // Stop recognition
    function stopRecognition() {
      recognition.stop();
    }
  </script>
</body>
</html>

In this example:

  • webkitSpeechRecognition is the interface used for speech recognition.
  • recognition.continuous is set to true, meaning the microphone will continuously listen to the user until stopped.
  • recognition.interimResults is set to true, meaning the browser will show text as it's being recognized (before the user finishes speaking).
  • recognition.lang specifies the language for speech recognition. You can set it to different languages (e.g., "en-US" for English or "es-ES" for Spanish).
  • recognition.onresult is the event handler where the recognized speech is processed and displayed. It separates the final text (complete phrases) from the interim text (in-progress text).
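
The example above does not react when recognition fails, for instance when the user denies the microphone permission mentioned earlier. The recognition object reports such failures through its onerror handler, and the event's error property holds a short code such as "not-allowed" or "no-speech". Below is a minimal sketch of how those codes might be turned into readable messages. The helper name describeRecognitionError and the message wording are assumptions made for illustration, not part of the API.

```javascript
// Hypothetical helper: maps a speech recognition error code to a message.
// The codes themselves come from the Web Speech API; the messages are examples.
function describeRecognitionError(code) {
  switch (code) {
    case 'not-allowed':
    case 'service-not-allowed':
      return 'Microphone access was blocked. Allow it in the browser and try again.';
    case 'audio-capture':
      return 'No microphone could be used for capturing audio.';
    case 'no-speech':
      return 'No speech was detected.';
    case 'network':
      return 'A network error interrupted recognition.';
    default:
      return 'Speech recognition failed: ' + code;
  }
}

// Usage in the browser, with the recognition object from the example above:
// recognition.onerror = function(event) {
//   document.getElementById('transcript').innerText =
//     describeRecognitionError(event.error);
// };
```

Keeping the mapping in a plain function separates it from the browser-only objects, so it can be checked without a microphone.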

Artyom.js is a powerful wrapper library for the webkitSpeechRecognition API that simplifies its usage. It allows you to create advanced features like voice commands, speech synthesis, and more. We’ll focus on the artyom.newDictation function, which streamlines the recognition process.

How to integrate Artyom into your project?

HTML

<!DOCTYPE html>
<html>
  <head>
    <title>Dictation Example</title>
    <script type="text/javascript" src="path/to/artyom.min.js"></script>
  </head>
  <body>
    <input type="button" onclick="startRecognition();" value="Start Recognition" />
    <input type="button" onclick="stopRecognition();" value="Stop Recognition" />
    <script>
      // JavaScript code will go here
    </script>
  </body>
</html>

JavaScript

var settings = {
    continuous: true, // Keep going without interruption (requires HTTPS)
    onResult: function(text) {
        // 'text' contains the recognized speech
        console.log(text);
    },
    onStart: function() {
        console.log("Dictation started by the user");
    },
    onEnd: function() {
        alert("Dictation stopped by the user");
    }
};

var UserDictation = artyom.newDictation(settings);

function startRecognition() {
  UserDictation.start();
}

function stopRecognition() {
  UserDictation.stop();
}

Once the Artyom library is linked to your project, you can easily handle speech recognition. The real magic happens when the onResult callback is triggered, delivering the recognized text.

Although Artyom makes integration easier, beginners should first experiment with the plain webkitSpeechRecognition API to gain a deeper understanding of how it works.

The webkitSpeechRecognition API offers great potential, but browser support is limited: it works best in Google Chrome and is missing or incomplete in several other browsers. You can make the code more robust by checking whether the browser exposes webkitSpeechRecognition before initializing it.
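
Checking for the interface itself is usually more reliable than inspecting the browser name. Below is a minimal sketch of such a check. The helper name getSpeechRecognition is an assumption made for illustration, and the unprefixed SpeechRecognition name is tried first in case the browser provides it.

```javascript
// Hypothetical helper: returns the speech recognition constructor exposed by
// the given global object, or null when none is available.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null;
}

// Usage in the browser:
// const Recognition = getSpeechRecognition(window);
// if (Recognition) {
//   const recognition = new Recognition();
//   recognition.lang = "en-US";
// } else {
//   document.getElementById('transcript').innerText =
//     'Speech recognition is not supported in this browser.';
// }
```

Passing the global object as a parameter keeps the helper free of direct references to window, so the fallback branch can be exercised outside a browser.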