How to convert voice to text in JavaScript
By FoxLearn 2/19/2025 8:31:29 AM
The Web Speech API, which was introduced in late 2012, enables web developers to integrate speech input and text-to-speech output capabilities within a browser. These features are typically not available through traditional speech recognition or screen reader software.
One of the key advantages of this API is that it protects user privacy: before any website can access the user's microphone, the user must grant explicit permission.
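In code, a denied permission shows up as an error event on the recognition object rather than as a result. The sketch below is one way to turn that into a readable message, assuming the error codes defined by the Web Speech API (`not-allowed`, `audio-capture`, `no-speech`, `network`). The helper name `describeRecognitionError` and the message wording are illustrative, not part of the API.

```javascript
// Hypothetical helper: maps the `error` code of a recognition error event
// to a message that can be shown to the user.
const ERROR_MESSAGES = new Map([
  ['not-allowed', 'Microphone access was denied. Allow it for this site and try again.'],
  ['audio-capture', 'No microphone could be used for recording.'],
  ['no-speech', 'No speech was detected.'],
  ['network', 'A network problem interrupted speech recognition.']
]);

function describeRecognitionError(code) {
  // Fall back to a generic message for codes not listed above
  return ERROR_MESSAGES.get(code) || 'Speech recognition failed (' + code + ').';
}

// Browser-only wiring; skipped when no window object exists (e.g. in Node)
if (typeof window !== 'undefined') {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (Recognition) {
    const recognition = new Recognition();
    recognition.onerror = function (event) {
      console.warn(describeRecognitionError(event.error));
    };
  }
}
```

Keeping the message lookup in a plain function means it can be checked without a microphone or a browser.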
How to use the webkitSpeechRecognition API to convert voice input to text?
You can use the interimTranscript and finalTranscript variables to display the recognized text in real time:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Voice to Text</title>
</head>
<body>
    <h1>Voice to Text Converter</h1>
    <button onclick="startRecognition()">Start Recognition</button>
    <button onclick="stopRecognition()">Stop Recognition</button>
    <div id="output">
        <p><strong>Recognized Text:</strong></p>
        <p id="transcript"></p>
    </div>
    <script>
        // Initialize the speech recognition object
        const recognition = new webkitSpeechRecognition();
        recognition.continuous = true;      // Keep recognizing until stopped
        recognition.interimResults = true;  // Show real-time results
        recognition.lang = "en-US";         // Set the language (e.g., "en-US" for English)

        // Kept outside the handler so earlier final phrases are not lost
        // when later events report only the newest results
        let finalTranscript = '';

        // When speech is recognized
        recognition.onresult = function(event) {
            let interimTranscript = '';
            // event.resultIndex is the first result that changed in this event
            for (let i = event.resultIndex; i < event.results.length; ++i) {
                if (event.results[i].isFinal) {
                    finalTranscript += event.results[i][0].transcript;
                } else {
                    interimTranscript += event.results[i][0].transcript;
                }
            }
            // Show confirmed text first, followed by the in-progress text
            document.getElementById('transcript').innerText = finalTranscript + interimTranscript;
        };

        // Start recognition
        function startRecognition() {
            recognition.start();
        }

        // Stop recognition
        function stopRecognition() {
            recognition.stop();
        }
    </script>
</body>
</html>
In this example:
- webkitSpeechRecognition is the (vendor-prefixed) constructor used for speech recognition.
- recognition.continuous is set to true, meaning recognition keeps listening to the user until it is stopped.
- recognition.interimResults is set to true, meaning the browser reports text as it is being recognized (before the user finishes speaking).
- recognition.lang specifies the language for speech recognition. You can set it to different languages (e.g., "en-US" for English or "es-ES" for Spanish).
- recognition.onresult is the event handler where the recognized speech is processed and displayed. It separates the final text (complete phrases) from the interim text (in-progress text).
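The final/interim separation done in onresult can be pulled into a small function and tried without a microphone. This is a minimal sketch, assuming input shaped like event.results (a list of results, each with an isFinal flag and a first alternative carrying transcript). The name `splitTranscripts` and the mock data are illustrative.

```javascript
// Hypothetical helper: walks a results list shaped like event.results and
// returns the confirmed (final) text and the in-progress (interim) text.
function splitTranscripts(results) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const transcript = results[i][0].transcript; // first alternative of this result
    if (results[i].isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Mock data standing in for event.results: one finished phrase, one in progress
const mockResults = [
  Object.assign([{ transcript: 'hello world' }], { isFinal: true }),
  Object.assign([{ transcript: ' how are' }], { isFinal: false })
];

const split = splitTranscripts(mockResults);
console.log(split.finalText + split.interimText); // prints "hello world how are"
```

Inside a real handler, the same function could be called as splitTranscripts(event.results) and its two strings rendered in different styles, for example interim text in grey.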
Artyom.js is a wrapper library for the webkitSpeechRecognition API that simplifies its usage. It lets you build features such as voice commands and speech synthesis. We’ll focus on the artyom.newDictation function, which streamlines the recognition process.
How to integrate Artyom into your project?
HTML
<!DOCTYPE html>
<html>
<head>
    <title>Dictation Example</title>
    <script type="text/javascript" src="path/to/artyom.min.js"></script>
</head>
<body>
    <input type="button" onclick="startRecognition();" value="Start Recognition" />
    <input type="button" onclick="stopRecognition();" value="Stop Recognition" />
    <script>
        // JavaScript code will go here
    </script>
</body>
</html>
JavaScript
// Assumes a global `artyom` object exists once the script has loaded.
// Some Artyom versions expose a constructor instead, in which case you
// may first need: var artyom = new Artyom();
var settings = {
    continuous: true, // Keep going without interruption (requires HTTPS)
    onResult: function(text) {
        // 'text' contains the recognized speech
        console.log(text);
    },
    onStart: function() {
        console.log("Dictation started by the user");
    },
    onEnd: function() {
        alert("Dictation stopped by the user");
    }
};

var UserDictation = artyom.newDictation(settings);

function startRecognition() {
    UserDictation.start();
}

function stopRecognition() {
    UserDictation.stop();
}
Once the Artyom library is linked to your project, handling speech recognition takes only a few lines. The key piece is the onResult callback, which is called with the recognized text.
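To send the recognized text somewhere other than the console, the settings object can be built by a small factory that takes a rendering function. This is a sketch only: it uses just the settings fields shown above (continuous, onResult, onStart, onEnd), and the factory name `makeDictationSettings` and the `render` parameter are hypothetical.

```javascript
// Hypothetical factory: builds a settings object of the shape passed to
// artyom.newDictation above, forwarding recognized text to `render`.
function makeDictationSettings(render) {
  return {
    continuous: true,
    onResult: function (text) {
      render(text); // hand the recognized text to the caller
    },
    onStart: function () {
      console.log('Dictation started');
    },
    onEnd: function () {
      console.log('Dictation stopped');
    }
  };
}

// Trying the callback without the library: collect text in an array
const received = [];
const demoSettings = makeDictationSettings(function (text) {
  received.push(text);
});
demoSettings.onResult('hello');
console.log(received.length); // prints 1
```

In the page, the render function could instead write the text into an element of your choice, and the result of makeDictationSettings would be passed to artyom.newDictation in place of the hand-written settings object.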
Although Artyom makes integration easier, beginners should first experiment with the plain webkitSpeechRecognition API to gain a deeper understanding of how it works.
The webkitSpeechRecognition API offers great potential, but browser support is limited: it works in Google Chrome, while some other browsers, such as Firefox, do not support it. You can make the code more robust by checking whether the browser exposes the API before initializing webkitSpeechRecognition.
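One common way to do that check is feature detection: look for the constructor itself instead of inspecting the browser name. This is a minimal sketch; the function name `getSpeechRecognition` is illustrative, and it assumes the API is exposed either as SpeechRecognition or as the prefixed webkitSpeechRecognition.

```javascript
// Hypothetical helper: returns the speech recognition constructor exposed by
// the given global object, or null when the API is not available.
function getSpeechRecognition(globalObject) {
  return globalObject.SpeechRecognition ||
         globalObject.webkitSpeechRecognition ||
         null;
}

// Use the real window in a browser; an empty object elsewhere (e.g. Node)
const Recognition = getSpeechRecognition(typeof window !== 'undefined' ? window : {});

if (Recognition) {
  const recognition = new Recognition();
  recognition.lang = 'en-US';
  // Attach onresult here and call recognition.start() from a button click
} else {
  console.log('Speech recognition is not supported in this browser.');
}
```

When the constructor is missing, the page can hide the recognition buttons or show a notice instead of throwing an error on load.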