How to add voice commands to a webpage in JavaScript

By FoxLearn 2/19/2025 8:27:13 AM
To add voice commands to your webpage using JavaScript, you can use the Web Speech API, specifically the SpeechRecognition interface.

As a developer, you can now build a website that responds to voice commands tailored to your needs. The HTML5 Speech Recognition API lets JavaScript listen to the browser's microphone and convert speech into text. With Artyom.js, a library for managing voice commands, this task becomes simple.

Note: webkitSpeechRecognition is currently supported mainly in Google Chrome and other Chromium-based browsers. While we hope it will eventually become a standard for all browsers, for now Chrome is the safest place to try Artyom.

Basic Setup Using Artyom.js

To start, add Artyom.js to your document within the <head> tag. You can get the library from the official GitHub repository:

<!DOCTYPE html>
<html>
  <head>
    <title>Cooking with Artyom.js</title>
    <!-- Important: Load Artyom in the head tag for voice resources to load properly -->
    <script type="text/javascript" src="path/to/artyom.min.js"></script>
    <script>
         // Create a globally accessible instance of Artyom
         window.artyom = new Artyom();
    </script>
  </head>
  <body>
    <script>
      // Artyom is now available!
    </script>
  </body>
</html>

It's important to read the documentation to understand how commands work. Artyom lets you add both simple and "smart" commands.

Normal commands: Triggered when the recognized speech matches any entry in the indexes array.

artyom.addCommands({
  indexes: ["Hello", "Hey", "Hurra"],
  action: function(i) {
    // i = index of the matched word
    console.log("Something matches!");
  }
});

Smart commands: Allow you to capture parts of the spoken text, such as a variable name, for more dynamic functionality.

artyom.addCommands({
  smart: true,  // Mark this command as "smart"
  indexes: ["How many people live in *"], // '*' represents dynamic spoken text
  action: function(i, wildcard) {
    switch(wildcard) {
      case "Berlin":
        alert("Why should I know something like this?");
        break;
      case "Paris":
        alert("I don't know.");
        break;
      default:
        alert("I don't know the city " + wildcard + ". Add more cases!");
        break;
    }
  }
});

You can use artyom.simulateInstruction() to test how the voice command will behave when triggered. This allows you to verify your commands without speaking.

artyom.simulateInstruction("How many people live in Paris");
// Alert: "I don't know."

To start Artyom, use the initialize function. Here are the basic settings you'll need to configure:

  • lang: Language code for the supported Artyom language (see the documentation for available languages).
  • continuous: Set to true for HTTPS connections to allow continuous listening, otherwise set to false for one-time listening.
  • listen: Set to true to enable Artyom's listening mode.
  • debug: Set to true to log recognized speech and other information in the console.

artyom.initialize({
  lang: "en-GB", // Language code (English - Great Britain)
  continuous: false, // Set to true only if the page is served over HTTPS
  debug: true, // Show debug info in the console
  listen: true // Start listening for commands
});

Once initialized, Artyom will be ready to process voice commands.
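
In recent versions of Artyom.js, initialize also returns a Promise (worth confirming against the version you use), so you can react once the recognition engine is actually ready. A minimal sketch:

artyom.initialize({
  lang: "en-GB",
  continuous: false,
  debug: true,
  listen: true
}).then(function () {
  // The assistant is listening at this point
  console.log("Artyom has been successfully initialized");
}).catch(function (err) {
  console.error("Artyom couldn't be initialized:", err);
});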

If you want to stop Artyom, use the fatality function. This halts the Artyom instance immediately.

artyom.fatality();
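
If you later need to restart the assistant, a common pattern is to call fatality and then initialize again after a short delay. The 250 ms timeout below is an assumption, meant to give the engine time to release the microphone before it starts again:

function restartArtyom() {
  artyom.fatality(); // Stop the current instance

  setTimeout(function () {
    // Re-initialize with the same settings once the engine has shut down
    artyom.initialize({
      lang: "en-GB",
      continuous: false,
      debug: true,
      listen: true
    });
  }, 250);
}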

Basic Setup Using SpeechRecognition

Here's the basic structure of an HTML file that sets up voice recognition directly with the SpeechRecognition interface:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Voice Commands</title>
</head>
<body>
  <h1>Voice Command Example</h1>
  <button id="start">Start Voice Command</button>

  <script>
    // Check for browser support
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;

    if (SpeechRecognition) {
      const recognition = new SpeechRecognition();
      recognition.lang = 'en-US';
      recognition.interimResults = false; // Only react to final results (interim results would fire onresult repeatedly)
      recognition.maxAlternatives = 1;    // Limit the recognition alternatives

      document.getElementById("start").onclick = function () {
        recognition.start();  // Start voice recognition
      };

      recognition.onstart = function () {
        console.log('Voice recognition started');
      };

      recognition.onresult = function (event) {
        const transcript = event.results[0][0].transcript;
        console.log("You said: ", transcript);

        // Here, you can check for specific voice commands and trigger actions
        if (transcript.toLowerCase().includes("hello")) {
          alert("Hello there!");
        } else if (transcript.toLowerCase().includes("goodbye")) {
          alert("Goodbye!");
        }
      };

      recognition.onerror = function (event) {
        console.error("Error occurred in recognition: ", event.error);
      };
    } else {
      console.log("Speech Recognition not supported in this browser.");
    }
  </script>
</body>
</html>
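
By default, recognition stops after it delivers a single result. If you want the page to keep listening for the next command, one option is to restart recognition from the onend handler. The sketch below extends the example above and replaces its click handler; the keepListening flag is an assumption used to let the user opt out:

let keepListening = false;

document.getElementById("start").onclick = function () {
  keepListening = true;
  recognition.start(); // Start listening and keep restarting until stopped
};

recognition.onend = function () {
  // Restart automatically after each result until the user decides to stop
  if (keepListening) {
    recognition.start();
  }
};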

To make the voice command system more flexible, you can add more conditions inside the onresult handler.

For example, you can check if the recognized text matches specific commands and trigger different actions, like controlling a light, playing music, or even navigating to different parts of your webpage.

recognition.onresult = function (event) {
  const transcript = event.results[0][0].transcript;
  console.log("You said: ", transcript);

  if (transcript.toLowerCase().includes("hello")) {
    alert("Hello, how can I assist you?");
  } else if (transcript.toLowerCase().includes("play music")) {
    alert("Playing music now!");
  } else if (transcript.toLowerCase().includes("open google")) {
    window.location.href = 'https://www.google.com';
  }
};
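
As the list of commands grows, a long if/else chain becomes hard to maintain. One alternative (a sketch, not part of the original example) is to keep the trigger phrases in a lookup table and scan it inside onresult:

// Map each trigger phrase to the action it should run
const commands = {
  "hello": function () { alert("Hello, how can I assist you?"); },
  "play music": function () { alert("Playing music now!"); },
  "open google": function () { window.location.href = 'https://www.google.com'; }
};

recognition.onresult = function (event) {
  const transcript = event.results[0][0].transcript.toLowerCase();
  console.log("You said: ", transcript);

  for (const phrase in commands) {
    if (transcript.includes(phrase)) {
      commands[phrase](); // Run the first matching action
      break;
    }
  }
};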

For a more advanced solution, you can come back to Artyom.js, which simplifies voice command handling and adds features such as voice output (text-to-speech) and smarter command parsing.
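
For example, Artyom's text-to-speech support lets a command answer out loud instead of only showing an alert. A minimal sketch, assuming Artyom has already been loaded and initialized as shown earlier:

artyom.addCommands({
  indexes: ["What time is it"],
  action: function () {
    const now = new Date();
    // artyom.say speaks the given text using the voice of the configured language
    artyom.say("It is " + now.getHours() + " hours and " + now.getMinutes() + " minutes");
  }
});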