There is a very high chance that you have interacted with apps that provide some form of voice experience. It could be an app with text-to-speech functionality, like reading your text messages or notifications aloud. It could also be an app with voice recognition functionality like Siri or Google Assistant.
With the advent of HTML5, the number of APIs available on the web platform has grown quickly. A set of interfaces collectively known as the Web Speech API has been developed to make it possible to build many kinds of voice applications and experiences for the web. These interfaces are still fairly experimental, although support for most of them is increasing across modern browsers.
In this article, you will build an application that retrieves a random quotation, displays the quotation, and offers the user the ability to use text-to-speech for the browser to read the quotation aloud.
To complete this tutorial, you will need Node.js and npm installed on your machine.
This tutorial was verified with Node v14.4.0, npm v6.14.5, axios v0.19.2, cors v2.8.5, express v4.17.1, and jQuery v3.5.1.
The Web Speech API has two major interfaces:
SpeechSynthesis - For text-to-speech applications. This allows apps to read out their text content using the device’s speech synthesizer. The available voice types are represented by a SpeechSynthesisVoice object, while the text to be uttered is represented by a SpeechSynthesisUtterance object. See the support table for the SpeechSynthesis interface to learn more about browser support.
SpeechRecognition - For applications that require asynchronous voice recognition. This allows apps to recognize voice context from an audio input. A SpeechRecognition object can be created using the constructor. The SpeechGrammar interface exists for representing the set of grammars that the app should recognize. See the support table for the SpeechRecognition interface to learn more about browser support.
This tutorial will focus on SpeechSynthesis.
Getting a reference to a SpeechSynthesis object can be accomplished with a single line of code:
var synthesis = window.speechSynthesis;
The following code snippet shows how to check for browser support:
if ('speechSynthesis' in window) {
  var synthesis = window.speechSynthesis;
} else {
  console.log('Text-to-speech not supported.');
}
It is very useful to check if SpeechSynthesis is supported by the browser before using the functionality it provides.
In this step, you will build on your already existing code to get the available speech voices. The getVoices() method returns a list of SpeechSynthesisVoice objects representing all the available voices on the device.
Take a look at the following code snippet:
if ('speechSynthesis' in window) {
  var synthesis = window.speechSynthesis;
  // Regex to match all English language tags e.g en, en-US, en-GB
  var langRegex = /^en(-[a-z]{2})?$/i;
  // Get the available voices and filter the list to only have English speakers
  var voices = synthesis
    .getVoices()
    .filter((voice) => langRegex.test(voice.lang));
  // Log the properties of the voices in the list
  voices.forEach(function (voice) {
    console.log({
      name: voice.name,
      lang: voice.lang,
      uri: voice.voiceURI,
      local: voice.localService,
      default: voice.default,
    });
  });
} else {
  console.log('Text-to-speech not supported.');
}
In this section of code, you get the list of available voices on the device and filter the list using the langRegex regular expression so that only voices for English speakers remain. Finally, you loop through the voices in the list and log the properties of each to the console.
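To see which language tags the langRegex pattern accepts, the check can be pulled into a small function and run against a few sample tags. This is only a sketch: the isEnglishTag name and the sample tags are illustrative assumptions, not part of the tutorial code. Note that a tag written with an underscore separator, such as en_US, is rejected by this pattern, so it may be worth logging the tags your target devices actually report.

```javascript
// Same pattern as the tutorial: "en", optionally followed by "-" and a two-letter region
const langRegex = /^en(-[a-z]{2})?$/i;

// Hypothetical helper: true when a language tag looks like an English voice
function isEnglishTag(lang) {
  return langRegex.test(lang);
}

// Illustrative sample tags (not taken from a real device)
const sampleTags = ['en', 'en-US', 'en-GB', 'EN-au', 'fr-FR', 'en_US', 'eng'];
const english = sampleTags.filter(isEnglishTag);

console.log(english);
```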
In this step, you will construct speech utterances by using the SpeechSynthesisUtterance constructor and setting values for the available properties.
The following code snippet creates a speech utterance for reading the text "Hello World":
if ('speechSynthesis' in window) {
  var synthesis = window.speechSynthesis;
  // Get the first English voice in the list (tags such as en, en-US, or en-GB).
  // This is undefined if no voice matches or the voices have not loaded yet.
  var voice = synthesis.getVoices().filter(function (voice) {
    return /^en(-|$)/i.test(voice.lang);
  })[0];
  // Create an utterance object
  var utterance = new SpeechSynthesisUtterance('Hello World');
  // Set utterance properties; fall back to the default voice if none matched
  utterance.voice = voice || null;
  utterance.pitch = 1.5;
  utterance.rate = 1.25;
  utterance.volume = 0.8;
  // Speak the utterance
  synthesis.speak(utterance);
} else {
  console.log('Text-to-speech not supported.');
}
Here, you get the first en language voice from the list of available voices. Next, you create a new utterance using the SpeechSynthesisUtterance constructor. You then set some of the properties on the utterance object, such as voice, pitch, rate, and volume. Finally, you speak the utterance using the speak() method of SpeechSynthesis.
Note: There is a limit to the size of the text that can be spoken in an utterance. The maximum length of the text that can be spoken in each utterance is 32,767 characters.
Notice that you passed the text to be uttered in the constructor.
You can also set the text to be uttered by setting the text property of the utterance object.
Here is a simple example:
var synthesis = window.speechSynthesis;
var utterance = new SpeechSynthesisUtterance("Hello World");
// This overrides the text "Hello World" and is uttered instead
utterance.text = "My name is Glad.";
synthesis.speak(utterance);
This overrides whatever text was passed in the constructor.
In the previous code snippets, you spoke an utterance by passing a SpeechSynthesisUtterance instance as an argument to the speak() method of the SpeechSynthesis instance. Each call to speak() adds the utterance to a queue, so you can call it several times and the utterances are spoken in order:
var synthesis = window.speechSynthesis;
var utterance1 = new SpeechSynthesisUtterance("Hello World");
var utterance2 = new SpeechSynthesisUtterance("My name is Glad.");
var utterance3 = new SpeechSynthesisUtterance("I'm a web developer from Nigeria.");
synthesis.speak(utterance1);
synthesis.speak(utterance2);
synthesis.speak(utterance3);
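Because queued utterances are spoken one after another, text that exceeds the per-utterance length limit noted earlier could be split into smaller pieces and queued in sequence. The following is a minimal sketch under that assumption; chunkText and speakLongText are hypothetical helper names, and the splitting rule (prefer sentence ends, then spaces, then a hard cut) is one possible choice rather than anything the API prescribes.

```javascript
// Limit quoted in the note earlier in this tutorial
const MAX_UTTERANCE_LENGTH = 32767;

// Hypothetical helper: split text into chunks no longer than maxLength,
// preferring to break after sentence-ending punctuation, then at a space.
function chunkText(text, maxLength = MAX_UTTERANCE_LENGTH) {
  const chunks = [];
  let rest = text.trim();
  while (rest.length > maxLength) {
    const head = rest.slice(0, maxLength);
    // Last sentence end inside the head, if any
    let cut = Math.max(
      head.lastIndexOf('. '),
      head.lastIndexOf('! '),
      head.lastIndexOf('? ')
    );
    if (cut !== -1) {
      cut += 1; // keep the punctuation mark with its sentence
    } else {
      cut = head.lastIndexOf(' ');
    }
    if (cut <= 0) cut = maxLength; // no break point found: hard split
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

// Hypothetical browser-only usage: queue one utterance per chunk
function speakLongText(synthesis, text) {
  chunkText(text).forEach(function (chunk) {
    synthesis.speak(new SpeechSynthesisUtterance(chunk));
  });
}
```

In a browser, speakLongText(window.speechSynthesis, someLongString) would queue the chunks in order; the function is only defined here, not called.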
The SpeechSynthesis instance can also pause, resume, and cancel utterances through its pause(), resume(), and cancel() methods.
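As a rough illustration of how these methods might be combined, the sketch below toggles between pausing and resuming based on the speaking and paused properties of the SpeechSynthesis interface. The togglePause name and its return values are assumptions made for this example, and browsers can differ in how reliably they pause speech.

```javascript
// Hypothetical helper: resume if paused, pause if speaking, otherwise do nothing.
// Returns the action taken so the caller can update its UI.
function togglePause(synthesis) {
  // Check paused first: speaking can remain true while output is paused
  if (synthesis.paused) {
    synthesis.resume();
    return 'resumed';
  }
  if (synthesis.speaking) {
    synthesis.pause();
    return 'paused';
  }
  return 'idle';
}
```

In a browser, this could be called as togglePause(window.speechSynthesis) from a button click handler.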
You have now seen the basic aspects of the SpeechSynthesis interface and can start building the text-to-speech application. Before you begin, ensure that you have Node and npm installed on your machine.
Run the following commands on your terminal to set up a project for the app and install the dependencies.
Create a new project directory:
- mkdir web-speech-app
Move into the newly created project directory:
- cd web-speech-app
Initialize the project:
- npm init -y
Install the dependencies needed for the project - express, cors, and axios:
- npm install express cors axios
Modify the "scripts" section of the package.json file to look like the following snippet:
"scripts": {
  "start": "node server.js"
}
Now that you have initialized a project for the application, you will proceed to set up a server for the app using Express.
Create a new server.js file and add the following content to it:
const cors = require('cors');
const path = require('path');
const axios = require('axios');
const express = require('express');
const app = express();
const PORT = process.env.PORT || 5000;
app.set('port', PORT);
// Enable CORS (Cross-Origin Resource Sharing)
app.use(cors());
// Serve static files from the /public directory
app.use('/', express.static(path.join(__dirname, 'public')));
// A simple endpoint for fetching a random quote from QuotesOnDesign
app.get('/api/quote', (req, res) => {
axios
.get(
'https://quotesondesign.com/wp-json/wp/v2/posts/?orderby=rand'
)
.then((response) => {
const [post] = response.data;
const { title, content } = post || {};
return title && content
? res.json({ status: 'success', data: { title, content } })
: res
.status(500)
.json({ status: 'failed', message: 'Could not fetch quote.' });
})
.catch((err) =>
res
.status(500)
.json({ status: 'failed', message: 'Could not fetch quote.' })
);
});
app.listen(PORT, () => console.log(`> App server is running on port ${PORT}.`));
Here, you set up a Node server using Express. You enable CORS (Cross-Origin Resource Sharing) using the cors() middleware. You also use the express.static() middleware to serve static files from the /public directory in the project root. This enables you to serve the index page that you will create soon.
Finally, you set up a GET /api/quote route for fetching a random quote from the QuotesOnDesign API service. You are using axios (a promise-based HTTP client library) to make the HTTP request.
Here is what a sample response from the QuotesOnDesign API looks like:
[
{
"title": { "rendered": "Victor Papanek" },
"content": {
"rendered": "<p>Any attempt to separate design, to make it a thing-by-itself, works counter to the inherent value of design as the primary, underlying matrix of life.</p>\n",
"protected": false
}
}
]
Note: For more information on changes to QuotesOnDesign’s API, consult their page documenting changes between 4.0 and 5.0.
When you fetch a quote successfully, the quote’s title and content are returned in the data field of the JSON response. Otherwise, a failure JSON response with a 500 HTTP status code is returned.
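One way to make this success-or-failure branching easier to check in isolation is to move it into a pure function that takes the API's response data and returns the status code and JSON body to send. The sketch below mirrors the logic in the route handler; toQuotePayload is a hypothetical name and is not part of the tutorial's server.js.

```javascript
// Hypothetical helper mirroring the route's check: the first post must have
// both a title and content, otherwise a failure payload is produced.
function toQuotePayload(posts) {
  const [post] = Array.isArray(posts) ? posts : [];
  const { title, content } = post || {};
  return title && content
    ? { code: 200, body: { status: 'success', data: { title, content } } }
    : { code: 500, body: { status: 'failed', message: 'Could not fetch quote.' } };
}

// Sample data shaped like the response shown above
const sample = [
  {
    title: { rendered: 'Victor Papanek' },
    content: { rendered: '<p>Any attempt to separate design...</p>\n', protected: false },
  },
];

console.log(toQuotePayload(sample).body.status);
console.log(toQuotePayload([]).body.status);
```

Inside the route, the result could then be sent with res.status(result.code).json(result.body).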
Next, you will create an index page for the app view.
First, create a new public folder at the root of your project:
- mkdir public
Next, create a new index.html file in the newly created public folder and add the following content to it:
<html>
<head>
<title>Daily Quotes</title>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.1/css/bootstrap.min.css" integrity="sha384-WskhaSGFgHYWDcbwN70/dfYBj47jz9qbsMId/iRN3ewGhXQFZCSftd1LZCfmhktB" crossorigin="anonymous">
</head>
<body class="position-absolute h-100 w-100">
<div id="app" class="d-flex flex-wrap align-items-center align-content-center p-5 mx-auto w-50 position-relative"></div>
<script src="https://unpkg.com/jquery/dist/jquery.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/feather-icons/dist/feather.min.js"></script>
<script src="main.js"></script>
</body>
</html>
This creates a basic index page for the app with just one <div id="app">, which will serve as the mount point for all the dynamic content of the app.
You have also added a link to the Bootstrap CDN to get some default Bootstrap 4 styling for the app, and included jQuery for DOM manipulations and AJAX requests and Feather icons for elegant SVG icons.
Now you are down to the last piece that powers the app: the main script. Create a new main.js file in the public directory of your app and add the following content to it:
jQuery(function ($) {
let app = $('#app');
let SYNTHESIS = null;
let VOICES = null;
let QUOTE_TEXT = null;
let QUOTE_PERSON = null;
let VOICE_SPEAKING = false;
let VOICE_PAUSED = false;
let VOICE_COMPLETE = false;
let iconProps = {
'stroke-width': 1,
'width': 48,
'height': 48,
'class': 'text-secondary d-none',
'style': 'cursor: pointer'
};
function iconSVG(icon) {}
function showControl(control) {}
function hideControl(control) {}
function getVoices() {}
function resetVoice() {}
function fetchNewQuote() {}
function renderQuote(quote) {}
function renderVoiceControls(synthesis, voice) {}
function updateVoiceControls() {}
function initialize() {}
initialize();
});
This code uses jQuery to execute a function when the DOM is loaded. You get a reference to the #app element and initialize some variables. You also declare a couple of empty functions that you will implement in the following sections. Finally, you call the initialize() function to initialize the application.
The iconProps variable contains a couple of properties that will be used for rendering Feather icons as SVG to the DOM.
With that code in place, you are ready to start implementing the functions. Modify the public/main.js file to implement the following functions:
// Gets the SVG markup for a Feather icon
function iconSVG(icon) {
  // Copy iconProps into a new object so the shared defaults are not mutated
  let props = $.extend({}, iconProps, { id: icon });
  return feather.icons[icon].toSvg(props);
}
// Shows an element
function showControl(control) {
control.addClass('d-inline-block').removeClass('d-none');
}
// Hides an element
function hideControl(control) {
control.addClass('d-none').removeClass('d-inline-block');
}
// Get the available voices, filter the list to have only English voices
function getVoices() {
// Regex to match all English language tags e.g en, en-US, en-GB
let langRegex = /^en(-[a-z]{2})?$/i;
// Get the available voices and filter the list to only have English speakers
VOICES = SYNTHESIS.getVoices()
.filter(function (voice) {
return langRegex.test(voice.lang);
})
.map(function (voice) {
return {
voice: voice,
name: voice.name,
lang: voice.lang.toUpperCase(),
};
});
}
// Reset the voice variables to the defaults
function resetVoice() {
VOICE_SPEAKING = false;
VOICE_PAUSED = false;
VOICE_COMPLETE = false;
}
The iconSVG(icon) function takes a Feather icon name string as an argument (e.g., 'play-circle') and returns the SVG markup for the icon. Check the Feather website to see the complete list of available Feather icons. Also check the Feather documentation to learn more about the API.
The getVoices() function uses the SYNTHESIS object to fetch the list of all the available voices on the device. Then, it filters the list using a regular expression to get the voices of only English speakers.
Next, you will implement the functions for fetching and rendering quotes on the DOM. Modify the public/main.js file to implement the following functions:
function fetchNewQuote() {
// Clean up the #app element
app.html('');
// Reset the quote variables
QUOTE_TEXT = null;
QUOTE_PERSON = null;
// Reset the voice variables
resetVoice();
// Pick a voice at random from the VOICES list
let voice =
VOICES && VOICES.length > 0
? VOICES[Math.floor(Math.random() * VOICES.length)]
: null;
// Fetch a quote from the API and render the quote and voice controls
$.get('/api/quote', function (quote) {
renderQuote(quote.data);
SYNTHESIS && renderVoiceControls(SYNTHESIS, voice || null);
});
}
function renderQuote(quote) {
// Create some markup for the quote elements
let quotePerson = $('<h1 id="quote-person" class="mb-2 w-100"></h1>');
let quoteText = $('<div id="quote-text" class="h3 py-5 mb-4 w-100 font-weight-light text-secondary border-bottom border-gray"></div>');
// Add the quote data to the markup
quotePerson.html(quote.title.rendered);
quoteText.html(quote.content.rendered);
// Attach the quote elements to the DOM
app.append(quotePerson);
app.append(quoteText);
// Update the quote variables with the new data
QUOTE_TEXT = quoteText.text();
QUOTE_PERSON = quotePerson.text();
}
Here in the fetchNewQuote() function, you first reset the app element and variables. You then pick a voice randomly using Math.random() from the list of voices stored in the VOICES variable. You use $.get() to make an AJAX request to the /api/quote endpoint to fetch a random quote, and render the quote data to the view alongside the voice controls.
The renderQuote(quote) function receives a quote object as its argument and adds the contents to the DOM. Finally, it updates the quote variables: QUOTE_TEXT and QUOTE_PERSON.
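The random voice selection in fetchNewQuote() can also be written as a small standalone function, which makes the empty-list case explicit and lets the random source be swapped out when testing. This is a sketch only; pickRandom and its optional random parameter are hypothetical and do not appear in the tutorial's main.js.

```javascript
// Hypothetical helper: pick one item from a list, or null if the list is
// missing or empty. The random source defaults to Math.random but can be
// replaced with a fixed function to get repeatable results.
function pickRandom(list, random = Math.random) {
  if (!Array.isArray(list) || list.length === 0) return null;
  return list[Math.floor(random() * list.length)];
}

// Illustrative voice entries shaped like the items stored in VOICES
const sampleVoices = [
  { name: 'Voice A', lang: 'EN-US' },
  { name: 'Voice B', lang: 'EN-GB' },
  { name: 'Voice C', lang: 'EN-AU' },
];

console.log(pickRandom(sampleVoices));
```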
If you look at the fetchNewQuote() function, you’ll notice that you made a call to the renderVoiceControls() function. This function is responsible for rendering the controls for playing, pausing, and stopping the voice output. It also renders the current voice in use and the language.
Make the following modifications to the public/main.js file to implement the renderVoiceControls() function:
function renderVoiceControls(synthesis, voice) {
let controlsPane = $('<div id="voice-controls-pane" class="d-flex flex-wrap w-100 align-items-center align-content-center justify-content-between"></div>');
let voiceControls = $('<div id="voice-controls"></div>');
// Create the SVG elements for the voice control buttons
let playButton = $(iconSVG('play-circle'));
let pauseButton = $(iconSVG('pause-circle'));
let stopButton = $(iconSVG('stop-circle'));
// Helper function to enable pause state for the voice output
let paused = function () {
VOICE_PAUSED = true;
updateVoiceControls();
};
// Helper function to disable pause state for the voice output
let resumed = function () {
VOICE_PAUSED = false;
updateVoiceControls();
};
// Click event handler for the play button
playButton.on('click', function (evt) {});
// Click event handler for the pause button
pauseButton.on('click', function (evt) {});
// Click event handler for the stop button
stopButton.on('click', function (evt) {});
// Add the voice controls to their parent element
voiceControls.append(playButton);
voiceControls.append(pauseButton);
voiceControls.append(stopButton);
// Add the voice controls parent to the controlsPane element
controlsPane.append(voiceControls);
// If voice is available, add the voice info element to the controlsPane
if (voice) {
let currentVoice = $('<div class="text-secondary font-weight-normal"><span class="text-dark font-weight-bold">' + voice.name + '</span> (' + voice.lang + ')</div>');
controlsPane.append(currentVoice);
}
// Add the controlsPane to the DOM
app.append(controlsPane);
// Show the play button
showControl(playButton);
}
Here, you create container elements for the voice controls and the controls pane. You use the iconSVG() function created earlier to get the SVG markup for the control buttons and create the button elements as well. You define the paused() and resumed() helper functions, which will be used while setting up the event handlers for the buttons.
Finally, you render the voice control buttons and the voice info to the DOM. It is also configured so only the Play button is shown initially.
Next, you will implement the click event handlers for the voice control buttons you defined in the previous section.
Set up the event handlers as shown in the following code snippet:
// Click event handler for the play button
playButton.on('click', function (evt) {
evt.preventDefault();
if (VOICE_SPEAKING) {
// If voice is paused, it is resumed when the playButton is clicked
if (VOICE_PAUSED) synthesis.resume();
return resumed();
} else {
// Create utterances for the quote and the person
let quoteUtterance = new SpeechSynthesisUtterance(QUOTE_TEXT);
let personUtterance = new SpeechSynthesisUtterance(QUOTE_PERSON);
// Set the voice for the utterances if available
if (voice) {
quoteUtterance.voice = voice.voice;
personUtterance.voice = voice.voice;
}
// Set event listeners for the quote utterance
quoteUtterance.onpause = paused;
quoteUtterance.onresume = resumed;
quoteUtterance.onboundary = updateVoiceControls;
// Set the listener to activate speaking state when the quote utterance starts
quoteUtterance.onstart = function (evt) {
VOICE_COMPLETE = false;
VOICE_SPEAKING = true;
updateVoiceControls();
};
// Set event listeners for the person utterance
personUtterance.onpause = paused;
personUtterance.onresume = resumed;
personUtterance.onboundary = updateVoiceControls;
// Refresh the app and fetch a new quote when the person utterance ends
personUtterance.onend = fetchNewQuote;
// Speak the utterances
synthesis.speak(quoteUtterance);
synthesis.speak(personUtterance);
}
});
// Click event handler for the pause button
pauseButton.on('click', function (evt) {
evt.preventDefault();
// Pause the utterance if it is not in paused state
if (VOICE_SPEAKING) synthesis.pause();
return paused();
});
// Click event handler for the stop button
stopButton.on('click', function (evt) {
evt.preventDefault();
// Clear the utterances queue
if (VOICE_SPEAKING) synthesis.cancel();
resetVoice();
// Set the complete status of the voice output
VOICE_COMPLETE = true;
updateVoiceControls();
});
Here, you set up the click event listeners for the voice control buttons. When the Play button is clicked, the app starts speaking the utterances, beginning with the quoteUtterance and then the personUtterance. However, if the voice output is in a paused state, it is resumed instead.
You set VOICE_SPEAKING to true in the onstart event handler for the quoteUtterance. The app also refreshes and fetches a new quote when the personUtterance ends.
The Pause button pauses the voice output, while the Stop button ends the voice output and removes all utterances from the queue, using the cancel() method of the SpeechSynthesis interface. Each handler calls the updateVoiceControls() function to display the appropriate buttons.
You have made a couple of calls and references to the updateVoiceControls() function in the previous code snippets. This function is responsible for updating the voice controls to display the appropriate controls based on the voice state variables.
Make the following modifications to the public/main.js file to implement the updateVoiceControls() function:
function updateVoiceControls() {
// Get a reference to each control button
let playButton = $('#play-circle');
let pauseButton = $('#pause-circle');
let stopButton = $('#stop-circle');
if (VOICE_SPEAKING) {
// Show the stop button if speaking is in progress
showControl(stopButton);
// Toggle the play and pause buttons based on paused state
if (VOICE_PAUSED) {
showControl(playButton);
hideControl(pauseButton);
} else {
hideControl(playButton);
showControl(pauseButton);
}
} else {
// Show only the play button if no speaking is in progress
showControl(playButton);
hideControl(pauseButton);
hideControl(stopButton);
}
}
In this section of code, you first get a reference to each of the voice control button elements. Then, you specify which voice control buttons should be visible at different states of the voice output.
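The visibility rules in updateVoiceControls() depend only on two booleans, so they can be summarized as a pure function that maps the voice state to the set of visible buttons. The following is an illustrative sketch; visibleControls is a hypothetical name and the tutorial's function manipulates the DOM directly instead.

```javascript
// Hypothetical helper: given the speaking and paused flags, return which
// buttons should be visible, following the same rules as updateVoiceControls().
function visibleControls(speaking, paused) {
  if (!speaking) {
    // Nothing is being spoken: only the play button is shown
    return { play: true, pause: false, stop: false };
  }
  // Speaking: stop is always shown; play and pause swap based on paused state
  return { play: paused, pause: !paused, stop: true };
}

console.log(visibleControls(false, false));
console.log(visibleControls(true, false));
console.log(visibleControls(true, true));
```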
You are now ready to implement the initialize() function. This function is responsible for initializing the application. Add the following code snippet to the public/main.js file to implement the initialize() function.
function initialize() {
if ('speechSynthesis' in window) {
SYNTHESIS = window.speechSynthesis;
let timer = setInterval(function () {
let voices = SYNTHESIS.getVoices();
if (voices.length > 0) {
getVoices();
fetchNewQuote();
clearInterval(timer);
}
}, 200);
} else {
let message = 'Text-to-speech not supported by your browser.';
// Create the browser notice element
let notice = $('<div class="w-100 py-4 bg-danger font-weight-bold text-white position-absolute text-center" style="bottom:0; z-index:10">' + message + '</div>');
fetchNewQuote();
console.log(message);
// Display non-support info on DOM
$(document.body).append(notice);
}
}
This code first checks whether speechSynthesis is available on the window global object and, if it is, assigns it to the SYNTHESIS variable. Next, you set up an interval for fetching the list of available voices.
You are using an interval here because SpeechSynthesis.getVoices() can return an empty array on the initial call, before the voices have been loaded asynchronously. The interval keeps polling until the list is non-empty, then loads the voices, fetches a random quote, and clears itself.
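An alternative to polling is to listen for the voiceschanged event, which the SpeechSynthesis interface fires when its list of voices changes. The sketch below shows that approach; whenVoicesReady is a hypothetical helper, and because not every browser is guaranteed to fire this event, you may still want to keep the interval as a fallback.

```javascript
// Hypothetical helper: call the callback once with a non-empty voice list.
function whenVoicesReady(synthesis, callback) {
  const voices = synthesis.getVoices();
  if (voices.length > 0) {
    // Voices are already loaded: no need to wait
    callback(voices);
    return;
  }
  // Voices not loaded yet: wait for the browser to announce them
  function onChange() {
    const loaded = synthesis.getVoices();
    if (loaded.length === 0) return; // still empty, keep waiting
    synthesis.removeEventListener('voiceschanged', onChange);
    callback(loaded);
  }
  synthesis.addEventListener('voiceschanged', onChange);
}
```

In the app, this could be used as whenVoicesReady(SYNTHESIS, function () { getVoices(); fetchNewQuote(); }) in place of the interval.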
You have now successfully completed the text-to-speech app. You can start the app by running the following command in your terminal:
- npm start
The app will be running on port 5000 if it is available.
Visit localhost:5000 in your browser to observe the app.
Now, interact with the play button to hear the quotation spoken.
In this tutorial, you used the Web Speech API to build a text-to-speech app for the web. You can learn more about the Web Speech API and also find some helpful resources at the MDN Web Docs.
If you’d like to continue refining your app, there are a couple of interesting features you can still implement and experiment with such as volume controls, voice pitch controls, speed/rate controls, percentage of text uttered, etc.
The complete source code for this tutorial is available on GitHub.
I tried to execute the code. Did not get the buttons or the quote text visible on the screen. Is this still working? Is there an issue with browser settings? What else can I check? The server ran successfully and with a verbose line of code I could see it is executing correctly. The index.html page also works fine. The doubt is on the output of public/main.js file. I am not getting any response from it. The Quote or the writer’s name is not visible at all. Neither the play pause or other buttons.