Question

Preset SpeechSynthesisUtterance - lang and voice - and pass that TTS to audio stream

Hello,

Workflow:

Text form -> submit -> preset the language (e.g. Italian) and voice (e.g. Microsoft Elsa - Italian (Italy)) on a SpeechSynthesisUtterance -> use the synthesized speech as an MP3 source for an audio stream.

Is it possible?



KFSys
Site Moderator
June 25, 2024

Yes, you can preset the language and voice for SpeechSynthesisUtterance in JavaScript. However, the Web Speech API, which provides the text-to-speech functionality, plays the speech directly through the device’s audio output. It does not expose the result as an audio stream or let you export it to a file (like MP3). You’d typically need a server-side solution for that part, but I can show you how to set up the client-side TTS and play it back in the browser.

Here’s an example of how you can use SpeechSynthesisUtterance to set the language and voice, and then play the audio in the browser:

<!DOCTYPE html>
<html>
<head>
    <title>TTS Example</title>
</head>
<body>
    <form id="tts-form">
        <textarea id="text-input" rows="4" cols="50">Enter text to convert to speech</textarea><br>
        <button type="submit">Convert to Speech</button>
    </form>
    <script>
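        // Build (or rebuild) the voice dropdown from the voices the browser reports.
        // This runs once immediately and again whenever "voiceschanged" fires.
        function populateVoiceList() {
            if (typeof speechSynthesis === 'undefined') {
                return;
            }

            const voices = speechSynthesis.getVoices();

            // Reuse the existing dropdown so repeated calls do not add duplicates
            let voiceSelect = document.getElementById('voice-select');
            if (!voiceSelect) {
                voiceSelect = document.createElement('select');
                voiceSelect.id = 'voice-select';
                document.body.appendChild(voiceSelect);
            }
            voiceSelect.innerHTML = '';

            voices.forEach((voice) => {
                const option = document.createElement('option');
                option.value = voice.name;
                option.textContent = `${voice.name} (${voice.lang})`;
                voiceSelect.appendChild(option);
            });
        }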

        populateVoiceList();
        if (typeof speechSynthesis !== 'undefined' && speechSynthesis.onvoiceschanged !== undefined) {
            speechSynthesis.onvoiceschanged = populateVoiceList;
        }

        document.getElementById('tts-form').addEventListener('submit', function(event) {
            event.preventDefault();

            const text = document.getElementById('text-input').value;
            const voiceSelect = document.getElementById('voice-select');
            const selectedVoice = voiceSelect.value;

            const utterance = new SpeechSynthesisUtterance(text);
            utterance.lang = 'it-IT'; // Italian language code

            const voices = speechSynthesis.getVoices();
            for (let i = 0; i < voices.length; i++) {
                if (voices[i].name === selectedVoice) {
                    utterance.voice = voices[i];
                    break;
                }
            }

            speechSynthesis.speak(utterance);
        });
    </script>
</body>
</html>

Explanation:

  1. HTML Form: The form takes text input and submits it.
  2. JavaScript:
    • populateVoiceList(): This function loads available voices and populates a dropdown list.
    • Event Listener: When the form is submitted, it creates a SpeechSynthesisUtterance object with the specified text, language (it-IT for Italian), and selected voice.
    • Speech Synthesis: The speechSynthesis.speak() method is called to play the speech.
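
One detail worth spelling out: in some browsers getVoices() returns an empty list until the voiceschanged event has fired, which is why the example calls populateVoiceList() twice. A minimal sketch of the same idea as a Promise is below. The helper name loadVoices and the 2-second timeout are my own assumptions, not part of the Web Speech API.

```javascript
// Resolve with the voice list, waiting for "voiceschanged" if it is still empty.
// `synth` is normally window.speechSynthesis; it is passed in as a parameter
// so the helper can also be exercised with a stand-in object.
function loadVoices(synth, timeoutMs = 2000) {
    return new Promise((resolve) => {
        const voices = synth.getVoices();
        if (voices.length > 0) {
            resolve(voices);
            return;
        }

        // Fallback: after timeoutMs, resolve with whatever is available
        const timer = setTimeout(() => resolve(synth.getVoices()), timeoutMs);

        // Some browsers may not support addEventListener here, hence the guard
        if (typeof synth.addEventListener === 'function') {
            synth.addEventListener('voiceschanged', () => {
                clearTimeout(timer);
                resolve(synth.getVoices());
            }, { once: true });
        }
    });
}
```

In the page above you could then write loadVoices(speechSynthesis).then(populateVoiceList) instead of relying on the order in which the two calls happen.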

Note:

  • This example focuses on client-side playback. SpeechSynthesisUtterance does not support saving its output as an MP3 file, so producing and streaming an MP3 requires a server-side text-to-speech API, possibly with additional tools like FFmpeg for encoding.
  • You can replace 'it-IT' with any other BCP 47 language tag for which the browser has a voice available.
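
If you do go the server-side route, the browser part can stay small. Here is a rough sketch, assuming your server exposes an endpoint that returns MP3 audio for a given text, language, and voice. The /api/tts path and the text, lang, and voice query parameter names are hypothetical; use whatever your TTS service actually expects.

```javascript
// Build the request URL for a server-side TTS endpoint.
// The endpoint and parameter names are assumptions for illustration only.
function buildTtsUrl(endpoint, { text, lang, voice }) {
    const params = new URLSearchParams({ text, lang, voice });
    return `${endpoint}?${params.toString()}`;
}

// Point an <audio> element at the server's MP3 response and start playback.
// `audioElement` would be an HTMLAudioElement in the browser.
function playFromServer(audioElement, endpoint, options) {
    audioElement.src = buildTtsUrl(endpoint, options);
    return audioElement.play(); // in browsers this returns a Promise
}
```

For example, playFromServer(document.querySelector('audio'), '/api/tts', { text: 'Ciao', lang: 'it-IT', voice: 'Microsoft Elsa - Italian (Italy)' }) would let the browser stream the MP3 as it arrives, provided the server sends it with an audio/mpeg content type.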

Bobby Iliev
Site Moderator
June 23, 2024

Hi there,

Yes, this workflow should be possible.

What you could do is:

  1. Create a text form in HTML
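  2. Set up an event listener for form submission
  3. After that, create a SpeechSynthesisUtterance with the preset language and voice
  4. Then convert the speech synthesis to an audio stream (the Web Speech API does not expose its audio output, so this step usually needs a server-side text-to-speech service)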
  5. And then create an audio element to play the stream
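
As a rough illustration of step 3, here is one way to apply a preset instead of reading the voice from a dropdown. The helper names pickVoice and createPresetUtterance and the ITALIAN_PRESET object are made up for this sketch. The voice name comes from the question and will only match on systems where that voice is installed, so the code falls back to the first voice with the same language.

```javascript
// Preset taken from the question; the voice may not exist on every system.
const ITALIAN_PRESET = { name: 'Microsoft Elsa - Italian (Italy)', lang: 'it-IT' };

// Pick a voice by exact name, falling back to the first voice for the language.
// Returns null when nothing matches, so the browser default voice is used.
function pickVoice(voices, { name, lang }) {
    return voices.find((v) => v.name === name)
        || voices.find((v) => v.lang === lang)
        || null;
}

// Build an utterance with the preset applied. Browser-only, since it relies
// on the SpeechSynthesisUtterance constructor.
function createPresetUtterance(text, voices, preset) {
    const utterance = new SpeechSynthesisUtterance(text);
    utterance.lang = preset.lang;
    const voice = pickVoice(voices, preset);
    if (voice) {
        utterance.voice = voice;
    }
    return utterance;
}
```

In the submit handler this would be used as speechSynthesis.speak(createPresetUtterance(text, speechSynthesis.getVoices(), ITALIAN_PRESET)).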

Feel free to share your current setup and I will be happy to help out.

Also if you are hitting any issues, feel free to share the errors here as well.

- Bobby
