Text to Speech with JavaScript
Updated on July 23, 2019
Concepts and Methods Involved
SpeechSynthesis API: the API that provides the text-to-speech service in the browser. It is exposed through window.speechSynthesis.
SpeechSynthesis API
The SpeechSynthesis API is the part of the Web Speech API that is responsible for speech synthesis. The global window.speechSynthesis object implements it.
The most important methods it defines are:
getVoices(): This method returns a list of the voices available for speaking. The voices cover different languages, so you can choose one in the language you prefer. Each voice has a few properties, such as name and lang.
Important: The list of voices may be loaded asynchronously. Some browsers (like Chrome) make a server request to fetch the voice list, while others (like Firefox) have the list built in. So for getVoices() to return the full list, you may need to wait until the voices have loaded; otherwise it may return an empty list. You can do this by listening for the voiceschanged event fired by window.speechSynthesis.
A simple check is to call getVoices() once at the start. If the list is empty, listen for the voiceschanged event and call getVoices() again when it fires.
speak(): This method adds an utterance to a queue called the utterance queue. The utterance is spoken once all the utterances queued before it have been spoken.
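The empty-list check described for getVoices() can be wrapped in a Promise. This is a minimal sketch, not a definitive implementation: loadVoices is a hypothetical helper name, and the synth parameter (defaulting to the browser's speechSynthesis object) is an assumption added so the function can also be tried with a stand-in object outside the browser.

```javascript
// Resolve with the list of voices once the browser has loaded them.
// loadVoices is a hypothetical helper, not part of the Web Speech API.
// `synth` defaults to the browser's global speechSynthesis object.
function loadVoices(synth = globalThis.speechSynthesis) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      // The voices are already available, so resolve right away.
      resolve(voices);
      return;
    }
    // Otherwise wait for the voiceschanged event, then read the list again.
    const onChange = () => {
      synth.removeEventListener("voiceschanged", onChange);
      resolve(synth.getVoices());
    };
    synth.addEventListener("voiceschanged", onChange);
  });
}

// Usage in a browser:
// loadVoices().then((voices) => console.log(voices.map((v) => v.name)));
```

If a browser never provides any voices, the returned Promise stays pending, so a real page may want to add a timeout around it.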
The complete API of the SpeechSynthesis object is documented in the MDN Web Docs.
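Because speak() only appends to the utterance queue, several utterances can be queued at once and the browser speaks them in order. A minimal sketch, assuming a hypothetical speakInOrder helper whose synth and Utterance parameters default to the browser globals (they are parameters only so the function can be tried outside a browser):

```javascript
// Queue several utterances. Each speak() call appends to the utterance
// queue and returns immediately; the speech itself happens later, in order.
// speakInOrder is a hypothetical helper, not part of the Web Speech API.
function speakInOrder(
  texts,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  const utterances = texts.map((text) => new Utterance(text));
  for (const utterance of utterances) {
    synth.speak(utterance);
  }
  return utterances;
}

// Usage in a browser: the second sentence starts only after the first ends.
// speakInOrder(["Hello there.", "This is the second sentence."]);
```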
SpeechSynthesisUtterance API
To have any text spoken, you need to create a SpeechSynthesisUtterance object.
Its properties control the various aspects of the speech:
lang: Language of the speech
pitch: Pitch of the speech
rate: Speed at which the speech is spoken
text: Text to be spoken
voice: Voice used for the speech. This is one of the voices returned by the window.speechSynthesis.getVoices() method
volume: Volume of the speech
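As a sketch of how these properties fit together, the following picks a voice by language from the getVoices() list and builds an utterance from it. pickVoice and makeUtterance are hypothetical helpers, and matching a plain "en" against tags such as "en-GB" is one possible design choice, not something the API does for you.

```javascript
// Return the first voice whose lang matches `lang` exactly ("en-GB") or by
// primary language ("en" matches "en-GB"), or undefined if none does.
// pickVoice is a hypothetical helper, not part of the Web Speech API.
function pickVoice(voices, lang) {
  const wanted = lang.toLowerCase();
  return voices.find((voice) => {
    const voiceLang = voice.lang.toLowerCase();
    return voiceLang === wanted || voiceLang.startsWith(wanted + "-");
  });
}

// Build an utterance with the properties listed above set explicitly.
// makeUtterance is hypothetical; Utterance defaults to the browser class.
function makeUtterance(
  text,
  { voice, pitch = 1, rate = 1, volume = 1 } = {},
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  const utterance = new Utterance(text);
  if (voice) {
    utterance.voice = voice;
    utterance.lang = voice.lang; // keep lang consistent with the chosen voice
  }
  utterance.pitch = pitch;
  utterance.rate = rate;
  utterance.volume = volume;
  return utterance;
}

// Usage in a browser:
// const voice = pickVoice(speechSynthesis.getVoices(), "en");
// speechSynthesis.speak(makeUtterance("Hello", { voice, rate: 1.2 }));
```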
In addition, several events are fired while an utterance is being spoken. Some of them are:
onstart: Fired when the speech has begun
onend: Fired when the speech has finished
onboundary: Fired when the speech reaches a word or sentence boundary
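These events can turn speaking into a Promise that settles when the utterance ends. A minimal sketch under these assumptions: speakAsync is a hypothetical name, it also uses onerror (another utterance event), and it overwrites any handlers already assigned to the utterance.

```javascript
// Speak one utterance and resolve when its end event fires.
// speakAsync is a hypothetical helper; `synth` defaults to the browser's
// speechSynthesis object.
function speakAsync(utterance, synth = globalThis.speechSynthesis) {
  return new Promise((resolve, reject) => {
    utterance.onstart = () => console.log("started:", utterance.text);
    utterance.onboundary = (event) => {
      // event.name is "word" or "sentence"; event.charIndex is the position
      // in utterance.text that has been reached.
      console.log(`${event.name} boundary at index ${event.charIndex}`);
    };
    utterance.onend = () => resolve(utterance);
    // An error event (for example when speech is blocked) rejects instead.
    utterance.onerror = (event) => reject(new Error(`speech failed: ${event.error}`));
    synth.speak(utterance);
  });
}

// Usage in a browser:
// speakAsync(new SpeechSynthesisUtterance("Hello")).then(() => console.log("done"));
```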
The complete API of the SpeechSynthesisUtterance object is documented in the MDN Web Docs.
Sample JavaScript Code
Browser Compatibility
The SpeechSynthesis API is available in all current versions of Firefox, Chrome, Edge and Safari.
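Older or less common browsers may still lack the API, so a page can check for it before speaking. A minimal sketch; supportsSpeechSynthesis is a hypothetical helper, and its scope parameter defaults to the global object (window in a browser):

```javascript
// Report whether both parts of the API used in this article are present.
// supportsSpeechSynthesis is a hypothetical helper.
function supportsSpeechSynthesis(scope = globalThis) {
  return "speechSynthesis" in scope && "SpeechSynthesisUtterance" in scope;
}

// Usage in a browser:
// if (!supportsSpeechSynthesis()) {
//   console.warn("Text to speech is not available in this browser.");
// }
```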
Don't Autoplay a Speech
Some sites start speaking as soon as the page loads. To prevent this kind of autoplay, browsers now require some user interaction (such as a click) before the speech synthesis API will work. Otherwise the speech does not play and the browser reports an error.
Find more about autoplay policies on the web.
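The usual approach is to call speak() from inside a user-triggered handler such as a click. A sketch under these assumptions: speakOnClick is a hypothetical helper, the button element is whatever the page provides, and the "not-allowed" check relies on the error code the Web Speech API specification defines for speech the browser refuses to start.

```javascript
// Start speech only in response to a click, so autoplay rules allow it.
// speakOnClick is hypothetical; `button` is any element with
// addEventListener, and synth/Utterance default to the browser globals.
function speakOnClick(
  button,
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  button.addEventListener("click", () => {
    const utterance = new Utterance(text);
    utterance.onerror = (event) => {
      // "not-allowed" means the browser refused to start the speech.
      if (event.error === "not-allowed") {
        console.warn("Speech was blocked: user interaction is required.");
      }
    };
    synth.speak(utterance);
  });
}

// Usage in a browser (the #speak button is an assumption):
// speakOnClick(document.querySelector("#speak"), "Hello!");
```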
Reference : https://usefulangle.com/post/98/javascript-text-to-speech
Example II: SpeechSynthesisUtterance
The SpeechSynthesisUtterance interface of the Web Speech API represents a speech request. It contains the content the speech service should read and information about how to read it (e.g. language, pitch and volume).
Constructor
SpeechSynthesisUtterance.SpeechSynthesisUtterance()
Returns a new SpeechSynthesisUtterance object instance.
Properties
SpeechSynthesisUtterance also inherits properties from its parent interface, EventTarget.
SpeechSynthesisUtterance.lang
Gets and sets the language of the utterance.
SpeechSynthesisUtterance.pitch
Gets and sets the pitch at which the utterance will be spoken.
SpeechSynthesisUtterance.rate
Gets and sets the speed at which the utterance will be spoken.
SpeechSynthesisUtterance.text
Gets and sets the text that will be synthesised when the utterance is spoken.
SpeechSynthesisUtterance.voice
Gets and sets the voice that will be used to speak the utterance.
SpeechSynthesisUtterance.volume
Gets and sets the volume at which the utterance will be spoken.
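The Web Speech API specification limits pitch to 0 to 2, rate to 0.1 to 10 and volume to 0 to 1, with 1 as the default for each; individual voices may not honor the whole range. The sketch below, built around a hypothetical clampSettings helper, keeps user-supplied values inside those limits before they are assigned to an utterance.

```javascript
// Ranges from the Web Speech API specification; 1 is the default for each.
const LIMITS = {
  pitch: [0, 2],
  rate: [0.1, 10],
  volume: [0, 1],
};

// Return a new object with pitch, rate and volume clamped into range,
// falling back to 1 when a value is missing or not a number.
// clampSettings is a hypothetical helper, not part of the API.
function clampSettings(settings = {}) {
  const result = {};
  for (const [key, [min, max]] of Object.entries(LIMITS)) {
    const value = Number(settings[key]);
    result[key] = Number.isFinite(value) ? Math.min(max, Math.max(min, value)) : 1;
  }
  return result;
}

// Usage in a browser:
// Object.assign(utterance, clampSettings({ rate: 20, volume: 0.5 }));
```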
Events
Listen to these events using addEventListener() or by assigning an event listener to the oneventname property of this interface.
boundary
Fired when the spoken utterance reaches a word or sentence boundary. Also available via the onboundary property.
end
Fired when the utterance has finished being spoken. Also available via the onend property.
error
Fired when an error occurs that prevents the utterance from being successfully spoken. Also available via the onerror property.
mark
Fired when the spoken utterance reaches a named SSML "mark" tag. Also available via the onmark property.
pause
Fired when the utterance is paused part way through. Also available via the onpause property.
resume
Fired when a paused utterance is resumed. Also available via the onresume property.
start
Fired when the utterance has begun to be spoken. Also available via the onstart property.
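As an example of the addEventListener() style, the boundary event's name and charIndex values can be used to find the word currently being spoken. A minimal sketch; wordAt and logWords are hypothetical helpers, and treating a word as a run of non-whitespace characters (so attached punctuation such as "Hello," is included) is a simplifying assumption.

```javascript
// Return the word starting at `charIndex` in `text`, or "" if there is none.
// wordAt is a hypothetical helper, not part of the API.
function wordAt(text, charIndex) {
  const match = text.slice(charIndex).match(/^\S+/);
  return match ? match[0] : "";
}

// Log each word as it is reached, listening with addEventListener()
// instead of assigning onboundary. logWords is hypothetical.
function logWords(utterance) {
  utterance.addEventListener("boundary", (event) => {
    if (event.name === "word") {
      console.log(wordAt(utterance.text, event.charIndex));
    }
  });
}

// Usage in a browser:
// const utterance = new SpeechSynthesisUtterance("Hello big world");
// logWords(utterance);
// speechSynthesis.speak(utterance);
```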
Examples
In MDN's basic Speech synthesiser demo, we first grab a reference to the SpeechSynthesis controller using window.speechSynthesis. After defining some necessary variables, we retrieve a list of the available voices using SpeechSynthesis.getVoices() and populate a select menu with them so the user can choose the voice they want.
Inside the inputForm.onsubmit handler, we stop the form from submitting with preventDefault(), use the constructor to create a new utterance instance containing the text from the text <input>, set the utterance's voice to the voice selected in the <select> element, and start the utterance speaking via the SpeechSynthesis.speak() method.
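The form handler described above can be sketched as follows. This is not the demo's actual code: voiceLabel, makeSubmitHandler and the getText/getVoiceIndex callbacks are hypothetical, and the element variables in the usage comment (inputTxt and voiceSelect, alongside inputForm) are assumptions.

```javascript
// Text shown for each <option> in the voice menu, e.g. "Daniel (en-GB)".
function voiceLabel(voice) {
  return `${voice.name} (${voice.lang})`;
}

// Build an onsubmit handler. getText and getVoiceIndex are hypothetical
// callbacks that read the <input> text and the selected <select> index;
// synth and Utterance default to the browser globals.
function makeSubmitHandler(
  voices,
  getText,
  getVoiceIndex,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  return (event) => {
    event.preventDefault(); // keep the form from actually submitting
    const utterance = new Utterance(getText());
    utterance.voice = voices[getVoiceIndex()];
    synth.speak(utterance);
  };
}

// Usage in a browser:
// const voices = speechSynthesis.getVoices();
// for (const [i, voice] of voices.entries()) {
//   voiceSelect.add(new Option(voiceLabel(voice), String(i)));
// }
// inputForm.onsubmit = makeSubmitHandler(
//   voices, () => inputTxt.value, () => Number(voiceSelect.value));
```

In a real page the voice list may still be empty when the script first runs, so the menu would typically be filled again after the voiceschanged event.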
Reference : https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance