Before deciding on a text-to-speech API, I played around with a few of the heavy hitters. Both Google and IBM have incredibly robust infrastructures built around their machine learning technologies, which provide a multitude of options for turning text into speech and vice versa. The opportunities are endless, but it’s more power than necessary for this project.
Mozilla has an easy-to-use speech API that goes perfectly with a typing word game I already have in place. In two previous blogs, I talked about a simple typing game I made using the Wordnik API. In those versions of the game, the app displays the Wordnik API results, which a user then attempts to type correctly, subsequently triggering another API call and another word if successful.
I decided to change this game up and make it like a spelling bee: a user must listen to the word and type it out. Play continues as long as the user types each word correctly, until the time runs out.
In their docs, Mozilla makes it clear this is incomplete work, “experimental technology.” How much more is to come? What sort of experiments are pending?
The SpeechSynthesis interface controls the Web Speech API’s speech output through a handful of built-in methods, including pause(), resume(), and cancel(). The interface itself inherits from EventTarget, so it can also fire and listen for events.
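As a quick illustration, here is a sketch of those controls called in sequence. The helper name demoControls is my own, and the speechSynthesis and SpeechSynthesisUtterance globals exist only in the browser:

```javascript
// Sketch of the SpeechSynthesis playback controls.
// demoControls is my own helper name, not part of the API.
function demoControls(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utterance); // queue the utterance and start speaking
  speechSynthesis.pause();          // pause the current utterance
  speechSynthesis.resume();         // pick up where it left off
  speechSynthesis.cancel();         // drop everything in the queue
  return utterance;
}
```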
Spoken text is represented by an utterance (a SpeechSynthesisUtterance object), which is handed to the speak() method. Therefore, to get my Wordnik API word spoken to my user, I simply have to pass it in as a prop to my Speaking component, and it becomes the text the Web Speech API reads.
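Stripped of the component around it, that boils down to one call. The wrapper name speakWord is my own:

```javascript
// Wrap a word in an utterance and hand it to the speech queue.
// speakWord is my own helper name; in the game, `text` would be
// the word returned by the Wordnik API.
function speakWord(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  speechSynthesis.speak(utterance);
  return utterance;
}
```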
In this simple component below, the word repeats until you move to the next one.
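A minimal sketch of that repeat-until-next behavior, assuming the browser’s speech globals and using my own helper names (this is not the original component, just the core idea):

```javascript
// Repeat a word aloud until the returned stop function is called,
// e.g. when the player types the word correctly and moves on.
// repeatWord is my own helper name, not part of the Web Speech API.
function repeatWord(text) {
  let active = true;
  const speakOnce = () => {
    if (!active) return;
    const utterance = new SpeechSynthesisUtterance(text);
    utterance.onend = speakOnce; // speak again when this utterance finishes
    speechSynthesis.speak(utterance);
  };
  speakOnce();
  return () => {
    active = false;
    speechSynthesis.cancel(); // silence anything still queued
  };
}
```

Calling the returned function is what “moving to the next word” would do in the game loop.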
Additional functionality includes a variety of voices, pitches, and speaking rates that you can tune to your own ear.
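These are all properties on the utterance itself. A sketch of picking a voice by name and adjusting pitch and rate (configureUtterance is my own helper name; the voice names available depend on the browser and OS):

```javascript
// Build an utterance with a chosen voice, pitch, and rate.
// configureUtterance is my own helper name, not part of the API.
function configureUtterance(text, voiceName, pitch = 1, rate = 1) {
  const utterance = new SpeechSynthesisUtterance(text);
  const voice = speechSynthesis.getVoices().find(v => v.name === voiceName);
  if (voice) utterance.voice = voice; // fall back to the default voice otherwise
  utterance.pitch = pitch; // 0 to 2, default 1
  utterance.rate = rate;   // 0.1 to 10, default 1
  return utterance;
}
```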