Amazon, in a combined effort with managed hosting provider WP Engine, has launched a plugin titled ‘Amazon Polly’, the sole purpose of which is to turn blog posts into audio streams.
The plugin relies heavily on Amazon’s existing text-to-speech technology, Polly, and allows webmasters to improve the accessibility of their content by embedding audio in a blog post or turning their blog posts into podcasts (which relies on another feature called Pollycast).
Amazon Polly utilises Speech Synthesis Markup Language (SSML), a markup language that allows a webmaster to control various aspects of audio output, including pronunciation and speech rate. Text input doesn’t have to be provided in SSML, however; it can also be provided as plain text.
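To give a feel for what SSML looks like, here is a minimal sketch in Python that wraps plain text in a `<speak>` element and a `<prosody>` element to set the speech rate. The function name `build_ssml` and the choice to accept only the named rate values are assumptions made for illustration; they are not part of the plugin or of any Amazon library.

```python
from xml.sax.saxutils import escape

# Named speech rates defined for the SSML <prosody> element.
NAMED_RATES = {"x-slow", "slow", "medium", "fast", "x-fast"}


def build_ssml(text, rate="medium"):
    """Wrap plain text in SSML that sets the speech rate (illustrative helper)."""
    if rate not in NAMED_RATES:
        raise ValueError(f"unsupported rate: {rate!r}")
    # Escape &, < and > so the text cannot break the surrounding markup.
    safe_text = escape(text)
    return f'<speak><prosody rate="{rate}">{safe_text}</prosody></speak>'


print(build_ssml("Live from New York", rate="slow"))
```

A string built this way would be submitted to Polly as SSML rather than as plain text; without the markup, the same sentence can simply be sent as it is.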
Amazon highlights some of the capabilities of SSML and Polly in a blog post published in 2016. It cites the example of the word “live”, which is pronounced differently in the phrases “live from New York” and “I live in Seattle”, depending on context.
While traditional text-to-speech software would struggle to pick up on nuances such as these, Amazon claims Polly can understand the difference.
Amazon states that Polly supports 47 male and female voices across 24 languages, but it also notes in its Amazon Polly Developer Guide that Amazon Polly is not a translation service.
You can test Polly in the AWS console (you’ll have to sign in with an AWS account).
The plugin is available to install now, although some users may be put off by the fact that some configuration is required. It’s also worth noting that Amazon Polly is only free of charge for new customers for the first 12 months. During this period, under the AWS free tier, you can convert 5 million characters of text to audio per month free of charge.
Amazon states that, once the initial 12-month period is over, you’ll pay $0.004 for each minute of generated audio. It also gives a pricing example for the book Adventures of Huckleberry Finn, which it estimates would cost $2.40 to convert from text to speech.
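The two figures quoted above imply a running time for the audiobook. The following is a minimal sketch of that arithmetic in Python, assuming the quoted per-minute rate applies uniformly; the function names are illustrative only and the rate is the one cited in this article, not an official price sheet.

```python
from decimal import Decimal

# Per-minute rate as quoted in the article (an assumption for this sketch).
RATE_PER_MINUTE = Decimal("0.004")


def cost_for_minutes(minutes):
    """Estimated cost in dollars for a given number of audio minutes."""
    return Decimal(minutes) * RATE_PER_MINUTE


def minutes_for_cost(cost):
    """Audio minutes implied by a dollar cost at the quoted rate."""
    return Decimal(cost) / RATE_PER_MINUTE


# Minutes of audio implied by the $2.40 Huckleberry Finn estimate.
huck_finn_minutes = minutes_for_cost("2.40")
print(int(huck_finn_minutes))  # 600
print(cost_for_minutes(600))   # 2.400
```

At the quoted rate, then, the $2.40 estimate corresponds to about 600 minutes, or roughly ten hours, of generated audio.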