The Future of Voice Generated Content

With the adoption rate of smart speaker devices increasing quarter over quarter, it seems more individuals will be communicating with their “assistants” in some way, shape, or form as the years go on.

Google took a swipe at Amazon’s dominant market share by outselling it in two consecutive sales quarters. The real test will be this winter quarter, where Google has positioned itself to move inventory away from Amazon by launching its Google Express store and tying it to its vast search index technologies. Amazon is firing back by expanding its DSP offering, which has been rapidly eating into Google’s GDN/DBM share.

(Chart: Canalys)

On top of the North American competition, the Chinese and South Korean markets are not to be ignored, as they cater to a much wider and rapidly evolving consumer audience.

(Chart: Canalys)


What this shows is an overall global consumer demand, or perhaps I should say consumer desire, for smart speaker devices.

As this grows, there will also be a learning curve for voice generated content that drives action from users through these devices. There will come a point where consumers want to interact with the assistant itself, and that transition will take time.

In the meantime, we are shown some beautiful ingenuity from individuals that might resonate with wider audiences, such as the ‘Rapping Neural Network’. Fed entirely with Kanye West’s (or should I say YE’s?) voice, the network is the work of Robbiebarrat, who aims to… well… make songs that sound as if they were Kanye’s (or YE’s?).
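The post doesn’t describe how the Rapping Neural Network is actually built, but the general recipe behind this kind of project, and behind the Graves paper linked further down, is to train a recurrent model on an artist’s material and then sample from it one step at a time. Below is a minimal, hypothetical sketch of that idea: a character-level LSTM in PyTorch trained on a toy string and then sampled from. The corpus, layer sizes, and training loop are all placeholders of my own, not Robbiebarrat’s actual code.

```python
# Minimal character-level LSTM sequence generation, in the spirit of
# "Generating Sequences with Recurrent Neural Networks" (Graves).
# Hypothetical illustration only; the Rapping Neural Network's real
# pipeline is not described in this post and may differ substantially.
import torch
import torch.nn as nn

corpus = "through the wire I spit it through the wire "  # toy training text (placeholder)
chars = sorted(set(corpus))
stoi = {c: i for i, c in enumerate(chars)}
itos = {i: c for c, i in stoi.items()}
data = torch.tensor([stoi[c] for c in corpus])

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Teacher-forced training: predict each next character from the previous ones.
x = data[:-1].unsqueeze(0)   # (1, T) inputs
y = data[1:].unsqueeze(0)    # (1, T) next-character targets
for step in range(300):
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sampling: feed each generated character back in and sample from the softmax.
idx = torch.tensor([[stoi["t"]]])
state, out = None, "t"
for _ in range(80):
    logits, state = model(idx, state)
    probs = torch.softmax(logits[0, -1], dim=-1)
    idx = torch.multinomial(probs, 1).unsqueeze(0)
    out += itos[idx.item()]
print(out)
```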

The University of Edinburgh is building a toolkit named ‘Merlin’, which aims to be an open-source go-to for DIY voice synthesis projects.

Another good source for DIY voice synthesis projects is the Nanami documentation.

Lyrebird is a company that claims to give you your own synthetic voice, but it still has some hurdles to clear before it can be considered readily usable. There will be more to see once its API comes out of closed beta.

If all of this has made you curious about sound synthesis and mastering its potential to craft voice generated content, here is some more reading material.

GST Tacotron (expressive end-to-end speech synthesis using global style tokens)

Prosodic Modifications | Prosodic Modifications 2 (Pitch, loudness, tempo; see the short code sketch after this list)

Modifications for Music

Learning with Siri

Generating Sequences with Recurrent Neural Networks
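As a rough illustration of the prosodic modifications mentioned above, here is a hedged sketch using librosa and soundfile. The file name “input.wav”, the two-semitone shift, the 1.25x stretch, and the gain value are arbitrary placeholders of my own; the linked resources may use entirely different tools and methods.

```python
# A rough sketch of basic prosodic modifications: pitch, tempo, and loudness.
# "input.wav" is a placeholder path; all parameter values are arbitrary.
import librosa
import soundfile as sf

y, sr = librosa.load("input.wav", sr=None)  # keep the file's native sample rate

shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # pitch: up 2 semitones
stretched = librosa.effects.time_stretch(y, rate=1.25)      # tempo: 25% faster
quieter = 0.5 * y                                           # loudness: simple linear gain (about -6 dB)

sf.write("pitch_up.wav", shifted, sr)
sf.write("faster.wav", stretched, sr)
sf.write("quieter.wav", quieter, sr)
```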

Credit for the content goes to Mila. If you’d like to learn more about this great group of researchers, you can do so here.