Sudhir Raikar, IIFL | Mumbai | April 25, 2016 09:51 IST
VocaliD aptly sums up its enduring value proposition on its website: ‘The voice company that is bringing speaking machines to life.’ This initiative has indeed taken man-machine collaboration to another level.
Belmont-based VocaliD is not your everyday startup. The brainchild of 43-year-old speech scientist Rupal Patel, it is a project aimed at designing personalized synthetic voices that help people with severe speech impairments speak in a voice that suits their body and persona. In a telling outcome of Patel’s painstaking collaboration with Dr. Tim Bunnell of the Nemours/Alfred I. duPont Hospital for Children, VocaliD has developed algorithms to build unique voices for people who are forced to rely on computer-aided speech in the aftermath of a stroke, Parkinson’s disease, cerebral palsy or other serious impediments.
Patel holds a BS from the University of Calgary and a Master’s and PhD from the University of Toronto, and has over 15 years of clinical and research experience in assistive technology. She has authored over 50 peer-reviewed journal articles, given several hundred conference presentations in the areas of speech motor control and assistive communication technology, and raised $5M+ in research funding from federal agencies and foundations.
She holds appointments in the Harvard/MIT Speech and Hearing Biosciences and Technology program, the Department of Psychiatry at the University of Massachusetts, and Haskins Laboratories at Yale University. A tenured Professor at Northeastern University in the College of Computer and Information Science, she has founded an interdisciplinary laboratory at the Department of Communication Sciences and Disorders. Besides Patel, the VocaliD team (https://www.vocalid.co/about) comprises some of the best brains – scientists, engineers and entrepreneurs – all committed champions of the cause.
It was a young woman called Samantha who unknowingly made Patel’s mission even more purposeful in serving the larger cause of its beneficiaries. Midway through its efforts to create a scientifically perfect voice for her, the team realized that Samantha didn’t want a perfect voice; she wanted ‘her’ voice back. This lent momentum to VocaliD’s experiments to iteratively improve its techniques, which rely on combining the recipient’s vocal identity features with the speech clarity features of a matched voice donor. Before VocaliD’s innovation, a synthetic voice invariably meant a generic computerized voice, the best-known example being that of Stephen Hawking, who uses a synthesiser called DECtalk.
The VocaliD team is focused on raising funds for a unique Human Voicebank Initiative that seeks to build the infrastructure to gather and store all donor voices. The goal is to collect one million voice samples by 2020 to create the world’s largest voice repository. This corpus would help VocaliD generate unique vocal identities for hundreds of recipients through matching donors.
The Human Voicebank Initiative is indeed a herculean task, as the firm’s lab-based model cannot be scaled to ensure mass reach.
Voice donors need to visit either Patel’s or Dr. Bunnell’s lab to record three hours of speech (around 3,200 sentences) in a professional sound studio. The patient’s “utterances” – the sounds that they are able to produce – provide clues to the original speech texture, prior to the impediment. Surrogates of the same gender and age group are then made to read from classic books. The two voices are blended together to create a high-quality matching voice using a tool called ModelTalker. It takes at least 800 sentences to create a usable voice, and around 3,000 to make it sound as natural as possible. The VocaliD wizards can reverse-engineer a voice from just three seconds of sound, using algorithms to find a matched speaker within the Voicebank and blend the recipient’s vocal DNA with the donor’s recordings. The result is a personalized digital voice that preserves the match’s clarity and conveys the beneficiary’s unique vocal identity.
The lab-based arrangement limits the possibility of making a real difference for the hundreds of people waiting for voices and for the even larger populace who may want voices in the future. Given that beneficiaries are not limited to a single age group or background, the group of donors needs to be diverse in the true sense of the word.
VocaliD has since broad-based its model through software that can run on tablets and mobile phones equipped with high-quality microphones, a godsend for this purpose. Gaming is another option the startup is working on to make communication engaging for children, as well as for those who may lack the technical awareness to learn about the project in DIY mode.
VocaliD aptly sums up its enduring value proposition on its website: ‘The voice company that is bringing speaking machines to life.’ This initiative has indeed helped man-machine collaboration move up the value chain. Needless to say, mass-scale public participation will help the team achieve its 2020 mission, maybe even surpass it. The VocaliD voicebanking platform currently has over 11,000 members from over 110 countries. You can help both these figures swell phenomenally. Visit https://www.vocalid.co/how to find out more.
Picture credit: vocalid.co