Baidu Develops a Speech Recognition System: Better Hearing Than Humans?

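According to MIT Technology Review, Baidu's Deep Speech 2 recognizes English and Mandarin so well that in some cases it outperforms humans. Deep Speech 2 relies on machine learning for transcription. Voice queries are reportedly very popular in China, not only because typing text is time-consuming but also because some people are unfamiliar with Pinyin.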

China’s leading Internet-search company, Baidu, has developed a voice system that can recognize English and Mandarin speech better than people, in some cases.

The new system, called Deep Speech 2, is especially significant in how it relies entirely on machine learning for transcription. Whereas older voice-recognition systems include many handcrafted components to aid audio processing and transcription, the Baidu system learned to recognize words from scratch, simply by listening to thousands of hours of transcribed audio.

The technology relies on a powerful technique known as deep learning, which involves training a very large multilayered virtual network of neurons to recognize patterns in vast quantities of data. The Baidu app for smartphones lets users search by voice, and also includes a voice-controlled personal assistant called Duer (see “Baidu’s Duer Joins the Personal Assistant Party”). Voice queries are more popular in China because it is more time-consuming to input text, and because some people do not know how to use Pinyin, the phonetic system for transcribing Mandarin using Latin characters.

“Historically, people viewed Chinese and English as two vastly different languages, and so there was a need to design very different features,” says Andrew Ng, a former Stanford professor and Google researcher, and now chief scientist for the Chinese company. “The learning algorithms are now so general that you can just learn.”

Deep learning has its roots in ideas first developed more than 50 years ago, but in the past few years new mathematical techniques, combined with greater computer power and huge quantities of training data, have led to remarkable progress, especially in tasks that require some sort of visual or auditory perception. The technique has already improved the performance of voice recognition and image processing, and large companies including Google, Facebook, and Baidu are applying it to the massive data sets they own.

Deep learning is also being adopted for ever more tasks. Facebook, for example, uses deep learning to find faces in the images that its users upload, and more recently it has made progress in using deep learning to parse written text (see “Teaching Machines to Understand Us”). Google now uses deep learning in more than 100 different projects, from search to self-driving cars.

In 2013, Baidu opened its own effort to harness this new technology, the Deep Learning Institute, co-located at the company’s Beijing headquarters and in Silicon Valley. Deep Speech 2 was primarily developed by a team in California.

In developing Deep Speech 2, Baidu also created a new hardware architecture for deep learning that runs seven times faster than the previous version. Deep learning usually relies on graphics processors, because these are good for the intensive parallel computations involved.

The speed achieved “allowed us to do experimentation on a much larger scale than people had achieved previously,” says Jesse Engel, a research scientist at Baidu and one of more than 30 researchers named on a paper describing Deep Speech 2. “We were able to search over a lot of [neural network] architectures, and reduce the word error rate by 40 percent.”
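For readers unfamiliar with the metric Engel mentions, word error rate is conventionally the number of word substitutions, insertions, and deletions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below shows that standard definition in Python; the function name `word_error_rate` and the sample sentences are illustrative assumptions and are not taken from Baidu's paper or code. The article also does not say whether the 40 percent reduction is relative or absolute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not Baidu's implementation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a system that gets one word wrong in a six-word sentence has a word error rate of one sixth, so lower values mean better recognition.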

Ng adds that this has recently produced some impressive results. “For short phrases, out of context, we seem to be surpassing human levels of recognition,” he says.

He adds: “In Mandarin, there are a lot of regional dialects that are spoken by much smaller populations, so there’s much less data. This could help us recognize the dialects better.”

http://www.technologyreview.com/news/544651/baidus-deep-learning-system-rivals-people-at-speech-recognition/

