It might not be a human calling you! Google Duplex shakes up smart assistants

Curated | Vincent
Contribution | Vincent, Natalie, Debra
Editor | Natalie


AI Frontline Guide: May 9, 2018, Beijing time, will be a busy day for the global technology media. Not only is it the second day of the Microsoft Build 2018 developer conference, the Google I/O developer conference also kicks off on the same day. Compared with Build, Google has come far better prepared, with impressive technology on display in seemingly endless supply. The most striking demonstration: Google Assistant has evolved to sound almost indistinguishable from a real person.

For more in-depth content, follow the WeChat public account "AI Frontline" (ID: ai-front)
Ring ring! Here comes the human-impersonating Google Assistant

Let's review the demo clip that sent shivers down people's spines:

v.qq.com/x/page/l064…

Can you really tell whether the caller making the appointment is human?

Google CEO Sundar Pichai repeatedly emphasized: this is a recording of a real phone call!

The video shows the Google voice assistant booking a hair salon appointment on its owner's behalf, handling the time, location, requested service, and so on. Throughout the call, the assistant responds quite naturally to whatever the person on the other end says:

By the end of the clip, the voice assistant has successfully made the appointment for its owner, and the whole exchange goes very smoothly.

A second recording demonstrates how the voice assistant responds to a complex, unanticipated situation. In this call, the restaurant explains that reservations at the requested time are only taken for parties of five or more, so the assistant thoughtfully asks how long the wait would be instead, a small detail a human caller might not even think of, and the call wraps up smoothly without a reservation being needed.

This performance drew applause and knowing smiles from the audience; everyone seemed quite pleased with the little assistant.

Of course, some joked: will this assistant start impersonating me on the phone? If it can make calls for me, anyone who doesn't want to answer can simply hand the call off to the assistant, so what are we humans still needed for?! Jokes aside, the Google Assistant feature is genuinely impressive, but so far it exists only as a recorded demo: there was no live call on stage. Is Google afraid of an on-stage failure, or is the feature simply not stable enough yet? Whether Google Assistant is really this good can only be judged once users get to try it themselves.

The technology behind Duplex

According to the official introduction, the reason Google Assistant can sound almost exactly like a real person on the phone is a technology called Google Duplex, a new technology for conducting natural conversations over the phone to carry out "real world" tasks. The technology is designed to accomplish specific tasks, such as scheduling certain types of appointments. For tasks like these, the system makes the conversational experience as natural as possible, letting people speak normally, as they would to another human being rather than to a machine.

To make the dialogue sound as natural as possible, Google has, beyond the voice itself, substantially improved the system's natural language understanding. In natural conversations, people speak much faster and often less clearly than they do to machines, so speech recognition is harder and word error rates are higher. The problem is exacerbated on phone calls, which often have a lot of background noise and poor sound quality.

In a longer conversation, the same sentence can have very different meanings depending on the context. For example, when booking, "Ok for 4" can mean either the booking time or the number of people. The relevant context may lie several sentences back, a problem compounded by the higher word error rate on phone calls.

At the heart of Duplex is a recurrent neural network (RNN) designed to address these challenges, built using TensorFlow Extended (TFX). To achieve high accuracy, Google trained Duplex's RNN on a corpus of anonymized phone conversation data. The network uses the output of Google's automatic speech recognition (ASR) system, along with features from the audio, the history of the conversation, the parameters of the conversation (such as the desired service for an appointment, or the current time), and more. An understanding model is trained separately for each task, while leveraging a shared corpus across tasks. Finally, hyperparameter optimization in TFX is used to further improve the model.
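To make that description a bit more concrete, here is a minimal, purely hypothetical sketch in tf.keras of how ASR text, conversation history, and task parameters could be combined and fed into a recurrent network. The input names, layer sizes, and intent-classification head are all our own illustrative assumptions, not Google's actual architecture.

```python
# Toy sketch of a Duplex-style understanding model: ASR token IDs,
# conversation-history token IDs, and task parameters (desired service,
# current time, ...) are embedded, encoded by RNNs, and combined.
import tensorflow as tf

VOCAB_SIZE = 10_000      # assumed token vocabulary size
NUM_INTENTS = 32         # assumed number of possible response intents

asr_tokens = tf.keras.Input(shape=(None,), dtype="int32", name="asr_tokens")
history_tokens = tf.keras.Input(shape=(None,), dtype="int32", name="history_tokens")
task_params = tf.keras.Input(shape=(8,), dtype="float32", name="task_params")

embed = tf.keras.layers.Embedding(VOCAB_SIZE, 128, mask_zero=True)

asr_state = tf.keras.layers.LSTM(256)(embed(asr_tokens))          # current caller turn
history_state = tf.keras.layers.LSTM(256)(embed(history_tokens))  # conversation so far

features = tf.keras.layers.Concatenate()([asr_state, history_state, task_params])
hidden = tf.keras.layers.Dense(256, activation="relu")(features)
intent_logits = tf.keras.layers.Dense(NUM_INTENTS, name="intent_logits")(hidden)

model = tf.keras.Model(inputs=[asr_tokens, history_tokens, task_params],
                       outputs=intent_logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```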

The incoming sound is processed by the ASR system, then analyzed together with contextual data and other inputs to generate the response text, which is finally read aloud by the TTS (text-to-speech) system.
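The overall call loop can be sketched as below. The three helper functions are hypothetical stand-ins for the ASR, understanding, and TTS components, not real Google APIs; the point is only the order in which the stages run.

```python
# Toy sketch of the Duplex-style call loop: ASR -> understanding -> TTS.
# asr_transcribe, understand and tts_speak are hypothetical placeholders.

def asr_transcribe(audio_chunk: bytes) -> str:
    """Placeholder for automatic speech recognition."""
    return "we only take reservations for five or more"

def understand(caller_text: str, history: list, task_params: dict) -> str:
    """Placeholder for the RNN-based understanding/response model."""
    return "How long is the wait usually?"

def tts_speak(text: str) -> bytes:
    """Placeholder for text-to-speech synthesis."""
    return text.encode("utf-8")

def handle_turn(audio_chunk: bytes, history: list, task_params: dict) -> bytes:
    """Process one caller turn: transcribe it, decide on a reply, speak it."""
    caller_text = asr_transcribe(audio_chunk)
    history.append(("caller", caller_text))
    reply_text = understand(caller_text, history, task_params)
    history.append(("assistant", reply_text))
    return tts_speak(reply_text)
```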

As the video shows, Google Assistant even produces speech disfluencies such as "hmm" and "uh" during the conversation. This is a deliberate design choice to make it sound more human: while the system is processing information, emitting these sounds makes the other person feel that a human is thinking things over.

In addition, Google emphasized the importance of latency. For example, when people say something as simple as "Hello?", they expect an immediate reply and are especially sensitive to delay. When Duplex detects that low latency is required, it switches to faster, lower-confidence models (for example in speech recognition or endpointing). In extreme cases it doesn't even wait for the RNN to return a response, but uses a faster approximate response instead, typically a hesitant one, just as a person hesitates briefly when they haven't fully understood the other party. This lets Google Assistant respond with less than 100 milliseconds of latency in these situations. Interestingly, in some cases the researchers found that introducing more delay actually made the conversation feel more natural, for example when replying to a very complex sentence.
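The trade-off can be illustrated with a small toy sketch: wait briefly for the full model's reply, and fall back to a quick, hesitant filler if it does not arrive within the latency budget. Everything here (the fallback phrase, the timing, full_model_reply) is our own assumption, not Google's code.

```python
# Toy illustration of latency-aware responding: prefer the full model's
# reply, but answer with a hesitant filler if it misses the latency budget.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

FAST_FALLBACK = "Mm-hmm."                 # hesitant, approximate response

def full_model_reply(caller_text: str) -> str:
    """Stand-in for the slower, high-quality RNN response."""
    time.sleep(0.3)                       # simulate model latency
    return "Sure, 10 am on Friday works."

pool = ThreadPoolExecutor(max_workers=1)

def respond(caller_text: str, budget_s: float = 0.1) -> str:
    """Return the full reply if it arrives within the budget,
    otherwise answer immediately with the approximate fallback."""
    future = pool.submit(full_model_reply, caller_text)
    try:
        return future.result(timeout=budget_s)
    except TimeoutError:
        return FAST_FALLBACK

print(respond("Hello?"))                  # prints the fallback within ~100 ms
```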

With Google Duplex, users no longer need to make the call themselves: they simply ask Google Assistant, and the subsequent phone call is placed entirely by the assistant in the background, with no intervention required. That feels like a major benefit for the many socially anxious people who prefer email and messaging and get nervous at the mere thought of making a phone call...

According to the official introduction, Duplex is built into Google Assistant and will handle errands across all kinds of everyday scenarios; the live demonstration covered only a small part of its capabilities. At this point, though, the editor's imagination started running, and a few questions come to mind:

With earlier smart assistants, the user issued instructions and the assistant executed them, but people still had to do things like place the call or book the table themselves. With Duplex, the human's role shrinks further: a single sentence is enough, and the assistant takes care of the rest. If something goes wrong in that communication, who bears the responsibility?

That said, Google also emphasized that for now Google Duplex is restricted to a few closed domains, which must be narrow enough for Duplex to explore in depth. Duplex can only hold natural conversations after deep training within these domains; it is not capable of broader, general-purpose conversation.

What are the amazing new features of Google Assistant?

In addition to Duplex technology, at today's I/O conference, Google also announced many other functional updates to the virtual assistant, many of which are very powerful...

New voices

Perhaps users have grown tired of Google Assistant's standard voice, which is why Google decided to add six new male and female voices. One of them belongs to the American singer John Legend, who appeared in "La La Land".

But the impressive part isn't that John Legend was brought in; it's how efficiently Google can now generate new voices for the Assistant.

With the help of DeepMind's deep neural network model WaveNet, Google needs only a small amount of recorded speech, plus plenty of compute, to produce a voice highly similar to the original speaker, cutting the time required from several months down to a few hundred hours.
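WaveNet's core idea is a stack of dilated causal convolutions that predict each audio sample from the samples before it. The sketch below shows only that skeleton in tf.keras; the filter counts and dilation schedule are illustrative, and the real WaveNet additionally uses gated activations and residual/skip connections that are omitted here.

```python
# Minimal sketch of WaveNet's core idea: stacked dilated causal convolutions
# predicting the next quantized audio sample from past samples.
import tensorflow as tf

QUANT_LEVELS = 256        # 8-bit mu-law quantized audio, as in the WaveNet paper

inputs = tf.keras.Input(shape=(None, 1))        # waveform samples in [-1, 1]
x = inputs
for dilation in [1, 2, 4, 8, 16, 32]:           # receptive field doubles per layer
    x = tf.keras.layers.Conv1D(filters=64, kernel_size=2,
                               dilation_rate=dilation,
                               padding="causal", activation="relu")(x)

# Per-timestep distribution over the next quantized sample value.
outputs = tf.keras.layers.Conv1D(QUANT_LEVELS, kernel_size=1,
                                 activation="softmax")(x)

wavenet_sketch = tf.keras.Model(inputs, outputs)
wavenet_sketch.summary()
```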

More powerful multi-turn dialogue and multitasking capabilities

Scott Huffman, vice president of Google Assistant, played a viral video of a grandmother who couldn't figure out how to use the Google Home smart speaker, and pointed out that there is still plenty of room to improve the user experience. He then demonstrated a new feature, Multiple Actions, which strengthens the assistant's ability to hold natural, multi-turn "conversations" with humans.

In the past, talking to Google Assistant required prefixing every sentence with the wake phrase "Ok Google". From today, that requirement is finally retired. In addition, Google Assistant can now understand several intents expressed in a single sentence and handle multiple tasks at once.

For example, in the demo, the user first asked about the Warriors' game result, then about the Warriors' next game, and finally asked the assistant to remind him to look for his sweater when he got home, saying the wake phrase only once at the beginning.

It's easy for a human to understand a few consecutive sentences in context, but in the past some virtual assistants couldn't even complete a single simple task, let alone several at once. Today, Google Assistant seems to handle multitasking rather well; a toy illustration of the idea follows.
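How Google actually segments a compound request is not public; the snippet below is only a naive, purely illustrative splitter that breaks one utterance into separate "actions" on conjunctions, to show what handling multiple intents in a single sentence means in practice.

```python
# Purely illustrative toy: split one utterance into multiple "actions" by
# conjunctions, then each piece could be dispatched to its own handler.
import re

def split_actions(utterance: str) -> list[str]:
    """Naively split a compound request into individual requests."""
    parts = re.split(r"\band also\b|\band\b|,\s*then\b", utterance)
    return [p.strip() for p in parts if p.strip()]

print(split_actions("did the Warriors win and when is their next game"))
# -> ['did the Warriors win', 'when is their next game']
```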

Gmail Smart Compose

You may know that Gmail and Inbox already support smart replies, but in the past these were only simple canned responses like "thank you". Soon, Gmail will gain a more powerful Smart Compose feature. Much like autocomplete in a search engine, Gmail will suggest the next words based on what you have already written, until the entire email is done...
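The underlying idea is that of a language model predicting the most likely continuation of your text. Gmail's real model is a neural language model; the tiny bigram table below is only a stand-in for the concept, with a made-up mini corpus.

```python
# Tiny toy illustration of the Smart Compose idea: suggest the most likely
# next word given the previous one, using a bigram frequency table.
from collections import Counter, defaultdict
from typing import Optional

corpus = "thanks for the update please see the attached file for details".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def suggest_next(prev_word: str) -> Optional[str]:
    """Return the most frequent follower of prev_word, if any."""
    followers = bigrams.get(prev_word)
    return followers.most_common(1)[0][0] if followers else None

print(suggest_next("the"))   # -> 'update' (or 'attached', depending on counts)
```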

It may sound a bit abstract, but the on-stage demo made the effect immediately clear.

According to reports, Google Assistant is now available on more than 500 million devices worldwide, across 5,000 different types of devices and more than 40 car brands.

Beyond improvements in natural language processing, Google has also invested in visual assistance. Lilian Rincon, Google Assistant's director of product management, gave an example: ask about a Starbucks coffee shop and the phone will simultaneously display the store's menu.

One More Thing

Although the official article doesn't say so, we can reasonably guess that the improvement in Google Assistant is inseparable from the training models and infrastructure behind it. In all likelihood, Google Assistant will be trained on the TPU 3.0 newly announced at this very conference.

Before formally introducing TPU 3.0, a brief aside about a recent move by GPU maker Nvidia. Just before the I/O conference began, NVIDIA suddenly published a set of benchmark figures for its latest GPU, the V100:

  • When training ResNet-50, a single V100 Tensor Core GPU can achieve 1,075 images per second, a 4x performance improvement over the previous-generation Pascal GPU.

  • A DGX-1 server with 8 Tensor Core V100s can achieve 7,850 images per second, nearly double the 4,200 images per second the same system achieved a year ago (a back-of-envelope conversion of these throughput figures follows this list).

  • A single AWS P3 cloud instance powered by eight Tensor Core V100s can train ResNet-50 in less than three hours, 3x faster than a TPU instance.
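To get a feel for what such throughput numbers mean in wall-clock terms, the small calculation below converts images per second into time per ImageNet epoch, assuming the standard ILSVRC-2012 training set of roughly 1.28 million images. Total training time then depends on the number of epochs, the precision settings, and the data pipeline, so this is only a rough sanity check, not a reproduction of NVIDIA's benchmark.

```python
# Back-of-envelope: convert training throughput (images/second) into time
# per ImageNet epoch. Assumes the standard ILSVRC-2012 training set size.
IMAGENET_TRAIN_IMAGES = 1_281_167

def seconds_per_epoch(images_per_second: float) -> float:
    return IMAGENET_TRAIN_IMAGES / images_per_second

for name, throughput in [("single V100", 1_075), ("DGX-1, 8x V100", 7_850)]:
    t = seconds_per_epoch(throughput)
    print(f"{name}: {t:.0f} s per epoch (~{t / 60:.1f} min)")
# single V100: 1192 s per epoch (~19.9 min)
# DGX-1, 8x V100: 163 s per epoch (~2.7 min)
```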

If we guess correctly, Nvidia was comparing against the previous-generation TPU, version 2.0. Choosing to release these figures at this moment suggests Nvidia had its own calculations in mind. The announcement of TPU 3.0, however, may well render those calculations moot.

Besides raising performance to 8 times that of the previous generation, TPU 3.0 also prompted Waymo's CEO to say that training its self-driving models on the new TPUs delivered a 15x performance increase. This generation of TPUs adds liquid cooling and is based on a new architecture that can run larger, more complex, and more accurate models to solve harder problems. TensorFlow is currently the most widely used deep learning framework, and with Cloud TPU now commercially available, this should attract even more people to Google's services.

At yesterday's Microsoft Build 2018 developer conference, Microsoft released a preview of Project Brainwave, its FPGA-based offering. Although it lags behind in the chip field, Microsoft is clearly trying to catch up. Not long ago, companies such as Facebook and Alibaba also announced plans to enter the chip business. Will the next decisive battle be fought over chips?

References:

https://ai.googleblog.com/2018/05/duplex-ai-system-for-natural-conversation.html

mp.weixin.qq.com/s/gG8mdlkOo…

https://devblogs.nvidia.com/tensor-core-ai-performance-milestones/

