NLP and EVs, a notable move toward automated vehicles

Amin Zahedi
Jun 7, 2021

Updated: Jun 9, 2021

Image Credits: https://cognitechx.com/wp-content/uploads/2020/04/107_agfuzc1tyxpllu5muc1kyxjrlwjsdwu-scaled-1.jpg

Just 20 years ago, the idea of owning a smart car was little more than a dream. Today, artificial intelligence (AI) permeates nearly every part of our lives. As AI and data sources mature at an exceptional rate, researchers are designing models that select the best solution to a variety of problems under varying conditions, and they are doing so by developing machine learning (ML) algorithms. With ML, cloud computing, 5G technology, and vehicle automation combined in electric vehicles (EVs), the future of transportation looks bright. Among the diverse applications of ML, natural language processing (NLP) addresses higher-skill problems that involve sequence data such as audio or text. NLP in EVs, together with the technologies mentioned above, represents a remarkable leap in EV automation.

ML can be incorporated into many different parts of EVs: low-level systems such as internal energy management controllers, battery systems, and temperature analysis; high-level systems that rely on cameras, radar, and light detection and ranging (LiDAR); and even models of a community of EV users and their interactions with the power grid.

NLP is a branch of ML that deals with sequence tasks such as speech recognition, sentiment classification, word embedding, and machine translation. With NLP, several new features can be added to vehicles. Passengers can talk to a vehicle AI engine equipped with speech recognition and even ask for a ride to a particular destination. If they are far from the vehicle, they can ask it to come and pick them up, either by text message or by phone call, through human-AI interaction. Users can also request information and entertainment services such as displaying the weather, showing the local news, or playing requested music. NLP can serve the tourism industry as well, by adding features to sightseeing buses that provide informative video and audio or the operating hours of various sites. With machine translation, tourists can have a more comfortable trip and communicate more easily with people of other nationalities in real time.

In addition to the utilities above, NLP can offer technical services to drivers and passengers. In hybrid electric vehicles (HEVs), which combine two or more energy sources or storage systems, drivers can communicate with the core of the vehicle and tell it how to meet the requested power demand at each time step. For example, in an HEV with an engine and an electric motor, the driver can use EV mode for city driving and switch to HEV mode on highways or wherever more power is needed.

Many NLP models are built on Recurrent Neural Networks (RNNs). Unlike a plain feed-forward neural network, each RNN cell receives the previous cell's information in addition to its own input, as shown in Fig. 1. For text, the model breaks the text down into sentences, feeds each word to one RNN cell, and uses a vocabulary set (word dictionary) for word representation and model training. A simple way to represent a word is a one-hot vector: a vector the size of the vocabulary that is all zeros except for a one at the index of that specific word. In an RNN, the earlier words in a sentence influence how the current word is processed. For EVs, it is possible to provide a dedicated dictionary of the words used most frequently in that setting, which may speed up learning.
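The one-hot representation and the recurrent update described above can be sketched in a few lines of NumPy. This is a minimal illustration only: the vocabulary, the hidden size, and the random (untrained) weights are assumptions made for the example, not values from any real EV system.

```python
import numpy as np

# Hypothetical EV-oriented vocabulary (an assumption for illustration).
VOCAB = ["<unk>", "drive", "me", "to", "the", "airport", "switch", "ev", "mode"]
WORD_TO_INDEX = {word: i for i, word in enumerate(VOCAB)}


def one_hot(word):
    """Return a vocabulary-sized vector of zeros with a single one."""
    vec = np.zeros(len(VOCAB))
    # Words outside the vocabulary map to index 0, the "<unk>" token.
    vec[WORD_TO_INDEX.get(word, 0)] = 1.0
    return vec


def rnn_forward(sentence, hidden_size=4, seed=0):
    """Run an untrained basic RNN over a sentence, one word per cell."""
    rng = np.random.default_rng(seed)
    W_xh = rng.normal(scale=0.1, size=(hidden_size, len(VOCAB)))
    W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
    b_h = np.zeros(hidden_size)

    h = np.zeros(hidden_size)  # initial hidden state
    states = []
    for word in sentence.lower().split():
        x = one_hot(word)
        # Each cell combines its own input with the previous cell's state.
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)


states = rnn_forward("Drive me to the airport")
print(states.shape)  # (5, 4): one hidden state per word
```

In a trained model the weights would be learned from data, and each hidden state would typically feed an output layer. The sketch only shows how information flows from one cell to the next.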

The Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) are two architectures that add memory to the RNN. Even when many words separate two related words in a sentence, these architectures allow the network to connect them, and they help mitigate the vanishing gradient problem.
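To make the idea of added memory concrete, here is a hedged sketch of a single Gated Recurrent Unit step in NumPy. It follows one common formulation of the gate equations (some libraries swap the roles of the update gate and its complement), and the parameter names, sizes, and weights are assumptions chosen for illustration. The demonstration forces the update gate closed to show how the cell can carry its state across many time steps almost unchanged.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def gru_step(x, h_prev, p):
    """One GRU time step; p is a dict of (hypothetical) parameter arrays."""
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev + p["b_z"])  # update gate
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev + p["b_r"])  # reset gate
    h_cand = np.tanh(p["W_h"] @ x + p["U_h"] @ (r * h_prev) + p["b_h"])
    # z near 0 keeps the old memory; z near 1 overwrites it.
    return (1.0 - z) * h_prev + z * h_cand


def init_params(input_size, hidden_size, seed=0):
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("z", "r", "h"):
        p[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden_size, input_size))
        p[f"U_{gate}"] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        p[f"b_{gate}"] = np.zeros(hidden_size)
    return p


params = init_params(input_size=3, hidden_size=2)
params["b_z"] = np.full(2, -20.0)  # force the update gate (almost) closed

h = np.array([0.5, -0.5])
for _ in range(10):
    h = gru_step(np.ones(3), h, params)
print(h)  # stays very close to the initial [0.5, -0.5]
```

In practice the gates are learned, so the network decides for itself when to keep and when to overwrite its memory.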

Fig. 1: Basic RNN model.

In speech recognition, the input is an audio clip and the model predicts the output transcript. Raw audio is a record of air pressure variation over time, but for modeling it is often more useful to look at the intensity of different frequencies over time, which is what a spectrogram shows. Spectrogram features computed from the audio serve as the input features, which are passed through an RNN to produce the transcript. Today, academic researchers train end-to-end deep learning networks on data sets of roughly 300 to 3,000 hours of audio, and this number can reach 100,000 hours for commercial systems. Usually, a bidirectional LSTM or GRU network with more than one layer is selected as the speech recognition architecture. One important related feature is trigger word detection, which makes the device do something in response to an activation word. Examples include "Alexa" for Amazon Echo, "Okay Google" for Google Home, and "Hey Siri" for Apple Siri. The strategy is similar to the speech recognition steps. The target label "y" is set to zero for each RNN cell output, meaning that no trigger word has been detected. For the cell right after the trigger word has been said, the target label is set to one.
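The trigger word labeling scheme described above can be sketched as follows. The function name, the step indices, and the `ones_after` parameter are hypothetical and exist only for this example. Setting `ones_after` above one labels several consecutive steps, which some implementations do to reduce the imbalance between zeros and ones; that is an optional variation, not something the scheme above requires.

```python
import numpy as np


def trigger_word_labels(num_steps, trigger_end_steps, ones_after=1):
    """Build per-time-step target labels for trigger word detection.

    num_steps: number of RNN output steps for the clip.
    trigger_end_steps: indices of the steps where a trigger word ends.
    ones_after: how many steps right after each trigger word get label 1.
    """
    y = np.zeros(num_steps, dtype=int)  # default: no trigger word detected
    for end in trigger_end_steps:
        start = end + 1  # the step right after the trigger word
        y[start:start + ones_after] = 1  # slicing safely clips at the end
    return y


labels = trigger_word_labels(num_steps=10, trigger_end_steps=[3])
print(labels)  # [0 0 0 0 1 0 0 0 0 0]
```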

Fig. 2: Spectrogram of an audio recording [1].

The color in the spectrogram shows the degree to which different frequencies are present (loud) in the audio at different points in time.
Green means a certain frequency is more active or more present in the audio clip (louder).
Blue squares denote less active frequencies.
The dimension of the output spectrogram depends upon the hyperparameters of the spectrogram software and the length of the input.
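As a rough illustration of how such a spectrogram is computed, and of how its dimensions follow from the hyperparameters and the input length, here is a minimal NumPy sketch. The frame length, hop size, sample rate, and the synthetic 1,000 Hz test tone are all assumptions chosen for the example. Production systems typically use a dedicated audio library and further processing, such as a logarithmic scale.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Power spectrogram from a short-time Fourier transform.

    Returns an array of shape (frequency bins, time frames).
    """
    window = np.hanning(frame_len)  # taper each frame to limit leakage
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    spectrum = np.fft.rfft(frames, axis=1)  # one FFT per frame
    return (np.abs(spectrum) ** 2).T


sample_rate = 8000  # Hz (assumed)
t = np.arange(sample_rate) / sample_rate  # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)  # synthetic 1,000 Hz tone

spec = spectrogram(tone)
print(spec.shape)  # (129, 61): 129 frequency bins, 61 time frames

peak_bin = spec.mean(axis=1).argmax()
print(peak_bin * sample_rate / 256)  # 1000.0, the most active frequency
```

Changing `frame_len` changes the number of frequency bins, while changing `hop` or the clip length changes the number of time frames.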

All the concepts mentioned can be delivered to vehicles through a cloud-based system or run independently in each vehicle. A cloud-based system also makes several other abilities possible. For example, a third party could interact with and control the vehicle remotely during an emergency by talking to it or issuing commands. Passengers could also play virtual games with each other in a vehicle, or with people in other cars through the cloud. With 5G, all of these features can happen in real time. All in all, NLP is one of the hot topics in ML, and it will be no surprise to see it play a more prominent role in EVs in the near future.

Reference:

[1] – “Deep Learning Specialization” by Andrew Ng
