Deep Reinforcement Learning for Autonomous Vehicles

A review of Machine Learning methods used for end-to-end autonomous vehicle path planning

Reinforcement Learning (RL) methods teach an agent, in this case a vehicle, to drive in an environment so that it maximizes successful planning to its destination and minimizes unwanted situations. Based on rewards and punishments, the model iteratively learns to plan better routes until it can no longer improve.

The complex environment has various dynamic variables — other vehicles, pedestrians and roadworks — so it is difficult to apply supervised learning, which predicts outputs based on features in a dataset.

There are three steps in the model: recognition, prediction, and planning. Recognition involves identifying components of the surrounding environment, such as pedestrians, traffic signs, and lane markings. ML/AI algorithms have reached human-level performance on many such recognition tasks.

In the prediction step, internal models that forecast future states of the environment are constructed, such as a map of the environment or the track of a moving object. Planning aggregates the information from the recognition and prediction phases into a future sequence of driving actions.

Deep Reinforcement Learning methods combine neural nets with the reward-maximization and punishment-minimization RL principles. These methods have demonstrated human-level control in Atari games. Deep Learning (DL) is used for representational learning of the environment and RL is used for planning. These methods include Deep Q Networks (DQN) and Deep Deterministic Actor Critic (DDAC).

Q-Learning estimates the value of each action in every state; at each step, the agent takes an action following a policy, then observes the next state and the reward received from the environment. In DQN, given the huge number of states, or possible scenarios, in autonomous vehicle movement, the Q-function is formulated as a parameterized function of the states, Q(s, a, w), where s is the state, a is the action, and w is a parameter vector determined by minimizing the Mean Square Error (MSE) of the Q-function values using gradient-based methods. The solution involves finding the optimal value of w.
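
As a rough illustration, here is a minimal PyTorch sketch of a parameterized Q-function trained by minimizing the MSE between Q(s, a, w) and its temporal-difference target. The network shape, state and action dimensions, and hyperparameters are placeholders, not the paper's actual settings, and a separate target network (common in practice) is omitted for brevity.

```python
import torch
import torch.nn as nn

# Illustrative dimensions and discount factor -- not taken from the paper.
STATE_DIM, N_ACTIONS, GAMMA = 29, 5, 0.99

# Q(s, a; w): a small network mapping a state to one Q-value per discrete action.
q_net = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(state, action, reward, next_state, done):
    """One gradient step on the MSE between Q(s, a; w) and the TD target.

    Expects batched tensors: state/next_state are float (batch, STATE_DIM),
    action is long (batch,), reward and done are float (batch,).
    """
    q_sa = q_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # TD target: r + gamma * max_a' Q(s', a'; w), zeroed at episode end.
        target = reward + GAMMA * q_net(next_state).max(dim=1).values * (1 - done)
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```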

The Deep Deterministic Actor Critic (DDAC) method handles continuous actions by learning two functions: 1) the actor, which provides the policy mapping a state to an action, and 2) the critic, which evaluates the action taken in a state.
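
A minimal sketch of what the two networks and their updates might look like, again with illustrative dimensions and without the replay buffers or target networks used in full implementations:

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 29, 3  # illustrative sizes, not the paper's

# Actor: deterministic policy mu(s) -> continuous action in [-1, 1].
actor = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM), nn.Tanh(),
)
# Critic: Q(s, a) -> scalar value of taking action a in state s.
critic = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddac_update(state, action, td_target):
    # Critic regresses Q(s, a) toward the TD target (reward + discounted next value).
    q = critic(torch.cat([state, action], dim=1)).squeeze(1)
    critic_loss = nn.functional.mse_loss(q, td_target)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor ascends the critic's estimate of Q(s, mu(s)): the deterministic
    # policy gradient pushes the policy toward actions the critic rates highly.
    actor_loss = -critic(torch.cat([state, actor(state)], dim=1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```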

Incorporating Recurrent Neural Nets (RNNs), which are suitable for prediction on sequential data, enables cars to handle partially observable scenarios. Deep Attention RL methods optimize recognition tasks by focusing on only a part of the features extracted by the neural nets. Attention models reduce the computation and memory complexity for deployment on the embedded hardware by focusing on relevant information.
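
As a hedged illustration of the attention idea, the sketch below scores a set of extracted feature vectors (for example, one per image region), then keeps only a softmax-weighted summary so downstream layers process far less data. The region count and feature size are hypothetical.

```python
import torch
import torch.nn as nn

FEAT_DIM, N_REGIONS = 32, 16  # hypothetical: 16 regions, 32 features each

# Soft attention: score each region's features, then form a weighted summary so
# later layers see a single FEAT_DIM vector instead of all N_REGIONS of them.
scorer = nn.Linear(FEAT_DIM, 1)

def attend(features):                        # features: (batch, N_REGIONS, FEAT_DIM)
    scores = scorer(features)                # (batch, N_REGIONS, 1)
    weights = torch.softmax(scores, dim=1)   # attention weights over regions
    return (weights * features).sum(dim=1)   # (batch, FEAT_DIM) context vector
```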

The proposed end-to-end autonomous driving model takes as inputs the states of the environment and their aggregations over time, and produces the driving actions as outputs. The sensors (camera, LIDAR, etc.) capture the environment state, which may include objects and their locations, orientations, movements, and dimensions. The neural nets take care of weighing each sensor feature according to its relevance in minimizing the cost.
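
A simplified sketch of how such sensor features could be fused: per-sensor feature vectors (names and sizes here are hypothetical) are concatenated into a single state vector, and the learned weights of the fusion layers determine how much each feature contributes to the driving action.

```python
import torch
import torch.nn as nn

# Hypothetical per-sensor feature sizes; the paper's actual sensor set differs.
CAMERA_FEAT, LIDAR_FEAT, N_ACTIONS = 128, 64, 5

# Fusion network: its learned weights decide how much each sensor feature
# contributes to the action values while minimizing the training cost.
fusion = nn.Sequential(
    nn.Linear(CAMERA_FEAT + LIDAR_FEAT, 128), nn.ReLU(),
    nn.Linear(128, N_ACTIONS),
)

def act(camera_feat, lidar_feat):
    state = torch.cat([camera_feat, lidar_feat], dim=1)  # aggregate sensor features
    return fusion(state).argmax(dim=1)                   # pick the highest-valued action
```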

This framework is tested on the Simulated Car Racing (SCR) software, which gives access to car controls such as steering, velocity, acceleration, and brakes, and to car states such as position, velocity, acceleration, and fuel level. The track sensors report the positions of the track borders. The outputs manipulate the steering, gear, acceleration, and brake values. The network is trained with the DQN objective, and DDAC's continuous policy smooths the actions and yields better performance.
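
For context, an interaction loop with an SCR-like environment could look roughly like the sketch below; the env object and its reset/step signature are stand-ins for the actual SCR client, and the [steering, acceleration, brake] action layout is assumed for illustration.

```python
import torch

def run_episode(env, actor, max_steps=1000):
    """Drive one episode; `env` is a stand-in for the SCR client interface."""
    state = torch.as_tensor(env.reset(), dtype=torch.float32).unsqueeze(0)
    for _ in range(max_steps):
        # The trained actor maps the sensed state (track borders, speed, etc.)
        # to continuous controls; the action layout here is illustrative.
        steering, accel, brake = actor(state).squeeze(0).tolist()
        next_state, reward, done = env.step(steering, accel, brake)
        state = torch.as_tensor(next_state, dtype=torch.float32).unsqueeze(0)
        if done:
            break
```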

I summarize the paper here. The model can be seen in action here.
