2020 Year in Review

Navigation Demonstration Flips the Script on Machine Learning in Naturalistic Scenarios

Intelligence, surveillance, and reconnaissance (ISR) missions frequently analyze activity-based intelligence about routine patterns of life, such as the daily movement of personnel and vehicles. Automating this time-intensive task would free human analysts for other work. But training machine-learning (ML) agents to analyze even tightly defined slices of the real world typically requires massive input of rules from human trainers.

The SEI recently released a demonstration of an ML method that allows an ML agent to learn the norms of naturalistic behavior on its own. This inverse reinforcement learning (IRL) method predicts behavior and detects anomalies in open-world, naturalistic scenarios.

A common way to train an ML agent to complete a task is reinforcement learning. For example, a programmer might teach a virtual ML skeleton to walk by supplying rewards for certain outcomes, such as forward movement, and penalties for others, such as falling, but no examples of correct behavior. The ML agent tries many different behavior policies, keeps the ones reinforced by rewards, and eventually learns to walk. Some tasks, like autonomous driving, have far too many reward conditions (staying in the lane, stopping for pedestrians, and so on) for a programmer to specify by hand.
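
To make the reward-and-penalty idea concrete, here is a minimal, hypothetical sketch of reinforcement learning; the toy "track" environment, rewards, and hyperparameters are invented for illustration and are not the SEI's code. A Q-learning agent earns a reward for each forward step and a penalty for falling off the track, and it learns to walk forward from those signals alone, with no example behaviors.

```python
# Minimal Q-learning sketch on a toy "walking" task (illustrative only).
import random

N_POSITIONS = 10          # positions 0..9 along a straight track
ACTIONS = [+1, -1]        # step forward or backward
q = {(s, a): 0.0 for s in range(N_POSITIONS) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def step(state, action):
    """Return (next_state, reward, done) for the toy walking task."""
    nxt = state + action
    if nxt < 0:                              # fell off the back of the track
        return state, -10.0, True
    if nxt >= N_POSITIONS - 1:               # reached the end of the track
        return N_POSITIONS - 1, +1.0, True
    reward = 1.0 if action == +1 else -0.1   # forward progress is rewarded
    return nxt, reward, False

for episode in range(2000):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        nxt, reward, done = step(state, action)
        # standard Q-learning update
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt

# After training, the greedy policy should be "always step forward".
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_POSITIONS)])
```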

Inverse reinforcement learning flips the reinforcement-learning approach: programmers give the agent a dataset of policies, but no reward or penalty conditions. The agent observes the policies and infers the underlying rewards and penalties. Those are then fed into a reinforcement learning scheme to teach the agent the rewarded policies. The agent can then act and react in similar situations and, crucially, generalize to novel scenarios, too.
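
The sketch below illustrates that flip on a deliberately tiny example; the five-state environment, demonstrations, and learning settings are invented for illustration and are not the SEI demo. A maximum-entropy-style IRL loop watches demonstrations that always head toward one end of a chain, infers a per-state reward that explains them, and that inferred reward could then drive ordinary reinforcement learning.

```python
# Maximum-entropy-style IRL sketch on a 5-state chain with known dynamics.
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 5, 2, 6      # actions: 0 = step left, 1 = step right
GAMMA, LR, ITERS = 0.95, 0.1, 200

def next_state(s, a):
    """Deterministic dynamics on the chain."""
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)

# "Observed" demonstrations: state trajectories that always head right toward state 4.
demos = [[0, 1, 2, 3, 4, 4], [1, 2, 3, 4, 4, 4], [2, 3, 4, 4, 4, 4]]
expert_svf = np.zeros(N_STATES)             # average state-visitation counts in the demos
for traj in demos:
    for s in traj:
        expert_svf[s] += 1.0
expert_svf /= len(demos)

reward = np.zeros(N_STATES)                 # the quantity IRL infers: one weight per state
for _ in range(ITERS):
    # Backward pass: soft value iteration under the current reward guess,
    # yielding a stochastic, maximum-entropy policy.
    v = np.zeros(N_STATES)
    for _ in range(HORIZON):
        q = np.array([[reward[s] + GAMMA * v[next_state(s, a)]
                       for a in range(N_ACTIONS)] for s in range(N_STATES)])
        q_max = q.max(axis=1, keepdims=True)
        v = (q_max + np.log(np.exp(q - q_max).sum(axis=1, keepdims=True)))[:, 0]
    policy = np.exp(q - v[:, None])         # pi(a|s) = exp(Q_soft - V_soft)

    # Forward pass: expected state-visitation frequencies under that policy,
    # starting from the same start states as the demonstrations.
    d = np.zeros(N_STATES)
    for traj in demos:
        d[traj[0]] += 1.0 / len(demos)
    expected_svf = d.copy()
    for _ in range(HORIZON - 1):
        d_next = np.zeros(N_STATES)
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                d_next[next_state(s, a)] += d[s] * policy[s, a]
        d = d_next
        expected_svf += d

    # Gradient step: raise the reward where the demos visit more than the model expects.
    reward += LR * (expert_svf - expected_svf)

# The inferred reward should be highest at state 4, which explains the rightward demos.
print(np.round(reward, 2))
```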

Figure: The SEI demo maps ship trajectories in New York Harbor.

The SEI’s demonstration of IRL used trajectory data of ships approaching New York Harbor to train an ML agent to plot its own course from anywhere nearby. Any trajectory might be explained by an indefinite number of possible rewards. To narrow the field, the SEI constrained the model to commit most strongly to policies that match the observed trajectories while assuming as little as possible about trajectories it has not observed. This principle of maximum causal entropy (MCE) keeps the agent from drowning in a sea of potential actions and makes the IRL problem solvable.
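
For readers who want the formal version, the MCE idea can be sketched roughly as follows; this statement follows the general maximum causal entropy IRL literature rather than equations from the SEI demo. Among all policies that reproduce the statistics of the observed trajectories, prefer the least committal, highest-entropy one:

$$
\max_{\pi}\; \mathbb{E}_{\pi}\Big[-\textstyle\sum_{t}\log \pi(a_t \mid s_t)\Big]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\Big[\textstyle\sum_{t}\phi(s_t)\Big] = \hat{\mathbb{E}}_{\text{observed}}\Big[\textstyle\sum_{t}\phi(s_t)\Big],
$$

where $\phi$ denotes the features used to summarize trajectories. The solution takes the softmax form $\pi(a \mid s) = \exp\big(Q_{\text{soft}}(s,a) - V_{\text{soft}}(s)\big)$ with $V_{\text{soft}}(s) = \log \sum_{a} \exp Q_{\text{soft}}(s,a)$, which is exactly the "commit to what was observed, stay uncertain about the rest" behavior described above.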

Senior machine learning research scientist Eric Heim and his colleagues at the SEI say the IRL and MCE combination is well-suited to other naturalistic scenarios, such as satellite surveillance of areas where normal behaviors are not known. An ML agent would use IRL to observe, for instance, the movement of vehicles, infer rewards for routine vehicle trajectories, predict future movements, and flag observed abnormal movements.
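
One hedged sketch of how such flagging could work follows; the states, actions, probabilities, and threshold are invented placeholders, not SEI data. The idea is simply that routine trajectories have high likelihood under the learned policy, while unusual ones score low and get flagged for an analyst.

```python
# Flagging anomalies by trajectory likelihood under a learned policy (illustrative only).
import math

# pi[state][action] = probability the learned policy assigns to that action.
# In practice this table would come from IRL training, not be hand-written.
pi = {
    "harbor_approach": {"head_to_berth": 0.85, "loiter": 0.10, "reverse_course": 0.05},
    "channel":         {"follow_channel": 0.90, "cut_across": 0.05, "stop": 0.05},
}

def trajectory_log_likelihood(traj):
    """Average log-probability of the (state, action) pairs in a trajectory."""
    logp = 0.0
    for state, action in traj:
        logp += math.log(pi[state].get(action, 1e-6))  # tiny probability for unseen actions
    return logp / len(traj)

routine = [("harbor_approach", "head_to_berth"), ("channel", "follow_channel")]
odd     = [("harbor_approach", "reverse_course"), ("channel", "cut_across")]

THRESHOLD = -1.5   # illustrative cutoff; would be tuned on held-out routine trajectories
for name, traj in [("routine", routine), ("odd", odd)]:
    score = trajectory_log_likelihood(traj)
    print(name, round(score, 2), "ANOMALY" if score < THRESHOLD else "ok")
```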

Because IRL agents mimic the way people learn new patterns of behavior, Heim also believes IRL might be used to train computerized helpers. An ML agent might observe common ways users navigate and make choices in a computer application. Using IRL, the agent could determine the most likely outcome of any choice point and suggest it to beginner users, like having an expert at their side. “The benefit of using IRL for these tasks is that they’re pretty robust,” said Heim. “Even if they haven’t seen the exact scenario before, these policies tend to generalize to a lot of different, maybe unseen tasks.”
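
A minimal sketch of that helper idea, with invented application states and probabilities rather than anything from the SEI's work: at each choice point, the assistant simply recommends the action the learned policy considers most likely for expert users.

```python
# Suggesting a next step from a policy learned by watching expert users (illustrative only).
learned_policy = {
    "blank_document": {"apply_template": 0.7, "set_margins": 0.2, "start_typing": 0.1},
    "table_inserted": {"add_header_row": 0.6, "resize_columns": 0.3, "undo": 0.1},
}

def suggest(ui_state):
    actions = learned_policy.get(ui_state)
    if not actions:
        return None                          # no suggestion for unfamiliar states
    return max(actions, key=actions.get)     # the action experts most often take

print(suggest("blank_document"))             # -> "apply_template"
```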

The adaptability of IRL opens ML to a range of applications.

To learn more about IRL at the SEI, visit sei.cmu.edu/our-work/projects/display.cfm?customel_datapageid_4050=201338.