Wednesday, June 15, 2022

REAL-WORLD REINFORCEMENT LEARNING

Often in reinforcement learning research, “real-life” tasks are linked to robotics and self-driving cars. However, there is a much broader range of problems that are yet to be fully solved and that require less investment in specialized hardware than, for example, robotics does.

We argue that whether a task is “real-life” or not is a spectrum rather than a binary classification.

A toy problem, such as an abstract grid world with a simple agent, can be very useful for understanding reinforcement learning, but it cannot be directly applied to any problem that someone outside of computer science would be interested in.
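To make this concrete, here is a minimal sketch of what such a toy grid world might look like. Everything in it (the grid size, the reward values, the random agent) is an illustrative choice of ours rather than a reference implementation:

```python
import random

class GridWorld:
    """A toy 4x4 grid world: the agent starts at (0, 0) and must reach (3, 3)."""

    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, size=4):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        # Move one cell in the chosen direction, clipped at the grid edges.
        dr, dc = self.ACTIONS[action]
        row = min(max(self.state[0] + dr, 0), self.size - 1)
        col = min(max(self.state[1] + dc, 0), self.size - 1)
        self.state = (row, col)
        done = self.state == self.goal
        # +1 for reaching the goal, a small step penalty to favor short paths.
        reward = 1.0 if done else -0.01
        return self.state, reward, done

# A random agent interacting with the environment for one episode.
env = GridWorld()
state, done = env.reset(), False
while not done:
    state, reward, done = env.step(random.choice(list(GridWorld.ACTIONS)))
```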

A simulated problem can encompass a wider range of tasks. This could be a simple simulator of a pole-balancing system that a researcher constructed themselves based on simple physical equations. Or it could be a sophisticated simulation that mimics a real-world task, such as using a data-driven approach to model an oil well. When thinking about how “real” such a task is, one may consider whether the simulation is based on data, how well it mimics the true system, or whether the simulation may have been designed to highlight a reinforcement learning algorithm’s success (rather than the other way around).
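As an illustration of the simpler end of that range, below is a sketch of a pole-balancing simulator built directly from the equations of motion. The constants and the Euler integration step mirror the classic Barto, Sutton, and Anderson (1983) cart-pole formulation; the function and variable names are our own:

```python
import math

# Physical constants for the cart-pole system (classic benchmark values).
GRAVITY = 9.8           # m/s^2
CART_MASS = 1.0         # kg
POLE_MASS = 0.1         # kg
POLE_HALF_LENGTH = 0.5  # m
FORCE_MAG = 10.0        # N, magnitude of the push applied each step
TAU = 0.02              # s, Euler integration time step

def step(state, push_right):
    """Advance the simulation one time step.

    state is (x, x_dot, theta, theta_dot); push_right selects the sign of
    the applied force. Returns the next state and whether the pole has
    fallen past the failure angle.
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if push_right else -FORCE_MAG
    total_mass = CART_MASS + POLE_MASS

    # Equations of motion for the pole's angular acceleration and the
    # cart's linear acceleration.
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot**2 * math.sin(theta)) / total_mass
    theta_acc = (GRAVITY * math.sin(theta) - math.cos(theta) * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * math.cos(theta)**2 / total_mass))
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * math.cos(theta) / total_mass

    # Forward Euler integration.
    x += TAU * x_dot
    x_dot += TAU * x_acc
    theta += TAU * theta_dot
    theta_dot += TAU * theta_acc

    fallen = abs(theta) > 12 * math.pi / 180  # failure past ~12 degrees
    return (x, x_dot, theta, theta_dot), fallen

# A crude hand-written policy: push the cart toward the direction of the lean.
state = (0.0, 0.0, 0.05, 0.0)
for _ in range(200):
    state, fallen = step(state, push_right=state[2] > 0)
    if fallen:
        break
```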

A virtual problem is one in which the virtual task is itself the true task. For instance, when competing in video game tournaments or trading stocks, there is no physical implementation: the true problem is fully virtual, but actions still have real-life consequences.

Finally, a physical problem is one where the agent takes actions that have physical consequences. On the simpler side, this could be maneuvering a robot in a controlled lab setting. On the more complex side, this could involve coordinating multiple self-driving cars through a busy intersection.

A high-fidelity simulation, a virtual problem that has real-life impacts, or a method to optimize a chemical plant over time are all examples toward the “real-world tasks” end of the spectrum. Conversely, a toy problem or a simple robot arm stacking blocks in a lab are less “real world.” We would also argue that the task can be either control or evaluation (e.g., reinforcement learning can be used to judge the quality of a laser weld, which is also a sequential decision task).

In some cases, reinforcement learning may only be a small component in a bigger system. For example, it could be used to tune controller parameters within a larger engineered system, or to control a single thermostat. Alternatively, it can be used as an end-to-end replacement for a problem that either naturally suits reinforcement learning or can be adapted so that doing so has advantages (e.g., automation). In both cases, it is incredibly important to understand both the advantages and the limitations of introducing reinforcement learning to known problems; which option is viable will be decided by both the available data and real-world politics (e.g., ensuring ethical automation practices).
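As a toy illustration of the “small component” case, the sketch below frames a single thermostat as a reinforcement learning environment. The first-order thermal model, the constants, and the comfort-versus-energy reward are all hypothetical choices of ours:

```python
class ThermostatEnv:
    """A toy room-temperature model exposing an RL-style step() interface."""

    def __init__(self, setpoint=21.0, outside=10.0):
        self.setpoint = setpoint  # target temperature, deg C
        self.outside = outside    # ambient temperature, deg C
        self.temp = outside       # the room starts at ambient

    def step(self, heater_power):
        """heater_power in [0, 1]; returns (new temperature, reward)."""
        # First-order thermal dynamics: drift toward ambient plus heater input.
        self.temp += 0.1 * (self.outside - self.temp) + 2.0 * heater_power
        # Reward trades off comfort (distance from setpoint) against energy use.
        reward = -abs(self.temp - self.setpoint) - 0.5 * heater_power
        return self.temp, reward

# A fixed policy, purely for illustration; an RL agent would learn heater_power.
env = ThermostatEnv()
for _ in range(10):
    temp, reward = env.step(heater_power=0.5)
```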
