During this seminar, I'll present two topics:
Real-world environments (e.g. industrial robots or power grids) are generally designed with safety constraints in mind, implemented in the form of valid-action masks or contingency controllers. For example, the range of motion and the motor angles of a robot can be limited to physical boundaries. Violating a constraint thus results in rejected actions or in entering a safe mode driven by an external controller, making RL agents incapable of learning from their mistakes.
In the first part of the talk, I'll describe how to modify Deep Q-learning to take this type of information into account and to avoid constraint violations.
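As a rough illustration of the general idea (not the speaker's exact method), one common way to expose a valid-action mask to a Q-learning agent is to set the Q-values of invalid actions to negative infinity before the greedy argmax, so forbidden actions are never selected; the function name below is a hypothetical example.

```python
import numpy as np

def masked_greedy_action(q_values, valid_mask):
    """Pick the greedy action among valid ones only.

    q_values:   array of Q-value estimates, one per action.
    valid_mask: boolean array; True where the action is allowed.
    Invalid actions get -inf so argmax can never choose them.
    """
    masked_q = np.where(valid_mask, q_values, -np.inf)
    return int(np.argmax(masked_q))

q = np.array([1.0, 3.0, 2.0])
mask = np.array([True, False, True])  # action 1 is forbidden by the environment
print(masked_greedy_action(q, mask))  # action 2, the best *valid* action
```

The same masking can be applied inside the Bellman target when computing the max over next-state actions, so the bootstrapped value also ignores invalid actions.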
In the second part of the talk, I'll discuss Instruction Following, a task where an agent needs to accomplish an objective formulated in natural language. We propose a method, called HIGHER, to accelerate language understanding and improve the sample efficiency of Deep Reinforcement Learning.