Flow: Deep Reinforcement Learning for Traffic Control with Autonomous Vehicles
I created and led a team that studies the potential of deep Reinforcement Learning (RL) to improve traffic flow by controlling a fraction of the vehicles on the road. Whereas the vast majority of autonomous vehicle (AV) research focuses on situations with either 100% AVs or a single AV navigating the world, we study the long, arduous transition of partial AV adoption. Through a series of increasingly complex traffic control experiments using RL, we learned that a small fraction of AVs (5-10%) can significantly improve traffic congestion and travel times (by 40-150%, depending on the situation). Our early work suggests that the effects of AVs, whether positive or negative, may be felt by society much sooner than expected.
Our work is open source and is supported by Amazon, NSF, and DOE. Our project website lists our full team, publications, and tutorials. Our work has been featured by Science, Berkeley College of Engineering, ABC News, Berkeley Lab, India Times, and Russian Forbes.
Flow: Architecture and Benchmarking for Reinforcement Learning in Traffic Control
Cathy Wu, Aboudy Kreidieh, Kanaad Parvate, Eugene Vinitsky, Alexandre Bayen
IEEE Transactions on Robotics (T-RO). In review.
arXiv / videos / github / project page
Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion
Eugene Vinitsky, Kanaad Parvate, Abdul Rahman Kreidieh, Cathy Wu, Alexandre Bayen
IEEE Intelligent Transportation Systems Conference (ITSC), 2018.
Scalable deep reinforcement learning
In deep reinforcement learning, policy gradient methods are employed for their tremendous flexibility and success in complex decision-making tasks, ranging from games to robotic manipulation to our work in traffic control. However, these methods suffer from high variance in their gradient estimates, which translates to poor sample complexity and high computational cost. Moreover, the high-variance problem is exacerbated in problems with long horizons or high-dimensional action spaces, both of which are characteristic of urban systems. To mitigate this issue, we explore a number of approaches to improve the scalability of RL. In particular, we introduce a new bias-free action-dependent control variate (also called a baseline) for variance reduction, which fully exploits the structural form of the stochastic policy itself. We demonstrate and quantify the benefit of the action-dependent baseline through both theoretical analysis and numerical results, including an analysis of the suboptimality of the “optimal” state-dependent baseline. The result is a computationally efficient policy gradient algorithm that scales to high-dimensional control problems, as demonstrated by a synthetic 2000-dimensional target matching task. Our experimental results indicate that action-dependent baselines allow for faster learning on standard reinforcement learning benchmarks, high-dimensional hand manipulation tasks, and synthetic tasks. We additionally show that the general idea of including additional information in baselines for improved variance reduction can be extended to partially observed and multi-agent tasks.
Variance Reduction for Policy Gradient Using Action-Dependent Factorized Baselines
Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, Pieter Abbeel
International Conference on Learning Representations (ICLR), 2018. Oral (2%).
Deep Reinforcement Learning Symposium (NIPS), 2017. Contributed talk.
arXiv / OpenReview
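To make the variance-reduction idea above concrete, here is a minimal sketch rather than the algorithm from the paper. It assumes a one-step problem with a factorized Gaussian policy (unit variance, mean `mu`) and a toy target-matching reward that is a sum of per-dimension terms. The baseline used for dimension i is the conditional expected reward given the other action dimensions, which is one simple action-dependent choice and not necessarily the optimal one. All names (`gradient_estimates`, `total_variance`) are hypothetical.

```python
import numpy as np


def gradient_estimates(mu, target, n_samples, rng):
    """Per-sample score-function gradients w.r.t. mu under three baselines.

    Policy: a ~ N(mu, I), factorized across dimensions (an assumption).
    Reward: R(a) = -sum_i (a_i - target_i)^2, a toy target-matching task.
    """
    d = mu.size
    actions = mu + rng.standard_normal((n_samples, d))
    score = actions - mu  # d/dmu_i of log N(a_i; mu_i, 1)

    per_dim_reward = -(actions - target) ** 2           # r_i(a_i), shape (n, d)
    total_reward = per_dim_reward.sum(axis=1, keepdims=True)

    # Closed form for E[r_i(a_i)] under the Gaussian policy.
    expected_per_dim = -((mu - target) ** 2 + 1.0)

    # State-dependent baseline: expected total reward (no action information).
    state_baseline = expected_per_dim.sum()

    # Action-dependent baseline for dimension i: depends on every action
    # dimension except a_i, so it leaves the estimator for mu_i unbiased.
    action_baseline = total_reward - per_dim_reward + expected_per_dim

    return {
        "none": score * total_reward,
        "state": score * (total_reward - state_baseline),
        "action": score * (total_reward - action_baseline),
    }


def total_variance(per_sample_grads):
    """Sum over dimensions of the per-dimension sample variance."""
    return float(per_sample_grads.var(axis=0).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu, target = np.zeros(50), np.ones(50)
    estimates = gradient_estimates(mu, target, n_samples=20000, rng=rng)
    for name, grads in estimates.items():
        print(f"{name:>6}: total variance = {total_variance(grads):.1f}")
```

In this toy setting all three estimators target the same gradient, and the printed variances show how much noise each baseline removes. Because the reward here is additive across dimensions, the action-dependent baseline cancels all cross-dimension noise, which is a best case; rewards with coupled dimensions would see a smaller gain.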
Human-compatible and tractable ridesharing
Mobility is embedded in an overall socioeconomic system, and one major anticipated long-term impact of automated vehicles is induced demand, in which more people travel in response to the newly available roadway capacity. This additional demand on the mobility system may compromise the gains in road velocity and throughput and bring correspondingly elevated energy consumption. We therefore study the dynamics of the overall socioeconomic system and, in particular, its couplings with the mobility system. To this end, in collaboration with Microsoft, we investigate human mobility preferences based on a user study of employees at a major technology corporation. We identify ridesharing as a promising design paradigm within the mobility system, with the potential to mitigate the effects of induced demand by dramatically improving the throughput (supply). We propose that, with lightly modified existing infrastructure and, crucially, taking into account complex human factors, ridesharing has the potential to nearly triple the throughput of the mobility system, in particular through the effective use of high-occupancy vehicle (HOV) lanes. We propose algorithms to solve the allocation problem. In particular, the structure of the ridesharing problem motivates the adaptation of clustering algorithms from machine learning for set partitioning in the combinatorial optimization framework. Our work suggests that a careful synthesis of understanding human behavior, selective changes to the overall system design, and new algorithms can lead to promising advances in urban mobility.
Learning and Optimization for Mixed Autonomy Systems – A Mobility Context
Chapter 8: Human mobility preferences.
Thesis. PhD, Electrical Engineering and Computer Sciences, UC Berkeley, 2018.
Optimizing the diamond lane: A more tractable carpool problem and algorithms
Cathy Wu, K. Shankari, Ece Kamar, Randy Katz, David Culler, Christos Papadimitriou, Eric Horvitz, Alexandre Bayen
IEEE Intelligent Transportation Systems Conference (ITSC), 2016.
proceedings / pdf
Cellpath: Urban-scale Traffic State Estimation using Cellular Network Data
We cannot control what we cannot measure. Traffic flow estimation is notoriously difficult due to the shortage, unreliability, and cost of sensors such as induction coils embedded underneath roadways. We therefore explored the possibility of using cellular network information to improve estimation of traffic flow in urban-scale networks. We devised a new convex optimization framework and algorithm to exploit the unique structure of cellular network data, in particular its simplex structure. The accuracy, computational efficiency, and versatility of the proposed approach are validated on the I-210 corridor near Los Angeles. We achieve 90% route flow accuracy (as compared to a 50% baseline) with 1,033 traffic sensors and 1,000 cellular towers covering a large network of highways and arterials with more than 20,000 links. Due to the high accuracy, our work may enable new short-time horizon traffic applications concerning prediction, control, and operations. Our system is open source and was a collaboration with AT&T.
Cellpath: fusion of cellular and traffic sensor data for route flow estimation via convex optimization
Cathy Wu, Jerome Thai, Steve Yadlowsky, Alexei Pozdnoukhov, Alexandre Bayen
Transportation Research: Part C, 2015.
International Symposium on Transportation and Traffic Theory (ISTTT), 2015. Oral (14%).
journal / pdf / github (system) / github (algorithm)
Block simplex signal recovery: a method comparison and an application to routing
Cathy Wu, Alexei Pozdnoukhov, Alexandre Bayen
IEEE Transactions on Intelligent Transportation Systems (T-ITS), 2019.
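The simplex structure mentioned in the Cellpath summary can be illustrated with a small sketch. It assumes a generic least-squares problem, minimize 0.5 * ||Ax - b||^2, where x is split into blocks that must each be nonnegative and sum to one (for example, route fractions per group). It uses plain projected gradient descent with the standard sort-based Euclidean projection onto the simplex. This is a textbook method chosen for illustration and is not claimed to be the algorithm used in the papers; the function names and the random test data are hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    n = v.size
    u = np.sort(v)[::-1]                 # sort in decreasing order
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, n + 1)
    rho = idx[u - css / idx > 0][-1]     # largest index keeping entries positive
    theta = css[rho - 1] / rho           # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)


def solve_block_simplex_ls(A, b, block_sizes, n_iters=500):
    """Projected gradient descent for least squares with block simplex constraints."""
    if sum(block_sizes) != A.shape[1]:
        raise ValueError("block sizes must sum to the number of columns of A")
    bounds = np.cumsum([0] + list(block_sizes))
    # Start from the uniform distribution within each block (feasible point).
    x = np.concatenate([np.full(k, 1.0 / k) for k in block_sizes])
    # Step 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(n_iters):
        y = x - step * (A.T @ (A @ x - b))
        for start, end in zip(bounds[:-1], bounds[1:]):
            x[start:end] = project_simplex(y[start:end])
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = [3, 4, 5]
    A = rng.standard_normal((30, sum(blocks)))
    x_true = np.concatenate([rng.dirichlet(np.ones(k)) for k in blocks])
    x_hat = solve_block_simplex_ls(A, A @ x_true, blocks, n_iters=2000)
    print("max abs error:", np.abs(x_hat - x_true).max())
```

The projection is applied block by block, so each iteration stays feasible and costs little beyond a matrix-vector product and a sort per block. On real data the measurement matrix is typically underdetermined, so a sketch like this would recover a feasible fit rather than a unique solution.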