Data Pooling in Stochastic Optimization for Panel Data

September 28, 2022 10:30 AM Singapore (Registration will open at 10:20 AM.)

Join Zoom Meeting:

Meeting ID: 812 1481 7601
Passcode: 016836


Very often, modern datasets in operations research have a panel structure: we observe data for many distinct “units”, but for each unit we observe only a handful of relevant data points. For example, consider a large online retailer that observes data for thousands of distinct products, each of which typically has only a few sales. The dominant intuition when solving stochastic optimization problems in such settings is that we should “learn from similar units”; e.g., we might use covariates to cluster similar units and pool their data when solving the optimization problems. This intuition (in some form or other) pervades most modern approaches to contextual stochastic optimization. Conversely, it also suggests that if units are not “similar” in any way, aggregating data can only hurt us, and we might as well treat each unit separately.

In the first part of this talk, we show that this intuition is false by exploring the so-called “data pooling phenomenon”. We prove that combining data across units can yield optimization solutions that outperform decoupling (solving each unit's problem separately), even when no a priori structure links the units and the data are drawn independently. Our approach does not require strong distributional assumptions and applies to constrained, possibly non-convex, non-smooth optimization problems such as vehicle routing, economic lot-sizing, or facility location. We compare and contrast our results with a similar phenomenon in statistics (Stein’s phenomenon), highlighting features of the optimization setting that are not present in estimation. We further prove that, as the number of problems grows large, our data-pooling algorithm, Shrunken-SAA (a pooled variant of sample average approximation), learns whether pooling can improve upon decoupling and the optimal amount to pool, even if the average amount of data per problem is fixed and bounded. Importantly, we offer a simple intuition based on stability that explains when and why data pooling offers a benefit, elucidating this perhaps surprising phenomenon.
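The pooling idea can be illustrated with a toy simulation. The following is a minimal sketch, not the authors' implementation, assuming many independent newsvendor problems on a small discrete demand support, each with only three observations. Each unit's empirical distribution is mixed with a pooled anchor using weights N/(N + alpha) and alpha/(N + alpha), in the spirit of Shrunken-SAA, and alpha is chosen by leave-one-out cross-validation. The support size, cost parameters, anchor choice (the pooled empirical distribution, held fixed during cross-validation), alpha grid, and all helper names (`cost`, `saa_solution`, `shrunk`, `loo_score`, `true_cost`, `run`) are hypothetical choices made for illustration.

```python
import random

# Hypothetical setup (illustration only): demand lives on {0, ..., D-1} and every
# unit is a newsvendor problem with the same holding/backorder costs.
D = 10
HOLD, BACK = 1.0, 3.0
CRIT = BACK / (BACK + HOLD)  # critical fractile: the optimal order is this quantile


def cost(x, xi):
    """Newsvendor cost of ordering x when realized demand is xi."""
    return HOLD * max(x - xi, 0) + BACK * max(xi - x, 0)


def saa_solution(q):
    """Order quantity minimizing expected cost under discrete distribution q."""
    cdf = 0.0
    for x, mass in enumerate(q):
        cdf += mass
        if cdf >= CRIT - 1e-12:  # smallest point whose CDF reaches the fractile
            return x
    return len(q) - 1


def shrunk(samples, anchor, alpha):
    """Mix a unit's empirical distribution with the anchor.

    Weights are N/(N + alpha) on the empirical distribution and alpha/(N + alpha)
    on the anchor; alpha = 0 recovers decoupled SAA. Needs len(samples) + alpha > 0.
    """
    counts = [0] * D
    for s in samples:
        counts[s] += 1
    n = len(samples)
    return [(counts[i] + alpha * anchor[i]) / (n + alpha) for i in range(D)]


def loo_score(data, anchor, alpha):
    """Leave-one-out cost estimate for one alpha (anchor held fixed: a simplification)."""
    total = 0.0
    for samples in data:
        for j in range(len(samples)):
            rest = samples[:j] + samples[j + 1:]
            total += cost(saa_solution(shrunk(rest, anchor, alpha)), samples[j])
    return total


def true_cost(x, p):
    """Exact expected cost of decision x under the true distribution p."""
    return sum(p[xi] * cost(x, xi) for xi in range(D))


def run(K=200, n_per=3, seed=0, grid=(0.0, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0)):
    """Simulate K independent units with n_per (>= 2) observations each."""
    rng = random.Random(seed)
    truths = []
    for _ in range(K):  # independent true distributions: no shared structure
        w = [rng.random() for _ in range(D)]
        z = sum(w)
        truths.append([v / z for v in w])
    data = [rng.choices(range(D), weights=p, k=n_per) for p in truths]
    # Anchor: pooled empirical distribution of all units' data (alpha = 0 mix).
    anchor = shrunk([s for unit in data for s in unit], [0.0] * D, 0.0)
    alpha_loo = min(grid, key=lambda a: loo_score(data, anchor, a))
    # Average true expected cost across units for every alpha on the grid.
    avg_cost = {
        a: sum(true_cost(saa_solution(shrunk(s, anchor, a)), p)
               for s, p in zip(data, truths)) / K
        for a in grid
    }
    return alpha_loo, avg_cost


if __name__ == "__main__":
    alpha, avg_cost = run()
    print("alpha chosen by leave-one-out:", alpha)
    print("decoupled SAA, avg true cost:", round(avg_cost[0.0], 3))
    print("pooled (chosen alpha), avg true cost:", round(avg_cost[alpha], 3))
```

Because the simulation knows each unit's true distribution, the returned dictionary shows how close the leave-one-out choice comes to the best alpha on the grid. Whether pooling helps for any particular seed is not guaranteed by this sketch; it only makes the mechanism concrete.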

In the second part of this talk, we discuss ongoing work on how Shrunken-SAA can exploit the special structure of panel data to simultaneously leverage the data pooling phenomenon and learn from similar units when covariates are informative. We present preliminary results describing when this approach outperforms traditional approaches to contextual stochastic optimization and outline future directions.

This talk represents joint work with Nathan Kallus, Assistant Professor in the School of Operations Research and Information Engineering and Cornell Tech at Cornell University.

About the Speaker

Vishal Gupta is an Associate Professor of Data Sciences and Operations at the USC Marshall School of Business. Because of his research interests and expertise, he also holds a courtesy appointment in Industrial and Systems Engineering at the USC Viterbi School of Engineering and is an affiliate faculty member of USC’s Center for AI and Society. Before joining USC, Vishal Gupta completed his B.A. in Mathematics and Philosophy at Yale University, graduating Magna Cum Laude with honors, and completed Part III of the Mathematics Tripos at the University of Cambridge with distinction. He then spent four years working as a “quant” in finance at Barclays Capital, focusing on commodities modelling, derivatives pricing, and risk management. Vishal has received a number of recognitions for his work, including the Wagner Prize for Excellence in the Practice of Advanced Analytics and Operations Research, the Pierskalla Best Paper Prize, and the Jagdish Sheth Impact of Research on Practice Award.

For more information about the ESD Seminar, please email


Vishal Gupta (USC Marshall School of Business) - Data Pooling in Stochastic Optimization for Panel Data