research highlight

AI-driven decision-making

My research develops AI-driven approaches to decision-making, with a particular emphasis on trustworthy and responsible learning methods. This requires an interdisciplinary approach spanning several fields including machine learning, operations research, and statistics.

Research philosophy and methodology

The standard algorithmic development paradigm for decision-making relies on theoretical performance guarantees; as a result, the algorithms it produces often ignore important operational constraints or perform poorly in practice. While theoretical insights can provide invaluable principles, their successful operationalization requires recognizing and internalizing the limitations of the crude approximations and unverifiable assumptions we put in place for mathematical convenience.

My research group identifies central intellectual bottlenecks in real-world problems and resolves them by building computational and data-centric foundations born out of mathematical principles. Our research methodology aims to connect two disparate yet complementary worldviews:

  • theoretical and computational tools from statistical learning, optimization, applied probability, and causal inference
  • rigorous empirical benchmarking practices arising from the AI research community’s data-centric approach.

I take inspiration from von Neumann’s perspective on mathematical sciences, which I paraphrase below:

As a mathematical discipline travels far from its empirical source only indirectly inspired from ideas coming from 'reality', it is beset with grave dangers that it will develop along the line of least resistance and become more and more purely aestheticizing. This need not be bad if the discipline is under the influence of researchers with an exceptionally well-developed taste, but the only general remedy is the rejuvenating return to the source: the reinjection of directly empirical ideas. I am convinced that this is a necessary condition to conserve the freshness and the vitality of the subject, and that this will remain so in the future. (Click here for the [full article](/assets/pdf/TheMathematician.pdf).)

I am fortunate to be able to learn from the well-developed taste of my wonderful colleagues. Concurrent with this personal education, I (try to) inject empirical ideas when formulating research directions, to increase the impact of my research.

I. Trustworthy AI

Modern data collection systems acquire data from heterogeneous sources, and classical approaches that optimize average-case performance yield brittle AI systems. They fail to

  • make good predictions on underrepresented groups
  • generalize to new environments, even those similar to the ones seen during training
  • be robust to adversarial examples and long-tailed inputs.

Yes, even the largest models trained on the entirety of the internet! Despite recent successes, our limited understanding of the failure modes of AI systems highlights the need for i) models that work reliably and ii) rigorous evaluation schemes and diagnostics that maintain their quality. We take a holistic “industrial engineering” view of AI systems, studying them from data collection to deployment and monitoring.

Process view of AI: Methodological development in ML largely focuses on model training. Taking a system-level view, my research identifies bottlenecks in the process and develops algorithms to resolve them.

Building a language defining distribution shifts

Different distribution shifts require different solutions. Understanding *why* model performance worsened is a fundamental step for informing subsequent methodological and operational interventions. Heterogeneity in data helps robustness, but the cost of data collection is often a binding constraint. We build a nuanced modeling language for quantifying data heterogeneity (or lack thereof), and use it to optimally allocate limited resources in the AI production pipeline. To learn more, watch the tutorial and take a look at the selected papers below.
NeurIPS 2023 Tutorial
Selected Papers
    1. Trustworthy AI
      Jiashuo Liu*, Tianyu Wang*, Peng Cui, and Hongseok Namkoong
      arXiv:2307.05284 [cs.LG], 2024
      Major revision in Management Science; Conference version appeared in NeurIPS 2023

    Foundations of distributional robustness

    Our vision is to build robust and reliable learning procedures that make decisions with a guaranteed level of performance over their inputs. My Ph.D. thesis built the statistical (missing reference) and computational (Namkoong & Duchi, 2016) foundations of robust machine learning. As robustness is a central topic spanning multiple fields, my subsequent works have developed robust algorithms for deep learning (missing reference), causal inference (missing reference), reinforcement learning (Namkoong* et al., 2020), and safety evaluation of autonomous vehicles (O’Kelly* et al., 2018). These works have led to new approaches toward fairness by characterizing fundamental connections between robustness and fairness (missing reference). Watch my talk (@ Google Brain, 2021) and see the following two representative papers to learn more.
    1. Trustworthy AI
      John C. Duchi, and Hongseok Namkoong
      Annals of Statistics, 2021
    1. Trustworthy AI
      John C. Duchi, Tatsunori Hashimoto, and Hongseok Namkoong
      Operations Research, 2022
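One simple way to make the contrast between average-case and worst-case performance concrete is to evaluate a model on its worst-off subpopulation. The block below is only a minimal sketch under stated assumptions, not the estimators developed in the papers above: it assumes per-example losses are already available, and the function name and the fraction `alpha` are hypothetical choices for illustration. It averages the largest `alpha` fraction of losses (an empirical conditional value-at-risk, rounded up to a whole number of examples).

```python
import math

def worst_case_subpopulation_loss(losses, alpha):
    """Average loss over the worst alpha-fraction of examples.

    This is an empirical CVaR with the subpopulation size rounded up
    to a whole number of examples (an illustrative simplification).
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]  # the k largest losses
    return sum(worst) / k

# Hypothetical per-example losses: the average hides one badly served example.
losses = [0.1, 0.2, 0.3, 5.0]
average_loss = worst_case_subpopulation_loss(losses, alpha=1.0)
tail_loss = worst_case_subpopulation_loss(losses, alpha=0.25)
```

With `alpha = 1.0` the function reduces to the usual average loss; smaller values of `alpha` focus on smaller, harder subpopulations.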


    II. Computational frameworks for decision-making

    Decision-making problems in OR/MS concern the optimal allocation of scarce resources. We build scalable computational frameworks for learning operational decisions by leveraging i) auto-differentiable simulators, and ii) empirically rigorous benchmarking. Our goal is to build an algorithmic development paradigm based on computation rather than theoretical approximations.

    Adaptive experimentation at scale

    Adaptive data collection can significantly improve data efficiency. Standard algorithms are primarily designed to satisfy good upper bounds on their performance (regret bounds), but do not model important operational constraints and are challenging to implement due to infrastructural/organizational difficulties. Instead of the typical theory-driven paradigm, we leverage computational tools and empirical benchmarking for algorithm development. Our proposed framework models practical instances in online platforms and social networks involving a handful of reallocation epochs in which outcomes are measured in batches.
      1. AI-driven Decisions
        Ethan Che, and Hongseok Namkoong
        arXiv:2303.11582 [cs.LG], 2023
        Major revision in Operations Research
      TL;DR: Starting with my one-year stint at Meta's adaptive experimentation team, I've been pondering how bandit algorithms are largely designed by theoreticians to achieve good regret bounds and are rarely used in practice due to the difficulty of implementation and poor empirical performance. In this work, we focus on underpowered, short-horizon, and large-batch problems that typically arise in practice. We use large-batch normal approximations to derive an MDP formulation of the optimal adaptive design. Our optimization formulation allows the use of standard computational tools in ML for designing adaptive algorithms. Our approach significantly improves statistical power over standard methods, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings that are difficult for standard methods, involving limited adaptivity, low signal-to-noise ratio, and unknown reward distributions.
      To model short-horizon problems, we must design algorithms that optimize instance-specific constants, instead of relying on regret bounds that only hold in the large horizon limit. We develop a computation-driven adaptive experimentation framework that can flexibly handle batching. Our main observation is that normal approximations, which are universal in statistical inference, can also guide the design of adaptive algorithms. Instead of the typical theory-driven paradigm, we use PyTorch and empirical benchmarking for algorithm development.
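To illustrate how a normal approximation can turn batched experimentation into a small MDP, here is a minimal sketch. It is not the formulation from the paper: it assumes a Gaussian prior on each arm's mean reward, a known per-observation noise variance, and batch means that are approximately normal by the central limit theorem. The function names and all parameters are hypothetical. The state is the vector of posterior means and variances, the action is the allocation of the next batch across arms, and the transition is a conjugate normal update.

```python
import math
import random

def posterior_update(mean, var, noise_var, n_samples, batch_mean):
    """Normal-normal conjugate update for one arm after observing a batch mean."""
    if n_samples == 0:
        return mean, var  # no data for this arm, belief unchanged
    precision = 1.0 / var + n_samples / noise_var
    new_mean = (mean / var + n_samples * batch_mean / noise_var) / precision
    return new_mean, 1.0 / precision

def simulate_epoch(means, variances, noise_vars, allocation, batch_size, rng):
    """One MDP transition: allocate a batch, draw batch means, update beliefs."""
    new_means, new_vars = [], []
    for m, v, s2, frac in zip(means, variances, noise_vars, allocation):
        n = frac * batch_size
        if n > 0:
            # Predictive law of the batch mean under the current belief:
            # prior uncertainty plus sampling noise shrunk by the batch size.
            batch_mean = rng.gauss(m, math.sqrt(v + s2 / n))
        else:
            batch_mean = 0.0  # unused when the arm receives no samples
        m_new, v_new = posterior_update(m, v, s2, n, batch_mean)
        new_means.append(m_new)
        new_vars.append(v_new)
    return new_means, new_vars

# Two arms, one epoch, the whole batch of 100 samples sent to the first arm.
rng = random.Random(0)
means, variances = simulate_epoch(
    [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [1.0, 0.0], 100, rng
)
```

Because the transition depends only on a handful of sufficient statistics, candidate allocation policies can be evaluated by simulating many such epochs and optimized with standard computational tools.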

      Auto-differentiable discrete event simulation

      Typical operational decision-making problems (e.g., queueing, inventory management) are often distinguished by two characteristics: i) the dynamics of the system are relatively simple and ii) the action space is combinatorially large. Despite their flexibility, black-box reinforcement learning methods are unreliable and require prohibitive amounts of data. We develop auto-differentiable simulators that can directly optimize policies at scale and showcase the promise of this algorithmic development paradigm on the benchmark problems we develop.
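The idea of differentiating through a simulator can be sketched on a toy single-server queue. This is an illustrative example under stated assumptions, not the simulators described above: it assumes exponential inter-arrival and service times, uses the Lindley recursion for waiting times, and computes a pathwise derivative with a hand-rolled forward-mode `Dual` number class (hypothetical; in practice an autodiff library would play this role).

```python
import random

class Dual:
    """A value together with its derivative with respect to one parameter."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __sub__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val - other.val, self.dot - other.dot)
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def relu(x):
    """max(0, x) with the derivative of whichever branch is active."""
    return x if x.val > 0 else Dual(0.0, 0.0)

def mean_wait(theta, n_customers=2000, seed=0):
    """Average waiting time in a single-server queue with mean service time theta."""
    rng = random.Random(seed)
    theta = theta if isinstance(theta, Dual) else Dual(theta)
    wait, total = Dual(0.0), Dual(0.0)
    for _ in range(n_customers):
        service = theta * rng.expovariate(1.0)  # service time with mean theta
        interarrival = rng.expovariate(1.0)     # arrivals at rate 1
        # Lindley recursion: the next customer's wait, differentiable almost everywhere.
        wait = relu(wait + service - interarrival)
        total = total + wait
    return total * (1.0 / n_customers)

# Seed the derivative with 1.0 to obtain d(mean wait)/d(theta) along this sample path.
out = mean_wait(Dual(0.5, 1.0))
```

Here `out.val` is the simulated average wait and `out.dot` is its sample-path gradient, which a gradient-based optimizer could use directly instead of treating the simulator as a black box.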


      III. Robust off-policy learning

      Off-policy methods can learn sequential decision policies using the rich reservoir of previously collected (non-experimental / observational) data. Although engineered approaches to off-policy learning (based on deep learning and simulation optimization) have seen much progress, they often produce unreliable policies due to their heuristic nature. For these methods to bear fruit and transform applications where experimentation is costly, it is important to avoid deploying policies whose safety cannot be verified.

      Engineering progress is predicated on rigorous empirical evaluation. While prediction models can be easily evaluated on previously collected data, assessing decision-making performance requires counterfactual reasoning. Traditional modeling assumptions that allow adjusting prediction models to learn counterfactuals rarely hold in practice. The growth in the nominal volume of data is no panacea: observed data typically covers only a portion of the state-action space, posing challenges in counterfactual learning. Concomitant with unseen data sparsity, shifts in the data distribution are common. Observed decisions depend on unrecorded confounders, and learning good policies requires causal reasoning. Marginalized demographic groups are severely underrepresented; for example, among the 10,000+ cancer clinical trials the National Cancer Institute funds, fewer than 5% of participants were non-white.

      Our existing statistical language falls woefully short as it relies on unverifiable (and often false) assumptions, and we lack diagnostics that can identify failure modes. We develop data analysis tools that can i) guarantee robust scientific findings and, perhaps more importantly, ii) fail in expected ways by highlighting the fundamental epistemic uncertainty in the data.

      External validity

      While large-scale randomized studies offer a “gold standard” for internal validity, their external validity can be called into question over spatiotemporal changes in the population, particularly when the treatment effect is heterogeneous across the population. The ACCORD and SPRINT trials offer a prominent example: they studied the effect of treatments to lower blood pressure on cardiovascular disease, but reached opposite conclusions despite exceptionally large sample sizes (5-10K). The mechanism behind the difference could not be explained by experts even ex-post. We develop new methods to assess and improve the external validity of RCTs. In particular, we develop sensitivity analysis frameworks that allow researchers to assess the extent to which existing experiments inform the treatment effect in a new target site and to quantify an expected range of the policy effect for each new site.
      1. Robust Causality
        Sookyo Jeong, and Hongseok Namkoong
        arXiv:2007.02411 [stat.ML], 2022
        Short version appeared in Conference on Learning Theory 2020; Major revision in Management Science

      Unobserved confounding

      Off-policy methods can learn decision policies using the rich reservoir of previously collected (observational) data. A universal assumption that enables counterfactual reasoning requires that observed decisions do not depend on any unrecorded confounders that simultaneously affect future states/rewards. This condition is frequently violated in medicine, e-commerce, and public policy; for example, emergency department patients often do not have an existing record in the hospital’s electronic health system, leaving essential patient-specific information unobserved in subsequent counterfactual analysis. In the presence of unobserved confounding, even with large samples, it is impossible to precisely estimate the performance of the evaluation policy. To guard against spurious counterfactual evaluations, we propose a worst-case approach: we first posit a realistic notion of bounded unobserved confounding that limits the influence of unrecorded variables on observed decisions, and then develop corresponding worst-case bounds on the reward.
      1. Robust Causality
        Steve Yadlowsky, Hongseok Namkoong, Sanjay Basu, John Duchi, and 1 more author
        Annals of Statistics, 2022
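The worst-case idea described above can be illustrated with a deliberately simplified model. This sketch is not the estimator from the paper: it assumes that each unit's nominal inverse-propensity weight may be wrong by at most a multiplicative factor `gamma` because of unobserved confounding, and it computes the smallest self-normalized weighted mean consistent with that bound. The function name and the data are hypothetical. The minimizer places the largest admissible weights on the lowest outcomes, so it suffices to scan over thresholds in sorted order.

```python
def worst_case_weighted_mean(outcomes, weights, gamma):
    """Smallest self-normalized weighted mean when each weight w may be
    anywhere in [w / gamma, w * gamma]."""
    order = sorted(range(len(outcomes)), key=lambda i: outcomes[i])
    y = [outcomes[i] for i in order]
    lo = [weights[i] / gamma for i in order]
    hi = [weights[i] * gamma for i in order]
    best = float("inf")
    # For some k, the minimizer up-weights the k smallest outcomes
    # and down-weights the rest; try every k.
    for k in range(len(y) + 1):
        w = hi[:k] + lo[k:]
        value = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        best = min(best, value)
    return best

# Hypothetical outcomes and nominal inverse-propensity weights.
ys = [3.0, -1.0, 2.0, 0.5]
ws = [1.0, 2.0, 1.5, 0.7]
nominal = worst_case_weighted_mean(ys, ws, gamma=1.0)
lower_bound = worst_case_weighted_mean(ys, ws, gamma=2.0)
```

Setting `gamma = 1.0` recovers the usual weighted estimate; larger values of `gamma` widen the gap between the nominal estimate and the worst-case bound, which makes the dependence on the confounding assumption explicit.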

      Unforeseen data sparsity

      A central challenge in observational analysis is that the effective sample size is difficult to gauge. Even when a nominally large dataset is collected, the effective sample size may be prohibitively small when there is little overlap between trajectories seen under the data-generating and proposed policies. Data sparsity becomes more pronounced in modern problems that involve high-dimensional covariate representations; causal identification becomes difficult on parts of the covariate space with limited effective sample size. Existing observational methods are only valid in the large sample limit and silently fail in practical instances, where the effective sample size is limited. We propose a new inferential framework that provides always-valid uncertainty quantification. Unlike asymptotic methods, we quantify instance-specific uncertainty that accurately scales with the level of overlap. Our proposed counterfactual evaluation paradigm i) provides always-valid uncertainty estimates, spurring engineering progress through rigorous empirical evaluations, and ii) guides the optimal design of experiments based on previously collected (observational) data.
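A quick way to see how limited overlap shrinks the effective sample size is Kish's formula applied to importance weights. This is a standard heuristic diagnostic shown only for intuition, not the always-valid inferential framework described above. The function names and the example probabilities are hypothetical.

```python
def importance_weights(target_probs, behavior_probs):
    """Ratio of the proposed policy's action probability to the logging policy's."""
    return [p / q for p, q in zip(target_probs, behavior_probs)]

def effective_sample_size(weights):
    """Kish's effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return (total * total) / total_sq

# Good overlap: the two policies agree, so every observation counts fully.
well_covered = importance_weights([0.5] * 100, [0.5] * 100)

# Poor overlap: the proposed policy favors an action the logging policy
# almost never took, so a single observation dominates.
poorly_covered = importance_weights([0.9] + [0.1] * 99, [0.01] + [0.99] * 99)

ess_good = effective_sample_size(well_covered)
ess_poor = effective_sample_size(poorly_covered)
```

Both datasets contain 100 observations, yet the second behaves as if it held only a handful, which is the gap between nominal and effective sample size that the paragraph above describes.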


      1. AI-driven Decisions
        Gilbert Yang, Yaqin Chen, Thomson Yen, and Hongseok Namkoong
        arXiv:2511.22130 [cs.LG]
      2. AI-driven Decisions
        Tommaso Castellani, Naimeng Ye, Daksh Mittal, Thomson Yen, and 1 more author
        arXiv:2511.09572 [cs.AI], 2025
      3. Trustworthy AI
        Yuanzhe Ma, and Hongseok Namkoong
        arXiv:2511.22003 [stat.ML], 2025
      4. AI-driven Decisions
        Yanlin Qu, Hongseok Namkoong, and Assaf Zeevi
        arXiv:2510.07208 [cs.LG], 2025
      5. AI-driven Decisions
        Daksh Mittal, Shunri Zheng, Jing Dong, and Hongseok Namkoong
        arXiv:2509.05839 [cs.LG], 2025
      6. Trustworthy AI
        Jiashuo Liu, Tianyu Wang, Henry Lam, Hongseok Namkoong, and 1 more author
        arXiv:2505.23565 [cs.LG], 2025
      7. Trustworthy AI
        Yuta Kobayashi, Zilin Jing, Jiayu Yao, Hongseok Namkoong, and 1 more author
        arXiv:2510.12624 [cs.LG], 2025
      8. AI-driven Decisions
        Liang Hu, Jianpeng Jiao, Jiashuo Liu, Yanle Ren, and 19 more authors
        In Proceedings of the Fourteenth International Conference on Learning Representations, 2026
      9. AI-driven Decisions
        Jimmy Wang*, Thomas Zollo*, Richard Zemel, and Hongseok Namkoong
        In Proceedings of the 42nd International Conference on Machine Learning, 2025
        Best Paper Award at ICLR 2025 Workshop on Quantify Uncertainty and Hallucination in Foundation Models
      10. Trustworthy AI
        In Advances in Neural Information Processing Systems 38, 2025
      11. Trustworthy AI
        Ang Li, Haozhe Chen, Hongseok Namkoong, and Tianyi Peng
        In Advances in Neural Information Processing Systems 38, 2025
      12. AI-driven Decisions
        In Advances in Neural Information Processing Systems 38, 2025
      13. Trustworthy AI
        Daksh Mittal*, Yuanzhe Ma*, Shalmali Joshi, and Hongseok Namkoong
        arXiv:2502.06076 [stat.ML], 2025
        Journal version under review; Conference version appeared in NeurIPS 2024
      14. AI-driven Decisions
        Kelly Zhang, Tiffany Cai, Hongseok Namkoong, and Daniel Russo
        In Advances in Neural Information Processing Systems 38, 2025
      15. AI-driven Decisions
        Shalmali Joshi, Iñigo Urteaga, Wouter A C Amsterdam, George Hripcsak, and 16 more authors
        Journal of the American Medical Informatics Association, 2025
      16. Trustworthy AI
        Brian Hsu, Cyrus DiCiccio, Natesh Sivasubramoniapillai, and Hongseok Namkoong
        In Proceedings of the Algorithmic Fairness Through the Lens of Metrics and Evaluation, 2024
      17. Trustworthy AI
        Yibo Zeng*, Jiashuo Liu*, Henry Lam, and Hongseok Namkoong
        arXiv:2410.07395 [cs.LG], 2024
      18. AI-driven Decisions
        Haozhe Chen, Ang Li, Ethan Che, Tianyi Peng, and 2 more authors
        In Advances in Neural Information Processing Systems 37, Datasets and Benchmark Track, 2024
      19. AI-driven Decisions
        Ethan Che, Jing Dong, and Hongseok Namkoong
        arXiv:2409.03740 [cs.LG], 2024
        Major revision in Operations Research
      20. Trustworthy AI
        Thomas Zollo*, Andrew Siah*, Naimeng Ye, Ang Li, and 1 more author
        In International Conference on Learning Representations, 2025
      21. AI-driven Decisions
        Ethan Che, Daniel Jiang, Hongseok Namkoong, and Jimmy Wang
        arXiv:2408.04570 [cs.LG], 2024
        Selected for oral presentations at the Econometric Society Interdisciplinary Frontiers: Economics and AI+ML conference and Conference on Digital Experimentation
      22. AI-driven Decisions
        Jimmy Wang, Ethan Che, Daniel Jiang, and Hongseok Namkoong
        arXiv:2408.04531 [cs.LG], 2024
      23. Trustworthy AI
        Naimeng Ye, and Hongseok Namkoong
        arXiv:2408.03307 [stat.ML], 2024
      24. Robust Causality
        Tiffany Cai*, Yuri Fonseca*, Kaiwen Hou, and Hongseok Namkoong
        arXiv:2405.09493 [stat.ML], 2024
      25. AI-driven Decisions
        Tiffany Cai, Hongseok Namkoong, Daniel Russo, and Kelly Zhang
        arXiv:2405.19466 [cs.LG], 2025
        Selected for presentation at the Econometric Society Interdisciplinary Frontiers: Economics and AI+ML conference
      26. AI-driven Decisions
        Jiung Lee, Hongseok Namkoong, and Yibo Zeng
        arXiv:2406.06855 [math.OC], 2024
        Major revision in Management Science; Selected for presentation at SIG day 2025
      27. Trustworthy AI
        Jiashuo Liu*, Tianyu Wang*, Peng Cui, and Hongseok Namkoong
        arXiv:2307.05284 [cs.LG], 2024
        Major revision in Management Science; Conference version appeared in NeurIPS 2023
      28. AI-driven Decisions
        Ethan Che, and Hongseok Namkoong
        arXiv:2303.11582 [cs.LG], 2023
        Major revision in Operations Research
      29. Trustworthy AI
        Tiffany Cai, Hongseok Namkoong, and Steve Yadlowsky
        Operations Research, 2025
        Conference version appeared in Symposium on Foundations of Responsible Computing 2023
      30. Robust Causality
        Ari Boyarsky, Hongseok Namkoong, and Jean Pouget-Abadie
        arXiv:2305.10728 [stat.ME], 2023
        Conference version appeared in ACM conference on Economics and Computation
      31. AI-driven Decisions
        Hongseok Namkoong*, Samuel Daulton*, and Eytan Bakshy
        Transactions on Machine Learning Research, 2026
        Selected for an oral presentation at the NeurIPS 2020 OfflineRL Workshop
      32. Robust Causality
        Sookyo Jeong, and Hongseok Namkoong
        arXiv:2007.02411 [stat.ML], 2022
        Short version appeared in Conference on Learning Theory 2020; Major revision in Management Science
      33. Trustworthy AI
        Mike Li, Daksh Mittal, Hongseok Namkoong, and Shangzhou Xia
        arXiv:2407.01316 [cs.LG], 2024
        Major revision in Mathematics of Operations Research; Short version appeared in NeurIPS 2021
      34. Trustworthy AI
        Hongseok Namkoong*, Yuanzhe Ma*, and Peter W. Glynn
        arXiv:2212.06338 [stat.ML], 2025
        To appear in Operations Research
      35. Trustworthy AI
        Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, and 7 more authors
        In Proceedings of the 39th International Conference on Machine Learning, 2022
      36. Trustworthy AI
        Mitchell Wortsman*, Gabriel Ilharco*, Jong Wook Kim, Mike Li, and 7 more authors
        In Proceedings of the 32nd IEEE Conference on Computer Vision and Pattern Recognition, 2022
        CVPR Best Paper Finalist
      37. Robust Causality
        Steve Yadlowsky, Hongseok Namkoong, Sanjay Basu, John Duchi, and 1 more author
        Annals of Statistics, 2022
      38. Robust Causality
        Hongseok Namkoong*, Ramtin Keramati*, Steve Yadlowsky*, and Emma Brunskill
        In Advances in Neural Information Processing Systems 33, 2020
      39. Trustworthy AI
        John C. Duchi, Tatsunori Hashimoto, and Hongseok Namkoong
        Operations Research, 2022
      40. Trustworthy AI
        John C. Duchi, and Hongseok Namkoong
        Annals of Statistics, 2021
      41. Trustworthy AI
        John C. Duchi, Peter W. Glynn, and Hongseok Namkoong
        Mathematics of Operations Research, 2021
        APS Best Student Paper Prize
      42. Trustworthy AI
        Aman Sinha*, Hongseok Namkoong*, Riccardo Volpi, and John Duchi
        In International Conference on Learning Representations, 2018
        Selected for a full oral presentation; 2% of submissions
      43. Trustworthy AI
        John C. Duchi, and Hongseok Namkoong
        Journal of Machine Learning Research, 2019
        Conference version won NeurIPS 2017 Best Paper Award
      44. Trustworthy AI
        Riccardo Volpi*, Hongseok Namkoong*, John Duchi, Vittorio Murino, and 1 more author
        In Advances in Neural Information Processing Systems 31, 2018
      45. Trustworthy AI
        Mathew O’Kelly*, Aman Sinha*, Hongseok Namkoong*, John Duchi, and 1 more author
        In Advances in Neural Information Processing Systems 31, 2018
      46. Trustworthy AI
        Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang
        In International Conference on Machine Learning, 2018
        Best Paper Runner-up Award
      47. Hongseok Namkoong, Aman Sinha, Steve Yadlowsky, and John C. Duchi
        In International Conference on Machine Learning, 2017
      48. Trustworthy AI
        Hongseok Namkoong, and John C. Duchi
        In Advances in Neural Information Processing Systems 29, 2016

      References

      1. Trustworthy AI
        Hongseok Namkoong, and John C. Duchi
        In Advances in Neural Information Processing Systems 29, 2016
      2. Robust Causality
        Hongseok Namkoong*, Ramtin Keramati*, Steve Yadlowsky*, and Emma Brunskill
        In Advances in Neural Information Processing Systems 33, 2020
      3. Trustworthy AI
        Mathew O’Kelly*, Aman Sinha*, Hongseok Namkoong*, John Duchi, and 1 more author
        In Advances in Neural Information Processing Systems 31, 2018