AI-driven decision-making

My research develops AI-driven approaches to decision-making, with a particular emphasis on trustworthy and responsible learning methods. This requires an interdisciplinary approach spanning several fields including machine learning, operations research, and statistics.

Research philosophy and methodology

The standard algorithmic development paradigm for decision-making relies on theoretical performance guarantees and as a result, often ignore important operational constraints or are non-performant in practice. While theoretical insights can provide invaluable principles, their successful operationalization requires recognizing and internalizing the limitations of crude approximations and unverifiable assumptions we put in place for mathematical convenience.

My research group identifies central intellectual bottlenecks in real-world problems, and resolve them by building computational and data-centric foundations borne out of mathematical principles. Our research methodology aims to connect two disparate yet complementary worldviews:

theoretical and computational tools from statistical learning, optimization, applied probability, and casual inference
rigorous empirical benchmarking practices arising from the AI research community’s data-centric approach.

I take inspiration from Von Neumann’s perspective on mathematical sciences, which I paraphrase below:

As a mathematical discipline travels far from its empirical source only indirectly inspired from ideas coming from 'reality', it is beset with grave dangers that it will develop along the line of least resistance and become more and more purely aestheticizing. This need not be bad if the discipline is under the influence of researchers with an exceptionally well-developed taste, but the only general remedy is the rejuvenating return to the source: the reinjection of directly empirical ideas. I am convinced that this is a necessary condition to conserve the freshness and the vitality of the subject, and that this will remain so in the future. (Click here for the [full article](/assets/pdf/TheMathematician.pdf).)

I am fortunate to be able to learn from the well-developed taste of my wonderful colleagues. Concurrent to this personal education, I (try to) inject empirical ideas to formulate research directions to increase the impact of my research.

I. Trustworthy AI

Modern data collection systems acquire data from heterogeneous sources, and classical approaches that optimize average-case performance yield brittle AI systems. They fail to - make good predictions on underrepresented groups - generalize to new environments, even those similar to that seen during training - be robust to adversarial examples and long-tailed inputs. Yes, even the largest models trained on the entirety of the internet! Despite recent successes, lack of understanding on the failure modes of AI systems highlights the need for models that i) reliably work and ii) rigorous evaluation schemes and diagnostics that maintain their quality. We take a holistic “industrial engineering” view of AI systems, studying them from data collection to deployment & monitoring.

Process view of AI: Methodological development in ML largely focuses on model training. Taking a system-level view, my research identifies bottlenecks in the process and algorithms to resolve them.

Building a language defining distribution shifts

Different distribution shifts require different solutions. Understanding *why* model performance worsened is a fundamental step for informing subsequent methodological and operational interventions. Heterogeneity in data helps robustness, but the cost of data collection is often a binding constraint. We build a nuanced modeling language for quantifying data heterogeneity (or lack thereof), and use it to make optimally allocate limited resources in the AI production pipeline. To learn more, watch the following tutorial and take a look at the following two papers.

NeurIPS 2023 Tutorial

Selected Papers

Trustworthy AI
On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets

Jiashuo Liu*, Tianyu Wang*, Peng Cui, and Hongseok Namkoong

arXiv:2307.05284 [cs.LG], 2024

Major revision in Management Science; Conference version appeared in NeurIPS 2023

Abs arXiv Bib Video Slides Website

Different distribution shifts require different interventions, and algorithms must be grounded in the specific shifts they address. Advocating for an inductive approach to research on distributional robustness, we build an empirical testbed, "WhyShift", comprising of natural shifts across 5 tabular datasets and 60,000 model configurations encompassing imbalanced learning algorithms and distributionally robust optimization (DRO) methods. We find Y|X-shifts are most prevalent on our testbed, in stark contrast to the heavy focus on X (covariate)-shifts in the ML literature. We conduct an in-depth empirical analysis of DRO methods and find that the underlying model class (e.g., neural networks, XGBoost) and hyperparameter selection have a first-order impact in practice despite being overlooked by DRO researchers. To further bridge that gap between methodological research and practice, we design case studies that illustrate how such a refined understanding of distribution shifts can enhance both data-centric and algorithmic interventions.
@article{LiuWaCuNa24, title = {On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets}, author = {Liu$*$, Jiashuo and Wang$*$, Tianyu and Cui, Peng and Namkoong, Hongseok}, year = {2024}, journal = {arXiv:2307.05284 [cs.LG]}, url = {https://arxiv.org/abs/2307.05284}, note = {Major revision in Management Science; Conference version appeared in NeurIPS 2023}, }

Foundations of distributional robustness

Our vision is to build robust and reliable learning procedures that make decisions with a guaranteed level of performance over its inputs. My Ph.D. thesis built the statistical (missing reference), and computational (Namkoong & Duchi, 2016) foundations of robust machine learning. As robustness is a central topic spanning across multiple fields, my subsequent works have developed robust algorithms for deep learning (missing reference), causal inference (missing reference), reinforcement learning (Namkoong* et al., 2020), and safety evaluation of autonomous vehicles (O’Kelly* et al., 2018). These works have led to new approaches toward fairness by characterizing fundamental connections between robustness and fairness (missing reference). Watch my talk (@ Google Brain, 2021) and the following two representative papers to learn more.

Trustworthy AI

Learning Models with Uniform Performance via Distributionally Robust Optimization

John C. Duchi, and Hongseok Namkoong

Annals of Statistics, 2021

arXiv Bib Video Code Slides

@article{DuchiNa21,
  author = {Duchi, John C. and Namkoong, Hongseok},
  title = {Learning Models with Uniform Performance via Distributionally
                    Robust Optimization},
  year = {2021},
  volume = {49},
  number = {3},
  pages = {1378-1406},
  journal = {Annals of Statistics},
  url = {https://projecteuclid.org/journals/annals-of-statistics/volume-49/issue-3/Learning-models-with-uniform-performance-via-distributionally-robust-optimization/10.1214/20-AOS2004.full},
}

Trustworthy AI

Distributionally Robust Losses Against Mixture Covariate Shifts

John C. Duchi, Tatsunori Hashimoto, and Hongseok Namkoong

Operations Research, 2022

arXiv Bib Video Code

@article{DuchiHaNa22,
  author = {Duchi, John C. and Hashimoto, Tatsunori and Namkoong, Hongseok},
  title = {Distributionally Robust Losses Against Mixture Covariate Shifts},
  year = {2022},
  journal = {Operations Research},
  url = {https://pubsonline.informs.org/doi/10.1287/opre.2022.2363},
}

II. Computational frameworks for decision-making

Decision-making problems in OR/MS concerns the optimal allocation of scarce resources. We build scalable computational frameworks for learning operational decisions by leveraging i) auto-differentiable simulators, and ii) empirically rigorous benchmarking. Our goal is to build a algorithmic development paradigm based on computation rather than theoretical approximations.

Adaptive experimentation at scale

Adaptive data collection can significantly improve data efficiency. Standard algorithms are primarily designed to satisfy good upper bounds on their performance (regret bounds), but do not model important operational constraints and are challenging to implement due to infrastructural/organizational difficulties. Instead of the typical theory-driven paradigm, we leverage computational tools and empirical benchmarking for algorithm development. Our proposed framework models practical instances in online platforms and social networks involving a handful of reallocation epochs in which outcomes are measured in batches.

AI-driven Decisions

Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning

Hongseok Namkoong*, Samuel Daulton*, and Eytan Bakshy

arXiv:2011.14266 [stat.ML], 2024

Selected for an oral presentation at the Neurips 2020 OfflineRL Workshop

arXiv Bib

@article{NamkoongDaBa24,
  title = {Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning},
  author = {Namkoong$*$, Hongseok and Daulton$*$, Samuel and Bakshy, Eytan},
  journal = {arXiv:2011.14266 [stat.ML]},
  year = {2024},
  note = {Selected for an oral presentation at the Neurips 2020 OfflineRL Workshop},
  url = {https://arxiv.org/abs/2011.14266},
}

AI-driven Decisions
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches

Ethan Che, and Hongseok Namkoong

arXiv:2303.11582 [cs.LG], 2023

Major revision in Operations Research

Abs arXiv Bib Video Slides Website

Starting with my one-year stint at Meta’s adaptive experimentation team, I’ve been pondering on how bandit algorithms are largely designed by theoreticians to achieve good regret bounds and are rarely used in practice due to the difficulty of implementation and poor empirical performance. In this work, we focus on underpowered, short-horizon, and large-batch problems that typically arise in practice. We use large batch normal approximations to derive an MDP formulation for deriving the optimal adaptive design. Our formulation allows the use of computational tools for designing adaptive algorithms, a break from the existing theory-driven paradigm. Our approach significantly improves statistical power over standard methods, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings that are difficult for standard methods, involving limited adaptivity, low signal-to-noise ratio, and unknown reward distributions.
@article{CheNa23, title = {Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches}, author = {Che, Ethan and Namkoong, Hongseok}, year = {2023}, journal = {arXiv:2303.11582 [cs.LG]}, note = {Major revision in Operations Research}, url = {https://arxiv.org/abs/2303.11582}, }

TLDR; Starting with my one-year stint at Meta's adaptive experimentation team, I've been pondering on how bandit algorithms are largely designed by theoreticians to achieve good regret bounds and are rarely used in practice due to the difficulty of implementation and poor empirical performance. In this work, we focus on underpowered, short-horizon, and large-batch problems that typically arise in practice. We use large batch normal approximations to derive an MDP formulation for deriving the optimal adaptive design. Our optimization formulation allows the use of standard computational tools in ML for designing adaptive algorithms. Our approach significantly improves statistical power over standard methods, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings that are difficult for standard methods, involving limited adaptivity, low signal-to-noise ratio, and unknown reward distributions.

To model short-horizon problems, we must design algorithms that optimize instance-specific constants, instead of relying on regret bounds that only hold in the large horizon limit. We develop a computation-driven adaptive experimentation framework that can flexibly handle batching. Our main observation is that normal approximations, which are universal in statistical inference, can also guide the design of adaptive algorithms. Instead of the typical theory-driven paradigm, we use PyTorch and empirical benchmarking for algorithm development.

Auto-differentiable discrete event simulation

Typical operational decision-making problems (e.g., queueing, inventory management) are often distinguished by two characteristics: i) the dynamics of the system are relatively simple and ii) action space is combinatorially large. Despite their flexibility, black-box reinforcement learning methods are unreliable and require prohibitive amounts of data. We develop auto-differentiable simulators that can directly optimize policies at scale and showcase the promise of this algorithmic development paradigm on the benchmark problems we develop.

III. Robust off-policy learning

Off-policy methods can learn sequential decision policies using the rich reservoir of previously collected (non-experimental / observational) data. Although engineered approaches to off-policy learning have seen much progress—based on deep learning and simulation optimization—they often produce unreliable policies due to their heuristic nature. For these methods to bear fruit and transform applications where experimentation is costly, it is important to avoid deploying policies whose safety cannot be verified. Engineering progress is predicated on rigorous empirical evaluation. While prediction models can be easily evaluated on previously collected data, assessing decision-making performance requires counterfactual reasoning. Traditional modeling assumptions that allow adjusting prediction models to learn counterfactuals rarely hold in practice. The growth in the nominal volume of data is no panacea: observed data typically only covers a portion of the state-action space, posing challenges in counterfactual learning. Concomitant to unseen data sparsity, shifts in the data distribution are common. Observed decisions depend on unrecorded confounders, and learning good policies requires causal reasoning. Marginalized demographic groups are severely underrepresented; for example, among 10000+ cancer clinical trials the National Cancer Institute funds, fewer than 5% of participants were non-white. Our existing statistical language falls woefully short as it relies on unverifiable (and often false) assumptions, and we lack diagnostics that can identify failure modes. We develop data analysis tools that can i) guarantee robust scientific findings and perhaps more importantly, ii) fail in expected ways by highlighting the fundamental epistemic uncertainty in the data.

External validity

While large-scale randomized studies offer a “gold standard” for internal validity, their external validity can be called into question over spatiotemporal changes in the population, particularly when the treatment effect is heterogeneous across the population. The ACCORD and SPRINT trials offer a prominent example: they studied the effect of treatments to lower blood pressure on cardiovascular disease, but reached opposite conclusions despite exceptionally large sample sizes (5-10K). The mechanism behind the difference could not be explained by experts even ex-post. We develop new methods to assess and improve the external validity of RCTs. In particular, we develop sensitivity analysis frameworks that allows researchers to assess the extent to which existing experiments inform the treatment effect in a new target site and quantify an expected range of the policy effect for each new site.

Robust Causality

Assessing External Validity via Worst-case Subpopulation Treatment Effects

Sookyo Jeong, and Hongseok Namkoong

arXiv:2007.02411 [stat.ML], 2022

Short version appeared in Conference on Learning Theory 2020

arXiv Bib Video Code Slides

@article{JeongNa22,
  title = {Assessing External Validity via Worst-case Subpopulation Treatment Effects},
  author = {Jeong, Sookyo and Namkoong, Hongseok},
  journal = {arXiv:2007.02411 [stat.ML]},
  year = {2022},
  note = {Short version appeared in Conference on Learning Theory 2020},
  url = {https://arxiv.org/abs/2007.02411},
}

Unobserved confounding

Off-policy methods can learn decision policies using the rich reservoir of previously collected (observational) data. A universal assumption that enable counterfactual reasoning requires observed decisions do not depend on any unrecorded confounders that simultaneously affect future states/rewards. This condition is frequently violated in medicine, e-commerce, and public policy, e.g., emergency department patients often do not have an existing record in the hospital’s electronic health system, leaving essential patient-specific information unobserved in subsequent counterfactual analysis. In the presence of unobserved confounding, even with large samples, it is impossible to precisely estimate the performance of the evaluation policy. To guard against spurious counterfactual evaluations, we propose a worst-case approach where we first posit a realistic notion of bounded unobserved confounding that limits the influence of unrecorded variables on observed decisions and develop corresponding worst-case bounds on the reward.

Robust Causality

Bounds on the Conditional and Average Treatment Effect with Unobserved Confounding Factors

Steve Yadlowsky, Hongseok Namkoong, Sanjay Basu, John Duchi, and 1 more author

Annals of Statistics, 2022

arXiv Bib Video

@article{YadlowskyNaBaDuTi22,
  title = {Bounds on the Conditional and Average Treatment Effect 
                    with Unobserved Confounding Factors},
  author = {Yadlowsky, Steve and Namkoong, Hongseok and Basu, Sanjay and Duchi, John and Tian, Lu},
  journal = {Annals of Statistics},
  volume = {50},
  number = {5},
  pages = {2587--2615},
  year = {2022},
  url = {https://projecteuclid.org/journals/annals-of-statistics/volume-50/issue-5/Bounds-on-the-conditional-and-average-treatment-effect-with-unobserved/10.1214/22-AOS2195.full},
  slide = {YadlowskyNaBaDuTi22-slides.pdf}
}

Unforeseen data sparsity

A central challenge in observational analysis is that the effective sample size is difficult to gauge. Even when a nominally large dataset is collected, the effective sample size may be prohibitively small when there is little overlap between trajectories seen under the data-generating and proposed policies. Data sparsity becomes more pronounced in modern problems that involve high-dimensional covariates representations; causal identification becomes difficult on parts of the covariate space with limited effective sample size. Existing observational methods are only valid in the large sample limit and silently fail in practical instances, where the effective sample size is limited. We propose a new inferential framework that provides always-valid uncertainty quantification. Unlike asymptotic methods, we quantify instance-specific uncertainty that accurately scales with the level of overlap Our proposed counterfactual evaluation paradigm i) provides always-valid uncertainty estimates, spurring engineering progress through rigorous empirical evaluations, and ii) guides the optimal design of experiments based on previously collected (observational) data.

AI-driven Decisions

A Broader View of Thompson Sampling

Yanlin Qu, Hongseok Namkoong, and Assaf Zeevi

arxiv:2510.07208 [cs.LG], 2025

arXiv Bib

@article{QuNaZe25,
  title = {A Broader View of Thompson Sampling},
  author = {Qu, Yanlin and Namkoong, Hongseok and Zeevi, Assaf},
  journal = {arxiv:2510.07208 [cs.LG]},
  year = {2025},
  url = {https://arxiv.org/abs/2510.07208}
}

AI-driven Decisions

Data-driven Stochastic Modeling using Autoregressive Sequence Models

Daksh Mittal, Shunri Zheng, Jing Dong, and Hongseok Namkoong

arxiv:2509.05839 [cs.LG], 2025

arXiv Bib

@article{MittalZhDoNa25,
  title = {Data-driven Stochastic Modeling using Autoregressive Sequence Models},
  author = {Mittal, Daksh and Zheng, Shunri and Dong, Jing and Namkoong, Hongseok},
  journal = {arxiv:2509.05839 [cs.LG]},
  year = {2025},
  url = {https://arxiv.org/abs/2509.05839}
}

Trustworthy AI

DRO: A Python Library for Distributionally Robust Optimization in Machine Learning

Jiashuo Liu, Tianyu Wang, Henry Lam, Hongseok Namkoong, and 1 more author

arXiv:2505.23565 [cs.LG], 2025

arXiv Bib Code

@article{LiuWaLaNaBl25,
  title = {{DRO}: A {P}ython Library for Distributionally Robust Optimization in Machine Learning},
  author = {Liu, Jiashuo and Wang, Tianyu and Lam, Henry and Namkoong, Hongseok and Blanchet, Jose},
  journal = {arXiv:2505.23565 [cs.LG]},
  year = {2025},
  url = {https://arxiv.org/abs/2505.23565},
}

Trustworthy AI

Learning-To-Measure: In-context Active Feature Acquisition

Yuta Kobayashi, Zilin Jing, Jiayu Yao, Hongseok Namkoong, and 1 more author

arxiv:2510.12624 [cs.LG], 2025

arXiv Bib

@article{KobayashiJiYaNaJo25,
  title = {Learning-To-Measure: In-context Active Feature Acquisition},
  author = {Kobayashi, Yuta and Jing, Zilin and Yao, Jiayu and Namkoong, Hongseok and Joshi, Shalmali},
  year = {2025},
  journal = {arxiv:2510.12624 [cs.LG]},
  url = {https://arxiv.org/abs/2510.12624}
}

AI-driven Decisions

FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning

Liang Hu, Jianpeng Jiao, Jiashuo Liu, Yanle Ren, and 19 more authors

arXiv:2509.13160 [cs.LG], 2025

arXiv Bib Code

@article{HuEtAl,
  title = {FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning},
  author = {Hu, Liang and Jiao, Jianpeng and Liu, Jiashuo and Ren, Yanle and Wen, Zhoufutu and Zhang, Kaiyuan and Zhang, Xuanliang and Gao, Xiang and He, Tianci and Hu, Fei and Liao, Yali and Wang, Zaiyuan and Yang, Chenghao and Yang, Qianyu and Yin, Mingren and Zeng, Zhiyuan and Zhang, Ge and Zhang, Xinyi and Zhao, Xiying and Zhu, Zhenwei and Namkoong, Hongseok and Huang, Wenhao and Tang, Yuwen},
  journal = {arXiv:2509.13160 [cs.LG]},
  year = {2025},
  url = {https://arxiv.org/abs/2509.13160},
}

AI-driven Decisions

Adaptive Elicitation of Latent Information Using Natural Language

Jimmy Wang*, Thomas Zollo*, Richard Zemel, and Hongseok Namkoong

In Proceedings of the 42nd International Conference on Machine Learning , 2025

Best Paper Award at ICLR 2025 Workshop on Quantify Uncertainty and Hallucination in Foundation Models

arXiv Bib Code

@inproceedings{WangZoZeNa25,
  title = {Adaptive Elicitation of Latent Information Using Natural Language},
  author = {Wang$*$, Jimmy and Zollo$*$, Thomas and Zemel, Richard and Namkoong, Hongseok},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year = {2025},
  url = {https://arxiv.org/abs/2504.04204},
  note = {Best Paper Award at ICLR 2025 Workshop on Quantify Uncertainty and Hallucination in Foundation Models}
}

Trustworthy AI
Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework

Thomson Yen*, Andrew Wei Tung Siah*, Haozhe Chen, Tianyi Peng, and 2 more authors

In Advances in Neural Information Processing Systems 38 , 2025

Abs arXiv Bib Code

A central challenge in AI is deciding how best to combine different data sources when training large models, as the performance of these models strongly depends on the composition of their training data. Historically, choosing optimal data mixtures has relied on intuition or expensive trial-and-error, limiting systematic progress even for resource-rich corporations. This research introduces a significant intellectual innovation: a novel Bayesian optimization framework based on probabilistic scaling laws that explicitly models uncertainty in how small-scale experiments inform large-scale training decisions. Enabled by substantial computational resources afforded by Empire AI, this framework adaptively selects data mixtures, model sizes, and training durations, efficiently balancing cost and information gain. By replacing brittle deterministic scaling laws with a flexible, probabilistic approach, this research enables optimal data curation through principled decision-making. Ultimately, this work sets a new scientific foundation for data mixture optimization, broadly advancing the efficiency, accessibility, and effectiveness of AI development across sectors.
@inproceedings{YenSiChPeGuNa25, title = {Data Mixture Optimization: A Multi-fidelity Multi-scale Bayesian Framework}, author = {Yen$*$, Thomson and Siah$*$, Andrew Wei Tung and Chen, Haozhe and Peng, Tianyi and Guetta, Daniel and Namkoong, Hongseok}, booktitle = {Advances in Neural Information Processing Systems 38}, year = {2025}, url = {https://arxiv.org/abs/2503.21023}, }
Trustworthy AI
LLM Generated Persona is a Promise with a Catch

Ang Li, Haozhe Chen, Hongseok Namkoong, and Tianyi Peng

In Advances in Neural Information Processing Systems 38 , 2025

Abs arXiv Bib Code

Large language model (LLM)-generated digital twins—synthetic personas that simulate real-world human behavior—have the potential to replace costly, time-intensive surveys with scalable, dynamic simulations. These “digital twins” can provide unprecedented insights into consumer behavior, political trends, and social dynamics, enabling businesses and researchers to test ideas, optimize products, and forecast societal shifts with greater speed and precision. However, this promise comes with a fundamental challenge: current LLM-driven persona generation methods lack rigor, introducing systematic biases that can misrepresent population-level behaviors and lead to flawed conclusions. This study reveals that widely used heuristic approaches to persona creation fail to capture the complexity of real-world diversity, sometimes producing distortions significant enough to alter election forecasts and public opinion studies. Addressing these flaws requires a scientific approach to persona generation—grounded in empirical validation, methodological innovation, and interdisciplinary collaboration—to ensure these digital twins accurately reflect human populations. This work was made possible by the computational infrastructure Empire AI provides, providing an order of magnitude more computational power than previous studies, enabling large-scale analysis of biases and reliability in LLM-generated personas.
@inproceedings{LiChNaPe25, title = {LLM Generated Persona is a Promise with a Catch}, author = {Li, Ang and Chen, Haozhe and Namkoong, Hongseok and Peng, Tianyi}, booktitle = {Advances in Neural Information Processing Systems 38}, year = {2025}, url = {https://arxiv.org/abs/2503.16527}, }

AI-driven Decisions

Architectural and Inferential Inductive Biases For Exchangeable Sequence Modeling

Daksh Mittal*, Ang Li*, Tzu-Ching Yen*, Daniel Guetta, and 1 more author

In Advances in Neural Information Processing Systems 38 , 2025

arXiv Bib

@inproceedings{MittalLiYeGuNa25,
  title = {Architectural and Inferential Inductive Biases For Exchangeable Sequence Modeling},
  author = {Mittal$*$, Daksh and Li$*$, Ang and Yen$*$, Tzu-Ching and Guetta, Daniel and Namkoong, Hongseok},
  booktitle = {Advances in Neural Information Processing Systems 38},
  year = {2025},
  url = {https://arxiv.org/abs/2503.01215}
}

Trustworthy AI

A Planning Framework for Adaptive Labeling

Daksh Mittal*, Yuanzhe Ma*, Shalmali Joshi, and Hongseok Namkoong

arXiv:2502.06076 [stat.ML], 2025

Journal version under review; Conference version appeared in NeurIPS 2024

arXiv Bib

@article{MittalMaJoNa25,
  title = {A Planning Framework for Adaptive Labeling},
  author = {Mittal$*$, Daksh and Ma$*$, Yuanzhe and Joshi, Shalmali and Namkoong, Hongseok},
  journal = {arXiv:2502.06076 [stat.ML]},
  year = {2025},
  note = {Journal version under review; Conference version appeared in NeurIPS 2024},
  url = {http://arxiv.org/abs/2502.06076}
}

AI-driven Decisions

Contextual Thompson Sampling via Generation of Missing Data

Kelly Zhang, Tiffany Cai, Hongseok Namkoong, and Daniel Russo

In Advances in Neural Information Processing Systems 38 , 2025

arXiv Bib

@inproceedings{ZhangCaNaRu25,
  title = {Contextual Thompson Sampling via Generation of Missing Data},
  author = {Zhang, Kelly and Cai, Tiffany and Namkoong, Hongseok and Russo, Daniel},
  booktitle = {Advances in Neural Information Processing Systems 38},
  year = {2025},
  url = {https://arxiv.org/abs/2502.07064}
}

AI-driven Decisions

AI as an intervention: improving clinical outcomes relies on a causal approach to AI development and validation

Shalmali Joshi, Iñigo Urteaga, Wouter A C Amsterdam, George Hripcsak, and 16 more authors

Journal of the American Medical Informatics Association, 2025

Bib

@article{JoshiEtAl25,
  author = {Joshi, Shalmali and Urteaga, Iñigo and van Amsterdam, Wouter A C and Hripcsak, George and Elias, Pierre and Recht, Benjamin and Elhadad, Noémie and Fackler, James and Sendak, Mark P and Wiens, Jenna and Deshpande, Kaivalya and Wald, Yoav and Fiterau, Madalina and Lipton, Zachary and Malinsky, Daniel and Nayan, Madhur and Namkoong, Hongseok and Park, Soojin and Vogt, Julia E and Ranganath, Rajesh},
  title = {{AI} as an intervention: improving clinical outcomes relies on a causal approach to {AI} development and validation},
  journal = {Journal of the American Medical Informatics Association},
  volume = {32},
  number = {3},
  pages = {589-594},
  year = {2025},
  url = {https://academic.oup.com/jamia/article/32/3/589/7945189}
}

Trustworthy AI

From Models to Systems: A Comprehensive Framework for AI System Fairness in Compositional Recommender Systems

Brian Hsu, Cyrus DiCiccio, Natesh Sivasubramoniapillai, and Hongseok Namkoong

In Proceedings of the Algorithmic Fairness Through the Lens of Metrics and Evaluation , 2024

arXiv Bib

@inproceedings{HsuDiSiNa24,
  title = {From Models to Systems: A Comprehensive Framework for AI System Fairness in Compositional Recommender Systems},
  author = {Hsu, Brian and DiCiccio, Cyrus and Sivasubramoniapillai, Natesh and Namkoong, Hongseok},
  year = {2024},
  booktitle = {Proceedings of the Algorithmic Fairness Through the Lens of Metrics and Evaluation},
  pages = {8--37},
  url = {https://proceedings.mlr.press/v279/hsu25a.html},
  volume = {279},
  series = {Proceedings of Machine Learning Research},
  publisher = {PMLR},
}

Trustworthy AI

LLM Embeddings Improve Test-time Adaptation to Tabular Y|X-Shifts

Yibo Zeng*, Jiashuo Liu*, Henry Lam, and Hongseok Namkoong

arXiv:2410.07395 [cs.LG], 2024

arXiv Bib Code

@article{ZengLiLaNa24,
  title = {{LLM} Embeddings Improve Test-time Adaptation to Tabular {$Y|X$}-Shifts},
  author = {Zeng$*$, Yibo and Liu$*$, Jiashuo and Lam, Henry and Namkoong, Hongseok},
  journal = {arXiv:2410.07395 [cs.LG]},
  year = {2024},
  url = {https://arxiv.org/abs/2410.07395},
}

AI-driven Decisions

QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers

Haozhe Chen, Ang Li, Ethan Che, Tianyi Peng, and 2 more authors

In Advances in Neural Information Processing Systems 37, Datasets and Benchmark Track , 2024

arXiv Bib Code

@inproceedings{ChenLiChPeDoNa24,
  title = {{QGym}: Scalable Simulation and Benchmarking of Queuing Network Controllers},
  author = {Chen, Haozhe and Li, Ang and Che, Ethan and Peng, Tianyi and Dong, Jing and Namkoong, Hongseok},
  booktitle = {Advances in Neural Information Processing Systems 37, Datasets and Benchmark Track},
  year = {2024},
  url = {https://arxiv.org/abs/2410.06170},
}

AI-driven Decisions

Differentiable Discrete Event Simulation for Queuing Network Control

Ethan Che, Jing Dong, and Hongseok Namkoong

arXiv:2409.03740 [cs.LG], 2024

Major revision in Operations Research

arXiv Bib Code

@article{CheDoNa24,
  title = {Differentiable Discrete Event Simulation for Queuing Network Control},
  author = {Che, Ethan and Dong, Jing and Namkoong, Hongseok},
  journal = {arXiv:2409.03740 [cs.LG]},
  year = {2024},
  url = {https://arxiv.org/abs/2409.03740},
  note = {Major revision in Operations Research},
}

Trustworthy AI

PersonalLLM: Tailoring LLMs to Individual Preferences

Thomas Zollo*, Andrew Siah*, Naimeng Ye, Ang Li, and 1 more author

In In International Conference on Learning Representations, 2025 , 2025

arXiv Bib Code

@inproceedings{ZolloSiYeLiNa24,
  title = {{PersonalLLM}: Tailoring {LLMs} to Individual Preferences},
  author = {Zollo$*$, Thomas and Siah$*$, Andrew and Ye, Naimeng and Li, Ang and Namkoong, Hongseok},
  booktitle = {In International Conference on Learning Representations, 2025},
  year = {2025},
  url = {https://www.arxiv.org/abs/2409.20296},
}

AI-driven Decisions
Optimization-Driven Adaptive Experimentation

Ethan Che, Daniel Jiang, Hongseok Namkoong , and Jimmy Wang

arXiv:2408.04570 [cs.LG], 2024

Selected for oral presentations at the Econometric Society Interdisciplinary Frontiers: Economics and AI+ML conference and Conference on Digital Experimentation

Abs arXiv Bib Code Slides

Adaptivity can significantly improve efficiency of experimentation, but it is challenging to implement even at large online platforms with mature experimentation systems. As a result, many real-world experiments are deliberately implemented with large batches and a handful of opportunities to update the sampling allocation as a way to reduce operational costs of experimentation.
In this work, we focus on adaptive experiments with limited adaptivity (short horizons T < 10). Bandit algorithms focusing on long-horizon settings are tailored to provide regret guarantees for each specific case, and we find they often underperform static A/B tests on practical problem instances with batched feedback, non-stationarity, multiple objectives and constraints, and personalization.
In response, we develop a mathematical programming framework for developing adaptive experimentation algorithms. Instead of the problem-specific research paradigm (akin to an optimization solver developed for a particular linear program), we ask the modeler to write down a flexible optimization formulation and use modern machine learning systems to (heuristically) solve for adaptive designs. Since a naive formulation of the adaptive experimentation problem as a dynamic program is intractable, we propose a batched view of the experimentation process. We model the uncertainty around batch-level sufficient statistics necessary to make allocation decisions, instead of attempting to model unit-level outcomes whose distributions are commonly unknown and leads to intractable dynamic programs with combinatorial action spaces.
Sequential Gaussian approximations is the main intellectual vehicle powering our mathematical programming framework. CLT-based normal approximations are universal in statistical inference, and a sequential variant we prove provides a simple optimization formulation that lends itself to modern computational tools. Through extensive empirical evaluation, we observe that even a preliminary and heuristic solution approach can provide major robustness benefits. Unlike bespoke methods (e.g., Thompson sampling variants), our mathematical programming framework provides consistent gains over static randomized control trials and exhibits robust performance across problem instances.
@article{CheJiNaWa24, title = {Optimization-Driven Adaptive Experimentation}, author = {Che, Ethan and Jiang, Daniel and Namkoong, Hongseok and Wang, Jimmy}, journal = {arXiv:2408.04570 [cs.LG]}, year = {2024}, note = {Selected for oral presentations at the Econometric Society Interdisciplinary Frontiers: Economics and AI+ML conference and Conference on Digital Experimentation}, url = {https://arxiv.org/abs/2408.04570}, }

AI-driven Decisions

AExGym: Benchmarks and Environments for Adaptive Experimentation

Jimmy Wang, Ethan Che, Daniel Jiang, and Hongseok Namkoong

arXiv:2408.04531 [cs.LG], 2024

arXiv Bib Code

@article{WangChJiNa24,
  title = {{AExGym}: Benchmarks and Environments for Adaptive Experimentation},
  author = {Wang, Jimmy and Che, Ethan and Jiang, Daniel and Namkoong, Hongseok},
  journal = {arXiv:2408.04531 [cs.LG]},
  year = {2024},
  url = {https://arxiv.org/abs/2408.04531},
}

Trustworthy AI
Exchangeable Sequence Models Quantify Uncertainty Over Latent Concepts

Naimeng Ye, and Hongseok Namkoong

arXiv:2408.03307 [stat.ML], 2024

Abs arXiv Bib

AI models are omni-present yet extrapolate in unexpected ways, posing a significant barrier to robust and fair systems. Building AI systems that can articulate their own uncertainty has been a longstanding challenge in ML, such probabilistic reasoning capability is key to bounding downside risk (e.g., delegating to human experts) and continually improving system performance by gathering data to resolve uncertainty. Despite recent advances in large language models, uncertainty quantification remains a challenge, with methods attempting to leverage these deep neural networks—such as Bayesian neural networks—frequently facing scalability limitations.
This work takes an important conceptual step towards building large-scale AI systems that can reason about uncertainty through natural language. We revisit De Finetti’s view of uncertainty coming from missing observations rather than latent parameters, which allows us to pose learning to do statistical inference as a prediction problem involving masked inputs. This formal connection between autoregressive generation with probabilistic reasoning allows pre-trained sequence models to express their epistemic uncertainty on underlying concepts, and refine their beliefs as they gather more information.
Our findings open a promising avenue for addressing uncertainty in complex, data-rich settings in a scalable way. We are excited by how this work leverages a timeless insight to inform a timely topic: guiding the next generation of AI systems.
1. As internet data depletes, the pace of progress in LLM capabilities has been widely observed to slow down (even in public media). This suggests that the limited paradigm of pre-training on passively scraped web data has reached its full potential. To move forward, the authors believe that the next generation of AI systems must be able to understand tasks on which they suffer high uncertainty, and actively gather data in order to continually improve their performance.
2. Since scalable uncertainty quantification poses a key intellectual bottleneck, we resolve this by going back to De Finetti’s insight developed in the 1920s. We believe the connection between Bayesian inference and autoregressive generation provides the groundwork for building LLMs with probabilistic reasoning capabilities.

Taken together, our work showcases how principled scientific insights have the potential to shape the design of even the largest scale AI systems.
@article{YeNa24, title = {Exchangeable Sequence Models Quantify Uncertainty Over Latent Concepts}, author = {Ye, Naimeng and Namkoong, Hongseok}, journal = {arXiv:2408.03307 [stat.ML]}, year = {2024}, url = {https://arxiv.org/abs/2408.03307}, selected = false }
Robust Causality
Constrained Learning for Causal Inference and Semiparametric Statistics

Tiffany Cai*, Yuri Fonseca*, Kaiwen Hou, and Hongseok Namkoong

arXiv:2405.09493 [stat.ML], 2024

Abs arXiv Bib

Causal inference provides the foundation of decision-making in sciences and industry alike, and our work addresses a longstanding gap between practical performance and theoretical guarantees in causal inference. Machine learning-based methods can provide a powerful way to control for confounding, and the de facto standard approach is to use debiased estimators, which enjoy guarantees like statistical efficiency and double robustness; examples include one-step estimation (i.e. augmented inverse propensity weighting (AIPW)) and targeted maximum likelihood estimation (TMLE).
However, in practice, these estimators have been observed to be unstable when there is limited overlap between treatment and control, necessitating ad hoc adjustments such as truncating propensity scores. In contrast, naive plug-in estimators using an ML model can be more stable but lack these desirable asymptotic properties. This trade-off can make it difficult to choose an estimator and ultimately, to reach a conclusion regarding the treatment effect.
We propose a novel framework that combines the best of both worlds: we derive the best plug-in estimator that is debiased, retaining the stability of plug-ins while enjoying statistical efficiency and double robustness. Our estimation framework is based on a constrained optimization problem and can incorporate flexible modern ML techniques, including controlling for text-based confounders using LLMs. Empirically, we demonstrate our approach over a range of examples, and observe that it outperforms standard debiased methods when there is limited overlap.
As low overlap settings are a persistent challenge in practice, we expect these results will be of interest to a broad spectrum of researchers, including practitioners in statistics, economics, and machine learning. We are unusually excited by how our framework provides a novel and pragmatic approach to a longstanding challenge in causal inference. By introducing an entirely new constrained optimization framework for semiparametric estimation, we hope to spur further progress in developing robust but theoretically grounded estimators.
@article{CaiFoHoNa24, title = {Constrained Learning for Causal Inference and Semiparametric Statistics}, author = {Cai$*$, Tiffany and Fonseca$*$, Yuri and Hou, Kaiwen and Namkoong, Hongseok}, journal = {arXiv:2405.09493 [stat.ML]}, year = {2024}, url = {https://arxiv.org/abs/2405.09493} }

AI-driven Decisions

Active Exploration via Autoregressive Generation of Missing Data

Tiffany Cai, Hongseok Namkoong, Daniel Russo, and Kelly Zhang

arXiv:2405.19466 [cs.LG], 2025

Selected for presentation at the Econometric Society Interdisciplinary Frontiers: Economics and AI+ML conference

arXiv Bib Slides

@article{CaiNaRuZh25,
  title = {Active Exploration via Autoregressive Generation of Missing Data},
  author = {Cai, Tiffany and Namkoong, Hongseok and Russo, Daniel and Zhang, Kelly},
  journal = {arXiv:2405.19466 [cs.LG]},
  year = {2025},
  note = {Selected for presentation at the Econometric Society
                    Interdisciplinary Frontiers: Economics and AI+ML
                    conference},
  url = {https://arxiv.org/abs/2405.19466},
}

AI-driven Decisions
Design and Scheduling of an AI-based Queueing System

Jiung Lee, Hongseok Namkoong, and Yibo Zeng

arXiv:2406.06855 [math.OC], 2024

Major revision in Management Science; Selected for presentation at SIG day 2025

Abs arXiv Bib

Recent advances in AI present significant opportunities to rethink the design of service systems with AI at the forefront. Even in the era of LLMs, managing a workforce of human agents (“servers”) is a crit- ical problem. Crowdsourcing workers are vital for aligning LLMs with human values (e.g., ChatGPT) and in many domains, the cost of human annotation is a binding constraint (e.g., medical diagnosis from radiologists). This work models and analyzes modern service systems involving human reviewers and state-of-the-art AI models. A key intellectual challenge in managing con- gestion within such service systems is endogeneity. Prediction is never the goal, and the link between predictive performance and downstream decision-making performance is not straightforward due to endogeneity. Our work crystallizes how classical tools from queueing theory provide managerial insights into the design of AI-based service systems.
@article{LeeNaZe24, title = {Design and Scheduling of an AI-based Queueing System}, author = {Lee, Jiung and Namkoong, Hongseok and Zeng, Yibo}, year = {2024}, journal = {arXiv:2406.06855 [math.OC]}, url = {https://arxiv.org/abs/2406.06855}, note = {Major revision in Management Science; Selected for presentation at SIG day 2025} }
Trustworthy AI
On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets

Jiashuo Liu*, Tianyu Wang*, Peng Cui, and Hongseok Namkoong

arXiv:2307.05284 [cs.LG], 2024

Major revision in Management Science; Conference version appeared in NeurIPS 2023

Abs arXiv Bib Video Slides Website

Different distribution shifts require different interventions, and algorithms must be grounded in the specific shifts they address. Advocating for an inductive approach to research on distributional robustness, we build an empirical testbed, "WhyShift", comprising of natural shifts across 5 tabular datasets and 60,000 model configurations encompassing imbalanced learning algorithms and distributionally robust optimization (DRO) methods. We find Y|X-shifts are most prevalent on our testbed, in stark contrast to the heavy focus on X (covariate)-shifts in the ML literature. We conduct an in-depth empirical analysis of DRO methods and find that the underlying model class (e.g., neural networks, XGBoost) and hyperparameter selection have a first-order impact in practice despite being overlooked by DRO researchers. To further bridge that gap between methodological research and practice, we design case studies that illustrate how such a refined understanding of distribution shifts can enhance both data-centric and algorithmic interventions.
@article{LiuWaCuNa24, title = {On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets}, author = {Liu$*$, Jiashuo and Wang$*$, Tianyu and Cui, Peng and Namkoong, Hongseok}, year = {2024}, journal = {arXiv:2307.05284 [cs.LG]}, url = {https://arxiv.org/abs/2307.05284}, note = {Major revision in Management Science; Conference version appeared in NeurIPS 2023}, }
AI-driven Decisions
Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches

Ethan Che, and Hongseok Namkoong

arXiv:2303.11582 [cs.LG], 2023

Major revision in Operations Research

Abs arXiv Bib Video Slides Website

Starting with my one-year stint at Meta’s adaptive experimentation team, I’ve been pondering on how bandit algorithms are largely designed by theoreticians to achieve good regret bounds and are rarely used in practice due to the difficulty of implementation and poor empirical performance. In this work, we focus on underpowered, short-horizon, and large-batch problems that typically arise in practice. We use large batch normal approximations to derive an MDP formulation for deriving the optimal adaptive design. Our formulation allows the use of computational tools for designing adaptive algorithms, a break from the existing theory-driven paradigm. Our approach significantly improves statistical power over standard methods, even when compared to Bayesian bandit algorithms (e.g., Thompson sampling) that require full distributional knowledge of individual rewards. Overall, we expand the scope of adaptive experimentation to settings that are difficult for standard methods, involving limited adaptivity, low signal-to-noise ratio, and unknown reward distributions.
@article{CheNa23, title = {Adaptive Experimentation at Scale: A Computational Framework for Flexible Batches}, author = {Che, Ethan and Namkoong, Hongseok}, year = {2023}, journal = {arXiv:2303.11582 [cs.LG]}, note = {Major revision in Operations Research}, url = {https://arxiv.org/abs/2303.11582}, }
Trustworthy AI
Diagnosing Model Performance Under Distribution Shift

Tiffany Cai, Hongseok Namkoong, and Steve Yadlowsky

Operations Research, 2025

Conference version appeared Symposium on Foundations of Responsible Computing 2023

Abs arXiv Bib Video Code Slides

Recent advances in AI present significant opportunities to rethink the design of service systems with AI at the forefront. Even in the era of LLMs, managing a workforce of human agents (“servers”) is a crit- ical problem. Crowdsourcing workers are vital for aligning LLMs with human values (e.g., ChatGPT) and in many domains, the cost of human annotation is a binding constraint (e.g., medical diagnosis from radiologists). This work models and analyzes modern service systems involving human reviewers and state-of-the-art AI models. A key intellectual challenge in managing con- gestion within such service systems is endogeneity. Prediction is never the goal, and the link between predictive performance and downstream decision-making performance is not straightforward due to endogeneity. Our work crystallizes how classical tools from queueing theory provide managerial insights into the design of AI-based service systems.
@article{CaiNaYa25, title = {Diagnosing Model Performance Under Distribution Shift}, author = {Cai, Tiffany and Namkoong, Hongseok and Yadlowsky, Steve}, year = {2025}, journal = {Operations Research}, note = {Conference version appeared Symposium on Foundations of Responsible Computing 2023}, url = {https://arxiv.org/abs/2303.02011}, }

Robust Causality

Modeling Interference via Experiment Rollout

Ari Boyarsky, Hongseok Namkoong, and Jean Pouget-Abadie

arXiv:2305.10728 [stat.ME], 2023

Conference version appeared in ACM conference on Economics and Computation

arXiv Bib Slides

@article{BoyarskyNaPo23,
  title = {Modeling Interference via Experiment Rollout},
  author = {Boyarsky, Ari and Namkoong, Hongseok and Pouget-Abadie, Jean},
  year = {2023},
  journal = {arXiv:2305.10728 [stat.ME]},
  note = {Conference version appeared in ACM conference on Economics and Computation},
  url = {https://arxiv.org/abs/2305.10728},
}

AI-driven Decisions

Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning

Hongseok Namkoong*, Samuel Daulton*, and Eytan Bakshy

arXiv:2011.14266 [stat.ML], 2024

Selected for an oral presentation at the Neurips 2020 OfflineRL Workshop

arXiv Bib

@article{NamkoongDaBa24,
  title = {Distilled Thompson Sampling: Practical and Efficient Thompson Sampling via Imitation Learning},
  author = {Namkoong$*$, Hongseok and Daulton$*$, Samuel and Bakshy, Eytan},
  journal = {arXiv:2011.14266 [stat.ML]},
  year = {2024},
  note = {Selected for an oral presentation at the Neurips 2020 OfflineRL Workshop},
  url = {https://arxiv.org/abs/2011.14266},
}

Robust Causality

Assessing External Validity via Worst-case Subpopulation Treatment Effects

Sookyo Jeong, and Hongseok Namkoong

arXiv:2007.02411 [stat.ML], 2022

Short version appeared in Conference on Learning Theory 2020

arXiv Bib Video Code Slides

@article{JeongNa22,
  title = {Assessing External Validity via Worst-case Subpopulation Treatment Effects},
  author = {Jeong, Sookyo and Namkoong, Hongseok},
  journal = {arXiv:2007.02411 [stat.ML]},
  year = {2022},
  note = {Short version appeared in Conference on Learning Theory 2020},
  url = {https://arxiv.org/abs/2007.02411},
}

Trustworthy AI

Evaluating Model Performance Under Worst-case Subpopulations

Mike Li, Hongseok Namkoong, and Shangzhou Xia

arXiv:2407.01316 [cs.LG], 2024

Major revision in Mathematics of Operations Research; Short version appeared in NeurIPS 2021

arXiv
Trustworthy AI

Minimax Optimal Estimation of Stability Under Distribution Shift

Hongseok Namkoong*, Yuanzhe Ma* , and Peter W. Glynn

arXiv:2212.06338 [stat.ML], 2025

To appear in Operations Research

arXiv Code

Trustworthy AI

Model Soups: Averaging Weights of Multiple Fine-tuned Models Improves Accuracy Without Increasing Inference Time

Mitchell Wortsman, Gabriel Ilharco, Samir Yitzhak Gadre, Rebecca Roelofs, and 7 more authors

In Proceedings of the 39th International Conference on Machine Learning , 2022

arXiv Bib Video Code

@inproceedings{WortsmanIlGaRoGoMoNaFaCaKoSc22,
  title = {Model Soups: Averaging Weights of Multiple Fine-tuned Models Improves Accuracy Without Increasing Inference Time},
  author = {Wortsman, Mitchell and Ilharco, Gabriel and Gadre, Samir Yitzhak and Roelofs, Rebecca and Gontijo-Lopes, Raphael and Morcos, Ari S and Namkoong, Hongseok and Farhadi, Ali and Carmon, Yair and Kornblith, Simon and Schmidt, Ludwig},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  year = {2022},
  url = {https://proceedings.mlr.press/v162/wortsman22a/wortsman22a.pdf},
}

Trustworthy AI

Robust Fine-tuning of Zero-shot Models

Mitchell Wortsman*, Gabriel Ilharco*, Jong Wook Kim , Mike Li, and 7 more authors

In Proceedings of the 32nd IEEE Conference on Computer Vision and Pattern Recognition , 2022

CVPR Best Paper Finalist

arXiv Bib Video Code

@inproceedings{WortsmanIlLiKiHaFaNaSc22,
  title = {Robust Fine-tuning of Zero-shot Models},
  author = {Wortsman$*$, Mitchell and Ilharco$*$, Gabriel and Kim, Jong Wook and Li, Mike and Kornblith, Simon and Roelofs, Rebecca and Gontijo-Lopes, Raphael and Hajishirzi, Hannaneh and Farhadi, Ali and Namkoong, Hongseok and Schmidt, Ludwig},
  booktitle = {Proceedings of the 32nd IEEE Conference on Computer Vision and Pattern Recognition},
  note = {CVPR Best Paper Finalist},
  year = {2022},
  url = {https://openaccess.thecvf.com/content/CVPR2022/papers/Wortsman_Robust_Fine-Tuning_of_Zero-Shot_Models_CVPR_2022_paper.pdf},
}

Robust Causality

Bounds on the Conditional and Average Treatment Effect with Unobserved Confounding Factors

Steve Yadlowsky, Hongseok Namkoong, Sanjay Basu, John Duchi, and 1 more author

Annals of Statistics, 2022

arXiv Bib Video

@article{YadlowskyNaBaDuTi22,
  title = {Bounds on the Conditional and Average Treatment Effect 
                    with Unobserved Confounding Factors},
  author = {Yadlowsky, Steve and Namkoong, Hongseok and Basu, Sanjay and Duchi, John and Tian, Lu},
  journal = {Annals of Statistics},
  volume = {50},
  number = {5},
  pages = {2587--2615},
  year = {2022},
  url = {https://projecteuclid.org/journals/annals-of-statistics/volume-50/issue-5/Bounds-on-the-conditional-and-average-treatment-effect-with-unobserved/10.1214/22-AOS2195.full},
  slide = {YadlowskyNaBaDuTi22-slides.pdf}
}

Robust Causality

Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

Hongseok Namkoong*, Ramtin Keramati*, Steve Yadlowsky*, and Emma Brunskill

In Advances in Neural Information Processing Systems 33 , 2020

Bib Code

@inproceedings{NamkoongKeYaBr20,
  title = {Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding},
  author = {Namkoong$*$, Hongseok and Keramati$*$, Ramtin and Yadlowsky$*$, Steve and Brunskill, Emma},
  booktitle = {Advances in Neural Information Processing Systems 33},
  year = {2020},
  url = {https://proceedings.neurips.cc/paper/2020/file/da21bae82c02d1e2b8168d57cd3fbab7-Paper.pdf},
  slide = {YadlowskyNaBaDuTi22-slides.pdf}
}

Trustworthy AI

Distributionally Robust Losses Against Mixture Covariate Shifts

John C. Duchi, Tatsunori Hashimoto, and Hongseok Namkoong

Operations Research, 2022

arXiv Bib Video Code

@article{DuchiHaNa22,
  author = {Duchi, John C. and Hashimoto, Tatsunori and Namkoong, Hongseok},
  title = {Distributionally Robust Losses Against Mixture Covariate Shifts},
  year = {2022},
  journal = {Operations Research},
  url = {https://pubsonline.informs.org/doi/10.1287/opre.2022.2363},
}

Trustworthy AI

Learning Models with Uniform Performance via Distributionally Robust Optimization

John C. Duchi, and Hongseok Namkoong

Annals of Statistics, 2021

arXiv Bib Video Code Slides

@article{DuchiNa21,
  author = {Duchi, John C. and Namkoong, Hongseok},
  title = {Learning Models with Uniform Performance via Distributionally
                    Robust Optimization},
  year = {2021},
  volume = {49},
  number = {3},
  pages = {1378-1406},
  journal = {Annals of Statistics},
  url = {https://projecteuclid.org/journals/annals-of-statistics/volume-49/issue-3/Learning-models-with-uniform-performance-via-distributionally-robust-optimization/10.1214/20-AOS2004.full},
}

Trustworthy AI

Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach

John C. Duchi , Peter W. Glynn, and Hongseok Namkoong

Mathematics of Operations Research, 2021

APS Best Student Paper Prize

arXiv Bib Video

@article{DuchiGlNa21,
  title = {Statistics of Robust Optimization: A Generalized Empirical
      Likelihood Approach},
  author = {Duchi, John C. and Glynn, Peter W. and Namkoong, Hongseok},
  year = {2021},
  volume = {46},
  number = {3},
  pages = {946-969},
  journal = {Mathematics of Operations Research},
  note = {APS Best Student Paper Prize},
  url = {https://pubsonline.informs.org/doi/10.1287/moor.2020.1085},
}

Trustworthy AI

Certifiable Distributional Robustness with Principled Adversarial Training

Aman Sinha*, Hongseok Namkoong*, Riccardo Volpi, and John Duchi

In International Conference on Learning Representations , 2018

Selected for a full oral presentation; 2% of submissions

arXiv Bib Video Code

@inproceedings{SinhaNaVoDu18,
  title = {Certifiable Distributional Robustness with Principled Adversarial Training},
  author = {Sinha$*$, Aman and Namkoong$*$, Hongseok and Volpi, Riccardo and Duchi, John},
  booktitle = {International Conference on Learning Representations},
  year = {2018},
  note = {Selected for a full oral presentation; 2\% of submissions},
  url = {https://arxiv.org/abs/1710.10571},
}

Trustworthy AI

Variance-based regularization with convex objectives

John C. Duchi, and Hongseok Namkoong

Journal of Machine Learning Research, 2019

Conference version won NeurIPS 2017 Best Paper Award

arXiv Bib Video Code

@article{DuchiNa19,
  title = {Variance-based regularization with convex objectives},
  author = {Duchi, John C. and Namkoong, Hongseok},
  year = {2019},
  journal = {Journal of Machine Learning Research},
  note = {Conference version won NeurIPS 2017 Best Paper Award},
  url = {https://jmlr.csail.mit.edu/papers/volume20/17-750/17-750.pdf},
  slide = {NamkoongDu17-slides.pdf},
}

Trustworthy AI

Generalizing to Unseen Domains via Adversarial Data Augmentation

Riccardo Volpi*, Hongseok Namkoong*, John Duchi, Vittorio Murino, and 1 more author

In Advances in Neural Information Processing Systems 31 , 2018

arXiv Bib Code

@inproceedings{VolpiNaSeDuMuSa18,
  title = {Generalizing to Unseen Domains via Adversarial Data Augmentation},
  author = {Volpi$*$, Riccardo and Namkoong$*$, Hongseok and Duchi, John and Murino, Vittorio and Savarese, Silvio},
  booktitle = {Advances in Neural Information Processing Systems 31},
  year = {2018},
  url = {https://proceedings.neurips.cc/paper_files/paper/2018/file/1d94108e907bb8311d8802b48fd54b4a-Paper.pdf},
}

Trustworthy AI

Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation

Mathew O’Kelly*, Aman Sinha*, Hongseok Namkoong*, John Duchi, and 1 more author

In Advances in Neural Information Processing Systems 31 , 2018

arXiv Bib

@inproceedings{OKellySiNaDuTe18,
  title = {Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation},
  author = {O'Kelly$*$, Mathew and Sinha$*$, Aman and Namkoong$*$, Hongseok and Duchi, John and Tedrake, Russ},
  booktitle = {Advances in Neural Information Processing Systems 31},
  year = {2018},
  url = {https://proceedings.neurips.cc/paper_files/paper/2018/file/653c579e3f9ba5c03f2f2f8cf4512b39-Paper.pdf},
}

Trustworthy AI

Fairness Without Demographics in Repeated Loss Minimization

Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang

In International Conference on Machine Learning , 2018

Best Paper Runner-up Award

arXiv Bib Code

@inproceedings{HashimotoSrNaLi18,
  title = {Fairness Without Demographics in Repeated Loss Minimization},
  author = {Hashimoto, Tatsunori and Srivastava, Megha and Namkoong, Hongseok and Liang, Percy},
  booktitle = {International Conference on Machine Learning},
  year = {2018},
  note = {Best Paper Runner-up Award},
  url = {https://proceedings.mlr.press/v80/hashimoto18a/hashimoto18a.pdf},
}

Adaptive sampling probabilities for non-smooth optimization

Hongseok Namkoong, Aman Sinha, Steve Yadlowsky , and John C Duchi

In International Conference on Machine Learning , 2017

Bib Code

@inproceedings{NamkoongSiYaDu17,
  title = {Adaptive sampling probabilities for non-smooth optimization},
  author = {Namkoong, Hongseok and Sinha, Aman and Yadlowsky, Steve and Duchi, John C},
  booktitle = {International Conference on Machine Learning},
  pages = {2574--2583},
  year = {2017},
  url = {https://proceedings.mlr.press/v70/namkoong17a/namkoong17a.pdf},
}

Trustworthy AI

Stochastic Gradient Methods for Distributionally Robust Optimization with f-divergences

Hongseok Namkoong , and John C. Duchi

In Advances in Neural Information Processing Systems 29 , 2016

Bib

@inproceedings{NamkoongDu16,
  author = {Namkoong, Hongseok and Duchi, John C.},
  title = {Stochastic Gradient Methods for Distributionally
    Robust Optimization with $f$-divergences},
  year = {2016},
  booktitle = {Advances in Neural Information Processing Systems 29},
  url = {https://papers.nips.cc/paper_files/paper/2016/hash/4588e674d3f0faf985047d4c3f13ed0d-Abstract.html},
  slide = {NamkoongDu16-slides.pdf}
}

research highlight

AI-driven decision-making

Research philosophy and methodology

Building a language defining distribution shifts

NeurIPS 2023 Tutorial

Selected Papers

Foundations of distributional robustness

Adaptive experimentation at scale

Auto-differentiable discrete event simulation

External validity

Unobserved confounding

Unforeseen data sparsity

References