Code & Data

I am passionate about building empirical foundations for methodological progress. My group leads open-source projects, benchmarks, and data releases based on our research.

The GitHub pins below are the primary index of our open-source work. This page also highlights larger public releases with companion datasets or project pages.

Methodological context: "Empirical Rigor Through Benchmarking in Operations Research" describes the philosophy behind building reusable empirical artifacts.

Flagship releases

[Image: MBABench financial spreadsheet benchmark preview]

MBABench

End-to-end financial spreadsheet tasks for evaluating professional-quality agent workflows.

Links: Paper, Blog, Code, Data, Website

[Image: LatentGym Task 10 cross-task learning example]

LatentGym

A controllable testbed for cross-task experiential learning with latent structure.

Links: Paper, Blog, Code, Data

[Image: SynthTools top-down tool generation overview]

SynthTools

A framework for scaling synthetic tool environments for agent development and evaluation.

Links: Paper, Code, Data, Website

[Image: BELA exploratory example without context]

In-context experiential learning

A repeated product-recommendation benchmark for studying agents that improve from interaction.

Links: Paper, Code, Website

Repositories

namkoong-lab/MBABench

namkoong-lab/LatentGym

namkoong-lab/OR-benchmark-experiments

namkoong-lab/interactive-benchmark

DifferentiableQueue/QueueTorch

namkoong-lab/adaptive-elicitation

namkoong-lab/QGym

namkoong-lab/PersonalLLM

namkoong-lab/LLM-Tabular-Shifts

namkoong-lab/aexgym

namkoong-lab/whyshift

namkoong-lab/disde

namkoong-lab/marginal-dro

namkoong-lab/robustopt

namkoong-lab/dro

namkoong-lab/distilled-thompson-sampling

ricvolpi/adversarial-feature-augmentation

StanfordAI4HI/off_policy_confounding

mlfoundations/wise-ft

mlfoundations/model-soups

mlfoundations/open_clip
