Code & Data

I am passionate about building empirical foundations for methodological progress. My group leads open-source projects, benchmarks, and data releases based on our research.

The GitHub pins below are the primary index of our open-source work. This page also highlights larger public releases with companion datasets or project pages.

Methodological context: "Empirical Rigor Through Benchmarking in Operations Research" describes the philosophy behind building reusable empirical artifacts.

Flagship releases

[Image: MBABench financial spreadsheet benchmark preview]

MBABench

End-to-end financial spreadsheet tasks for evaluating professional-quality agent workflows.

[Image: LatentGym Task 10 cross-task learning example]

LatentGym

A controllable testbed for cross-task experiential learning with latent structure.

[Image: SynthTools top-down tool generation overview]

SynthTools

A framework for scaling synthetic tool environments for agent development and evaluation.

[Image: BELA exploratory without context example]

In-context experiential learning

A repeated product-recommendation benchmark for studying agents that improve from interaction.

Repositories