I am a fifth-year PhD student in the Programming Research Laboratory at Northeastern University, advised by Arjun Guha. Currently, I am also working with Todd Gamblin at Lawrence Livermore National Laboratory on extending Spack's solver to better reuse already-built packages. Previously, I worked on evaluating and improving large language models for code generation. I have also worked with Steven Holtzen on building scalable abstractions for probabilistic programs. I am broadly interested in applying programming-languages techniques (especially those related to building compilers and DSLs) across the landscape of computing to make programs faster, safer, and more expressive, especially when I can build a system that others can use.
Before coming to Northeastern, I was incredibly lucky to attend Grinnell College, where I received a BA in Computer Science and Mathematics. While at Grinnell, I worked with Professor Jerod Weinman on automated alignment of historical map images to modern GIS data. I also participated in the Rutgers-DIMACS REU program, where I worked with Professor Eric Allender on proving circuit lower bounds for non-interactive statistical zero-knowledge proof protocols.
I am most easily reached by email: gouwar.j (at) northeastern (dot) edu
Bridging the Gap Between Binary and Source Based Package Management in Spack
Supercomputing 2025. Binary package managers install software quickly, but they limit configurability due to rigid ABI requirements that ensure compatibility between binaries. Source package managers provide flexibility in building software, but compilation can be slow; for example, installing an HPC code with a new MPI implementation may result in a full rebuild. Spack, a widely deployed, HPC-focused package manager, can use both source builds and pre-compiled binaries, but it lacks a binary compatibility model, so it cannot mix binaries that were not built together. We present splicing, an extension to Spack that models binary compatibility between packages and allows seamless mixing of source and binary distributions. Splicing augments Spack's packaging language and dependency resolution engine to reuse compatible binaries while maintaining the flexibility of source builds. It incurs minimal installation-time overhead and allows rapid installation from binaries, even for ABI-sensitive dependencies like MPI that would otherwise require many rebuilds. Arxiv
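To give a flavor of the packaging-language side, here is a minimal sketch of what a splice-aware package declaration might look like. The package, URL, versions, and checksums are placeholders of my own; the can_splice directive reflects the kind of ABI-compatibility annotation the splicing extension adds, not a real package's metadata.

```python
# Hypothetical package.py sketch of a splice-aware Spack package.
from spack.package import *


class Zmpi(Package):
    """Illustrative MPI implementation, used only for this sketch."""

    homepage = "https://example.com/zmpi"
    url = "https://example.com/zmpi-1.0.0.tar.gz"

    version("1.0.1", sha256="0" * 64)  # placeholder checksum
    version("1.0.0", sha256="1" * 64)  # placeholder checksum

    provides("mpi")

    # Declare that a build of this package at 1.0.1 is ABI-compatible
    # with zmpi@1.0.0, so the solver may splice it into an existing
    # dependency graph instead of rebuilding all dependents from source.
    can_splice("zmpi@1.0.0", when="@1.0.1")
```

With a declaration like this, installing a package that was built against one compatible version can reuse cached binaries rather than triggering the full rebuild a source-only resolution would require.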
Scaling Optimization Over Uncertainty via Compilation
OOPSLA 2025. Probabilistic inference is fundamentally hard, yet many tasks require optimization on top of inference, which is even harder. We present a new optimization-via-compilation strategy to scalably solve a certain class of such problems. In particular, we introduce a new intermediate representation (IR), binary decision diagrams weighted by a novel branch-and-bound semiring, that enables a scalable branch-and-bound optimization procedure. This IR automatically factorizes problems through program structure and prunes suboptimal values via a straightforward branch-and-bound-style algorithm to find optima. Additionally, the IR is naturally amenable to staged compilation, allowing the programmer to query for optima mid-compilation to inform further executions of the program. We showcase the effectiveness and flexibility of the IR by implementing two performant languages that both compile to it: dappl and pineappl. dappl is a functional language that solves maximum expected utility problems with first-class support for rewards, decision making, and conditioning. pineappl is an imperative language that performs exact probabilistic inference with support for nested marginal maximum a posteriori (MMAP) optimization via staging. Arxiv | DOI
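To give a feel for the branch-and-bound idea, here is my own toy rendering, not the paper's IR: values carry a pessimistic and an optimistic bound on achievable score, combination of independent subproblems adds bounds, choice between branches keeps the best bounds, and any branch whose optimistic bound cannot beat the incumbent is pruned.

```python
from dataclasses import dataclass


# Toy model of a branch-and-bound semiring value: an interval of
# achievable scores, combined additively in "utility space".
@dataclass(frozen=True)
class Bounds:
    lo: float  # pessimistic: a score this branch certainly achieves
    hi: float  # optimistic: the best score this branch could yield

    def choose(self, other: "Bounds") -> "Bounds":
        # "Sum": a choice between branches keeps the best of each bound.
        return Bounds(max(self.lo, other.lo), max(self.hi, other.hi))

    def combine(self, other: "Bounds") -> "Bounds":
        # "Product": independent subproblems add their scores.
        return Bounds(self.lo + other.lo, self.hi + other.hi)


def best_score(branches: list[Bounds]) -> float:
    """Branch and bound over alternatives: skip every branch whose
    optimistic bound cannot beat the incumbent lower bound."""
    incumbent = float("-inf")
    for b in sorted(branches, key=lambda b: -b.hi):  # most promising first
        if b.hi <= incumbent:
            break  # all remaining branches are dominated: prune them
        incumbent = max(incumbent, b.lo)
    return incumbent


print(best_score([Bounds(1, 3), Bounds(2, 2), Bounds(0, 1)]))  # prints 2
```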
Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs
OOPSLA 2024. Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as a building block for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages, like OCaml and Racket. These languages enjoy dedicated communities (programming languages research; finance) but are not as well represented in training data as more broadly popular languages. This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, called MultiPL-T, generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM, by translating training data from high-resource languages in the following way: 1) we use a Code LLM to synthesize tests for commented code from a high-resource language, filtering out faulty tests and code with low test coverage; 2) we use a Code LLM to translate the code to a target low-resource language, and use the tests to validate the translation. We apply this approach to generate tens of thousands of new, validated training items for Racket, OCaml, and Lua from Python. Moreover, we use an open dataset (The Stack) and model (StarCoderBase), which allow us to decontaminate benchmarks and train models on this data without violating the model license. With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase that achieve state-of-the-art performance for Racket, OCaml, and Lua on benchmark problems. For Lua, our fine-tuned model achieves the same performance on the MultiPL-E benchmarks as StarCoderBase achieves on Python, a very high-resource language. For Racket and OCaml, we double their performance on MultiPL-E, bringing them close to higher-resource languages such as Ruby and C#. The MultiPL-T approach is easy to apply to new languages and can immediately be used on any of the 18+ languages that MultiPL-E supports. Moreover, as we show, it is significantly more efficient and effective than alternatives such as training longer. Arxiv | DOI | Datasets
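Schematically, the pipeline looks something like the sketch below. The llm, run_tests, and coverage helpers are stand-ins for the actual components (in the real system, for instance, test translation leans on MultiPL-E-style compilation rather than an LLM call), and the 0.9 coverage threshold is illustrative.

```python
# Schematic of a MultiPL-T-style data-generation pipeline.

def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a Code LLM query")

def run_tests(code: str, tests: str) -> bool:
    raise NotImplementedError("stand-in for sandboxed test execution")

def coverage(code: str, tests: str) -> float:
    raise NotImplementedError("stand-in for a line-coverage measurement")

def make_training_item(python_fn: str) -> str | None:
    """Turn one commented Python function into a validated Lua training
    item, or None if any filter rejects it."""
    # 1) Synthesize tests for the high-resource source function,
    #    filtering out faulty tests and code with low test coverage.
    tests = llm(f"Write unit tests for this function:\n{python_fn}")
    if not run_tests(python_fn, tests) or coverage(python_fn, tests) < 0.9:
        return None
    # 2) Translate the function and its tests to the low-resource target
    #    language; keep the translation only if the tests still pass.
    lua_fn = llm(f"Translate this Python function to Lua:\n{python_fn}")
    lua_tests = llm(f"Translate these Python tests to Lua:\n{tests}")
    return lua_fn if run_tests(lua_fn, lua_tests) else None
```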
MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation
IEEE Transactions on Software Engineering (TSE) 2023. Large language models have demonstrated the ability to generate both natural language and programming language text. Although contemporary code generation models are trained on corpora with several programming languages, they are tested using benchmarks that are typically monolingual. The most widely used code generation benchmarks only target Python, so there is little quantitative evidence of how code generation models perform on other programming languages. We propose MultiPL-E, a system for translating unit test-driven code generation benchmarks to new languages. We use MultiPL-E to create the first massively multilingual code generation benchmark, extending the popular HumanEval (Chen et al., 2021) and MBPP (Austin et al., 2021) Python benchmarks to 18 additional languages that encompass a range of programming paradigms and popularity. Using these new parallel benchmarks, we evaluate the multi-language performance of three state-of-the-art code generation models: Codex (Chen et al., 2021), CodeGen (Nijkamp et al., 2022), and InCoder (Fried et al., 2022). We find that Codex matches or even exceeds its performance on Python for several other languages. The range of programming languages represented in MultiPL-E allows us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible, making it straightforward to evaluate new models, benchmarks, and languages. DOI | Code
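The core trick, compiling a unit test from Python into an equivalent assertion in a target language, can be illustrated in a few lines. This is a toy in the spirit of MultiPL-E's translators, not its actual code; the function names and the supported value types are my own choices.

```python
# Toy compiler from a Python test case to a Lua assertion.

def lua_literal(v) -> str:
    """Render a Python value as a Lua literal (small illustrative subset)."""
    if isinstance(v, bool):  # check bool before int: bool subclasses int
        return "true" if v else "false"
    if isinstance(v, (int, float)):
        return repr(v)
    if isinstance(v, str):
        return '"' + v.replace('"', '\\"') + '"'
    if isinstance(v, list):
        return "{" + ", ".join(lua_literal(x) for x in v) + "}"
    raise ValueError(f"unsupported type: {type(v)}")

def compile_test(fn_name: str, args: list, expected) -> str:
    """Compile `assert fn_name(*args) == expected` into Lua."""
    call = f"{fn_name}({', '.join(lua_literal(a) for a in args)})"
    return f"assert({call} == {lua_literal(expected)})"

print(compile_test("add", [2, 3], 5))  # assert(add(2, 3) == 5)
```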
Cryptographic Hardness under Projections for Time-Bounded Kolmogorov Complexity
Theoretical Computer Science 2023. A version of time-bounded Kolmogorov complexity, denoted KT, has received attention in recent years due to its close connection to circuit complexity and to the Minimum Circuit Size Problem (MCSP). Recently, some hardness results for MKTP, the problem of computing the KT complexity of a string, were proved that are not yet known to hold for MCSP. We strengthen these results, showing hardness of MKTP for the class NISZK of problems with non-interactive statistical zero-knowledge proofs under projections, an extremely restrictive class of reductions. As an application, we provide several improved worst-case to average-case reductions to problems in NP.
Deformable Part Models for Automatically Georeferencing Historical Map Images
SIGSPATIAL 2019. Libraries are digitizing their collections of maps from all eras, generating increasingly large online collections of historical cartographic resources. Aligning such maps to a modern geographic coordinate system greatly increases their utility. This work presents a method for such automatic georeferencing, matching raster image content to GIS vector coordinate data. Given an approximate initial alignment that has already been projected from a spherical geographic coordinate system to a Cartesian map coordinate system, a probabilistic shape-matching scheme determines an optimized match between the GIS contours and ink in the binarized map image. Using an evaluation set of 20 historical maps from states and regions of the U.S., the method reduces average alignment RMSE by 12%. DOI
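As a tiny illustration of the evaluation metric (my own sketch; the points and transform are made up): alignment quality is measured as the root-mean-square error between transformed GIS points and their matched locations in the map image, both in Cartesian map coordinates.

```python
import numpy as np

def align(points: np.ndarray, A: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply an affine alignment x -> A @ x + t to an N x 2 point array."""
    return points @ A.T + t

def rmse(pred: np.ndarray, truth: np.ndarray) -> float:
    """Root-mean-square error between corresponding 2D points."""
    return float(np.sqrt(np.mean(np.sum((pred - truth) ** 2, axis=1))))

gis = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # GIS vector points
ink = np.array([[0.1, 0.0], [1.1, 0.1], [0.0, 1.05]])  # matched map ink
A, t = np.eye(2), np.array([0.05, 0.02])               # candidate alignment
print(rmse(align(gis, A, t), ink))
```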