Paroma Varma

varma [dot] paroma [at] gmail [dot] com - @paroma_varma

I’m excited to be spending my time as a co-founder with the amazing team at Snorkel AI! We’re building Snorkel Flow, the data-first platform for building, managing, and monitoring end-to-end AI applications.

I graduated from the Stanford Ph.D. program, where I was advised by Prof. Christopher Ré and affiliated with the DAWN, SAIL, and StatML groups. I was supported by the Stanford Graduate Fellowship and the National Science Foundation Graduate Research Fellowship. My research focused on weak supervision: using high-level knowledge in the form of noisy labeling sources to efficiently label the massive datasets required to train machine learning models. In this context, I’m also interested in using developer exhaust, the byproducts of the data analytics pipeline, to simplify complex statistical and search-based problems.
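The weak supervision idea above can be illustrated with a toy sketch: several noisy heuristic labeling sources vote on each unlabeled example, and their (possibly conflicting or abstaining) votes are combined into a single training label. This is not the Snorkel API; the function names and the plain majority vote are simplifications chosen to keep the sketch self-contained (real systems model source accuracies and correlations instead).

```python
# Toy weak supervision: noisy labeling sources vote, majority wins.
# All names here are illustrative, not from any real library.
from collections import Counter

ABSTAIN = None  # a labeling source may decline to label an example

def lf_contains_exclamation(text):
    return "positive" if "!" in text else ABSTAIN

def lf_negative_words(text):
    return "negative" if any(w in text.lower() for w in ("bad", "awful")) else ABSTAIN

def lf_positive_words(text):
    return "positive" if any(w in text.lower() for w in ("great", "good")) else ABSTAIN

def majority_vote(text, labeling_functions):
    """Combine the non-abstaining votes of all sources on one example."""
    votes = [lf(text) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN  # no source fired on this example
    return Counter(votes).most_common(1)[0][0]

lfs = [lf_contains_exclamation, lf_negative_words, lf_positive_words]
unlabeled = ["What a great movie!", "awful plot", "it exists"]
print([majority_vote(t, lfs) for t in unlabeled])
# -> ['positive', 'negative', None]
```

The resulting probabilistic or majority-vote labels can then train any downstream model, which is the core workflow the projects below build on.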

Selected Projects

Multi-Resolution Weak Supervision for Sequential Data
NeurIPS 2019

We present a framework that applies weak supervision to multi-resolution sequential data, such as videos and time series, while modeling sequential correlations among supervision sources. We experimentally validate our approach on population-level video datasets and gait sensor data. [pdf]

Scene Graph Prediction with Limited Labels
ICCV 2019

Vincent Chen and I use weak supervision to automatically label rare visual relationships in the benchmark Visual Genome dataset. We find that spatial and categorical information is enough to generate training labels that can train state-of-the-art scene graph models. [pdf] [code]

Learning Dependency Structures for Weak Supervision Models
ICML 2019

Fred Sala and I use a robust PCA-based algorithm to learn dependency structures for weak supervision sources without using any labeled data. We take advantage of the sparsity pattern in the structure and improve on the sample complexity of existing methods. [pdf] [code]

Snuba: Automating Weak Supervision to Label Training Data
VLDB 2019

We introduce a weak supervision system that takes as input a small labeled dataset and a larger unlabeled dataset and automatically assigns training labels to the latter. This method outperforms weak supervision with user-defined heuristics and crowdsourcing in many cases. [pdf] [code]

Babble Labble: Learning from Natural Language Explanations
ACL 2018, NeurIPS 2017 DEMO

Braden Hancock and I use natural language explanations to label training data efficiently. We find that collecting explanations allows us to build high-quality training sets much faster than collecting labels alone. [pdf] [code] [blogpost] [demo video]

Efficient Model Search using Log Data

We explore how to use developer exhaust in the form of logs generated while training deep learning models to predict the performance of models with different architectures. [pdf]

Coral: Enriching Statistical Models with Static Analysis
NeurIPS 2017, NeurIPS ML4H 2017, MED-NeurIPS 2017

We introduce a weak supervision framework to efficiently label image and video training data given a small set of user-defined heuristics. We identify correlations among heuristics using static analysis and assign probabilistic labels to training data accordingly. [pdf] [blogpost] [video]

Socratic Learning: Finding Latent Subsets in Training Data
HILDA @ SIGMOD 2017, NeurIPS FILM 2016

We explore how to find latent subsets in training data that affect the behavior of weak supervision sources. We improve performance on relation extraction and sentiment analysis tasks and make these latent subsets interpretable for users. [pdf] [workshop] [blogpost] [video]



Publications

Multi-Resolution Weak Supervision for Sequential Data
Neural Information Processing Systems (NeurIPS), 2019

Scene Graph Prediction with Limited Labels
Vincent Chen, Paroma Varma, Ranjay Krishna, Michael Bernstein, Christopher Ré, and Fei-Fei Li
International Conference on Computer Vision (ICCV), 2019

Learning Dependency Structures for Weak Supervision Models
Paroma Varma+, Fred Sala+, Ann He, Alex Ratner, and Christopher Ré
International Conference on Machine Learning (ICML), 2019

Weakly supervised classification of rare aortic valve malformations using unlabeled cardiac MRI sequences
Jason Fries, Paroma Varma, Vincent Chen, Ke Xiao, Heliodoro Tejeda, Priyanka Saha, Jared Dunnmon, Henry Chubb, Shiraz Maskatia, Madalina Fiterau, Scott Delp, Euan Ashley, Christopher Ré, and James Priest
Nature Communications, 2019

Snuba: Automating Weak Supervision to Label Training Data
Paroma Varma and Christopher Ré
International Conference on Very Large Data Bases (VLDB), 2019

Training Classifiers with Natural Language Explanations
Braden Hancock, Paroma Varma, Stephanie Wang, Percy Liang, and Christopher Ré
Association for Computational Linguistics (ACL), 2018

Exploring the Utility of Developer Exhaust
Jian Zhang, Max Lam, Stephanie Wang, Paroma Varma, Luigi Nardi, Kunle Olukotun, and Christopher Ré
Workshop on Data Management for End-to-End Machine Learning (DEEM) at SIGMOD, 2018

Inferring Generative Model Structure with Static Analysis
Paroma Varma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, and Christopher Ré
Neural Information Processing Systems (NeurIPS), 2017

Automated Training Set Generation for Aortic Valve Classification
Vincent Chen, Paroma Varma, Madalina Fiterau, James Priest, and Christopher Ré
Machine Learning for Health (ML4H) Workshop at NeurIPS, 2017

Generating Training Labels for Cardiac Phase-Contrast MRI Images
Vincent Chen, Paroma Varma, Madalina Fiterau, James Priest, and Christopher Ré
Medical Imaging meets NeurIPS (MED-NeurIPS) Workshop, 2017

Augmenting Generative Models to Incorporate Latent Subsets in Training Data
Paroma Varma, Bryan He, Dan Iter, Peng Xu, Rose Yu, Christopher De Sa, and Christopher Ré

Flipper: A Systematic Approach to Debugging Training Sets
Paroma Varma, Dan Iter, Christopher De Sa, and Christopher Ré
Workshop on Human-In-the-Loop Data Analytics (HILDA) at SIGMOD, 2017

Socratic Learning
Paroma Varma, Rose Yu, Dan Iter, Christopher De Sa, and Christopher Ré
Future of Interactive Learning Machines (FILM) Workshop at NeurIPS, 2016

Efficient 3D Deconvolution Microscopy with Proximal Algorithms
Paroma Varma and Gordon Wetzstein
Computational Optical Sensing and Imaging, Imaging and Applied Optics, 2016

Nonlinear Optimization Algorithm for Partially Coherent Phase Retrieval and Source Recovery
Jingshan Zhong, Lei Tian, Paroma Varma, and Laura Waller
IEEE Transactions on Computational Imaging, 2016

Source Shape Estimation in Partially Coherent Phase Imaging with Defocused Intensity
Jingshan Zhong, Paroma Varma, Lei Tian, and Laura Waller
Computational Optical Sensing and Imaging, Imaging and Applied Optics, 2015

Design of a Domed LED Illuminator for High-Angle Computational Illumination
Zachary Phillips, Gautam Gunjala, Paroma Varma, Jingshan Zhong, and Laura Waller
Imaging Systems and Applications, 2015

In the Past

Previously, I worked on problems in computational imaging. As an undergraduate at UC Berkeley, I studied phase retrieval under partially coherent illumination and digital holography in Prof. Laura Waller’s Computational Imaging Lab. I also rotated with Prof. Gordon Wetzstein’s Computational Imaging Group, where I explored solving 3D deconvolution problems more efficiently.


At UC Berkeley, I was a teaching assistant for the first offering of EE16A: Designing Information Devices and Systems and helped develop course material for it. I was also a teaching assistant for EE20: Structure and Interpretation of Signals and Systems.

This page was generated using Jekyll and uses CSS from Kevin Burke. Icons made by Freepik from Flaticon are licensed under CC 3.0 BY.