LLM Research
I am currently working on LLM pretraining, finetuning, and retrieval on the Mosaic research team at Databricks.
As a Research Scientist at MosaicML, I was part of the team that pretrained and finetuned the open-source large language models MPT-7B, MPT-30B, and DBRX.
Some recent projects include “LoRA Learns Less and Forgets Less” with Dan Biderman, “Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws” with Nikhil Sardana, and “LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms” (NeurIPS 2023 Workshop) with Aditi Jha.
Back when the MosaicML NLP team consisted of only 9 researchers, we worked on optimizing BERT pretraining. Our detailed blog post and paper, “MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining” (NeurIPS 2023), describe this work. Many of those insights went into building MPT-7B and MPT-30B.
As an ML Research Intern at MosaicML, I worked on using cyclic learning rate schedules to efficiently estimate tradeoffs between accuracy and training time (i.e. Pareto frontiers). Our work is summarized in this blog post, “Efficiently Estimating Pareto Frontiers with Cyclic Learning Rate Schedules,” and this workshop paper, “Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates.”
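As a rough sketch of the idea (with a synthetic placeholder model and dataset, not the setup from the blog post or paper): train with a cyclic cosine learning rate schedule and record an (elapsed time, accuracy) point at the end of each cycle, so that a single training run traces out an approximate accuracy-vs-training-time Pareto frontier.

```python
# Illustrative sketch only: cyclic cosine LR schedule, with one evaluation at the
# end of each cycle to collect points on an accuracy-vs-training-time curve.
import time
import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

torch.manual_seed(0)
X = torch.randn(2048, 20)                          # synthetic features
y = (X[:, 0] > 0).long()                           # synthetic binary labels
model = nn.Linear(20, 2)                           # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9)

steps_per_cycle, num_cycles = 200, 5
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=steps_per_cycle)

pareto_points = []                                 # (seconds elapsed, accuracy) pairs
start = time.time()
for step in range(num_cycles * steps_per_cycle):
    idx = torch.randint(0, len(X), (64,))          # random minibatch
    loss = nn.functional.cross_entropy(model(X[idx]), y[idx])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                               # advances the cyclic schedule by one step
    if (step + 1) % steps_per_cycle == 0:          # end of a cycle: LR is near its minimum
        with torch.no_grad():
            acc = (model(X).argmax(dim=1) == y).float().mean().item()
        pareto_points.append((time.time() - start, acc))

print(pareto_points)                               # points tracing accuracy vs. training time
```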
This talk by Jonathan Frankle gives an overview of some of MosaicML’s early research.
Brain Machine Interfaces and Biological Learning Rules
During my PhD I worked on biologically plausible learning in recurrent neural networks (RNNs), reinforcement learning (RL), and motor control with James M. Murray.
How does neural activity change during motor learning, and what does it reveal about the underlying learning mechanisms? In our NeurIPS 2022 paper “Distinguishing Learning Rules with Brain Machine Interfaces”, we derive a metric that distinguishes between candidate learning rules from observed changes in neural activity during learning, provided the mapping from neural activity to behavior is known to the experimenter. Because brain-machine interface (BMI) experiments give the experimenter exact knowledge of this mapping, we model a cursor-control BMI task with recurrent neural networks and show that, in simulated experiments, learning rules can be distinguished using only observations that a neuroscience experimenter would plausibly have access to.
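The toy sketch below is my own drastic simplification for illustration, not the RNN model or the metric derived in the paper. It conveys the basic intuition: because the decoder mapping activity to cursor velocity is known, the observed direction of activity change can be compared against the direction each candidate learning rule predicts. Here a “gradient-like” rule feeds the cursor error back through the decoder itself, while a “random-feedback” rule feeds it back through a fixed random matrix, and a simple cosine-alignment statistic separates the two using only observable quantities.

```python
# Toy illustration (not the paper's model or derived metric): compare observed
# activity changes against the error-reducing direction predicted by the known decoder.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_out = 100, 2
W_dec = rng.standard_normal((n_out, n_neurons)) / np.sqrt(n_neurons)   # known BMI decoder
B_rand = rng.standard_normal((n_out, n_neurons)) / np.sqrt(n_neurons)  # hypothetical random feedback

def mean_alignment(rule, n_trials=500, lr=0.1):
    """Average cosine similarity between observed activity changes and the
    direction a gradient-based rule (error fed back through W_dec) would predict."""
    sims = []
    for _ in range(n_trials):
        r = rng.standard_normal(n_neurons)          # recorded neural activity
        target = rng.standard_normal(n_out)         # target cursor velocity
        err = target - W_dec @ r                    # cursor error, observable via the decoder
        feedback = W_dec if rule == "gradient" else B_rand
        delta_r = lr * (feedback.T @ err)           # "observed" change in activity under this rule
        predicted = W_dec.T @ err                   # change predicted by a gradient-based rule
        sims.append(delta_r @ predicted / (np.linalg.norm(delta_r) * np.linalg.norm(predicted)))
    return float(np.mean(sims))

print("gradient-like rule:", mean_alignment("gradient"))   # alignment near 1
print("random-feedback rule:", mean_alignment("random"))   # alignment near 0
```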
The Fly Brain
For a large part of my PhD, I worked on a project with Rudy Behnia, Larry Abbott, and Jessica Kohn on the neural computation of motion in the Drosophila visual system. Our paper “Flexible filtering by neural inputs supports motion computation across states and stimuli” was published in Current Biology. Here is a Current Biology “Dispatch” that summarizes this work: “Motion vision: Pinning down motion computation in an ever-changing circuit.”
Our work is summarized in this research talk:
Some of my pre-PhD work in the Hillman Lab investigated patterns of neural activation and blood flow (i.e. neurovascular coupling) in the rodent cortex.
In a previous life, I wrote a review-style master’s thesis on superconducting qubits for quantum computing.
Blog Posts
Introducing DBRX: A New State-of-the-Art Open LLM (see this fun story, “Inside the Creation of the World’s Most Powerful Open Source AI Model”)
Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs
MPT-30B: Raising the bar for open-source foundation models
LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms
Publications
“LoRA Learns Less and Forgets Less” Dan Biderman, Jose Gonzalez Ortiz, Jacob Portes, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, Cody Blakeney, John P. Cunningham [preprint]
“Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws” Nikhil Sardana, Jacob Portes, Sasha Doubov, Jonathan Frankle (ICML 2024) [paper]
“LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms” Aditi Jha, Sam Havens, Jeremy Dohmann, Alex Trott, Jacob Portes (NeurIPS 2023 Workshop) [preprint] [website] [code]
“MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining” Jacob Portes, Alexander R Trott, Sam Havens, Daniel King, Abhinav Venigalla, Moin Nadeem, Nikhil Sardana, Daya Khudia, Jonathan Frankle (NeurIPS 2023)
“Distinguishing Learning Rules with Brain Machine Interfaces” Jacob P. Portes, Christian Schmid, James M. Murray (NeurIPS 2022) [preprint] [code]
“Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates” Jacob Portes, Davis Blalock, Cory Stephenson, Jonathan Frankle (“Has it Trained Yet?” NeurIPS 2022 Workshop) [paper] [preprint] [blogpost]
“Flexible Computation in Neural Circuits” Jacob P. Portes (PhD Thesis, 2022) [dissertation]
“Flexible filtering by neural inputs supports motion computation across states and stimuli” Jessica R. Kohn*, Jacob P. Portes*, Matthias P. Christenson, LF Abbott, Rudy Behnia (Current Biology, 2021) (*equal contribution) [article] [preprint] [code]
“Resting-state hemodynamics are spatiotemporally coupled to synchronized and symmetric neural activity in excitatory neurons” Ying Ma, Mohammed A. Shaik, Mariel G. Kozberg, Sharon H. Kim, Jacob P. Portes, Dmitriy Timmerman, Elizabeth M.C. Hillman (PNAS, 2016) [article]