Research

LLM Research

I am currently working on LLM pretraining, finetuning, and retrieval on the Mosaic research team at Databricks.

As a Research Scientist at MosaicML, I was part of the team that pretrained and finetuned the open-source large language models MPT-7B, MPT-30B, and DBRX.

Some recent projects include “LoRA Learns Less and Forgets Less” with Dan Biderman, “Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws” with Nikhil Sardana, and “LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms” (NeurIPS 2023 Workshop) with Aditi Jha.

Back when the MosaicML NLP team consisted of only 9 researchers, we did some work on optimizing BERT pretraining. Here is our detailed blog post and report: “MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining” (NeurIPS 2023). We used a lot of the insights from this work to build MPT-7B and MPT-30B.

As an ML Research Intern at MosaicML, I worked on cyclic learning rate schedules for estimating training efficiency. Our work is summarized in the blog post “Efficiently Estimating Pareto Frontiers with Cyclic Learning Rate Schedules” and the workshop paper “Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates.”
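The rough idea: because each cycle of a cyclic schedule anneals the learning rate back down to a small value, a single run yields checkpoints that approximate fully trained models at several increasing compute budgets, which can be used to trace an accuracy-vs-training-time tradeoff curve. Here is a minimal PyTorch sketch under assumed hyperparameters (the cycle length, learning rates, and stand-in model are illustrative, not the configuration used in the blog post or paper):

```python
# Minimal sketch of a cyclic (cosine-with-restarts) learning rate schedule.
# All hyperparameters below are assumptions for illustration, not values from the paper.
import torch
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

model = torch.nn.Linear(10, 2)                       # stand-in for a real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

cycle_length = 1000                                  # optimizer steps per cycle (assumed)
scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=cycle_length, T_mult=1, eta_min=1e-4)

for step in range(3 * cycle_length):                 # three cycles -> three training-budget points
    loss = model(torch.randn(8, 10)).pow(2).mean()   # dummy loss standing in for the real objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (step + 1) % cycle_length == 0:
        # End of a cycle: the learning rate has annealed down to ~eta_min, so this
        # checkpoint approximates a fully trained model at a smaller compute budget.
        print(f"end of cycle at step {step + 1}, lr = {scheduler.get_last_lr()[0]:.2e}")
    scheduler.step()
```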

This talk by Jonathan Frankle gives an overview of some of MosaicML’s early research.

Brain Machine Interfaces and Biological Learning Rules

During my PhD I worked on biologically plausible learning in recurrent neural networks (RNNs), reinforcement learning (RL), and motor control with James M. Murray.

How does neural activity change during motor learning, and what does it say about the underlying mechanisms? In our NeurIPS 2022 paper “Distinguishing Learning Rules with Brain Machine Interfaces”, we derive a metric that distinguishes between learning rules by observing changes in neural activity during learning, provided the mapping from brain to behavior is known to the experimenter. Because brain-machine interface (BMI) experiments give the experimenter exactly this knowledge, we model a cursor-control BMI task with recurrent neural networks and show that learning rules can be distinguished in simulated experiments using only observations a neuroscience experimenter would plausibly have access to.
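To make the setup concrete, here is a toy sketch (illustrative only; the network size, dynamics, and decoder are assumptions, and this is not the paper's actual model, learning rules, or metric) of an RNN driving a cursor through a fixed decoder that the experimenter knows exactly, as in a BMI experiment:

```python
# Toy sketch of a cursor-control BMI setup (illustrative; not the paper's model or metric).
# The decoder D mapping neural activity to cursor velocity is fixed and known,
# mirroring the perfect brain-to-behavior knowledge available in BMI experiments.
import numpy as np

rng = np.random.default_rng(0)
n_neurons, dt = 100, 0.01

W = rng.normal(0.0, 1.0 / np.sqrt(n_neurons), (n_neurons, n_neurons))  # recurrent weights
W_in = rng.normal(0.0, 1.0, (n_neurons, 2))                            # input weights (target position)
D = rng.normal(0.0, 1.0 / np.sqrt(n_neurons), (2, n_neurons))          # known decoder: activity -> cursor velocity

def run_trial(target, steps=200):
    """Simulate one reach toward `target`; return neural activity and final cursor position."""
    x = np.zeros(n_neurons)              # neural state
    cursor = np.zeros(2)                 # cursor position
    activity = []
    for _ in range(steps):
        x = x + dt * (-x + np.tanh(W @ x + W_in @ target))
        cursor = cursor + dt * (D @ x)   # experimenter-known mapping from activity to behavior
        activity.append(x.copy())
    return np.array(activity), cursor

activity, final_pos = run_trial(target=np.array([1.0, 0.0]))
print("final cursor position:", np.round(final_pos, 3))
# Because D is known exactly, changes in `activity` across learning can be compared
# against the changes that candidate learning rules would predict.
```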

The Fly Brain

For a large part of my PhD, I worked on a project with Rudy Behnia, Larry Abbott, and Jessica Kohn on the neural computation of motion in Drosophila eyes. Our paper “Flexible filtering by neural inputs supports motion computation across states and stimuli” was published in Current Biology. Here is a Current Biology “Dispatch” that summarizes this work: “Motion vision: Pinning down motion computation in an ever-changing circuit.”

Our work is summarized in this research talk.

Some of my pre-PhD work in the Hillman Lab investigated patterns of neural activation and blood flow (i.e. neurovascular coupling) in the rodent cortex.

In a previous life, I wrote a review-style master’s thesis on superconducting qubits for quantum computing.


LLM Blog Posts

Introducing DBRX: A New State-of-the-Art Open LLM (see this fun story, “Inside the Creation of the World’s Most Powerful Open Source AI Model”)

Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs

MPT-30B: Raising the bar for open-source foundation models

LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms

Publications

  1. “LoRA Learns Less and Forgets Less” Dan Biderman, Jose Gonzalez Ortiz, Jacob Portes, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, Cody Blakeney, John P. Cunningham [preprint]

  2. “Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws” Nikhil Sardana, Jacob Portes, Sasha Doubov, Jonathan Frankle (ICML 2024) [paper]

  3. “LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms” Aditi Jha, Sam Havens, Jeremy Dohmann, Alex Trott, Jacob Portes (NeurIPS 2023 Workshop) [preprint] [website] [code]

  4. “MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining” Jacob Portes, Alexander R Trott, Sam Havens, Daniel King, Abhinav Venigalla, Moin Nadeem, Nikhil Sardana, Daya Khudia, Jonathan Frankle (NeurIPS 2023)

  5. “Distinguishing Learning Rules with Brain Machine Interfaces” Jacob P. Portes, Christian Schmid, James M. Murray (NeurIPS 2022) [preprint] [code]

  6. “Fast Benchmarking of Accuracy vs. Training Time with Cyclic Learning Rates” Jacob Portes, Davis Blalock, Cory Stephenson, Jonathan Frankle (“Has it Trained Yet?” NeurIPS 2022 Workshop) [paper] [preprint] [blogpost]

  7. “Flexible Computation in Neural Circuits” Jacob P. Portes (PhD Thesis, 2022) [dissertation]

  8. “Flexible filtering by neural inputs supports motion computation across states and stimuli” Jessica R. Kohn*, Jacob P. Portes*, Matthias P. Christenson, L.F. Abbott, Rudy Behnia (Current Biology, 2021) (*equal contribution) [article] [preprint] [code]

  9. “Resting-state hemodynamics are spatiotemporally coupled to synchronized and symmetric neural activity in excitatory neurons” Ying Ma, Mohammed A. Shaik, Mariel G. Kozberg, Sharon H. Kim, Jacob P. Portes, Dmitriy Timmerman, Elizabeth M.C. Hillman (PNAS, 2016) [article]

Master’s Thesis

  1. “Decoherence, Superconducting Qubits, and the Possibility of Scalable Quantum Computing,” supervised by Allan Blaer in the Columbia Physics department, with the kind encouragement of Anargyros Papageorgiou in the CS department
    • Abstract: Is it possible to implement a fully controllable, unambiguously quantum computer? While most in the field believe that the answer is in the affirmative, uncertainty and skepticism still exist among academics and industry professionals. In particular, decoherence is often spoken of as an insurmountable challenge. This thesis argues that there are no fundamental mathematical or physical properties that would preclude the possibility of implementing a fully controllable quantum computer using superconducting qubits. The proof is in key results from the past 30 years in math, physics, and computer science; this thesis is a sketch of these results. It begins with the well-known theoretical results that have motivated the field, namely quantum algorithmic speed-up and efficient error correction, and continues with an overview of the well-developed theory of decoherence, arguing that decoherence has been and can still be significantly reduced. These theoretical results are related to superconducting qubits throughout. The thesis concludes with a summary of recent experimental progress with superconducting qubit circuits.