Linxi "Jim" Fan

Research Scientist

NVIDIA AI

Hello there!

I am a research scientist at NVIDIA AI. My primary focus is to develop generally capable autonomous agents. To tackle this grand challenge, my research efforts span foundation models, policy learning, robotics, multimodal learning, and large-scale systems. I obtained my Ph.D. degree at Stanford Vision Lab, advised by Prof. Fei-Fei Li. Previously, I did research internships at NVIDIA, Google Cloud AI, OpenAI, Baidu Silicon Valley AI Lab, and Mila-Quebec AI Institute. I was the Valedictorian of the Class of 2016 and a recipient of the Illig Medal at Columbia University. Feel free to follow me for the latest research announcements and team updates!

News

  • Nov. 2022: MineDojo has won the 🎉 Outstanding Paper Award 🎉 at NeurIPS [announcement]! I have also been invited to speak at the 1st NeurIPS Foundation Model for Decision Making (FMDM) workshop. Please join us in New Orleans!

  • Oct. 2022: We trained a transformer called VIMA that ingests multimodal prompts and outputs controls for a robot arm. A single agent is able to solve visual goals, one-shot imitation from video, novel concept grounding, visual constraints, etc. Strong scaling with model capacity and data! We open-source everything: code, pretrained models, training dataset, and simulation benchmark. Check out our paper and website!

  • Jun. 2022: MineDojo has launched! MineDojo is a new framework for building generally capable agents with internet-scale knowledge in Minecraft. Paper, code, and databases are all open access. Check it out today!

Interests
  • Foundation Models
  • General-purpose Agents
  • Reinforcement Learning
  • Robotics
  • Multimodal Learning
  • Large-scale AI Systems
Education
  • Ph.D. in Computer Science, 2016 - 2021

    Stanford University

  • B.S. in Computer Science, 2012 - 2016

    Columbia University, Summa Cum Laude

  • Valedictorian of the Class of 2016

    Columbia University

Featured

Research Highlights

VIMA
A transformer that ingests multimodal prompts and controls a robot arm for a wide range of manipulation tasks.
MineDojo
NeurIPS Outstanding Paper Award✨. Open-ended generalist agent.
MetaMorph
Learning a universal controller over diverse morphologies with transformers.
Training and Deploying Visual Agents at Scale
My Stanford Ph.D. thesis, advised by Fei-Fei Li. "Learn the rules like a pro, so you can break them like an artist." (Picasso)

Publications

Visit my Google Scholar page for a comprehensive listing!

Pre-Trained Language Models for Interactive Decision-Making
Oral Presentation ✨. Neural Information Processing Systems (NeurIPS), 2022
iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition
European Conference on Computer Vision (ECCV), 2020
SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning
Whitepaper for SURREAL Distributed RL Framework
SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark
Conference on Robot Learning (CoRL), 2018
Deconstructing the Ladder Network Architecture
International Conference on Machine Learning (ICML), 2016
Kernel Approximation Methods for Speech Recognition
Journal of Machine Learning Research (JMLR), 2019
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016
Hybrid Ontology-Learning Materials Engineering System for Pharmaceutical Products
Computers & Chemical Engineering Journal, 2017
AIChE Annual Meeting, 2014

Experience

NVIDIA
Research Scientist
Dec 2021 – Present, California
  • Conducting bleeding-edge research on foundation models for general-purpose autonomous agents.
  • Leading the MineDojo effort for open-ended agent learning in Minecraft.
  • Mentoring interns on diverse research topics.
  • Collaborating with universities: Stanford, Berkeley, Caltech, MIT, UW, etc.

NVIDIA
Research Intern
Jun 2020 – Sep 2020, California
  • Proposed SECANT, a state-of-the-art policy learning algorithm for zero-shot generalization of visual agents to novel environments.
  • Paper published at ICML 2021.

Google Cloud AI
Research Intern
Jun 2018 – Sep 2018, California
  • Created SURREAL, an open-source, full-stack, and high-performance distributed reinforcement learning (RL) framework for large-scale robot learning.
  • Paper published at CoRL 2018. Best Presentation Award finalist.

Stanford Vision Lab
Ph.D. in Computer Science
Sep 2016 – Sep 2021, California

OpenAI
Research Intern
Jun 2016 – Mar 2017, California
  • Co-designed World of Bits, an open-domain platform for teaching AI to use the web browser. World of Bits was part of the OpenAI Universe initiative.
  • Paper published at ICML 2017.

Mila-Quebec AI Institute
Research Assistant
Sep 2015 – Mar 2016, Montréal, Quebec, Canada
  • Systematically analyzed and proposed novel variants of the Ladder Network, a strong semi-supervised deep learning technique.
  • Mentored by Turing Award Laureate Yoshua Bengio.
  • Paper published at ICML 2016.

Baidu Silicon Valley AI Lab
Research Intern
May 2015 – Sep 2015, California

Columbia University
Research Assistant
Sep 2013 – Dec 2014, New York City
  • Columbia NLP Group, advised by Prof. Michael Collins. Studied kernel methods for speech recognition. Paper published in Journal of Machine Learning Research.
  • Columbia Vision Lab, advised by Prof. Shree Nayar. Implemented a computer vision system in MATLAB to infer astrophysics parameters from galactic images.
  • Columbia CRIS Lab, advised by Prof. Venkat Venkatasubramanian. Developed ML and NLP techniques to automate ontology curation for pharmaceutical engineering. Paper published in Computers & Chemical Engineering.