Linxi "Jim" Fan

Senior Research Scientist
Lead of AI Agents

Hello there!

I am a Senior Research Scientist at NVIDIA and Lead of the AI Agents Initiative. My mission is to build generally capable agents across physical worlds (robotics) and virtual worlds (games, simulation). I share insights about AI research and industry extensively on Twitter/X and LinkedIn. Feel free to follow me!

My research explores the bleeding edge of multimodal foundation models, reinforcement learning, computer vision, and large-scale systems. I obtained my Ph.D. at the Stanford Vision Lab, advised by Prof. Fei-Fei Li. Previously, I interned at OpenAI (w/ Ilya Sutskever and Andrej Karpathy), Baidu AI Labs (w/ Andrew Ng and Dario Amodei), and MILA (w/ Yoshua Bengio). I graduated as the Valedictorian of the Class of 2016 and received the Illig Medal at Columbia University.

I spearheaded Voyager (the first AI agent that plays Minecraft proficiently and bootstraps its capabilities continuously), MineDojo (open-ended agent learning by watching hundreds of thousands of Minecraft YouTube videos), Eureka (a 5-finger robot hand doing extremely dexterous tasks like pen spinning), and VIMA (one of the earliest multimodal foundation models for robot manipulation). MineDojo won the Outstanding Paper Award at NeurIPS 2022. My work has been widely featured in news media, including The New York Times, Forbes, MIT Technology Review, TechCrunch, WIRED, and VentureBeat.

Fun fact: I was OpenAI’s very first intern in 2016. During that summer, I worked on World of Bits, an agent that perceives the web browser in pixels and outputs keyboard/mouse control. It was way before LLMs became a thing at OpenAI. Good old times!

Featured

Research Highlights

Eureka

GPT-4 writes reward functions to teach a 5-finger robot hand how to do extremely dexterous tasks like pen spinning.

Voyager

LLM-powered agent that masters Minecraft by in-context lifelong learning.

VIMA

Multimodal LLM for robot manipulation; unifies diverse robotics tasks in a single prompting framework.

MineDojo

✨NeurIPS Outstanding Paper Award✨. Large-scale open-ended agent learning framework in Minecraft.

Media Coverage

Publications

Visit my Google Scholar page for a comprehensive listing!

VIMA: General Robot Manipulation with Multimodal Prompts

✨ Oral Presentation ✨. 1st NeurIPS Workshop on Foundation Models for Decision Making (FMDM), 2022

Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi "Jim" Fan

MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

💫✨ Outstanding Paper Award ✨💫. Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track, 2022

Linxi "Jim" Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar

Pre-Trained Language Models for Interactive Decision-Making

✨ Oral Presentation ✨. Neural Information Processing Systems (NeurIPS), 2022

Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi "Jim" Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, Yuke Zhu

MetaMorph: Learning Universal Controllers with Transformers

International Conference on Learning Representations (ICLR), 2022

Agrim Gupta, Linxi "Jim" Fan, Surya Ganguli, Li Fei-Fei

Training and Deploying Visual Agents at Scale

My Stanford Ph.D. Thesis advised by Prof. Fei-Fei Li. 132 pages in total, enjoy!

Linxi "Jim" Fan

SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

International Conference on Machine Learning (ICML), 2021

Linxi "Jim" Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Anima Anandkumar

iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021

Bokui Shen, Fei Xia, Chengshu Li, Roberto Martín-Martín, Linxi "Jim" Fan, Guanzhi Wang, Claudia Pérez-D'Arpino, Shyamal Buch, Sanjana Srivastava, Lyne P. Tchapmi, Micael E. Tchapmi, Kent Vainio, Josiah Wong, Li Fei-Fei, Silvio Savarese

RubiksNet: Learnable 3D-Shift for Efficient Video Action Recognition

European Conference on Computer Vision (ECCV), 2020

Linxi "Jim" Fan, Shyamal Buch, Guanzhi Wang, Ryan Cao, Yuke Zhu, Juan Carlos Niebles, Li Fei-Fei

SURREAL-System: Fully-Integrated Stack for Distributed Deep Reinforcement Learning

Whitepaper for SURREAL Distributed RL Framework

Linxi "Jim" Fan, Yuke Zhu, Jiren Zhu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, Li Fei-Fei

SURREAL: Open-Source Reinforcement Learning Framework and Robot Manipulation Benchmark

Conference on Robot Learning (CoRL), 2018

Linxi "Jim" Fan, Yuke Zhu, Jiren Zhu, Orien Zeng, Anchit Gupta, Joan Creus-Costa, Silvio Savarese, Li Fei-Fei

World of Bits: An Open-Domain Platform for Web-Based Agents

International Conference on Machine Learning (ICML), 2017

Tianlin Shi, Andrej Karpathy, Linxi "Jim" Fan, Jonathan Hernandez, Percy Liang

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

International Conference on Machine Learning (ICML), 2016

Baidu Silicon Valley AI Lab

Deconstructing the Ladder Network Architecture

International Conference on Machine Learning (ICML), 2016

Mohammad Pezeshki, Linxi "Jim" Fan, Philemon Brakel, Aaron Courville, Yoshua Bengio

Kernel Approximation Methods for Speech Recognition

Journal of Machine Learning Research (JMLR), 2019
International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016

Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi "Jim" Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha

Hybrid Ontology-Learning Materials Engineering System for Pharmaceutical Products

Computers & Chemical Engineering Journal, 2017
AIChE Annual Meeting, 2014

Miguel Francisco Remolona, Matthew Conway, Sriram Balasubramanian, Linxi "Jim" Fan, Ziyan Feng, Tianhao Gu, Hyungtae Kim, Prasad Nirantar, Sarah Panda, Nithin Ranabothu, Neha Rastogi, Venkat Venkatasubramanian

Experience

Research Scientist

NVIDIA

Dec 2021 – Present California

Conducting bleeding-edge research on foundation models for general-purpose autonomous agents.
Leading the MineDojo effort for open-ended agent learning in Minecraft.
Mentoring interns on diverse research topics.
Collaborating with universities: Stanford, Berkeley, Caltech, MIT, UW, etc.

Research Intern

NVIDIA

Jun 2020 – Sep 2020 California

Proposed SECANT, a state-of-the-art policy learning algorithm for zero-shot generalization of visual agents to novel environments.
Paper published at ICML 2021.

Research Intern

Google Cloud AI

Jun 2018 – Sep 2018 California

Created SURREAL, an open-source, full-stack, and high-performance distributed reinforcement learning (RL) framework for large-scale robot learning.
Paper published at CoRL 2018. Best Presentation Award finalist.

Ph.D. in Computer Science

Stanford Vision Lab

Sep 2016 – Sep 2021 California

Doctoral advisor: Prof. Fei-Fei Li.
Ph.D. Thesis “Training and Deploying Visual Agents at Scale”.

Research Intern

OpenAI

Jun 2016 – Mar 2017 California

Co-designed World of Bits, an open-domain platform for teaching AI to use the web browser. World of Bits was part of the OpenAI Universe initiative.
Paper published at ICML 2017.

Research Assistant

Mila-Quebec AI Institute

Sep 2015 – Mar 2016 Montréal, Quebec, Canada

Systematically analyzed and proposed novel variants of the Ladder Network, a strong semi-supervised deep learning technique.
Mentored by Turing Award Laureate Yoshua Bengio.
Paper published at ICML 2016.

Research Intern

Baidu Silicon Valley AI Lab

May 2015 – Sep 2015 California

Co-developed Deep Speech 2, a large-scale end-to-end system that achieved world-class performance on English and Chinese speech recognition.
Mentored by Dario Amodei, Adam Coates, and Andrew Ng.
Paper published at ICML 2016.
Deep Speech and derivative works have been featured in various media, including MIT Technology Review, TechCrunch, Forbes, NPR, and VentureBeat.

Research Assistant

Columbia University

Sep 2013 – Dec 2014 New York City

Columbia NLP Group, advised by Prof. Michael Collins. Studied kernel methods for speech recognition. Paper published in Journal of Machine Learning Research.
Columbia Vision Lab, advised by Prof. Shree Nayar. Implemented a computer vision system in MATLAB to infer astrophysics parameters from galactic images.
Columbia CRIS Lab, advised by Prof. Venkat Venkatasubramanian. Developed ML and NLP techniques to automate ontology curation for pharmaceutical engineering. Paper published in Computers & Chemical Engineering.

