Bio

I am Fan LIU, a graduate student at HKUST(GZ). My research focuses on LLM agents for data science, particularly code generation. Broadly, I study how to build autonomous data science agents that can perceive and reason over data, make effective modeling decisions, and operate reliably in real-world environments. My work spans three main directions:

Data-centric: methods that improve how agents perceive, interpret, reason over, and adapt to data, providing the perceptual foundation for autonomous data science agents. Representative examples include multimodal data perception for scientific discovery ([ICLR 2026] Towards Multimodal Data-Driven Scientific Discovery Powered by LLM Agents) and data governance mechanisms such as unlearning in graph-structured environments ([WWW 2025] Subgraph Federated Unlearning, OpenReview).
Model-centric: methods that strengthen the modeling, reasoning, and decision-making capabilities of autonomous data science agents. Representative examples include agentic mathematical modeling ([NeurIPS 2025] MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem, arXiv, Code) and improving inference-time computation for LLM reasoning ([NeurIPS 2025] Bag of Tricks for Inference-time Computation of LLM Reasoning, arXiv, Code).
Systems-centric: infrastructure that enables autonomous data science agents to operate reliably in real-world environments, including execution infrastructure for data and environment interaction, control infrastructure for long-horizon task execution, and platform infrastructure for scalable training, deployment, and governance. A representative system is DSLighting, an end-to-end data science agent platform (PyPI).

I have published 10+ papers at top-tier venues, including ICLR, NeurIPS, KDD, WWW, ECML PKDD, ICMLW, and TFS, with over 1,000 citations overall. For more details, please refer to [Google Scholar]. If you are interested in my research, feel free to reach out for discussions, collaborations, internship opportunities, or related inquiries.

We recently organized our data agent research into DataNova, a family of autonomous and self-evolving data-science agents for real-world mathematical modeling, multimodal scientific discovery, and end-to-end data analysis.

I am on the job market for postdoctoral and industry positions.

Email: liufanuestc AT DOT com

Latest Blog

What Are AI Data Scientists?
A short bilingual blog post introducing the definition, use cases, differences from traditional AutoML, core framework, challenges, and representative benchmarks of AI Data Scientists.

Read the latest blog

Selected Works

Data Agent Project Family

DataNova

DataNova collects our recent work on autonomous and self-evolving data-science agents, including MM-Agent, multimodal scientific discovery, EvoDS, DSWorld, foundation models for scientific discovery, and DSLighting.

[Project], [MM-Agent], [Multimodal Discovery], [EvoDS], [DSWorld]

NeurIPS 2025

MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

MM-Agent is an LLM agent framework for real-world mathematical modeling. It decomposes open-ended modeling into problem analysis, model formulation, computational solving, and report generation, enabling end-to-end solutions for real-world mathematical modeling tasks.

[Paper], [Code], [Demo] (606 stars)

Data Science Agent Harness

DSLighting

DSLighting is an LLM-driven autonomous data science execution engine that turns task descriptions and datasets into iterative code generation, execution, evaluation, and refinement workflows.

[Code], [Docs] (48 stars)

Recent Works

(* Equal contribution)

[arXiv] Zherui Yang, Fan Liu, Hao Liu*. DSWorld: A Data Science World Model for Efficient Autonomous Agents. arXiv, 2026. [arXiv], [pdf], [Code]
[KDD] Zherui Yang, Fan Liu, Yansong Ning and Hao Liu*. EvoDS: Self-Evolving Autonomous Data Science Agent with Capability Learning and Context Management. In Proceedings of the 32nd SIGKDD Conference on Knowledge Discovery and Data Mining, Jeju, South Korea, 2026. [arXiv], [Code] (CCF A)
[ICLR] Fan Liu, Xiaozhao Zeng and Hao Liu. Towards Multimodal Data-Driven Scientific Discovery Powered by LLM Agents. In Proceedings of the Fourteenth International Conference on Learning Representations, Rio de Janeiro, Brazil, 2026. [OpenReview]
[NeurIPS] Fan Liu, Jindong Han, Tengfei Lyu, Weijia Zhang, Zhe-Rui Yang, Lu Dai, Cancheng Liu, Hao Liu, Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition, NeurIPS, 2025. [pdf], [Project] (CCF A, position paper, acceptance rate ~6%)
[NeurIPS] Fan Liu*, Zherui Yang*, Cancheng Liu, Tianrui Song, Xiaofeng Gao, Hao Liu, MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem, NeurIPS, 2025. [OpenReview], [pdf], [Code], [Demo] (CCF A) 🔥🚀 Our MM-Agent system assisted two undergraduate teams that won the F Award in the 2025 MCM/ICM (top 2.0% among 27,456 human teams)
[NeurIPS] Fan LIU, Wenshuo Chao, Naiqiang Tan, Hao Liu, Bag of Tricks for Inference-time Computation of LLM Reasoning, NeurIPS D&B, 2025. [OpenReview], [pdf], [Code] (CCF A)
[WWW] Fan LIU, Hao Liu, Subgraph Federated Unlearning, WWW, 2025. [DOI], [OpenReview] (CCF A, Oral)
[arXiv] Fan LIU, Yue Feng, Zhao Xu, Lixin Su, Xinyu Ma, Dawei Yin, Hao Liu, JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework, arXiv, 2024. [Project Page], [OpenReview], [pdf], [Code], [Dataset], [Model], [Coverage] 🔥🚀 Model: 6,000+ downloads
[NeurIPS] Zhao Xu, Fan LIU, Hao Liu, Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs, NeurIPS D&B, 2024. [pdf], [Code], [Coverage] (CCF A)
[KDD] Fan LIU, Weijia Zhang, Hao Liu, Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic Adversarial Training, KDD, 2023. [arXiv] (CCF A)
[NeurIPS] Fan LIU, Hao Liu, Wenzhao Jiang, Practical Adversarial Attacks on Spatiotemporal Traffic Forecasting Models, NeurIPS, 2022. [pdf], [Blog], [Code] (CCF A)

Education and Experience

2022: Graduate student at HKUST(GZ)
2021: Intern at HKUST(GZ)
2020: Intern at MSRA (StarBridge Program)
2020: B.S. from UESTC
2019: Research visit at UBC

Awards, Acknowledgements, and Services

Conference reviewer: ICLR 2024-2025, NeurIPS 2023-2024, KDD 2023-2025, WWW 2025, AISTATS 2025, AdvML-Frontiers (ICML 2023 Workshop), FL4Data-Mining (KDD 2023 Workshop)
Journal reviewer: ITS, Transactions on SMC: Systems, Physica A, TFS, TII
TPC member: FL4Data-Mining (KDD 2023 Workshop)
KDD Student Travel Award (2023)
RBM Student Travel Grant (2023)
Outstanding Undergraduate Thesis Award
Outstanding Undergraduate Student
Excellent Student Scholarship (2017-2020)