Sicong Jiang

Shot in Tamarindo, Costa Rica

Final-year PhD @ McGill | Student Researcher @ Google DeepMind

I am a PhD candidate at McGill University and a Research Intern at Google DeepMind (mentored by Jingling Li, Lucia Lopez Rivilla, and Edward Grefenstette), focusing on reward modeling and self-improving agents. Previously, I led research at Abaka AI as a Founding Scientist, where I architected and scaled data pipelines that produced mission-critical datasets for frontier AI labs.

My research addresses the fundamental challenge of building reliable AI agents through the lens of automated evaluation and reward modeling. I develop structured benchmarks and reward models—such as EditReward (ICLR 2026) and AgentThink (EMNLP 2025)—to enhance long-horizon reasoning and robustness. My goal is to create high-fidelity feedback loops that enable agents to self-improve.

📢 Actively looking for full-time Research Scientist / MLE roles starting in Fall 2026. Happy to connect!


Scholar  •   LinkedIn  •   GitHub  •   X  •   WeChat  •   Email  •   Resume

Mar 2026

🚀 Excited to join Google DeepMind as a Research Intern in London, UK.

Feb 2026

🎉 Two papers accepted by CVPR 2026 (one Main + one Findings). Check ChartNet, EgoTL.

Feb 2026

🎉 Two papers accepted by ICRA 2026. Check FASIONAD+, MTRDrive.

Jan 2026

🎉 One paper accepted by ICLR 2026. Check EditReward.

Nov 2025

🎉 One paper accepted (oral) by the Bridge Program of AAAI 2026.

Aug 2025

🎉 One paper accepted by EMNLP 2025. Check AgentThink.

Aug 2025

🤝 Joined 2077AI-Foundation—thrilled to contribute to the AI open-source community!

Jul 2025

🚀 Joined Abaka AI as a Founding Technical Member in Palo Alto, California.

Jul 2025

🎉 One paper accepted by the ICCV 2025 Foundation Models for AD Workshop. Check VLA4AD Survey.

Mar 2025

✉️ Invited to contribute to Humanity's Last Exam, an AGI reasoning benchmark.

Feb 2025

🎉 One paper accepted by the ICLR 2025 Trustworthy LLM Workshop. Check SparseAttack-LLM4TS.

Jan 2025

🎉 One paper accepted by AISTATS 2025. Check Attack-LLM4TS.

* indicates equal contribution. For the full list, visit Google Scholar.

AI Agents, Benchmarks & Evaluation

EditReward pipeline

EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing

K. Wu*, S. Jiang*, M. Ku, P. Nie, M. Liu, W. Chen
ICLR 2026
Website  •  Paper  •  GitHub ⭐ 138

AgentThink: Tool-Augmented Reasoning in VLMs for Autonomous Driving

K. Qian*, S. Jiang*, Y. Zhong*, Z. Luo, Z. Huang, et al.
EMNLP 2025
Website  •  Paper  •  GitHub ⭐ 142

EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks

L. Liu, D. Li, Y. Liang, S. Jiang, H. Vijay, H. Hu, et al.
CVPR 2026 Findings
Website

Foundation Models: Robustness, Safety & Applications

A Survey on Vision–Language–Action Models for Autonomous Driving

S. Jiang*, Z. Huang*, K. Qian*, Z. Luo, T. Zhu, et al.
ICCV Workshop, 2025
Paper  •  GitHub ⭐ 579  •  Tech Channel Report

Attack-LLM4TS

Adversarial Vulnerabilities in Large Language Models for Time Series Forecasting

F. Liu*, S. Jiang*, L. Miranda-Moreno, S. Choi, L. Sun
AISTATS 2025
Paper  •  GitHub ⭐ 15

FASIONAD+ framework

FASIONAD+: Enhanced Safety in Autonomous Driving with Adaptive Feedback

Z. Luo*, S. Jiang*, K. Qian*, Z. Huang, J. Miao, et al.
ICRA 2026
Paper

MTRDrive: Memory-Tool Synergistic Reasoning for Robust Autonomous Driving in Corner Cases

Z. Luo*, K. Qian*, J. Wang, Y. Luo, J. Miao, Z. Fu, Y. Wang, S. Jiang, Z. Huang, et al.
ICRA 2026
Paper

Communication-Aware RL

Communication-Aware Reinforcement Learning for Cooperative Adaptive Cruise Control

S. Jiang, S. Choi, L. Sun
TRB Annual Meeting (Oral), 2024
Paper

Research Intern
Mar 2026 – Present · London, UK

LLM-as-Judge for Open-ended Tasks: Researching rubric-based LLM evaluators for open-ended outputs and trajectories, and systematically probing judge failure modes to support trustworthy agent self-improvement.

Self-evolving Agent: Building agents that iteratively improve their policies via self-refinement loops, with a focus on reliable feedback signals and long-horizon behavior.

Director of Research
Aug 2025 – Feb 2026 · Palo Alto, CA, United States

Research: As a founding member of the Research team, I led benchmarking and evaluation for agentic and multimodal LLMs. I led the EditReward (ICLR'26) project and co-developed large-scale benchmarks including SuperGPQA (NeurIPS'25), ChartNet (CVPR'26), EgoTL (CVPR'26), and VeriWeb.

Advanced Dataset & Pipeline Design: Led several zero-to-one pipeline builds, architecting and deploying high-difficulty dataset solutions and production pipelines from scratch across coding, IMO-level math, multimodal data, agentic trajectories, and RL environments. Multiple frontier AI labs use these datasets and pipelines directly for model training and evaluation.

Core Contributor
Aug 2025 – Mar 2026

As a core contributor, conducted substantial research across benchmarks, datasets, and agent evaluation for the open-source community.

Agent Evaluation: Led research on agent evaluation and training datasets, focusing on long-horizon reasoning, tool use, and self-evolving agent capabilities.

Multimodal Image Datasets: Led multimodal dataset research for image generation, including preference data and evaluation frameworks for alignment and controllability.

Applied Scientist Intern
May 2025 – Aug 2025

Multimodal Data Pipelines: Built data pipelines and multi-stage QA systems for multimodal LLM projects, overseeing large-scale annotation workflows and label consistency.

Dataset Quality & Validation: Conducted analysis and validation to refine annotations and ensure robust datasets for LLM post-training.

Research Assistant
Jan 2022 – May 2025 · Montreal, QC, Canada

AgentThink (Agent Reasoning): Led a collaboration with Xiaomi and Tsinghua on tool-augmented reasoning for vision-language models in autonomous driving, achieving +54% answer accuracy on open-source models.

Adversarial LLM4TS: Developed a black-box attack framework and public benchmarks for LLM-based time-series forecasting, in collaboration with the Amazon Chronos and Nixtla teams.

Research Assistant
Aug 2019 – Dec 2020 · Atlanta, GA, United States

Multi-Agent RL Exploration: Developed a multi-agent search strategy combining MADDPG with frontier-based exploration, and built evaluation benchmarks for exploration efficiency.

Awards

2024

McGill Engineering Doctoral Award (MEDA)

2021

TISED Doctoral Recruitment Award (DRA), McGill University

2019

Outstanding Graduate of Liaoning Province; Most Influential Graduate, Northeastern University

2017

National 1st Prize, China Undergraduate Mathematical Contest in Modeling

2017

1st Class Academic Scholarship, Northeastern University

Academic Service

Workshop Organizer

Conference Reviewer

  • Conference on Neural Information Processing Systems (NeurIPS)
  • International Conference on Learning Representations (ICLR)
  • International Conference on Artificial Intelligence and Statistics (AISTATS)
  • IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
  • International Conference on Computer Vision (ICCV)
  • Conference on Language Modeling (COLM)
  • Conference on Empirical Methods in Natural Language Processing (EMNLP)
  • AAAI Conference on Artificial Intelligence (AAAI)
  • IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
  • IEEE International Conference on Robotics and Automation (ICRA)
  • IEEE Intelligent Transportation Systems Conference (ITSC)

Journal Reviewer

  • IEEE Robotics and Automation Letters (RA-L)
  • Transportation Research Part C: Emerging Technologies (TRC)
  • IEEE Transactions on Intelligent Transportation Systems (T-ITS)

I enjoy music by Tyler, the Creator, SZA and Chappell Roan.

Sometimes I also listen to Taylor Swift, Olivia Rodrigo and 9m88.

My favorite influencer is Allywoo on RedNote.

Cat: Bobo, a golden shaded British Shorthair who is good at programming with buttons.