Hello! I'm Kaijing Ma

About Me

I recently graduated from Tongji University with a Bachelor's degree in Computer Science and Technology. My name carries the meaning of lush growth and thriving energy, and I am naturally cheerful, optimistic, and curious. I love exploring diverse facets of life — from arts and literature to hands-on experiments, tinkering with robots, and building quirky projects for fun. I am always eager to embrace new challenges and learn through both collaborative teamwork and self-driven exploration.

Research Interests

My research vision focuses on transforming the abstract concept of "intelligence" into engineering systems that are constructible, measurable, and explainable. Specifically, my research interests include:

  • Natural Language Processing (NLP) — studying language understanding and generation with large-scale models.
  • Reasoning Capabilities of LLMs — evaluating, enhancing, and systematically improving how LLMs perform logical and multi-step reasoning.
  • Mechanistic Interpretability — uncovering the underlying mechanisms behind LLM reasoning processes.
  • AI Safety — designing methods to ensure robust, safe, and aligned AI behavior.
  • Embodied Intelligence — exploring how AI can interact with and learn from physical environments and robotics systems.

Education

Tongji University
B.S. in Computer Science and Technology · 2021–2025

Personal Development Timeline

  • Jun 2022 – Sep 2023 · SRIAS (Tongji): Pedestrian Re-ID & Cloud-Edge Algorithms
  • Oct 2022 – Dec 2023 · VEX Robotics Team Leader: Tjulib Library & Odometry Positioning
  • Oct 2023 – Present · MAP Open Source Community: Reasoning Benchmarks (KOR-Bench, SuperGPQA)
  • Feb 2024 – Jun 2024 · SPAR Project: AI Safety & Steganography
  • Feb 2024 – Apr 2024 · AI Safety Hungary: Alignment Trainee
  • May 2025 – Oct 2025 · ByteDance Intern: Formal Reasoning (Lean 4) & Mech Interp
  • Jun 2025 – Present · MIT CSAIL (Remote): MusicDSL & Computational Design
  • Dec 2025 – Present · Stepfun: LLM Pretrain

Research Experience

Stepfun — LLM Pretrain Intern
Dec 2025 – Present

Working on pretraining algorithms for large-scale models, with a focus on optimization, data-pipeline design, and scaling training for better performance and efficiency.

ByteDance — LLM Research Intern
May 2025 – Oct 2025

Worked on formal reasoning and automated theorem proving with large language models (LLMs), constructing large datasets for reasoning experiments and evaluation.
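For flavor, here is a toy Lean 4 goal of the kind involved in autoformalization and automated theorem proving with LLMs; it is purely illustrative and not drawn from any dataset mentioned above.

```lean
-- A toy example: state a simple arithmetic fact and close the goal
-- with an existing core lemma. Real benchmark problems are far harder.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  exact Nat.add_comm a b
```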

MIT CSAIL CDFG — Research Intern
Advised by Prof. Wojciech Matusik (Remote)
Jun 2025 – Present

Developed MusicDSL, a domain-specific language for describing musical structure, and built middleware connecting digital audio workstations (DAWs) with AI models.

MAP Open Source Community — Intern
Advised by Dr. Ge Zhang & other community mentors
Oct 2023 – Present

Focused on designing benchmarks and evaluation tools to systematically assess model reasoning capabilities, and authored detailed research reports supporting team projects and analyses.

Shanghai Institute for Intelligent Autonomous Systems — Research Assistant
Jun 2022 – Sep 2023

Developed cloud–edge pedestrian re-identification algorithms, implemented multi-level clustering methods, and authored patents.
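As a rough illustration of the multi-level clustering idea (the actual cloud-edge algorithm is not public, so the two-pass scheme and parameters below are assumptions made for the sketch):

```python
# A hedged sketch of multi-level (hierarchical) clustering over re-ID
# embeddings: a cheap coarse pass followed by a finer identity-level pass.
# Assumes: pip install numpy scikit-learn
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 128))  # stand-in for pedestrian features

# Level 1 (e.g., at the edge): a cheap coarse grouping.
coarse = AgglomerativeClustering(n_clusters=5).fit_predict(embeddings)

# Level 2 (e.g., in the cloud): refine each coarse group into
# identity-level clusters.
fine = np.empty(len(embeddings), dtype=int)
next_label = 0
for c in np.unique(coarse):
    idx = np.where(coarse == c)[0]
    if len(idx) < 2:                       # too small to re-cluster
        fine[idx] = next_label
        next_label += 1
        continue
    k = max(1, len(idx) // 10)             # ~10 samples per identity
    sub = AgglomerativeClustering(n_clusters=k).fit_predict(embeddings[idx])
    fine[idx] = sub + next_label
    next_label += k

print(f"{len(np.unique(fine))} identity clusters from 5 coarse groups")
```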

SPAR Project — Mentee
Feb 2024 – Jun 2024

Implemented a secure steganography system integrating iMEC and GPT-2.
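A minimal sketch of the general idea behind LM-based steganography: hide secret bits by letting each bit choose between the two most likely next tokens. This is a simplification for illustration only; the actual system used iMEC (iterative minimum-entropy coupling), which is substantially more involved.

```python
# Simplified LM steganography sketch (NOT the iMEC algorithm):
# each secret bit selects the rank-0 or rank-1 next token.
# Assumes: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def encode_bits(prompt: str, bits: str) -> str:
    """Hide `bits` in generated text: bit 0 picks the most likely token,
    bit 1 the second most likely, at each decoding step."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for bit in bits:
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        top2 = torch.topk(logits, k=2).indices
        next_id = top2[int(bit)].view(1, 1)
        ids = torch.cat([ids, next_id], dim=1)
    return tokenizer.decode(ids[0])

# The receiver, holding the same model and prompt, recovers each bit by
# checking whether the observed token was rank 0 or rank 1.
print(encode_bits("The weather today is", "1011"))
```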

AI Safety Hungary Course — Trainee
Feb 2024 – Apr 2024

Studied AI alignment and safety through technical readings and group discussions.

Other Experience

VEX Robotics Laboratory, Tongji University — Program Team Leader
Oct 2022 – Dec 2023

Publications

Listed in reverse chronological order (most recent first)

Scaling Latent Reasoning via Looped Language Models
Mechanistic Interpretability
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization
Co-First Author
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
Data Support
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
NeurIPS 2025 Spotlight • Game Support
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
NeurIPS 2025 • Leading Author
KOR-Bench: Benchmarking Language Models on Knowledge-Orthogonal Reasoning Tasks
ICLR 2025 • First Author
KARPA: A Training-free Method of Adapting Knowledge Graph as References for LLM’s Reasoning Path Aggregation
ACL Findings 2025 • Second Author
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models
ICLR 2025 DL4C • Co-First Author
SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval
Data Support
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series
Data Pipeline

Ongoing Long-term Project

OpSynth-MI — Operator Synthesis for Mechanistic Interpretability
2025 – Present

OpSynth-MI is a long-term research agenda that I lead, aimed at developing a systematic operator-based framework for mechanistic interpretability. Using newly defined operators, the project constructs a controllable reasoning sandbox that simulates declarative and procedural knowledge from cognitive psychology, explicitly modeling dependencies between knowledge types and exploring their compositional interactions. This line of work seeks to progressively uncover and formalize the internal mechanisms of large language models, improving transparency and our understanding of their reasoning.
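A hypothetical sketch of the operator-composition idea: define small synthetic operators, compose them into pipelines, and emit input/output pairs for a controllable sandbox. The operator names and data format below are illustrative assumptions, not the project's actual definitions.

```python
# Toy operator-composition sandbox: compose simple operators and
# generate compositional reasoning examples (illustrative only).
from typing import Callable

Operator = Callable[[int], int]

def op_double(x: int) -> int:
    """A toy 'procedural' operator."""
    return 2 * x

def op_increment(x: int) -> int:
    """Another toy procedural operator."""
    return x + 1

def compose(*ops: Operator) -> Operator:
    """Chain operators left to right, making step dependencies explicit."""
    def composed(x: int) -> int:
        for op in ops:
            x = op(x)
        return x
    return composed

# Generate a tiny synthetic dataset of compositional examples.
pipeline = compose(op_double, op_increment)  # f(x) = 2x + 1
examples = [{"input": x, "output": pipeline(x)} for x in range(5)]
print(examples)
```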

Research Vision & Concept Maps

Selected visual summaries of my research ideas, conceptual frameworks, and long-term directions.

Competition Awards

Excellence Award — 2023 CCF Software Conference Robotics Large Model and Embodied Intelligence Competition

First Prize — Professional Track 1, 2023 AI for Brain Science Collegiate Challenge

First Prize — Creative Group, 2023 Shanghai Female Student Innovation and Entrepreneurship Competition

4th Place — 2023 VEX Robotics World Championships VEX U Design Division

Design Award — 2023 China University Students Intelligent Robot Creativity Competition

Languages & Skills

Languages: Chinese (Native) | English (Fluent)

Programming: Python, C/C++, Verilog, JavaScript, HTML/CSS, Assembly

Machine Learning & Deep Learning: PyTorch, TensorFlow, HuggingFace Transformers

Large Language Models: Megatron-LM, vLLM, LLaMA-Factory (SFT), VERL (RL), NNSight (Interpretability), Ray (Distributed)

Robotics & Simulation: Mechanical Assembly, SolidWorks, 3D Printing, ROS, Sensors, Basic Control