Chaojun Xiao


Contact

xcjthu [at] gmail [dot] com
xiaocj20 [at] mails [dot] tsinghua [dot] edu [dot] cn
Room 4-506, FIT Building, Tsinghua University
Beijing, 100084, China
Github
Google Scholar

About me

Hi! I am a Ph.D. student in the Department of Computer Science and Technology at Tsinghua University. I am advised by Professor Zhiyuan Liu and affiliated with the Natural Language Processing Lab (THUNLP). My research interests lie at the intersection of natural language processing, large-scale language models, and Legal AI. Before becoming a Ph.D. student, I received my bachelor's degree from Tsinghua University.

PUBLICATIONS

2024

1.   Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun. InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory. NeurIPS 2024. [pdf]
2.   Chaojun Xiao, Yutao Sun, Yuan Yao, Xu Han, Wenbin Zhang, Zhiyuan Liu, Maosong Sun. Fine-Grained Legal Argument-Pair Extraction via Coarse-Grained Pre-training. COLING 2024. [pdf]
3.   Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou. Exploring the Benefit of Activation Sparsity in Pre-training. ICML 2024. [pdf]
4.   Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun. Configurable Foundation Models: Building LLMs from a Modular Perspective. Preprint 2024. [pdf]
5.   Cheng Gao*, Chaojun Xiao*, Zhenghao Liu, Huimin Chen, Zhiyuan Liu, Maosong Sun. Enhancing Legal Case Retrieval via Scaling High-quality Synthetic Query-Candidate Pairs. EMNLP 2024.

2023

1.   Chaojun Xiao, Zhengyan Zhang, Xu Han, Chi-Min Chan, Yankai Lin, Zhiyuan Liu, Xiangyang Li, Zhonghua Li, Zhao Cao, Maosong Sun. Plug-and-play document modules for pre-trained models. ACL 2023. [pdf]
2.   Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou. Plug-and-play knowledge injection for pre-trained language models. ACL 2023. [pdf]
3.   Chaojun Xiao, Ruobing Xie, Yuan Yao, Zhiyuan Liu, Maosong Sun, Xu Zhang, Leyu Lin. UPRec: User-Aware Pre-training for Recommender Systems. AI Open. [pdf]
4.   Chaojun Xiao, Yuqi Luo, Wenbin Zhang, Pengle Zhang, Xu Han, Yankai Lin, Zhengyan Zhang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou. Variator: Accelerating pre-trained models with plug-and-play compression modules. EMNLP 2023 Findings. [pdf]
5.   Qingquan Li, Yiran Hu, Feng Yao, Chaojun Xiao, Zhiyuan Liu, Maosong Sun, Weixing Shen. MUSER: A Multi-View Similar Case Retrieval Dataset. CIKM 2023 Resources. Best Resource Paper Honorable Mention [pdf]

2022

1.   Feng Yao*, Chaojun Xiao*, Xiaozhi Wang, Zhiyuan Liu, Lei Hou, Cunchao Tu, Juanzi Li, Yun Liu, Weixing Shen, Maosong Sun. LEVEN: A Large-Scale Chinese Legal Event Detection Dataset. ACL Findings 2022.

2021

1.   Yuan Yao, Haoxi Zhong, Zhengyan Zhang, Xu Han, Xiaozhi Wang, Chaojun Xiao, Guoyang Zeng, Zhiyuan Liu, Maosong Sun. Adversarial Language Games for Advanced Natural Language Intelligence. AAAI 2021. Long paper.
2.   Yuzhong Wang, Chaojun Xiao, Shirong Ma, Haoxi Zhong, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun. Equality before the Law: Legal Judgment Consistency Analysis for Fairness. Preprint.
3.   Chaojun Xiao, Xueyu Hu, Zhiyuan Liu, Cunchao Tu, Maosong Sun. Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents. AI Open. SMP Best Paper Award.
4.   Zhengyan Zhang*, Yuxian Gu*, Xu Han*, Shengqi Chen*, Chaojun Xiao*, Zhenbo Sun, Yuan Yao, Fanchao Qi, Jian Guan, Pei Ke, Yanzheng Cai, Guoyang Zeng, Zhixing Tan, Zhiyuan Liu, Minlie Huang, Wentao Han, Yang Liu, Xiaoyan Zhu, Maosong Sun. CPM-2: Large-scale Cost-effective Pre-trained Language Models. AI Open.

2020

1.   Chaojun Xiao, Yuan Yao, Ruobing Xie, Xu Han, Zhiyuan Liu, Maosong Sun, Fen Lin, Leyu Lin. Denoising Relation Extraction from Document-level Distant Supervision. EMNLP 2020. Short paper.
2.   Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun. How Does NLP Benefit Legal System: A Summary of Legal Artificial Intelligence. ACL 2020. Theme paper.
3.   Xu Han, Tianyu Gao, Yankai Lin, Hao Peng, Yaoliang Yang, Chaojun Xiao, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou. More Data, More Relations, More Context and More Openness: A Review and Outlook for Relation Extraction. AACL 2020. Long paper.
4.   Haoxi Zhong*, Chaojun Xiao*, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun. JEC-QA: A Legal-Domain Question Answering Dataset. AAAI 2020. Long paper. (* indicates equal contribution).
5.   Zheni Zeng*, Chaojun Xiao*, Yuan Yao, Ruobing Xie, Zhiyuan Liu, Fen Lin, Leyu Lin, Maosong Sun. Knowledge Transfer via Pre-training for Recommendation: A Review and Prospect. Frontiers in Big Data. (* indicates equal contribution).

Before 2020

1.   Haoxi Zhong, Zhipeng Guo, Cunchao Tu, Chaojun Xiao, Zhiyuan Liu, Maosong Sun. Legal Judgment Prediction via Topological Learning. EMNLP 2018. Long paper.
2.   Chaojun Xiao*, Haoxi Zhong*, Zhipeng Guo, Cunchao Tu, Zhiyuan Liu, Maosong Sun, Tianyang Zhang, Xianpei Han, Zhen Hu, Heng Wang, Jianfeng Xu. CAIL2019-SCM: A Dataset of Similar Case Matching in Legal Domain. Preprint. (* indicates equal contribution).
3.   Chaojun Xiao*, Haoxi Zhong*, Zhipeng Guo, Cunchao Tu, Zhiyuan Liu, Maosong Sun, Yansong Feng, Xianpei Han, Zhen Hu, Heng Wang, Jianfeng Xu. CAIL2018: A Large-Scale Legal Dataset for Judgment Prediction. Preprint. (* indicates equal contribution).


EXPERIENCE

Ph.D. student

Department of Computer Science and Technology,
Tsinghua University, Beijing, China.
August 2020 - Present

Bachelor of Engineering

Department of Computer Science and Technology,
Tsinghua University, Beijing, China.
August 2016 - July 2020

High School

Pingchuan High School, Xingguo, Jiangxi, China.
August 2013 - July 2016

Reviewer

AAAI, WWW, COLING, ACL, EMNLP, NAACL, ACL ARR, NeurIPS.

TA

Towards Artificial General Intelligence, Tsinghua University.
2024
Object-Oriented Programming, Tsinghua University.
2020 - 2022
Media Programming, Tsinghua University.
2019 - 2021

Awards

Ph.D. Student

First-class Tencent Rhino-Bird Elite Training Program Excellent Student.
2023

Second-class Overall Excellence Scholarship, Tsinghua University.
SMP Best Paper Award.
2022

Bachelor

Excellent Graduate, Beijing.
Excellent Graduate, Tsinghua University.
Excellent Graduate, Dept. of CS&T, Tsinghua University.
2020

First-class Prize in Challenge Cup Contest, Beijing.
First-class Science and Technology Innovation Excellence Scholarship, Tsinghua University.
2019

First-class Prize in Challenge Cup Contest, Tsinghua University.
First-class Overall Excellence Scholarship, Tsinghua University.
2018

First-class Overall Excellence Scholarship, Tsinghua University.
Gaotong Scholarship, Tsinghua University.
2017