Hi, my name is Yu Zhang ([jy tʃɑŋ], 張宇/张宇 in traditional/simplified Chinese).
I am a researcher at Moonshot AI.
Kimi Linear: An Expressive, Efficient Attention Architecture
Yu Zhang, Zongyu Lin, Xingcheng Yao, Jiaxi Hu, Fanqing Meng, Chengyin Liu, Xin Men, Songlin Yang, Zhiyuan Li, Wentao Li, Enzhe Lu, Weizhou Liu, Yanru Chen, Weixin Xu, Longhui Yu, Yejie Wang, Yu Fan, Longguang Zhong, Enming Yuan, Dehao Zhang, Yizhi Zhang, TY Liu, Haiming Wang, Shengjun Fang, Weiran He, Shaowei Liu, Yiwei Li, Jianlin Su, Jiezhong Qiu, Bo Pang, Junjie Yan, Zhejun Jiang, Weixiao Huang, Bohong Yin, Jiacheng You, Chu Wei, Zhengtao Wang, Chao Hong, Yutian Chen, Guanduo Chen, Yucheng Wang, Huabin Zheng, Feng Wang, Yibo Liu, Mengnan Dong, Zheng Zhang, Siyuan Pan, Wenhao Wu, Yuhao Wu, Longyu Guan, Jiawen Tao, Guohong Fu, Xinran Xu, Yuzhi Wang, Guokun Lai, Yuxin Wu, Xinyu Zhou, Zhilin Yang, Yulun Du
Preprint
We introduce Kimi Linear, a novel attention architecture that achieves expressive power comparable to softmax attention while maintaining linear time complexity. Our approach combines the delta rule with gated slot attention to enable efficient sequence modeling. Experimental results demonstrate competitive performance on language modeling benchmarks with significantly improved inference efficiency.
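For intuition on the complexity claim, here is a minimal sketch of why a recurrent, linear-attention-style formulation decodes in constant time per token, in contrast to softmax attention's growing key-value cache. The single-head NumPy setup and unnormalized update are illustrative assumptions, not the Kimi Linear implementation.

```python
# Minimal sketch (illustrative, not the Kimi Linear implementation): a
# linear-attention decoder keeps a fixed-size state instead of a growing KV cache.
import numpy as np

d_k, d_v = 64, 64
S = np.zeros((d_k, d_v))            # fixed-size associative memory

def linear_attn_step(S, q_t, k_t, v_t):
    """O(d_k * d_v) work and memory per token, independent of sequence length.
    Unnormalized and without gating or the delta rule, for brevity."""
    S = S + np.outer(k_t, v_t)      # write: rank-one update of the state
    o_t = q_t @ S                   # read: query the state
    return S, o_t

# Toy decoding loop: per-token cost stays constant as t grows.
for t in range(8):
    q_t, k_t, v_t = np.random.randn(d_k), np.random.randn(d_k), np.random.randn(d_v)
    S, o_t = linear_attn_step(S, q_t, k_t, v_t)
```

Softmax attention, by contrast, recomputes scores against all previous keys, so both per-token cost and cache size grow with the prefix length.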
Gated Slot Attention for Efficient Linear-Time Sequence Modeling
Yu Zhang*, Songlin Yang*, Ruijie Zhu, Yue Zhang, Leyang Cui, Yiqiao Wang, Bolun Wang, Freda Shi, Bailin Wang, Wei Bi, Peng Zhou, Guohong Fu
NeurIPS 2024
Linear attention has emerged as a promising alternative to softmax attention thanks to its efficient computation and comparable performance. In this work, we propose Gated Slot Attention (GSA), a new linear attention mechanism that combines the benefits of gated linear attention and slot-based memory. GSA achieves strong performance on language modeling tasks while maintaining linear complexity with respect to sequence length.
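As a rough sketch of the bounded-memory idea (my simplified reading; the exact GSA parameterization, normalization, and multi-head details are in the paper), the model keeps a small, fixed number of key/value slots that are decayed by learned gates and read with a softmax over slots rather than over the whole sequence:

```python
# Rough sketch of a gated, bounded-slot memory (illustrative only; the exact
# GSA parameterization is described in the paper).
import numpy as np

def gsa_like_step(K_mem, V_mem, q_t, k_t, v_t, alpha_t):
    """One recurrent step.
    K_mem, V_mem: (m, d) key/value slot memories, m fixed (e.g. 64 slots)
    q_t, k_t, v_t: (d,) current query/key/value
    alpha_t:       (m,) per-slot forget gates in (0, 1)  -- assumed gating form
    """
    # Decay old slot contents and write the new token in, gated per slot.
    K_mem = alpha_t[:, None] * K_mem + (1.0 - alpha_t)[:, None] * k_t
    V_mem = alpha_t[:, None] * V_mem + (1.0 - alpha_t)[:, None] * v_t
    # Read with a softmax over the m slots, so per-token cost is O(m * d)
    # regardless of context length.
    scores = K_mem @ q_t
    attn = np.exp(scores - scores.max())
    attn = attn / attn.sum()
    o_t = attn @ V_mem
    return K_mem, V_mem, o_t
```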
Parallelizing Linear Transformers with the Delta Rule over Sequence Length
Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, Yoon Kim
NeurIPS 2024
Linear attention mechanisms offer efficient sequence modeling but lack the expressive power of softmax attention. We propose a novel parallelization scheme for linear transformers using the delta rule, enabling efficient training while maintaining the benefits of linear complexity. Our approach allows for effective modeling of sequential dependencies with significantly reduced computational cost.
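The delta rule itself is easy to state sequentially; the sketch below (my rendering, with simplified single-head shapes) shows the predict-then-correct update, while the paper's contribution is computing this recurrence in parallel across the sequence length rather than token by token.

```python
# Sequential form of the delta-rule update (illustrative shapes; the paper is
# about parallelizing this recurrence over the sequence length).
import numpy as np

def delta_rule_step(S, q_t, k_t, v_t, beta_t):
    """S: (d_k, d_v) fast-weight memory; q_t, k_t: (d_k,); v_t: (d_v,);
    beta_t: scalar write strength in (0, 1)."""
    v_pred = k_t @ S                               # what the memory currently stores for k_t
    S = S + beta_t * np.outer(k_t, v_t - v_pred)   # correct the memory by the prediction error
    o_t = q_t @ S                                  # linear-attention readout
    return S, o_t

# Equivalently, S_t = (I - beta_t * k_t k_t^T) S_{t-1} + beta_t * k_t v_t^T,
# i.e., a Householder-style rank-one transform of the previous state.
```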
Scalable MatMul-free Language Modeling
Ruijie Zhu, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, Peng Zhou, Jason K. Eshraghian
Preprint
Matrix multiplication (MatMul) dominates the computational cost of language models. We propose a scalable MatMul-free language modeling approach that eliminates expensive matrix multiplications while maintaining competitive performance. Our method enables efficient deployment on hardware with limited resources.
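One common way to remove MatMuls from dense layers, sketched here as an assumption about the general technique rather than this paper's exact recipe, is to constrain weights to {-1, 0, +1} so that a matrix-vector product reduces to additions and subtractions plus a scale:

```python
# Sketch of a ternary ("MatMul-free") linear layer: with weights in {-1, 0, +1},
# y = W x needs only additions/subtractions and one per-tensor scale.
# Illustrative only; the paper's exact quantization and architecture may differ.
import numpy as np

def ternarize(W, eps=1e-8):
    """Round weights to {-1, 0, +1} with an absmean scale (BitNet-style assumption)."""
    scale = np.abs(W).mean() + eps
    W_t = np.clip(np.round(W / scale), -1, 1)
    return W_t, scale

def ternary_linear(x, W_t, scale):
    """Approximates W @ x using only sign-selected adds/subs of x's entries."""
    pos = (W_t > 0).astype(x.dtype)
    neg = (W_t < 0).astype(x.dtype)
    # Written with matmuls for clarity; on hardware these collapse to
    # accumulations because the operands are 0/1 masks.
    return scale * (pos @ x - neg @ x)
```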
Non-autoregressive Text Editing with Copy-aware Latent Alignments
Yu Zhang*, Yue Zhang*, Leyang Cui, Guohong Fu
EMNLP 2023
Non-autoregressive text editing models offer efficient text generation by predicting edits rather than tokens. We propose copy-aware latent alignments to improve the accuracy of edit-based models by better handling copy operations. Our approach achieves state-of-the-art results on text editing benchmarks with significantly faster inference.
Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments
Yu Zhang, Qingrong Xia, Shilin Zhou, Yong Jiang, Guohong Fu, Min Zhang
COLING 2022
Semantic role labeling (SRL) is typically treated as a sequence labeling or span prediction task. We explore an alternative formulation where SRL is cast as dependency parsing, revealing latent tree structures inside semantic arguments. Our approach leverages dependency parsing techniques to capture hierarchical relationships between arguments.
Fast and Accurate End-to-End Span-based Semantic Role Labeling as Word-based Graph Parsing (Best Paper Award)
Shilin Zhou, Qingrong Xia, Zhenghua Li, Yu Zhang, Yu Hong, Min Zhang
COLING 2022
Span-based semantic role labeling has achieved great success but often requires complex pipeline architectures. We reformulate SRL as word-based graph parsing, enabling fast and accurate end-to-end processing. This work received the Best Paper Award at COLING 2022.
Fast and Accurate Neural CRF Constituency Parsing
Yu Zhang*, Houquan Zhou*, Zhenghua Li
IJCAI 2020
Constituency parsing is a fundamental NLP task requiring both accuracy and efficiency. We propose a fast and accurate neural CRF approach that achieves state-of-the-art results with significantly reduced computational cost. Our method combines neural encoding with efficient CRF inference to deliver both high accuracy and fast training and decoding.
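To make the CRF part concrete, here is a minimal inside-algorithm sketch that computes the log-partition over binary constituency trees from span scores. It is a plain O(n^3) CKY-style recursion in log space with variable names of my own choosing; the paper's batched GPU implementation and exact scoring are more involved.

```python
# Minimal inside algorithm for a span-based constituency TreeCRF (illustrative;
# the paper's batched GPU version and scoring details are more elaborate).
import numpy as np
from scipy.special import logsumexp

def log_partition(span_scores):
    """span_scores[i, j]: score of span (i, j), fencepost indices 0 <= i < j <= n.
    Returns log Z, the log-sum-exp of total scores over all binary trees."""
    n = span_scores.shape[0] - 1
    inside = np.full((n + 1, n + 1), -np.inf)
    for i in range(n):                              # width-1 spans (single words)
        inside[i, i + 1] = span_scores[i, i + 1]
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            # sum over all split points k of the two subtrees' inside scores
            splits = [inside[i, k] + inside[k, j] for k in range(i + 1, j)]
            inside[i, j] = logsumexp(splits) + span_scores[i, j]
    return inside[0, n]

# Usage: log Z for a random 5-word sentence's span scores.
lZ = log_partition(np.random.randn(6, 6))
```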
Efficient Second-Order TreeCRF for Neural Dependency Parsing
Yu Zhang, Zhenghua Li, Min Zhang
ACL 2020
Dependency parsing benefits from structured prediction with TreeCRFs, but second-order models are computationally expensive. We propose efficient algorithms for second-order TreeCRF inference that maintain accuracy while significantly reducing training time. Our approach enables practical use of rich structural features.
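For context, "second-order" here refers to scoring factors over pairs of arcs in addition to single arcs; adjacent-sibling factors are the standard choice, sketched below with my own notation rather than the paper's exact formulation.

```latex
% Illustrative second-order (adjacent-sibling) tree scoring; notation mine.
s(x, y) = \sum_{(h,m) \in y} s_{\mathrm{arc}}(h, m)
        \;+\; \sum_{(h,m,s) \in y} s_{\mathrm{sib}}(h, m, s)
% where (h, m, s) ranges over adjacent-sibling triples under the same head h.
% The TreeCRF objective is the tree log-likelihood
\log p(y \mid x) = s(x, y) - \log \sum_{y' \in \mathcal{Y}(x)} \exp\, s(x, y'),
% whose partition term is computed with an inside algorithm over projective trees.
```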
Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing? (Best Paper Award)
Houquan Zhou*, Yu Zhang*, Zhenghua Li, Min Zhang
NLPCC 2020
Part-of-speech (POS) tagging has long been considered essential for dependency parsing. We investigate whether POS tags are still necessary in the era of neural networks, providing an empirical analysis of their contribution to parsing performance. Our findings challenge the conventional wisdom; this work received the Best Paper Award at NLPCC 2020.
HLT@SUDA at SemEval-2019 Task 1: UCCA Graph Parsing as Constituent Tree Parsing
Wei Jiang, Zhenghua Li, Yu Zhang, Min Zhang
SemEval 2019
Universal Conceptual Cognitive Annotation (UCCA) provides a cross-lingual semantic representation. We approach UCCA graph parsing by reducing it to constituent tree parsing, leveraging existing parsing techniques for tree structures. Our system achieved competitive results at the SemEval-2019 shared task.