Research
Amid the rapid development of AI driven by scaling laws, several trends highlight the importance of more effective and trustworthy probabilistic modeling for generative models. For example:
- Decoder-only large language models (LLMs) have become powerful thanks to capabilities that emerge from next-token distribution prediction.
- From GANs to diffusion models, new generative modeling methods make learning complex distributions easier and more scalable.
- Safety alignment methods such as RLHF, which bring the LLM output distribution closer to human values, are becoming mainstream.
- Speculative decoding improves efficiency without deviating from the LLM's original output distribution, through a well-crafted "draft & verification" mechanism.
With this in mind, I am passionate about developing novel and principled probabilistic modeling frameworks to address practical challenges in generative AI and trustworthy AI, particularly for Large (Vision-)Language Models and Diffusion Models. My research experience focuses on jointly optimizing model performance alongside factuality, efficiency, and privacy. Below are some of my research projects:
Selected Research Projects
- SLED@NeurIPS24: We propose an advanced decoding framework for large language models that combats LLM hallucination by aligning the output distribution more closely with the real-world factuality distribution.
- ARTIST@WACV25 Oral Presentation: We utilized disentangled distribution modeling to improve the ability of diffusion models to generate discontinuous distributions, such as text within images.
- H-CoT@preprint: We hijack the Chain-of-Thought safety reasoning mechanism to jailbreak large reasoning models, including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking.
- SufficientRAG@ICLR: Our work provides a new lens on LLM responses in RAG systems, centered on our notion of sufficient context.
- CoreInfer@preprint: We explore the relationship between prompt semantics and the distribution of activated neurons to enable adaptive activation-sparse inference for LLMs.
- MinKplus@ICLR: We propose a new training-data detection method for LLMs, based on the insight that training samples tend to be local maxima of the modeled distribution.
- FederatedGPT@FLFM-NeurIPS23 Oral Presentation: We pioneered instruction tuning for LLMs in a federated learning setting, marking a first attempt at data privacy protection for LLM training.
- MLLM-LLaVA-FL@WACV25: We leveraged large vision-language models, such as LLaVA, to boost federated learning, enhancing model performance without compromising data privacy.
- ReAugKD@ACL23 Oral Presentation: We utilized probabilistic modeling for knowledge distillation in pretrained language models, enhancing model efficiency and performance.
- Fed-CBS@ICML23: We developed efficient client sampling methods for federated learning, improving both performance and privacy.
- DeCL@NeurIPS22: We employed Bayesian probabilistic modeling to refine contrastive learning techniques.
- SPOS@ICML20, AISTATS20: We introduced novel Bayesian sampling methods aimed at enhancing sample quality.
- cSG-MCMC@ICLR20 Oral Presentation: We proposed cyclical SG-MCMC methods that automatically explore complex multimodal distributions, which is particularly compelling for Bayesian deep learning.