ZJU NEWSROOM

Charting new frontiers: the journey of HUANG Rongjie in AI

2024-05-16 Global Communications

HUANG Rongjie, a 2021 master's student at the School of Software Technology, is one of only ten recipients worldwide of the ByteDance Scholarship. His research focuses on core technologies such as artificial intelligence and cross-modal computing. He has published more than ten first-author papers at CCF-A conferences and has long served as a reviewer for these top venues. During his graduate studies, he received the National Scholarship four times.

To the surprise of many, HUANG's undergraduate major had nothing to do with computer technology or software; it was ocean engineering and technology. When he first entered Zhejiang University, he, like most of his peers, was unsure what he wanted to do or who he wanted to become. The university's training system, which integrates general and specialized education and emphasizes innovation, gave him room to explore. Zhejiang University has long been committed to combining general education with professional courses and offers more than 500 general education courses. Through these courses, HUANG steadily built up his abilities and discovered his interest in programming.

By chance, HUANG joined the university's Student Artificial Intelligence Association. Inspired by the club's atmosphere, he began experimenting with robot development and gained a deeper understanding of AI. The turning point in his academic journey came in his sophomore year, when he entered the National Undergraduate Mathematical Modeling Contest. Starting from scratch on an entirely new challenge, HUANG and classmates he had met through student organizations won a provincial first prize in the national contest, and later a first prize in the American Mathematical Contest in Modeling.

In his senior year, HUANG's excellent academic record earned him a cross-disciplinary recommendation for graduate study at the School of Software Technology, where he joined Professor ZHAO Zhou's team. The team has long worked in natural language processing, computer vision, and generative artificial intelligence, with significant results. After entering the lab, HUANG chose speech and audio generation, a subfield of AI research, as his research direction. At the time, this emerging field had received little attention in China, and domestic research lagged behind international standards. That gap motivated HUANG and set him on a new journey of exploration from scratch.

Focusing on singing voice synthesis, and with guidance from senior lab members, HUANG published his first CCF-A conference paper in his senior year. He then began to think seriously about which research area to specialize in. After careful consideration, he chose to focus on speech and music generation models and multimodal language processing, technologies that play important roles in today's popular generative AI systems such as GPT-4 and Sora.

Specializing also meant tackling more complex problems. "Writing my second paper was quite challenging, and my mindset changed significantly during the process," HUANG said. Instead of retreating in the face of obstacles, he adjusted his mindset, kept improving his results, revised his paper, and resubmitted it repeatedly.

"Our mentor sets an example for us," HUANG said. "He stays late in the lab and guides us in our research. When you run short of research ideas, the entire team joins in the discussion."

In this supportive environment, HUANG published his second paper. From then on, he progressed rapidly, publishing more than ten papers in three years at top conferences such as NeurIPS, ICML, ICLR, and ACL. His open-source work on GitHub has received over 10,000 stars, and his research has been cited nearly a thousand times on Google Scholar. A central aim of his work is to lower communication barriers between speakers of different languages. He develops direct speech-to-speech translation systems, including the breakthrough technologies TranSpeech and AV-TranSpeech, which give people from diverse cultural and linguistic backgrounds a platform for sharing, understanding, and communicating, and which have been applied by companies such as Tencent and ByteDance. HUANG is now working toward AI systems that can speak and write proficiently.

In an era of continuous AI development, HUANG Rongjie and partners who share his goals remain committed to exploring new and unknown fields.


Adapted and translated from an article by YANG Luoluo, LI Anqing, ZHANG Mei, and WU Huimin

Translator: ZHUGE Jiayuan ('25, Communication)

Photo: the interviewee

Editor: HAN Xiao ('25 PhD, Education), TIAN Minjie