About
I am a senior algorithm expert in the multi-modality group at Rhymes.AI, where I lead the multi-modal generation direction. Before that, I was a senior researcher in the Multi-Modal Computing group at Microsoft Research Asia. We are hiring interns and researchers to work on content creation (diffusion-based generation and image/video manipulation) and low-level vision (image/video restoration and enhancement). If you are interested, feel free to email me with your CV.
I received my Ph.D. degree in computer science in 2019 from Shanghai Jiao Tong University, under the supervision of Dr. Baining Guo and Prof. Minyi Guo. During my Ph.D., I worked closely with Dr. Baoyuan Wang at Microsoft Research Asia. Before that, I received my B.S. in computer science in 2014 from Shanghai Jiao Tong University, under the supervision of Prof. Yong Yu and Prof. Minyi Guo.
My research interests include GAN/diffusion-based content creation and image/video restoration and enhancement. I have published about 20 papers at top international CV/AI conferences such as CVPR, ICCV, ECCV, and NeurIPS.
News
- [2024.11.18] One paper is accepted by TMLR.
- [2024.10.02] Our Allegro text-to-video model is released.
- [2024.09.27] Appointed as Associate Editor for TMM.
- [2024.07.31] One paper is accepted by ToMM.
- [2024.06.30] One paper is accepted by TMLR.
- [2024.02.27] One paper is accepted by CVPR.
- [2024.02.15] One paper is accepted by TIP.
- [2024.01.19] One paper is accepted by CHI.
- [2024.01.16] One paper is accepted by ICLR.
Publications
-
AnyV2V: A Plug-and-Play Framework for Any Video-to-Video Editing Tasks
Max Ku, Cong Wei, Weiming Ren, Huan Yang, Wenhu Chen
-
Allegro: Open the Black Box of Commercial-Level Video Generation Model
Yuan Zhou, Qiuyue Wang, Yuxuan Cai, Huan Yang
-
Prompt-Based Modality Bridging for Unified Text-to-Face Generation and Manipulation
Yiyang Ma, Haowei Kuang, Huan Yang, Jianlong Fu, Jiaying Liu
ToMM [paper]
-
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
Huiguo He, Huan Yang, Zixi Tuo, Yuan Zhou, Qiuyue Wang, Yuhang Zhang, Zeyu Liu, Wenhao Huang, Hongyang Chao, Jian Yin
Preprint [paper]
-
ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation
Weiming Ren, Huan Yang, Ge Zhang, Cong Wei, Xinrun Du, Wenhao Huang, Wenhu Chen
-
Zero-Reference Low-Light Enhancement via Physical Quadruple Priors
Wenjing Wang, Huan Yang, Jianlong Fu, Jiaying Liu
-
Online Streaming Video Super-Resolution with Convolutional Look-Up Table
Guanghao Yin, Zefan Qu, Xinyang Jiang, Shan Jiang, Zhenhua Han, Ningxin Zheng, Xiaohong Liu, Huan Yang, Yuqing Yang, Dongsheng Li, Lili Qiu
TIP [paper]
-
Examining Human Perception of Generative Content Replacement in Image Privacy Protection
Anran Xu, Shitao Fang, Huan Yang, Simo Hosio, Koji Yatani
CHI 2024 [paper]
-
Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution
Yiyang Ma, Huan Yang, Wenhan Yang, Jianlong Fu, Jiaying Liu
ICLR 2024 [paper]
-
MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text
Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo
ACM Multimedia 2023 Technical Demos and Videos [paper]
-
Learning Profitable NFT Image Diffusions via Multiple Visual-Policy Guided Reinforcement Learning
Huiguo He, Tianfu Wang, Huan Yang, Jianlong Fu, Nicholas Jing Yuan, Jian Yin, Hongyang Chao, Qi Zhang
ACM Multimedia 2023 [paper]
-
MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images
Junchen Zhu, Huan Yang, Huiguo He, Wenjing Wang, Zixi Tuo, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu
ACM Multimedia 2023 Brave New Ideas [paper]
-
VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation
Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu, Jiaying Liu
-
NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation
Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan
ACL 2023 Oral [paper]
-
Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution
Zixi Tuo, Huan Yang, Jianlong Fu, Yujie Dun, Xueming Qian
-
Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation
Yiyang Ma, Huan Yang, Wenjing Wang, Jianlong Fu, Jiaying Liu
Preprint [paper]
-
Learning Degradation-Robust Spatiotemporal Frequency-Transformer for Video Super-Resolution
Zhongwei Qiu, Huan Yang, Jianlong Fu, Daochang Liu, Chang Xu, Dongmei Fu
-
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo
-
Fine-Grained Image Style Transfer with Visual Transformers
Jianbo Wang, Huan Yang, Jianlong Fu, Toshihiko Yamasaki, Baining Guo
-
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu
-
Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution
Zhongwei Qiu, Huan Yang, Jianlong Fu, Dongmei Fu
-
AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation
Yiyang Ma, Huan Yang, Bei Liu, Jianlong Fu, Jiaying Liu
-
4D LUT: Learnable Context-Aware 4D Lookup Table for Image Enhancement
Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian
TIP [paper]
-
Rethinking Image and Video Restoration: An Industrial Perspective
Huan Yang
CTSoc-NCT [paper]
-
Language-Guided Face Animation by Recurrent StyleGAN-based Generator
Tiankai Hang, Huan Yang, Bei Liu, Jianlong Fu, Xin Geng, Baining Guo
TMM [paper]
-
Online Video Super-Resolution with Convolutional Kernel Bypass Graft
Jun Xiao, Xinyang Jiang, Ningxin Zheng, Huan Yang, Yifan Yang, Yuqing Yang, Dongsheng Li, Kin-Man Lam
TMM [paper]
-
Degradation-Guided Meta-Restoration Network for Blind Super-Resolution
Fuzhi Yang, Huan Yang, Yanhong Zeng, Jianlong Fu, Hongtao Lu
Preprint [paper]
-
Learning Trajectory-Aware Transformer for Video Frame Interpolation
Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian
TIP [paper]
-
Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions
Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo
-
Learning Trajectory-Aware Transformer for Video Super-Resolution
Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian
-
Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers
Yanhong Zeng, Huan Yang, Hongyang Chao, Jianbo Wang, Jianlong Fu
NeurIPS 2021 [paper]
-
Learning Fine-Grained Motion Embedding for Landscape Animation
Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo
ACM Multimedia 2021 Oral [paper]
-
Domain-Aware Universal Style Transfer
Kibeom Hong, Seogkyu Jeon, Huan Yang, Jianlong Fu, Hyeran Byun
-
Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment
Heliang Zheng, Huan Yang, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo
-
Learning Texture Transformer Network for Image Super-Resolution
Fuzhi Yang, Huan Yang, Jianlong Fu, Hongtao Lu, Baining Guo
-
Personalized Exposure Control Using Adaptive Metering and Reinforcement Learning
Huan Yang, Baoyuan Wang, Noranart Vesdapunt, Minyi Guo, Sing Bing Kang
TVCG [paper]
-
Unsupervised Extraction of Video Highlights via Robust Recurrent Auto-Encoders
Huan Yang, Baoyuan Wang, Stephen Lin, David Wipf, Minyi Guo, Baining Guo
ICCV 2015 [paper]
Projects
-
Allegro Text-to-Video Generation Model [project]
We released our text-to-video generation model, Allegro, which aims to open the black box of commercial-level video generation models. Allegro is a powerful text-to-video model that generates high-quality videos of up to 6 seconds at 15 FPS and 720p resolution from simple text input. It is fully open-sourced, including the code, model, and technical report.
-
Video Super-Resolution for Edge Browser [project]
We released a new video super-resolution (VSR) feature in Microsoft Edge. VSR uses machine learning to enhance the quality of videos viewed in Microsoft Edge, applying graphics-card-agnostic algorithms to remove blocky compression artifacts and upscale the video resolution, so you can enjoy crisp, clear videos on YouTube and other video streaming platforms without using more bandwidth.
Left: VSR OFF, Right: VSR ON
-
DaVinci Project [project]
The DaVinci project aims to address the pain points of existing video enhancement and restoration tools, take full advantage of AI technology, and lower the barrier for users to process video footage. Currently, the toolkit includes general image super-resolution and conference meeting enhancement features.
Left: Input Low-Quality Image, Right: DaVinci Enhanced Result
Activities
- CVPR Reviewer: 2024, 2023, 2022
- ICCV Reviewer: 2023, 2021
- ECCV Reviewer: 2024, 2022
- NeurIPS Reviewer: 2024, 2023
- ACM MM Reviewer: 2024, 2023
- ICASSP Reviewer: 2023
- ICME Reviewer: 2023, 2022, 2021, 2020
- Journal Reviewer: TIP, IJCV, TMM, TMI, TCI, PR
Talks
- [2023.04.15] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation. Invited Talk at MSRA CVPR 2023 Pre-Workshop. [video]
- [2022.12.23] Transformer Network Design in Low-Level Vision. Invited Talk at PRCV 2022. [slides]
- [2022.04.23] TTVSR: Learning Trajectory-Aware Transformer for Video Super-Resolution. Invited Talk at MSRA CVPR 2022 Pre-Workshop. [video]
- [2021.10.13] CKDN: Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment. Invited Talk at MSRA ICCV 2021 Pre-Workshop. [video]