About

I am a senior algorithm expert in the multi-modality group at Rhymes.AI, where I lead the direction of multi-modal generation. Before that, I was a senior researcher in the Multi-Modal Computing Group at Microsoft Research Asia. We are hiring interns and researchers to work on content creation (diffusion-based generation and image/video manipulation) and low-level vision (image/video restoration and enhancement). If you are interested, feel free to email me your CV.

I received my Ph.D. degree in computer science in 2019 from Shanghai Jiao Tong University, under the supervision of Dr. Baining Guo and Prof. Minyi Guo. During my Ph.D., I worked closely with Dr. Baoyuan Wang at Microsoft Research Asia. Before that, I received my B.S. in computer science in 2014 from Shanghai Jiao Tong University, under the supervision of Prof. Yong Yu and Prof. Minyi Guo.

My research interests include GAN/diffusion-based content creation and image/video restoration and enhancement. I have published about 20 papers at top international CV/AI conferences such as CVPR, ICCV, ECCV, and NeurIPS.

News

  • [2024.11.18] One paper is accepted by TMLR.
  • [2024.10.02] Our Allegro text-to-video model is released.
  • [2024.09.27] Appointed as Associate Editor for TMM.
  • [2024.07.31] One paper is accepted by ToMM.
  • [2024.06.30] One paper is accepted by TMLR.
  • [2024.02.27] One paper is accepted by CVPR.
  • [2024.02.15] One paper is accepted by TIP.
  • [2024.01.19] One paper is accepted by CHI.
  • [2024.01.16] One paper is accepted by ICLR.
  • [2023.08.28] One paper is accepted by TPAMI.
  • [2023.08.05] One paper is accepted by TIP.
  • [2023.08.03] Received the Enterprise Technology Innovation Award at ChinaMM 2023.
  • [2023.07.26] One paper is accepted by ACM Multimedia 2023.
  • [2023.07.25] One paper is accepted by ACM Multimedia 2023 Technical Demos and Videos.
  • [2023.07.16] One paper is accepted by ACM Multimedia 2023 Brave New Ideas.
  • [2023.07.14] One paper is accepted by ICCV 2023.
  • [2023.06.16] One paper is accepted by TIP.
  • [2023.05.02] One paper is accepted by ACL 2023.
  • [2023.04.15] Invited Talk at MSRA CVPR 2023 Pre-Workshop.
  • [2023.03.08] The video super-resolution feature is now live in the Edge browser.
  • [2023.02.28] One paper is accepted by CVPR 2023.
  • [2023.02.17] Two papers are accepted by TMM.

Publications

  • AnyV2V: A Plug-and-Play Framework for Any Video-to-Video Editing Tasks

    Max Ku, Cong Wei, Weiming Ren, Huan Yang, Wenhu Chen

    TMLR [paper] [code]

  • Allegro: Open the Black Box of Commercial-Level Video Generation Model

    Yuan Zhou, Qiuyue Wang, Yuxuan Cai, Huan Yang

    Technical Report [paper] [code]

  • Prompt-Based Modality Bridging for Unified Text-to-Face Generation and Manipulation

    Yiyang Ma, Haowei Kuang, Huan Yang, Jianlong Fu, Jiaying Liu

    ToMM [paper]

  • DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion

    Huiguo He, Huan Yang, Zixi Tuo, Yuan Zhou, Qiuyue Wang, Yuhang Zhang, Zeyu Liu, Wenhao Huang, Hongyang Chao, Jian Yin

    Preprint [paper]

  • ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation

    Weiming Ren, Huan Yang, Ge Zhang, Cong Wei, Xinrun Du, Wenhao Huang, Wenhu Chen

    TMLR [paper] [code]

  • Zero-Reference Low-Light Enhancement via Physical Quadruple Priors

    Wenjing Wang, Huan Yang, Jianlong Fu, Jiaying Liu

    CVPR 2024 [paper] [code]

  • Online Streaming Video Super-Resolution with Convolutional Look-Up Table

    Guanghao Yin, Zefan Qu, Xinyang Jiang, Shan Jiang, Zhenhua Han, Ningxin Zheng, Xiaohong Liu, Huan Yang, Yuqing Yang, Dongsheng Li, Lili Qiu

    TIP [paper]

  • Examining Human Perception of Generative Content Replacement in Image Privacy Protection

    Anran Xu, Shitao Fang, Huan Yang, Simo Hosio, Koji Yatani

    CHI 2024 [paper]

  • Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution

    Yiyang Ma, Huan Yang, Wenhan Yang, Jianlong Fu, Jiaying Liu

    ICLR 2024 [paper]

  • MobileVidFactory: Automatic Diffusion-Based Social Media Video Generation for Mobile Devices from Text

    Junchen Zhu, Huan Yang, Wenjing Wang, Huiguo He, Zixi Tuo, Yongsheng Yu, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu, Jiebo Luo

    ACM Multimedia 2023 Technical Demos and Videos [paper]

  • Learning Profitable NFT Image Diffusions via Multiple Visual-Policy Guided Reinforcement Learning

    Huiguo He, Tianfu Wang, Huan Yang, Jianlong Fu, Nicholas Jing Yuan, Jian Yin, Hongyang Chao, Qi Zhang

    ACM Multimedia 2023 [paper]

  • MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images

    Junchen Zhu, Huan Yang, Huiguo He, Wenjing Wang, Zixi Tuo, Wen-Huang Cheng, Lianli Gao, Jingkuan Song, Jianlong Fu

    ACM Multimedia 2023 Brave New Ideas [paper]

  • VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation

    Wenjing Wang, Huan Yang, Zixi Tuo, Huiguo He, Junchen Zhu, Jianlong Fu, Jiaying Liu

    Under Review [paper] [data]

  • NUWA-XL: Diffusion over Diffusion for eXtremely Long Video Generation

    Shengming Yin, Chenfei Wu, Huan Yang, Jianfeng Wang, Xiaodong Wang, Minheng Ni, Zhengyuan Yang, Linjie Li, Shuguang Liu, Fan Yang, Jianlong Fu, Gong Ming, Lijuan Wang, Zicheng Liu, Houqiang Li, Nan Duan

    ACL 2023 Oral [paper]

  • Learning Data-Driven Vector-Quantized Degradation Model for Animation Video Super-Resolution

    Zixi Tuo, Huan Yang, Jianlong Fu, Yujie Dun, Xueming Qian

    ICCV 2023 [paper] [code]

  • Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation

    Yiyang Ma, Huan Yang, Wenjing Wang, Jianlong Fu, Jiaying Liu

    Preprint [paper]

  • Learning Degradation-Robust Spatiotemporal Frequency-Transformer for Video Super-Resolution

    Zhongwei Qiu, Huan Yang, Jianlong Fu, Daochang Liu, Chang Xu, Dongmei Fu

    TPAMI [paper] [code]

  • MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

    Ludan Ruan, Yiyang Ma, Huan Yang, Huiguo He, Bei Liu, Jianlong Fu, Nicholas Jing Yuan, Qin Jin, Baining Guo

    CVPR 2023 [paper] [code]

  • Fine-Grained Image Style Transfer with Visual Transformers

    Jianbo Wang, Huan Yang, Jianlong Fu, Toshihiko Yamasaki, Baining Guo

    ACCV 2022 [paper] [code]

  • Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

    Yuchong Sun, Hongwei Xue, Ruihua Song, Bei Liu, Huan Yang, Jianlong Fu

    NeurIPS 2022 [paper] [code]

  • Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution

    Zhongwei Qiu, Huan Yang, Jianlong Fu, Dongmei Fu

    ECCV 2022 [paper] [code]

  • AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

    Yiyang Ma, Huan Yang, Bei Liu, Jianlong Fu, Jiaying Liu

    ACM Multimedia 2022 Oral [paper] [code]

  • 4D LUT: Learnable Context-Aware 4D Lookup Table for Image Enhancement

    Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

    TIP [paper]

  • Rethinking Image and Video Restoration: An Industrial Perspective

    Huan Yang

    CTSoc-NCT [paper]

  • Language-Guided Face Animation by Recurrent StyleGAN-based Generator

    Tiankai Hang, Huan Yang, Bei Liu, Jianlong Fu, Xin Geng, Baining Guo

    TMM [paper]

  • Online Video Super-Resolution with Convolutional Kernel Bypass Graft

    Jun Xiao, Xinyang Jiang, Ningxin Zheng, Huan Yang, Yifan Yang, Yuqing Yang, Dongsheng Li, Kin-Man Lam

    TMM [paper]

  • Degradation-Guided Meta-Restoration Network for Blind Super-Resolution

    Fuzhi Yang, Huan Yang, Yanhong Zeng, Jianlong Fu, Hongtao Lu

    Preprint [paper]

  • Learning Trajectory-Aware Transformer for Video Frame Interpolation

    Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

    TIP [paper]

  • Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

    Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo

    CVPR 2022 [paper] [code]

  • Learning Trajectory-Aware Transformer for Video Super-Resolution

    Chengxu Liu, Huan Yang, Jianlong Fu, Xueming Qian

    CVPR 2022 Oral [paper] [code]

  • Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers

    Yanhong Zeng, Huan Yang, Hongyang Chao, Jianbo Wang, Jianlong Fu

    NeurIPS 2021 [paper]

  • Learning Fine-Grained Motion Embedding for Landscape Animation

    Hongwei Xue, Bei Liu, Huan Yang, Jianlong Fu, Houqiang Li, Jiebo Luo

    ACM Multimedia 2021 Oral [paper]

  • Domain-Aware Universal Style Transfer

    Kibeom Hong, Seogkyu Jeon, Huan Yang, Jianlong Fu, Hyeran Byun

    ICCV 2021 [paper] [code]

  • Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment

    Heliang Zheng, Huan Yang, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo

    ICCV 2021 [paper] [code]

  • Learning Texture Transformer Network for Image Super-Resolution

    Fuzhi Yang, Huan Yang, Jianlong Fu, Hongtao Lu, Baining Guo

    CVPR 2020 [paper] [code]

  • Personalized Exposure Control Using Adaptive Metering and Reinforcement Learning

    Huan Yang, Baoyuan Wang, Noranart Vesdapunt, Minyi Guo, Sing Bing Kang

    TVCG [paper]

  • Unsupervised Extraction of Video Highlights via Robust Recurrent Auto-Encoders

    Huan Yang, Baoyuan Wang, Stephen Lin, David Wipf, Minyi Guo, Baining Guo

    ICCV 2015 [paper]

Projects

  • Allegro Text-to-Video Generation Model [project]

    We released our text-to-video generation model, Allegro, which aims to open the black box of commercial-level video generation models. Allegro is a powerful text-to-video model that generates high-quality videos of up to 6 seconds at 15 FPS and 720p resolution from simple text input. It is fully open-sourced, including the code, model, and technical report.

  • Video Super-Resolution for Edge Browser [project]

    We released a new video super-resolution feature in Microsoft Edge. Video super-resolution uses machine learning to enhance the quality of videos viewed in Microsoft Edge, applying graphics-card-agnostic algorithms to remove blocky compression artifacts and upscale the video resolution, so you can enjoy crisp, clear videos on YouTube and other streaming platforms without consuming extra bandwidth.

    Demo (Left: VSR OFF, Right: VSR ON)

  • DaVinci Project [project]

    The DaVinci project aims to address the pain points of existing video enhancement and restoration tools, take full advantage of AI technology, and lower the barrier for users to process video footage. Currently, the toolkit includes general image super-resolution and conference meeting enhancement features.

    Demo (Left: input low-quality image, Right: DaVinci-enhanced result)

Activities

  • CVPR Reviewer: 2024, 2023, 2022
  • ICCV Reviewer: 2023, 2021
  • ECCV Reviewer: 2024, 2022
  • NeurIPS Reviewer: 2024, 2023
  • ACM MM Reviewer: 2024, 2023
  • ICASSP Reviewer: 2023
  • ICME Reviewer: 2023, 2022, 2021, 2020
  • Journal Reviewer: TIP, IJCV, TMM, TMI, TCI, PR

Talks

  • [2023.04.15] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation. Invited Talk at MSRA CVPR 2023 Pre-Workshop. [video]
  • [2022.12.23] Transformer Network Design in Low-Level Vision. Invited Talk at PRCV 2022. [slides]
  • [2022.04.23] TTVSR: Learning Trajectory-Aware Transformer for Video Super-Resolution. Invited Talk at MSRA CVPR 2022 Pre-Workshop. [video]
  • [2021.10.13] CKDN: Learning Conditional Knowledge Distillation for Degraded-Reference Image Quality Assessment. Invited Talk at MSRA ICCV 2021 Pre-Workshop. [video]