
Ying Tai's Homepage


Dr. Ying Tai (邰颖)

Associate Professor (PhD Advisor)

Nanjing University (Suzhou Campus)

1520 Taihu Road, Suzhou, P.R. China

Email: yingtai(at)nju.edu.cn; tyshiwo(at)gmail.com

Google Scholar | GitHub / GitHub (Group) | Scopus

Public office hour: Undergraduate students at NJU are welcome to drop by my office, Room 522, Nanyong Building (West), every Wednesday from 10am to 11am.

Open positions (RA recruitment): One Research Assistant (RA) position in multimodal visual generation. Applicants should be recent Bachelor's or Master's graduates; candidates with some research experience are preferred. Interested students are welcome to contact me by email.

Biography

I am currently an Associate Professor at the School of Intelligence Science and Technology, Nanjing University (Suzhou Campus). Previously, I was a Principal Researcher and Team Lead at Tencent Youtu Lab, where I spent more than six wonderful years leading two teams that developed novel vision algorithms deployed in several products, e.g., the Virtual Background feature in Tencent Meeting, high-fidelity face generation APIs in Tencent Cloud, and talking-face generation for digital human products. Our teams also conducted cutting-edge research published in top-tier AI conferences.

I received my Ph.D. from the Department of Computer Science and Engineering, Nanjing University of Science & Technology (NUST) in 2017, advised by Prof. Jian Yang. In 2016, I spent six wonderful months as a visiting student in Prof. Xiaoming Liu's lab at Michigan State University.

My research interests include frontier generative AI research and applications based on advanced large vision and language models; specifically, I work on the directions illustrated by the recent research projects below.

Recent Research Projects


Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement

arXiv | Daily paper on HF (#2 of the day) | Demo on Hugging Face | Code (GitHub)

TL;DR: RAG is a tuning-free Region-Aware text-to-image Generation framework built on a DiT-based model (FLUX.1-dev), with two novel components, Regional Hard Binding and Regional Soft Refinement, for precise and harmonious regional control.

RAG has been demonstrated to outperform FLUX.1-dev (a current top text-to-image model), SD3 (ICML'24), and RPG (ICML'24) in complex compositional generation, excelling in aesthetics, text-image alignment, and precise control.

Repainting Capability of RAG: Modify specific regions without affecting others.

RAG can be easily combined with the acceleration method Hyper-Flux and various other LoRA models.

Prompt: "On the left, Einstein is painting the Mona Lisa; in the center, Elon Reeve Musk is participating in the U.S. presidential election; on the right, Trump is hosting a Tesla product launch"
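To make the regional-control idea concrete, below is a minimal, self-contained Python sketch of how the example prompt above could be split into sub-prompts with per-region latent masks. The latent size, box layout, and helper names are illustrative assumptions, not the released RAG code; see the GitHub repository for the actual interface.

```python
# Illustrative sketch only: per-region binary masks of the kind a
# region-aware T2I method could use for hard binding. NOT the RAG code.
import torch

H = W = 64  # assumed latent resolution (e.g., a 512x512 image with an 8x VAE)

# (sub-prompt, normalized box (x0, y0, x1, y1)); layout mirrors the example prompt
regions = [
    ("Einstein is painting the Mona Lisa", (0.00, 0.0, 0.33, 1.0)),       # left
    ("Elon Reeve Musk is participating in the U.S. presidential election",
     (0.33, 0.0, 0.66, 1.0)),                                             # center
    ("Trump is hosting a Tesla product launch", (0.66, 0.0, 1.00, 1.0)),  # right
]

def box_to_mask(box, h, w):
    """Rasterize a normalized (x0, y0, x1, y1) box into a binary latent mask."""
    x0, y0, x1, y1 = box
    mask = torch.zeros(h, w)
    mask[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)] = 1.0
    return mask

# Hard binding confines each sub-prompt's denoising updates to its own mask;
# soft refinement then relaxes the boundaries so adjacent regions blend.
masks = [box_to_mask(box, H, W) for _, box in regions]
assert torch.all(sum(masks) == 1.0)  # the three boxes tile the canvas exactly
```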


OpenVid-1M: A Large-Scale Dataset for High-Quality Text-to-Video Generation

Dataset | arXiv | Daily paper on HF (#2 of the day) | Code (GitHub) | Models | Demo (High-res)

TL;DR: OpenVid-1M is a high-quality text-to-video dataset designed for research institutions to enhance video generation quality, featuring high aesthetics, clarity, and resolution. It can be used for direct training or as a quality-tuning complement to other video datasets, and it also supports other video generation tasks (video super-resolution, frame interpolation, etc.).

We carefully curated 1 million high-quality video clips with expressive captions to advance text-to-video research, of which 0.4 million videos are in 1080p resolution (termed OpenVidHD-0.4M).
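For readers who want to inspect the data, here is a minimal sketch of fetching the dataset's caption metadata from Hugging Face with `huggingface_hub`. The repo id and file path below are assumptions for illustration; confirm them on the dataset page before use.

```python
# Minimal sketch: pull OpenVid-1M caption metadata from Hugging Face.
# The repo id and file name are assumptions; check the dataset page first.
import pandas as pd
from huggingface_hub import hf_hub_download

csv_path = hf_hub_download(
    repo_id="nkp37/OpenVid-1M",            # assumed dataset repo id
    filename="data/train/OpenVid-1M.csv",  # assumed caption/metadata file
    repo_type="dataset",
)

meta = pd.read_csv(csv_path)
print(len(meta), "text-video pairs")  # ~1M clips if the assumptions hold
print(meta.columns.tolist())          # e.g., clip id and caption fields
```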

OpenVid-1M is cited, discussed, or used in several recent works, including the video diffusion models MarDini, Allegro, T2V-Turbo-V2, and Pyramid Flow; the AR-based long-video generation model ARLON; the visual understanding and generation model VILA-U; and the frame interpolation model Framer.

The OpenVid-1M dataset was downloaded over 30,000 times on Hugging Face last month, placing it in the top 1% of all video datasets (as of Nov. 2024).

Past Projects on Generative AI

News

  • 09/2024 – 2 papers (3D facial texture modeling and low-light enhancement via a Mamba structure) accepted by NeurIPS 2024

  • 09/2024 – Listed in the World's Top 2% Scientists (both Career and Single Year) by Stanford University

  • 07/2024 – We released the website/dataset/code/models/arXiv of OpenVid-1M (a high-quality text-to-video dataset to enhance video quality, featuring high aesthetics, clarity, and resolution).

  • 07/2024 – 1 paper (Efficient Subject-driven Generation) accepted by ECCV 2024

  • 06/2024 – I will be an Area Chair for WACV 2025

  • 04/2024 – Included in the Research.com 2023 Ranking of Best Scientists in Computer Science (#9590 in the world, #1022 in China, and #12 at NJU)

  • 04/2024 – We released the code and pretrained models of AddSR (accelerating the inference of diffusion-based super-resolution models).

  • 03/2024 – Two papers, PortraitBooth (text-to-portrait generation) and FaceChain-ImagineID (audio-driven talking face generation), accepted by CVPR'24 (8 consecutive years since 2017 :))

  • 12/2023 – Two recent papers released: PortraitBooth (CVPR'24) and FaceX (arXiv'24, a general model for popular facial editing tasks)

  • 12/2023 – 1 paper accepted by ICASSP'24

  • 10/2023 – 2022 World's Top 2% Scientists by Stanford University (ranked 5th in Tencent)

  • 09/2023 – 1 paper (WaveletVFI) accepted by IEEE Transactions on Image Processing 2023

  • 08/2023 – I joined Nanjing University (Suzhou Campus)

  • 07/2023 – I will be an Associate Editor for Image and Vision Computing

  • 07/2023 – 1 paper accepted by ICCV'23

  • 05/2023 – I will be an Area Chair for WACV 2024

  • 03/2023 – 3 papers accepted by CVPR'23

  • 11/2022 – 2 papers accepted by AAAI'23 (1 Oral and 1 Poster)

  • 09/2022 – 1 paper accepted by ACM Transactions on Graphics 2022

  • 07/2022 – 5 papers accepted by ECCV'22

  • 06/2022 – Our CDSR on blind super-resolution is accepted by ACM MM'22, with an acceptance rate of 27.9%

  • 06/2022 – Our AutoGAN-Synthesizer on MRI reconstruction is accepted by MICCAI'22

  • 05/2022 – I will be an Area Chair for WACV 2023 and FG 2023

  • 04/2022 – Our HifiHead on high-fidelity neural head synthesis is accepted by IJCAI'22, with an acceptance rate of 15%

  • 03/2022 – Our face recognition work CurricularFace (CVPR'20) is included in the 2022 AI Index Report from Stanford University

  • 03/2022 – 5 papers accepted by CVPR'22, with an acceptance rate of 25.3%

  • 03/2022 – First Prize of Progress in Science and Technology of Jiangsu Province (4/11), “Image restoration and robust recognition: theory and algorithms”

  • 02/2022 – I will be an Area Chair for ECCV 2022

  • 12/2021 – 3 papers accepted by AAAI'22 (1 Oral and 2 Posters), with an acceptance rate of 15%

  • 09/2021 – 2 papers on blind SR and ViT accepted by NeurIPS'21, with an acceptance rate of 26%

  • 07/2021 – 2 papers on crowd counting accepted by ICCV'21 (1 Oral and 1 Poster), with an acceptance rate of 25.9%

  • 07/2021 – Our ASFD on face detection is accepted by ACM MM'21

  • 04/2021 – 4 papers accepted by IJCAI'21, with an acceptance rate of 13.9%

  • 04/2021 – Our Team Imagination won the CVPR NTIRE 2021 Challenge on Video Spatial-Temporal Super-Resolution

  • 03/2021 – 3 papers accepted by CVPR'21 (1 Oral and 2 Posters), with an acceptance rate of 23.7%

  • 12/2020 – 4 papers accepted by AAAI'21, with an acceptance rate of 21%

  • 09/2020 – Training code for RealSR is available in the official Tencent GitHub account [Tencent-RealSR].

  • 07/2020 – 6 papers accepted by ECCV'20, with an acceptance rate of 27%

  • 05/2020 – Our RealSR model (team name: Impressionism) won both tracks of the CVPR NTIRE 2020 Challenge on Real-World Super-Resolution

  • 02/2020 – 3 papers accepted by CVPR'20, with an acceptance rate of 22.1%

  • 11/2019 – 2 papers (action proposal & action recognition) accepted by AAAI'20, with an acceptance rate of 20.6%. The code of our DBG is released at [ActionDetection-DBG]; it achieves top-1 performance on the ActivityNet Challenge 2019 Temporal Action Proposals task

  • 02/2019 – Our DSFD on face detection is accepted by CVPR'19, with an acceptance rate of 25.2%

  • 11/2018 – 2 papers (face alignment & adaptive metric learning) accepted by AAAI'19, with an acceptance rate of only 16.2%

  • 10/2018 – We released a novel Dual Shot Face Detector (DSFD) framework that achieves top-1 performance on all five settings of the WIDER FACE (Easy/Medium/Hard) and FDDB (Discrete/Continuous) datasets

  • 07/2018 – 1 paper accepted by ECCV'18

  • 02/2018 – 1 paper accepted by CVPR'18 (SPOTLIGHT Presentation)

  • 07/2017 – 1 paper accepted by ICCV'17 (SPOTLIGHT Presentation)

  • 03/2017 – 1 paper accepted by CVPR'17