T2M-GPT+: Text-to-Motion Generation with Discrete Representations and Large Language Models

Jianrong Zhang1,*, Yangsong Zhang2,*, Xiaodong Cun3, Xi Shen4, Xiaojian Shen5, Hehe Fan1, Yi Yang1,†

* Equal Contribution   † Corresponding Author   1 Zhejiang University   2 Ant Group   3 Tencent AI Lab   4 Intellindust   5 Jilin University


Visual Results


[1] Zhang et al. MotionDiffuse: Text-driven Human Motion Generation with Diffusion Model. arXiv, 2022.

[2] Zhang et al. T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations. CVPR, 2023.

[3] Tevet et al. Human Motion Diffusion Model. ICLR, 2023.

[4] Chen et al. Executing your Commands via Motion Diffusion in Latent Space. CVPR, 2023.

Text Prompting: a man walks forward slowly, then turns around.

[Animations, sample 004320: MotionDiffuse [1], T2M-GPT [2], MDM [3], MLD [4], Ours (T2M-GPT+), Ours (T2M-GIT+), Ground-Truth]

Text Prompting: person appears to be running in straight line then jumps over something and continues running.

[Animations, sample 004602: MotionDiffuse [1], T2M-GPT [2], MDM [3], MLD [4], Ours (T2M-GPT+), Ours (T2M-GIT+), Ground-Truth]

Text Prompting: a person marches in place while raising its arms, then it takes a break, then it starts running in place with raised arms.

[Animations, sample 000363: MotionDiffuse [1], T2M-GPT [2], MDM [3], MLD [4], Ours (T2M-GPT+), Ours (T2M-GIT+), Ground-Truth]

Text Prompting: a person walks in a right bend direction.

[Animations, sample 001448: MotionDiffuse [1], T2M-GPT [2], MDM [3], MLD [4], Ours (T2M-GPT+), Ours (T2M-GIT+), Ground-Truth]

Text Prompting: a person walks straight forward, turns, then runs back.

[Animations, sample 001617: MotionDiffuse [1], T2M-GPT [2], MDM [3], MLD [4], Ours (T2M-GPT+), Ours (T2M-GIT+), Ground-Truth]

Text Prompting: a person carries something it their hands and then uses their right hand to throw it.

[Animations, sample 001752: MotionDiffuse [1], T2M-GPT [2], MDM [3], MLD [4], Ours (T2M-GPT+), Ours (T2M-GIT+), Ground-Truth]

Text Prompting: hands in fighting position while the left foot kicks aggressively up and over.

[Animations, sample 001840: MotionDiffuse [1], T2M-GPT [2], MDM [3], MLD [4], Ours (T2M-GPT+), Ours (T2M-GIT+), Ground-Truth]

Text Prompting: a person does jumping jacks.

[Animations, sample 000551: MotionDiffuse [1], T2M-GPT [2], MDM [3], MLD [4], Ours (T2M-GPT+), Ours (T2M-GIT+), Ground-Truth]

Text Prompting: someone waits a moment and jumps to the right.

[Animations, sample 009654: MotionDiffuse [1], T2M-GPT [2], MDM [3], MLD [4], Ours (T2M-GPT+), Ours (T2M-GIT+), Ground-Truth]

Text Prompting: the person is running over a vault.

[Animations, sample M009924: MotionDiffuse [1], T2M-GPT [2], MDM [3], MLD [4], Ours (T2M-GPT+), Ours (T2M-GIT+), Ground-Truth]
