BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Australia/Melbourne
X-LIC-LOCATION:Australia/Melbourne
BEGIN:DAYLIGHT
TZOFFSETFROM:+1000
TZOFFSETTO:+1100
TZNAME:AEDT
DTSTART:20081005T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=1SU
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20080406T030000
TZOFFSETFROM:+1100
TZOFFSETTO:+1000
TZNAME:AEST
RRULE:FREQ=YEARLY;BYMONTH=4;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20260114T163708Z
LOCATION:Meeting Room C4.8\, Level 4 (Convention Centre)
DTSTART;TZID=Australia/Melbourne:20231213T181100
DTEND;TZID=Australia/Melbourne:20231213T182100
UID:siggraphasia_SIGGRAPH Asia 2023_sess147_papers_509@linklings.com
SUMMARY:What is the Best Automated Metric for Text to Motion Generation?
DESCRIPTION:Jordan Voas, Yili Wang, Qixing Huang, and Raymond Mooney (Univ
 ersity of Texas at Austin)\n\nThere is growing interest in generating skel
 eton-based human motions from natural language descriptions. While most ef
 forts have focused on developing better neural architectures for this task
 , there has been no significant work on determining the proper evaluation 
 metric. Human evaluation is the ultimate accuracy measure for this task, a
 nd automated metrics should correlate well with human quality judgments. S
 ince descriptions are compatible with many motions, determining the right 
 metric is critical for evaluating and designing effective generative model
 s. This paper systematically studies which metrics best align with human e
 valuations and proposes new metrics that align even better. Our findings i
 ndicate that none of the metrics currently used for this task show even a 
 moderate correlation with human judgments on a sample level. However, for 
 assessing average model performance, commonly used metrics such as R-Preci
 sion and less-used coordinate errors show strong correlations. Additionall
 y, several recently developed metrics are not recommended due to their low
  correlation compared to alternatives. We also introduce a novel metric ba
 sed on a multimodal BERT-like model, MoBERT, which offers strongly human-
 correlated sample-level evaluations while maintaining near-perfect model-l
 evel correlation. Our results demonstrate that this new metric exhibits ex
 tensive benefits over all current alternatives.\n\nRegistration Category: 
 Full Access\n\nSession Chair: Sheng Li (Peking University)
URL:https://asia.siggraph.org/2023/full-program?id=papers_509&sess=sess147
END:VEVENT
END:VCALENDAR
