MuTual

Multi-Turn Dialogue Reasoning Challenge

What is MuTual?

MuTual is a retrieval-based dataset for multi-turn dialogue reasoning, adapted from Chinese high school English listening comprehension tests. The goal of the MuTual challenge is to evaluate the reasoning ability of chatbots. Details are given in the MuTual paper.


Evaluation

Once you are satisfied with your model's performance on the dev set, you are encouraged to send your decoded output to cuileyang@zju.edu.cn, together with your dev performance and a description of your method, to receive the official scores on the test sets.
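The leaderboard reports R@1, R@2 and MRR. For checking dev performance before submitting, the following is a minimal sketch of how these three metrics are commonly computed, assuming each prediction is a full ranking of the candidate labels (best first) and each example has exactly one correct label. The function names and the input format are illustrative assumptions, not the official evaluation script.

```python
from typing import Dict, List, Sequence


def recall_at_k(rankings: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of examples whose correct label is among the top-k ranked candidates."""
    hits = sum(1 for ranked, answer in zip(rankings, gold) if answer in ranked[:k])
    return hits / len(gold)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Average of 1 / (1-based position of the correct label) over all examples."""
    total = 0.0
    for ranked, answer in zip(rankings, gold):
        total += 1.0 / (list(ranked).index(answer) + 1)
    return total / len(gold)


def evaluate(rankings: Sequence[Sequence[str]], gold: Sequence[str]) -> Dict[str, float]:
    """Compute the three leaderboard metrics for a set of ranked predictions."""
    return {
        "R@1": recall_at_k(rankings, gold, 1),
        "R@2": recall_at_k(rankings, gold, 2),
        "MRR": mean_reciprocal_rank(rankings, gold),
    }


# Toy input (made up for illustration): the correct label "A" is ranked
# 1st, 2nd, 3rd and 4th in the four examples respectively.
rankings: List[List[str]] = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", "C"],
    ["C", "D", "A", "B"],
    ["D", "C", "B", "A"],
]
gold: List[str] = ["A", "A", "A", "A"]

scores = evaluate(rankings, gold)
print(scores)
```

On this toy input, R@1 is 1/4, R@2 is 2/4, and MRR is (1 + 1/2 + 1/3 + 1/4) / 4 = 25/48.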

Data Sample

Dialogue

M: Ma'am, you forgot your phone.

F: Oh, thanks, I couldn't live without this little thing.

M: I know what you mean. It is of great significance to you. So did you enjoy your dinner?

F: Oh yes, everything was just perfect. It's so hard to take the whole family out to eat, but your restaurant was perfect. Johnny had his own place to play in and I had time to talk with my sisters and their husbands.

Response

A. Thanks for your compliment for the restaurant. ✔

B. I’m sorry that you don’t have a good time.

C. Goodbye brother! Love you.

D. Hurry up honey, or we will be late for the dinner.
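Because the task is retrieval-based, a sample like the one above can be viewed as one dialogue context paired with four candidate responses, exactly one of which is correct. The sketch below shows one way to hold such a sample in memory and expand it into (context, candidate, label) triples for a ranking model. The class name, field names and helper are hypothetical and do not describe the dataset's actual file format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DialogueSample:
    """Hypothetical in-memory form of one sample (not the official format)."""
    turns: List[str]    # dialogue history, one string per turn
    options: List[str]  # candidate responses, in order A, B, C, D
    answer: str         # label of the correct response, e.g. "A"


def to_ranking_pairs(sample: DialogueSample) -> List[Tuple[str, str, int]]:
    """Expand a sample into (context, candidate, label) triples; label is 1 for the answer."""
    context = " ".join(sample.turns)
    labels = "ABCD"
    return [
        (context, option, int(labels[i] == sample.answer))
        for i, option in enumerate(sample.options)
    ]


sample = DialogueSample(
    turns=[
        "M: Ma'am, you forgot your phone.",
        "F: Oh, thanks, I couldn't live without this little thing.",
        "M: I know what you mean. It is of great significance to you. "
        "So did you enjoy your dinner?",
        "F: Oh yes, everything was just perfect. It's so hard to take the whole "
        "family out to eat, but your restaurant was perfect. Johnny had his own "
        "place to play in and I had time to talk with my sisters and their husbands.",
    ],
    options=[
        "Thanks for your compliment for the restaurant.",
        "I’m sorry that you don’t have a good time.",
        "Goodbye brother! Love you.",
        "Hurry up honey, or we will be late for the dinner.",
    ],
    answer="A",
)

pairs = to_ranking_pairs(sample)
for _, candidate, label in pairs:
    print(label, candidate)
```

A model would then score each (context, candidate) pair and rank the four candidates by score.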

Have Questions?

Ask us questions on our GitHub issues page or contact cuileyang@westlake.edu.cn.

Leaderboard

                                                                               MuTual               MuTual plus
Rank  Date          Model               Organization                           R@1    R@2    MRR    R@1    R@2    MRR
Human Performance (non-native speaker)                                         0.938  0.971  0.964  0.930  0.972  0.961
1     Aug 14, 2021  BIDeN-v3            Anonymous                              0.930  0.983  0.962  -      -      -
2     May 16, 2021  Anonymous           Anonymous                              0.921  0.985  0.958  0.810  0.946  0.896
3     Jun 26, 2021  Anonymous           Anonymous                              0.919  0.985  0.957  -      -      -
4     Jan 2, 2021   ELECTRA+DAPO        Anonymous                              0.916  0.988  0.956  0.836  0.955  0.910
5     Sep 2, 2020   MDFN                SJTU & Huawei Noah's Ark Lab           0.916  0.984  0.956  -      -      -
6     Jun 26, 2021  Anonymous           Anonymous                              0.915  0.982  0.954  -      -      -
7     Aug 4, 2020   GRN-v2              NEU & Alibaba                          0.915  0.983  0.954  0.841  0.957  0.913
8     Aug 4, 2021   BIDeN-v2            Anonymous                              0.914  0.977  0.953  -      -      -
9     Sep 14, 2021  CF-DialReas         Anonymous                              0.913  0.986  0.954  0.735  0.904  0.849
10    Aug 17, 2021  MUSN-v2             Anonymous                              0.912  0.983  0.953  -      -      -
11    Apr 6, 2021   DDGM                WHU                                    0.911  0.980  0.952  -      -      -
12    Dec 21, 2020  Anonymous           ECNU                                   0.910  0.981  0.951  0.826  0.950  0.904
13    Nov 13, 2020  Anonymous           UCSB                                   0.909  0.977  0.950  -      -      -
14    Jul 27, 2020  GRN-v1              Anonymous                              0.903  0.976  0.947  -      -      -
15    Jun 26, 2021  MUSN                Anonymous                              0.900  0.976  0.945  -      -      -
16    Apr 28, 2020  UMN                 Anonymous                              0.870  0.973  0.930  -      -      -
17    Apr 27, 2020  RoBERTa + OCN       Pattern Recognition Center, WeChat AI  0.867  0.958  0.926  -      -      -
18    Apr 21, 2020  RoBERTa+            Northeastern University                0.825  0.953  0.904  -      -      -
19    Apr 21, 2020  DRRC-1              PKU                                    0.771  0.914  0.869  -      -      -
20    Apr 10, 2020  RoBERTa             ZJU & MSRA & Westlake                  0.713  0.892  0.836  0.626  0.866  0.787
21    Sep 4, 2020   Anonymous           Anonymous                              0.693  0.875  0.822  -      -      -
22    Apr 10, 2020  DialogConv          Anonymous                              0.622  0.854  0.782  -      -      -
23    Sep 14, 2021  RoBERTa-MC          ZJU & MSRA & Westlake                  0.686  0.887  0.822  0.643  0.845  0.792