MuTual is a retrieval-based dataset for Multi-Turn dialogue reasoning, adapted from Chinese high school English listening comprehension test data. The goal of the MuTual challenge is to evaluate the reasoning ability of chatbots. For details, see the MuTual paper.
Once you are satisfied with your model's performance on the dev set, you are encouraged to send your decoded output to cuileyang@zju.edu.cn, together with your dev performance and a description of your method, to obtain the official scores on the test sets.
A. Thanks for your compliment for the restaurant. ✔
B. I’m sorry that you don’t have a good time.
C. Goodbye brother! Love you.
D. Hurry up honey, or we will be late for the dinner.
Ask us questions on our GitHub issues page or contact cuileyang@westlake.edu.cn.
Rank | Date | Model | Affiliation | MuTual R@1 | MuTual R@2 | MuTual MRR | MuTual plus R@1 | MuTual plus R@2 | MuTual plus MRR
---|---|---|---|---|---|---|---|---|---
- | - | Human Performance (non-native speaker) | - | 0.938 | 0.971 | 0.964 | 0.930 | 0.972 | 0.961
1 | Oct 21, 2022 | BIDeN-v3 | SJTU | 0.930 | 0.983 | 0.962 | 0.845 | 0.958 | 0.914
2 | May 16, 2021 | Anonymous | Anonymous | 0.921 | 0.985 | 0.958 | 0.810 | 0.946 | 0.896
3 | Jun 26, 2021 | Anonymous | Anonymous | 0.919 | 0.985 | 0.957 | - | - | -
4 | Jan 2, 2021 | ELECTRA+DAPO | Anonymous | 0.916 | 0.988 | 0.956 | 0.836 | 0.955 | 0.910
5 | Sep 2, 2020 | MDFN | SJTU & Huawei Noah's Ark Lab | 0.916 | 0.984 | 0.956 | - | - | -
6 | Jun 26, 2021 | Anonymous | Anonymous | 0.915 | 0.982 | 0.954 | - | - | -
7 | Aug 4, 2020 | GRN-v2 | NEU & Alibaba | 0.915 | 0.983 | 0.954 | 0.841 | 0.957 | 0.913
8 | Aug 4, 2021 | BIDeN-v2 | Anonymous | 0.914 | 0.977 | 0.953 | - | - | -
9 | Sep 14, 2021 | CF-DialReas | Anonymous | 0.913 | 0.986 | 0.954 | 0.735 | 0.904 | 0.849
10 | Aug 17, 2021 | MUSN-v2 | Anonymous | 0.912 | 0.983 | 0.953 | - | - | -
11 | Apr 6, 2021 | DDGM | WHU | 0.911 | 0.980 | 0.952 | - | - | -
12 | Dec 21, 2020 | Anonymous | ECNU | 0.910 | 0.981 | 0.951 | 0.826 | 0.950 | 0.904
13 | Nov 13, 2020 | Anonymous | UCSB | 0.909 | 0.977 | 0.950 | - | - | -
14 | Jul 27, 2020 | GRN-v1 | Anonymous | 0.903 | 0.976 | 0.947 | - | - | -
15 | Jun 26, 2021 | MUSN | Anonymous | 0.900 | 0.976 | 0.945 | - | - | -
16 | Apr 28, 2020 | UMN | Anonymous | 0.870 | 0.973 | 0.930 | - | - | -
17 | Apr 27, 2020 | RoBERTa + OCN | Pattern Recognition Center, WeChat AI | 0.867 | 0.958 | 0.926 | - | - | -
18 | Apr 21, 2020 | RoBERTa+ | Northeastern University | 0.825 | 0.953 | 0.904 | - | - | -
19 | Apr 21, 2020 | DRRC-1 | PKU | 0.771 | 0.914 | 0.869 | - | - | -
20 | Apr 10, 2020 | RoBERTa | ZJU & MSRA & Westlake | 0.713 | 0.892 | 0.836 | 0.626 | 0.866 | 0.787
21 | Sep 4, 2020 | Anonymous | Anonymous | 0.693 | 0.875 | 0.822 | - | - | -
22 | Apr 10, 2020 | DialogConv | Anonymous | 0.622 | 0.854 | 0.782 | - | - | -
23 | Sep 14, 2021 | RoBERTa-MC | ZJU & MSRA & Westlake | 0.686 | 0.887 | 0.822 | 0.643 | 0.845 | 0.792
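
The leaderboard reports R@1, R@2, and MRR. As a rough illustration of how such scores can be computed on the dev set, the sketch below uses the standard definitions: R@k is the fraction of examples whose correct option appears among the top k ranked candidates, and MRR is the mean of 1/rank of the correct option. The function name `evaluate` and the input layout (one ranked list of option letters per example) are assumptions made for this example; this is not the official scoring script.

```python
from typing import Dict, List


def evaluate(rankings: List[List[str]], gold: List[str]) -> Dict[str, float]:
    """Compute R@1, R@2 and MRR (hypothetical helper, not the official scorer).

    rankings: for each example, the candidate options ordered from most
              to least likely, e.g. ["B", "A", "D", "C"].
    gold:     the correct option for each example, e.g. "A".
    """
    if not gold or len(rankings) != len(gold):
        raise ValueError("rankings and gold must be non-empty and equal in length")

    hits_at_1 = 0
    hits_at_2 = 0
    reciprocal_rank_sum = 0.0

    for ranked, answer in zip(rankings, gold):
        # 1-based position of the correct option in the ranked list.
        position = ranked.index(answer) + 1
        hits_at_1 += position <= 1
        hits_at_2 += position <= 2
        reciprocal_rank_sum += 1.0 / position

    n = len(gold)
    return {
        "R@1": hits_at_1 / n,
        "R@2": hits_at_2 / n,
        "MRR": reciprocal_rank_sum / n,
    }


if __name__ == "__main__":
    # Toy data: the correct option is ranked 1st, 2nd, 4th and 4th.
    toy_rankings = [
        ["A", "B", "C", "D"],
        ["B", "A", "C", "D"],
        ["C", "D", "B", "A"],
        ["D", "C", "B", "A"],
    ]
    toy_gold = ["A", "A", "A", "A"]
    print(evaluate(toy_rankings, toy_gold))
```

With four candidates per example, a model that always ranks the correct option second would score 0.0 on R@1, 1.0 on R@2, and 0.5 on MRR, which is why the three columns can order systems differently.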