MuTual

What is MuTual?

MuTual is a retrieval-based dataset for Multi-Turn dialogue reasoning, adapted from Chinese high school English listening comprehension test data. The goal of the MuTual challenge is to evaluate the reasoning ability of chatbots. For details, see the MuTual paper.

Download

Download Training Set
Download Dev Set
Download Test Set

Evaluation

Once you are satisfied with your model's performance on the dev set, you are encouraged to send your decoded output, together with your dev-set performance and a description of your method, to cuileyang@zju.edu.cn to get official scores on the test sets.

Evaluation Script
Sample Prediction
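The leaderboard reports R@1, R@2 and MRR. Each dialogue has four candidate responses and exactly one correct answer, so R@k is the fraction of examples whose correct response is ranked within the top k, and MRR is the mean of 1/rank of the correct response. The Python sketch below computes all three. It is not the official evaluation script: the input format it assumes (one best-first string of option labels such as "BACD" per example id, with hypothetical ids like dev_1) may differ from what the script and the sample prediction file actually use.

```python
from typing import Sequence


def recall_at_k(ranked: Sequence[str], gold: str, k: int) -> float:
    """1.0 if the gold option is among the top-k ranked options, else 0.0."""
    return 1.0 if gold in ranked[:k] else 0.0


def reciprocal_rank(ranked: Sequence[str], gold: str) -> float:
    """1 / (1-based position of the gold option); 0.0 if it is missing."""
    for position, label in enumerate(ranked, start=1):
        if label == gold:
            return 1.0 / position
    return 0.0


def evaluate(predictions: dict[str, str], gold: dict[str, str]) -> dict[str, float]:
    """Average R@1, R@2 and MRR over every example id in `gold`.

    Assumed formats (hypothetical, not the official file layout):
      predictions: example id -> ranking string, best first, e.g. "ACDB"
      gold:        example id -> correct option label, e.g. "A"
    A missing prediction counts as a miss on every metric.
    """
    totals = {"R@1": 0.0, "R@2": 0.0, "MRR": 0.0}
    for example_id, answer in gold.items():
        ranked = list(predictions.get(example_id, ""))
        totals["R@1"] += recall_at_k(ranked, answer, 1)
        totals["R@2"] += recall_at_k(ranked, answer, 2)
        totals["MRR"] += reciprocal_rank(ranked, answer)
    return {name: total / len(gold) for name, total in totals.items()}


# Two made-up examples: the first is ranked correctly, the second puts the answer second.
print(evaluate({"dev_1": "ABCD", "dev_2": "BACD"}, {"dev_1": "A", "dev_2": "A"}))
# prints {'R@1': 0.5, 'R@2': 1.0, 'MRR': 0.75}
```

Because there are only four candidates, R@2 is always at least R@1, and MRR sits between R@1 and 1.0, which matches the pattern in every leaderboard row.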

Data Sample

Dialogue

M: Ma'am, you forgot your phone.

F: Oh, thanks, I couldn't live without this little thing.

M: I know what you mean. It is of great significance to you. So did you enjoy your dinner?

F: Oh yes, everything was just perfect. It's so hard to take the whole family out to eat, but your restaurant was perfect. Johnny had his own place to play in and I had time to talk with my sisters and their husbands.

Response

A. Thanks for your compliment for the restaurant. ✔

B. I’m sorry that you don’t have a good time.

C. Goodbye brother! Love you.

D. Hurry up honey, or we will be late for the dinner.
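Each sample pairs a multi-turn dialogue with four candidate responses, and a system has to rank the candidates so that the correct one (marked ✔ above) comes first. The Python sketch below holds this sample in a small data structure and ranks its options with a pluggable scorer. The class `MutualExample`, its field names, `rank_options` and the `word_overlap` scorer are all hypothetical illustrations: they are not the schema of the released data files or an official baseline, and `word_overlap` is only a placeholder for a trained model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MutualExample:
    """One MuTual-style instance (illustrative field names, not the released schema)."""
    dialogue: list[str]  # speaker turns in order, e.g. "M: ..." / "F: ..."
    options: list[str]   # candidate responses, labelled A, B, C, D in order
    answer: str          # label of the correct response, e.g. "A"


# A scorer maps (dialogue, candidate response) to a number; higher means a better fit.
Scorer = Callable[[list[str], str], float]


def rank_options(example: MutualExample, score: Scorer) -> str:
    """Return option labels ordered from highest to lowest score, e.g. "ACBD"."""
    labels = "ABCD"[: len(example.options)]
    scored = [(score(example.dialogue, text), label)
              for label, text in zip(labels, example.options)]
    # list.sort is stable even with reverse=True, so tied options keep A-D order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return "".join(label for _, label in scored)


def word_overlap(dialogue: list[str], option: str) -> float:
    """Toy stand-in scorer: count of distinct lowercase tokens shared with the dialogue.

    It performs no reasoning; it exists only so the sketch runs end to end.
    """
    context = set(" ".join(dialogue).lower().split())
    return float(len(context & set(option.lower().split())))


sample = MutualExample(
    dialogue=[
        "M: Ma'am, you forgot your phone.",
        "F: Oh, thanks, I couldn't live without this little thing.",
        "M: I know what you mean. It is of great significance to you. So did you enjoy your dinner?",
        "F: Oh yes, everything was just perfect. It's so hard to take the whole family out to eat, "
        "but your restaurant was perfect. Johnny had his own place to play in and I had time to "
        "talk with my sisters and their husbands.",
    ],
    options=[
        "Thanks for your compliment for the restaurant.",
        "I’m sorry that you don’t have a good time.",
        "Goodbye brother! Love you.",
        "Hurry up honey, or we will be late for the dinner.",
    ],
    answer="A",
)

ranking = rank_options(sample, word_overlap)
print(ranking, "correct" if ranking[0] == sample.answer else "wrong")
```

Replacing `word_overlap` with a trained model's scoring function leaves the rest unchanged; the best-first ranking string it returns is the kind of ordering that R@1, R@2 and MRR are computed over.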

Have Questions?

Ask us questions on our GitHub issues page or contact cuileyang@westlake.edu.cn.


Leaderboard

Rank	Date	Model	Institution	MuTual R@1	MuTual R@2	MuTual MRR	MuTual plus R@1	MuTual plus R@2	MuTual plus MRR
		Human Performance (non-native speaker)		0.938	0.971	0.964	0.930	0.972	0.961
1	Oct 21, 2022	BIDeN-v3	SJTU	0.930	0.983	0.962	0.845	0.958	0.914
2	May 16, 2021	Anonymous	Anonymous	0.921	0.985	0.958	0.810	0.946	0.896
3	Jun 26, 2021	Anonymous	Anonymous	0.919	0.985	0.957	-	-	-
4	Jan 2, 2021	ELECTRA+DAPO	Anonymous	0.916	0.988	0.956	0.836	0.955	0.910
5	Sep 2, 2020	MDFN	SJTU & Huawei Noah's Ark Lab	0.916	0.984	0.956	-	-	-
6	Jun 26, 2021	Anonymous	Anonymous	0.915	0.982	0.954	-	-	-
7	Aug 4, 2020	GRN-v2	NEU & Alibaba	0.915	0.983	0.954	0.841	0.957	0.913
8	Aug 4, 2021	BIDeN-v2	Anonymous	0.914	0.977	0.953	-	-	-
9	Sep 14, 2021	CF-DialReas	Anonymous	0.913	0.986	0.954	0.735	0.904	0.849
10	Aug 17, 2021	MUSN-v2	Anonymous	0.912	0.983	0.953	-	-	-
11	Apr 6, 2021	DDGM	WHU	0.911	0.980	0.952	-	-	-
12	Dec 21, 2020	Anonymous	ECNU	0.910	0.981	0.951	0.826	0.950	0.904
13	Nov 13, 2020	Anonymous	UCSB	0.909	0.977	0.950	-	-	-
14	Jul 27, 2020	GRN-v1	Anonymous	0.903	0.976	0.947	-	-	-
15	Jun 26, 2021	MUSN	Anonymous	0.900	0.976	0.945	-	-	-
16	Apr 28, 2020	UMN	Anonymous	0.870	0.973	0.930	-	-	-
17	Apr 27, 2020	RoBERTa + OCN	Pattern Recognition Center, WeChat AI	0.867	0.958	0.926	-	-	-
18	Apr 21, 2020	RoBERTa+	Northeastern University	0.825	0.953	0.904	-	-	-
19	Apr 21, 2020	DRRC-1	PKU	0.771	0.914	0.869	-	-	-
20	Apr 10, 2020	RoBERTa	ZJU & MSRA & Westlake	0.713	0.892	0.836	0.626	0.866	0.787
21	Sep 4, 2020	Anonymous	Anonymous	0.693	0.875	0.822	-	-	-
22	Apr 10, 2020	DialogConv	Anonymous	0.622	0.854	0.782	-	-	-
23	Sep 14, 2021	RoBERTa-MC	ZJU & MSRA & Westlake	0.686	0.887	0.822	0.643	0.845	0.792