Challenge Leaderboard

Provisional 2025 Challenge Leaderboard

The entries below are submissions to the 2025 BEHAVIOR challenge. We plan to migrate the leaderboard to Hugging Face, where it will include more details such as task-specific statistics.

About Q-score

We rank policies by Q-score. Q-score measures how much of a task's goal condition a policy satisfies by computing the fraction of completed sub-goals and choosing the best-matched goal clause. It awards partial credit, so policies that make meaningful progress score higher even without full completion. This makes Q-score a smoother, more reliable way to compare policies across BEHAVIOR tasks than a binary success rate.
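
As a rough illustration of the description above, the following Python sketch computes a Q-score-like value under simplifying assumptions. The goal condition is represented as a list of alternative clauses, and each clause is a list of booleans recording whether a sub-goal is satisfied. This representation and the function names `q_score` and `is_full_success` are hypothetical and are not taken from the BEHAVIOR codebase, whose actual scoring may differ in detail.

```python
from typing import Sequence


def q_score(clauses: Sequence[Sequence[bool]]) -> float:
    """Best fraction of satisfied sub-goals over all alternative goal clauses.

    Assumption: each inner sequence is one way of satisfying the goal, and
    each boolean says whether that sub-goal currently holds.
    """
    best = 0.0
    for clause in clauses:
        if not clause:
            continue  # skip empty clauses to avoid dividing by zero
        fraction = sum(clause) / len(clause)
        best = max(best, fraction)
    return best


def is_full_success(clauses: Sequence[Sequence[bool]]) -> bool:
    """Binary success: at least one clause has every sub-goal satisfied."""
    return any(len(clause) > 0 and all(clause) for clause in clauses)


if __name__ == "__main__":
    # Two alternative clauses: 2 of 4 sub-goals met, or 1 of 3 met.
    goal_state = [[True, True, False, False], [True, False, False]]
    print(q_score(goal_state))          # 0.5
    print(is_full_success(goal_state))  # False
```

In this example the policy earns partial credit of 0.5 from the best-matched clause, whereas a binary success metric would record the episode as a failure.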

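| Rank | Team | Affiliation | Track | Release | Full Task Success Rate (Public Validation) | Full Task Success Rate (Held-out Test) | ★ Q Score (Public Validation) | ★ Q Score (Held-out Test) | Date |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Robot Learning Collective | Independent | Standard | code, report | 0.1120 | 0.1240 | 0.2605 | 0.2599 | 20251114 |
| 2 | Comet | NVIDIA Research | Standard | code, report | 0.1440 | 0.1140 | 0.1830 | 0.2514 | 20251117 |
| 3 | SimpleAI Robot | Beijing Simple AI Technology Co Ltd | Standard | | 0.1400 | 0.1080 | 0.1943 | 0.1591 | 20251117 |
| 4 | The North Star | Huawei CRI EAI Team | Standard | | 0.1280 | 0.0760 | 0.1702 | 0.1204 | 20251116 |
| 5 | Embodied Intelligence | Independent | Privileged | | 0.0620 | 0.0520 | 0.1110 | 0.0947 | 20251117 |
| 6 | RAPPER | GIST | Privileged | | | 0.0520 | | 0.0750 | 20251117 |
| 7 | tobi | Alzonova | Standard | | | 0.0360 | | 0.0717 | 20251117 |
| 8 | MR | MR | Privileged | | | 0.0340 | | 0.0512 | 20251115 |
| 9 | RACΞL | CMU | Standard | | | 0.0140 | | 0.0140 | 20251116 |
| 10 | Ahri+EFFL+MLV | Postech | Standard | | | 0.0100 | | 0.0100 | 20251117 |
| 11 | Merlin Labs | Independent | Standard | report | | 0.0060 | | 0.0090 | 20251117 |
| 12 | LYQRobotics | Independent | Standard | | | 0.0080 | | 0.0080 | 20251117 |
| 13 | ACT | Xiamen University | Standard | | | 0.0020 | | 0.0037 | 20251116 |
| 14 | StarVLA | Independent | Standard | code, report | | 0.0000 | | 0.0019 | 20251117 |
| 15 | Cloud-Data | Cloud Data Technology Co Ltd | Standard | | | 0.0000 | | 0.0000 | 20251116 |
| 16 | RobotSim | Ark1 | Standard | | | 0.0000 | | 0.0000 | 20251116 |
| 17 | EntropyMaximum | Independent | Standard | | | 0.0000 | | 0.0000 | 20251116 |
| 18 | Magikid | Magikid | Standard | | | 0.0000 | | 0.0000 | 20251116 |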