Challenge Leaderboard

Provisional 2025 Challenge Leaderboard

The entries below are submissions to the 2025 BEHAVIOR challenge. We plan to migrate the leaderboard to HuggingFace, where it will include more details such as task-specific statistics.

About Q-score

We rank policies by Q-score. For each task, Q-score measures how much of the goal condition a policy satisfies: it computes the fraction of completed sub-goals and takes the best-matched goal clause. Because it awards partial credit, a policy that makes meaningful progress scores higher than one that makes none, even when neither completes the task. This makes Q-score a smoother, more reliable way to compare policies across BEHAVIOR tasks than a binary success rate.
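The description above can be sketched in a few lines of Python. This is a minimal illustration, not the challenge's evaluation code. It assumes a goal condition is given as a list of alternative goal clauses, each a list of booleans recording whether a sub-goal was satisfied at the end of an episode. The names `q_score` and `full_success` and this data layout are hypothetical.

```python
from typing import Sequence


def q_score(clauses: Sequence[Sequence[bool]]) -> float:
    """Best fraction of satisfied sub-goals over all alternative goal clauses.

    Assumption: each inner sequence is one goal clause, and each boolean
    says whether one of its sub-goals was satisfied.
    """
    best = 0.0
    for clause in clauses:
        if len(clause) == 0:
            continue  # an empty clause has no sub-goals to score
        fraction = sum(clause) / len(clause)
        best = max(best, fraction)  # keep the best-matched clause
    return best


def full_success(clauses: Sequence[Sequence[bool]]) -> bool:
    """Binary success: some non-empty clause has every sub-goal satisfied."""
    return any(len(clause) > 0 and all(clause) for clause in clauses)


# Hypothetical episode: two alternative clauses with three sub-goals each.
episode = [
    [True, False, False],  # 1 of 3 sub-goals satisfied
    [True, True, False],   # 2 of 3 sub-goals satisfied (best match)
]
partial_credit = q_score(episode)
solved = full_success(episode)
```

Under these assumptions the episode earns partial credit from its best-matched clause even though the binary success flag stays false, which is the smoothing effect the paragraph describes.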

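| Rank | Team | Affiliation | Date | Track | Full Task Success Rate (Public Validation) | Full Task Success Rate (Held-out Test) | ★ Q Score (Public Validation) | ★ Q Score (Held-out Test) |
|---|---|---|---|---|---|---|---|---|
| 1 | Robot Learning Collective | Independent | 20251114 | Standard | 0.1120 | 0.1240 | 0.2605 | 0.2599 |
| 2 | Comet | NVIDIA Research | 20251117 | Standard | 0.1440 | 0.1140 | 0.1830 | 0.2514 |
| 3 | SimpleAI Robot | Beijing Simple AI Technology Co Ltd | 20251117 | Standard | 0.1400 | 0.1080 | 0.1943 | 0.1591 |
| 4 | The North Star | Huawei CRI EAI Team | 20251116 | Standard | 0.1280 | 0.0760 | 0.1702 | 0.1204 |
| 5 | Embodied Intelligence | Independent | 20251117 | Privileged | 0.0620 | 0.0520 | 0.1110 | 0.0947 |
| 6 | RAPPER | GIST | 20251117 | Privileged | | 0.0520 | | 0.0750 |
| 7 | tobi | Alzonova | 20251117 | Standard | | 0.0360 | | 0.0717 |
| 8 | MR | MR | 20251115 | Privileged | | 0.0340 | | 0.0512 |
| 9 | RACΞL | CMU | 20251116 | Standard | | 0.0140 | | 0.0140 |
| 10 | Ahri+EFFL+MLV | Postech | 20251117 | Standard | | 0.0100 | | 0.0100 |
| 11 | Merlin Labs | Independent | 20251117 | Standard | | 0.0060 | | 0.0090 |
| 12 | LYQRobotics | Independent | 20251117 | Standard | | 0.0080 | | 0.0080 |
| 13 | ACT | Xiamen University | 20251116 | Standard | | 0.0020 | | 0.0037 |
| 14 | StarVLA | Independent | 20251117 | Standard | | 0.0000 | | 0.0019 |
| 15 | Cloud-Data | Cloud Data Technology Co Ltd | 20251116 | Standard | | 0.0000 | | 0.0000 |
| 16 | RobotSim | Ark1 | 20251116 | Standard | | 0.0000 | | 0.0000 |
| 17 | EntropyMaximum | Independent | 20251116 | Standard | | 0.0000 | | 0.0000 |
| 18 | Magikid | Magikid | 20251116 | Standard | | 0.0000 | | 0.0000 |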