Challenge Leaderboard
Provisional 2025 Challenge Leaderboard
The entries below are submissions to the 2025 BEHAVIOR challenge. We will migrate the leaderboard to HuggingFace in the future with more details, including task-specific statistics.
About Q-score
We rank policies by Q-score. Q-score measures how much of a task's goal condition a policy satisfies by computing the fraction of completed sub-goals and choosing the best-matched goal clause. It awards partial credit, so policies that make meaningful progress score higher even without full completion. This makes Q-score a smoother, more reliable way to compare policies across BEHAVIOR tasks than a binary success rate.
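The partial-credit idea described above can be illustrated with a minimal sketch. It assumes a goal condition is represented as a list of alternative clauses, each holding one boolean per sub-goal, and that the score is the highest fraction of satisfied sub-goals across clauses. The function names `clause_fraction`, `q_score`, and `binary_success` and this data layout are hypothetical and chosen for illustration; the official BEHAVIOR evaluation code may compute the metric differently.

```python
from typing import Sequence


def clause_fraction(clause: Sequence[bool]) -> float:
    """Fraction of sub-goals in one goal clause that are satisfied."""
    if not clause:
        raise ValueError("a goal clause needs at least one sub-goal")
    return sum(clause) / len(clause)


def q_score(clauses: Sequence[Sequence[bool]]) -> float:
    """Assumed Q-score: the best-matched clause's satisfied fraction."""
    if not clauses:
        raise ValueError("a goal condition needs at least one clause")
    return max(clause_fraction(clause) for clause in clauses)


def binary_success(clauses: Sequence[Sequence[bool]]) -> bool:
    """Full task success: at least one clause is completely satisfied."""
    return any(all(clause) for clause in clauses)


# Hypothetical episode: two alternative ways to satisfy the goal.
episode = [
    [True, True, False, False],  # 2 of 4 sub-goals done
    [True, False, False],        # 1 of 3 sub-goals done
]
print(q_score(episode))         # 0.5
print(binary_success(episode))  # False
```

In this hypothetical episode the policy earns half credit under the assumed Q-score while counting as a failure under binary success, which is the kind of gap visible between the two metrics in the leaderboard.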
| Rank | Team | Affiliation | Track | Release | Full Task Success Rate (Public Validation / Held-out Test) | Q Score (Public Validation / Held-out Test) | Date (YYYYMMDD) |
|---|---|---|---|---|---|---|---|
| 1 | Robot Learning Collective | Independent | Standard | code, report | 0.1120 / 0.1240 | 0.2605 / 0.2599 | 20251114 |
| 2 | Comet | NVIDIA Research | Standard | code, report | 0.1440 / 0.1140 | 0.1830 / 0.2514 | 20251117 |
| 3 | SimpleAI Robot | Beijing Simple AI Technology Co Ltd | Standard | | 0.1400 / 0.1080 | 0.1943 / 0.1591 | 20251117 |
| 4 | The North Star | Huawei CRI EAI Team | Standard | | 0.1280 / 0.0760 | 0.1702 / 0.1204 | 20251116 |
| 5 | Embodied Intelligence | Independent | Privileged | | 0.0620 / 0.0520 | 0.1110 / 0.0947 | 20251117 |
| 6 | RAPPER | GIST | Privileged | | 0.0520 | 0.0750 | 20251117 |
| 7 | tobi | Alzonova | Standard | | 0.0360 | 0.0717 | 20251117 |
| 8 | MR | MR | Privileged | | 0.0340 | 0.0512 | 20251115 |
| 9 | RACΞL | CMU | Standard | | 0.0140 | 0.0140 | 20251116 |
| 10 | Ahri+EFFL+MLV | Postech | Standard | | 0.0100 | 0.0100 | 20251117 |
| 11 | Merlin Labs | Independent | Standard | report | 0.0060 | 0.0090 | 20251117 |
| 12 | LYQRobotics | Independent | Standard | | 0.0080 | 0.0080 | 20251117 |
| 13 | ACT | Xiamen University | Standard | | 0.0020 | 0.0037 | 20251116 |
| 14 | StarVLA | Independent | Standard | code, report | 0.0000 | 0.0019 | 20251117 |
| 15 | Cloud-Data | Cloud Data Technology Co Ltd | Standard | | 0.0000 | 0.0000 | 20251116 |
| 16 | RobotSimArk | 1 | Standard | | 0.0000 | 0.0000 | 20251116 |
| 17 | EntropyMaximum | Independent | Standard | | 0.0000 | 0.0000 | 20251116 |
| 18 | Magikid | Magikid | Standard | | 0.0000 | 0.0000 | 20251116 |