Model performance on HV-MMBench across Multiple-Choice (MC), True/False (TF), Fill-in-Blank (FIB), and Open-Ended (OE) questions.
| Model | Organization | LLM Params | Date | MC Acc (%) | TF Acc (%) | FIB Precision (%) | FIB Recall (%) | FIB F1 (%) | OE ScoreF (%) | OE ScoreO (%) | OE ScoreG (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| VideoLLaMA2 | DAMO Academy & Alibaba Group | 7B | 2025-05-22 | 81.7 | 80.4 | 4.31 | 0.55 | 0.97 | 0.19 | 0.35 | 0.56 |
| LLaVA-Video | ByteDance & NTU S-Lab & BUPT | 7B | 2025-05-22 | 88.6 | 81.4 | 17.5 | 2.32 | 4.01 | 0.14 | 0.24 | 0.49 |
| Qwen2-VL | Alibaba Group | 7B | 2025-05-22 | 84.3 | 84.1 | 13.0 | 1.64 | 2.81 | 0.15 | 0.33 | 0.56 |
| Qwen2.5-VL | Alibaba Group | 7B | 2025-05-22 | 86.8 | 88.3 | 16.8 | 2.21 | 3.82 | 0.22 | 0.47 | 0.64 |
| Qwen2.5-VL | Alibaba Group | 32B | 2025-05-22 | 86.8 | 89.9 | 19.7 | 2.48 | 4.33 | 0.19 | 0.51 | 0.69 |
| Intern2.5-VL | Shanghai AI Laboratory | 8B | 2025-05-22 | 85.3 | 79.5 | 6.45 | 0.85 | 1.49 | 0.15 | 0.30 | 0.57 |
| Intern2.5-VL | Shanghai AI Laboratory | 38B | 2025-05-22 | - | - | 6.22 | 0.85 | 1.45 | 0.17 | 0.37 | 0.53 |
| LLaVAOneVision | ByteDance | 7B | 2025-05-22 | 91.1 | 84.9 | 11.7 | 1.52 | 2.66 | - | - | - |
To submit your results to the leaderboard, please send your model responses to this email.