[Figure: Overall Model Accuracy. Strict accuracy (%), sorted from highest to lowest. Legend: Open-source, Closed-source, Agent, Fine-grained.]
Fine-grained knowledge acquisition benchmark
From fine-grained recognition to evidence-grounded knowledge acquisition.
FIKA-Bench asks multimodal models and agents to identify unfamiliar fine-grained targets by searching, verifying, and using external evidence, rather than relying on memorized benchmark labels.
Results
Strict accuracy (%), sorted by Overall from highest to lowest. Type: Open = open-source, Closed = closed-source, Agent = agent, Fine = fine-grained.
| Rank | System | Type | Public Prod. | Public Nat. | Public Trans. | Public Cult. | Public Avg. | Real-Life Prod. | Real-Life Nat. | Real-Life Trans. | Real-Life Cult. | Real-Life Avg. | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Kimi-K2.6 | Open | 0.0 | 26.0 | 44.6 | 6.5 | 21.5 | 23.1 | 40.9 | 33.3 | 32.3 | 31.0 | 25.1 |
| 2 | Gemini-3.1-Flash-Lite | Closed | 3.7 | 20.0 | 37.5 | 1.6 | 16.9 | 25.6 | 36.4 | 29.2 | 19.4 | 26.7 | 20.6 |
| 3 | OpenClaw + Qwen3.5-397B-A17B | Agent | 7.4 | 16.0 | 25.0 | 3.2 | 13.3 | 23.1 | 40.9 | 33.3 | 22.6 | 28.4 | 19.0 |
| 4 | Qwen3.5-397B-A17B | Open | 11.1 | 12.0 | 28.6 | 3.2 | 13.8 | 20.5 | 31.8 | 33.3 | 22.6 | 25.9 | 18.3 |
| 5 | GPT-5-mini | Closed | 3.7 | 12.0 | 41.1 | 0.0 | 15.4 | 15.4 | 45.5 | 25.0 | 6.5 | 20.7 | 17.4 |
| 6 | GLM-5V-Turbo | Closed | 3.7 | 16.0 | 23.2 | 4.8 | 12.8 | 10.3 | 40.9 | 33.3 | 9.7 | 20.7 | 15.8 |
| 7 | Qwen3-VL-235B-A22B | Open | 11.1 | 10.0 | 25.0 | 0.0 | 11.3 | 12.8 | 31.8 | 20.8 | 22.6 | 20.7 | 14.8 |
| 8 | OpenClaw + Qwen3-VL-8B | Agent | 0.0 | 4.0 | 25.0 | 0.0 | 8.2 | 20.5 | 40.9 | 16.7 | 19.4 | 23.3 | 13.8 |
| 9 | OpenClaw + MiniMax-M2.7/Qwen3-VL-8B | Agent | 0.0 | 6.0 | 21.4 | 0.0 | 7.7 | 17.9 | 27.3 | 25.0 | 16.1 | 20.7 | 12.5 |
| 10 | Fine-R1-7B | Fine | 3.7 | 14.0 | 21.4 | 3.2 | 11.3 | 7.7 | 18.2 | 16.7 | 9.7 | 12.1 | 11.6 |
| 11 | OpenCode + MiniMax-M2.7/Qwen3-VL-8B | Agent | 0.0 | 12.0 | 16.1 | 0.0 | 7.7 | 7.7 | 27.3 | 20.8 | 9.7 | 14.7 | 10.3 |
| 12 | Qwen3.5-9B | Open | 0.0 | 2.0 | 17.9 | 0.0 | 5.6 | 7.7 | 45.5 | 12.5 | 6.5 | 15.5 | 9.3 |
| 13 | Qwen3-VL-8B | Open | 0.0 | 6.0 | 17.9 | 0.0 | 6.7 | 7.7 | 27.3 | 12.5 | 9.7 | 12.9 | 9.0 |
| 14 | OpenCode + Qwen3-VL-8B | Agent | 3.7 | 4.0 | 19.6 | 1.6 | 7.7 | 5.1 | 22.7 | 16.7 | 0.0 | 9.5 | 8.4 |
| 15 | VisualRFT-7B | Fine | 3.7 | 8.0 | 17.9 | 0.0 | 7.7 | 5.1 | 4.5 | 4.2 | 6.5 | 5.2 | 6.8 |
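The leaderboard can be summarized programmatically. The sketch below is illustrative only: it copies the Overall column from the table above, finds the top system per type, and compares each agent entry with its stand-alone backbone. It assumes that a name of the form "Framework + Model" denotes an agent framework wrapping the model listed after the plus sign; that reading comes from the naming pattern, not from a statement on this page. The helper names `best_per_type` and `agent_gain` are hypothetical.

```python
# Overall strict accuracy (%) per system, copied from the table above.
# Value format: (type, overall).
OVERALL = {
    "Kimi-K2.6": ("Open", 25.1),
    "Gemini-3.1-Flash-Lite": ("Closed", 20.6),
    "OpenClaw + Qwen3.5-397B-A17B": ("Agent", 19.0),
    "Qwen3.5-397B-A17B": ("Open", 18.3),
    "GPT-5-mini": ("Closed", 17.4),
    "GLM-5V-Turbo": ("Closed", 15.8),
    "Qwen3-VL-235B-A22B": ("Open", 14.8),
    "OpenClaw + Qwen3-VL-8B": ("Agent", 13.8),
    "OpenClaw + MiniMax-M2.7/Qwen3-VL-8B": ("Agent", 12.5),
    "Fine-R1-7B": ("Fine", 11.6),
    "OpenCode + MiniMax-M2.7/Qwen3-VL-8B": ("Agent", 10.3),
    "Qwen3.5-9B": ("Open", 9.3),
    "Qwen3-VL-8B": ("Open", 9.0),
    "OpenCode + Qwen3-VL-8B": ("Agent", 8.4),
    "VisualRFT-7B": ("Fine", 6.8),
}


def best_per_type(rows):
    """Return {type: (system, overall)} for the top-scoring system of each type."""
    best = {}
    for name, (kind, score) in rows.items():
        if kind not in best or score > best[kind][1]:
            best[kind] = (name, score)
    return best


def agent_gain(rows, sep=" + "):
    """Overall-accuracy difference between an agent entry and its backbone.

    Assumption: the text after `sep` names the backbone model. Agent entries
    whose backbone is not listed on its own in the table are skipped.
    """
    gains = {}
    for name, (kind, score) in rows.items():
        if kind != "Agent" or sep not in name:
            continue
        backbone = name.split(sep, 1)[1]
        if backbone in rows:
            gains[name] = round(score - rows[backbone][1], 1)
    return gains


if __name__ == "__main__":
    for kind, (name, score) in best_per_type(OVERALL).items():
        print(f"{kind}: {name} ({score})")
    for name, delta in agent_gain(OVERALL).items():
        print(f"{name}: {delta:+.1f} points vs. backbone")
```

Under this reading, both OpenClaw entries with a listed backbone score above that backbone, while OpenCode + Qwen3-VL-8B scores slightly below it. The two MiniMax-M2.7/Qwen3-VL-8B entries are skipped because no stand-alone row exists for that combination.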
FIKA-Bench combines visual input, fine-grained labels, external tools, and a leakage-aware closed-book filter. It therefore targets active knowledge acquisition rather than static fine-grained classification alone.
Citation
@misc{li2026fikabenchfinegrainedrecognitionfinegrained,
title={FIKA-Bench: From Fine-grained Recognition to Fine-Grained Knowledge Acquisition},
author={Geng Li and Yuxin Peng},
year={2026},
eprint={2605.13193},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.13193}
}