K-MetBench Leaderboard

Public leaderboard for the K-MetBench camera-ready release on expert meteorological reasoning, geo-cultural alignment, and multimodal weather understanding.


K-MetBench evaluates more than 50 language models and vision-language models on 1,774 questions drawn from the Korean National Meteorological Engineer Examination. All models are evaluated under a zero-shot protocol.

The benchmark supports fine-grained analysis through three diagnostic subsets: 82 multimodal questions, 141 reasoning questions with expert-verified rationales, and 73 Korean-specific questions. It also spans five official subject areas: Weather Analysis and Forecast Theory (P1), Meteorological Observation Methods (P2), Atmospheric Dynamics (P3), Climatology (P4), and Atmospheric Physics (P5). Together, these subsets help diagnose gaps in modality understanding, expert reasoning, geo-cultural knowledge, and topic-specific performance in weather-domain evaluation.
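Subset and subject-area scores of the kind reported in the leaderboard can be obtained by grouping graded answers by their labels. The sketch below is a minimal, hypothetical aggregator; the field names (`subject`, `tags`, `is_correct`) are assumptions for illustration, not the benchmark's actual data schema.

```python
# Hypothetical per-subset accuracy aggregation for K-MetBench-style records.
# Field names ("subject", "tags", "is_correct") are assumed, not official.
from collections import defaultdict

def subset_accuracy(records):
    """Return {subset_label: accuracy} over graded question records.

    Each record is counted once under its subject area (P1-P5) and once
    under every diagnostic tag it carries (e.g. multimodal, korean).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        for label in (rec["subject"], *rec.get("tags", [])):
            totals[label] += 1
            correct[label] += int(rec["is_correct"])
    return {label: correct[label] / totals[label] for label in totals}

# Made-up example records:
records = [
    {"subject": "P1", "tags": ["multimodal"], "is_correct": True},
    {"subject": "P1", "tags": [], "is_correct": False},
    {"subject": "P3", "tags": ["korean"], "is_correct": True},
]
acc = subset_accuracy(records)
# acc["P1"] == 0.5, acc["P3"] == 1.0, acc["multimodal"] == 1.0
```

A question may appear in several columns at once (e.g. a multimodal P1 question contributes to both Multi and P1), which is why labels are tallied independently rather than partitioned.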


Legend: icons in the table mark proprietary, Korean, vision-language, and reasoning models. Size = parameter count in billions (B). Acc = accuracy. Reasoning = reasoning score (4-20). Geo = geo-cultural questions. Text = text-only questions. Multi = multimodal questions. P1 = Weather Analysis & Forecast Theory. P2 = Meteorological Observation Methods. P3 = Atmospheric Dynamics. P4 = Climatology. P5 = Atmospheric Physics.

| # | Model | Size | Type | Acc | Reasoning | Geo | Text | Multi | P1 | P2 | P3 | P4 | P5 |
|---|-------|------|------|-----|-----------|-----|------|-------|----|----|----|----|----|