LLM Leaderboards

Discover the SEAL LLM Leaderboards: precise, reliable rankings of leading large language models (LLMs), evaluated with a rigorous methodology.

Developed by Refonte Ai's Safety, Evaluations, and Alignment Lab (SEAL), these leaderboards use private datasets to ensure fair and uncontaminated results. Regular updates keep the leaderboards current with the latest AI advancements, making them an essential resource for understanding the performance and safety of top LLMs.

Private Datasets

Refonte Ai's proprietary, private evaluation datasets can't be gamed, ensuring unbiased and uncontaminated results.

Evolving Competition

We periodically update the leaderboards with new datasets and models, fostering a dynamic, contest-like environment.

Expert Evaluations

Our evaluations are performed by thoroughly vetted experts using domain-specific methodologies, ensuring the highest quality and credibility.

Learn more about our LLM evaluation methodology.

Model score    95% Confidence
56.85          +6.92/-6.92
56.06          +6.91/-6.91
55.10          +6.96/-6.96
53.03          +6.95/-6.95
51.27          +6.98/-6.98
49.50          +6.96/-6.96
48.49          +6.96/-6.96
40.40          +6.84/-6.84
40.40          +6.84/-6.84
40.10          +6.84/-6.84
37.88          +6.78/-6.78
35.50          +6.57/-6.68
33.50          +6.59/-6.59
32.83          +6.54/-6.54
20.20          +5.59/-5.59
6.09           +3.34/-3.34
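The page reports each score with a 95% confidence interval but does not state the formula. As a hedged illustration only, one common way to get a symmetric 95% interval for a percentage score is the normal-approximation (Wald) interval for a proportion, with half-width z * sqrt(p * (1 - p) / n) and z = 1.96. The sketch below assumes that method; the function name `wald_half_width` and the item count `n_items` are hypothetical, since the page publishes neither the method nor the dataset size.

```python
import math


def wald_half_width(score_pct: float, n_items: int, z: float = 1.96) -> float:
    """Half-width, in percentage points, of a normal-approximation (Wald)
    confidence interval for an accuracy-style score.

    score_pct: observed score as a percentage (0-100).
    n_items:   number of graded items (hypothetical; not published here).
    z:         critical value; 1.96 gives a two-sided 95% interval.
    """
    p = score_pct / 100.0
    se = math.sqrt(p * (1.0 - p) / n_items)  # standard error of a proportion
    return 100.0 * z * se


# Example with a hypothetical item count of 197.
half = wald_half_width(56.85, 197)
print(f"56.85 +{half:.2f}/-{half:.2f}")  # prints: 56.85 +6.92/-6.92
```

With the same assumed count, the sketch gives about 3.34 for a score of 6.09, in line with the last row of the table above. This agreement is suggestive only and does not confirm how the published intervals were computed.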

Agentic Tool Use (Enterprise)


Model score    95% Confidence
66.43          +5.47/-5.47
64.58          +5.52/-5.52
60.76          +5.64/-5.64
60.28          +5.66/-5.66
59.93          +5.67/-5.67
59.38          +5.67/-5.67
54.17          +5.78/-5.78
52.78          +5.77/-5.78
51.74          +5.77/-5.77
51.39          +5.77/-5.77
50.35          +5.78/-5.78
50.35          +5.78/-5.78
40.42          +5.68/-5.68
37.23          +5.60/-5.60
30.21          +5.30/-5.30
17.42          +4.39/-4.39

Model score    95% Confidence
87.60          +1.64/-1.63
86.83          +1.67/-1.66
85.50          +2.09/-2.08
85.29          +1.61/-1.61
84.23          +1.52/-1.51
83.91          +2.07/-2.06
81.85          +1.96/-1.96
81.82          +1.94/-1.94
80.77          +1.84/-1.83
80.49          +1.72/-1.72
79.89          +1.69/-1.69
78.52          +2.33/-2.32
78.24          +2.19/-2.19
77.25          +1.96/-1.97
67.97          +2.61/-2.62
57.69          +2.58/-2.57

Model score    95% Confidence
96.60          +1.02/-1.02
95.68          +1.15/-1.15
95.60          +1.16/-1.16
95.19          +1.21/-1.21
95.10          +1.22/-1.22
94.85          +1.25/-1.25
94.69          +1.27/-1.27
93.94          +1.35/-1.35
93.28          +1.41/-1.41
92.28          +1.51/-1.51
90.54          +1.65/-1.65
90.12          +1.69/-1.69
90.12          +1.69/-1.69
87.47          +1.87/-1.87
79.83          +2.27/-2.27
37.51          +2.73/-2.73

Model score    95% Confidence
1121           +25/-24
1109           +28/-27
1105           +34/-37
1073           +38/-35
1064           +22/-21
1058           +27/-27
1014           +36/-42
1009           +28/-27
955            +26/-28
949            +21/-23
917            +26/-26
882            +27/-26
882            +27/-28
881            +27/-29
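Several rows, especially in the final table, list asymmetric bounds such as +34/-37, and the page does not say how they were computed. Asymmetric bounds are typical of resampling methods, so, as an assumption, the sketch below shows a percentile bootstrap for the mean of per-item scores: resample with replacement, recompute the mean, and read off the 2.5th and 97.5th percentiles. The function name `percentile_bootstrap` and the sample data are hypothetical and not taken from the leaderboard.

```python
import random
import statistics


def percentile_bootstrap(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`.

    Returns (point, lower, upper). The bounds come from the empirical
    distribution of resampled means, so they need not be symmetric
    around the point estimate.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    point = statistics.fmean(values)
    # Resample with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    # Read off the alpha/2 and 1 - alpha/2 percentiles.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return point, lower, upper


# Hypothetical, right-skewed per-item scores (not real leaderboard data).
data = [2.0, 3.0, 3.5, 4.0, 4.0, 4.5, 5.0, 6.0, 9.0, 15.0]
point, lower, upper = percentile_bootstrap(data)
print(f"{point:.2f} +{upper - point:.2f}/-{point - lower:.2f}")
```

Because the sample is right-skewed, the upper bound typically sits farther from the point estimate than the lower one, which is the kind of asymmetry the tables show. Other methods (for example, bootstrapping pairwise-comparison ratings) can produce similar asymmetry.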

If you'd like to add your model to this leaderboard or a future version, please contact seal@RefonteAi.com. To ensure leaderboard integrity, a model can be featured only the FIRST TIME its organization encounters the prompts.