Systematic Benchmarking Approach

Tom Spencer · Category: frameworks_and_exercises

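When evaluating a new AI model, compare it against both closed and open-source baselines (closed examples include GPT-5 and Gemini 2.5 Flash) across a representative suite of benchmarks. Running every model on the same suite shows where the new model leads or lags each baseline, which is more informative than a single headline score.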
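As a rough illustration of the comparison described above, the sketch below computes per-benchmark score differences between a candidate model and each baseline. Everything in it is assumed for illustration: the function name `compare_to_baselines`, the model and benchmark labels, and the scores, which are placeholder integers and not real results.

```python
from statistics import mean


def compare_to_baselines(scores, candidate):
    """Return, for each baseline, the candidate-minus-baseline delta per
    shared benchmark and the mean of those deltas.

    `scores` maps model name -> {benchmark name -> score}. This layout is an
    assumption of this sketch, not a standard format.
    """
    candidate_scores = scores[candidate]
    report = {}
    for model, model_scores in scores.items():
        if model == candidate:
            continue
        # Compare only on benchmarks that both models were actually run on.
        shared = sorted(b for b in candidate_scores if b in model_scores)
        deltas = {b: candidate_scores[b] - model_scores[b] for b in shared}
        report[model] = {
            "deltas": deltas,
            "mean_delta": mean(list(deltas.values())) if deltas else None,
        }
    return report


# Placeholder scores in percentage points; hypothetical, not measured.
scores = {
    "candidate": {"bench_a": 70, "bench_b": 50},
    "closed_baseline": {"bench_a": 80, "bench_b": 40},
    "open_baseline": {"bench_a": 60, "bench_b": 50},
}

report = compare_to_baselines(scores, "candidate")
for baseline, result in report.items():
    print(baseline, result["deltas"], result["mean_delta"])
```

Keeping the per-benchmark deltas next to the mean is deliberate: an average can hide a large regression on one benchmark that is offset by a gain on another.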