
Eval Data Ingestion

Tom Spencer · Category: frameworks_and_exercises

Upload the GDPval evaluation set to a GitHub repository, use LLM agents to extract a sample of its tasks, and export that sample to CSV or Excel so you can analyze model performance on real-world, economically valuable tasks.
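The sample-and-export step can be sketched with the Python standard library. This is a minimal sketch, not a description of GDPval's actual format. The task records and their field names (`task_id`, `occupation`, `prompt`) are hypothetical placeholders, and the LLM-agent extraction is replaced here by a plain seeded random sample. The helper names `sample_tasks` and `tasks_to_csv` are likewise illustrative.

```python
import csv
import io
import random

# Hypothetical task records for illustration only; the real GDPval
# schema and contents may differ.
TASKS = [
    {"task_id": "t1", "occupation": "Accountant", "prompt": "Reconcile the ledger, then summarize."},
    {"task_id": "t2", "occupation": "Nurse", "prompt": "Draft a patient handoff note."},
    {"task_id": "t3", "occupation": "Engineer", "prompt": "Review a load calculation."},
    {"task_id": "t4", "occupation": "Lawyer", "prompt": "Summarize a contract clause."},
]


def sample_tasks(tasks, k, seed=0):
    """Draw k distinct tasks; a fixed seed makes the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(tasks, k)


def tasks_to_csv(tasks, fieldnames):
    """Serialize task dicts to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(tasks)  # fields containing commas are quoted automatically
    return buf.getvalue()


if __name__ == "__main__":
    sample = sample_tasks(TASKS, 2)
    print(tasks_to_csv(sample, ["task_id", "occupation", "prompt"]))
```

To write a file instead of a string, open it with `open(path, "w", newline="")` and pass that handle to `csv.DictWriter`. Excel opens the resulting CSV directly. If you need a native `.xlsx` file, pandas' `DataFrame.to_excel` is one option.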