Demos Are Insufficient

Tom Spencer · Category: points_of_view

Evaluating a model based on a handful of online demos is misleading because different tasks reveal different behaviors and no single demo represents general performance.