To help analyze which model and system prompt combinations are working and which ones aren't