Interpretability and transparency concerns

Hidden Dangers of Disagreement Prompts (AI Secrets)