Safety Prompts Are Hackable
Tom Spencer · Category: points_of_view
Simple system-level safety prompts can be prompt-injected or hacked, so relying solely on them may not prevent unwanted agent behaviors.
© 2025 The Build. All rights reserved.
Privacy Policy