Output Constraining Attacks: Bypassing Safety via Syntax Coercion
A deep mathematical and architectural analysis of how attackers bypass LLM safety alignment by demanding strict structured output formats such as JSON or XML.