
One doc tagged with "adversarial-attacks"


Bypassing LLM Guardrails

LLMs are trained to understand language, which makes them vulnerable to 'translation attacks': how Base64, ROT13, and emoji encodings can bypass safety filters.
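For context, a minimal sketch of the kind of encoding such an attack relies on. The prompt string and the use of Python's standard base64 and codecs modules are illustrative assumptions, not taken from the linked doc; it only shows how a plain-text request can be re-encoded so that literal keywords no longer appear.

```python
import base64
import codecs

# Hypothetical, harmless example prompt used only to illustrate the
# encodings named above; this is not code from the linked doc.
prompt = "Summarize the plot of Hamlet"

# Base64: the text becomes an ASCII-safe string that a keyword filter
# may not recognize, while a capable model can often still decode it.
b64_prompt = base64.b64encode(prompt.encode("utf-8")).decode("ascii")

# ROT13: each letter is shifted 13 places, again hiding literal keywords.
rot13_prompt = codecs.encode(prompt, "rot_13")

print(b64_prompt)
print(rot13_prompt)  # "Fhzznevmr gur cybg bs Unzyrg"
```

The point of the sketch is only that the surface form of the request changes while the underlying meaning, which the model can recover, stays the same; the linked doc covers why that gap defeats surface-level safety filters.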