Invisible Text Attacks: Hiding Prompts from Humans
Attackers use CSS and PDF tricks to hide malicious prompts from human readers while keeping them visible to LLMs. Learn how to detect these stealth attacks.
Understanding cross-lingual attacks: why safety filters trained on English fail against prompts in Russian, Chinese, or low-resource languages.
Understanding persona-based attacks on LLMs. How DAN and Developer Mode exploits work and how to detect them.