As artificial intelligence tools become more embedded in daily workflows, cybersecurity experts are warning that attackers are finding new ways to exploit them. Security researchers at Trail of Bits have demonstrated a novel attack technique that embeds malicious prompts within images; the prompts remain hidden until the images are processed by large language models (LLMs).
Hidden instructions emerge through image downscaling
The method leverages the way AI platforms resize images for performance optimisation. Although the malicious prompts are invisible to the human eye in the original image, they become legible to the algorithm when the image is downscaled.
This attack builds on a 2020 study from TU Braunschweig in Germany, which highlighted image scaling as a potential vulnerability in machine learning systems. Trail of Bits has demonstrated that carefully crafted images can manipulate AI platforms, including Gemini CLI, Vertex AI Studio, Google Assistant on Android, and Gemini’s web interface.
In one test, attackers were able to extract Google Calendar data and send it to an external email address without user consent, demonstrating the potential seriousness of this vulnerability. The attack exploits common interpolation techniques such as nearest neighbour, bilinear, or bicubic resampling, where scaling can unintentionally reveal hidden instructions.
During testing, bicubic resampling caused dark areas of the image to shift, revealing concealed black text that the LLM interpreted as a valid user command. From the user’s point of view, nothing unusual was visible, yet the AI model acted on these hidden instructions in the background.
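The underlying mechanism can be illustrated with a short sketch. The example below is a minimal illustration using the Pillow library, not the researchers’ own tooling: it simply renders the same image at a reduced size with each of the resampling filters named above. The filename and the 256×256 target size are placeholder assumptions; crafting an image whose hidden text emerges only under a specific filter is the harder step, which is not reproduced here.

```python
# Minimal sketch (not the researchers' tool): downscale the same image with
# each resampling filter named above, using Pillow.
# "uploaded_image.png" and the 256x256 target size are placeholders.
from PIL import Image

original = Image.open("uploaded_image.png").convert("RGB")
print("Original size:", original.size)

# Each filter blends neighbouring pixels differently, so an image crafted for
# one filter can look innocuous at full resolution yet show legible text in
# its downscaled form, which is the version the model actually receives.
filters = {
    "nearest": Image.Resampling.NEAREST,
    "bilinear": Image.Resampling.BILINEAR,
    "bicubic": Image.Resampling.BICUBIC,
}
for name, resample in filters.items():
    downscaled = original.resize((256, 256), resample=resample)
    downscaled.save(f"downscaled_{name}.png")  # compare these by eye
```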
Demonstration tool highlights potential threats
To showcase the risks, Trail of Bits created an open-source tool called Anamorpher, which generates images with concealed prompts tailored to various scaling techniques. The researchers emphasised that while the method is highly specialised, it is reproducible, and systems lacking appropriate safeguards remain exposed to it.
This vulnerability raises broader concerns about multimodal AI systems, which are increasingly powering everyday tasks. An unsuspecting user could upload a seemingly harmless image that triggers unauthorised access to private information. The researchers warn that this type of attack could enable identity theft if sensitive data is exfiltrated through these hidden prompts.
As AI tools are often integrated with calendars, communication systems, and workflow platforms, the risk extends beyond individual users, potentially threatening organisations that rely heavily on these systems.
Calls for stronger security design in AI systems
The researchers recommend that developers and users take proactive steps to reduce this risk. Suggested measures include restricting input image dimensions, previewing images after scaling, and requiring explicit confirmation before executing sensitive actions.
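For illustration, the sketch below shows how the first two of these measures might look in a hypothetical preprocessing step that downscales images before a multimodal model sees them. The size limits, function name, and preview filename are assumptions made for the example, not values taken from the report.

```python
# Minimal sketch of two suggested mitigations, assuming a hypothetical
# preprocessing step that downscales images before a multimodal model sees
# them. The limits, sizes, and filenames here are illustrative assumptions.
from PIL import Image

MAX_INPUT_DIM = 2048           # reject unusually large uploads outright
MODEL_INPUT_SIZE = (512, 512)  # assumed size the model actually receives

def prepare_image_for_model(path: str) -> Image.Image:
    image = Image.open(path).convert("RGB")

    # Mitigation 1: restrict input image dimensions.
    if max(image.size) > MAX_INPUT_DIM:
        raise ValueError(f"Image exceeds {MAX_INPUT_DIM}px limit: {image.size}")

    # Mitigation 2: produce a preview of the image *after* scaling, since that
    # is the version the model interprets, not the original upload.
    downscaled = image.resize(MODEL_INPUT_SIZE, resample=Image.Resampling.BICUBIC)
    downscaled.save("model_input_preview.png")  # surface this to the user

    return downscaled
```

The third measure, requiring explicit confirmation before sensitive actions, would sit further downstream, in the agent or tool-calling layer rather than in image preprocessing.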
Traditional security measures such as firewalls and malware scanners are not designed to detect these forms of manipulation, creating an opportunity for attackers to bypass standard defences. Trail of Bits argues that only layered security strategies and robust design principles can reliably defend against these threats.
“The strongest defence, however, is to implement secure design patterns and systematic defences that mitigate impactful prompt injection beyond multimodal prompt injection,” the researchers said.