https://www.darkreading.com/cyberattacks-data-breaches/google-red-team-provides-insight-on-real-world-ai-attacks
<aside> 💡 AI Generated Summary
Google's researchers have identified six specific real-world attacks on AI systems: • -
These attacks can lead to unexpected or malicious results, such as security-evasive phishing attacks or data theft. To counter these threats, Google suggests a combination of adversarial simulations and AI expertise for a robust defense. Traditional security controls can also effectively mitigate risk to AI systems.
</aside>
Source: Parilov via Shutterstock
Google researchers have identified six specific attacks that can occur against real-world AI systems, finding that these common attack vectors demonstrate a unique complexity, That will require a combination of adversarial simulations and the help of AI subject-matter expertise to construct a solid defense, they noted.
The company revealed in a report published this week that its dedicated AI red team has already uncovered various threats to the fast-growing technology, mainly based on how attackers can manipulate the large language models (LLMs) that drive generative AI products like ChatGPT, Google Bard, and more.
The attacks largely result in the technology producing unexpected or even malice-driven results, which can lead to outcomes as benign as the average person's photos showing up on a celebrity photo website, to more serious consequences such as security-evasive phishing attacks or data theft.
Google's findings come on the heels of its release of the Secure AI Framework (SAIF), which the company said is aimed at getting out in front of the AI security issue before it's too late, as the technology already is experiencing rapid adoption, creating new security threats in its wake.
The first group of common attacks that Google ID'd are prompt attacks, which involve "prompt engineering." That's a term that refers to crafting effective prompts that instruct LLMs to perform desired tasks. This influence on the model, when malicious, can in turn maliciously influence the output of an LLM-based app in ways that are not intended, the researchers said.
An example of this would be if someone added a paragraph to an AI-based phishing attack that is invisible to the end user but could direct the AI to classify a phishing email as legitimate. This might allow it to get past email anti-phishing protections and increase the chances that a phishing attack is successful.
Another type of attack that the team uncovered is one called training-data extraction, which aims to reconstruct verbatim training examples that an LLM uses — for example, the contents of the Internet.
In this way, attackers can extract secrets such as verbatim personally identifiable information (PII) or passwords from the data. "Attackers are incentivized to target personalized models, or models that were trained on data containing PII, to gather sensitive information," the researchers wrote.
A third potential AI attack is backdooring the model, whereby an attacker "may attempt to covertly change the behavior of a model to produce incorrect outputs with a specific 'trigger' word or feature, also known as a backdoor," the researchers wrote. In this type of an attack, a threat actor can hide code either in the model or in its output to conduct malicious activity.
A fourth attack type, called adversarial examples, are inputs that an attacker provides to a model to result in a "deterministic, but highly unexpected output," the researchers wrote. An example would be that the model could show an image that clearly shows one thing to the human eye but which the model recognizes as something else entirely. This type of attack could be fairly benign — in a case where someone could train the model to recognize his or her own photo as one deemed worthy of inclusion on a celebrity website — or critical, depending on the technique and intent.
An attacker also could use a data-poisoning attack to manipulate the training data of the model to influence the model's output according to the attacker’s preference — something that also could threaten the security of the software supply chain if developers are using AI to help them develop software. The impact of this attack could be similar to backdooring the model, the researchers noted.
The final type of attack identified by Google's dedicated AI red team is an exfiltration attack, in which attackers can copy the file representation of a model to steal sensitive intellectual property stored in it. They can then that information to generate their own models that can be used to give attackers unique capabilities in custom-crafted attacks.