Enterprise AI Security Guide: Risks, Challenges, and Solutions
Discover more about AI security in enterprise and the related measures to mitigate AI risks.
Large Language Models (LLMs) are being used in multiple domain applications in enterprises – from powerful enterprise search solutions to chatbots and AI agents that assist customers and employees – and are more and more being integrated with proprietary data to enable Retrieval Augmented Generation (RAG), a solution that allows retrieving data from outside a foundation model and augmenting prompts by adding the relevant retrieved data in context.
However, Artificial Intelligence (AI) and Generative Artificial Intelligence (GenAI) expose organizations to new security, ethical, and operational risks.
Questions about biases, adversarial attacks, privacy, and hallucinations, mixed with legal concerns and data security, are emerging, as the key to safe AI deployment. Ensuring the security of AI systems is critical for organisations adopting LLMs to mitigate risks and maintain trust.
Any AI system includes – at a minimum – the data, model, and processes for training, testing, and deploying the models and the infrastructure required for using them. Despite the significant progress that AI and Machine Learning (ML) have made in many different application domains, these technologies are vulnerable to attacks and are subject to many risks for enterprises, that are present at all stages of the ML lifecycle.
The data-driven approach of AI brings unique security and privacy complexities at various stages of ML processes, going beyond traditional security and privacy risks prevalent in standard operational systems.
These security and privacy challenges include the potential for adversarial manipulation of training data, adversarial exploitation of model vulnerabilities to adversely affect the performance of the AI system, and even malicious manipulations, modifications, or mere interaction with models to exfiltrate sensitive information, including personal information present in the data, insights about the model's architecture, or confidential business information.
Addressing these issues early not only ensures compliance with data protection regulations but also avoids costly breaches and helps maintain user confidence, ultimately supporting long-term success in deploying AI solutions.
The present guide will focus on providing an overview for understanding the vulnerabilities and security risks related to LLMs and how to mitigate such risks.
1. Exploring some AI’s Weaknesses
Examining the vulnerabilities inherent in Machine Learning (ML) – that are often overlooked by existing security guidelines – is crucial for understanding how it differs from traditional software applications.
Reliance on Data
A model acts as a function that processes input (i.e., information provided by the user and passed to the system) through a set of rules to produce an output. Unlike traditional software where logic is crafted by developers, an ML model's logic is shaped by its training data.
But this dependence means the opportunity to manipulate the training data is also an opportunity to manipulate the system’s logic.
Even with reliable data sources, predicting how dataset nuances might unpredictably influence a model's response is challenging. This unpredictability can lead to unexpected behavior when faced with specific inputs, offering opportunities to exploit these weaknesses.
Inspection Challenges
We might think we can mitigate unpredictability by inspecting the model once it's been created, to determine its logic.
However, the essence of ML is its ability to autonomously learn and extract information from data, reducing the need for human oversight. This autonomy makes it difficult to interpret a model's actions or understand its internal logic.
It’s often challenging (or even impossible) to fully understand why it’s doing what it’s doing. Consequently, ML components often lack the extensive scrutiny given to standard systems, leading to overlooked vulnerabilities. The opaqueness of trained ML model logic compounds this issue.
The inability to verify models under all input conditions
In efforts to see how a model responds, observing its reactions to different inputs can be an approach. However, this method faces the challenge of not being able to verify the model's behavior across all possible inputs. Testing a model with diverse inputs provides limited insights, as the ultimate goal is to confirm its expected behavior under an exhaustive range of conditions. This becomes nearly unfeasible with models that have billions of parameters and an extensive array of potential inputs.
Assuming one overcomes these obstacles and gains a satisfactory understanding of the model’s behavior, ensuring its performance and security, the next step is deployment.
But here, security considerations become paramount because...
A model and its training data can be reversed engineered
Under specific circumstances, it's feasible to deduce enough about a model to replicate it, which is not only a loss in itself but also empowers attackers in devising further attacks.
In some cases, training data can be inferred or reconstructed through simple queries to a deployed model. If a model was trained on sensitive data, then this data may be leaked.
2. AI Security Risks
From hallucination, where the models generate content that diverges from real-world facts or user inputs, to adversarial attack exploiting vulnerabilities in LLMs by introducing malicious inputs that lead to undesirable outputs, the landscape of AI security is broad and continuously evolving.
For example, the recent advanced reasoning methods have introduced effective mechanisms, enabling models to iteratively build upon prior steps from basic chain-of-thought approaches to advanced reflection mechanisms and multi-step reasoning; such dependence on step-wise reasoning introducing a new type of vulnerabilities in LLMs where manipulation of initial reasoning steps can propagate errors, causing cascading failures throughout the reasoning chain (J.Peng et al., 2024).
This evolving nature presents a fresh array of challenges. Key questions arise, such as the reliability of the foundational model, its operational integrity, and its applicability to specific uses.
Hallucinations
While LLMs excel at generating innovative content, they also have the potential to produce inaccurate, inappropriate, or harmful outputs—a phenomenon widely known as hallucination. When such outputs are relied upon without proper validation, they can cause misinformation dissemination, communication breakdowns, legal challenges, and reputational harm.
Hallucinations in LLMs are broadly classified into two categories: factuality hallucinations and faithfulness hallucinations (Huang, 2023). Factuality hallucinations involve inconsistencies with external facts or the invention of false information, whereas faithfulness hallucinations arise from deviations from instructions, contextual misinterpretations, or logical errors, such as mistakes in mathematical reasoning.
Hallucinations can be due to errors and biases present in the pretraining data that can lead models to learn and even amplify these inaccuracies, or to the training stage, because of insufficient unidirectional representation, issues with the attention mechanism or alignment, etc.
Attacks
While developers of LLMs diligently work towards ensuring their models don’t produce harmful outputs through a process called “safety alignment,” the array of potential attacks on ML is broad, constantly evolving, and spans the entire ML lifecycle – from design and implementation to training, testing, and finally, to deployment in the real world.
These attacks vary in nature and strength, targeting not only the vulnerabilities in ML models but also the weaknesses of the infrastructure in which the AI systems are deployed.
With a prompt injection, attackers may wish to extract the system prompt or reveal training data. Users or researchers uncover new “jailbreaks” (i.e., techniques and prompts that can trick the model into bypassing its safeguards) and are increasingly finding ways to break the security of generative AI programs, such as ChatGPT, especially the process of "alignment", in which the programs are made to stay within guardrails. These methods often exploit the model’s susceptibility to certain linguistic manipulations and extend beyond conventional adversarial inputs (prefix injection, refusal suppression, style injection, etc.).
A critical aspect is the ease of hacking AI, as it often requires no advanced coding skills but can be done through natural language input. This example below shows how easily it can be to jailbreak a model.
Researchers built a model, dubbed Masterkey, to test and reveal potential security weaknesses in chatbots via a jailbreaking!
Attackers can also manipulate the training data for AI systems, thus making the AI system trained on it vulnerable to attacks. Scraping of training data from the Internet also opens up the possibility of data poisoning at scale by hackers to create vulnerabilities that allow for security breaches down the pipeline.
In its Taxonomy and Terminology of Attacks and Mitigations, the National Institute of Standards and Technology (NIST) considers major types of attack, notably evasion, poisoning, privacy and abuse attacks, and also classifies them according to multiple criteria such as the attacker’s goals and objectives, capabilities, and knowledge.
Evasion attacks, which occur after an AI system is deployed, attempt to alter an input to change how the system responds to it.
Examples would include adding markings to stop signs to make an autonomous vehicle misinterpret them as speed limit signs or creating confusing lane markings to make the vehicle veer off the road.
Poisoning attacks occur in the training phase by introducing corrupted data. Since foundation models are most effective when trained on large datasets, it has become common to scrape data from a wide range of public sources. This makes foundation models especially susceptible to poisoning attacks, in which an adversary controls a subset of the training data
A common instance is injecting frequent use of inappropriate language into a chatbot's training conversations, leading it to adopt these as normal language in interactions. Researchers have demonstrated that an attacker can induce targeted failures in models by arbitrarily poisoning only 0.001 % of uncurated web-scale training datasets.
Privacy attacks, which occur during deployment, are attempts to learn sensitive information about the AI or the data it was trained on in order to misuse it.
Attackers can engage a chatbot with a series of legitimate inquiries and analyze the responses to reverse engineer the model, identify its weaknesses, or infer its data sources
Abuse attacks involve the insertion of incorrect information into a source, such as a webpage or online document, that an AI then absorbs.
Unlike poisoning attacks, abuse attacks feed the AI incorrect data from a seemingly trustworthy but compromised source, effectively misguiding the intended use of the AI system
Data Extraction. Underlying many of the security vulnerabilities in LLMs and RAG applications is the fact that data and instructions are not provided in separate channels to the LLM, which allows attackers to use data channels to conduct inference-time attacks that are similar to decades-old SQL injection.
It's important to note that the aforementioned AI security risks are not exhaustive, as there are other potential threats, including violations of model availability. Such disruptions in service can occur when an attacker uses specially crafted inputs to excessively burden the model's computational capabilities or floods the system with a high volume of inputs, resulting in a denial of service and preventing access for legitimate users.
Researchers from MIT published the “AI Risk Repository,” a repository that includes a publicly accessible database with over 700 AI-related risks, which will be expanded and updated to ensure it remains relevant.
Their AI Risk Database provides a comprehensive overview of the AI risk landscape, which is focused on specific risks types to support targeted mitigation, research, and policy development.
It contains detailed records of AI-related risks extracted from a variety of sources, categorized into high-level and mid-level taxonomies:
- The high-level Causal Taxonomy includes attributes such as the entity responsible for the risk (human, AI, or other), the intent (intentional, unintentional, or other), and the timing (pre-deployment, post-deployment, or other);
- The mid-level Domain Taxonomy categorizes risks into 23 specific domains like discrimination, misinformation, malicious use, and human-computer interaction issues.
By making this database publicly accessible, they hope not only to raise awareness of potential AI risks but also to encourage closer collaboration among the various stakeholders involved, whether they be developers, regulators, or everyday users.
The goal is to create a common reference framework that facilitates discussion, research, and the implementation of appropriate preventive measures, thereby enabling a safer and more ethical adoption of AI.
3. Mitigating AI Risks
Mitigation of AI risks mainly depends of the AI system and its stage . Howver, common pratices can limit AI risks and help organizations to detect, identify, and prevent the spread of such risks.
Data management
As AI systems process increasingly data, either at the training or inference stage, the risk of data breaches grows.
At the training stage, as AI models are vulnerable to data privacy attacks, where private information that was used in training can be extracted from the model by malicious users, it is crucial to organizations to curate and delete any personal or confidential data from data sets used for training. Data must also be meticulously scrubbed to remove harmful, biased, fake, or inappropriate content that could otherwise be inadvertently included in training. Annotating and rating the quality of acquired data ensures clarity and guides its appropriate use. MLOps teams should also validate the origins of data sources and ensure alignment between training data and model outputs to prevent inconsistencies.
At the inference stage (i.e., organizations that use internal data as inputs in prompts for GenAI, e.g. through a RAG pipeline), effective data management is crucial in mitigating the risks of potential data loss and safeguarding corporate security.
This includes implementing secure data pipeline but also strong authentication protocols and managing access controls rigorously to ensure end-users access only the information they are authorized to view or perform. Role-based access (RBAC) ensures users can only interact with the data or systems necessary for their tasks, reducing the attack surface. For instance, HR or marketing teams should not have access to sensitive model code, just as developers should not have everyday access to employee or customer data. To further safeguard against vulnerabilities, dynamic and flexible monitoring systems should be employed to identify, log, and block exploits.
Additionally, it's important to regularly review and update data management policies to adapt to new threats and ensure compliance with data protection regulations.
Enterprise data and third-party model APIs
This risk of leaking confidential information to external LLM providers is specific to using external model APIs, e.g., an employee might copy the company’s secret or a user’s private information into a prompt and send it to wherever the model is hosted.
While a third-party API might work well for some enterprises (or some use cases), it’s a non-starter for businesses constrained by strict policies of business data, such as financial institutions or innovative technology companies, which need to protect their intellectual property from leakage. These enterprises can’t easily supply their data to third parties, no matter how capable the model is.
There is always a risk of exposure and compromises whenever employees, using ChatGPT or other AI tools, send data out of the organization.
This is why any enterprise should caution employees against using public chatbots for sensitive information. All information typed into AI tools might be stored and used to continue training the model.
For illustration, Amazon warns employees not to share confidential information with ChatGPT after seeing cases where its answer “closely matches” existing material from inside the company. Samsung Electronics Co also banned the use of popular AI tools like ChatGPT by their employees after discovering staff uploaded sensitive code to the platform, dealing a setback to the spread of such technology in the workplace. Apple and JPMorgan also ban the use of public AI tools.
In the case you allow your workforce to use public models, you should at least craft a usage policy, by indicating for example that they cannot put any identifiable or sensitive data and to turn off their history when using external tools that enable that choice.
Another complication with third-party models is ownership. AI service providers retain model ownership and access to the domain-tuned parameters, creating a host of concerns around vendor lock-in and future usage.
Self-hosted models or trusted, secure, and robust AI tools should be prioritized.
Avoid relying on third-party AI tools that haven’t been thoroughly vetted for security risks and rely on AI tools that handle data according to data protection laws and ensure data integrity, sovereignty, security, and confidentiality. Conduct due diligence on the AI providers, understanding their security protocols, and ensuring that their tools are up to date with the latest security patches and features.
Address input- and output-level risks
It can be necessary to implement proper safeguards and guardrails at the input and output levels. Without them, it might be hard to ensure that the model will respond properly to adversarial inputs and will be protected from efforts to circumvent content policies and safeguard measures.
Without implementation of input guardrails, even advanced models can potentially be manipulated to generate harmful or misleading outputs. While some might find it funny to get some models to make controversial statements, it’s much less fun if a branded AI customer support chatbot does the same thing.
A car dealership added a chatbot to its website and learned the hard way that it should have integrated better guardrails.
Inpute safeguards include for example direct filtering and engineering of the inputs. These approaches include prompt filters (which intend to filter, block, and hard code responses for some inputs until the model can respond in the intended way) and prompt engineering (direct modifications of the user inputs). Modifications may be done in a variety of ways, such as with automated identification and categorization, assistance of the LLM itself, or rules engines.
Output guardrails can be used for:
- quality measurement, by detecting and filtering the generated output of models, e.g., empty or mal formatted responses that need to be completed, non ethical answers that needs to be excluded or, hallucination detection, which is an active area of research, etc.; or
- failure management, to specify the policy to deal with different failure modes.
The downside of this added security is that it can slow down the AI system (added steps and inferences).
Examples: 20 Guardrails to Secure LLM Applications 🔒
Educating and training employees
It's crucial to ensure that all employees are thoroughly educated about the potential risks associated with using AI.
Regular training sessions should be conducted to keep them updated on the latest security practices and aware of how to recognize and handle potential threats.
This education should also include understanding how AI and GenAI work, what their limitations are, and the importance of maintaining ethical standards while using these tools.
Ensuring Safe Usage of AI outputs:
Establishing clear guidelines, protocols, and policies is essential for safely managing content produced by AI and GenAI.
This process includes:
- setting explicit boundaries on acceptable content types, considering the business's nature and stakeholder impact,
- implementing thorough review and approval procedures for AI-generated content, involving both automated screening and human review, particularly for sensitive material,
- labeling and warning mechanisms related to AI outputs;
- Regular monitoring and audits of AI outputs to swiftly identify and mitigate any inappropriate content;
- Training for employees in handling AI outputs and continuous refinement of safety protocols through feedback;
- Transparency with end-users about AI's role in content creation (e.g., watermarking AI output can be implemnted as a mechanism to ensure transparency) to buld trust and ensuring the responsible use of AI in the enterprise.
Maintain human oversight of AI applications
Human oversight can help catch errors, prevent bias, and ensure your brand voice and values are recognizable in your content. Combine AI with human oversight to ensure automated decisions are monitored and editable, and can be overridden when necessary.
While the practices outlined above represent the minimum for mitigating AI risks, organisations must adopt a broader, more dynamic approach to ensure comprehensive security. Experimenting with generative AI amplifies the need to contextualise security within this evolving landscape, recognising not only the fundamental risks but also the unique vulnerabilities these systems introduce.
A tailored, layered security framework is essential. This means adopting a holistic perspective that evaluates vulnerabilities across the entire AI lifecycle and implements mitigation strategies specific to the context. Businesses should develop internal structures that align with their unique operational needs. For example, enterprises running on-premises solutions may prioritise safeguards against physical or network-based intrusions.
The continuous evolution of AI will likely bring forth new breakthroughs and unforeseen challenges.
It is crucial for us to anticipate and prepare for these changes, fostering a culture of continuous learning and adaptation.
Are you looking to implement a secure AI in your organization?
Lampi’s commitment is based on building an effective, reliable generative AI platform that enables customers’ complete ownership of models and complete control of data, with enterprise-grade security measures and adherence to industry best practices.
Contact us to request a demo.