
How to Reduce the Risk of Using External AI Models in Your SDLC

Understand how AI models add risk and how to address it.

In the rapidly evolving landscape of artificial intelligence (AI), many organizations are turning to third-party AI models to enhance their software development processes. These models, developed externally, offer many benefits, from accelerating development timelines to introducing cutting-edge functionality without the need for in-house expertise. However, integrating third-party AI models into your organization's software development lifecycle (SDLC) comes with challenges and risks, giving rise to a new domain: AI supply chain security.

Several security researchers have recently demonstrated the potential damage of AI supply chain attacks, such as “AI-Jacking,” a novel attack path discovered by the Legit Security research team, and the malicious models found on Hugging Face by the JFrog security research team.

In this blog post, we'll delve into the potential pitfalls and considerations businesses must navigate when leveraging external AI models.

 

Third-Party Model Risks  

Malicious models

The potential for integrating malicious models poses a significant threat when using third-party AI components. These models can be deliberately designed to perform harmful actions, such as stealing sensitive data, corrupting systems, or subtly manipulating outputs. 

Security vulnerabilities

One of the primary concerns with integrating third-party AI models is the introduction of security vulnerabilities. External models may not adhere to the rigorous security standards your organization maintains, potentially exposing your systems to data breaches and other cyber threats.  

Intellectual property and licensing risks

The integration of third-party AI models introduces complexities around intellectual property (IP) rights and licensing. Misunderstanding or violating the terms of use can lead to legal challenges and financial liabilities.  

The model’s reputation

The reputation and level of community adoption of third-party AI models serve as critical indicators of their reliability, effectiveness, and safety. Models with a low number of downloads, likes, or limited activity on their repositories may signal various concerns, from lack of effectiveness to potential security risks. This inspection is especially important in environments handling sensitive data or requiring high reliability. Recent examples from the open-source ecosystem show that poorly maintained projects don’t address security issues promptly and are prone to attack.  

User and organization reputation

Models uploaded by well-known organizations or users with a history of contributing reliable models are generally more trustworthy. Conversely, using models from sources with questionable practices or a lack of transparency can lead to significant risks, including legal liabilities and damage to your organization’s brand. Thoroughly evaluate the reputation of both the users endorsing a model and the organization producing it.

Model storage

The method of storing and managing third-party AI models presents its own set of risks. One of the most popular serialization formats in machine learning is Pickle, and loading a Pickle file from an untrusted source can execute arbitrary code embedded in it.
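
To make the risk concrete, here is a minimal sketch of why loading an untrusted Pickle file is dangerous: Pickle lets an object control its own deserialization via __reduce__, so a crafted file runs attacker-chosen code the moment it is loaded. The payload below is a harmless print, but a real attack could call os.system, fetch malware, or exfiltrate secrets.

```python
import pickle

class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call print(...)".
        # An attacker would return something like (os.system, ("curl ...",)).
        return (print, ("arbitrary code ran during pickle.loads()",))

blob = pickle.dumps(Payload())   # what a malicious model file could contain
pickle.loads(blob)               # "loading the model" executes the payload
```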

Training data

Information about the dataset the model was trained on is essential to understand its potential biases and the contexts in which it performs best. This can also help you determine if the model is suitable for your specific use case.

 

Recommendations 

Start with visibility

Organizations must start by finding out where, why, and how they use external AI services. Which applications developed in your organization employ GenAI? Which models are you downloading from community marketplaces like Hugging Face? In which repositories does this happen, and which of them are business-critical? Only after gaining complete visibility into your AI development, including the use of third-party models, can you tackle the risks that revolve around them.
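
Even a simple repository scan can surface obvious usage. The sketch below is a rough, hypothetical heuristic (the patterns and the find_ai_usage helper are illustrative assumptions, far from complete coverage): it walks a repository's Python files and flags common signs of external model or GenAI SDK usage.

```python
import re
from pathlib import Path

# Illustrative patterns only; real coverage needs more ecosystems,
# config files, CI pipelines, and languages beyond Python.
AI_PATTERNS = [
    re.compile(r"from_pretrained\("),        # transformers-style model downloads
    re.compile(r"huggingface_hub"),          # direct Hugging Face Hub usage
    re.compile(r"openai|anthropic|cohere"),  # hosted GenAI SDKs
]

def find_ai_usage(repo_root):
    for path in Path(repo_root).rglob("*.py"):
        text = path.read_text(errors="ignore")
        for pattern in AI_PATTERNS:
            if pattern.search(text):
                print(f"{path}: matches {pattern.pattern}")
                break

find_ai_usage(".")  # scan the current repository
```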

Scan models for vulnerabilities

Before adopting external models, conduct a thorough security assessment of each AI model and ensure it is safe to use. Make sure models are periodically scanned for vulnerabilities and malicious code. Hugging Face, the most popular model hub, scans uploaded files internally with the open-source antivirus ClamAV and with its own Pickle import scanner. Note that detection of malicious files will NOT lead to removal of the model by Hugging Face; it only results in a warning message on the model’s card. It is therefore crucial to keep track of any unsafe files found within a model.
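
Beyond relying on the hub's scans, you can add a first-pass check of your own. The sketch below is illustrative (the flag_pickle_imports helper and module list are assumptions, not a substitute for dedicated scanners): it uses Python's standard pickletools to list which modules a raw Pickle file would import at load time. Note that PyTorch .bin checkpoints are zip archives, so their inner data.pkl would need extracting first.

```python
import pickletools

# Modules that legitimate model weights almost never need at load time.
SUSPICIOUS = {"os", "posix", "subprocess", "socket", "shutil", "builtins"}

def flag_pickle_imports(path):
    with open(path, "rb") as f:
        data = f.read()
    hits = []
    for opcode, arg, _ in pickletools.genops(data):
        # The GLOBAL opcode pulls a "module attribute" pair in at load time.
        if opcode.name == "GLOBAL":
            module = str(arg).split(" ")[0]
            if module.split(".")[0] in SUSPICIOUS:
                hits.append(str(arg))
    return hits
```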

Comply with model license

Organizations must carefully review licensing agreements, ensuring that they have the rights to use the AI models in their intended manner and that they understand any restrictions or obligations. If you discover you’re using a model under a restrictive license that does not align with your actual usage, replace it.
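
On Hugging Face, a model's declared license typically surfaces as a license:<id> tag in the repo metadata, which makes a first automated check easy. A minimal sketch, assuming the huggingface_hub client (declared_license is a hypothetical helper; a missing or unusual result should trigger manual legal review):

```python
from huggingface_hub import model_info

def declared_license(repo_id):
    # License tags look like "license:apache-2.0" in the repo metadata.
    return [t.split(":", 1)[1] for t in model_info(repo_id).tags
            if t.startswith("license:")]

print(declared_license("bert-base-uncased"))  # e.g. ['apache-2.0']
```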

Keep your eyes on reputation

Thoroughly evaluate the reputation of the models, the users endorsing them, and the organizations producing them. Either stop using low-reputation models and replace them with popular alternatives, or conduct a comprehensive analysis to make sure they are not malicious. 
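
Hub metadata can feed a simple reputation gate. The sketch below reads download and like counts via the huggingface_hub client; the thresholds are arbitrary example values (assumptions to tune to your own risk appetite), and a failing model should go to manual review rather than be blocked blindly.

```python
from huggingface_hub import model_info

MIN_DOWNLOADS = 10_000  # arbitrary example threshold
MIN_LIKES = 50          # arbitrary example threshold

def looks_reputable(repo_id):
    info = model_info(repo_id)
    return (info.downloads or 0) >= MIN_DOWNLOADS and (info.likes or 0) >= MIN_LIKES
```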

Use Safetensors

Instead of Pickle, use Hugging Face's Safetensors, a safe serialization format designed to enhance security and privacy by ensuring that data cannot be maliciously manipulated or inadvertently exposed, and that loading a model cannot execute arbitrary code.
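
A minimal sketch of the Safetensors round trip, assuming PyTorch and the safetensors package: the format stores raw tensor bytes plus a JSON header, so loading it never runs code the way unpickling can.

```python
import torch
from safetensors.torch import save_file, load_file

weights = {"embedding": torch.zeros(10, 4), "head": torch.ones(4, 2)}
save_file(weights, "model.safetensors")    # just tensors and a JSON header
restored = load_file("model.safetensors")  # plain dict of tensors, no code paths
```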

 

Conclusion 

The integration of third-party AI models into software development offers immense potential but comes with a landscape of risks that organizations must navigate carefully. Thorough due diligence, ongoing monitoring, and adherence to best practices are essential steps in harnessing the power of external AI models while safeguarding your organization from AI supply chain attacks. 

Need help in understanding where AI models are being used throughout your SDLC and which risks they introduce to your organization? Contact us today!


Published on April 12, 2024
