Secrets are any data that is sensitive to an organization or person and should not be exposed publicly. A secret can be a password, an access key, an API token, a credit card number, and more. You can read more about the dangers of secrets being exposed via your source code management (SCM) systems here. But SCMs are not the only services that can leak secrets. Essentially, any service in your software development lifecycle (SDLC) that stores data may be a source of secrets leakage. Our research team investigated how sensitive information can get exposed via AppSec tools that you may use as part of your development pipeline, and in this blog post, we present SonarQube as a case study.
When your code scanner becomes your code exposer
SonarQube is an open-source SAST platform for managing code quality. It provides continuous code inspection and analysis to identify bugs, vulnerabilities, and code smells (characteristics of source code that may indicate a deeper problem) in code written in various programming languages. It does so by integrating with your SCM and scanning your entire code base.
Ironically, when misconfigured, this kind of code scanner can turn from an application security tool into a source of risk that attackers can use to harvest sensitive information.
Publicly exposed SonarQube instances that don’t restrict anonymous access are insecure because they allow anyone with an internet connection to access the instance, including private code, detected issues, and other sensitive information. This can lead to several security risks, such as data breaches, code theft, and, most notably, secrets leaked through the code, which can eventually enable a broader supply chain attack on the organization.
Additionally, the mere fact that a SonarQube instance is publicly accessible, even if it doesn’t allow anonymous access to all its resources, exposes it to attacks that exploit known CVEs and zero-day vulnerabilities, including authentication bypass and remote code execution, which in turn can expose its resources. Keeping such an instance safe requires patching it as soon as fixes for newly discovered CVEs are released.
It’s important to note that we think SonarQube is a great tool. We have no problem with SonarQube itself. The problem is with the way it is being set up, and our goal is to help its users understand the risk of doing it wrong.
Scanning public SonarQube instances
The Legit Security Research Team recently looked at public SonarQube instances using tools that allow users to find specific types of internet-connected devices and systems.
First, we extracted a list of the IP addresses of SonarQube instances. For each IP, we requested the server version from the SonarQube API to verify that it was an actual SonarQube instance. We then used the components search API to check whether each server contained components (SonarQube’s general term for projects) from which data could be extracted. For each server that contained components, we downloaded the code of every component. An attacker could do the same to extract sensitive information, such as private code and other confidential data, from the organization. Moreover, if the instance runs an old version of SonarQube, it may be vulnerable to known exploits, and exposing it publicly makes it far riskier.
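The first two steps above can be sketched in Python as a self-audit of an instance you are responsible for. This is a minimal sketch, not the research team’s tooling: the helper names are hypothetical, and it assumes the SonarQube Web API endpoints `/api/server/version` (plain-text version) and `/api/components/search` with `qualifiers=TRK` (projects). Check both against the API documentation your own instance serves at `/web_api`, since endpoints vary between versions. Requests are sent without credentials, exactly as an anonymous visitor would send them.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical helpers for auditing a SonarQube instance you are responsible for.
# No credentials are sent, so the results show what an anonymous visitor can see.
TIMEOUT_SECONDS = 10


def get_server_version(base_url: str) -> str:
    """Step 1: confirm the host is SonarQube by reading its version (plain-text response)."""
    url = f"{base_url.rstrip('/')}/api/server/version"
    with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as resp:
        return resp.read().decode("utf-8").strip()


def parse_project_keys(payload: str) -> list[str]:
    """Extract component keys from a components search JSON response."""
    data = json.loads(payload)
    return [component["key"] for component in data.get("components", [])]


def list_project_keys(base_url: str, page_size: int = 100) -> list[str]:
    """Step 2: list projects (qualifier TRK) visible to an anonymous user.

    Only the first page is fetched; a full audit would follow the "paging" object.
    Raises urllib.error.HTTPError (e.g. 401) if the server requires authentication.
    """
    query = urlencode({"qualifiers": "TRK", "ps": page_size})
    url = f"{base_url.rstrip('/')}/api/components/search?{query}"
    with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as resp:
        return parse_project_keys(resp.read().decode("utf-8"))
```

For example, calling `list_project_keys("https://sonarqube.example.internal")` (a hypothetical host) and getting back a non-empty list means that anonymous users can browse those projects, and possibly their source code.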
Our investigation revealed that out of more than 2,200 public SonarQube instances in the wild, about 80 were accessible but contained no data (possibly because they did not use SonarQube’s option to limit access for anonymous users), and nearly 200 allowed anonymous access that exposed private source code. Running a secret scanner on this code revealed many types of secrets, e.g., API tokens, keys, AWS tokens, and sensitive information about people such as customers and employees. We conducted a responsible disclosure process with the organizations found to be exposing this sensitive information, most likely without being aware of it.
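The outcomes described above can be expressed as a small classification step over the result of an anonymous probe. This is a hypothetical sketch: it assumes you already have the HTTP status code and the list of project keys returned by an unauthenticated components search, and the bucket names are illustrative labels rather than the study’s own categories.

```python
from collections import Counter
from enum import Enum


class Exposure(Enum):
    """Hypothetical buckets for an anonymous probe of a SonarQube instance."""
    AUTH_REQUIRED = "anonymous requests rejected"
    ACCESSIBLE_EMPTY = "accessible, but no projects visible"
    EXPOSED = "accessible, with projects visible"


def classify_instance(status_code: int, project_keys: list[str]) -> Exposure:
    """Map the outcome of an anonymous components search to an exposure bucket."""
    if status_code in (401, 403):
        return Exposure.AUTH_REQUIRED
    if status_code != 200:
        raise ValueError(f"unexpected HTTP status {status_code}")
    # A successful anonymous response is only harmless if it lists no projects.
    return Exposure.EXPOSED if project_keys else Exposure.ACCESSIBLE_EMPTY


def summarize(results: list[tuple[int, list[str]]]) -> Counter:
    """Count how many probed instances fall into each bucket."""
    return Counter(classify_instance(status, keys) for status, keys in results)
```

Any instance that lands in `EXPOSED` is the case worth fixing first, because its code can be downloaded and scanned for secrets by anyone.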
We collected multiple types of secrets. The distribution of the secrets shows that the vast majority are generic API keys, which grant access to services such as internal assets used in the development process or in the production environment. With the source code in hand, an attacker can understand how these API keys are used, allowing them to extract more data or even manipulate it. There was also a high number of AWS access tokens, which in most cases grant access to an organization’s cloud assets (e.g., S3 buckets), from which more confidential information can be extracted. Another dangerous type of leaked secret was Stripe API tokens. Stripe is a financial infrastructure platform, so its tokens might allow malicious actors to steal financial resources from an organization. Other detected secret types include PKCS and RSA private keys, SendGrid API tokens, and Alibaba access key IDs.
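To illustrate how a secret scanner picks out some of these types, here is a minimal regex-based sketch in Python. The patterns are simplified assumptions covering AWS access key IDs, Stripe live keys, SendGrid API keys, and PEM private key headers (PKCS#1 RSA and PKCS#8). Real scanners use much larger rule sets plus entropy checks and validation, and this sketch does not attempt to detect generic API keys or Alibaba access key IDs.

```python
import re

# Simplified, illustrative detection rules (assumptions, not an exhaustive rule set).
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" (long-term) or "ASIA" (temporary) + 16 uppercase alphanumerics.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Stripe live secret ("sk_") or restricted ("rk_") keys.
    "stripe_live_key": re.compile(r"\b(?:sk|rk)_live_[0-9a-zA-Z]{24,}"),
    # SendGrid API keys: "SG." followed by two dot-separated segments.
    "sendgrid_api_key": re.compile(r"\bSG\.[\w-]{22}\.[\w-]{43}"),
    # PEM headers for PKCS#1 (RSA) and PKCS#8 private keys.
    "private_key_block": re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),
}


def find_secrets(text: str) -> list[tuple[str, int]]:
    """Return (rule name, 1-based line number) for every rule that matches a line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((rule_name, line_number))
    return findings
```

For instance, a file containing `aws_key = "AKIAIOSFODNN7EXAMPLE"` (the placeholder key ID from AWS documentation) on its first line is reported as `("aws_access_key_id", 1)`.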
Risks of Exposed Secrets
We addressed multiple risks in this article:
1. Having the SonarQube server publicly accessible.
It gives malicious actors a lead for attacking your organization, and if the server is also misconfigured, attacking your organization can be extremely easy.
2. Allowing anonymous access to SonarQube, which holds the organization’s code. This enables:
- Code theft, including theft and abuse of the secrets in that code
- Stealthier attacks: anonymous access may let attackers hide their identity and activity, making it more difficult for organizations to detect and respond to attacks.
3. Having secrets in code, which enables lateral movement.
When all three risks are combined, they are enough to let a malicious actor attack your organization.
Mitigations
To mitigate these risks, we recommend the following:
1. Restrict access to the SonarQube instance
SonarQube servers that scan private code should generally not be public. A publicly reachable server may expose the private code, and it invites credential theft through social engineering, guessing of common passwords, and direct attacks on the server itself, whether through known vulnerabilities in an unpatched version or through a zero-day. Generally speaking, public and private resources should be separated: any instance with access to private code should most likely sit on a private network.
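One quick sanity check for this recommendation is whether the hostname your team uses for SonarQube resolves only to private-network addresses. The sketch below is a hypothetical helper built on Python’s standard `socket` and `ipaddress` modules. Passing the check does not prove the server is unreachable from the internet (port forwarding or a public reverse proxy can still expose it), so treat it as one signal among several.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def resolved_addresses(base_url: str) -> set[str]:
    """Resolve the host in base_url to the set of IP addresses clients would connect to."""
    host = urlparse(base_url).hostname
    if host is None:
        raise ValueError(f"no hostname in {base_url!r}")
    # Each getaddrinfo entry ends with a sockaddr tuple whose first item is the IP address.
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def is_internal_only(addresses: set[str]) -> bool:
    """True only if there is at least one address and every address is private.

    ipaddress treats RFC 1918 ranges, loopback, link-local and similar blocks as private.
    """
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)
```

For example, `is_internal_only(resolved_addresses("https://sonarqube.example.internal"))` (a hypothetical hostname) returning `False` means at least one address clients may reach is publicly routable, which deserves a closer look.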
2. Use appropriate authentication and authorization mechanisms.
Anonymous access to an organization’s private code is highly discouraged, and anonymous access to its SonarQube server is even worse: in addition to the code itself, anyone can see the code scanner’s results, which reveal even more information about the code. It is important for organizations to carefully consider the security implications of anonymous access to these types of tools and to put appropriate safeguards in place to protect against potential attacks. These may include implementing strong authentication and access controls, monitoring for suspicious activity, and regularly updating and patching the tools to fix vulnerabilities.
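Regular patching is easier to enforce when it is checked automatically. As a minimal sketch, assuming the instance reports a purely numeric dotted version string (such as the value returned by its version API), a version comparison can flag servers that have fallen behind. The minimum version below is a hypothetical placeholder, not a security recommendation; set it from SonarSource’s release notes and advisories.

```python
# Hypothetical placeholder: set this to the oldest release your team still considers
# patched, based on SonarSource's release notes and security advisories.
MINIMUM_PATCHED_VERSION = "9.9"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as '9.9.1.69595' into (9, 9, 1, 69595) for comparison.

    Assumes purely numeric components; a suffix like '-SNAPSHOT' raises ValueError.
    """
    return tuple(int(part) for part in version.strip().split("."))


def needs_upgrade(reported_version: str, minimum: str = MINIMUM_PATCHED_VERSION) -> bool:
    """True if the reported version sorts before the minimum patched version."""
    # Tuples compare element by element, so (8, 9, 10) < (9, 9) and (9, 9, 1) > (9, 9).
    return parse_version(reported_version) < parse_version(minimum)
```

Running this on a schedule against every SonarQube instance you operate turns "regularly updating" from a good intention into an alert.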
3. Keep secrets out of code
Secrets in code lead to breaches and lateral movement threats. This can be remediated by using secret scanners and by storing secrets in encrypted storage with defined authorization. For more information on that topic, you can check out our blog.
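As a minimal sketch of keeping a secret out of source code, the application can read it from an environment variable at runtime, assuming a secrets manager or CI/CD system injects the value at deploy time. The helper and the variable name `PAYMENTS_API_KEY` are hypothetical; the point is that the secret no longer lives in the repository, so a code scanner (or anyone who can browse it) never sees it.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret has not been provided to the process."""


def get_secret(name: str) -> str:
    """Read a secret from the environment instead of hardcoding it in source.

    The value is expected to be injected at deploy time (e.g. by a secrets manager),
    so it does not need to appear in the repository or in anything that indexes it.
    """
    value = os.environ.get(name)
    if not value:
        # Fail fast rather than falling back to a hardcoded default.
        raise MissingSecretError(f"required secret {name!r} is not set")
    return value


# Usage (hypothetical variable name):
# api_key = get_secret("PAYMENTS_API_KEY")
```

Failing loudly when the variable is missing is deliberate: a silent fallback is exactly where hardcoded secrets tend to creep back in.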
Don’t Let Secrets Leak Out of SDLC Tools!
To sum up, misconfigurations in SDLC services such as SonarQube, like allowing anonymous public access, make your organization vulnerable to multiple threats, including code theft, which can eventually result in secrets getting leaked. Secret theft can have severe consequences, including access to customers’ private data, access to critical production services, and supply chain attacks. It’s important to keep your SDLC assets safe and your code base free from hardcoded secrets.
Legit can help
The Legit Security Platform provides advanced secret detection capabilities that will help you prevent exposing sensitive information, as well as a wide range of other security capabilities and benefits. One of these capabilities is alerting you to dangerous misconfigurations of SDLC systems and infrastructure across your pre-production development environment. To learn more, feel free to contact us!