Dangerous Regular Expressions

Vickie Li


In this post, I will talk about how regex is used in a security context, what can go wrong when regexes are not well composed, and some best practices to follow when using regex as a security measure.

Regex For Security

Regular expressions play a big role in the security world. They are used as a security measure across multiple layers of a corporation’s infrastructure. Below are a few of the most common use cases of regex for security.

Firewall rules

Developers use regex to fine-tune how firewalls behave. For example, you can use regex to create rules to block requests to certain file types, from certain IP addresses, or from certain user-agents that are known to be malicious.
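As a rough illustration, here is a minimal Python sketch of that kind of rule. The patterns, the scanner names in the user-agent rule, and the should_block function are all assumptions made up for this example; a real firewall has its own rule syntax and a much longer rule set.

```python
import re

# Hypothetical rules for illustration only.
# Block requests for file types that should never be served publicly.
BLOCKED_PATHS = re.compile(r"\.(bak|sql|env)$", re.IGNORECASE)
# Block user-agents that contain the names of some well-known scanning tools.
BLOCKED_USER_AGENTS = re.compile(r"(sqlmap|nikto|masscan)", re.IGNORECASE)


def should_block(path: str, user_agent: str) -> bool:
    """Return True if the request matches any of the blocking rules."""
    if BLOCKED_PATHS.search(path):
        return True
    if BLOCKED_USER_AGENTS.search(user_agent):
        return True
    return False
```

Note that these are blacklist-style rules, so they only stop what they explicitly name. Anything the rule author did not think of passes through.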

User input validation

Another important use case for regex patterns is validating user input. When an application accepts user input, it opens its doors to a wide range of potential vulnerabilities, like XSS, open redirect, and SQL injection. Regex is used to filter and sanitize user input as a defense mechanism against these attacks.

Malware detection

Lastly, regex is often used to customize the behavior of malware detectors. System administrators can use regex rules to detect potentially dangerous content in files and to quarantine these files accordingly.

Faulty Regexes

Since regex is so prevalent as a security measure, incorrectly deployed regex patterns have the potential to impact many different aspects of a system. So, what can go wrong with these regex patterns? Faulty regex patterns that lead to vulnerabilities are often patterns that fail to consider one or more edge cases. This happens a lot in public-facing web applications and leads to a significant number of newly discovered vulnerabilities. Defending a system is a lot harder than attacking it. Often, all an attacker needs to compromise an application is a single user input that is incorrectly validated!

Web vulnerabilities caused by faulty regexes

In web applications, regexes are often used to filter and sanitize potentially malicious user input. When these regexes are composed incorrectly, the protection fails and gives hackers a chance to attack the application. Here are a few real-life vulnerabilities that were caused by faulty regexes (the names of the websites have been replaced with “examplesite.com”).

SSRF protection via a blacklist

SSRF, or Server-Side Request Forgery, is a vulnerability that happens when an attacker is able to send requests on behalf of a server. It allows attackers to “forge” the request signatures of the vulnerable server, therefore assuming a privileged position on a network, bypassing firewall controls, and gaining access to internal services. You can learn more about SSRFs here: Intro to SSRF.

examplesite.com allows users to load content from external domains via the “img” URL parameter. A legitimate image request would look like this:

examplesite.com/load?img=https://images.com/puppies.png

The website prevents SSRF by rejecting img parameters that contain certain URLs in a blacklist. The regex used looks like this:

^https?://(127\.|10\.|192\.168\.).*$

This regex pattern checks the img parameter against a blacklist of internal IP address prefixes and rejects the request if it matches. The problem is that the blacklist fails to consider another way of addressing the local machine: “0.0.0.0”, which can be used to refer to the local machine. So, the protection can be bypassed by using the request:

examplesite.com/load?img=https://0.0.0.0
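To make the gap concrete, here is a minimal Python sketch. It first runs the bypass URL through the blacklist (written here with `https?` so that both schemes are covered, which is an assumption about the intended pattern), then shows one possible stricter check that parses the URL and classifies the address with the standard `ipaddress` module. The function names are hypothetical, and the stricter check only handles IP literals; hostnames that resolve to internal addresses need additional checks that are not shown.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# The blacklist from the example, assumed to be meant for both http and https.
BLACKLIST = re.compile(r"^https?://(127\.|10\.|192\.168\.).*$")


def blocked_by_blacklist(url: str) -> bool:
    """Return True if the regex blacklist would reject this URL."""
    return BLACKLIST.match(url) is not None


def is_internal_literal(url: str) -> bool:
    """Return True if the URL's host is an IP literal that points inward."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no host at all: treat as unsafe
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (a hostname). DNS-based checks are out of scope here.
        return False
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_unspecified  # this is the 0.0.0.0 case
    )


bypass = "https://0.0.0.0"
print(blocked_by_blacklist(bypass), is_internal_literal(bypass))
```

The design point is that the second function asks what the address is rather than what the string looks like, so it does not depend on someone remembering every internal prefix.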

Open redirect filter bypass

Here’s another example of a faulty regex leading to a vulnerability. The application protects against open redirect by requiring URLs to fit the following two criteria:

- The URL must contain “examplesite.com/”.
- The URL must end with an image-related extension, such as jpg or png.

The faulty regex looks like this:

^.*examplesite\.com\/.*(jpg|jpeg|png)$

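The issue with this regex is that it is too permissive and allows for too much flexibility in user input. The two .* in the pattern match any number of any character, so “examplesite.com/” can appear anywhere in the URL, not just as the host. This open redirect filter can be bypassed by placing the required string in the query string of a URL on another domain, like this: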

https://attackersite.com?examplesite.com/abc.png
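The following Python sketch checks the bypass URL against the faulty pattern and contrasts it with one possible stricter approach: parse the URL first, then validate the host and the path separately. The allowed host list, the path pattern, and the function names are assumptions for illustration, not the site's actual fix.

```python
import re
from urllib.parse import urlsplit

# The faulty pattern from the example.
FAULTY = re.compile(r"^.*examplesite\.com\/.*(jpg|jpeg|png)$")

# Hypothetical allowlist of hosts and a narrow pattern for image paths.
ALLOWED_HOSTS = {"examplesite.com", "www.examplesite.com"}
IMAGE_PATH = re.compile(r"/[A-Za-z0-9_\-/]+\.(jpg|jpeg|png)")


def faulty_check(url: str) -> bool:
    """Return True if the faulty regex accepts the URL."""
    return FAULTY.match(url) is not None


def strict_check(url: str) -> bool:
    """Accept only https URLs on an allowed host with an image path."""
    parts = urlsplit(url)
    return (
        parts.scheme == "https"
        and parts.hostname in ALLOWED_HOSTS
        and IMAGE_PATH.fullmatch(parts.path) is not None
    )


attack = "https://attackersite.com?examplesite.com/abc.png"
print(faulty_check(attack), strict_check(attack))
```

Because the URL is parsed before it is matched, the required string in the query portion of the attack URL never reaches the host check.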

Regex Safety Best Practices

So, how can developers prevent these mistakes from happening? Regex safety is hard. It is difficult to consider all the cases you’ll need to check for, and you never know what creative ideas hackers are going to come up with! But, it is possible to minimize the potential for attack by following a few regex best practices.

Be strict!

First, be strict when validating user input. When in doubt, use a whitelist instead of a blacklist when filtering for file types, IP addresses, user-agents, and more. Reduce unnecessary flexibility for predictable user input. For example, when the user is asked to input their age, only digits should be allowed, and the number should not be too large. The length of any user input should also be checked. Strict input validation practices like these might seem like overkill, but they save you the worry of a variety of potential vulnerabilities.
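The age example could be sketched in Python as follows. The three-digit limit, the upper bound of 130, and the parse_age name are assumptions chosen for this example. The sketch uses `[0-9]` rather than `\d` (which also matches non-ASCII digits in Python) and `fullmatch` rather than `^...$` (because `$` also matches just before a trailing newline).

```python
import re
from typing import Optional

# Whitelist: one to three ASCII digits and nothing else.
AGE_PATTERN = re.compile(r"[0-9]{1,3}")
MAX_AGE = 130  # hypothetical upper bound


def parse_age(raw: str) -> Optional[int]:
    """Return the age as an int, or None if the input is not acceptable."""
    if len(raw) > 3:
        return None  # length check before anything else
    if AGE_PATTERN.fullmatch(raw) is None:
        return None  # anything other than ASCII digits is rejected
    age = int(raw)
    return age if age <= MAX_AGE else None
```

Returning None for every rejected input keeps the calling code simple: there is one failure path, regardless of why the input was refused.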

Don’t publish regex patterns

The second tip is to avoid exposing regex patterns online. Sometimes applications publish their regex patterns because the project is open-sourced, or accidentally expose them because they use the same patterns in both client-side code and server-side code. This makes it easier for attackers to find security holes in the regex pattern and exploit them.

Use validated patterns

Ideally, you should avoid writing your own regex patterns for common use cases (like validating usernames, passwords, and comment boxes). Instead, find validated and secured regex patterns online. These patterns have been vetted and have stood the test of time, so they are often better than custom-written regex patterns.

Defense-in-depth

In addition to using safe regex, employ defense-in-depth measures. Defense-in-depth means that you do not rely on a single protection mechanism and instead use multiple layers of protection to prevent attacks. For example, in addition to rigorous input validation, you can use prepared statements, the principle of least privilege, and hashed passwords to minimize the impact of a potential SQL injection.
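Here is a small Python sketch of two of those layers working together, using the standard `sqlite3` module with an in-memory database. The table layout, the username policy, and the function names are assumptions for illustration. The point is that the parameterized query still treats the input as data even if the regex layer is skipped or turns out to be flawed.

```python
import re
import sqlite3

# Hypothetical username policy: 3 to 20 letters, digits, or underscores.
USERNAME = re.compile(r"[A-Za-z0-9_]{3,20}")


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with one user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
    return conn


def query_user(conn: sqlite3.Connection, username: str):
    """Layer 2: parameterized query. The input is bound as a value,
    never concatenated into the SQL text."""
    cursor = conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    )
    return cursor.fetchone()


def find_user(conn: sqlite3.Connection, username: str):
    """Layer 1: strict validation, followed by the parameterized query."""
    if USERNAME.fullmatch(username) is None:
        return None
    return query_user(conn, username)
```

Calling query_user directly with a classic injection string such as `' OR '1'='1` simply finds no user with that name, which is what the second layer is there for.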

Fuzz testing

Finally, rigorously test your application by supplying it with illegal and unexpected inputs to verify that your regexes are doing their jobs.
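As a minimal sketch of the idea, the Python snippet below throws random strings at a validator and compares its verdict with an independent statement of what should be accepted. Everything here is an assumption for illustration: the pattern under test, the oracle, the alphabet, and the iteration count. The pattern looks strict, but in Python `$` also matches just before a trailing newline, and the fuzzer surfaces exactly that kind of input.

```python
import random
import re
from typing import List

# Validator under test: intended to accept 1 to 3 digits and nothing else.
AGE_PATTERN = re.compile(r"^[0-9]{1,3}$")


def validator(raw: str) -> bool:
    """The check being tested."""
    return AGE_PATTERN.match(raw) is not None


def oracle(raw: str) -> bool:
    """Independent definition of valid input: 1 to 3 ASCII digits."""
    return 1 <= len(raw) <= 3 and all(c in "0123456789" for c in raw)


def fuzz(iterations: int = 5000, seed: int = 0) -> List[str]:
    """Return every generated input on which validator and oracle disagree."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    alphabet = "0123456789\n -a"  # digits plus a few awkward characters
    failures = []
    for _ in range(iterations):
        length = rng.randint(0, 4)
        candidate = "".join(rng.choice(alphabet) for _ in range(length))
        if validator(candidate) != oracle(candidate):
            failures.append(candidate)
    return failures


# Show a few distinct disagreements, using repr() to make newlines visible.
print(sorted({repr(f) for f in fuzz()})[:5])
```

A purely random generator like this one only works for small input spaces. For patterns that require longer structured input, such as URLs, mutating known-good and known-bad samples tends to find more.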

Regex Security Resources

Here are a few resources to help you secure your regex patterns.

OWASP Validation Regex Repository

The OWASP Validation Regex Repository is a database of validated and tested regex patterns that you can use. Here, you can find a variety of patterns that could be used to validate usernames, emails, IPs, credit card numbers, and more. Using these regex patterns is a good idea as they are strict validation patterns that don’t allow for most potentially dangerous inputs. Additionally, if you can’t find the patterns you need in the repository, search for them here:

Regular Expression Library

The Regular Expression Library is an even larger database of already written regex patterns that you can use.

If you need to write your own patterns, consult the OWASP input validation cheatsheet for a few things that you need to consider to make sure that your regexes are safe.

Input Validation

Lastly, remember to always test your regexes against illegal input, regardless of where your patterns come from!
