Attacking Evil Regex: Understanding Regular Expression Denial of Service Attacks (ReDoS)
Photo by Caozzy Lin on Unsplash
Regex is great! It’s also hard to evaluate.
A bad regex pattern can lead to low performance or even erroneous results. But can it also lead to vulnerabilities?
Poorly designed regex patterns are actually a big source of vulnerabilities in modern web applications. They can lead to failed input validation, leaky firewalls, and even denial of service attacks. Today, let’s talk about how hackers exploit regex evaluators to launch DoS attacks.
What is ReDoS?
ReDoS, which stands for Regular expression Denial of Service, is a type of denial of service attack.
Denial of Service Attacks (DoS)
A denial of service attack is when an attacker makes an online service slow down or become unavailable to its users. This type of attack is widespread on the modern Internet and can lead to financial losses and damages to a business’s reputation. There are many different types of DoS, one of them being ReDoS.
During a ReDoS attack, a hacker produces a denial of service by providing a regex engine with a string that takes a long time to evaluate. Hackers do this by exploiting so-called "evil regex patterns."
Stuck in Crafted Input
Most regex evaluators have an exponential-time worst-case complexity. This means that the time it takes to evaluate a particular string grows exponentially in relation to the input string size.
Attackers can provide a crafted input that forces the evaluator’s worst-case time complexity and causes a ReDoS. When a hacker hits an application with a ReDoS, the application server can hang for a long time and become unavailable to legitimate users.
Typically, regex patterns that can cause an application to get stuck in evaluation have two characteristics. First, they often include the repetition of complex subexpressions (the use of "+" and "*" on complex subexpressions). And second, within these repeated subexpressions, there are additional repetition symbols and expressions that match a suffix of another match. Sound confusing? No worries, we’ll go into examples later in this post!
Evil Regexes
As I mentioned above, exploitable regex patterns are called “evil regexes”. Here’s an example of an evil regex:
^(a+)+$
The subexpression in the parenthesis searches for "a" and its repetition. It also checks if the entire expression is a repetition of any of the subexpressions! Let’s say an attacker supplies an input such as this one:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
In the string above, each additional "a" will cause the regex evaluator to check for the repetitions of an additional subexpression, thus doubling the amount of processing time for the regex evaluator.
Note that attackers can exploit a regex if they can exploit any subexpression of it. For example, an attacker can just as easily attack this regex as the one above:
^(a+)+[a-zA-Z]$
Evil Regexes in the Wild
So where do these evil regexes typically occur in an application?
Sometimes, applications let users inject their own regex patterns for the server to execute. In this case, the attacker can inject an evil regex of her choice and then submit string input that would cause the evil regex to run for a long time.
One example of this is when an application checks if a user’s password contains their username during registration. When a user’s password contains their username, the app would not allow the user to use that password. If the application blindly concatenates the username with a regular expression without sanitizing regex operators, the malicious user can inject evil regex via the username field.
For example, the following function is vulnerable to evil regex injection:
check_password(username, password):
pattern = username
if regex.match(pattern, password):
print("Password contains username. Please provide another password.")
else:
print("Registration complete!")
In this case, if an attacker provides an evil regex pattern as the username and a crafted attack string as the password, she can cause the regex function to run for a long time and hang the server.
username: ^(a+)+$
password: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
Preventing ReDoS
ReDoS has the potential to impact multiple layers of a corporation’s infrastructure. For example, an attacker can launch a client-side ReDoS to hang a specific user’s browser, or use ReDoS to disable a web application firewall, or even down an application server directly. Fortunately, there are many ways that an application can prevent ReDoS from happening.
First is multithreading and preemptive process killing. If an application is multithreaded, it can process other requests at the same time, even if it is taking too long to evaluate a regex expression. You should also halt the execution of any regex evaluation if it is taking longer than expected.
The second is to avoid exposing regex patterns online. Sometimes applications publish their regex patterns because the project is open-sourced, or accidentally expose them because they use the same patterns in both client-side code and server-side code. This makes it easier for attackers to find evil regexes and craft their attack strings accordingly.
Ideally, you should avoid using dangerous regex expressions to evaluate user input altogether. Avoid writing regex patterns that fit the above 'evil regex" criteria and rewrite them as simpler expressions instead. You can also try to find validated and secured regex patterns online instead of writing your own.
Lastly, you should remove the possibility for users to provide their own regex patterns to the evaluator. Sometimes applications provide users with the ability to execute arbitrary regex patterns. In this case, attackers can simply inject a dangerous regex pattern to make the system vulnerable. You can avoid this issue if you prevent users from executing arbitrary regexes. If you need this kind of functionality, use preformatted regexes that allow for minimal customization.