Why Is It So Hard To Prevent Open Redirects?

Vickie Li

In my last post, we talked about how open redirects can allow attackers to steal tokens from OAuth systems. Today, let’s take a deeper dive into open redirects and explore why they’re so prevalent in web applications!

Sites often have HTTP or URL parameters that cause the web application to redirect to a specified URL without any user action. Open redirects are a type of vulnerability that happens when an attacker can manipulate the value of such a parameter and cause users to be redirected offsite.

A common scenario is when a website redirects users to their original location after login. When a user visits their dashboard at “https://example.com/dashboard” but is not logged in, the application redirects them to the login page, then sends them back to “https://example.com/dashboard” after they log in.
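To make this concrete, here is a minimal sketch of what that redirect-after-login logic might look like. The function and the “next” parameter name are illustrative assumptions, not taken from any particular framework:

```python
from urllib.parse import parse_qs, urlsplit

def login_redirect_target(request_url: str) -> str:
    """Return the URL to send the user to after a successful login.

    Vulnerable on purpose: the "next" parameter (a hypothetical name)
    is used verbatim, so an attacker can supply an offsite URL.
    """
    query = parse_qs(urlsplit(request_url).query)
    # Fall back to the dashboard when no redirect target is given.
    return query.get("next", ["/dashboard"])[0]

# Intended use: send the user back to where they came from.
print(login_redirect_target("https://example.com/login?next=/dashboard"))
# -> /dashboard

# Attack: nothing stops an absolute URL pointing offsite.
print(login_redirect_target("https://example.com/login?next=https://attacker.com"))
# -> https://attacker.com
```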


During an open redirect attack, users are unintentionally redirected to an external site.


Another common open redirect technique is the referer-based open redirect. Some sites will redirect to the Referer automatically after certain user actions, like login or logout. In this case, attackers can set the Referer header of the request by making the victim visit the target site from an attacker site.

On the attacker's site:

<a href="https://example.com/login">Click here to login to example.com</a>

How Do Sites Prevent Open Redirects?

Sites prevent open redirects by validating the URL used to redirect the user. So the root cause of open redirects is failed URL validation. But why is that so difficult to get right? A URI is made up of several components: scheme://userinfo@host:port/path?query#fragment. How the browser redirects the user depends on how it differentiates between these components. The job of a URL validator is to accurately predict how the browser will redirect the user and reject URLs that would result in an offsite redirect.


The most common way URL validators determine redirect validity is with a whitelist: they check the “hostname” portion of the URI to make sure that it matches a predetermined list of allowed hosts. That sounds straightforward, but the reality is that parsing and decoding a URI is difficult to get right, so validators often have a hard time determining the hostname portion of the URI.
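As a rough sketch of such a check (the allowed-host list and function name are hypothetical, and Python’s urllib stands in for whatever parser the validator uses):

```python
from urllib.parse import urlsplit

# Hypothetical whitelist of hosts the application may redirect to.
ALLOWED_HOSTS = {"example.com", "www.example.com"}

def is_safe_redirect(url: str) -> bool:
    """Accept only URLs whose hostname is on the whitelist.

    This check is only as good as urlsplit's parsing: if the browser
    interprets the URL differently than urlsplit does, it can be bypassed.
    """
    try:
        hostname = urlsplit(url).hostname
    except ValueError:
        return False
    return hostname in ALLOWED_HOSTS

print(is_safe_redirect("https://example.com/dashboard"))   # -> True
print(is_safe_redirect("https://attacker.com/dashboard"))  # -> False
```

The rest of this post is essentially a catalog of ways the parser behind a check like this can disagree with the browser.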

Why Do Open Redirects Still Happen?

So how do URL parsing issues lead to open redirects?

URL Decoding Ambiguities

One source of bugs is the inconsistency between how validators and browsers decode non-ASCII characters in URLs. For example, how should a URL like this one be decoded (with “%ff” standing in for an arbitrary non-ASCII byte)?

https://attacker.com%ff.example.com

Let’s say that this URL has passed URL validation because the validator determined that “example.com” is the domain name.
Several scenarios could happen. The first is when the browser decodes the non-ASCII character into a question mark:

https://attacker.com?.example.com

In this case, “example.com” becomes part of the URL query, not the hostname, and the browser navigates to “attacker.com” instead.

Another behavior seen in browsers is that they will try to decode the non-ASCII character along with its surrounding characters. Then many things could happen. For example, the final destination of the browser could become this URL:

https://attacker.comample.com

In this case, attackers can register “comample.com” and achieve an open redirect.

Another common behavior is that browsers will attempt to substitute a “most alike” character. For example, if the replacement character “�” appears in a URL like this one, the validator might determine that the hostname is “example.com”:

https://attacker.com�.example.com

But the browser attempts to normalize the URL by converting the special character into a question mark, making “attacker.com” the hostname:

https://attacker.com?.example.com
Many issues also arise when the validator and the browser decode the URL a different number of times. Take this URL, for example, where “%252f” is the double-URL-encoded version of “/”:

https://attacker.com%252f@example.com

If the validator decodes this URL only once, it becomes the following, making “example.com” the hostname and “attacker.com%2f” the username:

https://attacker.com%2f@example.com

On the other hand, if the browser decodes the URL twice, it becomes:

https://attacker.com/@example.com

Here “@example.com” becomes the path portion of the URL, and the browser will navigate to “attacker.com”.
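This decoding mismatch is easy to reproduce with Python’s urllib (a sketch of the behavior just described, not any specific validator):

```python
from urllib.parse import unquote, urlsplit

url = "https://attacker.com%252f@example.com"

# One round of decoding turns %252f into %2f. The "@" still separates
# the userinfo from the host, so the hostname is example.com.
once = unquote(url)
print(once, "->", urlsplit(once).hostname)
# https://attacker.com%2f@example.com -> example.com

# A second round turns %2f into "/". Now attacker.com is the hostname
# and "@example.com" is just part of the path.
twice = unquote(once)
print(twice, "->", urlsplit(twice).hostname)
# https://attacker.com/@example.com -> attacker.com
```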

Slash Issues

Most browsers and validators treat backward slashes as path separators. However, if either the browser or the validator does not, the inconsistency can lead to bugs. For example, this URL is potentially problematic:

https://example.com\attacker.com

If the validator treats the backward slash as a path separator, it will interpret the URL as the following, and conclude that the hostname is “example.com”:

https://example.com/attacker.com

On the other hand, if the browser does not recognize the backward slash as a path separator, it would interpret the hostname to be “example.com\attacker.com”, with “example” being the subdomain and “com\attacker.com” the base domain.
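Python’s own URL parser happens to be a parser that does not treat the backward slash as a separator, which makes one side of this mismatch easy to demonstrate:

```python
from urllib.parse import urlsplit

# The backslash is escaped in Python source; the actual URL is
# https://example.com\attacker.com
url = "https://example.com\\attacker.com"

# urlsplit does not treat "\" as a path separator, so everything after
# "//" is taken as the host.
print(urlsplit(url).hostname)
# -> example.com\attacker.com

# A WHATWG-compliant browser instead normalizes "\" to "/", producing
# https://example.com/attacker.com (hostname example.com, path /attacker.com).
```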

Flawed Validator Logic

Another common issue is when attackers can exploit loopholes in the validator’s logic. For example, to prevent an attack like this one from succeeding, the validator might only accept URLs that end with a domain on the whitelist:

https://example.com.attacker.com

And to prevent an attack like this one, the validator determines that URLs must also start with a whitelisted value:

https://attacker.com/example.com

However, both of these rules can be overcome by this URL, which starts and ends with “example.com” but lands on a host controlled by the attacker:

https://example.com.attacker.com/example.com
Custom-built URL validators are prone to attacks like these when developers do not consider all edge cases.
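Here is a sketch of such a flawed validator (the whitelist value and both checks are assumptions for illustration), alongside a URL that starts and ends with “example.com” yet resolves to an attacker-controlled host:

```python
from urllib.parse import urlsplit

def naive_validate(url: str) -> bool:
    # Hypothetical flawed rules: the URL must start with the whitelisted
    # origin and end with the whitelisted domain.
    return url.startswith("https://example.com") and url.endswith("example.com")

bypass = "https://example.com.attacker.com/example.com"
print(naive_validate(bypass))     # -> True: both checks pass
print(urlsplit(bypass).hostname)  # -> example.com.attacker.com
```

Checking string prefixes and suffixes is no substitute for parsing out the hostname and comparing it exactly.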

Combining Exploit Techniques

Finally, to get past more sophisticated URL validators, an attacker can combine several of the techniques above to defeat layered defenses.


Even More Attack Vectors

There are many more methods attackers use to defeat URL validators; this post provides an overview of the most common ones. Preventing open redirects means that URL validation has to be done correctly on every redirect endpoint, which is why it is so hard to get right.

Vickie Li
Investigator of Nerdy Stuff

Vickie Li is a professional investigator of nerdy stuff, with a primary focus on web security. She began her career as a web developer and fell in love with security in the process. Now, she spends her days hunting for vulnerabilities, writing, and blogging about her adventures hacking the web.