Still Scanning IP Addresses? You’re Doing it Wrong
The traditional approach to a vulnerability scan or penetration test is to find the IP addresses that you want tested, throw them in and kick things off.
But doing a test based purely on IP addresses is a BAD IDEA and can often MISS THINGS. The reason is that some protocols, most notably HTTP, behave differently depending on how you address them.
Let’s take the Microsoft website as an example. If you perform a DNS lookup of www.microsoft.com you typically see 6 IP addresses, as below:
$ host www.microsoft.com
www.microsoft.com is an alias for www.microsoft.com-c-3.edgekey.net.
www.microsoft.com-c-3.edgekey.net is an alias for www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net.
www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net is an alias for e13678.dspb.akamaiedge.net.
e13678.dspb.akamaiedge.net has address 104.124.13.124
e13678.dspb.akamaiedge.net has IPv6 address 2600:1407:1800:492::356e
e13678.dspb.akamaiedge.net has IPv6 address 2600:1407:1800:485::356e
e13678.dspb.akamaiedge.net has IPv6 address 2600:1407:1800:487::356e
e13678.dspb.akamaiedge.net has IPv6 address 2600:1407:1800:48f::356e
e13678.dspb.akamaiedge.net has IPv6 address 2600:1407:1800:491::356e
So if you open up a browser, and put in http://www.microsoft.com your browser will look up the IP addresses, choose one of the 6 available, connect to it and request the page. Your browser will then show the Microsoft website. If you were to monitor your network traffic, you would see communication between your machine and the addresses above.
But what if you put http://104.124.13.124 into the browser instead? It’s the same address, so it should display the same content right? Well, no…
You get an error rather than the Microsoft site. So what’s happening?
Well, your browser is using version 1.1 or later of the HTTP protocol, and one of the features of HTTP/1.1 is virtual hosting. This allows a single IP address to host multiple sites, which is especially important with the legacy IPv4 protocol as there is a severe shortage of addresses. If every website required its own unique address, the shortage would be even worse.
The way this works is that after establishing a connection, your browser sends a header called “Host” to the server, telling it which site it wants. The server then knows which website you want, out of potentially thousands it might be hosting, and serves you the correct one. Your browser uses the host portion of the URL to determine what to put in the Host: header, so if you visit http://www.microsoft.com then the host header will be “www.microsoft.com”, but when you visit http://104.124.13.124 the host header will be “104.124.13.124”. We can see this in more detail when using the command line tool “curl” which shows exactly the requests and responses:
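As a rough illustration of how the Host header follows from the URL, here is a minimal Python sketch. The function name build_get_request is made up for this example, and it only builds the request text rather than sending anything; real clients add more headers and handle more edge cases.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request for a URL.

    Simplification: the Host header is taken straight from the URL's
    network-location part (host, plus :port if one was given).
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Accept: */*\r\n"
        "\r\n"
    )


if __name__ == "__main__":
    # Same server, two different Host headers.
    print(build_get_request("http://www.microsoft.com"))
    print(build_get_request("http://104.124.13.124"))
```

The two requests are identical apart from the Host: line, which is the only thing the server has to go on when deciding which site to serve.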
When requesting http://www.microsoft.com with curl:
$ curl -v http://www.microsoft.com
* Rebuilt URL to: http://www.microsoft.com/
* Trying 104.124.13.124...
* TCP_NODELAY set
* Trying 2600:1407:16:38d::356e...
* TCP_NODELAY set
* Connected to www.microsoft.com (104.124.13.124) port 80 (#0)
> GET / HTTP/1.1
> Host: www.microsoft.com
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Accept-Ranges: bytes
< Content-Type: text/html
< ETag: "6082151bd56ea922e1357f5896a90d0a:1425454794"
< Last-Modified: Wed, 04 Mar 2015 07:39:54 GMT
< Server: AkamaiNetStorage
< Content-Length: 1020
< Date: Wed, 10 Jun 2020 05:46:48 GMT
< Connection: keep-alive
<
<html><head><title>Microsoft Corporation</title><meta http-equiv="X-UA-Compatible" content="IE=EmulateIE7"></meta><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></meta><meta name="SearchTitle" content="Microsoft.com" scheme=""></meta><meta name="Description" content="Get product information, support, and news from Microsoft." scheme=""></meta><meta name="Title" content="Microsoft.com Home Page" scheme=""></meta><meta name="Keywords" content="Microsoft, product, support, help, training, Office, Windows, software, download, trial, preview, demo, business, security, update, free, computer, PC, server, search, download, install, news" scheme=""></meta><meta name="SearchDescription" content="Microsoft.com Homepage" scheme=""></meta></head><body><p>Your current User-Agent string appears to be from an automated process, if this is incorrect, please click this link:<a href="http://www.microsoft.com/en/us/default.aspx?redir=true">United States English Microsoft Homepage</a></p></body></html>
* Closing connection 0
When requesting http://104.124.13.124 with curl:
$ curl -v http://104.124.13.124
* Rebuilt URL to: http://104.124.13.124/
* Trying 104.124.13.124...
* TCP_NODELAY set
* Connected to 104.124.13.124 (104.124.13.124) port 80 (#0)
> GET / HTTP/1.1
> Host: 104.124.13.124
> User-Agent: curl/7.54.0
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 400 Bad Request
< Server: AkamaiGHost
< Mime-Version: 1.0
< Content-Type: text/html
< Content-Length: 208
< Expires: Wed, 10 Jun 2020 05:54:54 GMT
< Date: Wed, 10 Jun 2020 05:54:54 GMT
< Connection: close
<
<HTML><HEAD>
<TITLE>Invalid URL</TITLE>
</HEAD><BODY>
<H1>Invalid URL</H1> The requested URL "[no URL]", is invalid.<p> Reference #9.3f233e17.1591768494.367d68e </BODY></HTML>
* Closing connection 0
The connection goes to the same IP address, but the Host: header is different – resulting in a completely different response by the server.
Most web servers support multiple sites via virtual hosting, and will often implement a fallback when addressed by IP or an unknown hostname. For commercial hosting environments the default will often be an error page or the hosting provider’s website, whereas for self-hosted environments it will often be the default page provided by the webserver. If a web server is only hosting a single site, then it is also possible to disable virtual hosting entirely and present the content irrespective of what “Host” header is supplied.
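To make the fallback behaviour concrete, here is a small self-contained Python sketch that runs a toy virtual-hosting server on localhost and requests it three times with different Host headers. The site names (site-a.example, site-b.example), the page bodies and the choice of a 400 error as the fallback are all assumptions made up for the example; real web servers and CDNs are configured in their own ways.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site table: hostname -> page body.
SITES = {
    "site-a.example": b"Welcome to site A",
    "site-b.example": b"Welcome to site B",
}


class VirtualHostHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Drop an optional :port suffix (this toy ignores IPv6 literals).
        host = (self.headers.get("Host") or "").split(":")[0].lower()
        body = SITES.get(host)
        status = 200
        if body is None:
            # Fallback for an IP address or unknown hostname.
            status, body = 400, b"Invalid URL"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def fetch(port, host_header):
    """Connect to the same address every time, varying only the Host header."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request("GET", "/", headers={"Host": host_header})
    resp = conn.getresponse()
    result = (resp.status, resp.read())
    conn.close()
    return result


def demo():
    server = HTTPServer(("127.0.0.1", 0), VirtualHostHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        return [fetch(port, h) for h in ("site-a.example", "site-b.example", "127.0.0.1")]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, body in demo():
        print(status, body.decode())
```

All three requests go to the same IP address and port. Only the two that name a known site get real content, which is exactly the situation an IP-only scan ends up in.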
The SSL/TLS protocol also implements its own mechanism known as Server Name Indication (SNI) which achieves a similar effect. The client device indicates the name of the server it’s trying to connect to, and the server behaves according to how it’s configured for each hostname. For example, it may present a different certificate depending on the supplied hostname or may present a different set of cipher suites.
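The following Python sketch shows SNI from the client side, using only the standard ssl module and no network connection. It starts a TLS handshake against an in-memory buffer and returns the first message the client would send (the ClientHello), so you can see that the requested hostname travels in it. The function name client_hello_bytes is made up for this example, and the sketch assumes a standard TLS stack that sends the server name unencrypted.

```python
import ssl


def client_hello_bytes(server_name: str) -> bytes:
    """Return the raw bytes a TLS client would send first for server_name."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=server_name)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        # Expected: there is no server attached, so the handshake stalls
        # right after the client has written its ClientHello.
        pass
    return outgoing.read()


if __name__ == "__main__":
    hello = client_hello_bytes("www.example.com")
    print(b"www.example.com" in hello)
```

A server or load balancer can read that name before any certificate is presented, which is what allows it to pick a certificate or back end per hostname.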
The HTTPS protocol often uses both HTTP/1.1 virtual hosting AND SNI.
Load balancers often use the above mechanisms to decide which server to forward requests to. You may have a single load balancer, with several different back end servers hosting completely different websites using completely different hosting technology (Apache, IIS, JSP, ASPX, PHP etc) depending on what site name has been requested. Content Delivery Networks (CDNs) use exactly these mechanisms to route requests to their correct clients.
How Does This Apply to Scanning?
Well if the only information you have provided to your scanner is the IP addresses, then your scanner won’t know what hostnames might be accepted by the server and will only be able to retrieve the default content that’s available when you target the IP directly. In the case of Microsoft, your scan would find a web server that only displays error pages and would not find any of the actual website contents.
A lot of pentest and scan reports that I’ve seen over the years show a number of web servers that contain nothing but error pages or default pages. But why would someone put a web server online and then not put any content on it?
The short answer is they wouldn’t. If a web server is online, it probably has a purpose and some content; you just haven’t identified how to access it, in many cases because you don’t know the correct HTTP/1.1 virtual host needed to reach the content. I have also seen many instances where a pentest of IPs had very few findings, yet once the hostnames were added, serious vulnerabilities were discovered.
There are ways to discover potential hostnames and map them to the supplied IPs, but these methods are not perfect, can miss things and can be time-consuming. The ideal scenario, therefore, is to specify all the applicable hostnames when you schedule the test or scan.
However, there are caveats to be aware of when doing a scan using hostnames, and it comes down to the default behavior of scanning tools. Let’s take the common network scanning tool “NMap” and a well-known website “www.yahoo.com” as an example.
If we do a DNS lookup of this site we get the following results:
$ host www.yahoo.com
www.yahoo.com is an alias for new-fp-shed.wg1.b.yahoo.com.
new-fp-shed.wg1.b.yahoo.com has address 72.30.35.9
new-fp-shed.wg1.b.yahoo.com has address 98.138.219.231
new-fp-shed.wg1.b.yahoo.com has address 98.138.219.232
new-fp-shed.wg1.b.yahoo.com has address 72.30.35.10
new-fp-shed.wg1.b.yahoo.com has IPv6 address 2001:4998:58:1836::11
new-fp-shed.wg1.b.yahoo.com has IPv6 address 2001:4998:44:41d::4
new-fp-shed.wg1.b.yahoo.com has IPv6 address 2001:4998:44:41d::3
new-fp-shed.wg1.b.yahoo.com has IPv6 address 2001:4998:58:1836::10
The Yahoo site is hosted by 4 IPv6 addresses and 4 legacy IPv4 addresses. If you open this site in a browser, your browser will pick one of those addresses at random, with a preference for IPv6 on a sufficiently modern OS and connection. But what happens when you feed the name “www.yahoo.com” into common scanning tools?
NMap has a scanning mode called “List Scan” accessed with the -sL option which simply lists the addresses that would be scanned, but does not actually scan them. If we run NMap with a list scan and otherwise default options, we get the following results:
$ nmap -sL www.yahoo.com
Starting Nmap 7.80 ( https://nmap.org ) at 2020-06-10 14:35 +08
Nmap scan report for www.yahoo.com (72.30.35.10)
Other addresses for www.yahoo.com (not scanned): 72.30.35.9 98.138.219.232 98.138.219.231 2001:4998:58:1836::11 2001:4998:44:41d::3 2001:4998:44:41d::4 2001:4998:58:1836::10
rDNS record for 72.30.35.10: media-router-fp2.prod1.media.vip.bf1.yahoo.com
Nmap done: 1 IP address (0 hosts up) scanned in 4.64 seconds
In this instance, NMap has picked a single address to scan and warns you that it has “not scanned” the others. Which address gets picked depends on the order in which the resolver returns them, so it is effectively random. If you had specified actual scan options then that’s exactly what would happen: NMap would scan one of the addresses and ignore the rest, although it does give a warning.
Newer versions of NMap include an option called “--resolve-all”. Specifying this option gets a bit further:
$ nmap --resolve-all -sL www.yahoo.com
Starting Nmap 7.80 ( https://nmap.org ) at 2020-06-10 14:49 +08
Nmap scan report for www.yahoo.com (98.138.219.232)
Other addresses for www.yahoo.com (not scanned): 2001:4998:58:1836::10 2001:4998:44:41d::3 2001:4998:58:1836::11 2001:4998:44:41d::4
rDNS record for 98.138.219.232: media-router-fp2.prod1.media.vip.ne1.yahoo.com
Nmap scan report for www.yahoo.com (98.138.219.231)
Other addresses for www.yahoo.com (not scanned): 2001:4998:58:1836::10 2001:4998:44:41d::3 2001:4998:58:1836::11 2001:4998:44:41d::4
rDNS record for 98.138.219.231: media-router-fp1.prod1.media.vip.ne1.yahoo.com
Nmap scan report for www.yahoo.com (72.30.35.9)
Other addresses for www.yahoo.com (not scanned): 2001:4998:58:1836::10 2001:4998:44:41d::3 2001:4998:58:1836::11 2001:4998:44:41d::4
rDNS record for 72.30.35.9: media-router-fp1.prod1.media.vip.bf1.yahoo.com
Nmap scan report for www.yahoo.com (72.30.35.10)
Other addresses for www.yahoo.com (not scanned): 2001:4998:58:1836::10 2001:4998:44:41d::3 2001:4998:58:1836::11 2001:4998:44:41d::4
rDNS record for 72.30.35.10: media-router-fp2.prod1.media.vip.bf1.yahoo.com
Nmap done: 4 IP addresses (0 hosts up) scanned in 2.41 seconds
So now NMap will scan all of the legacy IPv4 addresses, but has still not scanned the IPv6 addresses. This is because NMap operates in either IPv4 or IPv6 mode, but never both at the same time. If you were to rerun the above command with the -6 option you would get the opposite result:
$ nmap --resolve-all -sL -6 www.yahoo.com
Starting Nmap 7.80 ( https://nmap.org ) at 2020-06-10 14:51 +08
Nmap scan report for www.yahoo.com (2001:4998:58:1836::10)
Other addresses for www.yahoo.com (not scanned): 72.30.35.10 72.30.35.9 98.138.219.232 98.138.219.231
rDNS record for 2001:4998:58:1836::10: media-router-fp1.prod1.media.vip.bf1.yahoo.com
Nmap scan report for www.yahoo.com (2001:4998:44:41d::4)
Other addresses for www.yahoo.com (not scanned): 72.30.35.10 72.30.35.9 98.138.219.232 98.138.219.231
rDNS record for 2001:4998:44:41d::4: media-router-fp2.prod1.media.vip.ne1.yahoo.com
Nmap scan report for www.yahoo.com (2001:4998:58:1836::11)
Other addresses for www.yahoo.com (not scanned): 72.30.35.10 72.30.35.9 98.138.219.232 98.138.219.231
rDNS record for 2001:4998:58:1836::11: media-router-fp2.prod1.media.vip.bf1.yahoo.com
Nmap scan report for www.yahoo.com (2001:4998:44:41d::3)
Other addresses for www.yahoo.com (not scanned): 72.30.35.10 72.30.35.9 98.138.219.232 98.138.219.231
rDNS record for 2001:4998:44:41d::3: media-router-fp1.prod1.media.vip.ne1.yahoo.com
Nmap done: 4 IP addresses (0 hosts up) scanned in 6.00 seconds
Now it scans the IPv6 addresses and ignores the IPv4 ones. So to get NMap to scan everything on a dual-stack target, you need to specify --resolve-all and run it twice: once as normal for IPv4, and once with -6 for IPv6.
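If you drive NMap from a script, the two-pass approach can be captured in a small helper. This Python sketch only builds the two command lines and does not run anything; the function name nmap_commands and the extra_args parameter are made up for the example, while --resolve-all, -6 and -sL are the options discussed above.

```python
def nmap_commands(hostname, extra_args=()):
    """Return the two nmap command lines needed to cover a dual-stack hostname.

    The first covers every IPv4 address the name resolves to, the second
    (with -6) covers every IPv6 address.
    """
    base = ["nmap", "--resolve-all", *extra_args]
    return [base + [hostname], base + ["-6", hostname]]


if __name__ == "__main__":
    for cmd in nmap_commands("www.yahoo.com", ["-sL"]):
        print(" ".join(cmd))
```

Each list can be passed to something like subprocess.run once you are happy with the scan options.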
The common vulnerability scanner Nessus works in the same way. If you specify a hostname, and that hostname resolves to multiple addresses then only one of them will be scanned at random. To work around this, you need to manually specify the hostname and address mappings when giving your targets to Nessus, for example:
www.yahoo.com[2001:4998:58:1836::10]
www.yahoo.com[2001:4998:44:41d::3]
www.yahoo.com[2001:4998:58:1836::11]
www.yahoo.com[2001:4998:44:41d::4]
www.yahoo.com[72.30.35.9]
www.yahoo.com[72.30.35.10]
www.yahoo.com[98.138.219.232]
www.yahoo.com[98.138.219.231]
This will force Nessus to scan all addresses using the specified hostnames, and is covered in the Nessus documentation at https://docs.tenable.com/nessus/Content/ScanTargets.htm
Nessus, like NMap, will also warn you when it has detected additional addresses and not scanned them, but this warning is buried as an informational finding among the many other informational findings reported by Nessus, so it’s easy to miss.
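Typing these mappings by hand gets tedious for more than a few hosts, so here is a minimal Python sketch that generates them. The function names resolve_all and nessus_targets are made up for this example. The lookup uses the operating system’s resolver, so the addresses you get depend on where and when you run it, and you should check the list before feeding it to a scanner.

```python
import socket


def resolve_all(hostname):
    """Return every IPv4 and IPv6 address the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses = []
    for info in infos:
        addr = info[4][0]  # sockaddr tuple: first element is the address
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def nessus_targets(hostname, addresses):
    """Format hostname/address pairs as hostname[address] target lines."""
    return [f"{hostname}[{addr}]" for addr in addresses]


if __name__ == "__main__":
    name = "www.yahoo.com"
    print("\n".join(nessus_targets(name, resolve_all(name))))
```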
The SSL Labs scanner deserves an honourable mention, as it actually does the right thing by default when presented with a hostname: it resolves all the addresses and scans them all, just as you’d expect.
While often all of the addresses used to host a website will be configured in the same way, this is not always the case and it’s important to perform your scanning against all addresses just in case one of them is configured differently.