PhilipB - 3 months ago
Python Question

Why does getaddrinfo not return all IP Addresses?

I am trying to get all IP addresses for: earth.all.vpn.airdns.org
In Python:

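import socket

def hostnamesToIps(hostnames):
    ip_list = []

    for hostname in hostnames:
        try:
            ais = socket.getaddrinfo(hostname, None)
            for result in ais:
                # The last element is the sockaddr tuple; its first item is the IP address
                ip_list.append(result[-1][0])
        except socket.gaierror:
            # Raised when the name cannot be resolved
            eprint("Unable to get IP for hostname: " + hostname)

    # Remove duplicates (getaddrinfo returns one entry per socket type)
    return list(set(ip_list))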


(FYI: eprint is a function for printing errors.)
The output gives me 29 addresses.

But when I run:

nslookup earth.all.vpn.airdns.org


I get about 100 entries.

How can I achieve this in Python? Why am I not getting all entries with "getaddrinfo"?

This behaviour only appears on Windows (Python 3). When I execute the code on my Linux machine (Python 2.7), it gives me the same result as nslookup.

Info: As explained in the answer, it does not depend on the system.




Without changing anything, the results of nslookup and getaddrinfo are now the same.

Answer

Tools like dig, host and nslookup query the default DNS server directly over UDP/TCP using their own implementation of DNS queries, whereas Python's socket module uses the DNS lookup interface of the operating system, which usually involves a more sophisticated lookup mechanism including, e.g., a DNS cache, a hosts file, domain suffixes, link-local name resolution, etc.
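To make the system-resolver path concrete, here is a minimal sketch of a lookup that goes through the operating system via socket.getaddrinfo. The function name resolve_via_system is an illustrative assumption, not part of the question's code. Restricting the socket type to SOCK_STREAM avoids getting the same address back once per socket type.

```python
import socket

def resolve_via_system(hostname, family=socket.AF_UNSPEC):
    """Resolve a name through the OS resolver and return the distinct addresses.

    resolve_via_system is a hypothetical helper name used for illustration.
    """
    # Asking for a single socket type keeps getaddrinfo from returning
    # one tuple per (family, socktype, proto) combination.
    infos = socket.getaddrinfo(hostname, None, family, socket.SOCK_STREAM)
    addresses = []
    for _family, _socktype, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first item of the sockaddr tuple is the IP address
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Whatever this returns is whatever the operating system decides to hand back (cache, hosts file and so on included), which is exactly why it can differ from a direct DNS query.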

strace shows that Python's socket.getaddrinfo ends up using a netlink (AF_NETLINK) socket to query the system for the DNS lookup (Python 2.7 on Ubuntu 12.04). nslookup, however, reads the default DNS server from /etc/resolv.conf and opens a UDP socket on port 53.
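As a rough illustration of the first step a tool like nslookup performs on Linux, the sketch below extracts the nameserver entries from the contents of a resolv.conf file. The function name parse_nameservers and the simplified comment handling are assumptions; the real file format has more options (search, options, etc.) that are ignored here.

```python
def parse_nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text.

    parse_nameservers is a hypothetical helper; only 'nameserver' lines
    are considered, everything else is skipped.
    """
    servers = []
    for line in resolv_conf_text.splitlines():
        # Drop comments (introduced by '#' or ';') and surrounding whitespace
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers
```

On a Linux machine it could be used as parse_nameservers(open("/etc/resolv.conf").read()); the first entry is typically the server a direct query would be sent to on port 53.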

I think there are two reasons why you are getting a different entry count:

  1. The DNS entries are quite volatile and may change at any instant.
  2. Python returns cached entries provided by the system's DNS cache whereas nslookup always retrieves "fresh" results.

Furthermore, nslookup might produce slightly different DNS queries than the system resolver (producing another answer). That could be checked with Wireshark, but I will leave that for now.

Another issue could be the truncation of DNS responses when UDP is used. If there is a large number of entries, they won't fit into a single UDP packet, so the answer carries a truncation flag. It is up to the client to re-send the DNS query over a TCP socket in order to retrieve all results. (Truncated answers may actually be empty.)
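The truncation handling described above can be sketched with the standard library alone: send the query over UDP, check the TC (truncation) bit in the response header, and repeat the query over TCP if it is set. This is a minimal sketch, not a full resolver. All function names are illustrative assumptions, only A records are parsed, and the response ID and source are not validated.

```python
import socket
import struct

def build_query(hostname, query_id=0x1234, qtype=1):
    """Build a minimal DNS query (qtype 1 = A record, class IN)."""
    # Header: ID, flags (only RD set = 0x0100), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def is_truncated(message):
    """Return True if the TC bit (0x0200) is set in the DNS header flags."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & 0x0200)

def skip_name(message, offset):
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = message[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer, always 2 bytes
            return offset + 2
        offset += 1 + length

def extract_a_records(response):
    """Return the IPv4 addresses found in the answer section."""
    qdcount, ancount = struct.unpack("!HH", response[4:8])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(response, offset) + 4  # skip QTYPE and QCLASS
    ips = []
    for _ in range(ancount):
        offset = skip_name(response, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            "!HHIH", response[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlength == 4:
            ips.append(socket.inet_ntoa(response[offset:offset + 4]))
        offset += rdlength
    return ips

def recv_exact(sock, count):
    """Read exactly count bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed before full response")
        data += chunk
    return data

def query_with_tcp_fallback(hostname, server, timeout=5.0):
    """Query over UDP first; retry over TCP if the answer is truncated."""
    query = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        response, _ = udp.recvfrom(4096)
    if is_truncated(response):
        with socket.create_connection((server, 53), timeout=timeout) as tcp:
            # Over TCP, each DNS message is prefixed with a 2-byte length
            tcp.sendall(struct.pack("!H", len(query)) + query)
            (length,) = struct.unpack("!H", recv_exact(tcp, 2))
            response = recv_exact(tcp, length)
    return extract_a_records(response)
```

A call such as query_with_tcp_fallback("earth.all.vpn.airdns.org", "8.8.8.8") would then return the A records from that one server; the server address here is only an example.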

Edit: A note on caching / volatility

Even if the mismatch is not due to your local DNS cache, it may be due to server-side caching. I tried several DNS servers and all gave different results for that particular name. That means that they are not in sync due to DNS changes within the time-to-live (TTL).
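One way to see how far several servers are out of sync is to compare their answer sets. The sketch below assumes you have already collected the addresses per server by some means; the function name compare_answers is hypothetical, and it only does the set comparison.

```python
def compare_answers(answers):
    """Compare per-server answer lists.

    answers maps a server label to the list of IPs it returned.
    Returns (common, missing): the addresses every server agrees on,
    and for each server the addresses it lacks that others returned.
    compare_answers is a hypothetical helper name.
    """
    if not answers:
        return [], {}
    sets = {server: set(ips) for server, ips in answers.items()}
    all_ips = set().union(*sets.values())
    common = set.intersection(*sets.values())
    missing = {server: sorted(all_ips - ips) for server, ips in sets.items()}
    return sorted(common), missing
```

If every server's missing list is empty, the servers agree; otherwise the differences show which entries changed within the TTL window on which server.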