The Good, the Bad and the Ugly with DNS wildcard entries… and Kubernetes

fabricat
Nov 14, 2020

Foreword

Are your pods behaving weirdly?
Are you trying to fetch content from the internet, but getting lots of SSL certificate errors?

This is probably the story you were looking for.
Read on…

tl;dr

Because DNS name resolution in Kubernetes is a bit tweaked, it is incompatible with any "search domain" that has a wildcard entry.
The only fix I found is to remove the offending domain from the search setting of the host's network interface.

If your nodes get the search domain from a DHCP lease, run the following commands (as root) on every schedulable node in your Kubernetes cluster.

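# Check the interface name on your host (e.g. with `ip link`) and set it here
NETIF="ens3"
# The first .network file (in lexical order) matching the interface wins, hence the low "10-" prefix.
# EOF is unquoted so that ${NETIF} is expanded inside the file too, not written literally.
cat > /etc/systemd/network/10-${NETIF}-dhcp.network << EOF
[Match]
Name=${NETIF}
[Network]
DHCP=ipv4
[DHCP]
UseDomains=false
EOF
systemctl restart systemd-networkd.service
# Give networkd a moment to reconfigure the interface before restarting the resolver
sleep 4
systemctl restart systemd-resolved.service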

Or, if you are using Netplan, just add a drop-in file in /etc/netplan/ like this:

network:
  version: 2
  ethernets:
    ens3:
      dhcp4-overrides:
        use-domains: no

and then run netplan generate && netplan apply

Finally, restart your pods and run the checks described below.

Check if you are affected

Let’s assume you have a pod called MYPOD.

On the Kubernetes node where MYPOD is running, check whether the file "/run/systemd/resolve/resolv.conf" contains a line starting with search.
If that file does not exist, check "/etc/resolv.conf" instead.
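If you have several nodes to check, a small shell helper can save some typing. This is only a sketch, assuming bash and awk are available on the node; the function name search_domains is made up for this example. It prints the domains found on any search line, one per line, preferring /run/systemd/resolve/resolv.conf over /etc/resolv.conf as described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the search domains of a resolv.conf-style file,
# one per line. With no argument, use the systemd-resolved file if present,
# otherwise fall back to /etc/resolv.conf.
# Note: if several search lines exist, the resolver honours only the last
# one; this sketch simply prints them all.
search_domains() {
  local conf="${1:-}"
  if [ -z "$conf" ]; then
    if [ -f /run/systemd/resolve/resolv.conf ]; then
      conf=/run/systemd/resolve/resolv.conf
    else
      conf=/etc/resolv.conf
    fi
  fi
  # Keep only lines whose first field is "search" and print each domain
  awk '$1 == "search" { for (i = 2; i <= NF; i++) print $i }' "$conf"
}
```

On the node, run search_domains to inspect the host configuration, or pass a path explicitly, e.g. search_domains /etc/resolv.conf.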

If search is there and one of the domain names on that line has a wildcard entry (let's assume this is true for a hypothetical domain called ilikewildcards.com), you are probably affected by the problem. To be 100% sure, you have to run further checks inside a pod.
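How can you tell whether a domain has a wildcard entry? A common trick, shown here as a sketch rather than a bulletproof test, is to look up a random name that nobody would ever create: if it resolves anyway, the domain almost certainly has a wildcard. The function name has_wildcard and the "wildcard-probe-" prefix are hypothetical choices for this example; the lookup itself uses getent hosts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if DOMAIN appears to have a wildcard record,
# i.e. a random, surely non-existent label under it still resolves.
has_wildcard() {
  local domain="$1"
  # The trailing dot makes the probe an absolute name, so the host's own
  # search domains are NOT appended to it during the lookup.
  local probe="wildcard-probe-${RANDOM}${RANDOM}.${domain}."
  getent hosts "$probe" > /dev/null
}
```

For example: has_wildcard ilikewildcards.com && echo "wildcard found".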

First, run this command on the controller:

kubectl exec -ti MYPOD -- cat /etc/resolv.conf

If ilikewildcards.com appears in the output, you are almost certainly affected.
For example:

search default.svc.cluster.local svc.cluster.local cluster.local ilikewildcards.com
nameserver 10.0.0.10
options ndots:5

To reach 100% certainty, run:

kubectl exec -ti MYPOD -- getent hosts example.com

If the response contains ilikewildcards.com, then you are definitely affected.

10.0.101.2     example.com.ilikewildcards.com

Background

Wildcard DNS entries exist for a reason: in some cases they are Good. So, where is the Bad?

Usually, there is no Bad, except that a lookup for a non-existent name under that domain returns a (wrong) answer instead of "not found".

Unfortunately, the resolv.conf that Kubernetes gives to pods sets a special option ("ndots:5", see the resolv.conf man page), which makes things much worse.
In short, any name with fewer than five dots that a pod looks up (that is, almost every name on the internet) is first tried with each search domain appended, one by one: first the Kubernetes internal ones, then the domains inherited from the host's resolv.conf, until the resolver gets a match… and with a wildcard entry you always get a match.
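The ndots rule can be sketched in a few lines of bash. This is a simplified model of a glibc-style stub resolver, not its actual code: query_order is a hypothetical helper that prints the names that would be tried, in order, for a given name, ndots value and search list. The real resolver stops at the first name that returns an answer.

```bash
#!/usr/bin/env bash
# Simplified model of how a glibc-style stub resolver orders its queries.
# Prints every candidate in the order it would be tried; the real resolver
# stops at the first candidate that gets an answer.
# Usage: query_order NAME NDOTS [SEARCH_DOMAIN...]
query_order() {
  local name="$1" ndots="$2" d dots
  shift 2

  # A trailing dot marks the name as fully qualified: no search list at all.
  if [[ "$name" == *. ]]; then
    echo "$name"
    return 0
  fi

  # Count the dots by deleting every other character.
  dots="${name//[!.]/}"

  if (( ${#dots} >= ndots )); then
    # Enough dots: try the name as-is first, then with each search domain.
    echo "$name"
    for d in "$@"; do echo "$name.$d"; done
  else
    # Too few dots: try every search domain first, the bare name last.
    for d in "$@"; do echo "$name.$d"; done
    echo "$name"
  fi
}
```

With the pod settings from the resolv.conf example above, query_order example.com 5 default.svc.cluster.local svc.cluster.local cluster.local ilikewildcards.com lists example.com.ilikewildcards.com fourth and the bare example.com only last. With ndots 1, as on a typical host, example.com itself comes first.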

Explanation

Let's see an example: suppose your pod requests DNS resolution for example.com. Because of the pod's "resolv.conf", the resolver first tries example.com.default.svc.cluster.local. Since that returns no match (NXDOMAIN), it then tries example.com.svc.cluster.local and, again finding no match, example.com.cluster.local. At this point it moves on to the search domains inherited from the host: in our case it tries example.com.ilikewildcards.com… and that matches (thanks to the wildcard)… but it is wrong. The pod then connects to whatever host the wildcard points to, whose TLS certificate is not valid for example.com: hence the SSL certificate errors.
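The walk-through above can be reproduced with a toy model. The "DNS" here is a hard-coded table of hypothetical records (192.0.2.10 is a documentation address standing in for the real example.com; 10.0.101.2 is the wildcard answer from the getent output above), and the candidate order is the one a pod uses with ndots:5. The first name that resolves wins, and that turns out to be the wildcard one.

```bash
#!/usr/bin/env bash
# Toy DNS with hypothetical data: example.com has a record, and
# ilikewildcards.com has a wildcard answering for every subdomain.
fake_dns() {
  case "$1" in
    example.com) echo "192.0.2.10" ;;           # the answer we actually want
    *.ilikewildcards.com) echo "10.0.101.2" ;;  # wildcard: matches anything below it
    *) return 1 ;;                              # everything else: NXDOMAIN
  esac
}

# Walk the candidates in the order used by a pod with ndots:5 and stop at
# the first name that resolves. The empty entry stands for the bare name,
# tried last because example.com has fewer than 5 dots.
resolve_in_pod() {
  local name="$1" d addr candidate
  for d in default.svc.cluster.local svc.cluster.local cluster.local ilikewildcards.com ""; do
    candidate="${name}${d:+.$d}"
    if addr=$(fake_dns "$candidate"); then
      echo "$addr $candidate"
      return 0
    fi
  done
  return 1
}

resolve_in_pod example.com   # prints: 10.0.101.2 example.com.ilikewildcards.com
```

Note how the correct record for example.com is never even queried: the wildcard answers first.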

Outside Kubernetes, where the default is ndots:1, the search domains are used only when the queried name contains no dots, or when the name "as is" returns no match.

So, how can we avoid this problem?
Well, several DNS policies are available, but since you probably need to resolve internal names of pods and services, the default policy (confusingly called ClusterFirst, not Default) is the advisable one.
The best option is probably to tell the host to "ignore" the domain names received via DHCP.

Uh, I almost forgot about the Ugly…
On recent Linux hosts, the DHCP client is no longer dhclient: it is built into the systemd-networkd service.
In short, you have to create a dedicated network configuration, using the commands given above.

Wait, I am not saying that systemd is the Ugly… but I lost way too much time (and hair) finding out how to tame the behaviour of its DHCP client.
