I have been broken for three decades and I still don't understand DNS. It is a simple protocol, but people use it in complicated ways.
What I think is missing is a bit more of the "in practice" side. If the author was surprised about TTL values, I doubt they have much experience with some of the other pitfalls, so I'm not surprised (not a knock on the author). But there is a reason why the phrase "It's always DNS" exists.
As an example, it could be helpful to mention that ISP DNS resolvers (or any caching resolver in the path) could decide to ignore the TTL. In this case, your 360 sec TTL might not get updated for an hour or a day or longer. This can be infuriating to troubleshoot.
A section on troubleshooting might also be beneficial. But this mainly consists of checking results from different resolvers in your path - does it work with a local resolver? Your ISP's DNS? The authoritative server?
The biggest pain of DNS for most people is if someone has set the TTL to an absurdly large number, or if a resolver isn't respecting TTL. And once you get into advanced configurations, SOAs and delegation certainly create their own headaches!
Propagation might be a useful way to visualise it, but doesn't match reality unless every cache is a warm cache.
(Assuming a typical home connection, your router is _probably_ not a DNS server with local cache, it probably is a DHCP server which will hand out the upstream/ISP's nameservers.)
dnsmasq is the de facto tool on these embedded devices for DHCP+DNS.
Nowadays I'm in Finland and definitely the router runs no DNS service, the DHCP service advertises the ISP resolvers.
Probably depends on the region/ISP I guess, but I had no expectation that it would be the more common option.
I would have been really happy to have had these explanations before I had to figure it out for myself.
then you have these guys who reached the next level
This used to be true until virtual hosting came along, allowing for several domains to point to the same IP address, but only for non-HTTPS traffic. Then a bit later we got SNI (Server Name Indication) that did the same thing for HTTPS.
I remember having web servers with 10-12 public IP addresses when I started working. The number of IPv4 addresses needed has been greatly reduced since.
Noticeably faster as in just loading a website? Or in some script where small differences add up? I thought typical DNS lookup was sub 100ms, but I've never tried switching my resolver so I'm curious
The most egregious of course is ISPs rewriting TTLs (or resolvers that just ignore them). But there are other implementation issues too, like caching things that shouldn't be or doing it wrong. I've seen resolvers that cache a CNAME and the A record it resolves to with the TTL of the CNAME (which is wrong).
I'm also very concerned about the "WHY DNS MATTERS FOR SYSTEM DESIGN" section. While everything there is correct enough, it doesn't dive into the implication of each and how things go wrong.
For example, using DNS for round robin balancing is an awful idea in practice. Because Comcast will cache one IP of three, and all of a sudden 60% of your traffic is going to one IP. Similar issue with regional IPs. There are so many ways for the wrong IP to get into a cache.
There is a reason we say "it's always DNS".
The fact that a server can serve multiple vhosts and do TLS cert selection via SNI is not related to the lookup of what server to connect to.
For round-robin, I've actually had it work reasonably well for API usage. Of course it's not ideal, but when I wanted to roll out new things slowly over several days and could not use a load balancer or reverse proxy, it kind of worked. I think most API users are just running with a reasonable resolver and not residential ISP ones.
you mean DNSSEC, right? RIGHT?
But after two months, about 1% was still going to the old server (I had set it up as a proxy for the cutover). Most of that traffic looked like crawlers that were written in things like Python or Ruby and had probably hard coded the IP or done something where it just didn't know what a TTL was.
So at that point I just shut down the old server.
You're probably right about API clients using better resolvers though. I was talking about consumer facing things where a lot of people would be on ISP DNS.
Last week, I pointed my domain to a new server. Changed the A record, waited... nothing. The old site kept showing up. I cleared my browser cache. Still nothing. Restarted my computer. Nothing.
Three hours later, I learned about DNS propagation and TTL. That rabbit hole led me to actually understand how DNS works. Not just "it translates domains to IPs" - but the whole system.
Here's what I learned.
At its core, DNS is simple: it translates domain names into IP addresses. You type example.com, DNS returns 93.184.216.34, and your browser connects to that IP. Without DNS, you'd need to memorize IP addresses for every website. Nobody wants that.
But the how is where it gets interesting.
Here's the mental model that finally made DNS click for me: it's organized like a chain of referrals. Each level knows about the level below it, and nobody knows everything.

Every uncached DNS lookup starts here. I didn't even know root servers existed until I had to debug my propagation issue. There are 13 root server names (a.root-servers.net through m.root-servers.net), each backed by many machines around the world via anycast, operated by organizations like Verisign, ICANN, and NASA. Root servers don't know where example.com is - but they know where to find the .com TLD servers. They're the starting point of the referral chain.
You know this part - the last bit of a domain: .com, .org, .in, .io. TLD servers know which nameservers are responsible for each domain registered under them. The .com TLD servers can tell you where to find the nameservers for example.com, google.com, and every other .com domain.
Common TLDs:
.com - Commercial (anyone can register)
.org - Organizations
.net - Network services
.io - British Indian Ocean Territory (popular with tech)
.dev - Developers (requires HTTPS)
.gov - US Government only
.edu - Educational institutions
The name you actually buy: example, google, github. Combined with TLD, you get example.com. This is what you purchase from registrars like Namecheap, GoDaddy, or Cloudflare. I use Cloudflare for my domains - the DNS management is clean and free. (Once you understand DNS, you might want to check out my guide on deploying Django REST Framework to production - that's where you'll actually use this knowledge.)
Anything before the domain: api.example.com, mail.example.com, staging.example.com. The cool part? You create these yourself through DNS records. No additional purchase needed. If you own example.com, you can create unlimited subdomains for free.
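To make the hierarchy concrete, here's a small Python sketch that splits a hostname into the zones a lookup passes through, from the root down. The function name `referral_chain` is my own invention for illustration, not part of any DNS library.

```python
def referral_chain(hostname: str) -> list[str]:
    """Return the zones from the root down to the full name."""
    labels = hostname.rstrip(".").split(".")
    chain = ["."]  # the root zone
    # Build names right to left: com. -> example.com. -> api.example.com.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


print(referral_chain("api.example.com"))
# ['.', 'com.', 'example.com.', 'api.example.com.']
```

The trailing dot is how DNS writes a fully qualified name; the lone "." is the root.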
When I first opened my domain's DNS panel, I saw a bunch of record types I didn't understand. A, AAAA, CNAME, MX, TXT... it looked intimidating. But once you know what each one does, it's actually straightforward.
This is the one you'll use most. Maps a domain to an IPv4 address. When I moved my site to a new server, this is what I changed.
example.com A 93.184.216.34
When someone requests example.com, DNS returns 93.184.216.34.
Same thing, but for IPv6 addresses. The weird name? Four A's because IPv6 addresses are four times longer than IPv4. I haven't had to set one of these up myself yet.
example.com AAAA 2606:2800:220:1:248:1893:25c8:1946
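If you want to check the "four times longer" claim yourself, Python's standard `ipaddress` module can tell you the bit length of each address. This is just a quick sketch; the helper `record_type` is a name I made up to show which record type an address would go into.

```python
import ipaddress


def record_type(address: str) -> str:
    """Return "A" for an IPv4 address and "AAAA" for an IPv6 address."""
    ip = ipaddress.ip_address(address)
    return "A" if ip.version == 4 else "AAAA"


v4 = ipaddress.ip_address("93.184.216.34")
v6 = ipaddress.ip_address("2606:2800:220:1:248:1893:25c8:1946")

print(v4.max_prefixlen)  # 32 bits
print(v6.max_prefixlen)  # 128 bits, four times as many
print(record_type("93.184.216.34"))  # A
print(record_type("2606:2800:220:1:248:1893:25c8:1946"))  # AAAA
```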
This one's clever. Instead of pointing to an IP, it points to another domain. It says "go ask this other domain instead."
www.example.com CNAME example.com
blog.example.com CNAME example.ghost.io
Super useful when you use third-party services. I use this to point subdomains to hosted services - your blog can point to Ghost, your docs to GitBook, without managing their IPs yourself.
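Here's roughly how "go ask this other domain instead" plays out, as a toy Python sketch. The `RECORDS` table and the `resolve` function are made up for illustration (a real resolver talks to nameservers rather than a dict), and the hop limit is my own guard against CNAME loops.

```python
# Toy record table: name -> (record type, value). Hypothetical data.
RECORDS = {
    "www.example.com": ("CNAME", "example.com"),
    "example.com": ("A", "93.184.216.34"),
}


def resolve(name: str, records: dict, max_hops: int = 8) -> str:
    """Follow CNAME records until an A record is found."""
    for _ in range(max_hops):
        rtype, value = records[name]
        if rtype == "A":
            return value
        name = value  # CNAME: restart the lookup with the target name
    raise RuntimeError("too many CNAME hops (possible loop)")


print(resolve("www.example.com", RECORDS))  # 93.184.216.34
```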
Ever wondered how mail sent to an address at example.com knows where to go? MX records. They specify where to deliver emails for your domain. The priority number matters - lower numbers are tried first.
example.com MX 10 mail1.example.com
example.com MX 20 mail2.example.com
When someone emails an address at example.com, mail servers check the MX record to find where to deliver. If mail1 (priority 10) is down, it tries mail2 (priority 20).
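The "lower number first, fall back if it's down" logic can be sketched in a few lines of Python. Both function names and the `is_up` callback are hypothetical; a real mail server attempts an actual SMTP connection instead of calling a function.

```python
def delivery_order(mx_records):
    """Sort (priority, host) pairs so the lowest priority number comes first."""
    return [host for _, host in sorted(mx_records)]


def pick_mail_server(mx_records, is_up):
    """Return the first reachable host in priority order, or None."""
    for host in delivery_order(mx_records):
        if is_up(host):
            return host
    return None


mx = [(20, "mail2.example.com"), (10, "mail1.example.com")]
print(delivery_order(mx))
# ['mail1.example.com', 'mail2.example.com']

# Pretend mail1 is down: delivery falls back to mail2.
print(pick_mail_server(mx, lambda host: host != "mail1.example.com"))
# mail2.example.com
```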
This one seems random at first - it just stores text. But it's used for some important stuff.
example.com TXT "v=spf1 include:_spf.google.com ~all"
example.com TXT "google-site-verification=abc123xyz"
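A TXT value is just a string, and it's up to whoever reads it to make sense of it. As a rough sketch, here's how the SPF example above could be split into its parts in Python. `parse_spf` is a name I made up, and this only splits on whitespace; it doesn't evaluate the policy the way a real mail server would.

```python
def parse_spf(txt: str) -> dict:
    """Split an SPF TXT value into its version tag and the terms that follow."""
    terms = txt.split()
    if not terms or terms[0] != "v=spf1":
        raise ValueError("not an SPF record")
    return {"version": "spf1", "terms": terms[1:]}


print(parse_spf("v=spf1 include:_spf.google.com ~all"))
# {'version': 'spf1', 'terms': ['include:_spf.google.com', '~all']}
```

The verification record from the second example would be rejected here, since it doesn't start with `v=spf1`.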
I've used TXT records for things like domain ownership verification and email authentication (SPF, DKIM, DMARC).
This is what got me. Remember my three hours of confusion? The TTL on my old A record was 3600 seconds. That's one hour. Every resolver that had cached my old IP would hold onto it for up to an hour before checking again.
TTL tells resolvers how long to cache a DNS result.
example.com A 93.184.216.34 TTL=3600
This means: "Cache this answer for 3600 seconds (1 hour). Don't ask again until then."
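Here's a minimal sketch of what "cache this for N seconds" looks like in code. The `TTLCache` class and its injectable `clock` parameter are my own illustration, not how any particular resolver is implemented; the clock is injectable only so the expiry can be demonstrated without actually waiting an hour.

```python
import time


class TTLCache:
    """Tiny DNS-style cache: entries disappear once their TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (value, expires_at)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the caller must look it up again
            return None
        return value


# Fake clock so we can jump forward in time.
now = [0]
cache = TTLCache(clock=lambda: now[0])
cache.put("example.com", "93.184.216.34", ttl=3600)

now[0] = 3599
print(cache.get("example.com"))  # 93.184.216.34 (still cached)
now[0] = 3600
print(cache.get("example.com"))  # None (TTL elapsed)
```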
Common TTL values:
Low TTL = Faster updates, but more DNS queries (slightly slower first load).
High TTL = Fewer DNS queries (better performance), but slower propagation when you change records.
Pro tip I learned the hard way: if you're planning to migrate servers, lower your TTL to 300 a few days before. After migration, old cached records expire in 5 minutes instead of hours. Wish I'd known this before my migration.
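The timing behind that tip is simple arithmetic, sketched here in Python. Both function names are hypothetical, the dates are made up, and the whole thing assumes resolvers actually honor the TTL (not all of them do).

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at, old_ttl_seconds):
    """Caches may still hold the record with the OLD TTL until this moment."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def stale_until(record_changed_at, ttl_seconds):
    """Worst case: a resolver cached the old answer just before the change."""
    return record_changed_at + timedelta(seconds=ttl_seconds)


lowered = datetime(2024, 1, 1, 12, 0)          # TTL dropped from 3600 to 300
cutover = earliest_safe_cutover(lowered, 3600)
print(cutover)                                  # 2024-01-01 13:00:00
print(stale_until(cutover, 300))                # 2024-01-01 13:05:00
print(stale_until(cutover, 3600))               # 2024-01-01 14:00:00 (if TTL had stayed at 3600)
```

Lowering the TTL "a few days before" just adds a comfortable margin on top of that first number.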
Once I traced through this whole process, everything clicked. Here's what actually happens when you type api.example.com in your browser:
Step 1: Browser cache
"Do I have api.example.com cached?" → No
Step 2: OS DNS cache
"Do I have it?" → No
Step 3: Router cache
"Do I have it?" → No
Step 4: ISP's DNS Resolver
"Do I have it?" → No, let me find out...
Step 5: Ask Root Server
Resolver: "Where is api.example.com?"
Root: "I don't know, but .com TLD is at 192.5.6.30"
Step 6: Ask .com TLD Server
Resolver: "Where is api.example.com?"
TLD: "I don't know, but example.com's nameserver is at ns1.example.com"
Step 7: Ask example.com's Nameserver
Resolver: "Where is api.example.com?"
NS: "It's at 93.184.216.34"
Step 8: Response flows back
Resolver caches it (based on TTL)
Returns 93.184.216.34 to your browser
Step 9: Browser connects to 93.184.216.34

Each intermediate resolver caches the result based on TTL. Subsequent requests from the same network skip most of this because the resolver already knows the answer. That's why the second visit to a site is faster than the first.
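Steps 5 to 7 (root, then TLD, then the domain's nameserver) can be simulated with a toy Python sketch. Everything here is made up for illustration: the `SERVERS` table stands in for real nameservers, the server labels are arbitrary, and a real resolver sends network queries instead of reading a dict.

```python
# Each toy "server" maps a name suffix to a referral or a final answer.
SERVERS = {
    "root": {"com.": ("refer", "com-tld")},
    "com-tld": {"example.com.": ("refer", "ns1.example.com")},
    "ns1.example.com": {"api.example.com.": ("answer", "93.184.216.34")},
}


def resolve_iteratively(name, servers=SERVERS, start="root"):
    """Follow referrals from the root until some server gives an answer."""
    fqdn = name.rstrip(".") + "."
    server, path = start, []
    for _ in range(10):  # guard against referral loops
        path.append(server)
        # Prefer the longest suffix this server knows about.
        known = sorted(servers[server].items(), key=lambda kv: -len(kv[0]))
        for suffix, (kind, value) in known:
            if fqdn == suffix or fqdn.endswith("." + suffix):
                break
        else:
            raise LookupError(f"{server} knows nothing about {fqdn}")
        if kind == "answer":
            return value, path
        server = value  # referral: ask the next server down the chain
    raise LookupError("too many referrals")


print(resolve_iteratively("api.example.com"))
# ('93.184.216.34', ['root', 'com-tld', 'ns1.example.com'])
```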
I used to confuse resolvers and nameservers. They sound similar, but they do different jobs.
A resolver is a DNS server that does the hard work of finding IP addresses for you. It handles the recursive lookup through Root → TLD → Authoritative nameservers.
Your device asks a resolver: "What's the IP of example.com?" The resolver does all the work and returns: "It's 93.184.216.34"
Common public resolvers:
8.8.8.8 - Google Public DNS
1.1.1.1 - Cloudflare (fast, privacy-focused)
208.67.222.222 - OpenDNS (Cisco)
9.9.9.9 - Quad9 (security-focused, blocks malware)
You can change which resolver you use in your network settings. I switched to 1.1.1.1 on my machines - it's noticeably faster than my ISP's default resolver.
A nameserver is the source of truth. It actually holds the DNS records for a domain. When you register a domain, you point it to nameservers that store your A records, CNAME records, MX records, etc.
The difference between resolvers and nameservers:
When you buy a domain from Namecheap or GoDaddy, they give you default nameservers like:
dns1.registrar-servers.com
dns2.registrar-servers.com
You can change these to use Cloudflare, AWS Route 53, or your own nameservers:
# Cloudflare nameservers (a pair is assigned per account; these names are examples)
ada.ns.cloudflare.com
bob.ns.cloudflare.com
# AWS Route 53 (example)
ns-1234.awsdns-56.org
ns-789.awsdns-12.co.uk
Why bother changing nameservers? Different providers offer different features. I moved my domains to Cloudflare because they give you free DDoS protection, a CDN, and the dashboard is just nicer to use. Route 53 makes sense if you're already deep in AWS.
Authoritative nameserver - Has the final answer for a domain. When asked "Where is example.com?", it responds with the actual IP because it holds the records.
Recursive resolver - Isn't the source of truth for any records (it only caches them), but finds answers by asking authoritative servers. Your ISP's DNS server and 8.8.8.8 are recursive resolvers.
These commands saved me during my debugging session. When you need to see what's actually cached:
# Chrome
chrome://net-internals/#dns
# Firefox
about:networking#dns
# Edge
edge://net-internals/#dns
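These pages show the browser's DNS state. What you get depends on the browser and version: Firefox lists cached hostnames with their addresses and expiry, while recent Chrome and Edge builds mostly offer a lookup box and a button to clear the host cache.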
# Windows (Command Prompt)
ipconfig /displaydns
# Mac (on recent macOS versions this may print little or nothing)
sudo dscacheutil -cachedump
# Linux
# Most distros use systemd-resolved; this prints cache statistics (size, hits, misses), not the cached entries
resolvectl statistics
# Using nslookup (Windows/Mac/Linux)
nslookup example.com
# Using dig (Mac/Linux - more detailed)
dig example.com
# Query specific record type
dig example.com MX
dig example.com TXT
# Query using a specific resolver
dig @8.8.8.8 example.com
dig @1.1.1.1 example.com
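If you'd rather check from code, here's a small Python sketch using the standard library. Unlike `dig @8.8.8.8`, `socket.getaddrinfo` goes through your operating system's resolver, so it reflects the hosts file and whatever your machine is configured to use. The function name `lookup` is my own.

```python
import socket


def lookup(host: str) -> list[str]:
    """Return the unique addresses the OS resolver gives for a host."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})
```

Calling `lookup("example.com")` on a machine with network access returns whatever A/AAAA addresses your system currently resolves for that name, which makes it a quick way to see what your own applications will actually connect to.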
Go to chrome://net-internals/#dns and click "Clear host cache".
# Windows
ipconfig /flushdns
# Mac
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
# Linux (systemd-resolved; older releases use: sudo systemd-resolve --flush-caches)
sudo resolvectl flush-caches
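This is what I should've done first. Flushing the cache forces your system to fetch fresh DNS records instead of using stale cached ones. Keep in mind it only clears your local caches: if your resolver (ISP, 8.8.8.8, etc.) still has the old record cached, you'll keep getting it until the TTL expires there.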
Here's where DNS gets interesting beyond just "hosting a website." If you're designing systems at scale, DNS becomes a tool, not just a lookup service.
DNS can return different IPs for the same domain. Each request might get a different server. This is called DNS round-robin load balancing.
example.com A 93.184.216.34
example.com A 93.184.216.35
example.com A 93.184.216.36
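Here's a rough Python sketch of the rotation idea, assuming a nameserver that returns all three addresses but shifts the order on each response. The `RoundRobin` class is hypothetical. Real-world results also depend on caching resolvers in between, which can hand the same order to many users and skew the distribution.

```python
from collections import deque


class RoundRobin:
    """Rotate the order of the answer set on every response."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        current = list(self._addresses)
        self._addresses.rotate(-1)  # next caller sees a different first IP
        return current


rr = RoundRobin(["93.184.216.34", "93.184.216.35", "93.184.216.36"])
print(rr.answer())  # ['93.184.216.34', '93.184.216.35', '93.184.216.36']
print(rr.answer())  # ['93.184.216.35', '93.184.216.36', '93.184.216.34']
```

Most clients simply connect to the first address in the list, which is why rotating the order spreads the load.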
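If one server dies, DNS can stop returning its IP: health checks detect the failure, the DNS record is updated, and traffic flows to the healthy servers. Clients that already cached the dead IP keep trying it until the TTL expires, so failover is only as fast as your TTL.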
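The "stop returning its IP" part boils down to filtering the answer set, sketched here in Python. `healthy_answers` and the `is_healthy` callback are made-up names; a managed DNS provider runs its own health checks. Returning every address when all checks fail is a design choice I picked for the sketch, not a standard.

```python
def healthy_answers(addresses, is_healthy):
    """Return only the addresses whose health check passes."""
    alive = [ip for ip in addresses if is_healthy(ip)]
    # Fail open: if every check fails, an answer that might work
    # beats an empty answer (assumption for this sketch).
    return alive or list(addresses)


pool = ["93.184.216.34", "93.184.216.35", "93.184.216.36"]
down = {"93.184.216.35"}
print(healthy_answers(pool, lambda ip: ip not in down))
# ['93.184.216.34', '93.184.216.36']
```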
Return IPs of servers closest to the user. A user in Europe gets European server IPs, a user in Asia gets Asian server IPs. This is called GeoDNS or latency-based routing.
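At its simplest, the idea is a lookup table from region to server pool, as in this Python sketch. The region keys, the pools, and `answer_for` are all hypothetical, and the addresses come from the ranges reserved for documentation. Real GeoDNS services have to guess the region first, typically from the resolver's IP address or the EDNS Client Subnet option.

```python
# Hypothetical pools using documentation-only address ranges.
REGION_POOLS = {
    "eu": ["192.0.2.10", "192.0.2.11"],
    "asia": ["198.51.100.10", "198.51.100.11"],
}
DEFAULT_POOL = ["203.0.113.10"]


def answer_for(region: str) -> list[str]:
    """Return the pool for a region, or the default pool if unknown."""
    return REGION_POOLS.get(region, DEFAULT_POOL)


print(answer_for("eu"))    # ['192.0.2.10', '192.0.2.11']
print(answer_for("mars"))  # ['203.0.113.10']
```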
DNS caching reduces lookup time and load on DNS servers. But it makes instant updates impossible. If you change an IP and TTL is 24 hours, some users will hit the old IP for up to 24 hours.
┌─────────────┬───────────────────────────────────────┐
│ Term        │ What it does                          │
├─────────────┼───────────────────────────────────────┤
│ Root        │ Top of DNS tree, knows TLD locations  │
│ TLD         │ .com, .org, .io - knows domains under │
│ Domain      │ What you buy (example, google)        │
│ Subdomain   │ What you create free (api, www, mail) │
├─────────────┼───────────────────────────────────────┤
│ A Record    │ Domain → IPv4 address                 │
│ AAAA Record │ Domain → IPv6 address                 │
│ CNAME       │ Domain → Another domain (alias)       │
│ MX          │ Where to send email                   │
│ TXT         │ Verification, SPF, DKIM, DMARC        │
├─────────────┼───────────────────────────────────────┤
│ TTL         │ Cache duration in seconds             │
│ Resolver    │ Server that finds IPs for you         │
└─────────────┴───────────────────────────────────────┘
DNS broke my site for three hours. But now I actually understand it - not just what it does, but how it works. Next time something doesn't propagate, I'll know exactly where to look.
What's something you use every day but never really understood until it broke?

Software Developer from Nepal. 3x Hackathon Winner. Building digital products and learning in public.