Not exactly.
Most big routers have ASICs (custom silicon) that handle the bulk of routing decisions; e.g. an interface card will have a chip that can directly determine where a packet needs to go and forward it there. These are extremely fast, but limited, and are called the "fast path".
Aside: Too many ACLs is a common way that packets fall off the fast path, and is why routers on the public Internet will happily forward along bogon traffic that by its very nature is just wasting bits on the pipes.
There are some things that the fast path cannot handle, and generating ICMP TTL exceeded messages is one of them. Those go over to the router CPU, which historically has been insanely underpowered. Back when I was doing more routing it was common to have host CPUs in the multi-GHz range with multiple cores, but routers of a similar class would have a 100MHz MIPS CPU.
That's why, as the article goes on to explain, "*"s in the traceroute may not indicate a problem. It's not necessarily a literal deprioritization of ICMP.
If you ever see packet loss in a trace at one step but the steps after it aren't showing it, you can ignore that packet loss; it's likely a CPU limitation on a busy router.
Or are there other types of packets in the slow path that do get a delivery guarantee from the router?
I’m curious how these tradeoffs are made.
(reads article - I've got a five digit /. ID and that was after lurking for several years - respond first, ask questions/read article later)
Oh. You now fail to understand networks in Rust instead of C/Python/knicker elastic. *sighs in policy-based routing tables*
A modern mtr (traceroute is so '90s) should do things like run up and down the stack for each point along a route. It will still probably need to use the TTL field to find each point (IP), but can then use ICMP/TCP/UDP/etc. to measure that point in some way, or perhaps interpolate it from the points on either side.
When I want to really get to grips with latency and stuff, I start off with a small dedicated box on a customer network and "smoke ping" with all points measurable on the path. I also have several running from our datacentre and a fair few RIPE Atlas probes too.
traceroute is handy but you must be able to decipher what it is telling you. Wearing a stethoscope does not make you a doctor.
I learnt about traceroute, ping and other network setup basics in my very first job (early 90s) as a network admin in a remote building in Bangalore, setting up the very first WAN for some of the earliest tech (now) behemoths, when the latency of the WAN -- over the SEA-ME-WE cables -- exceeded ~1-2 seconds. The satellite hops via EU added more latency. Traceroute and ping were your best friends to diagnose the frequent drops, from the building-top microwave antenna to the only ISP (govt approved, of course) that offered a whopping 2x 64 kbps links. And that supported an entire org of 400-500 developers, including a state-of-the-art video conf system to NYC, Ottawa, Tokyo.
Curious to figure out how these tools worked, I borrowed copies of the bible (TCP/IP Illustrated - W Richard Stevens [1]), still the most authentic source of all things TCP/IP related.
I'm not one for nostalgia, but fond memories there. Great to see a modern Rust impl though.
EDIT: You probably need to increase the maximum hop count for it to work fully.
https://archive.nanog.org/meetings/nanog47/presentations/Sun...
homepage: https://www.bitwizard.nl/mtr/
excellent article on using mtr: https://www.cloudflare.com/learning/network-layer/what-is-mt...
> traceroute tracks the route packets taken from an IP network on their way to a given host. It utilizes the IP protocol's time to live (TTL) field and attempts to elicit an ICMP TIME_EXCEEDED response from each gateway along the path to the host.
2015 - Characterizing ICMP Rate Limitation on Routers[1]
[0] https://pages.cs.wisc.edu/~suman/courses/640/papers/govindan...
Trippy now includes [0] forward loss (Floss) and backward loss (Bloss) _heuristics_ to help surface such behaviour.
The idea was inspired by our previous discussion [1] on the topic on HN some time ago!
These columns are experimental and so not shown by default but can be enabled [2].
[0] https://github.com/fujiapple852/trippy/blob/master/RELEASES....
Previously, I wrote about setting up a Tailscale exit node and appreciated how traffic gets wired to my home network. I wanted to understand traceroute a bit. I’ve never contemplated how it works, and now feels like as good a time as any to do just that. I mean, now’s the time to rewrite it in Rust.
I’ve just used traceroute to investigate how my query travels from my computer, through my router, out to the internet, and finally to the end server.
*(traceroute output omitted)*
At a cursory level, it looks like it’s asking “where is this IP” at each level, and I’m not sure how it does that.
Traceroute doesn’t actually ask “where is this IP?” anywhere. It uses a TTL trick.
But to understand it, let’s write some code.
Every router that forwards a packet decrements its TTL by one; when the TTL hits zero, the router drops the packet and sends an ICMP “Time Exceeded” message back to the sender. So if we send packets with TTL=1, the first router replies. TTL=2, the second router replies. And so on, until we reach the destination. That’s traceroute.
## The core idea
Traceroute is just sending packets that are designed to die at each hop, then listening for the error messages.
Let’s start with a single function that sends one UDP packet at a given TTL and listens for the ICMP reply. Why UDP? Because these are throwaway packets designed to die in transit. We don’t need TCP’s handshake or delivery guarantees. We just fire bytes at a port and wait for routers to tell us they dropped them.
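Below is a minimal sketch of that function (a reconstruction, not the original listing; it assumes the socket2 and libc crates, and the names are approximate). The “Lines 7-9” style references in the next paragraphs point at the original listing, so the matching sections are marked in comments:

```rust
use std::mem::MaybeUninit;
use std::net::{Ipv4Addr, SocketAddr, UdpSocket};
use std::time::Duration;

use socket2::{Domain, Socket, Type};

fn probe(target: Ipv4Addr, ttl: u32) -> Option<Ipv4Addr> {
    // "Lines 7-9": a regular UDP socket with a deliberately low TTL,
    // so the packet dies in transit.
    let udp = UdpSocket::bind("0.0.0.0:0").ok()?;
    udp.set_ttl(ttl).ok()?;

    // "Lines 12-17": a raw ICMP socket to catch the error replies,
    // built from the libc constant. This is the part that needs root.
    let icmp = Socket::new(
        Domain::IPV4,
        Type::from(libc::SOCK_RAW),
        Some(libc::IPPROTO_ICMP.into()),
    )
    .ok()?;
    icmp.set_read_timeout(Some(Duration::from_secs(1))).ok()?;

    // "Lines 20-21": fire 32 throwaway bytes at the traditional
    // traceroute port.
    udp.send_to(&[0u8; 32], SocketAddr::from((target, 33434))).ok()?;

    // "Lines 24-38": read the raw reply; bytes 12-15 of the IP header
    // hold the source address, i.e. whoever dropped our packet.
    let mut buf = [MaybeUninit::<u8>::uninit(); 512];
    let n = icmp.recv(&mut buf).ok()?;
    if n < 20 {
        return None;
    }
    // Safe: recv reported that the first n bytes are initialized.
    let buf = unsafe { std::slice::from_raw_parts(buf.as_ptr() as *const u8, n) };
    Some(Ipv4Addr::new(buf[12], buf[13], buf[14], buf[15]))
}

fn main() {
    // "Lines 42-55": walk the TTL up, stopping if the responder is the target.
    let target: Ipv4Addr = "8.8.8.8".parse().unwrap();
    for ttl in 1..=15 {
        match probe(target, ttl) {
            Some(ip) => {
                println!("{ttl:2}  {ip}");
                if ip == target {
                    break;
                }
            }
            None => println!("{ttl:2}  *"),
        }
    }
}
```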
Let’s walk through this.
Lines 7-9: We create a regular UDP socket and set its TTL. This is the key trick; we’re deliberately setting a low TTL so the packet dies before reaching the destination.
Lines 12-17: A second socket, this time a raw ICMP socket. This one listens for all ICMP packets arriving at our machine, including the “Time Exceeded” replies from routers that dropped our short-lived UDP packet. We need libc::SOCK_RAW here because socket2 doesn’t expose raw socket types directly, and we need root/sudo to open it.
Lines 20-21: We send 32 bytes of zeros to port 33434 on the target. The content doesn’t matter. Port 33434 is the traditional traceroute port; nothing listens there, so when our packet finally does reach the destination, the target responds with ICMP “Port Unreachable” instead of “Time Exceeded,” which is how we know we’ve arrived.
Lines 24-38: We read from the raw ICMP socket. The reply is a raw IP packet; the first 20 bytes are the IP header, and bytes 12-15 contain the source address of whoever sent the ICMP reply (that’s the router that dropped our packet). We use MaybeUninit because Rust won’t let us read uninitialized memory; the unsafe block is safe here since recv tells us exactly how many bytes it wrote.
Lines 42-55: The main loop. We increment TTL from 1 to 15, printing each hop. When the responding IP matches our target, we’ve reached the destination and break out.
This needs sudo to run because of the raw ICMP socket.
*(output of the first run: hops 1 through 15)*
It works! We can see our Tailscale gateway, home router, ISP, and Google’s network. But there are two problems: it doesn’t know when to stop (it runs all the way to hop 15), and we only get one probe per TTL with no timing information.
Real traceroute increments the destination port for each probe; conventionally it starts at 33434, the base port used by Van Jacobson’s original traceroute. A unique port per probe helps match replies to specific probes, since the original UDP header is embedded inside the ICMP Time Exceeded reply.
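A sketch of that scheme (my arithmetic; real implementations vary in the details):

```rust
/// Classic-traceroute-style destination port: one unique port per
/// probe, counting up from the 33434 base.
fn probe_port(base: u16, ttl: u16, probe: u16, probes_per_hop: u16) -> u16 {
    base + (ttl - 1) * probes_per_hop + probe
}
```

Each reply’s embedded UDP header then carries that unique port back, so a reply can be matched to the exact probe that triggered it.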
Traceroute also supports a TCP mode with -T, for networks where firewalls block UDP but allow TCP through. The principle, however, is the same: set a low TTL, let the packet die, and read the ICMP error.
ICMP stands for Internet Control Message Protocol. It’s the error reporting protocol for the internet, not a data transport. I’ve seen ICMP errors before without realizing it. ping’s “Destination Host Unreachable” is ICMP type 3. Similarly, “Time Exceeded” is ICMP Type 11, and this is what we rely on.
I think of it this way: if I mail a letter and the recipient writes back “I don’t know what this is about,” that’s an HTTP error. If the postal service returns the letter stamped “address doesn’t exist,” that’s ICMP.
Here’s what the ICMP reply packet looks like when we receive it on the raw socket:
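Roughly, piecing the layout together from the byte offsets discussed here (a sketch, not a capture):

```text
bytes 0-19    IP header of the reply   (source IP at bytes 12-15)
byte  20      ICMP type                (11 = Time Exceeded, 3 = Destination Unreachable)
byte  21      ICMP code
bytes 22-23   ICMP checksum
bytes 24-27   unused / varies by type
bytes 28+     original IP header + first 8 bytes of our UDP probe
```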

The IP header comes first (bytes 0-19), then the ICMP message starts at byte 20. The ICMP payload also contains our original packet’s headers, which is how real traceroute matches replies to specific probes.
Looking at our code, this is the part where we parse the ICMP reply:
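Here’s the corresponding excerpt from the sketch above (same caveats apply):

```rust
// Read the reply, then pull only the source address out of the IP header.
let mut buf = [MaybeUninit::<u8>::uninit(); 512];
let n = icmp.recv(&mut buf).ok()?;
// Safe: recv reported that the first n bytes are initialized.
let buf = unsafe { std::slice::from_raw_parts(buf.as_ptr() as *const u8, n) };
Some(Ipv4Addr::new(buf[12], buf[13], buf[14], buf[15]))
```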
Right now we only read the source IP from the IP header. But the ICMP message itself starts at byte 20, and its first byte is the type. We’re ignoring it completely, which is why our traceroute doesn’t know when to stop. If we checked buf[20], we could distinguish between Type 11 (Time Exceeded, meaning a router along the way) and Type 3 (Destination Unreachable, meaning we’ve arrived).
This raw byte parsing is a bit of an unnecessary hard mode. The pnet_packet crate handles all of this idiomatically, but I wanted to understand what’s actually in the packet.
Let’s fix our traceroute so it knows when it’s arrived. First, we replace the Option<Ipv4Addr> return type with an enum that captures the three possible outcomes:
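Something like this (the variant names are my guesses, not necessarily the post’s):

```rust
/// The three ways a probe can end.
enum ProbeResult {
    /// ICMP Time Exceeded: an intermediate router answered.
    Hop(Ipv4Addr),
    /// ICMP Destination Unreachable: the target itself answered.
    Reached(Ipv4Addr),
    /// Nothing arrived before the read timeout.
    Timeout,
}
```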
Then we check buf[20] (the ICMP type byte) after extracting the source IP:
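In the sketch, the tail of probe() becomes something like this (lumping unrecognized types in with timeouts is a simplification):

```rust
// Classify a successful read by its ICMP type byte. A recv timeout
// earlier in probe() already returned ProbeResult::Timeout.
let ip = Ipv4Addr::new(buf[12], buf[13], buf[14], buf[15]);
match buf[20] {
    11 => ProbeResult::Hop(ip),    // Time Exceeded: a router en route
    3 => ProbeResult::Reached(ip), // Destination Unreachable: arrived
    _ => ProbeResult::Timeout,     // anything else: treat as no answer
}
```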
And the main loop can now break when we get Reached:
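Sketched against the enum above:

```rust
for ttl in 1..=15 {
    match probe(target, ttl) {
        ProbeResult::Hop(ip) => println!("{ttl:2}  {ip}"),
        ProbeResult::Reached(ip) => {
            println!("{ttl:2}  {ip}");
            break; // the target itself answered: we're done
        }
        ProbeResult::Timeout => println!("{ttl:2}  *"),
    }
}
```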
Funnily enough, while running this, I discovered a bug with the type check. It is too naive and trusting. What we need to do is count ourselves as having “reached” only if the replying IP equals the target IP. Otherwise, we get this when running the code:
*(output: the trace stops prematurely)*
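In the sketch, it’s a one-line guard in the classification step (probe already takes the target as a parameter):

```rust
match buf[20] {
    11 => ProbeResult::Hop(ip),
    // The fix: a type 3 only counts as "reached" when it actually
    // came from the target; other hosts can send type 3 too.
    3 if ip == target => ProbeResult::Reached(ip),
    _ => ProbeResult::Timeout,
}
```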
With this fix, our traceroute correctly stops at 8.8.8.8:
*(output: the trace ending at 8.8.8.8)*
Our output is missing timing. Real traceroute shows round-trip time for each probe. The fix is straightforward: Instant::now() before send, elapsed() after recv. We update the enum to carry the duration:
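A sketch of the updated enum:

```rust
use std::time::Duration;

enum ProbeResult {
    Hop(Ipv4Addr, Duration),
    Reached(Ipv4Addr, Duration),
    Timeout,
}
```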
Then in probe, we wrap the send/recv in a timer:
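An excerpt of what that looks like in the sketch (Instant comes from std::time):

```rust
use std::time::Instant;

// The clock starts just before send and stops right after recv, so
// rtt covers the full round trip.
let start = Instant::now();
if udp.send_to(&[0u8; 32], SocketAddr::from((target, 33434))).is_err() {
    return ProbeResult::Timeout;
}
let Ok(n) = icmp.recv(&mut buf) else {
    return ProbeResult::Timeout; // read timeout: this becomes a "*"
};
let rtt = start.elapsed();
```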
And print it in milliseconds:
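Sketched as one arm of the main-loop match:

```rust
// Duration has no millisecond float accessor, so convert by hand.
ProbeResult::Hop(ip, rtt) => {
    println!("{ttl:2}  {ip}  {:.3} ms", rtt.as_secs_f64() * 1000.0);
}
```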
*(output: hops with round-trip times)*
Now we can see latency at each hop. The jump from ~6 ms (local ISP) to ~12 ms at hop 7 is where our traffic leaves the local network and enters Google’s.
Traceroute sends three probes at each TTL. That’s why we see 3 timing values per hop in the original output:
*(real traceroute output: three timings per hop)*
I wondered what this was, but traceroute does this for three reasons:

* A single measurement can be unlucky; three samples show the variance at each hop.
* A single `*` among real times means “flaky” not “dead”.
* Probes can be load-balanced onto different paths. I used github.com as a destination and I kept hitting the load balancer.

In code, it’s just wrapping the existing probe() call in a small inner loop. We track the last IP seen and only print it when it changes, so the output stays clean:
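A sketch of that loop, against the probe() and ProbeResult from earlier:

```rust
for ttl in 1..=30u32 {
    print!("{ttl:2}");
    let mut last_ip: Option<Ipv4Addr> = None;
    let mut reached = false;
    // Three probes per TTL; print the IP only when it changes.
    for _ in 0..3 {
        match probe(target, ttl) {
            ProbeResult::Hop(ip, rtt) | ProbeResult::Reached(ip, rtt) => {
                if last_ip != Some(ip) {
                    print!("  {ip}");
                    last_ip = Some(ip);
                }
                print!("  {:.3} ms", rtt.as_secs_f64() * 1000.0);
                if ip == target {
                    reached = true;
                }
            }
            ProbeResult::Timeout => print!("  *"),
        }
    }
    println!();
    if reached {
        break;
    }
}
```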
*(output: three timings per hop, each IP printed once)*
That’s starting to look like the real thing.
At this point, I was happy. I understood more about traceroute than I ever knew. But I wanted to figure out what my implementation was lacking.
| Feature | Real traceroute | Ours |
|---|---|---|
| TTL incrementing | Yes | Yes |
| ICMP type checking | Yes | Yes |
| Timing (RTT) | Yes | Yes |
| 3 probes per hop | Yes | Yes |
| DNS reverse lookup | Yes (e.g. dns.google) | No |
| Port incrementing per probe | Yes (33434, 33435…) | No (fixed 33434) |
| ICMP Echo mode (-I) | Yes | No (UDP only) |
| TCP mode (-T) | Yes | No |
| IPv6 support | Yes (traceroute6) | No |
While building this, I realized that traceroute’s output looks like a map of the network, but it’s more of a sketch. There are things happening that it can’t reveal:
* Load balancing can split probes across different paths, as it did on my github.com attempt.
* `* * *` hops aren’t necessarily dead routers. Many routers deprioritize or drop ICMP to save CPU. The packets pass through fine; the router just doesn’t bother replying.

After all this, I was still a little confused about why we would see `* * *` rows. In our code, we print this when we get a timeout, but we don’t break the loop. Can we get stuck in a `*` limbo if we didn’t limit our hops?
We see `*` because `*` means “we didn’t get a reply,” not “there’s nothing there.” Our traceroute output proves this: hops 3-6 are all `*`, but hop 7 still shows up, meaning packets are passing through those silent routers just fine.
I kept having to type sudo cargo run every time, and it bugged me. Regular traceroute doesn’t need that. So I looked into it.
Our code opens a SOCK_RAW ICMP socket to read replies directly. Raw sockets are a privileged operation because they can sniff arbitrary network traffic, so the kernel requires root.
System traceroute gets around this because it’s installed with the setuid bit set. Running ls -la $(which traceroute) shows something like -r-xr-sr-x with the s flag, which means the binary runs with elevated privileges regardless of who calls it.
On macOS, there’s a third option: ICMP datagram sockets (SOCK_DGRAM with IPPROTO_ICMP) that the kernel allows for unprivileged users. More restricted than raw sockets, but enough for basic ping and traceroute.
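Opening one with socket2 looks roughly like this (a sketch; socket2’s Protocol::ICMPV4 maps to IPPROTO_ICMP, and on Linux the same call is gated by the net.ipv4.ping_group_range sysctl):

```rust
use socket2::{Domain, Protocol, Socket, Type};

fn unprivileged_icmp() -> std::io::Result<Socket> {
    // SOCK_DGRAM + IPPROTO_ICMP: the kernel builds the IP header for
    // us and only hands back ICMP traffic tied to this socket.
    Socket::new(Domain::IPV4, Type::DGRAM, Some(Protocol::ICMPV4))
}
```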
I started this exercise as I was thinking deeply about Tailscale. This is part of a larger effort I’m on: spending several evenings trying to understand more of the modern internet. I’m reading the WireGuard whitepaper next, and digging deeper into how Tailscale’s control plane works. There’s a lot happening in this space that’s exciting to me, from a distributed systems perspective and from a programming one.
I call this an evening well spent. One of my previous companies blocked ping and it messed up my mental model all the time when I was there. I’m glad I can write my own traceroute and debug things if a future company chooses to somehow prevent it as well.
The code for this post is on GitHub. Here’s the final version:
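What follows is a reconstruction assembled from the sketches above rather than the repository code itself (socket2 + libc; details may differ from the GitHub version):

```rust
use std::mem::MaybeUninit;
use std::net::{Ipv4Addr, SocketAddr, UdpSocket};
use std::time::{Duration, Instant};

use socket2::{Domain, Socket, Type};

enum ProbeResult {
    Hop(Ipv4Addr, Duration),
    Reached(Ipv4Addr, Duration),
    Timeout,
}

fn probe(target: Ipv4Addr, ttl: u32) -> ProbeResult {
    // UDP socket with a deliberately low TTL so the packet dies en route.
    let Ok(udp) = UdpSocket::bind("0.0.0.0:0") else {
        return ProbeResult::Timeout;
    };
    if udp.set_ttl(ttl).is_err() {
        return ProbeResult::Timeout;
    }

    // Raw ICMP socket (needs root) to catch the error replies.
    let Ok(icmp) = Socket::new(
        Domain::IPV4,
        Type::from(libc::SOCK_RAW),
        Some(libc::IPPROTO_ICMP.into()),
    ) else {
        return ProbeResult::Timeout;
    };
    let _ = icmp.set_read_timeout(Some(Duration::from_secs(1)));

    // Time the probe: send 32 throwaway bytes, wait for any ICMP reply.
    let start = Instant::now();
    if udp
        .send_to(&[0u8; 32], SocketAddr::from((target, 33434)))
        .is_err()
    {
        return ProbeResult::Timeout;
    }
    let mut buf = [MaybeUninit::<u8>::uninit(); 512];
    let Ok(n) = icmp.recv(&mut buf) else {
        return ProbeResult::Timeout; // read timed out: the "*" case
    };
    let rtt = start.elapsed();
    if n < 21 {
        return ProbeResult::Timeout;
    }
    // Safe: recv reported that the first n bytes are initialized.
    let buf = unsafe { std::slice::from_raw_parts(buf.as_ptr() as *const u8, n) };
    let ip = Ipv4Addr::new(buf[12], buf[13], buf[14], buf[15]);
    match buf[20] {
        11 => ProbeResult::Hop(ip, rtt),
        3 if ip == target => ProbeResult::Reached(ip, rtt),
        _ => ProbeResult::Timeout,
    }
}

fn main() {
    let target: Ipv4Addr = "8.8.8.8".parse().unwrap();
    for ttl in 1..=30 {
        print!("{ttl:2}");
        let mut last_ip: Option<Ipv4Addr> = None;
        let mut reached = false;
        // Three probes per hop, printing the IP only when it changes.
        for _ in 0..3 {
            match probe(target, ttl) {
                ProbeResult::Hop(ip, rtt) | ProbeResult::Reached(ip, rtt) => {
                    if last_ip != Some(ip) {
                        print!("  {ip}");
                        last_ip = Some(ip);
                    }
                    print!("  {:.3} ms", rtt.as_secs_f64() * 1000.0);
                    if ip == target {
                        reached = true;
                    }
                }
                ProbeResult::Timeout => print!("  *"),
            }
        }
        println!();
        if reached {
            break;
        }
    }
}
```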