Brandon Hitzel

PSA: Traceroute - Safe and Effective to use

Updated: Jan 4

Public Service Announcement about the Traceroute tool used by network engineers.


 


Traceroute is a safe and effective troubleshooting tool to use at all levels of information technology and networking experience**. It's used by many people every day worldwide. It's supported by all major network vendors, operating systems, and hardware, including end-user operating systems like Windows and Linux!


Don't have a $5,000-10,000 per month opex budget for an observability tool? Don't have a full-time development team to create and maintain the tooling necessary to present your entire network visually based on routing? Or perhaps your packets are trapped in an alternate reality and not arriving at their intended destination on time. Then traceroute is probably your best starting point!



Thousands of traceroutes are run every minute worldwide

Contrary to recent messages circulating that "traceroute doesn't exist" (possibly propagated by the SDN AI Singularity), it's actually a good troubleshooting tool to have in your belt for obtaining information and verifying routing behavior. I often use it to demonstrate routing to folks during training. There might not be an official RFC about it, but the fact is it helps with troubleshooting and validation. I won't link to that post or some of the rebuttals to it, but I will, as always, provide some useful links. This brief post is intended to fight disinformation circulating on the internet.


Public comments made about traceroute not existing
 

Some people might be saying traceroute is a "hack" or "exploit", but I think it's just another tool like ping. I will first acknowledge, though, that if you issue a traceroute and don't know what you're looking at, then yes, you can be drawn to a wrong conclusion or think something is happening that is not (which often happens with non-network engineers).


When you issue a ping you really only learn three things: whether you can get to and from a destination, the end-to-end latency to the destination, and whether the end device is online. However, we know that if a ping fails the device can still be online, since a firewall could be blocking the ping. We also know that just because a ping goes through doesn't mean our TCP/UDP etc. traffic can, for example if there is an MTU issue. That is why with ping we set the DF bit and adjust the amount of data sent to try to determine the maximum MTU on the path.
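The DF-bit technique above amounts to a search for the largest payload that still gets a reply. Here is a minimal Python sketch of that search, assuming IPv4 without options (20-byte IP header, 8-byte ICMP header). The `probe` callable is hypothetical: in practice it would wrap something like `ping -M do -s SIZE` on Linux or `ping -f -l SIZE` on Windows and report whether a reply came back. The sketch also assumes that if a given size passes, every smaller size passes too.

```python
from typing import Callable

IP_HEADER = 20    # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo header


def largest_payload(probe: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Binary-search the largest ICMP payload that gets through with DF set.

    `probe(size)` is a hypothetical callable: it should send one DF-bit ping
    carrying `size` bytes of payload and return True if a reply came back.
    Returns -1 if even the smallest size fails (e.g. ping is filtered).
    """
    if not probe(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if probe(mid):
            low = mid                 # mid fits, try something bigger
        else:
            high = mid - 1            # mid is too big, try something smaller
    return low


def path_mtu(payload: int) -> int:
    """Convert the largest working payload back into an MTU value."""
    return payload + IP_HEADER + ICMP_HEADER


if __name__ == "__main__":
    # Simulated path whose smallest link MTU is 1400 bytes (made-up value).
    simulated = lambda size: size + IP_HEADER + ICMP_HEADER <= 1400
    best = largest_payload(simulated)
    print(best, path_mtu(best))
```

The default upper bound of 1472 is simply a 1500-byte Ethernet MTU minus the 28 bytes of headers; raise it if you expect jumbo frames.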


Just like with ping, when we traceroute we learn nearly the same information, with the same caveats. We know that high latency doesn't necessarily mean there is packet loss, and if there are hops not responding (e.g. ***) we know it could be an MPLS hop or a firewall, which doesn't mean traffic is being dropped. We still learn a possible path the traffic is taking, along with IP addresses, hostnames, and the latency between hops, amongst other info.


High latency could be caused by inter-state/city geography or load-balancing and does not necessarily mean there is congestion at those hops, which is a common misconception. Also consider control plane policing and other characteristics of the ICMP messages traceroute relies on: routers can rate-limit or deprioritize those replies, which can cause missing responses or higher latency at a hop as probes are dropped and the tool retries.
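One rough way to separate "this router was slow to answer" from "the path got slower here" is to check whether a high reading carries through to the hops after it. The Python sketch below flags hops whose RTT is well above the lowest RTT seen at any later hop. The function name, the 10 ms threshold, and the sample numbers are all illustrative assumptions, and the heuristic ignores things like asymmetric return paths, so treat a flag as a hint rather than proof.

```python
from typing import List, Optional


def slow_responders(rtts: List[Optional[float]], threshold_ms: float = 10.0) -> List[int]:
    """Return 1-based hop numbers that look slow to *reply* rather than slow to *forward*.

    `rtts` holds one RTT in milliseconds per hop, or None for a hop that did
    not respond. A hop is flagged when some later hop answered at least
    `threshold_ms` faster, since probes to the later hop passed through the
    flagged one without picking up that delay.
    """
    flagged = []
    for index, rtt in enumerate(rtts):
        if rtt is None:
            continue
        later = [r for r in rtts[index + 1:] if r is not None]
        if later and rtt - min(later) >= threshold_ms:
            flagged.append(index + 1)
    return flagged


if __name__ == "__main__":
    # Made-up trace: hop 3 shows 85 ms but hop 5 is back down to 12 ms,
    # while the rise at hops 6 and 7 persists to the end of the path.
    sample = [1.0, 9.0, 85.0, None, 12.0, 38.0, 48.0]
    print(slow_responders(sample))
```

In the sample, only hop 3 is flagged. The increase at hops 6 and 7 is not, because nothing after them comes back faster, which is what you would expect from real distance or real queuing.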


However, assuming we are getting hostnames, the information can be valuable. Like a compass, it gives us a direction of travel. We can determine the last hop in our upstream network before it hands off to the next network, and who that next network is. We can likely determine the geographies the traffic is passing through and how many networks are traversed before the destination.
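Spotting the handoff between networks can be sketched as watching for the point where the hostname's domain changes. The Python below is a simplified illustration: the hostnames are invented (under the reserved `.example` TLD), and taking the last two labels as "the network" is an assumption that breaks for suffixes like `co.uk`, so a real tool would want a proper public suffix list or an ASN lookup instead.

```python
from typing import List, Optional, Tuple


def domain_of(hostname: str) -> str:
    """Naively treat the last two DNS labels as the owning network (assumption)."""
    labels = hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else labels[0]


def handoffs(hops: List[Tuple[int, Optional[str]]]) -> List[Tuple[int, str, int, str]]:
    """Return (last_hop, old_domain, next_hop, new_domain) for each domain change.

    `hops` is a list of (hop_number, hostname) pairs; hostname is None when
    the hop did not respond or has no reverse DNS, and such hops are skipped.
    """
    changes = []
    last_hop, last_domain = None, None
    for hop_number, hostname in hops:
        if not hostname:
            continue
        domain = domain_of(hostname)
        if last_domain is not None and domain != last_domain:
            changes.append((last_hop, last_domain, hop_number, domain))
        last_hop, last_domain = hop_number, domain
    return changes


if __name__ == "__main__":
    # Invented hostnames for illustration only.
    trace = [
        (3, "gw1.hstntx.isp-a.example"),
        (4, None),
        (9, "edge7.isp-a.example"),
        (10, "port-channel1.core2.fmt2.isp-b.example"),
        (11, "core3.fmt2.isp-b.example"),
    ]
    print(handoffs(trace))
```

For the sample trace this reports a single handoff, from `isp-a.example` at hop 9 to `isp-b.example` at hop 10, which is the "last hop before the next network" idea from the paragraph above.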


Traceroute can be like a compass

Once we get close to the destination we can probably see the last hops, which can validate a design. For instance, if you ordered a circuit at a remote location to be homed to a certain POP, you could see the router hostname, which could list the city. How else can you verify that without someone on the provider's side? If you log into a looking glass (probably not the same router or city), you won't see the last hop IP/hostname.


Let's say you are deploying a new router and a new connection with the intention of doing ECMP. First, of course, you'll be checking your BGP/IGP configuration and the relevant show commands, and you'll probably also look at the routing table. But won't you also run a traceroute to verify that traffic is actually going over both paths?


Speaking of multiple links and load-balancing, a common symptom of one link in a bundle having issues is traffic being intermittently dropped or behaving differently. Using ping you will only see drops and maybe a different millisecond measurement. A way to verify this is by tracerouting and seeing whether different IP addresses pop up between the same hostnames when a drop is happening. Some tools combine pinging with tracing, which could help isolate the problematic interface.
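Checking for "different IP addresses popping up" at the same position is easy to automate once you have several traces to the same destination. The Python sketch below groups the addresses seen at each hop number across runs and reports the hops where more than one address appeared. The input format (one list of IPs per trace, `None` for a non-responding hop) and the addresses (documentation ranges) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def multipath_hops(traces: List[List[Optional[str]]]) -> Dict[int, List[str]]:
    """Map 1-based hop number -> sorted IPs, for hops that answered from more than one IP.

    Each trace is a list of hop IPs in order; None marks a hop that did not
    respond and is ignored rather than counted as a different address.
    """
    seen = defaultdict(set)
    for trace in traces:
        for hop_number, ip in enumerate(trace, start=1):
            if ip is not None:
                seen[hop_number].add(ip)
    return {hop: sorted(ips) for hop, ips in sorted(seen.items()) if len(ips) > 1}


if __name__ == "__main__":
    # Three made-up runs to the same destination.
    runs = [
        ["10.0.0.1", "192.0.2.1", "198.51.100.9"],
        ["10.0.0.1", "192.0.2.5", "198.51.100.9"],
        ["10.0.0.1", None, "198.51.100.9"],
    ]
    print(multipath_hops(runs))
```

Here hop 2 shows two addresses, which is what you would hope to see when verifying ECMP and where you would start looking if drops line up with one of the two.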


It's situations like this where you can verify behaviors or deployments before and after a change. I can count many times where I had a traceroute as part of my post-change validation.

Another example: if you know there are two paths from a certain location or subnet and an end user reports something not working, you can ask for a traceroute to see whether it's on the secondary path or the primary path, which gives a first glance at the network state. That's why it's good to either find hardware/tools or leverage your existing monitoring tools to deploy probes and run traceroutes on your network to document state and then alert if things change. ThousandEyes, NetBrain, and SolarWinds come to mind as tools that can do this (there are probably a lot more).
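The "document state, then alert if things change" idea boils down to comparing a fresh trace against a stored baseline. This Python sketch is not how any of the products named above work internally; it is just one simple way to do the comparison, with a hypothetical input format (a list of hop IPs, `None` for a hop that did not answer) and documentation-range addresses.

```python
from itertools import zip_longest
from typing import List, Optional, Tuple

NO_HOP = "<no hop>"  # placeholder used when one path is shorter than the other


def path_changes(baseline: List[Optional[str]],
                 current: List[Optional[str]]) -> List[Tuple[int, str, str]]:
    """Return (hop_number, baseline_ip, current_ip) for every hop that differs.

    A None entry means the hop did not respond. Those hops are skipped,
    because a missing reply is not evidence that the path changed.
    """
    changes = []
    pairs = zip_longest(baseline, current, fillvalue=NO_HOP)
    for hop_number, (old, new) in enumerate(pairs, start=1):
        if old is None or new is None:
            continue
        if old != new:
            changes.append((hop_number, old, new))
    return changes


if __name__ == "__main__":
    # Made-up baseline and a later run where hop 2 moved to another router.
    baseline = ["10.0.0.1", "192.0.2.1", None, "203.0.113.7"]
    current = ["10.0.0.1", "192.0.2.9", "198.51.100.3", "203.0.113.7"]
    for hop, old, new in path_changes(baseline, current):
        print(f"hop {hop}: {old} -> {new}")
```

On paths with load-balancing you would likely want the baseline to hold a set of acceptable IPs per hop rather than a single address, otherwise normal ECMP behavior would raise alerts.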


It's safe to run traces on your network consistently, and even to destinations over networks you don't control for data measurements. Just be aware that this traffic will be classified as best effort and could be dropped (but probably won't be). Please do consider the CPU cost of running these through other operators' networks; keep them at a moderate interval.


I noticed Microsoft Teams clients do traceroutes pretty consistently in the background. Think about how much you can learn with millions of hosts doing traces all over the internet and then visualizing that telemetry to show common hops, latency, and IP addresses.


Furthermore, if you were deploying a new diverse secondary DIA circuit to a remote colocation, for example, you would want to run traceroutes to confirm that its internet path is different from your primary path. This could change, of course, but you'd want a baseline path to start.
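A quick diversity check is to look for hops that show up on both circuits. The Python sketch below intersects the hop addresses from a trace over each circuit, ignoring non-responding hops and any addresses you expect to be shared (your own gateway, the destination itself). The function name, input format, and addresses are illustrative assumptions, and since traceroute only shows IP hops, an empty result does not prove the circuits are physically diverse.

```python
from typing import Iterable, List, Optional


def shared_hops(primary: List[Optional[str]],
                secondary: List[Optional[str]],
                ignore: Iterable[str] = ()) -> List[str]:
    """Return the hop IPs that appear in both traces, sorted.

    None entries (non-responding hops) are dropped, as are any addresses
    listed in `ignore`, such as a common LAN gateway or the destination.
    """
    first = {ip for ip in primary if ip is not None}
    second = {ip for ip in secondary if ip is not None}
    return sorted((first & second) - set(ignore))


if __name__ == "__main__":
    # Made-up traces to the same destination over each circuit.
    via_primary = ["10.0.0.1", "192.0.2.1", "192.0.2.20", "203.0.113.7"]
    via_secondary = ["10.0.0.1", "198.51.100.1", "192.0.2.20", "203.0.113.7"]
    print(shared_hops(via_primary, via_secondary,
                      ignore={"10.0.0.1", "203.0.113.7"}))
```

In the sample, both circuits pass through 192.0.2.20, which is the kind of shared upstream hop you would want to ask the provider about.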


Lastly, what if you had a customer having an issue reaching an obscure ASN and subnet that you didn't directly peer with, where you had five possible paths to take, a few of which were longer and very latent, or where intermediate autonomous systems had congestion? The level 1 techs can't spend time digging through routing data to find all the paths and make a determination, because they don't know how or don't have permissions. Instead they ask the customer to traceroute and then pass the ticket to you. From there you see where the traffic is exiting your network in that real-time snapshot. Once you know which exit it's preferring, you can assess your edge configuration and possibly make a local preference adjustment for that route via another peer, providing a better path for that customer and solving the issue. It's situations like this where network tracing can help lead to a diagnosis. It's not absolute, but it's helpful.


 


Example traceroute with information from bgp.he.net/traceroute/

Looking at the above random traceroute output, it appears AT&T is the last mile carrier based on hop 3, and we can see the first hop where traffic leaves the local network (the last RFC 1918 IP with low latency). We then see another AT&T hop, 71.149.23.68, followed by a few unknown *** hops, and eventually 192.205.32.138 at hop 9, which is the last AT&T IP address. Although there are some unknown hops, we can likely conclude the traffic is still on the AT&T network before hop 9.


Once we get on the Hurricane Electric network we also notice "port-channel" and "core" in the listed DNS names, so we can deduce there's a high chance of multiple links on the path and that it's a core router. From there we can see the traffic remains on the HE network until the destination. Notice that hop 3 lists hstntx (Houston, TX) and the later HE hostnames list fmt2 (Fremont, CA), which helps determine the locations of these routers.


Note the latency. If we had just pinged the destination we would only have seen the ~48ms latency, but here we can see the latency rise between hops 9 and 10 because the traffic is going from Texas to California. Over that distance there are probably more L2-type devices between those routers, and more validation would be needed there using a different tracing tool. This is just a quick example to emphasize the points above about the effectiveness of network tracing.
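To sanity-check whether a Texas-to-California jump is plausibly "just distance", we can estimate the minimum round trip that geography alone would add. The Python sketch below uses the haversine great-circle distance and the common rule of thumb that light in fiber covers roughly 200 km per millisecond. The city coordinates are approximate values I am assuming for Houston and Fremont, not something taken from the trace, and real fiber routes are longer than the great-circle line, so the result is a floor rather than a prediction.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_KM_PER_MS = 200.0  # rule of thumb: roughly two thirds of c (assumption)


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in kilometres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_rtt_ms(distance_km: float) -> float:
    """Lowest possible round-trip time over fiber for a one-way distance."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Approximate city-centre coordinates (assumptions for illustration).
HOUSTON = (29.76, -95.37)
FREMONT = (37.55, -121.99)

distance = great_circle_km(*HOUSTON, *FREMONT)

if __name__ == "__main__":
    print(f"{distance:.0f} km, at least {min_rtt_ms(distance):.1f} ms round trip")
```

With these inputs the distance comes out at roughly 2,600 km and the floor lands in the mid-20s of milliseconds, so a rise of that order between a Houston hop and a Fremont hop is consistent with geography rather than congestion.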


 

Yes, we aren't getting the full picture if we don't control the network; yes, there are likely hidden hops even on internet paths (look into tracing that includes label switched hops); and yes, you will still need to look at the routing table on your routers or via looking glasses.


If there aren't DNS records displayed, you'll have to check an RIR website or Google to try to get info, which could be time consuming.



Traceroute not showing any hostnames from dnschecker.org/online-traceroute.php

You probably shouldn't use traceroute to determine the root cause of a performance issue unless you suspect the issue is increased latency caused by distance or frequent routing changes, and even then it might not be fruitful. You probably should not use traceroute for SLA validation either (unless the e2e path is part of the SLA); something like TWAMP is better suited to measuring that.


Definitely use traceroute to help with various types of validation, like determining a network path, routing behavior, and first and last hops, and for reconnaissance. It's true that it can be more effective on your own network than on external networks.


I recall I had an issue where I switched to a new service provider and wondered why my ping to a certain destination was so high with the new provider. I then did a traceroute and noticed traffic was going all the way to Texas before handing off to the destination's last mile provider, even though the destination was close to the source. After a little additional research based on the trace, it appeared that the new provider on the source side only peered with the last mile provider in Texas! Without this information I would have had no idea what was going on or where to begin developing a hypothesis.


It's important to educate yourself, your peers, and direct reports on when to use traceroute and when not to use traceroute and how to interpret it. Once you are educated on basic routing and network architecture along with how traceroute works, I think you are safe to use it. It can be an effective way to convey an issue or illustrate a network path quickly without needing to review multiple devices right away, or it could point to which device could have a misconfiguration or location of the problem (CCNP TSHOOT curriculum teaches this).


I think this is turning into one of those divisive topics that many will have an opinion on and that influencers use to trigger responses. However, at the end of this post you should know traceroute exists and you can safely use it.


Thank you for reading and good luck.




**This statement has not been evaluated by the IETF, FCC, or IEEE and is not intended to diagnose, treat, or resolve any networking or communication related issue. Use of traceroute is at your own discretion and networkdefenseblog.com assumes no liability for its use by you.



Would you like to know more?





Some of this post is intended as satire but all information presented is factual. Just mentioning for those who aren't native English speakers or those that are and don't realize this.


 

Copyright © 2024 by Brandon Hitzel