A systematic approach to debugging Linux network issues. The tools that earn their place and the order I use them in.

Linux Network Troubleshooting: A Systematic Approach

When something on the network is broken, panic-running random tools is a tempting and unproductive approach. After enough production debugging, I've landed on a systematic order: layer by layer, simplest checks first. This post is that order, with the tools and what they tell you.

The mental model: layered

Network debugging works best when you check from the bottom up:

Physical / link layer (cable plugged in, NIC up)
IP layer (right address, routes, ARP)
Transport layer (TCP/UDP working, ports open)
Application layer (DNS, TLS, HTTP)

Most network issues are at one layer. Check each in order; the answer surfaces.

For cloud / containerized work, "physical layer" is virtual but the same concept applies — virtual NICs, virtual networks, etc.

Layer 1: is the interface up?

```shell
ip link show
```

Shows network interfaces and their state. Look for UP and LOWER_UP flags:

```text
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 ...
```

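UP means the OS considers the interface administratively up. LOWER_UP means the link layer (cable / virtual link) is up. If LOWER_UP is missing, there is no carrier (ip link usually shows NO-CARRIER in the flags), which points to a physical or virtual-link issue.

If the interface isn't listed at all, suspect a missing kernel module (driver) or a hardware issue.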

ethtool eth0 shows link speed, duplex, link partner. Useful for physical debugging (less so for cloud / virtual).
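The flag check is easy to script. The sketch below reads ip -o link show output on stdin and labels each interface; the helper name link_state and its three labels are made up for illustration, and it looks only at the flag list.

```shell
# link_state: hypothetical helper, not part of iproute2.
# Reads `ip -o link show` output on stdin, prints "<interface> <state>".
link_state() {
  awk '{
    name = $2
    sub(/:$/, "", name)     # "eth0:" -> "eth0"
    sub(/@.*$/, "", name)   # "veth1@if4" -> "veth1"
    flags = $3              # e.g. <BROADCAST,MULTICAST,UP,LOWER_UP>
    if (flags ~ /[<,]LOWER_UP[,>]/) state = "carrier"
    else if (flags ~ /[<,]UP[,>]/)  state = "no-carrier"
    else                            state = "admin-down"
    print name, state
  }'
}

# Live usage (needs iproute2):
#   ip -o link show | link_state
```

Anything reported as no-carrier is worth a look at the cable, switch port, or virtual link before moving up a layer.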

Layer 2: IP, routes, ARP

```shell
ip addr show
```

Shows IP addresses on each interface. Confirm the right IP is on the right interface.

```shell
ip route show
```

Shows the routing table. The default route is usually default via <gateway> dev <interface>. If the default is missing or wrong, packets to the internet have nowhere to go.
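A minimal sketch of the default-route check, assuming the usual ip route show line format (default via <gateway> dev <interface> ...). The helper name default_route is hypothetical.

```shell
# default_route: hypothetical helper.
# Reads `ip route show` output on stdin; prints "<gateway> <device>" for each
# default route and exits non-zero if there is none.
default_route() {
  awk '$1 == "default" {
    gw = "-"; dev = "-"
    for (i = 2; i < NF; i++) {
      if ($i == "via") gw = $(i + 1)
      if ($i == "dev") dev = $(i + 1)
    }
    print gw, dev
    found = 1
  }
  END { exit (found ? 0 : 1) }'
}

# Live usage (needs iproute2):
#   ip route show | default_route || echo "no default route"
```

A gateway printed as "-" means the default route points straight at a device (common on tunnels and point-to-point links), which is not necessarily wrong.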

```shell
ip neigh show  # ARP table
```

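Maps IPs to MAC addresses on the local network. If you can't reach a host on the same subnet, check its neighbor entry. REACHABLE means the mapping was confirmed recently. STALE is normal for an entry that hasn't been used for a while; the kernel re-verifies it on the next packet. FAILED or INCOMPLETE means the host isn't answering ARP.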
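To pull the suspicious entries out of a large neighbor table, something like the following works. It is a sketch that treats FAILED and INCOMPLETE as the states that mean "no ARP answer" and assumes the state is the last field on each line; unresolved_neighbors is a made-up name.

```shell
# unresolved_neighbors: hypothetical helper.
# Reads `ip neigh show` output on stdin and prints "<ip> <state>" for entries
# whose last field is FAILED or INCOMPLETE.
unresolved_neighbors() {
  awk '$NF == "FAILED" || $NF == "INCOMPLETE" { print $1, $NF }'
}

# Live usage (needs iproute2):
#   ip neigh show | unresolved_neighbors
```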

Layer 3: connectivity tests

```shell
ping <ip>
```

The classic. Works if the destination responds to ICMP and routing is correct. Some hosts block ICMP, so ping failing doesn't always mean connectivity is broken — but ping working confirms basic L3 connectivity.

```shell
traceroute <host>
# or
mtr <host>  # better, runs continuously
```

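Shows the path packets take. Useful for identifying where in the network the problem is. If every hop beyond a certain point stays silent, the problem starts at or just after the last hop that answered. A single hop showing * * * while later hops still respond is usually just a router that doesn't send ICMP replies.

mtr (My Traceroute) is the better tool: it sends packets continuously and shows packet loss per hop, which lets you spot a flaky middle link. Loss that appears at one hop but not at the hops after it is usually ICMP rate limiting on that router, not real loss.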
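For non-interactive use, mtr --report prints a table once and exits. The sketch below filters that table for hops above a loss threshold. It assumes the usual report layout (hop marker like "2.|--", then host, then Loss%), which may differ between mtr versions; lossy_hops is a hypothetical helper name.

```shell
# lossy_hops: hypothetical helper.
# Reads `mtr --report` output on stdin and prints "<hop> <host> <loss%>" for
# every hop whose loss is above the threshold (first argument, default 0).
lossy_hops() {
  threshold="${1:-0}"
  awk -v t="$threshold" '/^ *[0-9]+[.][|]--/ {
    hop = $1; sub(/[.].*/, "", hop)   # "2.|--" -> "2"
    loss = $3; sub(/%/, "", loss)     # "20.0%" -> "20.0"
    if (loss + 0 > t + 0) print hop, $2, loss
  }'
}

# Live usage (needs mtr):
#   mtr --report --report-cycles 20 -n <host> | lossy_hops 5
```

Loss that starts at one hop and continues through to the final hop is the pattern worth chasing.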

Layer 4: TCP / UDP / ports

```shell
ss -tlnp        # Listening TCP ports with process info
ss -tunap       # All TCP+UDP, listening and connected
```

ss is the modern replacement for netstat. Shows what's listening and what connections are open.
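One thing worth scanning the listener list for is services bound only to loopback, since they are invisible from other hosts. A rough sketch, assuming ss -Htln output where the first column is the state and the fourth is the local address:port; loopback_only is a made-up helper name.

```shell
# loopback_only: hypothetical helper.
# Reads `ss -Htln` output on stdin and prints "<port> <local address>" for
# listeners bound to 127.0.0.0/8 or ::1 only.
loopback_only() {
  awk '$1 == "LISTEN" && ($4 ~ /^127[.]/ || $4 ~ /^\[::1\]/) {
    port = $4
    sub(/^.*:/, "", port)   # keep what follows the last colon
    print port, $4
  }'
}

# Live usage (needs iproute2; -H drops the header line):
#   ss -Htln | loopback_only
```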

For testing if a port is reachable from somewhere:

```shell
nc -zv <host> <port>              # Test if port is open
curl -v https://<host>:<port>/    # Test HTTPS
telnet <host> <port>              # Old-school, still works
```

nc -zv is great for quick port-open testing.

For looking at actual TCP connections:

```shell
ss -tn state established      # Established connections
ss -ti                        # With detailed TCP info
```

The -i shows congestion window, RTT estimates, retransmits, etc. — diagnostic gold for slow connections.
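To get a quick per-connection view of retransmits, the detail lines can be reduced to one line per connection. This is a sketch built on my reading of the ss -ti format: a connection line starting with the state, followed by an indented detail line that contains retrans:<in flight>/<total> only when retransmits have occurred. The helper name retrans_by_conn is hypothetical.

```shell
# retrans_by_conn: hypothetical helper.
# Reads `ss -Htin` output on stdin and prints
# "<local> -> <peer> retrans_total=<n>" for connections that report retransmits.
retrans_by_conn() {
  awk '
    /^[A-Za-z]/ { conn = $4 " -> " $5; next }   # connection line (starts with state)
    {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^retrans:[0-9]+\/[0-9]+$/) {
          split($i, part, "/")                  # part[2] is the total count
          print conn, "retrans_total=" part[2]
        }
    }'
}

# Live usage (needs iproute2):
#   ss -Htin state established | retrans_by_conn
```

Connections that never retransmitted produce no output, so an empty result is good news.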

Layer 5+: DNS, TLS, HTTP

If basic connectivity works but the application doesn't:

```shell
dig <hostname>            # DNS lookup
dig +trace <hostname>     # Full resolution path
```

dig is the right tool for DNS debugging. nslookup works but is less informative. host is even simpler.

For checking specific resolvers:

```shell
dig @8.8.8.8 example.com
dig @<resolver-ip> example.com
```

Useful for debugging "why does this work from outside the cluster but not inside" (different resolvers).
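When comparing resolvers, the latency is as interesting as the answer. dig prints a ";; Query time: N msec" line, and a small filter makes it easy to compare. A minimal sketch; query_time_ms is a hypothetical helper name.

```shell
# query_time_ms: hypothetical helper.
# Reads dig output on stdin and prints the number from ";; Query time: N msec".
query_time_ms() {
  awk '/^;; Query time:/ { print $4 }'
}

# Live usage (needs dig; the resolver IPs are placeholders):
#   for r in 8.8.8.8 <resolver-ip>; do
#     echo "$r $(dig @"$r" example.com | query_time_ms) ms"
#   done
```

Run it more than once per resolver: the first query may miss the cache and look slower than steady state.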

For TLS:

```shell
openssl s_client -connect host:443 -servername host
```

Shows the TLS handshake, cert chain, etc. -servername is important for SNI.

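Common TLS problems: expired cert, wrong intermediate cert, hostname mismatch. openssl s_client reports the first two in its "Verify return code" line. It does not check the hostname by default, so add -verify_hostname <host> or read the certificate's subject and SAN entries yourself.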
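The s_client output is long, and the verdict is a single line. A small sketch that extracts it, assuming the "Verify return code: N (text)" line that s_client prints; verify_result is a made-up helper name.

```shell
# verify_result: hypothetical helper.
# Reads `openssl s_client` output on stdin and prints the first
# "Verify return code" value, e.g. "0 (ok)".
verify_result() {
  awk '/Verify return code:/ {
    sub(/^.*Verify return code: */, "")
    print
    exit
  }'
}

# Live usage (needs openssl; </dev/null makes s_client exit after the handshake):
#   openssl s_client -connect host:443 -servername host </dev/null 2>/dev/null | verify_result
```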

For HTTP:

```shell
curl -v https://example.com/
curl -v --resolve example.com:443:1.2.3.4 https://example.com/
```

-v shows the full request/response including headers. --resolve lets you bypass DNS to test against a specific IP.
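curl can also report where the time went, via --write-out timing variables. Those values are cumulative from the start of the request, so the sketch below turns them into per-phase durations. It assumes an HTTPS URL (for plain HTTP, time_appconnect is 0 and the tls figure is meaningless); curl_phases and its labels are hypothetical.

```shell
# curl_phases: hypothetical helper.
# Reads one line of five cumulative curl timings on stdin:
#   time_namelookup time_connect time_appconnect time_starttransfer time_total
# and prints the duration of each phase in seconds.
curl_phases() {
  awk '{
    printf "dns=%.3f tcp=%.3f tls=%.3f wait=%.3f transfer=%.3f\n",
      $1, $2 - $1, $3 - $2, $4 - $3, $5 - $4
  }'
}

# Live usage (needs curl):
#   curl -s -o /dev/null \
#     -w '%{time_namelookup} %{time_connect} %{time_appconnect} %{time_starttransfer} %{time_total}\n' \
#     https://example.com/ | curl_phases
```

A large dns figure sends you back to the resolver checks; a large wait figure points at the server rather than the network.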

Specific failure patterns I keep seeing

After enough network debugging, certain patterns recur:

"Connection times out" but not "connection refused". Usually a firewall (security group, network ACL, or host firewall) silently dropping packets. Connection refused means the host is reachable but the port is closed; timeout means the host isn't reachable or the firewall is dropping.

"Connection refused". The service isn't listening on the port, OR it's listening on a different interface (e.g., 127.0.0.1 only). Check ss -tlnp on the destination.

"Name or service not known". DNS is broken. Check /etc/resolv.conf, the resolver itself, etc. Common in containers when DNS config is wrong.

Intermittent failures. TCP retransmits, packet loss, MTU issues. ss -ti shows retransmit counts; mtr shows packet loss per hop.

Slow connections. Often DNS: every new connection to a hostname starts with a lookup unless the answer is cached. Run dig @<resolver> <hostname> and read the Query time line to see resolver latency. Sometimes it's a slow firewall (deep packet inspection on a hot path).

SSL/TLS handshake failures. Cert mismatches, protocol version mismatches, cipher mismatches. openssl s_client shows the handshake stage that fails.
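The first few patterns can be told apart from the error text alone. The sketch below maps a message to a rough category; the matched strings are the ones curl, nc, and the resolver library commonly print, and both the function name classify_failure and the category labels are made up for illustration.

```shell
# classify_failure: hypothetical helper.
# Takes an error message as its first argument and prints a rough category.
classify_failure() {
  case "$1" in
    *"Connection refused"*)        echo "refused" ;;  # reachable, nothing listening there
    *"timed out"*|*"Timed out"*)   echo "timeout" ;;  # dropped by a firewall or unreachable
    *"Name or service not known"*|*"Could not resolve host"*)
                                   echo "dns" ;;      # name resolution failed
    *"handshake failure"*|*"certificate"*)
                                   echo "tls" ;;      # TLS handshake or cert problem
    *)                             echo "unknown" ;;
  esac
}

# Live usage (needs curl; captures only the error text):
#   classify_failure "$(curl -sS -o /dev/null --max-time 5 https://<host>/ 2>&1)"
```

The category only tells you which layer to start with; it does not replace the checks above.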

The Linux network namespace twist

For containers and Kubernetes, each container usually runs in its own network namespace (in Kubernetes, the containers of a pod share one). Tools like ip and ss only show the state of the namespace they run in.

To check inside a container's namespace from the host:

```shell
nsenter -t <pid> -n ip addr
nsenter -t <pid> -n ss -tlnp
```

Or with Docker:

```shell
docker exec <container> ss -tlnp
```

Or with kubectl:

```shell
kubectl exec <pod> -- ss -tlnp
```

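A common mistake: running ss on the host and being confused that you don't see the container's listening port. It's a different namespace; you need to enter it. docker exec and kubectl exec only help if the image ships ss; nsenter runs the host's binary inside the container's namespace, so it also works for minimal images.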
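When it's unclear whether two processes share a network namespace, the kernel exposes the answer under /proc. A minimal sketch, assuming a Linux /proc and enough privileges to read the other process's ns links (usually root); same_netns is a hypothetical helper name.

```shell
# same_netns: hypothetical helper.
# Compares the network namespace of two PIDs via /proc/<pid>/ns/net.
# Returns 0 if they match, 1 if they differ, 2 if a link cannot be read.
same_netns() {
  a=$(readlink "/proc/$1/ns/net") || return 2
  b=$(readlink "/proc/$2/ns/net") || return 2
  [ "$a" = "$b" ]
}

# Live usage, with the container's PID taken from Docker:
#   pid=$(docker inspect -f '{{.State.Pid}}' <container>)
#   same_netns $$ "$pid" || echo "different namespace: nsenter -t $pid -n ss -tlnp"
```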

Packet capture: when other tools don't tell you enough

When you really need to know what's on the wire:

```shell
tcpdump -i any -nnvv host 1.2.3.4
```

Captures packets matching the filter, prints them. Useful for:

Confirming a packet is actually being sent
Seeing exactly what's in a packet (TLS won't decode, but TCP/IP details show)
Debugging what happens between a tool and the kernel (what the application thinks it sent versus what actually left the host)

```shell
tcpdump -i any -w out.pcap port 443
```

Capture to a file; open in Wireshark for analysis. Wireshark's UI is much better for non-trivial inspection.

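For HTTPS traffic, decryption requires the session keys, for example a key log file written by a client that honors SSLKEYLOGFILE, which Wireshark can load. The server's private key only helps with older RSA key exchange; it cannot decrypt sessions that use forward secrecy, which includes all of TLS 1.3. Plain HTTP can be inspected directly.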

Specific things I do for cloud / containerized environments

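VPC Flow Logs. AWS records accepted and rejected traffic per ENI as flow records aggregated over a capture window, not packet by packet. Records arrive with a delay of minutes, but they are authoritative. Query them in CloudWatch Logs Insights or Athena.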

Security group reachability test. AWS has a feature ("Reachability Analyzer") that tells you if a connection between two ENIs is allowed by the security groups + NACLs + route tables. Saves a lot of debugging.

Network policies in Kubernetes. When pod-to-pod traffic is blocked, the CNI's NetworkPolicy enforcement is often the cause. Check the active policies.

ip link show type veth lists the host-side veth interfaces of containers. Useful for checking traffic counters: ip -s link show veth123abc.

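Conntrack table fullness. cat /proc/sys/net/netfilter/nf_conntrack_count. If the count is close to nf_conntrack_max, the table is about to fill; once it is full, packets for new connections are dropped and the kernel logs "nf_conntrack: table full, dropping packet".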
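The conntrack check reduces to one number. A minimal sketch that turns the two counters into a usage percentage (integer, rounded down); conntrack_pct is a made-up name, and any alert threshold is yours to pick.

```shell
# conntrack_pct: hypothetical helper.
# Arguments: current entry count, table maximum. Prints usage in percent.
conntrack_pct() {
  count=$1
  max=$2
  echo $(( count * 100 / max ))
}

# Live usage on a host with the nf_conntrack module loaded:
#   conntrack_pct "$(cat /proc/sys/net/netfilter/nf_conntrack_count)" \
#                 "$(cat /proc/sys/net/netfilter/nf_conntrack_max)"
```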

What I check for "service is slow"

Specific checklist:

DNS lookup time: dig, reading the Query time line in its output.
Packet loss / retransmits: mtr, ss -ti.
TCP throughput: iperf3 between source and destination.
Application-level latency: actual app timing.
Conntrack pressure on intermediate hops.
CPU on intermediate hops (a saturated NIC sometimes shows up as CPU on the network thread).

The order: cheap and fast first; harder later.

Tools I rarely use anymore

A few tools that are less relevant than they used to be:

netstat: replaced by ss. ss is faster and gives more info.

ifconfig: replaced by ip. ip is more flexible.

route / route add: replaced by ip route.

arp -a: replaced by ip neigh.

The new tools are part of iproute2. Worth learning if you've been using the old ones for years.

What I'd tell someone learning

Layer-by-layer beats random. Check L1 → L2 → L3 → ... in order. Most issues live at one layer.

ss over netstat. ip over ifconfig. Modern tools, more useful output.

Read manpages. man ip-route, man ss. They're not Wikipedia articles; they're terse but informative.

mtr is your friend for "where in the path is the problem."

tcpdump for "is the packet actually being sent." When other tools don't agree, look at the wire.

Cloud has additional tools. VPC Flow Logs, Reachability Analyzer, etc. Use them.

Network namespaces matter. When working with containers, remember which namespace you're in.

Linux network debugging has the advantage of being well-tooled and well-documented. The tools are old, stable, and understood. The patterns are layered. Most issues yield to a systematic approach. The skill is in working the layers methodically rather than guessing — which is what makes the difference between 30 minutes of debugging and 3 hours.

About Kiril Urbonas