|
| 1 | += Debug network traffic |
| 2 | +:description: Capture and analyze network traffic between Pods. This also includes TLS encrypted communications. |
| 3 | +:tcpdump: https://www.tcpdump.org/ |
| 4 | +:mitmproxy: https://www.mitmproxy.org/ |
| 5 | + |
| 6 | +You likely know this problem: some tool is behaving strangely, and you need to debug traffic (often HTTP/HTTPS or DNS) between Kubernetes Pods. |
| 7 | +If the tool were running on a local machine, you would simply start {tcpdump}[`tcpdump`] and inspect the traffic. |
| 8 | +You might also use {mitmproxy}[`mitmproxy`] as an HTTPS proxy to re-encrypt the HTTPS traffic so that it is readable. |
| 9 | + |
| 10 | +However, because we are running in a containerized environment, things are a bit more complicated. |
| 11 | +This guide explains how you can capture and inspect traffic anyway. |
| 12 | + |
| 13 | +There are a few things needed: |
| 14 | + |
| 15 | +1. A sidecar running {tcpdump}[`tcpdump`], capturing the traffic into a file. |
| 16 | +2. If TLS (e.g. HTTPS) traffic is involved, the product needs to be configured so that it writes the TLS session keys into a file. |
| 17 | + The key log can be used afterwards to decrypt the TLS traffic. |
| 18 | +3. Wireshark to make it easier to inspect the captured traffic. |
| 19 | + You can give it the TLS key log and it will automatically decrypt the TLS traffic. |
| 20 | + |
| 21 | +== Simple usage |
| 22 | + |
| 23 | +If you only care about unencrypted communications, you can use this snippet to dump all traffic using {tcpdump}[`tcpdump`]. |
| 24 | + |
| 25 | +[source,yaml] |
| 26 | +---- |
| 27 | +apiVersion: trino.stackable.tech/v1alpha1 |
| 28 | +kind: TrinoCluster |
| 29 | +metadata: |
| 30 | +  name: trino |
| 31 | +spec: |
| 32 | +  coordinators: |
| 33 | +    podOverrides: |
| 34 | +      spec: |
| 35 | +        containers: |
| 36 | +          - name: tcpdump |
| 37 | +            image: nicolaka/netshoot |
| 38 | +            command: ["/bin/bash"] |
| 39 | +            args: |
| 40 | +              - -c |
| 41 | +              # If the dump grows too big, you can use regular tcpdump filters here |
| 42 | +              # to filter the captured traffic |
| 43 | +              - tcpdump -i any -w /tmp/tcpdump.pcap |
| 44 | +---- |
| 45 | + |
| 46 | +=== Attach without restart |
| 47 | + |
| 48 | +You can also attach an ephemeral debug container to a running Pod without restarting it, using something like `kubectl debug trino-coordinator-default-0 -it --image=nicolaka/netshoot -c tcpdump`. |
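As a rough sketch of the attach-without-restart approach, the capture can be started directly in the debug container by passing the `tcpdump` command after `--`. The function name `capture_in_debug_container` is hypothetical, and the output path `/tmp/tcpdump.pcap` is only reused from the sidecar example; adjust both to your setup.

```shell
# Hypothetical helper: start an ephemeral debug container in the given Pod
# and run tcpdump in it. The container shares the Pod's network namespace,
# so it sees the same traffic as the product container.
capture_in_debug_container() {
  pod="$1"
  kubectl debug "$pod" -it --image=nicolaka/netshoot -c tcpdump -- \
    tcpdump -i any -w /tmp/tcpdump.pcap
}
```

For example, `capture_in_debug_container trino-coordinator-default-0` starts the capture. Stopping `tcpdump` ends the debug container, so the capture file probably needs to be copied from a second terminal while the capture is still running, for instance with `kubectl cp trino-coordinator-default-0:/tmp/tcpdump.pcap -c tcpdump tcpdump.pcap`.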
| 49 | + |
| 50 | +== TLS decryption usage |
| 51 | + |
| 52 | +Let's make things a bit more interesting using a real-world example. |
| 53 | +Let's assume Superset is behaving strangely and we want to debug the network traffic from Superset to Trino, which is using HTTPS. |
| 54 | + |
| 55 | +As of Java 21, the JVM does not respect the `SSLKEYLOGFILE` env var and does not seem to support writing a TLS key log. |
| 56 | +So we need to use a third-party Java agent called https://github.com/neykov/extract-tls-secrets[extract-tls-secrets] for that. |
| 57 | + |
| 58 | +[source,yaml] |
| 59 | +---- |
| 60 | +apiVersion: trino.stackable.tech/v1alpha1 |
| 61 | +kind: TrinoCluster |
| 62 | +metadata: |
| 63 | +  name: trino |
| 64 | +spec: |
| 65 | +  coordinators: |
| 66 | +    envOverrides: |
| 67 | +      SSLKEYLOGFILE: /tmp/sslkeys.log |
| 68 | +    podOverrides: |
| 69 | +      spec: |
| 70 | +        # As we cannot add a curl command to the Trino startup script, we add an initContainer |
| 71 | +        # that curls the needed jar for us |
| 72 | +        initContainers: |
| 73 | +          - name: download-java-agent |
| 74 | +            image: nicolaka/netshoot # We only need curl, reusing same image for quicker pulls |
| 75 | +            command: ["/bin/bash"] |
| 76 | +            args: |
| 77 | +              - -c |
| 78 | +              - curl -L -o /jar/extract-tls-secrets.jar https://github.com/neykov/extract-tls-secrets/releases/download/v4.0.0/extract-tls-secrets-4.0.0.jar |
| 79 | +            volumeMounts: |
| 80 | +              - name: jar |
| 81 | +                mountPath: /jar |
| 82 | +        containers: |
| 83 | +          - name: tcpdump |
| 84 | +            image: nicolaka/netshoot |
| 85 | +            command: ["/bin/bash"] |
| 86 | +            args: |
| 87 | +              - -c |
| 88 | +              # If the dump grows too big, you can use regular tcpdump filters here |
| 89 | +              # to filter the captured traffic |
| 90 | +              - tcpdump -i any -w /tcpdump/tcpdump.pcap |
| 91 | +            volumeMounts: |
| 92 | +              - name: tcpdump |
| 93 | +                mountPath: /tcpdump |
| 94 | +          - name: trino |
| 95 | +            volumeMounts: |
| 96 | +              - name: jar |
| 97 | +                mountPath: /jar |
| 98 | +        volumes: |
| 99 | +          - name: jar |
| 100 | +            emptyDir: {} |
| 101 | +          # As the dump can grow quite big we use a dedicated emptyDir for it |
| 102 | +          - name: tcpdump |
| 103 | +            emptyDir: {} |
| 104 | +    jvmArgumentOverrides: |
| 105 | +      add: |
| 106 | +        - -javaagent:/jar/extract-tls-secrets.jar=/tmp/sslkeys.log |
| 107 | +---- |
| 108 | + |
| 109 | +The Trino coordinator Pod now captures all traffic into `/tcpdump/tcpdump.pcap` and writes the TLS key log to `/tmp/sslkeys.log`. |
| 110 | + |
| 111 | +Use the following command to copy the files to your local machine: |
| 112 | + |
| 113 | +[source,bash] |
| 114 | +---- |
| 115 | +kubectl cp trino-coordinator-default-0:/tcpdump/tcpdump.pcap -c tcpdump tcpdump.pcap && kubectl cp trino-coordinator-default-0:/tmp/sslkeys.log -c trino sslkeys.log |
| 116 | +---- |
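Before opening Wireshark, it can help to check that the copied key log actually contains entries, since an empty file means nothing will be decrypted. The sketch below assumes the agent writes the NSS key log format that Wireshark reads (one `<LABEL> <client_random> <secret>` entry per line, `#` for comments). The function name `count_keylog_entries` is hypothetical.

```shell
# Hypothetical helper: count usable entries in a TLS key log file.
# Assumes NSS key log format: "<LABEL> <client_random> <secret>" per line.
count_keylog_entries() {
  # Ignore blank lines and "#" comments; count lines with exactly three fields.
  awk 'NF == 3 && $1 !~ /^#/ { n++ } END { print n + 0 }' "$1"
}
```

Running `count_keylog_entries sslkeys.log` prints the number of entries. If it prints `0`, the Java agent was probably not loaded or no TLS handshake has happened since the Pod started.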
| 117 | + |
| 118 | +To inspect the traffic in Wireshark, run: |
| 119 | + |
| 120 | +[source,bash] |
| 121 | +---- |
| 122 | +wireshark -o tls.keylog_file:./sslkeys.log tcpdump.pcap |
| 123 | +---- |
| 124 | + |
| 125 | +Normal Wireshark usage applies now. |
| 126 | +For example, in the case of Trino we want to see all `POST /v1/statement` HTTPS calls. |
| 127 | +You can filter for them using `http.request.method == POST && http.request.uri == "/v1/statement"`: |
| 128 | + |
| 129 | +image::debug-network-traffic/1.png[] |
| 130 | + |
| 131 | +In the packet details pane at the bottom, you can see that the HTTP packet was actually TLS encrypted. |
| 132 | + |
| 133 | +image::debug-network-traffic/2.png[] |
| 134 | + |
| 135 | +To follow the entire HTTP stream, right-click on the packet and select `Follow` -> `HTTP Stream`. |
| 136 | + |
| 137 | +image::debug-network-traffic/3.png[] |
| 138 | + |
| 139 | +You now see the entire Superset -> Trino conversation, in this case the following SQL query: |
| 140 | + |
| 141 | +[source,sql] |
| 142 | +---- |
| 143 | +SELECT date_trunc('day', CAST(tpep_pickup_datetime AS TIMESTAMP)) AS __timestamp, AVG(duration_min) AS "Average trip duration" |
| 144 | +FROM demo.ny_taxi_data GROUP BY date_trunc('day', CAST(tpep_pickup_datetime AS TIMESTAMP)) ORDER BY "Average trip duration" DESC |
| 145 | +LIMIT 10000 |
| 146 | +---- |
| 147 | + |
| 148 | +image::debug-network-traffic/4.png[] |
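If no graphical environment is available, the same display filter can probably be applied on the command line with `tshark`, the terminal counterpart that ships with Wireshark. This is a minimal sketch, assuming `tshark` is installed locally. The function name `show_statement_requests` is hypothetical, and the file names are the ones copied earlier.

```shell
# Hypothetical helper: list the decrypted "POST /v1/statement" requests
# from a capture file, using the given TLS key log for decryption.
show_statement_requests() {
  pcap="$1"
  keylog="$2"
  tshark -r "$pcap" -o "tls.keylog_file:${keylog}" \
    -Y 'http.request.method == POST && http.request.uri == "/v1/statement"'
}
```

For example, `show_statement_requests tcpdump.pcap ./sslkeys.log` prints one summary line per matching request.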
| 149 | + |
| 150 | +== Follow-up tips |
| 151 | + |
| 152 | +1. You can filter the packets in the {tcpdump}[`tcpdump`] call to reduce the capture file size. |
| 153 | +2. If you do this on a production setup, keep in mind that the dump might contain sensitive data and the TLS keys can be used to decrypt all TLS traffic of this Pod! |
| 154 | +3. In case the product uses HTTP/2 (or newer), you need to use a Wireshark filter such as `http2.headers.path == "/nifi-api/flow/current-user"`. |
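To illustrate the first tip, a capture filter can be appended to the `tcpdump` call so that only the relevant traffic is written. The sketch below only assembles and prints the command line to put into the sidecar's `args`. The port `8443` is an assumption about where Trino serves HTTPS in your setup, and `53` covers DNS; adjust both as needed.

```shell
# Assumed values: HTTPS port of the product and DNS. Adjust to your setup.
TRINO_PORT=8443
CAPTURE_FILTER="tcp port ${TRINO_PORT} or port 53"

# Print the command to use in the tcpdump sidecar's args entry.
echo "tcpdump -i any -w /tcpdump/tcpdump.pcap ${CAPTURE_FILTER}"
```

The filter uses the regular pcap filter syntax, so host- or network-based expressions can be combined in the same way.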