A misconfigured Caddy web server with file_server browse enabled on publicly accessible ports
allowed bots and crawlers to recursively scrape directory listings, generating ~440 GB/day of
outbound data transfer from the EC2 instance. This ran undetected for 14 days, incurring ~$549 in AWS data
transfer charges on top of normal operating costs.
The incident was detected on 2026-04-04 when the billing check script triggered a threshold alert:
$ bash scripts/check-aws-bill.sh
[WARNING] AWS bill for this month: $55.48 USD (~51.60 EUR) — exceeds 50 EUR threshold!
This was only 4 days into April, putting the projected monthly cost at ~$400+. The script uses the AWS Cost Explorer API with BlendedCost metrics and a USD-to-EUR conversion rate of 0.93:
# scripts/check-aws-bill.sh (simplified)
THRESHOLD_EUR="${1:-50}"
USD_TO_EUR="0.93"
START_DATE="$(date -u +%Y-%m-01)"
END_DATE="$(date -u -d '+1 day' +%Y-%m-%d)"
aws ce get-cost-and-usage \
    --time-period "Start=${START_DATE},End=${END_DATE}" \
    --granularity MONTHLY \
    --metrics BlendedCost
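The threshold comparison itself is elided above; a minimal sketch of that step, with the month-to-date figure hard-coded since the real script parses the Cost Explorer JSON response:

```shell
# Sketch of the threshold check (USD value hard-coded for illustration;
# the real script extracts it from the get-cost-and-usage response)
USD="55.48"
USD_TO_EUR="0.93"
THRESHOLD_EUR="50"
EUR=$(awk -v u="$USD" -v r="$USD_TO_EUR" 'BEGIN { printf "%.2f", u * r }')
OVER=$(awk -v e="$EUR" -v t="$THRESHOLD_EUR" 'BEGIN { print (e > t) }')
if [ "$OVER" -eq 1 ]; then
    echo "[WARNING] AWS bill for this month: \$${USD} USD (~${EUR} EUR) exceeds ${THRESHOLD_EUR} EUR threshold!"
fi
```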
Affected instance:

| Property | Value |
|---|---|
| Instance ID | i-0a10d***390d |
| Type | t3a.medium (2 vCPU, 4 GB RAM) |
| Region | eu-west-1 (Ireland) |
| OS | Ubuntu 24.04 |
| Storage | 80 GB gp3 |
| IAM Role | [redacted-role] |
| Private IP | 172.31.xx.xxx |
| Tailscale IP | 100.xx.xxx.xx |
| Purpose | Always-on dev environment running Claude Code remote-control agent |
# systemctl list-units --type=service --state=running (relevant)
caddy.service Caddy web server
claude-bridge-watcher.service Watch claude-remote logs for new bridge URLs
claude-redirect.service Claude Smart Redirect Server (Python, :8787)
claude-remote.service Claude Code Remote Control (node)
docker.service Docker (no containers running)
ssh.service OpenBSD Secure Shell server
tailscaled.service Tailscale node agent
snap.amazon-ssm-agent...service AWS SSM Agent
# ss -tlnp
127.0.0.1:2019 Caddy admin API
127.0.0.1:8787 claude-redirect (Python)
0.0.0.0:22 SSH
*:80 Caddy HTTP
*:443 Caddy HTTPS
100.xx.xxx.xx:43908 Tailscale
# crontab -l
0 */12 * * * /home/ubuntu/projects/personal-os/scripts/refresh-claude-token.sh
The refresh script stores Claude Code credentials at /[redacted]/credentials as an SSM SecureString parameter in eu-west-1.
# tailscale status
100.xx.xxx.xx [this-machine] [user]@ linux -
100.xxx.xxx.xx [mobile-device] [user]@ iOS active; relay "par"
100.xx.xx.xx [desktop] [user]@ windows offline, last seen 9d ago
Exit node: disabled. Not routing other devices' traffic through EC2.
Expected baseline cost for this instance:

| Item | Daily | Monthly |
|---|---|---|
| t3a.medium on-demand ($0.0376/hr) | ~$0.90 | ~$27 |
| 80 GB gp3 EBS | ~$0.21 | ~$6.40 |
| Public IPv4 ($0.005/hr) | ~$0.12 | ~$3.60 |
| SSM, VPC, misc | <$0.05 | <$1.50 |
| Expected total | ~$1.28 | ~$38.50 |
February 2026 had $0.00 in costs — the instance did not exist yet.
The Caddy configuration for dev.liztem.com and mwinsevilla.dev.liztem.com
included the file_server browse directive, which serves HTML directory listings for any
directory without an index file. With ports 80 and 443 open to the internet, automated crawlers
discovered these listings and hammered them continuously.
# The problematic config (before fix)
dev.liztem.com {
    root * /home/ubuntu/projects/personal-os/sites
    file_server browse   # <-- this was the problem
}
AWS charges $0.09/GB for outbound data transfer from EC2 in eu-west-1 (after the account's first 100 GB/month, which is free). At ~440 GB/day, that's ~$40/day in transfer costs alone.
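That rate, combined with the 100 GB/month of free outbound transfer included in the AWS free tier, reproduces the Apr 1 line item exactly (a quick check, not from the original scripts):

```shell
# Apr 1 sanity check: 440.67 GB out, first 100 GB/month free, $0.09/GB after
awk 'BEGIN { printf "$%.2f\n", (440.67 - 100) * 0.09 }'
# → $30.66, matching the EU-DataTransfer-Out-Bytes charge for Apr 1
```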
The initial Mar 19 deployment ran `file_server browse` on HTTP only (no TLS). The Claude remote service also started at that point but failed repeatedly with a workspace trust error. Diagnosis relied on iptables, tcpdump, and a Caddy config review; the fix was a one-line edit: `sudo sed -i 's/file_server browse/file_server/' /etc/caddy/Caddyfile`

March 2026 bill:

| Service | Cost |
|---|---|
| EC2 Compute (t3a.medium) | $508.22 (mostly data transfer) |
| Tax | $110.08 |
| EC2 - Other (EBS, IPs) | $14.40 |
| VPC | $1.60 |
| Total | $634.30 |
Daily costs during the incident (March):

| Date | Data Transfer | Daily Total |
|---|---|---|
| Mar 19 | $17.78 | $19.11 |
| Mar 20 | $39.59 | $41.83 |
| Mar 21 | $40.61 | $42.91 |
| Mar 22 | $40.27 | $42.53 |
| Mar 23 | $38.57 | $40.81 |
| Mar 24 | $36.69 | $38.94 |
| Mar 25 | $39.38 | $41.66 |
| Mar 26 | $40.46 | $42.67 |
| Mar 27 | $39.50 | $41.81 |
| Mar 28 | $42.16 | $44.45 |
| Mar 29 | $39.63 | $41.91 |
| Mar 30 | $40.78 | $43.07 |
| Mar 31 | $39.75 | $42.04 |
Daily costs in April, showing the drop after the Apr 2 fix:

| Date | Data Transfer Out | Volume (GB) | Daily Total |
|---|---|---|---|
| Apr 1 | $30.66 | 440.67 | $42.57 |
| Apr 2 | $10.19 | 113.20 | $11.83 |
| Apr 3 | $0.00 | 0.05 | $1.08 |
Step-by-step process used to identify the root cause:
$ bash scripts/check-aws-bill.sh
[WARNING] AWS bill for this month: $55.48 USD (~51.60 EUR) — exceeds 50 EUR threshold!
$55.48 in 4 days = projected ~$400+/month. Something was very wrong.
# aws ce get-cost-and-usage --group-by Type=DIMENSION,Key=SERVICE
$ 43.60 Amazon Elastic Compute Cloud - Compute
$ 9.63 Tax
$ 1.83 EC2 - Other
$ 0.34 Amazon Virtual Private Cloud
$ 0.08 AWS Cost Explorer
$43.60 in EC2 compute for 4 days. A t3a.medium should be ~$3.60 for that period — 12x expected.
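The expected figure follows from the on-demand rate (a quick check):

```shell
# 4 days of t3a.medium at $0.0376/hr vs. the $43.60 actually billed
awk 'BEGIN {
    expected = 0.0376 * 24 * 4
    printf "expected: $%.2f, actual/expected: %.1fx\n", expected, 43.60 / expected
}'
# → expected: $3.61, actual/expected: 12.1x
```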
April 1 cost $42.57, April 2 cost $11.83, April 3 cost $1.08. Something happened between April 2 and 3.
# April 1 usage types:
$30.66 (440.67 GB) EU-DataTransfer-Out-Bytes # THIS IS THE PROBLEM
$ 0.97 ( 19.33 hrs) EU-CPUCredits:t3a
$ 0.96 ( 23.57 hrs) EU-BoxUsage:t3a.medium
$ 0.23 ( 2.67 GB) EU-EBS:VolumeUsage.gp3
$ 0.12 ( 24.00 hrs) EU-PublicIPv4:InUseAddress
# March 2026 total: $634.30
$508.22 Amazon Elastic Compute Cloud - Compute # almost all data transfer
$110.08 Tax
$ 14.40 EC2 - Other
$ 1.60 Amazon Virtual Private Cloud
February 2026 had $0.00 costs (instance did not exist yet).
Other suspects investigated:

| Suspect | Finding | Verdict |
|---|---|---|
| Tailscale exit node | ExitNodeOption: False, ExitNode: False. Total Tailscale TX: 856 KB | Ruled out |
| Docker containers | docker ps: no containers running | Ruled out |
| Claude remote-control | Running since Mar 28, low CPU (32min over 6 days). Single HTTPS conn to Claude API | Ruled out |
| SSM Agent | Two HTTPS connections to AWS endpoints. Negligible traffic | Ruled out |
| Cron jobs | Only refresh-claude-token.sh every 12h. No backup or sync jobs | Ruled out |
| Large files on disk | sites/ directory only 88 KB. /tmp/vertex-ai-creative-studio 448 MB but not served | Ruled out |
# ss -tnp (2026-04-04 ~09:50 UTC)
ESTAB [local]:35816 [aws-ssm-1]:443 ssm-agent-worke # AWS SSM
ESTAB [tailscale]:22 [mobile-device]:53939 sshd # current SSH session (Tailscale)
ESTAB [local]:41360 [derp-relay]:443 tailscaled # Tailscale DERP relay
ESTAB [local]:42636 [ts-coord]:443 tailscaled # Tailscale coordination
ESTAB [local]:42634 [claude-api]:443 node # claude remote-control
ESTAB [local]:48500 [claude-api]:443 claude # current claude session
ESTAB [local]:51840 [aws-ssm-2]:443 ssm-agent-worke # AWS SSM
ESTAB [local]:56238 [ts-relay]:443 tailscaled # Tailscale
All connections accounted for. No suspicious outbound traffic at time of diagnosis — fix was already in place.
# 5-second measurement on ens5
TX in 5 seconds: 0 MB
Rate: 0 MB/s
System Activity Reporter data from /var/log/sysstat/ confirmed the exact moment traffic stopped.
# sar -n DEV --iface=ens5 -f /var/log/sysstat/sa01 (sampled)
# Time IFACE rxpck/s txpck/s rxkB/s txkB/s
00:10:06 ens5 384.81 3740.47 29.50 5383.41
06:10:06 ens5 404.70 3790.89 36.31 5407.80
12:20:02 ens5 407.64 3576.05 30.20 5140.76
18:30:01 ens5 416.40 3745.43 30.90 5390.91
23:50:02 ens5 598.19 3671.86 41.87 5278.15
Average: ens5 399.47 3755.35 30.20 5400.01 # 5.4 MB/s = ~460 GB/day
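Converting the sa01 average (~5,400 txkB/s) to a daily volume confirms the headline rate (decimal units; a quick check, not from the original scripts):

```shell
# txkB/s * 86400 s/day, in decimal GB
awk 'BEGIN { printf "%.0f GB/day\n", 5400.01 * 86400 / 1e6 }'
# → 467 GB/day
```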
# sar -n DEV --iface=ens5 -f /var/log/sysstat/sa02
# Time IFACE rxpck/s txpck/s rxkB/s txkB/s
00:10:06 ens5 384.81 3740.47 29.50 5383.41 # high
00:20:02 ens5 407.64 3576.05 30.20 5140.76
00:30:01 ens5 416.40 3745.43 30.90 5390.91
00:40:06 ens5 357.83 3689.13 27.73 5309.93
00:50:06 ens5 424.39 3534.03 30.87 5080.43
01:00:06 ens5 413.28 3495.53 30.16 5023.62
01:10:06 ens5 498.79 3477.14 34.60 4996.96
01:20:02 ens5 530.38 3791.40 36.62 5461.00
01:30:01 ens5 523.18 3637.37 36.34 5231.13
01:40:06 ens5 543.72 3668.03 37.19 5277.97
01:50:02 ens5 463.15 3714.80 33.14 5347.11
02:00:06 ens5 407.96 3914.45 30.68 5641.11
02:10:06 ens5 349.01 3613.52 27.14 5198.86
02:20:02 ens5 364.53 3625.12 27.97 5215.73
02:30:06 ens5 344.64 3523.44 26.71 5052.73
02:40:06 ens5 360.97 3478.70 27.32 4987.92
02:50:02 ens5 392.48 3573.90 29.16 5131.97
03:00:06 ens5 469.31 3563.39 33.07 5108.35
03:10:06 ens5 395.83 3362.55 29.04 4792.90
03:20:06 ens5 411.02 3849.66 30.64 5512.46
03:30:06 ens5 378.15 3879.01 28.99 5554.74
03:40:06 ens5 369.64 3651.22 28.18 5217.12
03:50:02 ens5 378.33 3510.68 28.49 5006.00
04:00:06 ens5 352.99 3564.45 27.19 5088.03
04:10:06 ens5 404.61 3334.08 29.53 4751.01
04:20:02 ens5 389.28 3558.42 28.98 5079.32
04:30:00 ens5 406.98 3625.60 29.97 5180.71
04:40:06 ens5 396.10 3649.94 29.55 5217.62
04:50:02 ens5 418.34 3743.13 30.80 5348.09
05:00:06 ens5 445.47 3884.12 32.39 5562.89
05:10:06 ens5 408.42 3864.49 30.42 5538.18
05:20:02 ens5 371.94 3892.54 28.68 5579.34
05:30:01 ens5 396.91 3878.53 29.90 5556.96
05:40:06 ens5 440.10 3924.76 32.23 5617.23
05:50:02 ens5 460.89 3857.73 33.16 5530.37
06:00:06 ens5 521.26 3787.15 36.15 5415.55
06:10:06 ens5 404.70 3790.89 36.31 5407.80 # still high
06:20:06 ens5 353.01 2997.37 27.89 4247.67 # starting to drop
06:30:02 ens5 34.18 42.60 30.68 15.27 # CADDY RELOADED at 06:24
06:40:06 ens5 8.61 31.45 1.08 2.74 # idle
06:50:03 ens5 7.94 28.78 1.05 2.71
07:00:06 ens5 14.88 41.35 2.23 8.30
07:10:06 ens5 10.85 28.04 4.33 3.42
...
23:50:02 ens5 2.52 3.07 0.63 0.70 # stayed idle all day
Average: ens5 114.32 984.32 9.15 1394.61
# sar -n DEV --iface=ens5 -f /var/log/sysstat/sa03
Average: ens5 3.65 3.17 2.15 0.71 # 0.71 kB/s outbound
# aws ce get-cost-and-usage (April 3 usage types)
$ 0.80 (19.69 hrs) EU-BoxUsage:t3a.medium
$ 0.17 ( 1.89 GB) EU-EBS:VolumeUsage.gp3
$ 0.10 (21.00 hrs) EU-PublicIPv4:InUseAddress
$ 0.00 ( 0.05 GB) EU-DataTransfer-Out-Bytes # back to near zero
$ 0.00 EU-EUC1-AWS-Out-Bytes
$ 0.00 EU-USE1-AWS-Out-Bytes
$ 0.00 EU-DataTransfer-In-Bytes
$ 0.00 EU-DataTransfer-Regional-Bytes
$ 0.00 ( 2.00) eu-west-1-KMS-Requests
# Total: $1.08 — normal baseline cost
# /proc/net/dev (2026-04-04 ~09:50 UTC)
# Uptime: 16 days, 3 hours 23 minutes
#
# Interface RX bytes TX bytes
lo: 89,297,020 89,297,020
ens5: 38,064,855,769 6,673,877,909,724 # 38 GB in, 6.6 TB out
docker0: 0 0
tailscale0: 126,978 451,927 # 127 KB in, 452 KB out
# 6.6 TB TX / 16 days = ~417 GB/day average
# Tailscale traffic negligible (452 KB total)
# docker0 unused (no containers)
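The lifetime counters give an independent cross-check of the daily average (decimal GB):

```shell
# 6,673,877,909,724 TX bytes over 16 days of uptime
awk 'BEGIN { printf "%.0f GB/day\n", 6673877909724 / 1e9 / 16 }'
# → 417 GB/day
```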
# ss -s (at time of diagnosis)
Total: 229
TCP: 36 (estab 9, closed 20, orphaned 0, timewait 17)
Transport Total IP IPv6
RAW 1 0 1
UDP 8 5 3
TCP 16 15 1
INET 25 20 5
EC2 t3a.medium (eu-west-1) — i-0a10d***390d
|
|- Caddy (ports 80, 443)
| |- claude.liztem.com -> reverse_proxy 127.0.0.1:8787
| |- dev.liztem.com -> file_server (sites/)
| |- mwinsevilla.dev.liztem.com -> file_server (sites/mwinsevilla/)
|
|- claude-remote.service -> claude remote-control
|- claude-redirect.service -> Python redirect server :8787
|- Tailscale (100.xx.xxx.xx) -> SSH access
|- SSM Agent -> credential sync
$30.66 (440.67 GB) EU-DataTransfer-Out-Bytes
$ 9.63 Tax
$ 0.97 ( 19.33 hrs) EU-CPUCredits:t3a # t3a unlimited mode CPU burst
$ 0.96 ( 23.57 hrs) EU-BoxUsage:t3a.medium # base instance cost
$ 0.23 ( 2.67 GB) EU-EBS:VolumeUsage.gp3 # 80GB gp3 storage
$ 0.12 ( 24.00 hrs) EU-PublicIPv4:InUseAddress # public IPv4
# 2026-04-02 06:24:09 UTC
sudo sed -i 's/file_server browse/file_server/' /etc/caddy/Caddyfile
sudo systemctl reload caddy
Removing browse disables directory listing. Caddy still serves static files but returns 404 for directories without an index.html, eliminating the crawler attack surface.
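A small guard can catch a regression before a reload; this is a hypothetical helper, not a script from the repo (it generates a sample Caddyfile in a temp file for illustration):

```shell
# Hypothetical deploy-time check: refuse to proceed if any site block
# still enables directory browsing. Point CADDYFILE at /etc/caddy/Caddyfile
# in practice; a sample file is generated here for illustration.
CADDYFILE="$(mktemp)"
cat > "$CADDYFILE" <<'EOF'
dev.liztem.com {
    root * /home/ubuntu/projects/personal-os/sites
    file_server
}
EOF
if grep -q 'file_server browse' "$CADDYFILE"; then
    echo "FAIL: directory browsing still enabled"
else
    echo "OK: no 'file_server browse' directives found"
fi
rm -f "$CADDYFILE"
```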
# /etc/caddy/Caddyfile (as of 2026-04-04)

# Claude Remote Control — smart redirect via Python service
claude.liztem.com {
    reverse_proxy 127.0.0.1:8787
}

# Prototype hosting
dev.liztem.com {
    root * /home/ubuntu/projects/personal-os/sites
    file_server   # no more 'browse'
}

# MW in Sevilla
mwinsevilla.dev.liztem.com {
    root * /home/ubuntu/projects/personal-os/sites/mwinsevilla
    file_server   # no more 'browse'
}

# Incident log
log.dev.liztem.com {
    root * /home/ubuntu/projects/personal-os/sites/log
    file_server
}
The claude-remote.service runs this wrapper, which auto-restarts and handles auth refresh:
# scripts/claude-remote-wrapper.sh (summary)
#!/bin/bash
MAX_RETRIES=3
RETRY_COUNT=0
while true; do
    cd ~/projects/personal-os || exit 1
    OUTPUT=$(claude remote-control --name "EC2 Dev" 2>&1)
    EXIT_CODE=$?
    # On auth failure: sync credentials from SSM, retry up to 3x
    if echo "$OUTPUT" | grep -qi "Authentication failed\|token has expired\|401"; then
        RETRY_COUNT=$((RETRY_COUNT + 1))
        if [ "$RETRY_COUNT" -gt "$MAX_RETRIES" ]; then
            RETRY_COUNT=0; sleep 1800; continue   # wait 30min
        fi
        /home/ubuntu/projects/personal-os/scripts/sync-claude-creds.sh
        sleep 5; continue
    fi
    # Non-auth failure: reset the counter and restart in 10s
    RETRY_COUNT=0
    echo "claude remote-control exited (code ${EXIT_CODE}); restarting in 10s"
    sleep 10
done
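The auth-failure branch hinges on a single case-insensitive grep over the captured output; it can be exercised in isolation (the helper name here is mine, for illustration):

```shell
# Mirror of the wrapper's detection logic (function name is hypothetical)
is_auth_failure() {
    echo "$1" | grep -qi "Authentication failed\|token has expired\|401"
}

is_auth_failure "Error: OAuth token has expired" && echo "auth failure detected"
is_auth_failure "Error: Workspace not trusted"   || echo "non-auth failure"
```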
The wrapper was not the cause of the data transfer. At time of diagnosis,
the claude-remote.service had been running since March 28 (6 days) and consumed
only 32 minutes of CPU time total, with a single HTTPS connection to the Claude API.
$ systemctl status claude-remote
● claude-remote.service - Claude Code Remote Control
Active: active (running) since Sat 2026-03-28 16:29:10 UTC; 6 days ago
Main PID: 4118609 (claude-remote-w)
Tasks: 13 (limit: 4381)
Memory: 470.6M (peak: 471.1M)
CPU: 32min 45.984s
CGroup:
├─4118609 /bin/bash scripts/claude-remote-wrapper.sh
├─4118619 /bin/bash scripts/claude-remote-wrapper.sh
└─4118620 node /usr/bin/claude remote-control --name "EC2 Dev"
To request a refund or credit, open an AWS Support billing case referencing instance i-0a10d***390d. AWS is more likely to grant credits when: (a) the issue is resolved, (b) it's clearly anomalous bot traffic, (c) it's a first-time occurrence, and (d) the account holder is responsive to the issue. Typical response time: 24–72 hours. Credits of $200–500 are common for cases like this.
Avoid `file_server browse` on public servers. Directory listing on a public-facing server is an invitation for crawlers: bots scan for open indexes continuously and will discover them within hours. If directory browsing is needed, restrict it by IP or put it behind authentication.
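If browsing genuinely must stay enabled, it can be gated behind credentials; a sketch (`basic_auth` replaced `basicauth` in Caddy v2.8, so it applies to the v2.11.2 here; the username and hash below are placeholders, generated with `caddy hash-password`):

```caddyfile
dev.liztem.com {
    root * /home/ubuntu/projects/personal-os/sites
    # Require credentials before any listing is served
    basic_auth {
        admin <bcrypt-hash-from-caddy-hash-password>
    }
    file_server browse
}
```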
Create a CloudWatch alarm at $30/month (or use AWS Budgets with daily anomaly detection). The current weekly review cadence was too slow to catch a $40/day bleed. An alarm would have caught this within 24 hours and saved ~$500.
# Example: AWS CLI to create a budget with email alert
aws budgets create-budget \
--account-id [ACCOUNT_ID] \
--budget '{
"BudgetName": "Monthly-50USD",
"BudgetLimit": {"Amount": "50", "Unit": "USD"},
"TimeUnit": "MONTHLY",
"BudgetType": "COST"
}' \
--notifications-with-subscribers '[{
"Notification": {
"NotificationType": "ACTUAL",
"ComparisonOperator": "GREATER_THAN",
"Threshold": 60,
"ThresholdType": "PERCENTAGE"
},
"Subscribers": [{
"SubscriptionType": "EMAIL",
"Address": "your-email@example.com"
}]
}]'
Even without browse, add rate limiting to cap outbound damage from future incidents. Caddy needs a plugin for this; a sketch using the mholt/caddy-ratelimit module (requires a custom Caddy build via xcaddy, plus an `order rate_limit before basic_auth` entry in the global options):

# Example with the mholt/caddy-ratelimit module: 10 requests/s per client IP
dev.liztem.com {
    rate_limit {
        zone per_ip {
            key    {remote_host}
            events 10
            window 1s
        }
    }
    root * /home/ubuntu/projects/personal-os/sites
    file_server
}
Add a simple cron check on /proc/net/dev or sar deltas to alert on sustained high outbound transfer:
# Example: alert if ens5 TX exceeds 1 GB in 10 minutes
#!/bin/bash
TX1=$(awk '/ens5/{print $10}' /proc/net/dev)   # field 10 = TX bytes
sleep 600
TX2=$(awk '/ens5/{print $10}' /proc/net/dev)
DELTA_MB=$(( (TX2 - TX1) / 1024 / 1024 ))
if [ "$DELTA_MB" -gt 1024 ]; then
    echo "ALERT: ${DELTA_MB}MB outbound in 10 min"
fi
Mar 19 08:11:23 caddy: "server is listening only on the HTTP port,
so no automatic HTTPS will be applied to this server"
Mar 19 08:11:23 caddy: "server running" name="srv0" protocols=["h1"]
Mar 19 07:44:13 claude: Error: Workspace not trusted.
Please run `claude` in /home/ubuntu/projects/personal-os first
Mar 19 07:44:25 claude: Error: Workspace not trusted. # restart #2
Mar 19 07:44:36 claude: Error: Workspace not trusted. # restart #3
Mar 19 07:44:48 claude: Error: Workspace not trusted. # restart #4
Mar 19 07:44:59 claude: Error: Workspace not trusted. # restart #5
Mar 19 08:11:23 caddy: caddy.Version=v2.11.2
Mar 19 08:11:23 caddy: runtime.GOOS=linux runtime.GOARCH=amd64
Mar 19 08:11:23 caddy: runtime.NumCPU=2 runtime.GOMAXPROCS=2
Mar 19 08:11:23 caddy: runtime.Version=go1.26.0
Mar 19 08:11:23 caddy: "using config from file" file="/etc/caddy/Caddyfile"
Mar 19 08:11:23 caddy: "adapted config to JSON" adapter="caddyfile"
Mar 19 08:11:23 caddy: admin endpoint started address="localhost:2019"
Mar 19 08:11:23 caddy: "server is listening only on the HTTP port,
so no automatic HTTPS will be applied to this server"
Mar 19 08:11:23 caddy: "HTTP/2 skipped because it requires TLS"
Mar 19 08:11:23 caddy: "HTTP/3 skipped because it requires TLS"
Mar 19 08:11:23 caddy: "server running" name="srv0" protocols=["h1"]
Mar 19 08:11:23 caddy: "autosaved config"
Mar 19 08:11:23 caddy: "serving initial configuration"
Mar 19 08:14:42 caddy: "shutting down apps, then terminating" signal="SIGTERM"
Apr 02 06:24:13 caddy: "using config from file" file="/etc/caddy/Caddyfile"
Apr 02 06:24:13 caddy: "adapted config to JSON" adapter="caddyfile"
Apr 02 06:24:13 caddy: "Caddyfile input is not formatted;
run 'caddy fmt --overwrite' to fix inconsistencies"
Apr 02 06:24:13 caddy: admin endpoint started address="localhost:2019"
Apr 02 06:24:13 caddy: "server is listening only on the HTTPS port but has
no TLS connection policies; adding one to enable TLS"
Apr 02 06:24:13 caddy: "enabling automatic HTTP->HTTPS redirects"
Apr 02 06:24:13 caddy: "enabling HTTP/3 listener" addr=":443"
Apr 02 06:24:13 caddy: "server running" name="srv0" protocols=["h1","h2","h3"]
Apr 02 06:24:13 caddy: "enabling automatic TLS certificate management"
domains=["claude.liztem.com","dev.liztem.com","mwinsevilla.dev.liztem.com"]
Apr 02 06:24:13 caddy: "load complete"
06:01:30 sshd: Accepted publickey from [mobile-device] (via Tailscale)
06:14:54 sudo: iptables -L -v -n
06:16:16 sudo: timeout 5 tcpdump -i ens5 -nn -q
06:16:28 sudo: timeout 5 tcpdump -i ens5 -nn -c 500
06:16:41 sudo: timeout 10 tcpdump -i ens5 -nn -ttt 'src host 172.31.xx.xxx'
06:18:33 sudo: timeout 10 tcpdump -i ens5 -nn -ttt 'src host 172.31.xx.xxx'
06:24:09 sudo: sed -i 's/file_server browse/file_server/' /etc/caddy/Caddyfile
06:24:13 sudo: systemctl reload caddy
06:24:13 caddy: "load complete"
06:26:59 sudo: npm update -g @anthropic-ai/claude-code