Production Deployment

Pointing Alloy at the live Railway app and verifying production monitoring.

The Workflow

1. Merge monitoring branch → main via PR
2. Railway auto-deploys updated code
3. Verify /metrics endpoint is live on Railway
4. Update Alloy config to scrape Railway URL
5. Confirm data flows into Grafana dashboard

Step 1 — Merge Via PR

We used a feature branch monitoring/grafana-setup and merged via Pull Request. This:

  • Keeps main clean until monitoring is ready
  • Documents what changed and why
  • Follows a standard review-based Git workflow

Step 2 — Verify Production /metrics

curl -s https://finpay-api-production.up.railway.app/metrics | head -10

Expected output:

# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds.
# TYPE process_cpu_user_seconds_total counter
process_cpu_user_seconds_total 0.812398
...
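
A quick programmatic version of this check can help when an HTML error page comes back instead of metrics. The sketch below is illustrative only: `parseMetricTypes` is a hypothetical helper, not part of the project, and it simply collects the `# TYPE` lines from a Prometheus text response.

```javascript
// Hypothetical helper: collect metric names and their types from the
// "# TYPE <metric_name> <type>" lines of a Prometheus text response.
function parseMetricTypes(body) {
  const types = new Map();
  for (const line of body.split('\n')) {
    const match = /^# TYPE (\S+) (\S+)\s*$/.exec(line);
    if (match) {
      types.set(match[1], match[2]);
    }
  }
  return types;
}

// Sample copied from the expected output above.
const sampleMetrics = [
  '# HELP process_cpu_user_seconds_total Total user CPU time spent in seconds.',
  '# TYPE process_cpu_user_seconds_total counter',
  'process_cpu_user_seconds_total 0.812398',
].join('\n');

const metricTypes = parseMetricTypes(sampleMetrics);
console.log(metricTypes.get('process_cpu_user_seconds_total')); // counter
console.log(parseMetricTypes('<html>Not Found</html>').size);   // 0
```

An empty result suggests the endpoint is not serving metrics. On Node 18 or later, the body could come from the built-in `fetch` instead of a hard-coded sample.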

Step 3 — Update Alloy Config for Production

monitoring/alloy/config.alloy
prometheus.scrape "finpay_api" {
  targets = [
    { __address__ = "finpay-api-production.up.railway.app" },
  ]
  metrics_path    = "/metrics"
  scheme          = "https"
  scrape_interval = "15s"
  forward_to      = [prometheus.remote_write.grafana_cloud.receiver]
}

Two changes from local config:

  • __address__ → Railway production host (hostname only, without the https:// prefix)
  • scheme = "https" → required for HTTPS endpoints, because the scrape scheme defaults to http
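
To illustrate how a full URL maps onto these separate settings, here is a small sketch. `toScrapeTarget` is a hypothetical helper, and the localhost URL in the second call is an assumed example of a local target, not a value taken from the project.

```javascript
// Hypothetical helper: split a metrics URL into the pieces the scrape
// config expects (address, scheme, and metrics path are set separately).
function toScrapeTarget(metricsUrl) {
  const url = new URL(metricsUrl);
  return {
    address: url.host,                      // hostname, plus port if one was given
    scheme: url.protocol.replace(/:$/, ''), // "https:" -> "https"
    metricsPath: url.pathname,
  };
}

console.log(toScrapeTarget('https://finpay-api-production.up.railway.app/metrics'));
// Assumed local example, for comparison only:
console.log(toScrapeTarget('http://localhost:3000/metrics'));
```

The production call yields the bare hostname as the address and `https` as the scheme, which is why the two values appear as separate settings in the config above.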

Troubleshooting: Redis Connection Failure

After deployment we saw this in Railway logs:

Redis error read ECONNRESET
Redis error Reached the max retries per request limit (which is 3)

Root Cause

The code checked for UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN to use the Upstash REST client. In Railway, the variable was named UPSTASH_REDIS_REST_URL but contained a rediss:// TCP URL instead of the https:// REST URL.

// Code expected this branch to activate:
if (process.env.UPSTASH_REDIS_REST_URL && process.env.UPSTASH_REDIS_REST_TOKEN) {
  const { Redis } = require('@upstash/redis');
  // ...
// But fell into this branch instead:
} else {
  const IORedis = require('ioredis');
  // Tried to connect via TCP to an HTTPS URL and failed
}
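
The conditional can be isolated as a pure function to see which inputs select which client. This is a sketch of the selection logic only: `chooseRedisClient` is a hypothetical name, the example values are placeholders, and no real client is constructed.

```javascript
// Sketch of the selection logic: both variables must be truthy for the
// REST branch. The URL's scheme is never inspected.
function chooseRedisClient(env) {
  if (env.UPSTASH_REDIS_REST_URL && env.UPSTASH_REDIS_REST_TOKEN) {
    return 'upstash-rest';
  }
  return 'ioredis';
}

// Placeholder values, for illustration only.
console.log(chooseRedisClient({
  UPSTASH_REDIS_REST_URL: 'https://example.upstash.io',
  UPSTASH_REDIS_REST_TOKEN: 'placeholder-token',
})); // upstash-rest

console.log(chooseRedisClient({
  UPSTASH_REDIS_REST_URL: 'https://example.upstash.io',
  UPSTASH_REDIS_REST_TOKEN: '',
})); // ioredis
```

A missing or empty value in either variable is enough to fall through to the ioredis branch, so both variables are worth checking in the Railway dashboard.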

Fix

| Variable | Correct Value |
| --- | --- |
| UPSTASH_REDIS_REST_URL | https://correct-cougar-XXXXX.upstash.io |
| UPSTASH_REDIS_REST_TOKEN | The token from the Upstash REST tab |

Challenge Faced

The Railway logs showed Redis connection errors but didn't tell us why the code chose the wrong branch. Reading the actual src/config/redis.js file was the only way to understand the conditional logic. Logs tell you what happened — code tells you why. Always read both.
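
One way to make the logs carry more of the "why" is a startup check that reports misconfigured variables before any client is created. The following is a minimal sketch, assuming the REST URL should always start with https://. `checkUpstashRestEnv` is a hypothetical helper and is not part of src/config/redis.js.

```javascript
// Hypothetical startup check: returns a list of human-readable problems
// with the Upstash REST configuration (an empty list means it looks fine).
function checkUpstashRestEnv(env) {
  const problems = [];
  const url = env.UPSTASH_REDIS_REST_URL;
  if (!url) {
    problems.push('UPSTASH_REDIS_REST_URL is not set');
  } else if (!url.startsWith('https://')) {
    problems.push('UPSTASH_REDIS_REST_URL should start with https://');
  }
  if (!env.UPSTASH_REDIS_REST_TOKEN) {
    problems.push('UPSTASH_REDIS_REST_TOKEN is not set');
  }
  return problems;
}

// Placeholder values: a TCP-style URL stored in the REST variable.
const misconfigured = {
  UPSTASH_REDIS_REST_URL: 'rediss://example.upstash.io:6379',
  UPSTASH_REDIS_REST_TOKEN: 'placeholder-token',
};
for (const problem of checkUpstashRestEnv(misconfigured)) {
  console.warn(`Redis config: ${problem}`);
}
```

Calling it with `process.env` at startup would put the reason for a wrong branch directly into the Railway logs.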

Verify Alloy is Scraping Production

curl -s http://localhost:12345/metrics | grep "conn_established"

Expected:

net_conntrack_dialer_conn_established_total{dialer_name="prometheus.scrape.finpay_api"} 2
net_conntrack_dialer_conn_established_total{dialer_name="remote_storage_write_client"} 71

  • prometheus.scrape.finpay_api → Alloy connected to Railway ✅
  • remote_storage_write_client → Alloy connected to Grafana Cloud ✅
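
The same check can be scripted by reading the counter for each dialer. The sketch below assumes each line carries only the dialer_name label, as in the expected output above; `parseDialerCounts` is a hypothetical helper.

```javascript
// Hypothetical helper: map each dialer_name to its established-connection
// count. Assumes dialer_name is the only label on the line.
function parseDialerCounts(body) {
  const pattern =
    /^net_conntrack_dialer_conn_established_total\{dialer_name="([^"]+)"\}\s+(\d+(?:\.\d+)?)$/;
  const counts = {};
  for (const line of body.split('\n')) {
    const match = pattern.exec(line.trim());
    if (match) {
      counts[match[1]] = Number(match[2]);
    }
  }
  return counts;
}

// Sample copied from the expected output above.
const alloyMetrics = [
  'net_conntrack_dialer_conn_established_total{dialer_name="prometheus.scrape.finpay_api"} 2',
  'net_conntrack_dialer_conn_established_total{dialer_name="remote_storage_write_client"} 71',
].join('\n');

const dialerCounts = parseDialerCounts(alloyMetrics);
console.log(dialerCounts['prometheus.scrape.finpay_api'] > 0); // true
console.log(dialerCounts['remote_storage_write_client'] > 0);  // true
```

A count above zero for both dialers indicates that Alloy has opened connections to the scrape target and to the remote write endpoint.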

Production Dashboard Results

| Panel | Value | Status |
| --- | --- | --- |
| CPU Usage | 0.3% at idle | ✅ Healthy |
| Memory | 96–104 MiB | ✅ Normal warm-up |
| HTTP Request Rate | 0.07 req/s | ✅ Railway health checks |
| HTTP Error Rate | 0 req/s | ✅ Zero errors |
| Response Time P95 | 8 ms | ✅ Excellent |
| Event Loop Lag | 2–3 ms | ✅ Healthy |
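
To make the P95 figure concrete, here is a small worked example using the nearest-rank method on raw samples. The latencies are invented for illustration. A dashboard backed by Prometheus histograms would typically estimate the quantile from bucket counts instead, so treat this as a sketch of the idea rather than the dashboard's actual query.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(values, p) {
  if (values.length === 0) {
    throw new Error('no samples');
  }
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Made-up request latencies in milliseconds (20 samples, one slow outlier).
const latenciesMs = [3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 40];

console.log(percentile(latenciesMs, 95));  // 8
console.log(percentile(latenciesMs, 100)); // 40
```

The single 40 ms outlier does not move the P95, which is why the panel describes typical worst-case latency rather than the absolute maximum.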

Keep Alloy Running

Alloy stops when the terminal closes. Run it as a system service:

sudo systemctl enable alloy
sudo systemctl start alloy
sudo systemctl status alloy

✦ Test Your Knowledge

1. What is the difference between the Upstash REST URL and the TCP URL?

A. They are interchangeable
B. REST URL starts with https:// and uses HTTP; TCP URL starts with rediss:// and uses native Redis protocol
C. TCP is faster but less secure
D. REST URL requires a VPN

2. Why did the app fall into the ioredis branch instead of the Upstash REST branch?

A. ioredis is the default in production
B. UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN were not both correctly set
C. The @upstash/redis package was not installed
D. Railway does not support Upstash

3. When scraping a production HTTPS endpoint with Alloy, what additional config is required?

A. A VPN tunnel
B. A firewall rule
C. scheme = "https" must be set in the scrape config
D. The port must be changed to 443

4. What does a P95 response time of 8 ms on production indicate?

A. The app is slow
B. 95% of requests complete within 8 ms, which is excellent performance for a fintech API
C. Only 8% of requests are succeeding
D. The monitoring is not working correctly