Scaling WebSocket Servers · Lesson 6.5

Monitoring WebSocket servers in production

connected client gauge, message rate counter, connection error rate, latency percentiles, Redis memory usage, dead connection detection, Prometheus metrics, alerting thresholds

What to Measure in a WebSocket Server

These are the four metrics every WebSocket server must export:

Connected clients: wss.clients.size as a gauge. Spike = viral event or DDoS. Sudden drop = server crash.
Message rate: Increment a counter for every message received and sent. Derive messages/sec over a sliding window, for example with PromQL rate() on the counter.
Error rate: Count per-client errors and connection rejections. Rising error rate without rising connections = client bug or protocol mismatch.
Round-trip latency: Measure ping-to-pong latency in your heartbeat and record it as a histogram. A p99 above 500 ms usually indicates overload.
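The sliding-window message rate from the list above can be sketched without any library. This is a minimal illustration, not a prescribed implementation: the class name SlidingWindowRate, its methods, and the one-second bucket size are assumptions made for the example.

```javascript
// Sketch: messages/sec averaged over the last `windowSeconds` seconds.
// Counts are grouped into one-second buckets so memory stays bounded
// by the window length rather than by the number of messages.
class SlidingWindowRate {
  constructor(windowSeconds = 10) {
    this.windowSeconds = windowSeconds;
    this.buckets = new Map(); // epoch second -> message count
  }

  // Call once per message received or sent.
  record(nowMs = Date.now()) {
    const sec = Math.floor(nowMs / 1000);
    this.buckets.set(sec, (this.buckets.get(sec) || 0) + 1);
  }

  // Average rate over the window that ends at `nowMs`.
  perSecond(nowMs = Date.now()) {
    const newest = Math.floor(nowMs / 1000);
    const oldest = newest - this.windowSeconds + 1;
    let total = 0;
    for (const [sec, count] of this.buckets) {
      if (sec < oldest) this.buckets.delete(sec); // expire old buckets
      else if (sec <= newest) total += count;
    }
    return total / this.windowSeconds;
  }
}
```

When the counter is already exported to Prometheus, the same number can be computed at query time with rate(ws_messages_total[1m]), so an in-process window like this is mainly useful for local logging or load shedding.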

Exposing Metrics with Prometheus

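const client = require('prom-client');
const express = require('express');
const { WebSocketServer } = require('ws');

// Assumed setup: the lesson does not show how app and wss are created.
// Ports 8080 and 9100 are placeholders.
const app = express();
const wss = new WebSocketServer({ port: 8080 });

// Gauge: moves up and down with the number of open connections
const connectedClients = new client.Gauge({
  name: 'ws_connected_clients',
  help: 'Number of connected WebSocket clients'
});

// Counter: only ever increases; the direction label separates in from out
const messagesTotal = new client.Counter({
  name: 'ws_messages_total',
  help: 'Total messages received and sent',
  labelNames: ['direction']
});

wss.on('connection', (ws) => {
  connectedClients.inc();
  ws.on('message', () => messagesTotal.inc({ direction: 'in' }));
  // For outbound traffic, call messagesTotal.inc({ direction: 'out' })
  // wherever the server calls ws.send().
  ws.on('close',   () => connectedClients.dec());
});

// Expose /metrics for Prometheus scraping
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', client.register.contentType);
  res.end(await client.register.metrics());
});

app.listen(9100);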
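The gauge and counter cover connection count and message rate. Round-trip latency and dead connection detection both rely on a ping/pong heartbeat, which could be sketched roughly as follows. The helper names trackClient and sweep and the isAlive and pingSentAt properties are hypothetical. The sketch assumes sockets from the ws library (which provide ping(), terminate(), and a 'pong' event) and a histogram object with an observe(seconds) method, such as a prom-client Histogram.

```javascript
// Sketch: record ping-to-pong latency and drop clients that stop answering.
// `histogram` is any object with observe(seconds); `now` is injectable for tests.
function trackClient(ws, histogram, now = Date.now) {
  ws.isAlive = true;
  ws.on('pong', () => {
    ws.isAlive = true;
    if (ws.pingSentAt !== undefined) {
      histogram.observe((now() - ws.pingSentAt) / 1000); // seconds
      ws.pingSentAt = undefined;
    }
  });
}

// Run on a timer. Terminates clients that never answered the previous ping,
// pings the rest, and returns how many connections were terminated.
function sweep(clients, now = Date.now) {
  let terminated = 0;
  for (const ws of clients) {
    if (!ws.isAlive) {
      ws.terminate(); // no pong since the last sweep: treat as dead
      terminated++;
      continue;
    }
    ws.isAlive = false; // set back to true by the 'pong' handler
    ws.pingSentAt = now();
    ws.ping();
  }
  return terminated;
}

// Possible wiring (assumes `wss` from the ws library and a prom-client
// Histogram named `latency`; the 30 s interval is an arbitrary choice):
//   wss.on('connection', (ws) => trackClient(ws, latency));
//   setInterval(() => sweep(wss.clients), 30000);
```

Terminated sockets still emit 'close', so a connected-clients gauge that decrements on 'close' stays accurate. With the latencies in a histogram, a p99 can be derived at query time using PromQL histogram_quantile.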
