Strategic Logging and ObservabilityLesson 4.5

How to write a log query to diagnose a bug from log output

log querying, filtering logs, grep patterns, JSON log parsing, correlation ID filtering, log analysis workflow

Logs Are Only Useful If You Can Query Them

Structured logs in JSON format are queryable with standard tools. Given a user report of a bug, you should be able to open your logs, filter to that user request, and read the exact execution path in under two minutes.

Querying with jq and grep

# Filter all errors from a JSON log file
cat app.log | jq 'select(.level == "error")'

# Find all log entries for a specific correlation ID
grep '"correlationId":"abc-123"' app.log

# Find slow operations (duration over 1000ms)
cat app.log | jq 'select(.duration_ms > 1000)'

# Get the sequence of events for one user request
cat app.log | jq 'select(.correlationId == "abc-123") | {time: .time, msg: .msg}'

# Count errors by message to find most common failures
cat app.log | jq -r 'select(.level == "error") | .msg' | sort | uniq -c | sort -rn

The Diagnosis Workflow

Start with the error entry. Get its correlation ID. Query all entries with that ID sorted by timestamp. Read them in sequence - this is the full execution trace for that request. The log entry immediately before the error entry is almost always the last successful operation, which tells you where propagation began. Then check what input was present at that moment using the context fields you logged.

Ready to practice?

MCQs · Coding challenges · Mini project