
Detecting Server Latency Degradation

A server’s average response time barely changes — 44.7ms on Day 1, 45.1ms on Day 14. On a dashboard, the line looks flat. But beneath that flat line, the system is destabilizing. The noise envelope is expanding, tail latencies are growing, and the distribution is shifting from predictable to erratic.

This is real data from the Numenta Anomaly Benchmark — a server in Amazon’s East Coast datacenter whose trace ends in complete system failure caused by a documented AWS API outage. A winkComposer flow extracts 4-hour statistical fingerprints from the raw latency and fuses them into a health assessment that flags structural degradation 4 days before the first visible anomaly.

Drag the slider and watch it detect what the raw chart hides.


What You’re Seeing

The top chart shows raw request latency (cyan) tracked by an exponentially smoothed mean (lavender dashed). The filled envelope is the adaptive floor-to-ceiling range — tight when the server is stable, widening as latency behaviour becomes erratic.

Below it, the stddev sparkline shows the standard deviation computed over non-overlapping 4-hour windows. This is where the hidden structure emerges: the mean barely moved (0.3ms over 14 days), but the window-to-window standard deviation climbed 21% from the first quarter to the third — invisible in the raw signal, clear in the sparkline.

The vertical marker on the sparkline shows where a Page-Hinkley change-point detector fires — the noise floor has shifted to a new regime. With moderate sensitivity, this fires at Day 4.2, roughly 4 days before the first labeled anomaly at Day 8.2.

Three labeled anomalies appear in the dataset: a transient glitch (Day 8.2), a burst overload (Day 12.8), and system failure (Day 15.0). The assessment card tracks the escalation from Healthy through Monitor to Degraded as the statistical fingerprints deteriorate.

The key drivers table shows each evidence source’s intensity and persistence. The leading signal is volatility (window standard deviation), not the mean — exactly the kind of structural change that threshold-based alerting on the mean would miss entirely.


How It Works

One flow, five building blocks — each contributes a different perspective on server health:

  • twStats — extracts 4-hour statistical fingerprints from raw latency → exact stddev, range, kurtosis, cv per window
  • windowRange — computes the spread within each window → max − min: how wide the latency swings
  • esStats — smooths the raw signal and tracks the adaptive floor–ceiling envelope
  • volatilityShift — detects regime shifts in the window-to-window noise floor
  • serverHealth — fuses 4 evidence sources → health state + conviction × persistence

Tumbling window statistics compute exact mean, standard deviation, range, kurtosis, and coefficient of variation over non-overlapping 4-hour windows of raw latency. No exponential decay, no tuning — pure batch statistics at streaming speed. This is where the hidden structure emerges: the standard deviation climbing from 1.63 to 2.02 while the mean barely moves.
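As a concrete sketch of what the tumbling-window stage computes — plain Python, not the winkComposer API; the function name and dict keys are illustrative:

```python
import statistics

def tumbling_window_stats(latencies, window_size=48):
    """Exact batch statistics per non-overlapping window.

    48 samples at 5-minute intervals = one 4-hour window.
    Names and output shape are illustrative, not the flow's schema.
    """
    fingerprints = []
    for start in range(0, len(latencies) - window_size + 1, window_size):
        window = latencies[start:start + window_size]
        mean = statistics.fmean(window)
        stdev = statistics.pstdev(window)      # exact population stddev, no decay
        rng = max(window) - min(window)        # tail spread within the window
        cv = stdev / mean if mean else 0.0     # coefficient of variation
        # excess kurtosis: fourth standardized moment minus 3
        m4 = statistics.fmean([(x - mean) ** 4 for x in window])
        kurt = m4 / stdev ** 4 - 3 if stdev else 0.0
        fingerprints.append(
            {"mean": mean, "stdev": stdev, "range": rng, "cv": cv, "kurtosis": kurt}
        )
    return fingerprints
```

Because the windows never overlap and carry no weighting, a rise in `stdev` from one fingerprint to the next reflects a real change in that 4-hour slice, not smoothing memory.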

Window range computes the spread within each window — the difference between the highest and lowest latency reading. A widening range means tail latencies are growing, even if the average holds steady.

Exponential statistics provide the visual layer — smoothed signal and adaptive envelope for the chart. The floor-to-ceiling envelope tracks recent extremes and makes the raw signal’s behaviour legible.
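One way to sketch that visual layer — an EWMA of the raw signal plus a floor/ceiling that latches onto new extremes and relaxes back toward the mean. The `alpha` and `decay` values are arbitrary illustrations, not the flow's actual parameters:

```python
def smooth_with_envelope(samples, alpha=0.1, decay=0.05):
    """Exponentially smoothed mean plus an adaptive floor/ceiling envelope.

    The floor drops (and the ceiling jumps) instantly on a new extreme,
    then each edge relaxes back toward the smoothed mean at rate `decay`.
    """
    mean = floor = ceiling = samples[0]
    out = []
    for x in samples[1:]:
        mean += alpha * (x - mean)                            # EWMA of raw signal
        floor = min(x, floor + decay * (mean - floor))        # tightens slowly
        ceiling = max(x, ceiling - decay * (ceiling - mean))  # tightens slowly
        out.append((mean, floor, ceiling))
    return out
```

Stable latency pulls the two edges together; erratic latency keeps knocking them apart — which is exactly the widening envelope visible in the chart.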

Change-point detection runs a Page-Hinkley  test on the window standard deviation. When the noise floor shifts to a new level, it fires — the vertical marker on the sparkline. With moderate sensitivity (delta 0.008, lambda 0.8), it detects the first structural shift at Day 4.2 with zero false alarms in the first three days.
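The Page-Hinkley statistic itself is compact. Here is a minimal sketch of the upward-shift variant — the `delta`/`lam` defaults mirror the values quoted above, but the implementation details are an assumption, not the flow's code:

```python
def page_hinkley(values, delta=0.008, lam=0.8):
    """Return the 1-based index where an upward mean shift is flagged, or None.

    delta: drift tolerated per sample; lam: alarm threshold.
    Alarms when the cumulative deviation rises lam above its running minimum.
    """
    mean = 0.0   # incremental mean of all samples so far
    cum = 0.0    # cumulative deviation m_t
    low = 0.0    # running minimum M_t
    for t, x in enumerate(values, start=1):
        mean += (x - mean) / t
        cum += x - mean - delta
        low = min(low, cum)
        if cum - low > lam:
            return t
    return None
```

Fed the per-window standard deviations, it stays silent while the noise floor holds its level and fires shortly after the floor steps up to a new regime.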

The health assessment fuses four evidence sources into a single confidence score:

  • Volatility — window standard deviation exceeding the 1.63ms baseline (the dominant signal, weighted 1.0)
  • Tail latency — window range exceeding the 7.6ms baseline (how wide the extremes swing)
  • Distribution shape — excess kurtosis rising above zero (outlier frequency increasing; weighted 0.5 because it is impulsive, not progressive)
  • Consistency — coefficient of variation exceeding the 3.6% baseline (reliability dropping)

The system learns the server’s baseline automatically during the first few windows. After that, each source accumulates evidence independently. The combined conviction determines the health state (Healthy, Monitor, Degraded, Critical) and the recommended action, and the drivers table reports both the intensity and persistence of each source.
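A toy sketch of the fusion step — each fingerprint is a dict of per-window stats (stdev, range, cv, excess kurtosis). The weights follow the text (volatility 1.0, shape 0.5); the baseline-learning window, the conviction cut-offs, and the omission of persistence tracking are all simplifying assumptions:

```python
def assess_health(fingerprints, learn=3):
    """Fuse four evidence sources into a per-window health state.

    Baselines are the mean of the first `learn` windows; each source's
    evidence is its fractional excursion above baseline. Thresholds mapping
    conviction to a state are illustrative, not the flow's values.
    """
    base_sd = sum(f["stdev"] for f in fingerprints[:learn]) / learn
    base_rng = sum(f["range"] for f in fingerprints[:learn]) / learn
    base_cv = sum(f["cv"] for f in fingerprints[:learn]) / learn
    weights = {"volatility": 1.0, "tail": 1.0, "shape": 0.5, "consistency": 1.0}
    states = []
    for f in fingerprints[learn:]:
        evidence = {
            "volatility": max(0.0, f["stdev"] / base_sd - 1.0),
            "tail": max(0.0, f["range"] / base_rng - 1.0),
            "shape": max(0.0, f["kurtosis"]),   # excess kurtosis above zero
            "consistency": max(0.0, f["cv"] / base_cv - 1.0),
        }
        conviction = sum(weights[k] * v for k, v in evidence.items())
        if conviction < 0.2:
            state = "Healthy"
        elif conviction < 0.6:
            state = "Monitor"
        elif conviction < 1.2:
            state = "Degraded"
        else:
            state = "Critical"
        states.append((state, conviction))
    return states
```

The key property is that no single source has to cross an alarm threshold on its own: several mild excursions, weighted and summed, are enough to escalate the state.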


Why Tumbling Windows?

Exponential smoothing adapts — it tracks the signal and decays old information. That is ideal for real-time alerting on individual samples. But for detecting structural change, you need statistics that do not adapt.

A 4-hour tumbling window holds 48 five-minute readings, and the statistic it reports is the exact standard deviation of those 48 readings — no weighting, no decay. When that number changes from window to window, the change is real. The standard deviation of Window 20 is the true standard deviation of those 4 hours of latency. When it rises from 1.63 to 1.94, a genuine structural shift has occurred — not an artefact of exponential forgetting.
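The contrast is easy to demonstrate. In this toy illustration (arbitrary `alpha`, not the flow's parameters), an exponentially weighted variance still carries the memory of an earlier noisy window into a perfectly steady one, while the exact tumbling-window statistic reads zero:

```python
import statistics

def ew_variance(values, alpha=0.05):
    """Exponentially weighted variance: old windows keep leaking in."""
    mean, var = values[0], 0.0
    for x in values[1:]:
        d = x - mean
        mean += alpha * d
        var = (1 - alpha) * (var + alpha * d * d)
    return var

burst = [0.0, 10.0] * 24   # one noisy 4-hour window (48 samples)
calm = [5.0] * 48          # one perfectly steady window

exact = statistics.pstdev(calm)     # exact stddev of the calm window: 0.0
ew = ew_variance(burst + calm)      # still elevated by the burst's memory
```

The exponentially weighted estimate is the right tool for reacting to individual samples; the exact window statistic is the right tool for asking whether this 4-hour slice truly differs from the last one.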


References

  • Ahmad, S., Lavin, A., Purdy, S. & Agha, Z. (2017). Unsupervised real-time anomaly detection for streaming data. Neurocomputing, 262, 134–147. doi:10.1016/j.neucom.2017.04.070 
  • Page, E.S. (1954). Continuous inspection schemes. Biometrika, 41(1/2), 100–115. doi:10.2307/2333009 
  • Dataset: Numenta Anomaly Benchmark (NAB) realKnownCause/ec2_request_latency_system_failure.csv. 4,032 samples at 5-minute intervals, 14 days. Server in Amazon’s East Coast datacenter; documented system failure from AWS API outage.
