Detecting Sensor Freeze
A flow meter reads 198.47 litres per minute. Then 198.47 again. And again. Every reading is individually valid — within range, within noise tolerance. But the sensor stopped measuring several minutes ago. It is reporting the same stale value because its mechanism is stuck.
Sensor freeze is one of the most insidious data quality failures. It does not trigger range alarms, does not produce errors, and does not look obviously wrong. Meanwhile, every downstream decision that depends on the reading (mass balance, leak detection, process control) is silently corrupted.
This recipe builds a flow that continuously learns what “normal variability” looks like for a signal, and uses that knowledge to detect when variability disappears.
Learns, adapts, preserves.
A long-term esStats node learns the signal’s normal variability. A dynamic threshold adapts from that learned baseline — no hardcoded values, works for any signal. A controller trigger preserves the learned state when a freeze is confirmed — anomalous data cannot corrupt what the system already learned.
flow( 'sensor-freeze-detector' )
  .sanitize( 'sane', 'flowRate', { failureReason: 'failReason' },
    { ranges: { flowRate: { min: 0, max: 500 } } } )
  .median3( 'm3', 'flowRate', { median3: 'm3' } )
  .esStats( 'stats', 'm3',
    { mean: 'mean', stdev: 'sigma' },
    { halfLife: 3 } )
  .esStats( 'baseline', 'm3',
    { stdev: 'baselineSigma' },
    { halfLife: 60 } )
  .threshold( 'thr', 'sigma', { active: 'sigmaLow' },
    { mode: 'below',
      threshold: ( msg ) => msg.baselineSigma * 0.10,
      hysteresis: ( msg ) => msg.baselineSigma * 0.12 } )
  .persistenceCheck( 'confirm',
    ( msg ) => msg.sigmaLow === true,
    { persistenceConfirmed: 'confirmedFreeze' },
    { minVotes: 4, outOfTotal: 6,
      triggers: [ { control: 'pause', targets: [ 'baseline' ] } ] } )
  .run()

Drag the slider and watch sigma collapse the moment the sensor stops changing, then watch the adaptive threshold catch it.
What You’re Seeing
The gray line is the raw flow rate — noisy readings around 200 L/min. The cyan line is the exponentially smoothed mean, tracking the underlying signal.
The violet dashed curve on the right axis is sigma — the running standard deviation. During normal operation, sigma fluctuates around 2–3. Between samples 150 and 200, a quiet period reduces the noise, and sigma dips — but stays well above the detection threshold. The detector does not fire.
Two amber dashed lines on the right axis show the adaptive detection boundaries. These are not hardcoded: they are computed as fractions of the long-term baseline sigma, 10% for the threshold and 22% for the reset point (the 10% threshold plus the 12% hysteresis band). Watch them converge during the first 50–60 samples as the baseline stabilises.
At sample 200, the sensor freezes. Sigma decays exponentially toward zero and crosses the lower amber line. The rose shaded region marks where sigma is below threshold. The rose vertical line marks the moment the persistence check confirms the freeze: at least 4 of the 6 most recent readings showed sigma below threshold. At confirmation, a controller trigger pauses the baseline to prevent the frozen data from dragging the reference down.
After sample 320, the sensor recovers. Sigma climbs back above the upper amber line (the hysteresis reset point), and the detector clears.
Where This Pattern Fits
| Domain | What freezes | Why it is invisible |
|---|---|---|
| Water treatment | pH sensor stuck on last reading | Readings look normal — pH changes slowly |
| HVAC | Temperature sensor on a frozen output | The building management system trusts the value and stops heating |
| Oil and gas | Flow transmitter electronics lock up | Constant flow looks like stable operation |
| Meteorology | Wind sensor bearings seize | Zero variation reads as calm wind, not a broken anemometer |
| Manufacturing | Pressure transducer membrane stuck | Constant pressure looks like process stability |
How It Works
Two esStats nodes run in parallel on the same median-filtered input. The short-term node (halfLife=3) computes sigma — the fast-responding standard deviation that collapses during a freeze. The long-term baseline (halfLife=60) establishes what sigma normally looks like for this signal. The baseline converges slowly and is deliberately insensitive to short disruptions.
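To make the fast/slow pairing concrete, here is a minimal sketch in plain JavaScript. It is not the library's esStats implementation: `makeEwStats` is a hypothetical helper, and it assumes that `halfLife` is counted in samples, so that a sample's weight halves every `halfLife` updates. The input is a synthetic signal (a sine wave around 200 that freezes at 198.47), chosen only so the example is deterministic.

```javascript
// Hypothetical helper, NOT the library's esStats node.
// Assumption: halfLife is in samples, so alpha = 1 - 0.5^(1 / halfLife).
function makeEwStats(halfLife) {
  const alpha = 1 - Math.pow(0.5, 1 / halfLife);
  let mean = null;
  let variance = 0;
  return {
    update(x) {
      if (mean === null) {
        mean = x; // the first sample seeds the mean; variance starts at 0
      } else {
        const delta = x - mean;
        mean += alpha * delta;
        // Standard exponentially weighted variance recurrence.
        variance = (1 - alpha) * (variance + alpha * delta * delta);
      }
      return { mean, stdev: Math.sqrt(variance) };
    },
  };
}

const fast = makeEwStats(3);  // plays the role of the short-term 'stats' node
const slow = makeEwStats(60); // plays the role of the long-term 'baseline' node

let fastOut;
let slowOut;
for (let i = 0; i < 230; i++) {
  // 200 varying samples, then 30 identical (frozen) readings.
  const x = i < 200 ? 200 + 2.5 * Math.sin(i * 2.3) : 198.47;
  fastOut = fast.update(x);
  slowOut = slow.update(x);
}

// After 30 frozen samples the fast sigma has collapsed,
// while the slow sigma has barely started to decay.
console.log('fast sigma:', fastOut.stdev.toFixed(3));
console.log('slow sigma:', slowOut.stdev.toFixed(3));
```

The gap between the two printed values is what the threshold node exploits: the fast estimate reacts within a handful of samples, while the slow one still remembers normal operation.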
The threshold node uses dynamic options: functions that receive the current message and return a value. The threshold is set to 10% of the baseline sigma and the hysteresis to 12%, which puts the reset point at 22% (threshold plus hysteresis). With a typical baseline of 2.5, the threshold sits at about 0.25 and the reset point at about 0.55. These values adapt automatically to the signal, with no manual calibration.
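The arithmetic above can be checked with a small stand-alone sketch. `makeBelowThreshold` is a hypothetical stand-in, not the library's threshold node, and it assumes the semantics implied by the numbers in this recipe: the flag turns on when the value drops below the threshold and turns off only once the value rises above threshold plus hysteresis.

```javascript
// Hypothetical stand-in for a 'below'-mode threshold with hysteresis.
// Assumption: reset point = threshold + hysteresis.
function makeBelowThreshold(thresholdFn, hysteresisFn) {
  let active = false;
  return function check(msg, value) {
    const threshold = thresholdFn(msg);
    const resetPoint = threshold + hysteresisFn(msg);
    if (!active && value < threshold) {
      active = true;  // value fell below the detection threshold
    } else if (active && value > resetPoint) {
      active = false; // value recovered past the reset point
    }
    return { active, threshold, resetPoint };
  };
}

// Same dynamic options as in the recipe: functions of the current message.
const check = makeBelowThreshold(
  (msg) => msg.baselineSigma * 0.10,
  (msg) => msg.baselineSigma * 0.12
);

const msg = { baselineSigma: 2.5 }; // threshold 0.25, reset point about 0.55
const sigmas = [2.4, 0.3, 0.2, 0.4, 0.5, 0.6, 2.0];
const trace = sigmas.map((sigma) => check(msg, sigma).active);

console.log(trace);
// → [ false, false, true, true, true, false, false ]
```

Note that 0.4 and 0.5 keep the flag active even though they are above the threshold: they sit inside the hysteresis band, which is what stops the detector from chattering when sigma hovers near the boundary.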
Why 10%? Any estimate computed from a small number of samples fluctuates. The shorter the half-life, the fewer samples contribute, and the wider sigma swings during normal operation. Statistics gives us a precise floor: with halfLife=3, sigma is effectively computed from about 6 samples, and even in a 1-in-1000 unlucky stretch it will not dip below roughly 33% of its true value. Setting the threshold at 10% — well below that natural floor — makes a false alarm virtually impossible even before the persistence check adds its own safety margin. The multiplier is not a guess; it follows from the half-life.
When the persistence check confirms a freeze (4 of 6), it fires a controller trigger that pauses the baseline esStats node. Pausing stops the update function while keeping the last-known baseline visible downstream. This prevents the frozen data — with its collapsing variance — from dragging the baseline down, which would lower the threshold and delay recovery detection.
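The confirm-then-pause behaviour can be sketched as follows. Both helpers are hypothetical illustrations, not the library's persistenceCheck or controller: `makePersistence` assumes a sliding window of the last `outOfTotal` flags, and `makePausable` assumes that a paused node skips its update and keeps returning its last value. The baseline here is a toy smoother over sigma values, used only to show the effect of pausing.

```javascript
// Hypothetical m-of-n vote counter over a sliding window of flags.
function makePersistence(minVotes, outOfTotal) {
  const recent = [];
  return function vote(flag) {
    recent.push(flag ? 1 : 0);
    if (recent.length > outOfTotal) recent.shift();
    const votes = recent.reduce((sum, v) => sum + v, 0);
    return votes >= minVotes;
  };
}

// Hypothetical pausable wrapper: while paused, the update function is
// skipped and the last-known value is returned unchanged.
function makePausable(updateFn, initial) {
  let paused = false;
  let last = initial;
  return {
    pause() { paused = true; },
    update(x) {
      if (!paused) last = updateFn(last, x);
      return last;
    },
  };
}

const vote = makePersistence(4, 6);
// Toy baseline: slow exponential smoothing of the incoming sigma.
const baseline = makePausable((prev, x) => prev + 0.1 * (x - prev), 2.5);

const sigmas   = [2.5, 0.2, 0.1, 0.3, 0.1, 0.05, 0.01];
const sigmaLow = [false, true, true, false, true, true, true];

const confirmed = [];
const baselineHistory = [];
for (let i = 0; i < sigmas.length; i++) {
  baselineHistory.push(baseline.update(sigmas[i]));
  const isConfirmed = vote(sigmaLow[i]);
  confirmed.push(isConfirmed);
  if (isConfirmed) baseline.pause(); // the controller trigger
}

console.log(confirmed);
// → [ false, false, false, false, false, true, true ]
console.log(baselineHistory[6] === baselineHistory[5]);
// → true (the baseline stopped moving once the freeze was confirmed)
```

The fourth low vote arrives on the sixth reading, so a single above-threshold reading in between does not reset the count. From that point on the baseline holds its value instead of following the collapsed sigma down.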
During the quiet period in this demo (samples 150–200), sigma drops but stays well above the adaptive threshold — the detector does not fire. This is the specificity test: reduced noise is not the same as no noise.
References
- Welford, B.P. (1962). Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3), 419–420. doi:10.1080/00401706.1962.10490022
- Sharma, A.B., Golubchik, L. & Govindan, R. (2010). Sensor faults: Detection methods and prevalence in real-world datasets. ACM Transactions on Sensor Networks, 6(3), 1–39. doi:10.1145/1754414.1754419
Next Steps
- Detecting Gradual Drift — the complementary recipe for slow, invisible drift using fast/slow esMean crossover
- Detecting Sudden Shifts — abrupt step changes detected by Kalman innovation gating
- Under the Hood — understand what happens inside the pipeline