Subsampling Noisy Signals with a Deadband
A vibration sensor sampling at 20 kHz produces 1.7 billion readings a day. Most of those readings carry no new information — the signal is sitting on its noise floor. The trick is keeping the samples that depart from the noise and dropping the ones that don’t, without having to pick a threshold by hand for every signal in the plant.
The first thing most people reach for is fixed-rate subsampling: keep every Nth sample, drop the rest. It takes one node and zero tuning, which is exactly why it’s so common. Here’s what that looks like on a test signal with a quiet baseline, three brief pulses, a slow ramp, and a noisier settling regime.
```javascript
flow('fixed-rate-subsample')
  .passIf('every5', (msg, counter) => (msg.index % 5) === 0)
  .run()
```

Drag the slider and watch the reconstruction. Fixed-rate keeps the same 20% of samples regardless of what the signal is doing.
Fixed-rate is blind to where the events actually are. It skips straight through the pulses (three events of five samples each are almost entirely thrown away), and it keeps samples in the quiet baseline just as densely as in the noisy regime. The reconstruction cannot recover what the gate discarded.
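To make the failure concrete, here is a minimal plain-JavaScript sketch of the same every-Nth rule, run outside the flow library. The 30-sample signal and the triangular pulse shape are hypothetical stand-ins for the demo data, chosen only to show how a short pulse interacts with a stride of five.

```javascript
// Hypothetical test signal: flat baseline at 0 with one triangular
// five-sample pulse occupying indices 11..15 (peak of 10 at index 13).
const signal = new Array(30).fill(0);
const pulse = [2, 6, 10, 6, 2];
pulse.forEach((v, i) => { signal[11 + i] = v; });

// Keep every Nth sample, the same rule as (msg.index % 5) === 0.
function fixedRate(values, n) {
  return values
    .map((value, index) => ({ index, value }))
    .filter((s) => s.index % n === 0);
}

const kept = fixedRate(signal, 5);
const keptInPulse = kept.filter((s) => s.index >= 11 && s.index <= 15);
const peakSeen = Math.max(...kept.map((s) => s.value));

console.log('kept samples:', kept.length);
console.log('kept inside the pulse:', keptInPulse);
console.log('largest kept value:', peakSeen, 'true peak:', Math.max(...signal));
```

In this sketch the only kept sample inside the pulse sits on its trailing edge, so a consumer of the kept points would see a bump of 2 where the raw signal reached 10.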
The fix is an adaptive deadband. Run an esStats node to estimate
the local noise, then use passIf with a predicate that keeps the
sample only when its deviation from the running mean exceeds K times
the local stdev. The threshold breathes with the signal — tight when
quiet, wide when noisy — and the gate concentrates its budget where
the information actually is.
```javascript
flow('subsample-deadband')
  .sanitize('sane', 'value', { failureReason: 'failReason' },
    { ranges: { value: { min: -50, max: 50 } } })
  // Absorb single-sample spikes before they reach the statistics.
  .median3('m3', 'value', { median3: 'smoothed' })
  // Running mean and stdev of the smoothed stream, half-life of 50 samples.
  .esStats('stats', 'smoothed',
    { mean: 'mean', stdev: 'stdev' },
    { halfLife: 50 })
  // Keep on the heartbeat (every tenth sample) OR outside the deadband.
  .passIf('keep', (msg, counter) =>
    ((msg.index % 10) === 0) ||
    (Math.abs(msg.value - msg.mean) > 1.25 * msg.stdev))
  .run()
```

Drag the slider again and compare: the kept-sample dots now cluster on the pulses, on the top of the ramp, and on the larger excursions in the noisy regime. On the quiet baseline, only the sparse heartbeat samples remain.
What You’re Seeing
Both demos run over the same four-region signal: a quiet baseline, three brief pulses, a slow upward ramp, and a noisier settling regime at a new level. The slate line is the raw signal; the orange line is the linear-interpolation reconstruction between consecutive kept samples — the signal a downstream consumer would see if it only stored the amber-dot points.
In the fixed-rate demo, the kept dots sit at msg.index % 5 === 0
— a regular lattice across the entire signal. Each of the three pulses
catches at most one kept sample (and sometimes zero), so the
reconstruction misses the pulse heights entirely. In the quiet baseline
region the gate keeps 20% of samples even though almost none of them
carry information worth storing.
In the adaptive demo, the cyan line is the running mean from
esStats and the dashed violet lines are the adaptive deadband at
mean ± 1.25σ. The passIf gate combines two conditions — a heartbeat
that keeps every tenth sample regardless of the signal, plus the
adaptive deadband that keeps any sample whose deviation exceeds K
times the local stdev. In the quiet baseline region only the
heartbeat fires — a sparse lattice of dots that guarantees a minimum
sampling rate. The pulses trip the adaptive gate for each excursion;
the ramp produces a scattering as the lagging mean catches up; and in
the noisy regime the stdev expands to match the new noise floor, so
the dots scatter on the larger excursions at a roughly constant rate.
The reconstruction tracks the raw signal closely in every region —
the subsampling budget has moved to where the information is.
The KPI card under each chart shows the compression ratio for the current slider position: the fraction of the samples seen so far that were dropped.
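The two quantities the charts display can be sketched in a few lines of plain JavaScript. This is an illustration rather than the demo's own code: `reconstruct` and `droppedFraction` are hypothetical helper names, and the sketch assumes the compression figure is the dropped fraction of samples seen so far.

```javascript
// Linear interpolation between consecutive kept samples.
// kept: array of { index, value } sorted by strictly increasing index.
// Positions before the first or after the last kept sample stay NaN,
// because there is nothing to interpolate between.
function reconstruct(kept, length) {
  const out = new Array(length).fill(NaN);
  for (let k = 0; k + 1 < kept.length; k++) {
    const a = kept[k];
    const b = kept[k + 1];
    for (let i = a.index; i <= b.index; i++) {
      const t = (i - a.index) / (b.index - a.index);
      out[i] = a.value + t * (b.value - a.value);
    }
  }
  return out;
}

// Fraction of the samples seen so far that the gate dropped.
function droppedFraction(keptCount, seenCount) {
  return seenCount === 0 ? 0 : 1 - keptCount / seenCount;
}

const keptPoints = [
  { index: 0, value: 0 },
  { index: 4, value: 8 },
  { index: 6, value: 8 },
];
const rebuilt = reconstruct(keptPoints, 8);
console.log(rebuilt);
console.log(droppedFraction(keptPoints.length, 8));
```

Anything the gate dropped between two kept points is replaced by a straight line, which is why a pulse with no kept sample near its peak vanishes from the reconstruction.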
Where This Pattern Fits
| Domain | What you’re keeping | Why a fixed threshold fails |
|---|---|---|
| Vibration monitoring | Excursions above the running noise floor | Bearing wear changes the noise floor over time |
| Pressure logging | Pressure bumps from valve actions | The baseline drifts with temperature |
| Power quality | Voltage sags below local nominal | Nominal voltage changes by region and season |
| Process tags | Setpoint deviations | Recipes change the setpoint daily |
| Battery monitoring | Step changes during load events | Resting voltage drifts with state of charge |
How It Works
The esStats node maintains an exponentially weighted running mean
and stdev. With halfLife: 50, each sample’s influence halves every
50 steps — the estimate tracks the most recent few hundred samples,
long enough to be stable, short enough to follow real changes in the
signal. The first ~50 samples are warmup; the deadband settles
after that.
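One common way to build statistics like these is an exponentially weighted recursion, sketched below in plain JavaScript. The internals of esStats are not shown in this article, so treat the update rule, the seeding with the first sample, and the names `alphaFromHalfLife` and `makeEwStats` as assumptions made for illustration.

```javascript
// Convert a half-life (in samples) into a per-sample smoothing weight,
// chosen so that (1 - alpha) ** halfLife === 0.5.
function alphaFromHalfLife(halfLife) {
  return 1 - Math.pow(2, -1 / halfLife);
}

// Returns an update function that tracks an exponentially weighted
// mean and standard deviation (assumed recursion, not the library's).
function makeEwStats(halfLife) {
  const alpha = alphaFromHalfLife(halfLife);
  let mean = null;
  let variance = 0;
  return function update(x) {
    if (mean === null) {
      mean = x; // seed with the first sample; stdev starts at 0
    } else {
      const delta = x - mean;
      mean += alpha * delta;
      variance = (1 - alpha) * (variance + alpha * delta * delta);
    }
    return { mean, stdev: Math.sqrt(variance) };
  };
}

// A unit step: after one half-life the mean has closed half of the gap.
const update = makeEwStats(50);
update(0);
let state;
for (let i = 0; i < 50; i++) state = update(1);
console.log(state.mean.toFixed(3), state.stdev.toFixed(3));
```

With this recursion a level change is half absorbed after 50 samples, which matches the warmup behaviour described above: the estimates are usable after roughly one half-life and keep settling for a few more.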
The passIf predicate combines two conditions with OR. The first is
a heartbeat — msg.index % 10 === 0 — that keeps every tenth
sample regardless of whether the adaptive gate has fired. This
guarantees a minimum sampling rate during long quiet periods, where
downstream consumers would otherwise see nothing, and as a side
benefit it produces visible kept dots during the first 50 samples
while esStats is still warming up.
The second is the adaptive deadband — |value - mean| > K * stdev
— the gate that keeps any sample whose deviation from the running mean
exceeds K times the local stdev. Note the asymmetry: esStats reads
from the median-smoothed stream so its stats are noise-protected, but
the gate itself tests the raw value — the same signal the chart
draws — so the reader’s eye and the algorithm see the same thing.
Because the predicate is multiplicative on stdev — not additive on
the value itself — the threshold scales with whatever the local
noise happens to be, with no absolute tolerance to pick per signal.
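The predicate can be pulled out into a standalone function to see the scaling at work. This sketch mirrors the passIf arrow function from the flow above; the name `keepSample` and the default arguments are illustrative, and the message fields (`index`, `value`, `mean`, `stdev`) are the ones the flow already uses.

```javascript
// Heartbeat OR adaptive deadband, as a plain function.
function keepSample(msg, k = 1.25, heartbeatEvery = 10) {
  const heartbeat = msg.index % heartbeatEvery === 0;
  const outsideDeadband = Math.abs(msg.value - msg.mean) > k * msg.stdev;
  return heartbeat || outsideDeadband;
}

// The same 0.3 deviation is an event on a quiet signal...
const quiet = keepSample({ index: 3, value: 0.3, mean: 0, stdev: 0.1 });
// ...and ordinary noise on a noisy one.
const noisy = keepSample({ index: 3, value: 0.3, mean: 0, stdev: 1.0 });
// The heartbeat fires even when the sample sits exactly on the mean.
const beat = keepSample({ index: 20, value: 0, mean: 0, stdev: 1.0 });

console.log(quiet, noisy, beat);
```

Nothing in the function mentions an absolute tolerance: the only tunables are K and the heartbeat interval, and both are unitless.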
K = 1.25 is the starting point used here. Under a Gaussian baseline, samples lying more than 1.25σ from the mean account for roughly 21% of the population, so the deadband term on its own drops about 79% of samples during quiet operation. Tighter K (closer to 1.0) keeps more; wider K (closer to 2.0) keeps fewer. K itself is data-dependent: the value here was chosen for the four-region synthetic signal the demo runs on, so sweep K on your own representative data before deploying.
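The Gaussian figure is easy to check numerically. The sketch below uses a standard polynomial approximation to the complementary error function (Abramowitz and Stegun 7.1.26, accurate to about 1e-7), and also folds in the every-tenth heartbeat under the simplifying assumption that it fires independently of the deadband. The helper names are illustrative. Real kept fractions can differ, since the flow estimates stdev from the median-smoothed stream while the gate tests the raw value.

```javascript
// Complementary error function for x >= 0 (Abramowitz & Stegun 7.1.26).
function erfc(x) {
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  return poly * Math.exp(-x * x);
}

// Two-sided Gaussian tail: fraction of samples with |z| > k.
function tailFraction(k) {
  return erfc(k / Math.SQRT2);
}

// Heartbeat OR deadband, assuming the two fire independently.
function expectedKeepFraction(k, heartbeatEvery) {
  const h = 1 / heartbeatEvery;
  return h + (1 - h) * tailFraction(k);
}

for (const k of [1.0, 1.25, 1.5, 2.0]) {
  console.log(
    'K =', k,
    'deadband keeps', tailFraction(k).toFixed(3),
    'with heartbeat', expectedKeepFraction(k, 10).toFixed(3)
  );
}
```

Under these idealized assumptions the combined gate at K = 1.25 keeps a little under 30% of Gaussian samples, which is a useful sanity bound when sweeping K on real data.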
The median3 node sits between sanitize and esStats to absorb
single-sample spikes. Without it, one bad reading would inflate the
running stdev and quietly widen the deadband for many samples
afterwards. It’s a standard hygiene step at the head of any pipeline
that reads from a noisy sensor.
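A three-point median is small enough to sketch in full. This version uses a trailing window and passes samples through unchanged until three are available; how the library's median3 node handles its first two samples is not stated here, so that start-up behaviour and the name `makeMedian3` are assumptions.

```javascript
// Trailing median-of-three filter (illustrative, not the library node).
function makeMedian3() {
  const recent = [];
  return function push(x) {
    recent.push(x);
    if (recent.length > 3) recent.shift();
    if (recent.length < 3) return x; // not enough history yet
    const [a, b, c] = recent;
    // Median of three without sorting.
    return Math.max(Math.min(a, b), Math.min(Math.max(a, b), c));
  };
}

// A single bad reading of 50 never reaches the output.
const spikeFilter = makeMedian3();
const spiky = [1, 1, 50, 1, 1].map(spikeFilter);

// A genuine trend passes through with a one-sample lag.
const rampFilter = makeMedian3();
const ramp = [1, 2, 3, 4].map(rampFilter);

console.log(spiky, ramp);
```

Because the spike is removed before the statistics see it, the running stdev stays at the true noise level and the deadband does not widen.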
References
- Welford, B.P. (1962). Note on a method for calculating corrected sums of squares and products. Technometrics, 4(3), 419–420. doi:10.1080/00401706.1962.10490022
- Bristol, E.H. (1990). Swinging Door Trending: Adaptive Trend Recording? ISA National Conference Proceedings, pp. 749–756.
Next Steps
- Trajectory-Aware Adaptive Compression — the eight-node version that adds Kalman prediction, trend awareness, and a controller-driven gate
- Detecting Sudden Shifts — when the goal is detecting the events instead of keeping the samples that mark them
- Under the Hood — understand how messages flow through composed nodes