Subsampling noisy signals with a deadband
A vibration sensor sampling at 20 kHz produces 1.7 billion readings a day. Most of those readings carry no new information — the signal is sitting on its noise floor. The trick is keeping the samples that depart from the noise and dropping the ones that don't, without having to pick a threshold by hand for every signal in the plant.
The usual first attempt is fixed-rate subsampling: keep every Nth sample, drop the rest. One node, zero tuning, which is exactly why it’s so common. Here’s what that looks like on a test signal with a quiet baseline, three brief pulses, a slow ramp, and a noisier settling regime.
flow('fixed-rate-subsample')
  .passIf('every5', (msg, counter) => (msg.index % 5) === 0)
  .run()

Drag the slider and watch the reconstruction. Fixed-rate keeps the same 20% of samples regardless of what the signal is doing.
Fixed-rate is blind to where the events actually are. It steps straight over the pulses, so three events of five samples each are almost entirely thrown away, and it keeps samples in the quiet baseline just as densely as it does in the noisy regime. The reconstruction cannot recover what the gate discarded.
The fix is an adaptive deadband. Run an esStats node to estimate
the local noise, then use passIf with a predicate that keeps the
sample only when its deviation from the running mean exceeds K times
the local stdev. The threshold breathes with the signal — tight when
quiet, wide when noisy — and the gate concentrates its budget where
the information actually is.
flow('subsample-deadband')
  .sanitize('sane', 'value', { failureReason: 'failReason' },
    { ranges: { value: { min: -50, max: 50 } } })
  .median3('m3', 'value', { median3: 'smoothed' })
  .esStats('stats', 'smoothed',
    { mean: 'mean', stdev: 'stdev' },
    { halfLife: 50 })
  .passIf('keep', (msg, counter) =>
    ((msg.index % 5) === 0) ||
    (Math.abs(msg.value - msg.mean) > 1.5 * msg.stdev))
  .run()

Drag the slider again and compare: the kept-sample dots now cluster on the pulses, on the top of the ramp, and on the larger excursions in the noisy regime. In the quiet baseline, only the every-fifth heartbeat and the occasional noise-tail sample survive.
What You’re Seeing
Both demos run over the same four-region signal: a quiet baseline, three brief pulses, a slow upward ramp, and a noisier settling regime at a new level. The slate line is the raw signal; the orange line is the linear-interpolation reconstruction between consecutive kept samples — the signal a downstream consumer would see if it only stored the amber-dot points.
In the fixed-rate demo, the kept dots sit at msg.index % 5 === 0
— a regular lattice across the entire signal. Each of the three pulses
catches at most one kept sample (and sometimes zero), so the
reconstruction misses the pulse heights entirely. In the quiet baseline
region the gate keeps 20% of samples even though almost none of them
carry information worth storing.
In the adaptive demo, the cyan line is the running mean from
esStats and the dashed violet lines are the adaptive deadband at
mean ± 1.5σ. The passIf gate combines two conditions — a heartbeat
that keeps every fifth sample (the same base rate as fixed-rate), plus
the adaptive deadband that keeps any sample whose deviation exceeds K
times the local stdev. The deadband is a strict superset of
fixed-rate — same lattice in quiet regions, plus event-concentrated
extras. The pulses trip the adaptive gate for each excursion; the ramp
produces a scattering as the lagging mean catches up; and in the noisy
regime the stdev expands to match the new noise floor, so the dots
scatter on the larger excursions.
Reading the KPI strip
Each chart shows three metrics. The two error metrics are measured only over the skipped points: the ones the algorithm had to reconstruct, not the ones it stored exactly.
- Compression: the percentage of samples dropped. Higher means less stored.
- Max Error: the worst single-point reconstruction deviation. This typically occurs at event boundaries, in the gap between the last quiet sample and the first event sample, where the reconstruction ramps linearly but the actual signal steps. The deadband improves on fixed-rate here because its adaptive extras provide closer interpolation anchors near events.
- RMS Error: the root-mean-square reconstruction error across all skipped points, a measure of typical rather than worst-case deviation. The deadband’s extras pull the reconstruction closer to the signal in event regions, reducing the overall figure.
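The three metrics above can be sketched in plain JavaScript. This is a minimal illustration rather than the demo's own code: `reconstructionMetrics` is a hypothetical helper, the toy signal is invented for the example, and points before the first or after the last kept sample are simply not scored.

```javascript
// Hypothetical helper: scores a subsampling gate by linear-interpolation
// reconstruction. `values` is the raw signal; `kept` is a parallel boolean array.
function reconstructionMetrics(values, kept) {
  const keptIdx = [];
  for (let i = 0; i < values.length; i++) if (kept[i]) keptIdx.push(i);

  let maxError = 0;
  let sumSq = 0;
  let skipped = 0;

  // Walk each gap between consecutive kept samples and score only the
  // skipped points inside it.
  for (let k = 0; k + 1 < keptIdx.length; k++) {
    const a = keptIdx[k];
    const b = keptIdx[k + 1];
    for (let i = a + 1; i < b; i++) {
      const t = (i - a) / (b - a);
      const recon = values[a] + t * (values[b] - values[a]);
      const err = Math.abs(values[i] - recon);
      maxError = Math.max(maxError, err);
      sumSq += err * err;
      skipped++;
    }
  }

  return {
    compression: 100 * (1 - keptIdx.length / values.length), // percent dropped
    maxError,
    rmsError: skipped > 0 ? Math.sqrt(sumSq / skipped) : 0,
  };
}

// Toy signal (assumption): flat at 0 with one 5-sample pulse of height 10
// at indices 6..10, gated by the fixed-rate rule index % 5 === 0.
const signal = Array.from({ length: 21 }, (_, i) => (i >= 6 && i <= 10 ? 10 : 0));
const fixedRate = signal.map((_, i) => i % 5 === 0);
const metrics = reconstructionMetrics(signal, fixedRate);
console.log(metrics);
```

On this toy signal the lattice catches only the pulse's last sample, so the reconstruction ramps where the signal steps and the worst skipped point is off by 8 of the pulse's 10 units.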
The deadband keeps more samples than fixed-rate (lower compression) but places them where they reduce reconstruction error. The trade-off is deliberate: spend storage budget on event fidelity, not quiet-region redundancy. For signals with sharp structural breaks (step changes, regime shifts), the adaptive compression recipe adds a Kalman predictor and boundary anchoring to handle what the deadband cannot.
Where This Pattern Fits
| Domain | What you’re keeping | Why a fixed threshold fails |
|---|---|---|
| Vibration monitoring | Excursions above the running noise floor | Bearing wear changes the noise floor over time |
| Pressure logging | Pressure bumps from valve actions | The baseline drifts with temperature |
| Power quality | Voltage sags below local nominal | Nominal voltage changes by region and season |
| Process tags | Setpoint deviations | Recipes change the setpoint daily |
| Battery monitoring | Step changes during load events | Resting voltage drifts with state of charge |
How It Works
The esStats node maintains an exponentially weighted running mean
and stdev. With halfLife: 50, each sample’s influence halves every
50 steps — the estimate tracks the most recent few hundred samples,
long enough to be stable, short enough to follow real changes in the
signal. The first ~50 samples are warmup; the deadband settles
after that.
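To make the half-life concrete, here is a minimal sketch of an exponentially weighted mean and stdev tracker. It is not the esStats implementation: `makeEwStats` is a hypothetical name, and the update rule (a standard exponentially weighted variance recursion) is an assumption about how such a node could work.

```javascript
// Hypothetical stand-in for an exponentially weighted mean/stdev tracker.
// The update rule is an assumption, not taken from esStats itself.
function makeEwStats(halfLife) {
  // Choose alpha so that a sample's weight halves after `halfLife` steps:
  // (1 - alpha) ** halfLife === 0.5
  const alpha = 1 - Math.pow(0.5, 1 / halfLife);
  let mean = null;
  let variance = 0;

  return function update(x) {
    if (mean === null) {
      mean = x; // seed with the first sample
    } else {
      const delta = x - mean;
      mean += alpha * delta;
      variance = (1 - alpha) * (variance + alpha * delta * delta);
    }
    return { mean, stdev: Math.sqrt(variance) };
  };
}

// Step response: start at 0, then hold at 1 for exactly one half-life.
const update = makeEwStats(50);
let stats = update(0);
for (let i = 0; i < 50; i++) stats = update(1);
console.log(stats.mean); // close to 0.5: half of the old level's weight remains
```

After one half-life at the new level the running mean has covered half the distance, which is why roughly the first 50 samples behave as warmup.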
The passIf predicate combines two conditions with OR. The first is
a heartbeat — msg.index % 5 === 0 — that keeps every fifth
sample, matching the fixed-rate baseline. This makes the deadband a
strict superset of fixed-rate: identical coverage in quiet regions,
with adaptive extras layered on top during events.
The second is the adaptive deadband — |value - mean| > K * stdev
— the gate that keeps any sample whose deviation from the running mean
exceeds K times the local stdev. Note the asymmetry: esStats reads
from the median-smoothed stream so its stats are noise-protected, but
the gate itself tests the raw value — the same signal the chart
draws — so the reader’s eye and the algorithm see the same thing.
Because the predicate is multiplicative on stdev — not additive on
the value itself — the threshold scales with whatever the local
noise happens to be, with no absolute tolerance to pick per signal.
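The scaling behaviour can be checked outside the flow with a standalone version of the predicate. `keepSample` is a hypothetical helper written for this illustration; the field names follow the recipe, and the sample values are invented.

```javascript
// Hypothetical standalone version of the gate predicate. Field names follow
// the recipe (index, value, mean, stdev); K is a parameter so it can be swept.
function keepSample(msg, K = 1.5) {
  const heartbeat = msg.index % 5 === 0;
  const excursion = Math.abs(msg.value - msg.mean) > K * msg.stdev;
  return heartbeat || excursion;
}

// Same absolute deviation (0.4) under two different local noise levels.
const quiet = { index: 7, value: 10.4, mean: 10.0, stdev: 0.1 };
const noisy = { index: 7, value: 10.4, mean: 10.0, stdev: 0.5 };
console.log(keepSample(quiet)); // true:  0.4 exceeds 1.5 * 0.1
console.log(keepSample(noisy)); // false: 0.4 is inside 1.5 * 0.5
console.log(keepSample({ ...noisy, index: 10 })); // true: heartbeat index
```

The same 0.4 deviation is an event against a quiet background and ordinary noise against a loud one, with no per-signal tolerance involved.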
K = 1.5 is the starting point used here. Under a Gaussian baseline, samples lying more than 1.5σ from the mean account for roughly 13% of the population, so the adaptive gate fires on about one in eight samples during quiet operation. The every-fifth heartbeat already keeps 20%; if the gate fires independently of the heartbeat, it adds about 13% of the remaining 80%, for a total kept rate in quiet regions of around 31%. Tighter K (closer to 1.0) keeps more; wider K (closer to 2.0) keeps fewer. K itself is data-dependent: the value here was chosen for the four-region synthetic signal the demo runs on, so for other datasets, sweep K on representative data before deploying.
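As a rough guide when sweeping K, the Gaussian tail fraction can be computed directly. The sketch below assumes an ideal Gaussian baseline and uses the Abramowitz and Stegun 7.1.26 approximation of erf, since JavaScript has no built-in; `tailFraction` is a hypothetical helper name.

```javascript
// Abramowitz-Stegun 7.1.26 approximation of erf(x) for x >= 0
// (absolute error below about 1.5e-7).
function erf(x) {
  const t = 1 / (1 + 0.3275911 * x);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  return 1 - poly * Math.exp(-x * x);
}

// Fraction of a Gaussian population lying outside mean ± K * sigma.
function tailFraction(K) {
  return 1 - erf(K / Math.SQRT2);
}

for (const K of [1.0, 1.5, 2.0]) {
  console.log(`K = ${K.toFixed(1)}: ${(100 * tailFraction(K)).toFixed(1)}% outside the band`);
}
```

For K of 1.0, 1.5, and 2.0 this gives roughly 32%, 13%, and 5%. These are idealized figures; real sensor noise is rarely exactly Gaussian, so the measured rate on representative data is the number to trust.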
The median3 node sits between sanitize and esStats to absorb
single-sample spikes. Without it, one bad reading would inflate the
running stdev and quietly widen the deadband for many samples
afterwards. It’s a standard hygiene step at the head of any pipeline
that reads from a noisy sensor.
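The spike-absorbing effect is easy to see in isolation. This is a sketch only: `makeMedian3` is a hypothetical helper, and it assumes a trailing window over the current and two previous samples, which the recipe does not specify for the real median3 node.

```javascript
// Hypothetical median-of-three filter over the current and two previous
// samples. The trailing window is an assumption; the real node may differ.
function makeMedian3() {
  const recent = [];
  return function push(x) {
    recent.push(x);
    if (recent.length > 3) recent.shift();
    if (recent.length < 3) return x; // not enough history yet: pass through
    const sorted = [...recent].sort((a, b) => a - b);
    return sorted[1];
  };
}

const median3 = makeMedian3();
const raw = [1, 1, 100, 1, 1];
const smoothed = raw.map((x) => median3(x));
console.log(smoothed); // [1, 1, 1, 1, 1]: the single-sample spike never gets through
```

Because the spike never reaches the statistics stage, the running stdev stays at its quiet-region level instead of decaying back over the following samples.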
Next Steps
- Trajectory-Aware Adaptive Compression — when reconstruction quality matters: adds a Kalman predictor and boundary anchoring to cut max error by 3–4×
- Sudden Shifts — when the goal is detecting the events instead of keeping the samples that mark them
- Under the Hood — understand how messages flow through composed nodes