Recipe · subsampling

Subsampling noisy signals with a deadband

A vibration sensor sampling at 20 kHz produces 1.7 billion readings a day. Most of those readings carry no new information — the signal is sitting on its noise floor. The trick is keeping the samples that depart from the noise and dropping the ones that don't, without having to pick a threshold by hand for every signal in the plant.

The usual first choice is fixed-rate subsampling: keep every Nth sample, drop the rest. One node, zero tuning, which is exactly why it’s so common. Here’s what that looks like on a test signal with a quiet baseline, three brief pulses, a slow ramp, and a noisier settling regime.


flow('fixed-rate-subsample')
    // Keep samples whose index is a multiple of 5; drop the other four in five.
    .passIf('every5', (msg, counter) => (msg.index % 5) === 0)
    .run()

Drag the slider and watch the reconstruction. Fixed-rate keeps the same 20% of samples regardless of what the signal is doing.

Fixed-rate is blind to where the events actually are. It drops through the pulses — three events of five samples each are almost entirely thrown away — and it keeps samples in the quiet baseline just as densely as it does in the noisy regime. The reconstruction cannot recover what the gate discarded.
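
The lattice effect is easy to reproduce outside the flow runtime. The sketch below is plain JavaScript, not the library's API: the helper name `keepEveryNth`, the 30-sample toy signal, and the pulse position are all assumptions made for illustration.

```javascript
// Hypothetical helper: the indices a fixed-rate gate keeps (index % n === 0).
function keepEveryNth(length, n) {
  const kept = [];
  for (let i = 0; i < length; i += 1) {
    if (i % n === 0) kept.push(i);
  }
  return kept;
}

// Toy signal (assumed): flat at 0 with one five-sample pulse on indices 11..15.
const signal = Array.from({ length: 30 }, (_, i) => (i >= 11 && i <= 15 ? 10 : 0));
const kept = keepEveryNth(signal.length, 5);

// Which kept indices fall inside the pulse?
const keptInPulse = kept.filter((i) => signal[i] !== 0);

console.log(kept);        // indices 0, 5, 10, 15, 20, 25
console.log(keptInPulse); // index 15 only
```

Four of the five pulse samples are discarded, and the one survivor sits on the pulse's last sample purely because of where the lattice happens to fall.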

The fix is an adaptive deadband. Run an esStats node to estimate the local noise, then use passIf with a predicate that keeps a sample when its deviation from the running mean exceeds K times the local stdev, alongside an every-fifth heartbeat that keeps quiet stretches anchored. The threshold breathes with the signal (tight when quiet, wide when noisy), so the extra samples land where the information actually is.


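flow('subsample-deadband')
    // Range-check the raw value against [-50, 50].
    .sanitize('sane', 'value', { failureReason: 'failReason' },
        { ranges: { value: { min: -50, max: 50 } } })
    // Three-sample median absorbs single-sample spikes before the stats see them.
    .median3('m3', 'value', { median3: 'smoothed' })
    // Exponentially weighted running mean and stdev of the smoothed stream.
    .esStats('stats', 'smoothed',
        { mean: 'mean', stdev: 'stdev' },
        { halfLife: 50 })
    // Keep on the every-fifth heartbeat OR when the raw value sits more than
    // K = 1.5 stdevs from the running mean.
    .passIf('keep', (msg, counter) =>
        ((msg.index % 5) === 0) ||
        (Math.abs(msg.value - msg.mean) > 1.5 * msg.stdev))
    .run()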

Drag the slider again and compare: the kept-sample dots now cluster on the pulses, on the top of the ramp, and on the larger excursions in the noisy regime. The quiet baseline still gets its every-fifth heartbeat and the occasional noise excursion, but the extra samples go to the events.

What You’re Seeing

Both demos run over the same four-region signal: a quiet baseline, three brief pulses, a slow upward ramp, and a noisier settling regime at a new level. The slate line is the raw signal; the orange line is the linear-interpolation reconstruction between consecutive kept samples — the signal a downstream consumer would see if it only stored the amber-dot points.

In the fixed-rate demo, the kept dots sit at msg.index % 5 === 0, a regular lattice across the entire signal. Each five-sample pulse covers exactly one lattice point, so at best a single kept sample marks it, and unless that sample happens to sit on the peak the reconstruction misses the pulse height. In the quiet baseline region the gate keeps 20% of samples even though almost none of them carry information worth storing.

In the adaptive demo, the cyan line is the running mean from esStats and the dashed violet lines are the adaptive deadband at mean ± 1.5σ. The passIf gate combines two conditions: a heartbeat that keeps every fifth sample (the same base rate as fixed-rate), plus the adaptive deadband that keeps any sample whose deviation exceeds K times the local stdev, with K = 1.5 here. The kept set is therefore a strict superset of the fixed-rate set: the same lattice everywhere, plus extras that concentrate on events. Each pulse excursion trips the adaptive gate; the ramp produces a scattering as the lagging mean catches up; and in the noisy regime the stdev expands to match the new noise floor, so the dots scatter on the larger excursions.

Reading the KPI strip

Each chart shows three metrics. Compression counts every sample; the two error metrics are measured only over the skipped points, the ones the algorithm had to reconstruct rather than the ones it stored exactly.

Compression: the percentage of samples dropped. Higher means less stored.
Max Error: the worst single-point reconstruction deviation. It typically occurs at event boundaries, in the gap between the last quiet sample and the first event sample, where the reconstruction ramps linearly but the actual signal steps. The deadband improves on fixed-rate here because its adaptive extras provide closer interpolation anchors near events.
RMS Error: the root-mean-square reconstruction error across all skipped points. The deadband’s extras pull the reconstruction closer to the signal in event regions, which lowers the figure.
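
For readers who want to reproduce the three numbers on their own data, here is a minimal plain-JavaScript sketch. The function names `reconstruct` and `kpis` are hypothetical, and the edge handling (holding the nearest kept value before the first and after the last kept sample) is an assumption, since the demo's own behaviour there is not described.

```javascript
// Linear interpolation between consecutive kept samples.
// Assumes keptIdx is non-empty and sorted ascending. Samples outside the kept
// range hold the nearest kept value (an assumption, see the note above).
function reconstruct(signal, keptIdx) {
  const out = new Array(signal.length);
  const first = keptIdx[0];
  const last = keptIdx[keptIdx.length - 1];
  for (let i = 0; i < first; i += 1) out[i] = signal[first];
  for (let i = last; i < signal.length; i += 1) out[i] = signal[last];
  for (let k = 0; k + 1 < keptIdx.length; k += 1) {
    const a = keptIdx[k];
    const b = keptIdx[k + 1];
    for (let i = a; i < b; i += 1) {
      const t = (i - a) / (b - a);
      out[i] = signal[a] + t * (signal[b] - signal[a]);
    }
  }
  return out;
}

// Compression over all samples; error metrics over skipped samples only.
function kpis(signal, keptIdx) {
  const keptSet = new Set(keptIdx);
  const recon = reconstruct(signal, keptIdx);
  let maxError = 0;
  let sumSq = 0;
  let skipped = 0;
  for (let i = 0; i < signal.length; i += 1) {
    if (keptSet.has(i)) continue;
    const err = Math.abs(signal[i] - recon[i]);
    maxError = Math.max(maxError, err);
    sumSq += err * err;
    skipped += 1;
  }
  return {
    compressionPct: (100 * skipped) / signal.length,
    maxError,
    rmsError: skipped > 0 ? Math.sqrt(sumSq / skipped) : 0,
  };
}

// Toy check: a one-sample spike at index 2 that the kept set skips.
const result = kpis([0, 0, 4, 0, 0], [0, 4]);
console.log(result); // compressionPct 60, maxError 4, rmsError about 2.31
```

In the toy check the spike is missed entirely, so the max error equals the spike height while the RMS error is diluted by the two skipped samples that were reconstructed exactly.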

The deadband keeps more samples than fixed-rate (lower compression) but places them where they reduce reconstruction error. The trade-off is deliberate: spend storage budget on event fidelity, not quiet-region redundancy. For signals with sharp structural breaks (step changes, regime shifts), the adaptive compression recipe adds a Kalman predictor and boundary anchoring to handle what the deadband cannot.

Where This Pattern Fits

Domain	What you’re keeping	Why a fixed threshold fails
Vibration monitoring	Excursions above the running noise floor	Bearing wear changes the noise floor over time
Pressure logging	Pressure bumps from valve actions	The baseline drifts with temperature
Power quality	Voltage sags below local nominal	Nominal voltage changes by region and season
Process tags	Setpoint deviations	Recipes change the setpoint daily
Battery monitoring	Step changes during load events	Resting voltage drifts with state of charge

How It Works

The esStats node maintains an exponentially weighted running mean and stdev. With halfLife: 50, each sample’s influence halves every 50 steps — the estimate tracks the most recent few hundred samples, long enough to be stable, short enough to follow real changes in the signal. The first ~50 samples are warmup; the deadband settles after that.
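
The half-life idea can be sketched in a few lines of plain JavaScript. This is not the esStats implementation, whose internals are not shown in this recipe; `makeEwStats` is a hypothetical name, and the update rule is the common incremental form of an exponentially weighted mean and variance. The per-step weight follows from the halving condition (1 − α)^halfLife = 0.5.

```javascript
// Minimal exponentially weighted mean/stdev tracker (a sketch; the real
// esStats node may initialise or update differently).
function makeEwStats(halfLife) {
  // Per-step weight so that a sample's influence halves every `halfLife` steps.
  const alpha = 1 - Math.pow(0.5, 1 / halfLife);
  let mean = null;
  let variance = 0;
  return {
    alpha,
    update(x) {
      if (mean === null) {
        mean = x; // seed with the first sample
      } else {
        const diff = x - mean;
        const incr = alpha * diff;
        mean += incr;
        variance = (1 - alpha) * (variance + diff * incr);
      }
      return { mean, stdev: Math.sqrt(variance) };
    },
  };
}

// Toy input (assumed): a signal alternating between 9 and 11.
const stats = makeEwStats(50);
let last;
for (let i = 0; i < 1000; i += 1) last = stats.update(i % 2 === 0 ? 9 : 11);

console.log(stats.alpha.toFixed(4)); // 0.0138
console.log(last);                   // mean close to 10, stdev close to 1
```

With halfLife: 50 the weight on each new sample is a little over 1%, which is why a single reading barely moves the estimate while a sustained change is followed within a few hundred samples.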

The passIf predicate combines two conditions with OR. The first is a heartbeat, msg.index % 5 === 0, that keeps every fifth sample, matching the fixed-rate baseline. This makes the deadband a strict superset of fixed-rate: every lattice sample is kept, with adaptive extras layered on top, mostly during events.

The second is the adaptive deadband — |value - mean| > K * stdev — the gate that keeps any sample whose deviation from the running mean exceeds K times the local stdev. Note the asymmetry: esStats reads from the median-smoothed stream so its stats are noise-protected, but the gate itself tests the raw value — the same signal the chart draws — so the reader’s eye and the algorithm see the same thing. Because the predicate is multiplicative on stdev — not additive on the value itself — the threshold scales with whatever the local noise happens to be, with no absolute tolerance to pick per signal.
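
The scale-free behaviour of the predicate can be checked in isolation. The sketch below pulls the gate out into a stand-alone function; `keepSample` and the four sample messages are hypothetical, and the field names simply mirror the ones the recipe writes (index, value, mean, stdev).

```javascript
// Hypothetical stand-alone version of the gate: heartbeat OR adaptive deadband.
function keepSample(msg, k = 1.5, heartbeat = 5) {
  const isHeartbeat = msg.index % heartbeat === 0;
  const isExcursion = Math.abs(msg.value - msg.mean) > k * msg.stdev;
  return isHeartbeat || isExcursion;
}

// The same relative excursion at two very different noise levels.
const quiet = { index: 7, value: 10.4, mean: 10, stdev: 0.2 }; // 2 stdevs out
const noisy = { index: 7, value: 14, mean: 10, stdev: 2 };     // 2 stdevs out
const inside = { index: 7, value: 12, mean: 10, stdev: 2 };    // 1 stdev out
const beat = { index: 10, value: 10, mean: 10, stdev: 2 };     // heartbeat only

console.log(keepSample(quiet), keepSample(noisy), keepSample(inside), keepSample(beat));
// true true false true
```

An excursion of 0.4 is kept when the local stdev is 0.2, while an excursion of 2 is dropped when the local stdev is 2: the decision depends on the ratio, not on the absolute size.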

K = 1.5 is the starting point used here. Under a Gaussian baseline, samples lying outside ±1.5σ from the mean account for roughly 13% of the population, so the adaptive gate fires on about one in eight samples during quiet operation. Combined with the every-fifth heartbeat, and treating the two conditions as independent, the total kept rate in quiet regions is around 30% (0.20 + 0.13 − 0.20 × 0.13 ≈ 0.30). Tighter K (closer to 1.0) keeps more; wider K (closer to 2.0) keeps fewer. K itself is data-dependent: the value here was chosen for the four-region synthetic signal the demo runs on, and other datasets should sweep K on their own representative data before deploying.

The median3 node sits between sanitize and esStats to absorb single-sample spikes. Without it, one bad reading would inflate the running stdev and quietly widen the deadband for many samples afterwards. It’s a standard hygiene step at the head of any pipeline that reads from a noisy sensor.
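
What a three-sample median does to an isolated spike can be seen in a short plain-JavaScript sketch. It assumes a trailing (causal) window and passes the first two samples through unchanged; how the real median3 node handles its warmup is not described here, and `median3Trailing` is a hypothetical name.

```javascript
// Sketch of a trailing three-sample median filter.
function median3Trailing(values) {
  return values.map((c, i) => {
    if (i < 2) return c; // not enough history yet: pass through (assumption)
    const a = values[i - 2];
    const b = values[i - 1];
    // Median of three without sorting.
    return Math.max(Math.min(a, b), Math.min(Math.max(a, b), c));
  });
}

// Toy input (assumed): a slow rise with one bad reading of 40.
const raw = [1, 2, 3, 40, 4, 5, 6];
const smoothed = median3Trailing(raw);
console.log(smoothed); // 1, 2, 2, 3, 4, 5, 5
```

The reading of 40 never reaches the output, so a running stdev computed from the smoothed stream is not inflated by it. The cost is a one-sample lag on genuine changes.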

Next Steps

Trajectory-Aware Adaptive Compression — when reconstruction quality matters: adds a Kalman predictor and boundary anchoring to cut max error by 3–4×
Sudden Shifts — when the goal is detecting the events instead of keeping the samples that mark them
Under the Hood — understand how messages flow through composed nodes