
Compressing High-Rate Sensor Data

Accelerometers, current transducers, acoustic emission sensors — industrial equipment generates data faster than any historian can store it. A single sensor sampling at 20 kHz produces 1.7 billion readings per day. Scale that to a hundred machines on a factory floor, and storage becomes the engineering problem — not the sensing.

This is a real bearing failure from the NASA IMS Bearing Dataset — the same run-to-failure experiment used in the bearing health use case. Accelerometers capture one-second vibration snapshots every 10 minutes, each containing 20,480 readings sampled at 20 kHz. A single winkComposer flow decides what to keep and what to discard — adapting its storage rate as the bearing degrades, with no per-sensor tuning required.

The three days leading up to failure produce 432 snapshots. Rendering all of them — each with 20,480 samples — would leave any browser gasping for air. The slider below picks every third snapshot, and the chart shows 120 consecutive samples from it. The compression metrics reflect the full 20,480 samples.

Drag the slider from healthy operation into the failure zone and watch the compression adapt.


What You’re Seeing

The slider scrubs through 432 snapshots taken every 10 minutes over three days — from healthy operation into the failure zone. Each chart shows 120 consecutive samples (~6 ms at 20 kHz) from the selected snapshot. The cyan × markers are the actual raw samples. The amber ● markers are the points the compression kept — connected by a faded orange reconstruction line. The gaps between ● markers are the compression: samples the algorithm decided could be reconstructed by interpolation.

The metric cards update for each snapshot:

  • Compression — what fraction of the original is discarded. During healthy operation, compression exceeds 90%. As degradation begins, the signal grows complex and the ratio drops — the algorithm stores more to preserve the emerging failure signature.
  • Max Error and RMS Error — reconstruction fidelity in engineering units. The flow keeps RMS error near 0.08 g throughout, even as the signal changes character.

How It Works

One winkComposer flow, nine building blocks — each justified by measured reconstruction fidelity.

  • median3: impulse smoothing + 1-sample look-back → smoothed vibration
  • kalman1d: predicts signal trajectory, detects steps → innovation gate (chi-squared test)
  • esStats: running stdev estimates local noise floor → adaptive deadband threshold
  • trend: classifies slope direction → inflection point detection
  • controller: resets predictor after step changes → prevents runaway storage
  • winnow: slope-aware deadband + tightening → significant: has the signal strayed?
  • passIf: gates flow on winnow's verdict → only significant samples continue
  • kalman1d₂: removes measurement noise from stored points → smoother reconstruction
  • persistIf: writes selected samples to storage → QuestDB via ILP

A median filter removes impulse noise and provides a one-sample look-back for step bracketing. A Kalman filter predicts the signal’s trajectory — its innovation gate (a chi-squared test) fires when the signal does something the model did not predict, marking step changes. Running statistics estimate the local noise floor, which sets the adaptive deadband threshold: the flow stores a point when the signal deviates from its predicted trajectory by more than the local noise scaled by a single sensitivity parameter. A trend detector classifies slope direction — when the trend reverses, a point is stored at the inflection. An orchestration node resets the predictor after a step change, preventing runaway storage.
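The predict-and-gate step can be sketched as a minimal 1-D Kalman filter. Everything below is illustrative — the class name, the noise constants, and the 99% chi-squared bound (6.63 for one degree of freedom) are assumptions for the sketch, not winkComposer's actual kalman1d node:

```python
class Kalman1D:
    """Minimal 1-D Kalman filter with slope tracking and an innovation gate.
    Illustrative sketch only; constants are assumptions, not winkComposer's."""

    def __init__(self, q=1e-4, r=1e-2):
        self.x = 0.0          # state estimate (signal level)
        self.v = 0.0          # estimated slope per sample
        self.p = 1.0          # estimate variance
        self.q = q            # process noise
        self.r = r            # measurement noise variance
        self.initialized = False

    def step(self, z):
        """Feed one sample z; return (innovation, gate_fired).

        gate_fired means the sample deviated from the predicted trajectory
        beyond the 99% chi-squared bound for 1 dof (6.63) — a step change
        the model did not predict."""
        if not self.initialized:
            self.x, self.initialized = z, True
            return 0.0, False
        # predict along the estimated trajectory
        x_pred = self.x + self.v
        p_pred = self.p + self.q
        # innovation (prediction error) and its variance
        y = z - x_pred
        s = p_pred + self.r
        gate_fired = (y * y / s) > 6.63
        # standard Kalman update
        k = p_pred / s
        self.x = x_pred + k * y
        self.v += 0.1 * k * y     # crude slope tracking
        self.p = (1 - k) * p_pred
        return y, gate_fired
```

On a flat signal the innovation stays near zero and the gate is silent; a sudden step makes the normalized innovation jump far past the bound, which is where the flow would store a point and let the controller reset the predictor.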

The points selected for storage then pass through a second Kalman filter that removes measurement noise from the stored values, improving piecewise-linear reconstruction fidelity.
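The Max Error and RMS Error cards follow directly from the piecewise-linear reconstruction. This is a self-contained sketch of those metric definitions, not code from the flow:

```python
def piecewise_linear(kept_idx, kept_val, i):
    """Value of the piecewise-linear reconstruction at sample index i."""
    if i <= kept_idx[0]:
        return kept_val[0]
    if i >= kept_idx[-1]:
        return kept_val[-1]
    for (i0, v0), (i1, v1) in zip(
        zip(kept_idx, kept_val), zip(kept_idx[1:], kept_val[1:])
    ):
        if i0 <= i <= i1:
            return v0 + (v1 - v0) * (i - i0) / (i1 - i0)

def reconstruction_errors(raw, kept_idx, kept_val):
    """Max and RMS reconstruction error in engineering units (g),
    comparing the raw samples against interpolation of the kept points."""
    errs = [
        abs(raw[i] - piecewise_linear(kept_idx, kept_val, i))
        for i in range(len(raw))
    ]
    mx = max(errs)
    rms = (sum(e * e for e in errs) / len(errs)) ** 0.5
    return mx, rms
```

The "gaps" between stored points cost nothing when the signal is locally linear: dropping interior samples of a straight ramp yields zero reconstruction error.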

One sensitivity parameter — the same value for every sensor, every machine, every sampling rate. No per-tag deviation tuning. No engineer choosing thresholds.


Against the Industry Standard

The standard compression algorithm for industrial historians is Swinging Door Trending (Bristol, 1990), deployed in AVEVA PI and similar systems for more than three decades. We compared the winkComposer flow against a reference C# implementation (25 stars, MIT license, 149 commits) of the exact Bristol algorithm, on the same 432 NASA IMS bearing snapshots.

SDT requires a compression deviation (cd) — a fixed tolerance in engineering units. If the signal drifts more than cd from the last stored value, a new point is stored. The engineer must choose cd for each signal class: too small and storage fills up, too large and events are lost. We used cd = 0.15 g, tuned to the healthy bearing’s vibration amplitude.
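For concreteness, here is a compact Python sketch of the swinging-door logic with a fixed cd. It follows the Bristol formulation — a corridor of admissible slopes hinged at the last archived point — and is not a line-for-line port of the C# reference:

```python
def sdt_compress(ts, ys, cd):
    """Swinging Door Trending with fixed deviation cd (Bristol, 1990).
    Returns the indices of the stored points. Illustrative sketch only.

    A line from the last archived point with slope s stays within cd of
    every intermediate sample iff s lies between the highest lower-door
    slope (s_lo) and the lowest upper-door slope (s_hi). When the doors
    close (s_lo > s_hi), the previous point is archived and the corridor
    restarts from it."""
    kept = [0]
    a = 0                                 # index of last archived point
    s_lo, s_hi = float("-inf"), float("inf")
    for i in range(1, len(ys)):
        dt = ts[i] - ts[a]
        s_lo = max(s_lo, (ys[i] - cd - ys[a]) / dt)
        s_hi = min(s_hi, (ys[i] + cd - ys[a]) / dt)
        if s_lo > s_hi:                   # doors closed: archive and reset
            a = i - 1
            kept.append(a)
            dt = ts[i] - ts[a]
            s_lo = (ys[i] - cd - ys[a]) / dt
            s_hi = (ys[i] + cd - ys[a]) / dt
    if kept[-1] != len(ys) - 1:
        kept.append(len(ys) - 1)          # always keep the final point
    return kept
```

On a perfect ramp it stores only the endpoints regardless of length; around a step it stores the points bracketing the edge — which is exactly why cd must be chosen well: too small and every noisy sample looks like an edge, too large and real edges slip through the corridor.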

The winkComposer flow replaces cd with K — a multiplier on the signal’s own running noise estimate. The effective tolerance is K × local stdev, which adapts continuously: tight when the signal is quiet, wide when it is noisy. K = 2 means “store when the deviation exceeds twice the current noise floor.” One value works across every sensor — no per-signal calibration.
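The adaptive tolerance can be sketched with an exponentially weighted running mean and variance. The class, its alpha parameter, and the update rule are assumptions for illustration — not the internals of the esStats node:

```python
class AdaptiveTolerance:
    """Running noise-floor estimate via exponentially weighted moving
    mean/variance, giving an effective tolerance of K x local stdev.
    Illustrative stand-in; alpha and the update rule are assumptions."""

    def __init__(self, k=2.0, alpha=0.01):
        self.k, self.alpha = k, alpha
        self.mean, self.var = 0.0, 0.0
        self.n = 0

    def update(self, z):
        """Feed one sample; return the current effective tolerance K*stdev."""
        self.n += 1
        if self.n == 1:
            self.mean = z
        else:
            d = z - self.mean
            self.mean += self.alpha * d
            # exponentially weighted variance of the residual
            self.var = (1 - self.alpha) * (self.var + self.alpha * d * d)
        return self.k * self.var ** 0.5
```

The same K then yields a tight tolerance on a quiet signal and a wide one on a noisy signal, with no per-signal calibration — the fixed cd of SDT, by contrast, stays at 0.15 g whatever the local noise does.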

All metrics are averaged across the 432 snapshots (20,480 raw samples each):

| Metric | Definition | winkComposer | SDT |
| --- | --- | --- | --- |
| Avg compression | Percentage of samples discarded | 87.6% | 82.4% |
| Avg stored points | Samples kept per 20,480-sample snapshot | 2,539 | 3,610 |
| Avg max error | Largest reconstruction deviation in any single snapshot, averaged across all snapshots | 0.416 g | 0.150 g |
| Avg RMS error | Root-mean-square reconstruction error per snapshot, averaged across all snapshots | 0.082 g | 0.075 g |
| Tuning | What an engineer must configure per signal | K = 2 (universal) | cd = 0.15 g (per-signal) |

The winkComposer flow stores 30% fewer points with RMS error within 10% of the reference. SDT guarantees a tighter worst-case error bound (max error equals the chosen cd), but that guarantee requires an engineer to select the right cd for each signal class — a tuning burden that scales linearly with the number of monitored sensors.


References

  • Bristol, E.H. (1990). Swinging Door Trending: Adaptive Trend Recording? ISA National Conference Proceedings, pp. 749–756.
  • Kalman, R.E. (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering, 82(1), 35–45. doi:10.1115/1.3662552 
  • gfoidl/DataCompression — reference SDT implementation (C#, MIT). github.com/gfoidl/DataCompression 
  • NASA IMS Bearing Dataset. Center for Intelligent Maintenance Systems, University of Cincinnati. data.nasa.gov 
  • Dataset: Test 2, Bearing 1 (inner race defect). 984 one-second snapshots at 20,480 Hz, taken every 10 minutes over 7 days. 4 channels; channel 1 used. The 432 snapshots covering days 4.5–7, leading up to failure, are included in this visualization. The chart displays 120 consecutive samples (starting at sample 5000) from each snapshot.