
Compressing High-Rate Sensor Data

Accelerometers, current transducers, acoustic emission sensors — industrial equipment generates data faster than any historian can store it. A single sensor sampling at 20 kHz produces 1.7 billion readings per day. Scale that to a hundred machines on a factory floor, and storage becomes the engineering problem — not the sensing.

This is a real bearing failure from the NASA IMS Bearing Dataset — the same run-to-failure experiment used in the bearing health use case. Accelerometers capture one-second vibration snapshots every 10 minutes, each containing 20,480 readings sampled at 20 kHz. A single winkComposer flow decides what to keep and what to discard — adapting its storage rate as the bearing degrades, with no per-sensor tuning required.

The three days leading up to failure produce 432 snapshots. Rendering all of them — each with 20,480 samples — would leave any browser gasping for air. The slider below picks every third snapshot, and the chart shows 120 consecutive samples from it. The compression metrics reflect the full 20,480 samples.

Drag the slider from healthy operation into the failure zone and watch the compression adapt.


What You’re Seeing

The slider scrubs through 432 snapshots taken every 10 minutes over three days — from healthy operation into the failure zone. Each chart shows 120 consecutive samples (~6 ms at 20 kHz) from the selected snapshot. The cyan × markers are the actual raw samples. The amber ● markers are the points the compression kept — connected by a faded orange reconstruction line. The gaps between ● markers are the compression: samples the algorithm decided could be reconstructed by interpolation.

The metric cards update for each snapshot:

  • Compression — what fraction of the original is discarded. During healthy operation, compression exceeds 90%. As degradation begins, the signal grows complex and the ratio drops — the algorithm stores more to preserve the emerging failure signature.
  • Max Error and RMS Error — reconstruction fidelity in engineering units. The flow keeps RMS error near 0.08 g throughout, even as the signal changes character.

How It Works

One winkComposer flow, nine building blocks — each justified by measured reconstruction fidelity.

  • median3: impulse smoothing + 1-sample look-back → smoothed vibration
  • kalman1d: predicts signal trajectory, detects steps → innovation gate (chi-squared test)
  • esStats: running stdev estimates local noise floor → adaptive deadband threshold
  • trend: classifies slope direction → inflection point detection
  • controller: resets predictor after step changes → prevents runaway storage
  • winnow: slope-aware deadband + tightening → significant: has the signal strayed?
  • passIf: gates flow on winnow's verdict → only significant samples continue
  • kalman1d₂: removes measurement noise from stored points → smoother reconstruction
  • persistIf: writes selected samples to storage → QuestDB via ILP

A median filter removes impulse noise and provides a one-sample look-back for step bracketing. A Kalman filter predicts the signal’s trajectory — its innovation gate (a chi-squared test) fires when the signal does something the model did not predict, marking step changes. Running statistics estimate the local noise floor, which sets the adaptive deadband threshold: the flow stores a point when the signal deviates from its predicted trajectory by more than the local noise scaled by a single sensitivity parameter. A trend detector classifies slope direction — when the trend reverses, a point is stored at the inflection. An orchestration node resets the predictor after a step change, preventing runaway storage.
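The predict-and-gate step can be sketched as a minimal 1-D Kalman filter. Everything below is illustrative — the class name, the noise constants, and the 99% chi-squared bound (6.63 for one degree of freedom) are assumptions for the sketch, not winkComposer's actual kalman1d node:

```python
class Kalman1D:
    """Minimal 1-D Kalman filter with slope tracking and an innovation gate.
    Illustrative sketch only; constants are assumptions, not winkComposer's."""

    def __init__(self, q=1e-4, r=1e-2):
        self.x = 0.0          # state estimate (signal level)
        self.v = 0.0          # estimated slope per sample
        self.p = 1.0          # estimate variance
        self.q = q            # process noise
        self.r = r            # measurement noise variance
        self.initialized = False

    def step(self, z):
        """Feed one sample z; return (innovation, gate_fired).

        gate_fired means the sample deviated from the predicted trajectory
        beyond the 99% chi-squared bound for 1 dof (6.63) — a step change
        the model did not predict."""
        if not self.initialized:
            self.x, self.initialized = z, True
            return 0.0, False
        # predict along the estimated trajectory
        x_pred = self.x + self.v
        p_pred = self.p + self.q
        # innovation (prediction error) and its variance
        y = z - x_pred
        s = p_pred + self.r
        gate_fired = (y * y / s) > 6.63
        # standard Kalman update
        k = p_pred / s
        self.x = x_pred + k * y
        self.v += 0.1 * k * y     # crude slope tracking
        self.p = (1 - k) * p_pred
        return y, gate_fired
```

On a flat signal the innovation stays near zero and the gate is silent; a sudden step makes the normalized innovation jump far past the bound, which is where the flow would store a point and let the controller reset the predictor.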

The points selected for storage then pass through a second Kalman filter that removes measurement noise from the stored values, improving piecewise-linear reconstruction fidelity.
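The Max Error and RMS Error cards follow directly from the piecewise-linear reconstruction. This is a self-contained sketch of those metric definitions, not code from the flow:

```python
def piecewise_linear(kept_idx, kept_val, i):
    """Value of the piecewise-linear reconstruction at sample index i."""
    if i <= kept_idx[0]:
        return kept_val[0]
    if i >= kept_idx[-1]:
        return kept_val[-1]
    for (i0, v0), (i1, v1) in zip(
        zip(kept_idx, kept_val), zip(kept_idx[1:], kept_val[1:])
    ):
        if i0 <= i <= i1:
            return v0 + (v1 - v0) * (i - i0) / (i1 - i0)

def reconstruction_errors(raw, kept_idx, kept_val):
    """Max and RMS reconstruction error in engineering units (g),
    comparing the raw samples against interpolation of the kept points."""
    errs = [
        abs(raw[i] - piecewise_linear(kept_idx, kept_val, i))
        for i in range(len(raw))
    ]
    mx = max(errs)
    rms = (sum(e * e for e in errs) / len(errs)) ** 0.5
    return mx, rms
```

The "gaps" between stored points cost nothing when the signal is locally linear: dropping interior samples of a straight ramp yields zero reconstruction error.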

One sensitivity parameter — the same value for every sensor, every machine, every sampling rate. No per-tag deviation tuning. No engineer choosing thresholds.


Against the Industry Standard

The standard compression algorithm for industrial historians is Swinging Door Trending (Bristol, 1990), deployed in AVEVA PI and similar systems for more than three decades. We compared the winkComposer flow against a reference C# implementation (25 stars, MIT license, 149 commits) of the exact Bristol algorithm, on the same 432 NASA IMS bearing snapshots.

SDT requires a compression deviation (cd) — a fixed tolerance in engineering units. If the signal drifts more than cd from the last stored value, a new point is stored. The engineer must choose cd for each signal class: too small and storage fills up, too large and events are lost. We used cd = 0.15 g, tuned to the healthy bearing’s vibration amplitude.
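For concreteness, here is a compact Python sketch of the swinging-door logic with a fixed cd. It follows the Bristol formulation — a corridor of admissible slopes hinged at the last archived point — and is not a line-for-line port of the C# reference:

```python
def sdt_compress(ts, ys, cd):
    """Swinging Door Trending with fixed deviation cd (Bristol, 1990).
    Returns the indices of the stored points. Illustrative sketch only.

    A line from the last archived point with slope s stays within cd of
    every intermediate sample iff s lies between the highest lower-door
    slope (s_lo) and the lowest upper-door slope (s_hi). When the doors
    close (s_lo > s_hi), the previous point is archived and the corridor
    restarts from it."""
    kept = [0]
    a = 0                                 # index of last archived point
    s_lo, s_hi = float("-inf"), float("inf")
    for i in range(1, len(ys)):
        dt = ts[i] - ts[a]
        s_lo = max(s_lo, (ys[i] - cd - ys[a]) / dt)
        s_hi = min(s_hi, (ys[i] + cd - ys[a]) / dt)
        if s_lo > s_hi:                   # doors closed: archive and reset
            a = i - 1
            kept.append(a)
            dt = ts[i] - ts[a]
            s_lo = (ys[i] - cd - ys[a]) / dt
            s_hi = (ys[i] + cd - ys[a]) / dt
    if kept[-1] != len(ys) - 1:
        kept.append(len(ys) - 1)          # always keep the final point
    return kept
```

On a perfect ramp it stores only the endpoints regardless of length; around a step it stores the points bracketing the edge — which is exactly why cd must be chosen well: too small and every noisy sample looks like an edge, too large and real edges slip through the corridor.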

The winkComposer flow replaces cd with K — a multiplier on the signal’s own running noise estimate. The effective tolerance is K × local stdev, which adapts continuously: tight when the signal is quiet, wide when it is noisy. K = 2 means “store when the deviation exceeds twice the current noise floor.” One value works across every sensor — no per-signal calibration.
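The adaptive tolerance can be sketched with an exponentially weighted running mean and variance. The class, its alpha parameter, and the update rule are assumptions for illustration — not the internals of the esStats node:

```python
class AdaptiveTolerance:
    """Running noise-floor estimate via exponentially weighted moving
    mean/variance, giving an effective tolerance of K x local stdev.
    Illustrative stand-in; alpha and the update rule are assumptions."""

    def __init__(self, k=2.0, alpha=0.01):
        self.k, self.alpha = k, alpha
        self.mean, self.var = 0.0, 0.0
        self.n = 0

    def update(self, z):
        """Feed one sample; return the current effective tolerance K*stdev."""
        self.n += 1
        if self.n == 1:
            self.mean = z
        else:
            d = z - self.mean
            self.mean += self.alpha * d
            # exponentially weighted variance of the residual
            self.var = (1 - self.alpha) * (self.var + self.alpha * d * d)
        return self.k * self.var ** 0.5
```

The same K then yields a tight tolerance on a quiet signal and a wide one on a noisy signal, with no per-signal calibration — the fixed cd of SDT, by contrast, stays at 0.15 g whatever the local noise does.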

All metrics are averaged across the 432 snapshots (20,480 raw samples each):

| Metric | Definition | winkComposer | SDT |
| --- | --- | --- | --- |
| Avg compression | Percentage of samples discarded | 87.6% | 82.4% |
| Avg stored points | Samples kept per 20,480-sample snapshot | 2,539 | 3,610 |
| Avg max error | Largest reconstruction deviation in any single snapshot, averaged across all snapshots | 0.416 g | 0.150 g |
| Avg RMS error | Root-mean-square reconstruction error per snapshot, averaged across all snapshots | 0.082 g | 0.075 g |
| Tuning | What an engineer must configure per signal | K = 2 (universal) | cd = 0.15 g (per-signal) |

The winkComposer flow stores 30% fewer points with RMS error within 10% of the reference. SDT guarantees a tighter worst-case error bound (max error equals the chosen cd), but that guarantee requires an engineer to select the right cd for each signal class — a tuning burden that scales linearly with the number of monitored sensors.


References

  • Bristol, E.H. (1990). Swinging Door Trending: Adaptive Trend Recording? ISA National Conference Proceedings, pp. 749–756.
  • Kalman, R.E. (1960). A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering, 82(1), 35–45. doi:10.1115/1.3662552 
  • gfoidl/DataCompression — reference SDT implementation (C#, MIT). github.com/gfoidl/DataCompression 
  • NASA IMS Bearing Dataset. Center for Intelligent Maintenance Systems, University of Cincinnati. data.nasa.gov 
  • Dataset: Test 2, Bearing 1 (inner race defect). 984 one-second snapshots at 20,480 Hz, taken every 10 minutes over 7 days. 4 channels; channel 1 used. The 432 snapshots covering days 4.5–7, leading up to failure, are included in this visualization. The chart displays 120 consecutive samples (starting at sample 5000) from each snapshot.