Time-Series Data for Business Intelligence Analysis: A Primer
What is Time-Series Data?
Time-series data is a sequence of data points collected at successive time intervals. Examples include vibration data from a rotating machine, status data from an instrument, and log data from an industrial robot. Time-series data has both operational and business value. On the operational side, time-series data can help monitor status, detect anomalies, and predict failures. On the business side, time-series data can help analyze customer (or user) behavior, determine component reliability, and improve overall performance.
What are the challenges of extracting business intelligence from Time-Series Data?
Extracting operational insights from time-series data is well understood. This process requires real-time analysis, applying physics-based or AI-based models to one data point or a series of data points at a time. Insights are produced as data streams in, in (or near) real time.
However, extracting business insights from time-series data is far more challenging. This process requires batch analysis, applying data mining techniques across a stretch of time, such as a day, a quarter, or a year. For example, a fundamental need in Business Intelligence (BI) is counting the occurrences of a particular condition over a particular time interval under a set of constraints. This can be difficult to do with time-series data. Imagine that the time series is vibration data arriving every 10 seconds: what can you count in this case? Counting the readings that fall within a given range offers only very limited information.
Therefore, time-series data needs to be prepared before BI analysis can be conducted. The preparation process involves converting time-series data into events.
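As a hypothetical sketch of this preparation step, the snippet below converts a raw vibration stream into "over-threshold" events, each spanning a time interval rather than a single data point. The threshold, the 10-second sampling interval, and all names here are illustrative assumptions, not a standard method:

```python
from dataclasses import dataclass

@dataclass
class VibrationEvent:
    start: int    # seconds since the start of the stream
    end: int      # last sample time inside the event
    peak: float   # maximum reading observed during the event

def extract_events(readings, threshold=5.0, interval=10):
    """Group consecutive over-threshold readings into events (illustrative logic)."""
    events, current = [], None
    for i, value in enumerate(readings):
        t = i * interval
        if value > threshold:
            if current is None:
                current = VibrationEvent(start=t, end=t, peak=value)
            else:
                current.end = t
                current.peak = max(current.peak, value)
        elif current is not None:
            events.append(current)
            current = None
    if current is not None:
        events.append(current)
    return events

# Two excursions above the threshold become two countable events.
stream = [1.2, 6.1, 7.3, 2.0, 1.1, 8.4, 0.9]
for event in extract_events(stream):
    print(event)
```

Where counting raw readings told us little, the two resulting events (with start, end, and peak) are exactly the kind of occurrences a BI query can count and filter.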
What are Events in Time-Series Data?
Events are the fundamental insight elements inside time-series data. An event is typically not contained in a single data point; it must be mined across a time interval. For example, in analytical instrumentation, an event may be a lab test, which can span 45 minutes and contain many time-series data points within that interval.
An event is typically a complex data type with many properties. For example, a test event can contain the test status (success, failure, abort, rework), the test time (45 minutes), the operator (JR), the error codes ([72, 33]), the test results (temperature, pressure, charge volume), etc.
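One way to model such an event in code is as a structured record. The sketch below mirrors the test-event example above; the class and field names are illustrative assumptions:

```python
from dataclasses import dataclass, field
from enum import Enum

class TestStatus(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    ABORT = "abort"
    REWORK = "rework"

@dataclass
class TestEvent:
    status: TestStatus
    duration_minutes: int
    operator: str
    error_codes: list = field(default_factory=list)   # e.g. [72, 33]
    results: dict = field(default_factory=dict)       # e.g. temperature, pressure, charge volume

# The 45-minute failed test from the example above, as one event record.
event = TestEvent(
    status=TestStatus.FAILURE,
    duration_minutes=45,
    operator="JR",
    error_codes=[72, 33],
    results={"temperature": 71.5, "pressure": 2.1, "charge_volume": 0.8},
)
print(event.status.value, event.error_codes)
```

A typed record like this makes every property queryable later, which is what the BI layer will rely on.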
Events are derived from analyses that use physics-based or AI-based models. Since the quality of the insights depends entirely on the quality of the derived events, event generation is the critical process, and it can involve data cleansing, validation, transformation, contextualization, fusion, analytics, domain knowledge integration, and many other steps.
Events are dynamic. As more insights are discovered from event data, new events or new properties on existing events are derived, which in turn feed the discovery of further insights. Because of this, it is important to build a robust and agile data engine to perform event generation. The data engine needs to support processing massive amounts of time-series data, and it also needs to be flexible enough to quickly add new properties to existing events and define new event types.
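One hypothetical way to keep event generation this flexible is to model it as a pipeline of small enrichment steps, so a new derived property is just a new step appended to the pipeline. Everything here (function names, properties) is illustrative, not a prescribed architecture:

```python
# Each enrichment step takes an event (a dict of properties) and
# returns new derived properties to merge in.
def add_is_failure(event):
    return {"is_failure": event["status"] == "failure"}

def add_has_errors(event):
    return {"has_errors": len(event.get("error_codes", [])) > 0}

def generate(event, steps):
    """Run every enrichment step, merging derived properties into a copy."""
    for step in steps:
        event = {**event, **step(event)}
    return event

# Adding a new derived property later means appending one function here.
pipeline = [add_is_failure, add_has_errors]

raw = {"status": "failure", "error_codes": [72, 33]}
enriched = generate(raw, pipeline)
print(enriched["is_failure"], enriched["has_errors"])
```

Because each step is independent and the original event is never mutated, new properties can be added, or existing events re-processed, without rewriting the rest of the engine.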
Events are fundamental. While an event can be complex, it should be a self-contained, single incident. Any analysis that requires looking across events should be left to the data layer above, which is typically the BI layer. This yields the maximum flexibility in insight analysis.
By generating events, the user has effectively converted time-series data into transactional data, a data type easily digested by modern data pipelines, data warehouses, and BI software. Event generation is therefore a powerful process for deriving insights from time-series data and making time-series data compatible with the Modern Data Stack.
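To illustrate the end state, the sketch below flattens a list of events into transactional rows, one row per event, one column per property, written as CSV, the shape that data pipelines and BI tools consume directly. The column names and sample values are illustrative:

```python
import csv
import io

# Events produced by the generation step, now shaped as flat records.
events = [
    {"test_id": 1, "status": "success", "duration_min": 45, "operator": "JR"},
    {"test_id": 2, "status": "failure", "duration_min": 52, "operator": "JR"},
]

# One row per event, one column per property: transactional data.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["test_id", "status", "duration_min", "operator"])
writer.writeheader()
writer.writerows(events)
print(buf.getvalue())

# A BI question such as "how many failures occurred?" is now a simple count.
failures = sum(1 for e in events if e["status"] == "failure")
print(failures)
```

The same table could just as well be loaded into a warehouse, at which point counting, grouping, and filtering events becomes ordinary SQL.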