An in-depth look at Time Series Analysis: Recognizing the elements of time-series data applicable to statistical models and machine learning.

Emilio Cardenas
4 min read · Mar 25, 2022

This blog will cover the following:

1. What is time-series data, and how does it differ from other types of data?

2. Time-series data components.

3. What is the purpose of time series analysis?

4. The most widely used approaches for forecasting time series (statistical and machine learning).

  1. What is time-series data, and how does it differ from other types of data?

A time series is a collection of data points indexed (or listed or graphed) in chronological order. Most commonly, it is a sequence of observations taken at evenly spaced intervals over a period of time. In simple terms, time-series data is a dataset that records the same quantity repeatedly over time, usually on a regular schedule. This is what sets it apart from other types of data: the order of the observations matters, and each value typically depends on the values that came before it. Commodity prices, stock prices, property prices, weather records, company sales figures, and patient health measurements such as ECG are just a few examples. We encounter time-series data constantly in daily life, so the ability to interpret it is critical for a data scientist. It’s also fun to experiment with.
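To make the definition concrete, here is a minimal sketch of a time series as a list of (timestamp, value) pairs, with a helper that checks whether the timestamps are in chronological order and evenly spaced. The readings and the `is_regular` helper are invented for illustration and use only the Python standard library.

```python
from datetime import datetime, timedelta

# Hypothetical hourly temperature readings (values invented for illustration).
start = datetime(2022, 3, 25, 0, 0)
values = [11.2, 10.9, 10.7, 10.4, 10.8]
series = [(start + timedelta(hours=i), v) for i, v in enumerate(values)]


def is_regular(series):
    """Return True if timestamps are strictly increasing and evenly spaced."""
    times = [t for t, _ in series]
    # Collect the distinct gaps between consecutive timestamps.
    gaps = {later - earlier for earlier, later in zip(times, times[1:])}
    return len(gaps) == 1 and next(iter(gaps)) > timedelta(0)


print(is_regular(series))
```

Dropping one reading from the middle of the list would leave a two-hour gap, and the check would then fail.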

Patients’ ECG data (Image from the MIMIC-III Waveform Database)

2. Time-series data components

Most time-series data can be broken down into three components: trend, seasonality, and noise.

Trend: The long-term movement of the data, either upward or downward. It can be driven by population growth, inflation, environmental change, or technology adoption. Examples include the long-term rise of the US stock market over the last ten years, the growth of the real estate market in most parts of the world over the last year, and increasing human life expectancy.

Seasonality: A repeating pattern tied to the calendar, whether weekly, monthly, or seasonal; the specific pattern depends on the domain. For example, most e-commerce sites see an increase in sales around Christmas. Similarly, in North America more houses are sold in the summer than in the winter because people are reluctant to move during the cold months.

Noise: Often referred to as the residual or irregular component. It is what remains once trend and seasonality have been removed: short-term fluctuation that cannot be predicted. In some datasets the noise is large relative to the trend and seasonality.
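The three components above can be illustrated with a small additive decomposition. This is a minimal sketch, not a production method: it assumes the series is the sum of the three parts, that the seasonal period is known and odd, and it estimates the trend with a simple centered moving average. The function names and the synthetic data are invented for illustration.

```python
import math
import random


def centered_moving_average(y, period):
    """Estimate the trend with a centered moving average (odd period assumed)."""
    half = period // 2
    trend = [None] * len(y)  # the edges have no full window, so they stay None
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        trend[t] = sum(window) / period
    return trend


def decompose_additive(y, period):
    """Split y so that y = trend + seasonal + residual wherever trend is defined."""
    trend = centered_moving_average(y, period)
    # Average the detrended values at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)
    means = [sum(b) / len(b) for b in buckets]
    # Center the seasonal pattern so it sums to zero over one cycle.
    offset = sum(means) / period
    pattern = [m - offset for m in means]
    seasonal = [pattern[t % period] for t in range(len(y))]
    # Whatever is left after removing trend and seasonality is the noise.
    residual = [
        None if tr is None else obs - tr - s
        for obs, tr, s in zip(y, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic series: linear trend + weekly sine seasonality + Gaussian noise.
random.seed(0)
period = 7
y = [
    0.5 * t + 3.0 * math.sin(2 * math.pi * t / period) + random.gauss(0, 0.2)
    for t in range(70)
]
trend, seasonal, residual = decompose_additive(y, period)
print([round(s, 2) for s in seasonal[:period]])
```

On real data the period is rarely this clean, and established library routines for seasonal decomposition are usually a better choice than hand-rolled code.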

3. What is the purpose of time series analysis?

Time series analysis has applications across many industries. As a general rule of thumb, it can be used for the following:

Forecasting: predicting future values from historical data, such as home prices, sales, or stock prices.

Anomaly detection: identifying outliers or changes in economic, commercial, or health variables. Two examples are identifying changepoints when an economy is affected by geopolitical events and spotting irregularities in patients’ vital signs.

Other applications include pattern recognition, signal processing, weather forecasting, and earthquake prediction.
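As a rough illustration of the anomaly-detection use case, the sketch below flags points that lie far from the mean of the series, measured in standard deviations (a z-score rule). This is one simple technique among many, not a method prescribed by this post; the function name, the threshold of 3, and the heart-rate readings are assumptions made for the example.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # a constant series has no outliers under this rule
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical heart-rate readings in beats per minute; the spike is invented.
heart_rate = [72, 74, 71, 73, 75, 72, 70, 74, 73, 140, 72, 71, 73, 74, 72]
print(zscore_anomalies(heart_rate))
```

Because a large outlier inflates both the mean and the standard deviation, this rule can miss anomalies in short series; rules based on the median are a common, more robust alternative.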

4. The most widely used approaches for forecasting time series (statistical and machine learning)

Facebook Prophet

According to its documentation, Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.

It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and it typically handles outliers well. In other words, Prophet uses an additive model to combine the components described above (trend, seasonality, and noise) along with holiday effects.

Prophet comes with both a Python and an R API. It is simple to set up and to produce forecasts with.
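The additive model behind Prophet is usually written in the following form. The symbols are the conventional ones from the Prophet literature rather than notation used elsewhere in this post: g(t) is the trend, s(t) the seasonality, h(t) the holiday effects, and ε_t the noise that the model does not explain.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Each term maps onto one of the components discussed earlier, with holidays treated as a separate, irregularly scheduled effect.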

LSTM (Long Short-Term Memory)

The LSTM is a type of Recurrent Neural Network (RNN) that excels at handling sequence data. It is commonly used in speech recognition and machine translation. If you are familiar with the RNN structure, you will notice that the LSTM adds three special gates (input, forget, and output) to each of its cells so that it can retain both long-term and short-term memory, whereas vanilla RNN models struggle to remember long sequences.
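To show how the three gates interact, here is a minimal sketch of a single LSTM cell step using scalars instead of vectors and matrices. It follows the standard LSTM equations, but the parameter values are untrained and invented for illustration, and `lstm_step` is a hypothetical helper rather than any library's API. In practice you would use a deep learning framework's LSTM layer.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. `p` maps a gate name to (w_x, w_h, bias)."""

    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the new candidate to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate value for the cell state
    c = f * c_prev + i * g            # long-term memory (cell state)
    h = o * math.tanh(c)              # short-term memory (hidden state)
    return h, c


# Hypothetical, untrained parameters chosen only for illustration.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "output": (0.3, 0.4, 0.0),
    "candidate": (1.0, 0.5, 0.0),
}

h, c = 0.0, 0.0
for x in [0.1, 0.4, -0.2, 0.3]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The cell state `c` is carried forward mostly unchanged whenever the forget gate is close to 1, which is what lets the network hold on to information over many time steps.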

ARIMA

ARIMA stands for AutoRegressive Integrated Moving Average and is a statistical approach. The autoregressive (AR) part uses the dependent relationship between an observation and a number of lagged observations. The integrated (I) part differences the raw observations (e.g. subtracting the observation at the previous time step from the current observation) to make the time series stationary. The moving average (MA) part uses the dependency between an observation and the residual errors from a moving average model applied to lagged observations.
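The differencing and autoregressive steps can be sketched in a few lines. This is a deliberately simplified illustration, roughly an ARIMA(1,1,0) without an intercept: it differences once, fits a single autoregressive coefficient by least squares, and leaves out the moving average term, which needs iterative estimation. The helper names and the sales figures are invented for the example; a statistics library such as statsmodels would be the usual choice for a full ARIMA fit.

```python
def difference(series):
    """First-order differencing: each value minus the value at the previous time step."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_(t-1) + error (no intercept)."""
    num = sum(curr * prev for prev, curr in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den


def forecast_next(series):
    """One-step forecast: difference, fit AR(1) on the differences, then undo the differencing."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    return series[-1] + phi * diffs[-1]


# Hypothetical monthly sales figures, invented for illustration.
sales = [100, 108, 112, 114, 115]
print(difference(sales))
print(forecast_next(sales))
```

Here the differences shrink by half each month, so the fitted coefficient predicts that the next increase will be half of the last one.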

Conclusion: The aim of this blog was to introduce the reader to where time series models apply, the components of time-series data, and some of the most popular forecasting algorithms.
