Novel forecasting tools

Short explanation of novel forecasting tools

In this playbook, machine learning (ML) methods are considered novel forecasting tools. Machine learning is an umbrella term for various methods that learn patterns from existing data. With unlabelled data, these patterns can be used to sort observations into groups (unsupervised learning). With labelled data, the pattern that fits best on a so-called training set is used to forecast values in a so-called test set (supervised learning). The training and test sets are often drawn at random in multiple runs in order to arrive at the best model.
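The idea of repeated random training and test sets can be illustrated with a minimal sketch. It is not the method used in the hemp-flax case: the data are synthetic, the model is a simple straight-line fit, and the function names (`fit_line`, `rmse`, `repeated_split_rmse`) are hypothetical, chosen only for this example.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return (sum(squared) / len(squared)) ** 0.5


def repeated_split_rmse(xs, ys, n_runs=20, test_share=0.25, seed=0):
    """Return the test-set RMSE of each of n_runs random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    n_test = max(1, int(len(xs) * test_share))
    scores = []
    for _ in range(n_runs):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        # Learn the pattern on the training set only.
        intercept, slope = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        # Judge it on observations it has not seen.
        predictions = [intercept + slope * xs[i] for i in test_idx]
        scores.append(rmse([ys[i] for i in test_idx], predictions))
    return scores


# Synthetic labelled data (invented for illustration): y = 2 + 3x + noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(40)]
ys = [2 + 3 * x + data_rng.gauss(0, 0.5) for x in xs]

scores = repeated_split_rmse(xs, ys)
print(f"mean test RMSE over {len(scores)} runs: {statistics.fmean(scores):.3f}")
```

Averaging the test error over several random splits gives a more stable picture of predictive quality than a single split would.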

To forecast acreages in the hemp-flax case for various European countries, we compare a classical parametric panel data model, namely a mixed-effects (ME) model, with a combined mixed-effects random forest (MERF) model. The latter combines classical estimation with a machine learning module.
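The difference between the two specifications can be written compactly. The notation below is an assumption, following the usual textbook formulation of mixed-effects random forests rather than anything specified in this playbook: $y_{it}$ is the acreage of country $i$ in year $t$, $x_{it}$ are the fixed-effect covariates, $z_{it}$ the random-effect covariates, $b_i$ a country-specific random effect (grouping by country is also assumed), and $f$ an unknown function estimated by a random forest.

```latex
\begin{align}
\text{ME:}   \quad y_{it} &= x_{it}^{\top}\beta + z_{it}^{\top} b_i + \varepsilon_{it}, \\
\text{MERF:} \quad y_{it} &= f(x_{it}) + z_{it}^{\top} b_i + \varepsilon_{it}, \\
\text{with}  \quad b_i &\sim N(0, D), \qquad \varepsilon_{it} \sim N(0, \sigma^2).
\end{align}
```

Read this way, the only change is that the linear fixed-effects part $x_{it}^{\top}\beta$ is replaced by the flexible function $f(x_{it})$, while the random-effects part keeps its classical form.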

Advantages of this approach

The strength of machine learning is that it can find patterns in large sets of data, i.e. data with many variables or large numbers of observations. The training can be done by assigning weights or parameters to variables, or by using decision trees and ensembles of trees such as random forests.

Other advantages are flexibility in dealing with heterogeneity, volatility and non-linearity.

Differences with classical approaches

Classical econometric modeling approaches are grounded in statistics and typically rely on linear regression techniques. Machine learning models are usually based on drawing many random samples from the given data and confronting these samples with various techniques, e.g. tree-based models or networks with feedback mechanisms.
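Drawing many random samples and fitting a tree-based model to each can be sketched as follows. This is bagging of one-split regression trees ("stumps"), a deliberately simplified relative of a random forest and not a full implementation of one; the data are synthetic and all function names are hypothetical.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree: pick the threshold minimising squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        error = sum((y - left_mean) ** 2 for y in left)
        error += sum((y - right_mean) ** 2 for y in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: always predict the overall mean
        overall = statistics.fmean(ys)
        return (float("inf"), overall, overall)
    return best[1:]


def predict_stump(stump, x):
    """Return the mean of the side of the split that x falls on."""
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Fit each stump on a bootstrap sample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in sample], [ys[i] for i in sample]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return statistics.fmean([predict_stump(s, x) for s in stumps])


# Synthetic non-linear data (invented for illustration): a step plus noise.
data_rng = random.Random(1)
xs = [i / 5 for i in range(50)]
ys = [(1.0 if x < 5 else 4.0) + data_rng.gauss(0, 0.3) for x in xs]

forest = fit_bagged_stumps(xs, ys)
print(round(predict_bagged(forest, 2.0), 2), round(predict_bagged(forest, 8.0), 2))
```

A straight line would fit this step-shaped relation poorly, whereas the averaged stumps recover the two levels without the functional form being specified in advance.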

In other words, the statistical basis of classical forecasting tools allows for statistical testing, e.g. of the significance of a variable or of the model as a whole, or for evidence of structural breaks and group differences. Machine learning models generally do not offer such tests. Since prediction is usually the goal, the quality of ML models is instead assessed by how well they predict.

Disadvantages of this approach

Most ML methods do not take the sequential nature of the data into account, as time series methods do, so this has to be imposed explicitly. Auditability is also low, since many ML methods, including random forests, operate as a black box. Moreover, the final model structure is based purely on forecasting performance: the included model components are not backed by statistical evidence.
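One common way to impose the time order explicitly is to replace random splits with an expanding window, in which the model is always trained on the past and tested on the period that follows. The sketch below is an illustration under that assumption and not necessarily the procedure used in the hemp-flax case; the function name and its parameters are hypothetical.

```python
def expanding_window_splits(n_obs, min_train, horizon=1):
    """Yield (train_indices, test_indices) pairs that respect time order.

    Each training window holds observations 0..end-1 and the test window
    holds the next `horizon` observations, so the model never sees the future.
    """
    for end in range(min_train, n_obs - horizon + 1):
        yield list(range(end)), list(range(end, end + horizon))


# Hypothetical yearly series of 8 observations, indexed 0..7.
for train_idx, test_idx in expanding_window_splits(n_obs=8, min_train=5):
    print(train_idx, "->", test_idx)
```

Any model can then be fitted on each training window and scored on the matching test window, which keeps the evaluation close to how a forecast is used in practice.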

11/12/2024 08:51:05