Model the data

‘Model the data’ is the main aim of any Data Science project. Now we can finally put all the work from the previous steps of our 8-step Data Science method to good use. You may assume this step is the most time-consuming but, for most projects, this is not the case. As a rule of thumb, Data Science is 90% data preparation and 10% science.

There are many different ways to model the data. The most basic form is reporting. Reporting is a form of diagnostic analytics, where we observe the past and try to explain it. In other words, we evaluate past decisions.

Monitoring is another method, on the border between diagnostic and predictive analytics. It is more in line with predictive analytics, because it detects possible deviations at an early stage to provide decision support. By understanding better what is likely to happen, a company can mitigate the consequences of the forecasted event or profit from them.

Forecasting and predictive models are two other forms of predictive analytics. Predictive models sit between predictive and prescriptive analytics since a predictive model implicitly prescribes what to do. With prescriptive analytics, it is possible to automate your decisions, but this is also a complex method to implement. However, such complex analytics typically provide additional business benefits. As such, the prescriptive model is widely regarded as the ultimate form of advanced analytics.


ONE A variable used to analyze the performance of organizations or projects.

TWO Quantitative, measuring an objective or a Critical Success Factor.

THREE Covers internal and external effects, to help perform better and deploy effective means or policy.

FOUR Based on factors that can be influenced by the organization.

The most rudimentary form of a data model is a key performance indicator (KPI). KPIs are used to analyze the performance of organizations or projects, by measuring an objective or a critical success factor. A KPI measures an internal or external factor that can be influenced by the organization. In other words, a KPI is actionable.


ONE To make objectives measurable.

TWO To make targeted data analyses.

THREE To draw conclusions based on data.

FOUR To provide a basis for making better choices for policy, action, or prevention.

KPIs make an objective (or the factors needed to reach the objective) measurable. They enable organizations to perform root cause analysis and so understand why a certain target was (or was not) reached. As such, they provide an excellent base to define and track policies and actions.
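To make the idea of a measurable objective concrete, here is a minimal Python sketch of a KPI compared against its target. The `Kpi` class, the KPI name, and all numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Kpi:
    """A hypothetical KPI: a measured value compared against a target."""
    name: str
    actual: float
    target: float

    def attainment(self) -> float:
        # Share of the target that was reached (1.0 means exactly on target).
        return self.actual / self.target

    def is_met(self) -> bool:
        return self.actual >= self.target


# Illustrative numbers only: 940 of 1,000 orders were delivered on time.
on_time_rate = 940 / 1000
kpi = Kpi(name="on-time delivery rate", actual=on_time_rate, target=0.95)

print(f"{kpi.name}: {kpi.actual:.1%} (target {kpi.target:.0%})")
print("target met" if kpi.is_met() else "target missed")
```

A missed target like this one is the starting point for root cause analysis: the next question is which factor the organization can influence to close the gap.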


Informally, an algorithm is any well-defined computational
procedure that takes some value, or set of values, as input and
produces some value, or set of values, as output. An algorithm
is thus a sequence of computational steps that transform the
input into the output.
– Thomas H. Cormen, Charles E. Leiserson (2009), Introduction to
Algorithms, 3rd edition.

Advanced models, including predictive and prescriptive models, make use of algorithms. These algorithms range from a simple “if… then… else” statement to incredibly complex sequences. In all cases, however, an algorithm operates in the same way.

As stated by Thomas H. Cormen and Charles E. Leiserson, an algorithm is a sequence of computational steps that transform the input into the output. This also means there is nothing “smart” about algorithms. They just follow certain steps without any real intelligence. But an algorithm can be so complex that its operation is totally incomprehensible to a human being.

An algorithm must satisfy the following conditions. It should:

BE FINITE An algorithm that never ends is useless, as it will never solve the problem.

HAVE WELL-DEFINED INSTRUCTIONS Each step of the algorithm must be precisely defined.

BE EFFECTIVE The algorithm should provide the desired output.
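The three conditions above can be seen in a classic textbook algorithm. The sketch below uses Euclid's algorithm for the greatest common divisor, chosen here only as an illustration; the comments point out where each condition shows up.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    if a < 0 or b < 0:
        raise ValueError("inputs must be non-negative")
    # Finite: b strictly decreases on every pass and can never go below 0.
    while b != 0:
        # Well-defined: each step is one exact, unambiguous operation.
        a, b = b, a % b
    # Effective: a now holds the desired output, the greatest common divisor.
    return a


print(gcd(48, 18))
```

The input is a pair of values, the output is a single value, and nothing in between requires judgment or intelligence.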


Different techniques use algorithms. These include:

TRANSFORMING ANALYTICS These techniques are used in step 4 of the 8-step model: prepare, integrate, and explore the data. Examples include data aggregation, enrichment, and processing techniques such as data cleaning, preparation, and separation.

LEARNING ANALYTICS These analytics provide insights into relationships and can also classify objects into groups. Regression, clustering, classification, and recommendation are all examples of learning analytics techniques.

PREDICTIVE ANALYTICS These include simulation and optimization techniques. With simulation, a simplified representation of reality, such as a process or a system, is created. The model used is either predictive or prescriptive in nature, but in both cases it tries to predict the future using the patterns in the data. Optimization refers to operations research techniques.


AI overview

The more advanced techniques are often described as Artificial Intelligence (AI). AI is often defined as the simulation of human intelligence in machines, which are programmed to think like humans and mimic their actions.

Within AI, the following techniques exist:

MACHINE LEARNING Machine learning covers a range of statistical techniques giving computers the ability to learn. In other words, they progressively improve their capacity to execute a task. Machine learning can also be split into Deep Learning, unsupervised and supervised learning.

NATURAL LANGUAGE PROCESSING Natural Language Processing (NLP) is an area of AI concerned with the interactions between computers and human (natural) languages. The field of NLP includes text generation (e.g. a machine writing a book), question answering (e.g. chatbots), context extraction (e.g. anonymization and summarization), classification (e.g. sentiment analysis) and machine translation (e.g. Google Translate).

EXPERT SYSTEMS These represent a simple AI system, typically consisting of lists of “if… then” statements and other such associations, which are written in a human-like language.

SPEECH Speech is linked to NLP, but it is a distinct technique within AI. Whereas NLP works with written language, speech techniques convert spoken words to text or text to speech.

VISION This technique deals with the way computers see and understand digital images and videos including, for example, facial recognition.

PLANNING Planning is a branch of AI concerning strategies and action sequences. Self-driving cars are an example of planning.

ROBOTICS In a business context, robotics often refers to Robotic Process Automation (RPA). It automates the manual tasks usually done by a human.
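Of the techniques listed above, the expert system is the easiest to show in a few lines. The following Python sketch applies a list of “if… then” rules until no new conclusions appear. The rules and fact names are hypothetical, and real expert systems are considerably richer than this.

```python
# Hypothetical rules: if every condition is a known fact, then add the conclusion.
RULES = [
    ({"order_total_over_100", "loyal_customer"}, "free_shipping"),
    ({"free_shipping", "item_in_stock"}, "ship_today"),
]


def infer(facts):
    """Apply the rules repeatedly until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


print(sorted(infer({"order_total_over_100", "loyal_customer", "item_in_stock"})))
```

Note that the second rule can only fire after the first one has added its conclusion, which is why the rules are applied in a loop rather than once.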


There are three types of machine learning algorithms:

SUPERVISED In supervised learning, the algorithm uses a labeled dataset. The label is the outcome the algorithm needs to predict. One part of the labeled data is used to train a model iteratively: at every step, the model's outcome is compared with the label from the dataset and the model is readjusted, until it has been optimized. The model is then applied to the other part of the labeled dataset to validate it and measure its accuracy. When the accuracy is sufficient, the model can be used to predict the label for unlabeled data. Classification and regression are examples of supervised learning.

UNSUPERVISED In unsupervised learning, the algorithm uses an unlabeled dataset. The algorithm searches for unknown patterns and relationships in the data. It is important for the Data Scientist to research whether the outcome of the model is useful and/or actionable. There are different types of unsupervised learning:

→ Clustering Creating groups in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups.

→ Anomaly Detection Identification of rare events or observations that raise suspicions by differing significantly from the majority of the data. Detecting bank fraud is an example of this type of unsupervised learning.

→ Association Discovering interesting relations between variables in large databases. The aim is to predict what other attributes are commonly associated with a couple of key variables. Recommending products to add to a shopping basket is one example of an association model.

→ Autoencoders The aim of autoencoders is to remove “noise” from visual data like images, video, or medical scans. This is done by learning a reduced representation (encoding) of the data set and then generating, from that reduced encoding, an output that is as close as possible to the original input.
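The supervised workflow described earlier (train on one part of the labeled data, validate on the other, then predict) can be sketched in plain Python. The example below uses a perceptron, which is only one of many possible models, and a small synthetic dataset; the data, the labeling rule, and all function names are assumptions made for illustration.

```python
# Hypothetical labeled dataset: each row is ((x1, x2), label), where the label
# is 1 when x1 + x2 >= 0 and 0 otherwise. The model is never told this rule.
rows = [((x1, x2), int(x1 + x2 >= 0)) for x1 in range(-5, 5) for x2 in range(-5, 5)]

# Keep every fourth row aside for validation; train on the rest.
validation = rows[::4]
train = [row for i, row in enumerate(rows) if i % 4 != 0]


def predict(weights, bias, features):
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def fit(data, max_epochs=1000):
    """Perceptron training: compare each prediction with the label and adjust."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in data:
            error = label - predict(weights, bias, features)  # -1, 0, or +1
            if error != 0:
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
                mistakes += 1
        if mistakes == 0:  # every training label is reproduced: stop adjusting
            break
    return weights, bias


def accuracy(weights, bias, data):
    hits = sum(predict(weights, bias, f) == label for f, label in data)
    return hits / len(data)


weights, bias = fit(train)
print("validation accuracy:", accuracy(weights, bias, validation))

# Once the accuracy is judged sufficient, predict the label of unlabeled data.
print("prediction for (3, 1):", predict(weights, bias, (3, 1)))
```

The key design point is that the validation rows are never used during training, so the measured accuracy says something about data the model has not seen.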



The third type is reinforcement learning, in which an agent learns from the feedback (rewards) it receives for its actions. The agent relies both on learning from past feedback and on exploring new tactics that may present a larger payoff. This involves a long-term strategy where the agent tries to maximize the cumulative reward. It is an iterative process: the more rounds of feedback, the better the agent’s strategy becomes. This technique is especially useful for training robots, which make a series of decisions during tasks like steering an autonomous vehicle or managing a warehouse inventory.
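The balance between using past feedback and exploring new tactics can be illustrated with a multi-armed bandit, one of the simplest reinforcement learning settings. The sketch below uses an epsilon-greedy agent; the reward probabilities, the value of epsilon, and the number of rounds are all hypothetical.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical environment: three actions with hidden success probabilities.
TRUE_REWARD_PROB = [0.2, 0.5, 0.8]
EPSILON = 0.1  # share of rounds spent exploring a random action

estimates = [0.0] * len(TRUE_REWARD_PROB)  # the agent's learned value per action
counts = [0] * len(TRUE_REWARD_PROB)
total_reward = 0

for _ in range(5000):
    if random.random() < EPSILON:
        action = random.randrange(len(estimates))  # explore a new tactic
    else:
        action = max(range(len(estimates)), key=estimates.__getitem__)  # exploit
    reward = 1 if random.random() < TRUE_REWARD_PROB[action] else 0
    counts[action] += 1
    # Incremental average: move the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]
    total_reward += reward

best_action = max(range(len(estimates)), key=estimates.__getitem__)
print("estimated values:", [round(v, 2) for v in estimates])
print("preferred action:", best_action)
```

Without the exploration step the agent could settle on the first action that ever paid off; the occasional random choice is what lets it discover the action with the larger payoff.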
