Using a Predictive Analytics Decision Tree


When it comes to Predictive Analytics, several algorithms let you put the available data to work by constructing a prediction model. Previously, we looked at a few algorithms designed to calculate probability. Another popular predictive analytics and AI algorithm is the Decision Tree algorithm known as C4.5.

What Do We Mean By Decision Tree?

The term "Decision Tree" is used in many contexts across the internet. We have even come across "Decision Tree" being used in a business rules scenario. For the purposes of this article, however, "Decision Tree" refers to a predictive model built by analyzing the available historical data.

We have found the C4.5 Decision Tree to be highly effective. At its simplest, the model classifies new cases based on historical information from similar cases, provided that the historical data is relevant and the tree is not overfit.
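To make the idea of building a tree from historical data more concrete, here is a minimal sketch of the gain ratio, the measure C4.5 uses to decide which attribute to split on. The `history` records and the function names are hypothetical and chosen only for illustration; a full C4.5 implementation also handles numeric thresholds, missing values and pruning, none of which are shown here.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Information gain of splitting on `attribute`, divided by the split information."""
    total = len(rows)
    # Group the target labels by the value each row has for the attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Weighted entropy that remains after the split.
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalizes attributes with many small partitions.
    split_info = -sum(
        len(p) / total * log2(len(p) / total) for p in partitions.values()
    )
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical historical cases, for illustration only.
history = [
    {"Outlook": "sunny", "Windy": False, "Play": "no"},
    {"Outlook": "sunny", "Windy": True, "Play": "no"},
    {"Outlook": "overcast", "Windy": False, "Play": "yes"},
    {"Outlook": "overcast", "Windy": True, "Play": "yes"},
    {"Outlook": "rain", "Windy": False, "Play": "yes"},
    {"Outlook": "rain", "Windy": True, "Play": "no"},
]

# The attribute with the highest gain ratio becomes the next split.
best = max(["Outlook", "Windy"], key=lambda a: gain_ratio(history, a, "Play"))
print(best)
```

On this toy data, Outlook separates the classes far better than Windy, so it would be chosen as the first split.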


The bottom line is that once the predictive model has been created, new cases can be classified without delay.

Extraction of Business Rules

A good example of a use case for the C4.5 Decision Tree is the creation of a set of rules from existing data. Rule extraction writes out the decision rules that the tree uses to classify a new case, for example:

 

  • if (Outlook == 'sunny') and (Humidity > 70) then Play = 'no'
  • if (Outlook == 'overcast') then Play = 'yes'
  • if (Outlook == 'rain') and (Windy == false) then Play = 'yes'
  • if (Outlook == 'rain') and (Windy == true) then Play = 'no'
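As a rough illustration of how such rules can be read off a tree, the sketch below walks every root-to-leaf path and joins the conditions along the way. The nested-dictionary representation and the name `extract_rules` are assumptions made for this example, not the format of any particular tool, and the humidity test is assumed to be "greater than 70".

```python
# Hypothetical tree: each key is a branch condition, each value is either
# a subtree (dict) or a leaf holding the predicted class.
tree = {
    "Outlook == 'sunny'": {"Humidity > 70": "no"},  # assumed comparison
    "Outlook == 'overcast'": "yes",
    "Outlook == 'rain'": {"Windy == false": "yes", "Windy == true": "no"},
}


def extract_rules(node, path=()):
    """Yield one 'if ... then ...' rule for every root-to-leaf path."""
    for condition, child in node.items():
        conditions = path + (f"({condition})",)
        if isinstance(child, dict):
            # Internal node: keep descending, carrying the conditions so far.
            yield from extract_rules(child, conditions)
        else:
            # Leaf: the accumulated conditions form the rule's premise.
            yield f"if {' and '.join(conditions)} then Play = '{child}'"


rules = list(extract_rules(tree))
for rule in rules:
    print(rule)
```

Each printed rule corresponds to exactly one leaf, which is why a tree with four leaves yields four rules.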

The C4.5 Decision Tree has an advantage over some other AI methods in that it enables users to understand why a particular decision was made. This explainability is crucial for auditing, where the rationale behind a particular decision must be fully explained.
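One way to picture this auditability is a classifier that returns the rule that fired together with the decision. The following is a minimal sketch under stated assumptions: the function name `classify_with_reason` and the case format are hypothetical, and the humidity test is assumed to be "greater than 70". Because the listed rules do not cover a sunny day with lower humidity, the sketch returns no decision in that case rather than guessing.

```python
def classify_with_reason(case):
    """Return (decision, rule_text); both are None when no listed rule applies."""
    rules = [
        ("Outlook == 'sunny' and Humidity > 70",  # assumed comparison
         lambda c: c["Outlook"] == "sunny" and c["Humidity"] > 70, "no"),
        ("Outlook == 'overcast'",
         lambda c: c["Outlook"] == "overcast", "yes"),
        ("Outlook == 'rain' and Windy == false",
         lambda c: c["Outlook"] == "rain" and not c["Windy"], "yes"),
        ("Outlook == 'rain' and Windy == true",
         lambda c: c["Outlook"] == "rain" and c["Windy"], "no"),
    ]
    for rule_text, condition, decision in rules:
        if condition(case):
            # The matching rule doubles as the audit trail for the decision.
            return decision, rule_text
    return None, None


decision, reason = classify_with_reason(
    {"Outlook": "rain", "Humidity": 65, "Windy": True}
)
print(f"Play = {decision!r} because {reason}")
```

An auditor reviewing the result sees both the outcome and the exact conditions that produced it.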

 

What Are Some Use Cases?

In health care, for example, data mining algorithms such as C4.5 can analyze clinical data to support the diagnosis of many diseases. Furthermore, because C4.5 is a particularly fast algorithm, it can handle the large amounts of data usually found in the health care and clinical fields.


Author: Arash Aghlara, Founder & CTO, Flexrule

 




 




Copyright 2006-2024 by Modern Analyst Media LLC