Decision tree analysis

From apppm
Revision as of 16:09, 19 February 2023

Abstract


Decision tree analysis is a widely used machine learning method for classifying observations and predicting outcomes from a set of input features [1]. It builds a tree-like model of decisions and their possible consequences: each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf holds a prediction. The algorithm learns which test to apply at each node from a training set of labeled data.
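The following is a minimal sketch of how such a tree can be learned and used, not a reference implementation. It assumes numeric features, binary splits of the form "feature <= threshold", and Gini impurity as the split criterion (one common choice; the text above does not prescribe one). The toy credit data and the names `gini`, `best_split`, `build_tree`, and `predict` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) of the best split, or None."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively grow a tree; leaves are class labels, inner nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None:  # no feature has two distinct values left
        return majority
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the tests from the root down to a leaf and return its label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical training data: (income, debt) -> credit decision.
rows = [(20, 5), (25, 8), (40, 3), (60, 2), (75, 9), (90, 1)]
labels = ["reject", "reject", "approve", "approve", "approve", "approve"]

tree = build_tree(rows, labels)
print(tree)
print(predict(tree, (30, 4)), predict(tree, (50, 7)))
```

On this toy data a single test on income separates the two classes, so the learned tree has one internal node and two leaves.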

Decision tree analysis is used in many industries, including banking, medicine, marketing, and engineering [2]. It is especially helpful for problems with binary outcomes, such as predicting customer churn, assessing credit risk, and detecting fraud [3]. Because decision trees are simple to understand, they are a popular way to explain a decision-making process to stakeholders [4].

Decision trees are, however, prone to overfitting, which can result in poor performance on new data. Techniques such as pruning, ensembling, and regularization can be used to counter this [4].

Pruning removes branches of the tree that do not improve the model's accuracy on new data. Ensembling combines several decision trees to produce a more reliable model. Regularization penalizes complexity in the tree, which keeps it from growing too complex and overfitting the training data [4].
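As an illustration of the pruning idea, the sketch below applies reduced-error pruning, which is only one of several pruning strategies and is not necessarily the one the cited sources have in mind. It assumes a tree stored as nested dictionaries in which every internal node also records the majority class of its training data under a `"majority"` key. The tree, the validation data, and all function names are hypothetical.

```python
def predict(node, row):
    """Walk from the given node down to a leaf; leaves are plain class labels."""
    while isinstance(node, dict):
        branch = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node


def errors(node, rows, labels):
    """Number of misclassified validation examples."""
    return sum(predict(node, r) != y for r, y in zip(rows, labels))


def prune(node, rows, labels):
    """Bottom-up reduced-error pruning using the validation rows that reach this node.

    A subtree is replaced by a leaf predicting its majority class whenever
    the leaf makes no more validation errors than the subtree does.
    """
    if not isinstance(node, dict):
        return node
    f, t = node["feature"], node["threshold"]
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    pruned = dict(node)
    pruned["left"] = prune(node["left"], [r for r, _ in left], [y for _, y in left])
    pruned["right"] = prune(node["right"], [r for r, _ in right], [y for _, y in right])
    leaf = node["majority"]
    if errors(leaf, rows, labels) <= errors(pruned, rows, labels):
        return leaf
    return pruned


# Hypothetical tree over (income, debt); the split on debt is assumed to have
# been learned from a single noisy training example.
tree = {
    "feature": 0, "threshold": 32.5, "majority": "approve",
    "left": "reject",
    "right": {
        "feature": 1, "threshold": 8.5, "majority": "approve",
        "left": "approve", "right": "reject",
    },
}

# Hypothetical held-out validation data.
val_rows = [(22, 6), (30, 2), (45, 9), (70, 4), (85, 9.5)]
val_labels = ["reject", "reject", "approve", "approve", "approve"]

pruned = prune(tree, val_rows, val_labels)
print(errors(tree, val_rows, val_labels), errors(pruned, val_rows, val_labels))
print(pruned)
```

Here the split on debt is collapsed into a single leaf because it only adds validation errors, while the split on income is kept because removing it would make the predictions worse.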


Application


Real life examples

Finance: In the banking and financial sector, decision tree analysis is used for credit scoring, risk management, and fraud detection [5] [6].

Medicine: Decision tree analysis is used to support clinical decisions, diagnose diseases, and predict patient outcomes [7] [8].

Marketing: Decision tree analysis is used for customer segmentation, targeted marketing, and product recommendation [2] [9].

Engineering: In production and engineering, decision tree analysis is used for fault diagnosis, quality control, and process optimization [10] [11].

Environmental science: Decision tree analysis is used in species distribution modeling, land use planning, and environmental impact assessment [12] [13].

Limitations


Decision trees are susceptible to overfitting, which happens when the model fits the noise in the training data rather than the underlying pattern [14] [15].

Decision trees can be unstable: small changes in the data or model parameters can produce very different tree structures or predictions [1] [4].
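The instability can be illustrated with a small sketch. It assumes a Gini-based search for the best binary split and a toy data set with two strongly correlated features; the data and the function names are hypothetical. In the original data both features separate the classes perfectly and the tie is broken in favor of the first feature; adding a single training example is enough to move the root split to the other feature.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) of the best split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            # Strict "<" means ties go to the split that was found first.
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Hypothetical data: two correlated features, both of which separate the classes.
rows = [(1, 2), (2, 1), (3, 3), (4, 5), (5, 4), (6, 6)]
labels = [0, 0, 0, 1, 1, 1]

# The same data plus one extra training example.
rows_plus_one = rows + [(2.5, 5.5)]
labels_plus_one = labels + [1]

print(best_split(rows, labels))                    # root split on feature 0
print(best_split(rows_plus_one, labels_plus_one))  # root split on feature 1
```

Because every split below the root depends on how the root partitions the data, a change like this can propagate through the whole tree.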

Decision trees may be biased toward features with high cardinality, that is, a large number of distinct categories, which can lead to particular categories being over- or under-represented [16] [17].
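A short sketch shows where this bias comes from when splits are scored by information gain. The data set and the feature names `customer_id` and `plan` are hypothetical. A feature with one distinct value per record splits the training data into pure single-record groups and therefore receives the maximum possible gain, even though it cannot generalize. The gain ratio used in C4.5 divides the gain by the entropy of the split itself, which penalizes such features; it is shown here as one possible correction, not the only one.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def information_gain(values, labels):
    """Reduction in label entropy after grouping the records by a categorical feature."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical churn data: 16 customers, 8 churned and 8 did not.
labels = ["yes"] * 8 + ["no"] * 8
customer_id = list(range(16))  # unique per record, no predictive value
plan = ["basic"] * 7 + ["premium"] + ["premium"] * 7 + ["basic"]  # informative

for name, feature in [("customer_id", customer_id), ("plan", plan)]:
    print(name,
          round(information_gain(feature, labels), 3),
          round(gain_ratio(feature, labels), 3))
```

Plain information gain ranks `customer_id` above `plan`, while the gain ratio reverses the order on this data.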

Interpretability: Large or complex decision trees with many branches and nodes can be difficult to understand and interpret [18] [19].

References


  1. Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. Chapman and Hall
  2. Kotsiantis, S. B., Zaharakis, I. D., & Pintelas, P. E. (2006). Machine learning: a review of classification and combining techniques. Artificial Intelligence Review, 26(3), 159-190
  3. Wasserman, L. (2013). All of statistics: A concise course in statistical inference. Springer Science & Business Media
  4. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). Springer
  5. Kohavi, R., & Provost, F. (1998). Glossary of terms. Machine learning
  6. Tang, F., Zeng, G., Deng, L., Huang, G., Li, X., & Wang, X. (2015). Decision tree models for effective credit scoring in peer-to-peer online microloan platforms. Decision Support Systems
  7. Chen, J., Guo, Y., Li, S., Li, J., & Li, J. (2019). A decision tree approach to predicting the survival of gastric cancer patients. Journal of Cellular Biochemistry
  8. Leite, F. N., Oliveira, C. A., Cunha, A. M., Körbes, D., Fumagalli, F., & Leite, J. S. (2018). Decision trees for predicting breast cancer recurrence using clinical data. Expert Systems with Applications
  9. Verbeke, W., Dejaeger, K., Martens, D., Hur, J., Baesens, B., & Vanthienen, J. (2014). A novel profit-based classification model for customer base analysis
  10. Al-Marwani, A., Ramachandran, M., & Subramanian, R. (2020). A review on the application of decision tree and random forest algorithms in engineering. Journal of Advanced Research in Dynamical and Control Systems
  11. Chen, G., Gao, X., & Li, C. (2015). Decision tree-based quality control for ultrasonic welding of lithium-ion battery. Journal of Materials Processing Technology
  12. Figueiredo, R. O., Rocha, J. C. V., & Tavares, R. A. (2019). Decision tree models for environmental impact assessment. Environmental Modelling & Software
  13. Pu, J., Tang, Q., & Yao, X. (2020). A comparative study of decision tree algorithms for modeling the spatial distribution of forest soil nutrients. Science of The Total Environment
  14. Kuhn, M., & Johnson, K. (2013). Applied predictive modeling. Springer
  15. Quinlan, J. R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann
  16. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. Springer
  17. Zadrozny, B., & Elkan, C. (2002). Transforming classifier scores into accurate multiclass probability estimates. Journal of Machine Learning Research
  18. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2015). Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
  19. Lakkaraju, H., Bach, S. H., & Leskovec, J. (2016). Interpretable decision sets: A joint framework for description and prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining