2021
Zhang, Jiajia; Feng, Tao; Lin, Zhengkui; Timmermans, Harry J P
Advancing Association Rule Base on Gini Impurity Statistic for Predicting Transportation Mode Choice Proceedings Article
In: 2021, (100th Transportation Research Board Annual Meeting; Conference date: 21-01-2021 through 29-01-2021).
Tags: Class association rules, Gini impurity, Transportation mode choice, Weight of rules
@inproceedings{Zhang2021,
title = {Advancing Association Rule Base on Gini Impurity Statistic for Predicting Transportation Mode Choice},
author = {Jiajia Zhang and Tao Feng and Zhengkui Lin and Harry J P Timmermans},
year = {2021},
date = {2021-01-01},
abstract = {Recently, machine learning approaches have been applied to predict transportation mode choice as an alternative to the more commonly used discrete choice models. General class association rules (CARs) have been introduced as a promising machine learning method, but the interpretability of the prediction results in terms of the underlying behavioral decision-making process has remained a concern. In an attempt to improve CARs, this study proposes a more advanced association rule model (named CARGIGI) with stronger interpretability. Building on the original CARIG approach, which uses the information gain (IG) statistic to improve predictive accuracy, the model uses the Gini impurity (GI) statistic to generate new rules that improve predictive accuracy and to calculate the relative importance of the variables, the relative importance of the variable levels, and the weight of rules in the transportation mode decision process. The weight of rules is introduced as a new pruning indicator to improve predictive accuracy, while the relative importance of the levels of a variable is used to enhance the behavioral interpretability of the results. The suggested approach is applied to the 2015 Dutch National Travel Survey. Results indicate that travel distance, OV card usage frequency, travel time, and travel purpose are the most important variables, while travel party and gender are the least important variables for predicting transportation mode choice. In addition, a 10-fold cross-validation test is conducted to validate the advanced model. The results show that the newly proposed model outperforms both the selected machine learning algorithms and the multinomial logit (MNL) model.},
note = {100th Transportation Research Board Annual Meeting; Conference date: 21-01-2021 through 29-01-2021},
keywords = {Class association rules, Gini impurity, Transportation mode choice, Weight of rules},
pubstate = {published},
tppubtype = {inproceedings}
}
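The abstract says the Gini impurity (GI) statistic is used to rank variables by relative importance. As a rough illustration only, the sketch below computes the textbook Gini impurity and the impurity decrease obtained by partitioning observations on a variable's levels. It is not the CARGIGI procedure from the paper, whose rule generation and weighting are not described here; the function names and the toy data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Textbook Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(levels, labels):
    """Impurity decrease from partitioning the labels by a variable's levels."""
    n = len(labels)
    groups = {}
    for level, label in zip(levels, labels):
        groups.setdefault(level, []).append(label)
    # Size-weighted impurity remaining after the partition.
    weighted = sum(len(g) / n * gini_impurity(g) for g in groups.values())
    return gini_impurity(labels) - weighted


# Hypothetical toy data, not taken from the Dutch National Travel Survey.
modes = ["car", "car", "bike", "bike", "car", "bike"]
distance = ["long", "long", "short", "short", "long", "short"]
gender = ["f", "m", "f", "m", "f", "m"]

print("distance:", gini_decrease(distance, modes))
print("gender:", gini_decrease(gender, modes))
```

In this toy data, distance separates the two modes perfectly and so yields a larger impurity decrease than gender, which is the kind of ordering a GI-based importance ranking would reflect.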