Town trip forecasting based on data mining techniques
Subject Areas : Mathematical Optimization
Mohammad Fili 1 , Majid Khedmati 2 *
1 - Department of Industrial Engineering, Sharif University of Technology, Tehran, Iran
2 - Department of Industrial Engineering, Sharif University of Technology
Keywords: Random forest, Artificial Neural Network (ANN), Forecasting trip duration, Grouping categorical variables, Modified nearest neighborhood (MNN)
Abstract :
This paper proposes a data mining approach for predicting the duration (travel time) of town trips in New York City. First, two novel approaches, one mathematical and one statistical, are proposed for grouping categorical variables that have a very large number of levels. Both approaches are based on a cost matrix generated by repeated post-hoc tests for different pairs. Then, a random forest model is constructed to predict the trip type, short or long. Finally, for each trip type and for each of the mathematical and statistical approaches, separate artificial neural networks (ANNs) are developed to predict trip duration. According to the results, the mathematical approach provides more accurate predictions than the statistical approach. The proposed methods are also compared with other methods in the literature, and the results show that they outperform all of the compared methods. For short trips, the RMSE of the mathematical and statistical approaches is 4.23 and 4.27 minutes, respectively; for long trips, the corresponding value is 9.5 minutes. In addition, a modified version of the nearest neighborhood approach, called modified nearest neighborhood (MNN), is proposed for predicting trip duration. This model also gives accurate predictions, with an RMSE of 4.45 minutes.
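The accuracy figures above are reported as RMSE in minutes. As a minimal sketch of how such a figure could be computed, the snippet below assumes trip durations are stored in seconds and converts the errors to minutes before averaging. The function name `rmse_minutes` and the sample values are hypothetical and are not taken from the paper or its data set.

```python
import math


def rmse_minutes(actual_seconds, predicted_seconds):
    """Root mean squared error of two equal-length sequences, in minutes.

    Assumption: both inputs hold trip durations in seconds.
    """
    if len(actual_seconds) != len(predicted_seconds):
        raise ValueError("sequences must have the same length")
    if not actual_seconds:
        raise ValueError("sequences must be non-empty")
    # Convert each error from seconds to minutes, then square it.
    squared_errors = [
        ((a - p) / 60.0) ** 2
        for a, p in zip(actual_seconds, predicted_seconds)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical durations in seconds, for illustration only.
actual = [600, 900, 1200]
predicted = [660, 840, 1320]
print(round(rmse_minutes(actual, predicted), 3))  # prints 1.414
```

Because the errors are squared before averaging, a few badly mispredicted trips raise the RMSE more than many small misses do.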
Turner, S. M., Eisele, W. L., Benz, R. J., & Douglas, J. (1998). Travel time data collection handbook. In Federal Highway Administration, USA.
Wu, C. H., Ho, J. M., & Lee, D. T. (2004). Travel-time prediction with support vector regression. IEEE Transactions on Intelligent Transportation Systems, 5(4), 276-281. https://doi.org/10.1109/TITS.2004.837813
Cho, Y., & Kwac, J. (2007). A Travel Time Prediction with Machine Learning Algorithms. http://cs229.stanford.edu/proj2007.
Kwon, J., & Petty, K. (2005). Travel time prediction algorithm scalable to freeway networks with many nodes with arbitrary travel routes. Transportation Research Record. https://doi.org/10.3141/1935-17
Kwon, J., Mauch, M., & Varaiya, P. (2006). Components of congestion: Delay from incidents, special events, lane closures, weather, potential ramp metering gain, and excess demand. Transportation Research Record, 1959, 84-91. https://doi.org/10.3141/1959-10
Zhan, X., Hasan, S., Ukkusuri, S. V., & Kamga, C. (2013). Urban link travel time estimation using large-scale taxi data with partial information. Transportation Research Part C: Emerging Technologies, 33, 37-49. https://doi.org/10.1016/j.trc.2013.04.001
Wang, J., Tsapakis, I., & Zhong, C. (2016). A space-time delay neural network model for travel time prediction. Engineering Applications of Artificial Intelligence, 52, 145-160. https://doi.org/10.1016/j.engappai.2016.02.012
Li, C. Sen, & Chen, M. C. (2014). A data mining based approach for travel time prediction in freeway with non-recurrent congestion. Neurocomputing, 133, 74-83. https://doi.org/10.1016/j.neucom.2013.11.029
Zhang, Y., & Haghani, A. (2015). A gradient boosting method to improve travel time prediction. Transportation Research Part C: Emerging Technologies, 58, 308-324. https://doi.org/10.1016/j.trc.2015.02.019
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29, 1189-1232. https://doi.org/10.1214/aos/1013203451
Antoniades, C., Fadavi, D., & Amon, A. F. J. (2016). Fare and Duration Prediction: A Study of New York City Taxi Rides. http://cs229.stanford.edu/proj2016/report
Jaiwal, H., Bansal, T., Jakate, P., & Saxena, T. (2016). NYC Taxi Rides: Fare and Duration Prediction. https://cseweb.ucsd.edu/classes/wi17/cse258-a/reports/a077.pdf
Niaki, S. T. A., & Hoseinzade, S. (2013). Forecasting S&P 500 index using artificial neural networks and design of experiments. Journal of Industrial Engineering International, 9, 1-9. https://doi.org/10.1186/2251-712X-9-1
Zolghadr, M., Niaki, S. A. A., & Niaki, S. T. A. (2018). Modeling and forecasting US presidential election using learning algorithms. Journal of Industrial Engineering International, 14, 491-500. https://doi.org/10.1007/s40092-017-0238-2
Maleki, M. R., Amiri, A., & Mousavi, S. M. (2015). Step change point estimation in the multivariate-attribute process variability using artificial neural networks and maximum likelihood estimation. Journal of Industrial Engineering International, 11, 505-515. https://doi.org/10.1007/s40092-015-0117-7
2016 Yellow Taxi-Trip Data. (n.d.). https://data.cityofnewyork.us/Transportation/2016-Yellow-Taxi-Trip-Data/k67s-dv2t
O’Rourke, J. (1998). Computational Geometry in C. 2nd edition, Cambridge.
Weather data. (n.d.). www.wunderground.com
Han, J., Kamber, M., & Pei, J. (2012). Data Mining: Concepts and Techniques. 3rd edition, Elsevier.
Montgomery, D. C. (2012). Design and Analysis of Experiments. 8th Edition, John Wiley.