• Vol 9, No 1 (2018)
  • Electrical, Electronics, and Computer Engineering

An Approximation Method of Regression Analysis in Concurrent Big Data Stream

Chanintorn Jittawiriyanukoon, Vilasinee Srisarkun

Cite this article as:

Jittawiriyanukoon, C., Srisarkun, V., 2018. An Approximation Method of Regression Analysis in Concurrent Big Data Stream. International Journal of Technology. Volume 9(1), pp. 192-200

Chanintorn Jittawiriyanukoon, Assumption University
Vilasinee Srisarkun, Assumption University


Time series big data changes size dynamically, and limited processing capacity and storage can make it difficult to curate such an enormous amount of data. Big data allows researchers to iterate on a model millions of times over. To execute a regression on several billion rows of data across a distributed network, the resource capacity required for large data volumes and the distributed environment must be considered. Algorithms must be aware of the data in real time. Moreover, analyzing big data sources requires the data to be pre-processed rather than analyzed immediately upon collection. Pre-processing helps minimize the amount of collected data by extracting insights; it speeds up analysis and saves storage space. Hence, this research proposes an approximation method for analyzing regression problems in a big data stream with parallelism. Partitioning the huge data stream helps reduce computing time and required space, and the resulting speed-up improves the processing time. The performance of the concurrent regression model is first evaluated by massive online analysis (MOA) simulation. Then, to validate the approximation method, the results obtained by the proposed method are compared with those collected from the simulation. The comparison shows that the two methods produce closely matching results.
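The idea of partitioning a stream and fitting regressions concurrently can be illustrated with a minimal sketch. It is not the authors' method, whose details are not given in the abstract. It assumes a single predictor, fixed-size consecutive partitions, and a simple combination rule (averaging per-partition coefficients weighted by partition size). The function names `fit_simple_ols`, `partition`, and `approximate_regression` and the synthetic data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fit_simple_ols(chunk):
    """Least-squares slope and intercept for one partition of (x, y) pairs."""
    n = len(chunk)
    mean_x = sum(x for x, _ in chunk) / n
    mean_y = sum(y for _, y in chunk) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in chunk)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in chunk)
    slope = sxy / sxx  # assumes x is not constant within the partition
    return slope, mean_y - slope * mean_x, n


def partition(rows, size):
    """Split rows into consecutive partitions of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def approximate_regression(rows, chunk_size, workers=4):
    """Fit each partition concurrently, then average the coefficients.

    Assumption: coefficients are combined by a size-weighted mean.
    Threads only illustrate the structure; CPU-bound work in CPython
    would need processes or a cluster for a real speed-up.
    """
    chunks = partition(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fits = list(pool.map(fit_simple_ols, chunks))
    total = sum(n for _, _, n in fits)
    slope = sum(s * n for s, _, n in fits) / total
    intercept = sum(b * n for _, b, n in fits) / total
    return slope, intercept


# Hypothetical synthetic data: y = 2x + 1 plus a small deterministic perturbation
rows = [(float(x), 2.0 * x + 1.0 + 0.1 * ((x % 3) - 1)) for x in range(1, 301)]

# Reference fit on all rows at once versus the partitioned approximation
full_slope, full_intercept, _ = fit_simple_ols(rows)
approx_slope, approx_intercept = approximate_regression(rows, chunk_size=60)
```

Each worker only needs its own partition in memory, which is where the space saving comes from. The averaged coefficients are an approximation of the full fit, not an exact reproduction of it.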

Approximation method; Big data curation; MOA; Parallel processing; Regression analysis

