|Boris Onykiy||Department of Analysis of Competitive Systems, National Research Nuclear University MEPhI, Kashirskoe hwy, 31, Moscow, 115409, Russian Federation|
|Evgeniy Antonov||Laboratory of Advanced Storage and Processing Systems for Ultra Large Data, Plekhanov Russian University of Economics, Stremyanny lane 36, Moscow, 117997, Russian Federation|
|Alexey Artamonov||Department of Analysis of Competitive Systems, National Research Nuclear University MEPhI, Kashirskoe hwy, 31, Moscow, 115409, Russian Federation|
|Evgeny Tretyakov||Laboratory of Advanced Storage and Processing Systems for Ultra Large Data, Plekhanov Russian University of Economics, Stremyanny lane 36, Moscow, 117997, Russian Federation|
This paper presents the development of an information and analytical system to
foster scientific and technological development in a given scientific field. In
this work, the main software tools for implementing distributed computing,
which involves a set of software components for collecting, processing, and
analyzing large amounts of data, are considered. In addition, various
approaches for task coordination between different sets of software are
discussed and techniques for storing large amounts of data are described. The
system architecture and database schema are designed and tested. Nowadays, the
intellectualization of individual software agents is a key aspect of a new
generation of multiagent systems. For this reason, this paper develops an
approach that can organize activities of a large number of software agents to
increase system intellectualization through swarm intelligence at the level of
individual agents. Three remote servers were used to build and test the system
deployment, comprising such components as a platform for monitoring and
scheduling workflow, data storage, and a graphical user interface that enables
data retrieval and interaction on the Internet.
Apache Airflow; Data collection; Data storing; Distributed computing; Multiagent system
Viable decision-making in scientific and technological development requires a synoptic view of the current state of the areas of concern and of current development trends. When performing search operations, an analyst has to interact with various sources of information, most of them located on the Internet (Berawi, 2018b). Because results are needed quickly, this work cannot be done manually: it requires aggregating a large number of unrelated information sources (Kulik, 2015; Inkina et al., 2019). The processes of searching, collecting, and aggregating information therefore need to be automated (Berawi, 2018a).
In this paper, the automation of data collection and processing is achieved by developing a multiagent system (MAS). In general, an agent in information technology (a software agent) is a computer program that is activated on a schedule or by request and performs specific tasks with some autonomy (Ananieva et al., 2015; Onykiy et al., 2017).
In contrast with the classical method of problem-solving (searching for a deterministic algorithm that allows the best solution to be found), in multiagent technologies the solution is obtained as a result of the interaction among many independent, goal-oriented software agents. A review of domestic and foreign publications shows the relevance of automated, data-based information systems and software for decision-making, and indicates that the intellectualization of individual software agents is a key aspect of a new generation of MAS. For this reason, modern approaches for storing and analyzing large amounts of data are set forth here in order to arrive at a software-based solution for agent interaction in the distribution of data-collection and data-processing tasks. Furthermore, the approach aims to allow for increasing each software agent’s intellectualization.
This paper considers an approach to processing data from various Internet resources in a specific field that takes into account the growth of data volume over time. At this stage, the approach can be used as a data collection and analytical tool to provide superior information on a given subject (for example, for understanding customer experiences and how consumer behavior has changed over time). Furthermore, the approach makes it possible to organize the activities of a large number of software agents, thus increasing system intellectualization through swarm intelligence at the level of individual agents.
The developed system for information and analytical support uses distributed capacities to solve the task. The system is scalable: to improve performance and reduce workflow execution time, additional workers can be set up on additional capacity. Elasticsearch can be scaled horizontally as well. The architecture and the database schema have been designed and tested. To collect data from other information sources on the Internet, a DAG (directed acyclic graph) is added with the parameters for extracting the data. Up-to-date data are then displayed and taken into account in the Dashboard and in the GUI.
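The idea of adding a DAG with extraction parameters for a new source can be illustrated with a minimal sketch. It does not use the Apache Airflow API and is not the implementation described in this paper; it only shows, with the Python standard library, how a set of parameters and a dependency graph of tasks define a collection workflow. The task names (`collect`, `process`, `store`), the parameter fields, and the example URL are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical parameters for one information source (illustrative only).
SOURCE_PARAMS = {
    "source_name": "example_source",
    "start_url": "https://example.org/articles",
    "keywords": ["big data", "distributed computing"],
}


def collect(params, results):
    """Stand-in for an agent that downloads raw documents from the source."""
    return [
        {"source": params["source_name"], "text": "Big Data storage with Elasticsearch"},
        {"source": params["source_name"], "text": "Unrelated note"},
    ]


def process(params, results):
    """Keep only the documents that mention at least one keyword."""
    keywords = [k.lower() for k in params["keywords"]]
    return [
        doc for doc in results["collect"]
        if any(k in doc["text"].lower() for k in keywords)
    ]


def store(params, results):
    """Stand-in for indexing; returns the number of documents to be stored."""
    return len(results["process"])


TASKS = {"collect": collect, "process": process, "store": store}

# Each task is mapped to the set of tasks it depends on.
DEPENDENCIES = {"collect": set(), "process": {"collect"}, "store": {"process"}}


def run_workflow(params):
    """Run every task once, in an order that respects the dependencies."""
    results = {}
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        results[name] = TASKS[name](params, results)
    return results
```

Under these assumptions, supporting another source amounts to supplying a different parameter dictionary to `run_workflow`; in a real deployment, the scheduler would distribute the tasks across workers rather than run them in one process.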
In addition, at this stage of development of the MAS, we have collected the keywords that represent the Big Data technology field. The next step of the work is to extend the approach with an automated search for relevant articles on the Internet. The collected data will be used for predictive analysis to compile a list of contemporary references in a specific area, for understanding the current state of technological development.
This study was supported by a Russian Science Foundation grant (project No. 19-71-30008, 2019).
Ananieva, A.G., Artamonov, A.A., Galin, I.U., Tretyakov, E.S., Kshnyakov, D.O., 2015. Algorithmization of Search Operations in Multiagent Information-Analytical Systems. Journal of Theoretical and Applied Information Technology, Volume 81(1), pp. 11–17
Antonov, E., Lopatina, E., Ionkina, K., Evgeniy, T., 2020. Agent Data Merging. Procedia Computer Science, Volume 169, pp. 473–478
Artamonov, A.A., Leonov, D.V., Nikolaev, V.S., Onykiy, B.N., Pronicheva, L.V., Sokolina, K.A., Ushmarov, I.A., 2014. Visualization of Semantic Relations in Multi-Agent Systems. Scientific Visualization, Volume 6(3), pp. 68–76
Berawi, M.A., 2018a. Improving Business Processes through Advanced Technology Development. International Journal of Technology, Volume 9(4), pp. 641–644
Berawi, M.A., 2018b. Utilizing Big Data in Industry 4.0: Managing Competitive Advantages and Business Ethics. International Journal of Technology, Volume 9(3), pp. 430–433
Bezerra, D., Aschoff, R.R., Szabo, G., Sadok, D., 2018. An IoT Protocol Evaluation in a Smart Factory Environment. In: 15th Latin American Robotics Symposium (LARS), 6th Brazilian Robotics Symposium (SBR) and 9th Workshop on Robotics in Education (WRE), pp. 124–128
Bhatnagar, D., SubaLakshmi, R.J., Vanmathi, C., 2020. Twitter Sentiment Analysis using Elasticsearch, LOGSTASH and KIBANA. In: 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), pp. 1–5
Dhulavvagol, P.M., Bhajantri, V.H., Totad, S.G., 2020. Performance Analysis of Distributed Processing System using Shard Selection Techniques on Elasticsearch. Procedia Computer Science, Volume 167, pp. 1626–1635
Fedorova, V.A., Efremov, E.A., Kolyagina, I.A., 2019. Search and Index Data using Elasticsearch. Issues of Radio Electronics, Volume 3, pp. 74–77
Fomina, J., Safikanov, D., Artamonov, A., Tretyakov, E., 2020. Parametric and Semantic Analytical Search Indexes in Hieroglyphic Languages. Procedia Computer Science, Volume 169, pp. 507–512
Gao, R., Li, D., Li, W., Dong, Y., 2012. Application of Full Text Search Engine Based on Lucene. Advances in Internet of Things, Volume 2(4), pp. 106–109
Golosova, M.V., Grigorieva, M.A., Klimentov, A.A., Ryabinkin, E.A., Dimitrov, G., Potekhin, M., 2015. Studies of Big Data Metadata Segmentation between Relational and Non-Relational Databases. Journal of Physics: Conference Series, Volume 664(4), pp. 1–9
Grigorieva, M.A., Aulov, V.A., Golosova, M.V., Gubin, M.Y., Klimentov, A.A., 2016. Data Knowledge Base Prototype for Modern Scientific Collaborations. CEUR Workshop Proceedings, Volume 1787, pp. 26–33
Han, L., Zhu, L., 2020. Design and Implementation of Elasticsearch for Media Data. In: International Conference on Computer Engineering and Application (ICCEA) 2020, pp. 137–140
Hong, X.J., Sik Yang, H., Kim, Y.H., 2018. Performance Analysis of RESTful API and RabbitMQ for Microservice Web Application. In: 9th International Conference on Information and Communication Technology Convergence (ICTC) 2018, pp. 257–259
Inkina, V.A., Antonov, E.V., Artamonov, A.A., Ionkina, K.V., Tretyakov, E.S., Cherkasskiy, A.I., 2019. Multiagent Information Technologies in System Analysis. In: Proceedings of the 27th International Symposium Nuclear Electronics and Computing (NEC) 2019, pp. 195–199
Kulik, S.D., 2015. Model for Evaluating the Effectiveness of Search Operations. Journal of ICT Research and Applications, Volume 9(2), pp. 177–196
Mitchell, R., Pottier, L., Jacobs, S., Silva, R.F.D., Rynge, M., Vahi, K., Deelman, E., 2019. Exploration of Workflow Management Systems Emerging Features from Users Perspectives. In: IEEE International Conference on Big Data 2019, pp. 4537–4544
Natesan, G., Chokkalingam, A., 2019. Optimal Task Scheduling in the Cloud Environment Using a Mean Grey Wolf Optimization Algorithm. International Journal of Technology, Volume 10(1), pp. 126–136
Onykiy, B.N., Artamonov, A.A., Tretyakov, E.S., Ionkina, K.V., 2017. Visualization of Large Samples of Unstructured Information on the Basis of Specialized Thesauruses. Scientific Visualization, Volume 9(5), pp. 54–58
Shah, N., Willick, D., Mago, V., 2018. A Framework for Social Media Data Analytics using Elasticsearch and Kibana. Wireless Networks, Volume 1, pp. 1–9
Yang, H., 2019. Design and Implementation of Data Acquisition System based on Scrapy Technology. In: 2nd International Conference on Safety Produce Informatization (IICSPI) 2019, pp. 417–420
You, X., Wang, Y., 2019. Automatic Network Application System Based on Selenium. In: 2nd International Conference on Computer and Communication Engineering Technology (CCET) 2019, pp. 149–153