Published at : 07 Dec 2020
Volume : IJtech
Vol 11, No 6 (2020)
DOI : https://doi.org/10.14716/ijtech.v11i6.4465
Boris Onykiy | Department of Analysis of Competitive Systems, National Research Nuclear University MEPhI, Kashirskoe hwy, 31, Moscow, 115409, Russian Federation |
Evgeniy Antonov | Laboratory of Advanced Storage and Processing Systems for Ultra Large Data, Plekhanov Russian University of Economics, Stremyanny lane 36, Moscow, 117997, Russian Federation |
Alexey Artamonov | Department of Analysis of Competitive Systems, National Research Nuclear University MEPhI, Kashirskoe hwy, 31, Moscow, 115409, Russian Federation |
Evgeny Tretyakov | Laboratory of Advanced Storage and Processing Systems for Ultra Large Data, Plekhanov Russian University of Economics, Stremyanny lane 36, Moscow, 117997, Russian Federation |
This paper presents the development of an information and analytical system
that supports scientific and technological development in a given scientific
field. The main software tools for implementing distributed computing are
considered, covering a set of software components for collecting, processing,
and analyzing large amounts of data. In addition, various approaches to task
coordination between different sets of software are discussed, and techniques
for storing large amounts of data are described. The system architecture and
database schema have been designed and tested. The intellectualization of
individual software agents is a key aspect of the new generation of multiagent
systems. For this reason, this paper develops an approach that organizes the
activities of a large number of software agents and increases system
intellectualization through swarm intelligence at the level of individual
agents. The system was deployed and tested on three remote servers. Its
components include a platform for monitoring and scheduling workflows, data
storage, and a graphical user interface that enables data retrieval and
interaction over the Internet.
Apache Airflow; Data collection; Data storing; Distributed computing; Multiagent system
Viable decision-making in scientific and technological development requires a
synoptic view of the current state of the areas of concern and of the trends in
their development. When performing search operations, an analyst has to
interact with various sources of information, most of them located on the
Internet (Berawi, 2018b). Because the search must be completed quickly and
requires aggregating a large number of unrelated information sources,
performing it manually is impractical (Kulik, 2015; Inkina et al., 2019).
Searching, collecting, and aggregating information therefore need to be
automated (Berawi, 2018a).
In this
paper, the automation of data collection and processing is achieved by
developing a multiagent system (MAS). In general, an agent in information
technology (a software agent) is a computer program that is activated on schedule or by request with
In contrast with the classical method of problem-solving (searching for a
deterministic algorithm that finds the best solution), multiagent technologies
obtain the solution through the interaction of many independent, goal-directed
software agents. A review of domestic and foreign publications shows the
relevance of automated, data-based decision-making information systems and
software, and the intellectualization of the individual software agent is a key
aspect of a new generation of MAS. For this reason, modern approaches for
storing and analyzing large amounts of data are set forth here in order to
consider a software-based solution for agent interaction in the distribution of
data-collection and data-processing tasks. The method also takes into account
the possibility of increasing each software agent’s intellectualization.
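
The idea of obtaining a result through the interaction of independent agents, rather than through one monolithic algorithm, can be illustrated with a minimal sketch. It is not the system described in this paper: the `Agent` class, the `run_agents` function, the task kinds, and the example URL are all hypothetical names chosen for illustration, and the "collection" step only fabricates a placeholder string instead of accessing the network.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Task = Tuple[str, str]  # (task kind, payload)


@dataclass
class Agent:
    """Hypothetical software agent that accepts one kind of task."""
    name: str
    handles: str
    # The action returns a result and, optionally, a follow-up task
    # that some other agent is expected to pick up.
    action: Callable[[str], Tuple[str, Optional[Task]]]


def run_agents(initial_tasks: List[Task], agents: List[Agent]) -> List[Tuple[str, str]]:
    """Route tasks to agents until the shared queue is empty."""
    queue = deque(initial_tasks)
    log = []
    while queue:
        kind, payload = queue.popleft()
        agent = next((a for a in agents if a.handles == kind), None)
        if agent is None:
            log.append(("unhandled", payload))
            continue
        result, follow_up = agent.action(payload)
        log.append((agent.name, result))
        if follow_up is not None:
            queue.append(follow_up)  # agents interact via the queue
    return log


def collect(url: str) -> Tuple[str, Optional[Task]]:
    # Stand-in for a real download; no network access is performed.
    text = f"raw text from {url}"
    return text, ("process", text)


def process(text: str) -> Tuple[str, Optional[Task]]:
    # Stand-in for real processing: normalize the text to upper case.
    return text.upper(), None


agents = [
    Agent("collector", "collect", collect),
    Agent("processor", "process", process),
]
log = run_agents([("collect", "https://example.org/a")], agents)
for agent_name, result in log:
    print(agent_name, "->", result)
```

In this sketch neither agent knows about the other; the collector only emits a follow-up task, and the queue decides which agent handles it. A production system would replace the in-memory queue with a scheduler or message broker.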
This paper considers an approach to processing data from various Internet
resources in a specific field that accounts for the growth of data volume over
time. At this stage, the approach can be used as a data collection and
analytical tool to provide higher-quality information on a given subject (for
example, for understanding customer experiences and how consumer behavior has
changed over time). Furthermore, the approach makes it possible to organize the
activities of a large number of software agents, thus increasing system
intellectualization through swarm intelligence at the level of individual
agents.
The developed system for information and analytical support uses distributed
capacities to solve the task. The system is scalable: to improve performance
and reduce workflow execution time, additional workers can be set up on extra
capacity. Elasticsearch can be scaled horizontally as well. The architecture
and the database schema have been designed and tested. To collect data from
other information sources on the Internet, a directed acyclic graph (DAG) with
the parameters for extracting data is added. Up-to-date data are then displayed
and taken into account in the Dashboard and in the GUI.
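
Adding a new source by supplying a parameterized DAG can be sketched as follows. This is an illustration of the concept only and does not use the Apache Airflow API: it relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+), and the function names, the task names `extract`, `transform`, and `load`, and the parameters `base_url` and `pages` are assumptions made for this example. In a deployment like the one described, the scheduling would be done by the workflow platform and the final step would write to the data storage.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_workflow(source_name, params):
    """Return the tasks and the dependency graph for one (hypothetical) source."""

    def extract(ctx):
        # Build the list of page URLs to fetch; nothing is downloaded here.
        ctx["raw"] = [
            f"{params['base_url']}?page={page}"
            for page in range(1, params["pages"] + 1)
        ]

    def transform(ctx):
        # Turn each raw item into a document tagged with its source.
        ctx["docs"] = [{"source": source_name, "url": url} for url in ctx["raw"]]

    def load(ctx):
        # Stand-in for indexing into a search engine: only count documents.
        ctx["indexed"] = len(ctx["docs"])

    tasks = {"extract": extract, "transform": transform, "load": load}
    # Each key maps a task to the set of tasks it depends on.
    graph = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
    return tasks, graph


def run_workflow(tasks, graph):
    """Execute the tasks in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        tasks[name](ctx)
    return order, ctx


tasks, graph = build_workflow(
    "example_source",
    {"base_url": "https://example.org/articles", "pages": 3},
)
order, ctx = run_workflow(tasks, graph)
print(order)
print(ctx["indexed"])
```

Under these assumptions, supporting another source means calling `build_workflow` with a different parameter dictionary; the execution logic stays unchanged.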
In addition, at this stage of development of the MAS, we have collected the
keywords that represent the Big Data technology field. The next step is to
extend the approach with an automated search for relevant articles on the
Internet. The collected data will be used for predictive analysis to compile a
list of contemporary references in a specific area, giving a picture of the
current state of technological development.
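
One simple way in which a collected keyword list could drive a relevance search is keyword-coverage scoring. The sketch below is an assumption about how such a step might look, not the method planned by the authors; the `relevance` function, the keyword list, and the article titles are placeholders invented for this example.

```python
import re


def relevance(text, keywords):
    """Fraction of keywords found in the text (case-insensitive, whole phrases)."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    hits = [
        keyword
        for keyword in keywords
        if re.search(r"\b" + re.escape(keyword.lower()) + r"\b", lowered)
    ]
    return len(hits) / len(keywords)


# Placeholder keywords; not the list collected in the study.
keywords = ["big data", "hadoop", "elasticsearch", "distributed computing"]

# Placeholder article titles keyed by a hypothetical identifier.
articles = {
    "a1": "Scaling Elasticsearch for big data workloads",
    "a2": "A history of medieval trade routes",
}

# Rank article identifiers from most to least relevant.
ranked = sorted(
    articles,
    key=lambda article_id: relevance(articles[article_id], keywords),
    reverse=True,
)
print(ranked)
```

A score of this kind ignores synonyms and word forms, so it is best read as a baseline filter that a more capable search component could later replace.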
This study was funded by the Russian Science Foundation (project
No. 19-71-30008, 2019).
Filename | Description |
---|---|
R1-EECE-4465-20201124233636.png | Figure 1 |
R1-EECE-4465-20201124233708.png | Figure 2 |
R1-EECE-4465-20201124233727.png | Figure 3 |
R1-EECE-4465-20201124233756.png | Figure 4 |
R1-EECE-4465-20201124233815.png | Figure 5 |
R1-EECE-4465-20201124233832.png | Figure 6 |
R1-EECE-4465-20201124233850.png | Figure 7 |
Ananieva, A.G.,
Artamonov, A.A., Galin, I.U., Tretyakov, E.S., Kshnyakov, D.O., 2015.
Algorithmization of Search Operations in Multiagent Information-Analytical
Systems. Journal of Theoretical and Applied Information Technology,
Volume 81(1), pp. 11–17
Antonov, E., Lopatina,
E., Ionkina, K., Evgeniy, T., 2020. Agent Data Merging. Procedia Computer
Science, Volume 169, pp. 473–478
Artamonov, A.A.,
Leonov, D.V., Nikolaev, V.S., Onykiy, B.N., Pronicheva, L.V., Sokolina, K.A.,
Ushmarov, I.A., 2014. Visualization of Semantic Relations in Multi-Agent
Systems. Scientific Visualization, Volume 6(3), pp. 68–76
Berawi, M.A., 2018a.
Improving Business Processes through Advanced Technology Development. International
Journal of Technology, Volume 9(4), pp. 641–644
Berawi, M.A., 2018b.
Utilizing Big Data in Industry 4.0: Managing Competitive Advantages and
Business Ethics. International Journal of Technology, Volume 9(3), pp.
430–433
Bezerra, D., Aschoff,
R.R., Szabo, G., Sadok, D., 2018. An IoT Protocol Evaluation in a Smart Factory
Environment. In: 15th Latin American Robotics Symposium
(LARS), 6th Brazilian Robotics Symposium (SBR) and 9th
Workshop on Robotics in Education (WRE), pp. 124–128
Bhatnagar, D.,
SubaLakshmi, R.J., Vanmathi, C., 2020. Twitter Sentiment Analysis using
Elasticsearch, LOGSTASH and KIBANA. In: 2020 International Conference on Emerging Trends in Information
Technology and Engineering (ic-ETITE), pp. 1–5
Dhulavvagol, P.M.,
Bhajantri, V.H., Totad, S.G., 2020. Performance Analysis of Distributed
Processing System using Shard Selection Techniques on Elasticsearch. Procedia
Computer Science, Volume 167, pp.
1626–1635
Fedorova, V.A.,
Efremov, E.A., Kolyagina, I.A., 2019. Search and Index Data using Elasticsearch.
Issues of Radio Electronics, Volume
3, pp. 74–77
Fomina, J., Safikanov,
D., Artamonov, A., Tretyakov, E., 2020. Parametric and Semantic Analytical
Search Indexes in Hieroglyphic Languages. Procedia Computer Science,
Volume 169, pp. 507–512
Gao, R., Li, D., Li,
W., Dong, Y., 2012. Application of Full Text Search Engine Based on
Lucene. Advances in Internet of Things, Volume 2(4), pp. 106–109
Golosova, M.V.,
Grigorieva, M.A., Klimentov, A.A., Ryabinkin, E.A., Dimitrov, G., Potekhin, M.,
2015. Studies of Big Data Metadata Segmentation between Relational and Non-Relational
Databases. Journal of Physics: Conference Series, Volume 664(4), pp. 1–9
Grigorieva, M.A.,
Aulov, V.A., Golosova, M.V., Gubin, M.Y., Klimentov, A.A., 2016. Data Knowledge
Base Prototype for Modern Scientific Collaborations. CEUR Workshop Proceedings,
Volume 1787, pp. 26–33
Han, L., Zhu, L., 2020.
Design and Implementation of Elasticsearch for Media Data. In: International Conference on Computer
Engineering and Application (ICCEA) 2020, pp. 137–140
Hong, X.J., Sik Yang,
H., Kim, Y.H., 2018. Performance Analysis of RESTful API and RabbitMQ for
Microservice Web Application. In: 9th International
Conference on Information and Communication Technology Convergence (ICTC) 2018,
pp. 257–259
Inkina, V.A., Antonov,
E.V., Artamonov, A.A., Ionkina, K.V., Tretyakov, E.S., Cherkasskiy, A.I., 2019.
Multiagent Information Technologies in System Analysis. In: Proceedings
of the 27th International Symposium Nuclear Electronics and
Computing (NEC) 2019, pp. 195–199
Kulik, S.D., 2015.
Model for Evaluating the Effectiveness of Search Operations. Journal of ICT
Research and Applications, Volume 9(2), pp. 177–196
Mitchell, R., Pottier,
L., Jacobs, S., Silva, R.F.D., Rynge, M., Vahi, K., Deelman, E., 2019.
Exploration of Workflow Management Systems Emerging Features from Users
Perspectives. In: IEEE
International Conference on Big Data 2019, pp. 4537–4544
Natesan, G.,
Chokkalingam, A., 2019. Optimal Task Scheduling in the Cloud Environment Using
a Mean Grey Wolf Optimization Algorithm. International Journal of Technology,
Volume 10(1), pp. 126–136
Onykiy, B.N.,
Artamonov, A.A., Tretyakov, E.S., Ionkina, K.V., 2017. Visualization of Large
Samples of Unstructured Information on the Basis of Specialized Thesauruses. Scientific Visualization, Volume 9(5),
pp. 54–58
Shah, N., Willick, D.,
Mago, V., 2018. A Framework for Social Media Data Analytics using Elasticsearch
and Kibana. Wireless Networks, Volume 1, pp. 1–9
Yang, H., 2019. Design
and Implementation of Data Acquisition System based on Scrapy Technology. In:
2nd International Conference on Safety Produce Informatization (IICSPI)
2019, pp. 417–420
You, X., Wang, Y., 2019. Automatic Network Application System Based on Selenium. In: 2nd International Conference on Computer and Communication Engineering Technology (CCET) 2019, pp. 149–153