Implementation of Web Scraping on OJS Using the CSS Selector Method

Agus Purnomo

doi:10.30865/resolusi.v3i2.423

Authors

Agus Purnomo, Institut Agama Islam Negeri Salatiga, Salatiga, Indonesia

DOI:

https://doi.org/10.30865/resolusi.v3i2.423

Keywords:

OJS; Web Scraping; CSS Selector; BeautifulSoup; Python

Abstract

Journals are the main reference sources for research and scientific article writing. For reputable international journals, many software tools have been developed to retrieve metadata for research mapping. For national journals, particularly those indexed in Sinta, the metadata needed to visualize research results must still be collected manually: researchers copy and paste the articles they want one by one. To make it easier to retrieve article metadata from Sinta journals that run on the OJS system, a web application for scraping OJS journals was developed. The chosen web scraping method is the CSS Selector. The application is written in Python with the BeautifulSoup, Flask, and pandas libraries. In testing, the OJS scraping application was able to retrieve the title, abstract, keywords, authors, and link of each article. Its weakness is that it cannot retrieve data from OJS journals whose web interface has been changed from the OJS standard.
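The CSS Selector approach described in the abstract can be illustrated with a minimal sketch. It is not the author's application: the sample markup, the class names (`page_title`, `authors`, `item keywords`, `item abstract`, `item doi`), and the function `scrape_article` are assumptions made for illustration, loosely modelled on what a default OJS article page might look like. Only BeautifulSoup's `select` and `select_one` calls are real library API. The second call hints at the weakness noted above: once a journal changes its interface, the same selectors find nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on an OJS article page.
# Real OJS themes may use different tags and class names.
SAMPLE_HTML = """
<article class="obj_article_details">
  <h1 class="page_title">Sample Article Title</h1>
  <ul class="authors">
    <li><span class="name">First Author</span></li>
    <li><span class="name">Second Author</span></li>
  </ul>
  <section class="item keywords"><span class="value">OJS; Web Scraping</span></section>
  <section class="item abstract"><p>Sample abstract text.</p></section>
  <section class="item doi">
    <a href="https://doi.org/10.0000/example">https://doi.org/10.0000/example</a>
  </section>
</article>
"""


def text_or_none(node):
    """Return the stripped text of a matched element, or None if nothing matched."""
    return node.get_text(" ", strip=True) if node else None


def scrape_article(html):
    """Extract title, authors, keywords, abstract, and link using CSS selectors."""
    soup = BeautifulSoup(html, "html.parser")
    keywords = soup.select_one("section.item.keywords span.value")
    link = soup.select_one("section.item.doi a[href]")
    return {
        "title": text_or_none(soup.select_one("h1.page_title")),
        "authors": [a.get_text(strip=True) for a in soup.select("ul.authors span.name")],
        "keywords": [k.strip() for k in keywords.get_text().split(";")] if keywords else [],
        "abstract": text_or_none(soup.select_one("section.item.abstract p")),
        "link": link["href"] if link else None,
    }


record = scrape_article(SAMPLE_HTML)
print(record)

# A page with a customised, non-standard interface: the selectors no longer match.
customised = scrape_article('<div class="custom-title">Sample Article Title</div>')
print(customised)
```

The resulting dictionaries could be collected into a list and passed to `pandas.DataFrame` for export, which is one plausible role for pandas in such an application.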


