Big Five Personality Detection on Twitter Users Using Gradient Boosted Decision Tree Method
DOI: https://doi.org/10.30865/klik.v3i6.933
Keywords: Twitter; Tweet; Personality; Big Five; Gradient Boosted Decision Tree
Abstract
In 2020, the Covid-19 pandemic made most people more active on social media platforms such as Twitter. Twitter's tweet feature allows users to send short messages about how they feel and what they think at a given moment. A person's tweets reflect their mindset, which in turn offers clues to their personality. One model of personality is the Big Five, which divides personality into five classes: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Personality can be determined in several ways, such as taking a psychological test, but a test can take a long time and requires full concentration. This study therefore performs Big Five personality detection on Twitter users using the Gradient Boosted Decision Tree (GBDT) method. It aims for high accuracy by weighting terms with the TF-IDF method and by using sentiment and emotion features. The study uses an Indonesian-language dataset collected through the Twitter API. It consists of two test scenarios: the first is run on the imbalanced dataset, and the second applies oversampling with the SMOTE method to handle the class imbalance. With SMOTE applied, the study obtained an accuracy of 60.36%.
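To make two of the steps named in the abstract concrete, the following is a minimal pure-Python sketch of TF-IDF weighting followed by SMOTE-style oversampling. It is an illustration under stated assumptions, not the authors' implementation: the function names `tfidf` and `smote`, the whitespace tokenizer, the unsmoothed `ln(N / df)` form of the IDF, the neighbour count `k`, and the toy tweets and labels are all hypothetical. The sentiment and emotion features and the GBDT classifier itself are not shown.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocab, matrix) where each weight is tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        matrix.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, matrix


def smote(X, y, k=2, seed=0):
    """Oversample each minority class up to the majority class size.

    Each synthetic point is a random interpolation between a minority
    sample and one of its k nearest neighbours from the same class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        samples = [x for x, lab in zip(X, y) if lab == label]
        if len(samples) < 2:
            continue  # no neighbour to interpolate with
        for _ in range(target - count):
            base = rng.choice(samples)
            neighbours = sorted(
                (s for s in samples if s is not base),
                key=lambda s: math.dist(base, s),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


# Toy, made-up Indonesian tweets with hypothetical trait labels.
tweets = [
    "senang sekali hari ini",
    "hari ini senang bertemu teman",
    "suka bertemu teman baru",
    "takut dan cemas hari ini",
    "cemas sekali",
]
labels = ["extraversion", "extraversion", "extraversion",
          "neuroticism", "neuroticism"]

vocab, X = tfidf(tweets)
X_bal, y_bal = smote(X, labels)
print(Counter(y_bal))
```

In a setup like the one the abstract describes, the balanced matrix would then be passed to a gradient boosted tree classifier. Oversampling is normally applied to the training split only, so that synthetic points do not leak into evaluation.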
Copyright (c) 2023 Adhie Rachmatulloh Sugiono, Warih Maharani
This work is licensed under a Creative Commons Attribution 4.0 International License.