Big Five Personality Detection on Twitter Users Using Gradient Boosted Decision Tree Method


Authors

  • Adhie Rachmatulloh Sugiono Telkom University, Bandung, Indonesia
  • Warih Maharani Telkom University, Bandung, Indonesia

DOI:

https://doi.org/10.30865/klik.v3i6.933

Keywords:

Twitter; Tweet; Personality; Big Five; Gradient Boosted Decision Tree

Abstract

In 2020, the COVID-19 pandemic made most people more active on social media such as Twitter. Twitter's tweet feature lets users post short messages about what they feel and think at that moment. A person's tweets can reveal their mindset, which in turn makes it possible to infer their personality. One widely used personality model is the Big Five, which divides personality into five classes: openness, conscientiousness, extraversion, agreeableness, and neuroticism. Personality can be determined in several ways, such as by taking a psychological test; however, such tests can take a long time and require full concentration. Therefore, this study conducted Big Five personality detection on Twitter users using the Gradient Boosted Decision Tree (GBDT) method. The study aims to achieve high accuracy by weighting tweet terms with TF-IDF and adding sentiment and emotion features. It uses an Indonesian dataset collected through the Twitter API. Two test scenarios were carried out: the first on the imbalanced dataset, and the second after applying SMOTE oversampling to handle the class imbalance. With SMOTE applied, the study obtained an accuracy of 60.36%.
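The abstract names two preprocessing steps that feed the GBDT classifier: TF-IDF term weighting and SMOTE oversampling. The sketch below illustrates both in plain Python under stated assumptions; it is not the authors' implementation. It assumes raw-frequency TF with an unsmoothed log(N/df) IDF, basic SMOTE with Euclidean nearest neighbours, and hypothetical toy tweets. The helper names `tf_idf`, `to_dense`, and `smote` are illustrative, and the sentiment and emotion features used in the study are omitted.

```python
import math
import random
from collections import Counter


def tf_idf(docs):
    """Weight each tokenized document with TF-IDF.

    Assumed variant (illustrative only): tf = count / document length,
    idf = log(N / df). Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: number of documents each term occurs in.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        weighted.append({t: (c / length) * math.log(n_docs / df[t])
                         for t, c in counts.items()})
    return weighted


def to_dense(weighted, vocab):
    """Turn {term: weight} dicts into fixed-length lists ordered by vocab."""
    return [[w.get(term, 0.0) for term in vocab] for w in weighted]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (basic SMOTE sketch).

    Each synthetic sample lies on the segment between a random minority
    point and one of its k nearest minority neighbours (Euclidean).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: tokenized tweets with one trait label each.
    tweets = [
        ["senang", "bertemu", "teman"],
        ["senang", "pesta", "teman", "baru"],
        ["ngobrol", "teman", "senang"],
        ["cemas", "lelah", "lagi"],
        ["cemas", "tidak", "bisa", "tidur"],
    ]
    labels = ["extraversion"] * 3 + ["neuroticism"] * 2
    vocab = sorted({t for tweet in tweets for t in tweet})
    X = to_dense(tf_idf(tweets), vocab)
    # Oversample the minority class until both classes have 3 samples.
    minority = [x for x, y in zip(X, labels) if y == "neuroticism"]
    X_new = smote(minority, n_new=1, k=1)
    X_bal = X + X_new
    y_bal = labels + ["neuroticism"] * len(X_new)
    print(Counter(y_bal))
```

Library versions of these steps exist: scikit-learn's `TfidfVectorizer` (its default IDF is smoothed and rows are L2-normalized, so its weights differ from this sketch), imbalanced-learn's `SMOTE`, and scikit-learn's `GradientBoostingClassifier` for the boosted-tree classifier itself. In practice, oversampling is normally applied to the training split only, so synthetic samples do not leak into the evaluation data.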

ARTICLE HISTORY


Published: 2023-06-24