The Effectiveness of Legal-BERT Domain-Specific Pretraining for Legal Natural Language Processing: A Replication and Extension of the CaseHOLD Study


Authors

  • Hasani Zakiri Universitas Amikom Yogyakarta, Indonesia
  • Alva Hendi Muhammad Universitas Amikom Yogyakarta, Indonesia
  • Asro Nasiri Universitas Amikom Yogyakarta, Indonesia

DOI:

https://doi.org/10.47065/jieee.v5i1.2610

Keywords:

Legal NLP; Domain-Specific Pretraining; Legal-BERT; Transformer; CaseHOLD.

Abstract

The emergence of domain-specific language models has demonstrated significant potential across various specialized fields. However, their effectiveness in legal natural language processing (NLP) remains underexplored, particularly given the unique challenges posed by the complexity and specialized terminology of legal text. Legal NLP has practical applications, such as automated legal precedent search and court decision analysis, that can shorten legal research from weeks to hours. This study uses the CaseHOLD dataset to provide a comprehensive empirical validation of the benefits of domain-specific pretraining for legal NLP tasks, with a focus on data efficiency and context complexity analysis. We conducted systematic experiments on the CaseHOLD dataset, which contains 53,000 legal multiple-choice questions, comparing four models (BiLSTM, BERT-base, Legal-BERT, and RoBERTa) across varying training-data volumes (1%, 10%, 50%, 100%) and context complexity levels. Paired t-tests over 10-fold cross-validation with Bonferroni correction were used to support the reliability of the findings. Legal-BERT achieved the highest macro-F1 score of 69.5% (95% CI: [68.0, 71.0]), a statistically significant improvement of 7.2 percentage points over BERT-base (62.3%, p < 0.001, Cohen's d = 1.23). RoBERTa showed competitive performance at 68.9%, nearly matching Legal-BERT. The most substantial improvements occurred under limited-data conditions, with a 16.6% improvement when only 1% of the training data was used. Context complexity analysis revealed an inverted-U pattern, with optimal performance on texts of 41-60 words. The proposed Domain Specificity Score (DS-score) showed a strong positive correlation (r = 0.73, p < 0.001) with pretraining effectiveness, explaining 53.3% of the variance in performance improvement. These findings provide empirical evidence that domain-specific pretraining offers significant advantages for legal NLP tasks, particularly under data-constrained conditions and moderate-to-high context complexity. The key contribution of this research, in contrast to previous studies that evaluated performance only post hoc, is a predictive DS-score framework that enables benefit estimation before implementation. The results have practical implications for developing legal NLP systems in resource-limited environments and provide guidance for the optimal implementation of Legal-BERT.
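As a concrete illustration of the experimental setup described above, the sketch below frames CaseHOLD as a five-way multiple-choice task and scores a Legal-BERT checkpoint with macro-F1. This is a minimal sketch, not the authors' exact pipeline: it assumes the public LexGLUE case_hold configuration on the Hugging Face Hub and the nlpaueb/legal-bert-base-uncased checkpoint, uses an illustrative 100-example slice, and leaves out fine-tuning (the multiple-choice head is randomly initialized until the model is trained on the CaseHOLD training split).

```python
# Minimal sketch (assumptions noted above, not the authors' exact pipeline):
# frame CaseHOLD as a five-way multiple-choice task and score a Legal-BERT
# checkpoint with macro-F1 on a small validation slice.
import torch
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import AutoTokenizer, AutoModelForMultipleChoice

MODEL_NAME = "nlpaueb/legal-bert-base-uncased"  # public Legal-BERT checkpoint

dataset = load_dataset("lex_glue", "case_hold")        # CaseHOLD via LexGLUE
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Note: the multiple-choice classification head is freshly initialized here;
# in the study's setting the model would first be fine-tuned on the training split.
model = AutoModelForMultipleChoice.from_pretrained(MODEL_NAME).eval()

def predict(example):
    # Pair the citing context with each of the five candidate holdings.
    contexts = [example["context"]] * 5
    encoded = tokenizer(contexts, example["endings"], truncation=True,
                        max_length=256, padding="max_length", return_tensors="pt")
    # Multiple-choice models expect tensors of shape (batch, num_choices, seq_len).
    encoded = {k: v.unsqueeze(0) for k, v in encoded.items()}
    with torch.no_grad():
        logits = model(**encoded).logits               # shape (1, 5)
    return int(logits.argmax(dim=-1))

val = dataset["validation"].select(range(100))          # illustrative slice only
preds = [predict(ex) for ex in val]
print("macro-F1:", round(f1_score(val["label"], preds, average="macro"), 3))
```

The same scoring loop can be reused after fine-tuning BERT-base, RoBERTa, or Legal-BERT on 1%, 10%, 50%, or 100% of the training split to reproduce the data-efficiency comparison described in the abstract.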
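The abstract also outlines the statistical procedure: paired t-tests on per-fold macro-F1 scores from 10-fold cross-validation, a Bonferroni correction for multiple comparisons, and Cohen's d as an effect size. The sketch below shows one way to compute these quantities; the fold-level scores are placeholder values, not results from the paper, and the number of comparisons used for the correction is an assumption.

```python
# Sketch of the significance testing described in the abstract: a paired t-test on
# per-fold macro-F1 scores from 10-fold cross-validation, a Bonferroni-corrected
# significance threshold, and Cohen's d as an effect size.
# The fold scores below are placeholders, NOT results reported in the paper.
import numpy as np
from scipy.stats import ttest_rel

legal_bert_f1 = np.array([0.69, 0.70, 0.68, 0.71, 0.70, 0.69, 0.70, 0.68, 0.71, 0.69])
bert_base_f1  = np.array([0.62, 0.63, 0.61, 0.64, 0.62, 0.63, 0.62, 0.61, 0.63, 0.62])

t_stat, p_value = ttest_rel(legal_bert_f1, bert_base_f1)

# Bonferroni correction: assuming, e.g., three pairwise comparisons against
# Legal-BERT, each test is judged at alpha / 3.
n_comparisons = 3
alpha_corrected = 0.05 / n_comparisons

# Cohen's d for paired samples: mean difference over the SD of the differences.
diff = legal_bert_f1 - bert_base_f1
cohens_d = diff.mean() / diff.std(ddof=1)

print(f"t = {t_stat:.2f}, p = {p_value:.3g}, "
      f"significant at alpha = {alpha_corrected:.4f}: {p_value < alpha_corrected}")
print(f"Cohen's d = {cohens_d:.2f}")
```

On the same note, the abstract's statement that the DS-score explains 53.3% of the variance in improvement follows directly from the reported correlation: r^2 = 0.73^2 ≈ 0.533.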


References

J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2019, pp. 4171–4186. doi: 10.18653/v1/N19-1423.

A. Rogers, O. Kovaleva, and A. Rumshisky, “A Primer on Neural Network Models for Natural Language Processing,” J. Artif. Intell. Res., vol. 57, pp. 615–732, 2020, doi: 10.1613/jair.1.11030.

I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, and I. Androutsopoulos, “LEGAL-BERT: The Muppets straight out of Law School,” in Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 2020, pp. 2898–2904. doi: 10.18653/v1/2020.findings-emnlp.261.

L. Zheng, N. Guha, B. R. Anderson, P. Henderson, and D. E. Ho, “When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings,” in Proc. 18th International Conference on Artificial Intelligence and Law, São Paulo, Brazil, 2021, pp. 159–168. doi: 10.1145/3462757.3466088.

I. Chalkidis et al., “LexGLUE: A Benchmark Dataset for Legal Language Understanding in English,” in Proc. 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 4310–4330. doi: 10.18653/v1/2022.acl-long.297.

L. Manor and J. J. Li, “Plain English Summarization of Contracts,” in Proc. Natural Legal Language Processing Workshop at EMNLP 2019, Hong Kong, China, 2019, pp. 1–11. doi: 10.18653/v1/D19-5001.

H. Chen, T. Cohn, and T. Baldwin, “Legal Judgment Prediction with Multi-Stage Case Representation Learning,” in Proc. 30th ACM International Conference on Information and Knowledge Management, Gold Coast, Australia, 2021, pp. 298–307. doi: 10.1145/3459637.3482324.

H. Westermann, J. Savelka, K. Benyekhlef, and K. D. Ashley, “Using Summarization to Discover Argument Facets in Online Ideological Dialog,” in Proc. 2022 Conference of the North American Chapter of the Association for Computational Linguistics, Minneapolis, MN, USA, 2022, pp. 1412–1422. doi: 10.18653/v1/2022.naacl-main.104.

J. Lee et al., “BioBERT: a pre-trained biomedical language representation model for biomedical text mining,” Bioinformatics, vol. 36, no. 4, pp. 1234–1240, 2020, doi: 10.1093/bioinformatics/btz682.

I. Beltagy, K. Lo, and A. Cohan, “SciBERT: A Pretrained Language Model for Scientific Text,” in Proc. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 2019, pp. 3615–3620. doi: 10.18653/v1/D19-1371.

E. Alsentzer et al., “Publicly Available Clinical BERT Embeddings,” in Proc. 2nd Clinical Natural Language Processing Workshop, Minneapolis, MN, USA, 2019, pp. 72–78. doi: 10.18653/v1/W19-1909.

Y. Li, T. Wehbe, F. Ahmad, H. Wang, and Y. Luo, “Clinical-Longformer and Clinical-BigBird: Transformers for long clinical sequences,” 2022. doi: 10.48550/arXiv.2201.11838.

P. Colombo et al., “SaulLM-7B: A pioneering Large Language Model for Law,” 2024. doi: 10.48550/arXiv.2403.03883.

T. Wolf et al., “Transformers: State-of-the-Art Natural Language Processing,” in Proc. 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 2020, pp. 38–45. doi: 10.18653/v1/2020.emnlp-demos.6.

S. Ruder, M. E. Peters, S. Swayamdipta, and T. Wolf, “Transfer Learning in Natural Language Processing,” in Proc. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2019, pp. 15–18. doi: 10.18653/v1/N19-5004.

Y. Liu et al., “RoBERTa: A Robustly Optimized BERT Pretraining Approach,” 2019. doi: 10.48550/arXiv.1907.11692.

N. Reimers and I. Gurevych, “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,” in Proc. 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 2019, pp. 3982–3992. doi: 10.18653/v1/D19-1410.

R. Schwartz, J. Dodge, N. A. Smith, and O. Etzioni, “Green AI,” Commun. ACM, vol. 63, no. 12, pp. 54–63, 2020, doi: 10.1145/3381831.

Y. Bengio, A. Courville, and P. Vincent, “Representation Learning: A Review and New Perspectives,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 8, pp. 1798–1828, 2013, doi: 10.1109/TPAMI.2013.50.

A. Vaswani et al., “Attention is All You Need,” in Proc. 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 2017, pp. 5998–6008.

J. Niklaus, V. Matoshi, M. Stürmer, I. Chalkidis, and D. E. Ho, “MultiLegalPile: A 689GB Multilingual Legal Corpus,” in Proc. Data and Machine Learning Research Workshop at ICLR 2023, Kigali, Rwanda, 2023, pp. 1–15.

N. Guha et al., “LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models,” in Proc. 37th Conference on Neural Information Processing Systems, New Orleans, LA, USA, 2023, pp. 1–15.

J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” 2019. doi: 10.48550/arXiv.1810.04805.



ARTICLE HISTORY


Published: 2025-09-20