Comparison of LLM Few-Shot vs. Synthetic Data Approaches for Lithuanian Event Extraction

Authors

Arunas Ciuksys and Rita Butkiene, Kaunas University of Technology, Lithuania

Abstract

Automatic Event Extraction (EE) identifies events in unstructured text. For Lithuanian, a lack of annotated corpora limits progress. This study compares two strategies: (1) machine learning (ML) models trained on synthetic data generated by LLMs and (2) few-shot prompting with advanced LLMs (OpenAI GPT, Google Gemini). Results show that synthetic data offers broad coverage but suffers from lower precision. Few-shot approaches achieve higher precision but are recall-sensitive and require advanced prompt engineering. A hybrid approach combining both methods could optimize outcomes. These findings provide insights for developing scalable EE solutions that address the unique challenges of resource-scarce languages.

Keywords

Event Extraction, Few-Shot Prompting, Synthetic Data, OpenAI GPT, Google Gemini, Lithuanian Language, NLP, Comparative Analysis