Arunas Ciuksys and Rita Butkiene, Kaunas University of Technology, Lithuania
Automatic Event Extraction (EE) identifies events in unstructured text. For Lithuanian, the lack of annotated corpora limits progress. This study compares two strategies: (1) machine learning (ML) models trained on synthetic data generated by large language models (LLMs) and (2) few-shot prompting with advanced LLMs (OpenAI GPT, Google Gemini). Results show that synthetic data offers broad coverage but suffers from lower precision. Few-shot approaches achieve higher precision but are recall-sensitive and require advanced prompt engineering. A hybrid approach combining both methods could optimize outcomes. These findings provide insights for developing scalable EE solutions that address the unique challenges of resource-scarce languages.
Event Extraction, Few-Shot Prompting, Synthetic Data, OpenAI GPT, Google Gemini, Lithuanian Language, NLP, Comparative Analysis