Fatimah Alshamari (1,2) and Abdou Youssef (1); (1) The George Washington University, USA; (2) Taibah University, KSA
Mathematical Function Recognition (MFR) is an important research direction for efficient downstream math tasks such as information retrieval, knowledge extraction, and question answering. The aim of this task is to identify mathematical functions and classify them into a predefined set of function categories. However, the lack of annotated data is the bottleneck in developing an automated MFR model. We begin this paper by describing our approach to creating a labelled dataset for MFR. Then, to identify five categories of mathematical functions, we fine-tune a set of common pre-trained models: BERT base-cased, BERT base-uncased, DistilBERT-cased, and DistilBERT-uncased. Our contributions in this paper are therefore: (1) an annotated MFR dataset that future researchers can use; and (2) state-of-the-art (SOTA) results obtained by fine-tuning pre-trained models for the MFR task. Our experiments demonstrate that the proposed approach achieves high-quality recognition, with the DistilBERT-cased model reaching an F1 score of 96.80% on a held-out test set.
Named entity recognition, Math information retrieval, Math language processing, Pre-trained Language models.