Abbreviation Dictionary for Twitter Hate Speech

Authors

Zainab Mansur (1), Nazlia Omar (2) and Sabrina Tiun (2)
(1) Omar Al-Mukhtar University, Libya; (2) Universiti Kebangsaan Malaysia, Malaysia

Abstract

Informal channels of communication, such as tweets, rely heavily on initialism abbreviations to shorten messages and save typing time, which makes them difficult to mine and normalize with existing methods. This study therefore compiled a lexicon repository for normalizing the initialism abbreviations used in English-language tweets. Several components were used to compile the repository: the Tweepy Python library, a keyword list, a small set of hand-developed rules, and online dictionaries. The resulting lexicon repository contains 300 abbreviations and their complete forms. It will be used in an ongoing study to normalize Twitter data and to detect hate speech in it.
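
As an illustration of how such a lexicon could be applied, the following is a minimal Python sketch of dictionary-based normalization. It is not the authors' implementation: the three entries in `ABBREVIATIONS`, the `normalize` function, and the token pattern are hypothetical and chosen only for this example. The sketch assumes that matching is case-insensitive and that hashtags and user mentions are left untouched.

```python
import re

# Hypothetical sample entries for illustration only; the 300-entry
# lexicon compiled in the study is not reproduced here.
ABBREVIATIONS = {
    "idk": "i do not know",
    "imo": "in my opinion",
    "btw": "by the way",
}

# A purely alphabetic token that is not part of a hashtag, a mention,
# or a longer alphanumeric string (assumed tokenization rule).
TOKEN = re.compile(r"(?<![#@\w])[A-Za-z]+(?!\w)")


def normalize(text, lexicon=ABBREVIATIONS):
    """Replace each token found in the lexicon with its complete form."""
    def expand(match):
        word = match.group(0)
        # Look up the lowercased token; keep the original if it is unknown.
        return lexicon.get(word.lower(), word)

    return TOKEN.sub(expand, text)


if __name__ == "__main__":
    print(normalize("IDK, btw he left"))
```

A lookup of this kind handles only exact matches against the lexicon; ambiguous abbreviations with several possible expansions would need additional context-dependent rules.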

Keywords

normalization, hate speech, Twitter, abbreviation, dictionary