Zainab Mansur¹, Nazlia Omar² and Sabrina Tiun², ¹Omar Al-Mukhtar University, Libya, ²Universiti Kebangsaan Malaysia, Malaysia
Informal methods of communication, such as tweets, rely heavily on initialism abbreviations to reduce message size and typing time, which makes them difficult to mine and normalize using existing methods. This study therefore compiled a lexicon repository to normalize the initialism abbreviations used in English-language tweets. Several components were used to compile the repository: the Tweepy Python library, a keyword list, a small set of developed rules, and online dictionaries. The resulting lexicon repository contains 300 abbreviations and their complete forms. It will be used in an ongoing study to normalize Twitter data and detect hate speech in it.
normalization, hate speech, Twitter, abbreviation, dictionary
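
As an illustration of how a lexicon repository of this kind could be applied, the following is a minimal sketch of dictionary-based normalization in Python. The three lexicon entries, the `normalize` function, and the tokenization rule are assumptions made for the example; they are not taken from the study's 300-entry repository or its rules.

```python
import re

# Hypothetical sample entries for illustration only; the study's actual
# repository of 300 abbreviations is not reproduced here.
LEXICON = {
    "lol": "laughing out loud",
    "brb": "be right back",
    "idk": "i do not know",
}

# Assumed tokenization: a token is a run of ASCII letters.
TOKEN = re.compile(r"[A-Za-z]+")


def normalize(tweet, lexicon=LEXICON):
    """Replace each token found in the lexicon with its complete form.

    Lookup is case-insensitive; tokens not in the lexicon, punctuation,
    and whitespace are left unchanged.
    """
    def expand(match):
        word = match.group(0)
        return lexicon.get(word.lower(), word)

    return TOKEN.sub(expand, tweet)


if __name__ == "__main__":
    print(normalize("IDK, brb lol!"))
```

This sketch expands every matching token regardless of context, so it would also rewrite abbreviations inside hashtags or URLs; a practical pipeline would likely need extra rules to skip those.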