Data Mining For Metadata In Telecom Sector
DOI: https://doi.org/10.64252/btb1af93
Keywords:
Data mining, Metadata, Text mining, Witten-Bell smoothing formula, Genetic Algorithm Cycle.
Abstract
In the telecom sector, metadata mining relies on information extraction (IE), clustering, natural language processing (NLP), and more complex algorithms such as TF-IDF, the Witten-Bell smoothing method, BERT, and the Genetic Algorithm Cycle. The voice and message data that the telecom industry gathers from customers, including customer feedback, service requests, and network maintenance records, is largely unstructured text and therefore undergoes NLP analysis. IE enriches NLP because it automatically identifies and structures specific entities, such as phone numbers, addresses, service types, and technical terms, in unstructured text drawn from several sources. Clustering is vital for analyzing, categorizing, and sorting telecom metadata so that it becomes easy to discern, understand, and manage. TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure of how relevant a word is to a document, weighted against how common that word is across the whole collection of documents. The Witten-Bell smoothing formula is used in language modeling to estimate the probability of new, previously unseen words or events from the observed data.
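The IE step described above, pulling entities such as phone numbers and service types out of free-text customer messages, can be illustrated with a minimal sketch. It assumes simple regular expressions and a small hypothetical list of service terms (`PHONE_RE`, `SERVICE_TERMS`, and `extract_entities` are illustrative names, not part of the paper); a production system would more likely use trained entity recognizers.

```python
import re

# Hypothetical pattern and term list, for illustration only.
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")  # digits with optional spaces/dashes
SERVICE_TERMS = ("5G", "4G", "LTE", "broadband", "roaming", "VoLTE")


def extract_entities(text):
    """Return phone numbers and known service terms found in free text."""
    phones = [m.group().strip() for m in PHONE_RE.finditer(text)]
    # Match whole tokens so that "LTE" is not found inside "VoLTE".
    tokens = {tok.lower() for tok in re.findall(r"\w+", text)}
    services = [term for term in SERVICE_TERMS if term.lower() in tokens]
    return {"phones": phones, "services": services}


print(extract_entities("Customer at +1 555-123-4567 reports LTE drops while roaming."))
# → {'phones': ['+1 555-123-4567'], 'services': ['LTE', 'roaming']}
```

Matching on whole tokens rather than substrings keeps overlapping terms such as "LTE" and "VoLTE" distinct.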
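The TF-IDF measure mentioned above can be sketched in a few lines. This assumes one common variant, tf = term count / document length and idf = ln(N / df), where N is the number of documents and df the number of documents containing the term; libraries often use smoothed or sublinear variants instead. The ticket texts are invented examples.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for each document (assumed variant:
    tf = count / doc length, idf = ln(N / df), no smoothing)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = [
    "network outage in region north",
    "billing request for roaming charges",
    "network maintenance scheduled in region south",
]
w = tf_idf(docs)
# "outage" occurs in one ticket only, "network" in two, so "outage" weighs more.
print(round(w[0]["outage"], 3), round(w[0]["network"], 3))  # → 0.22 0.081
```

A term that appears in every document gets idf = ln(1) = 0, which is how TF-IDF discounts words that carry no distinguishing information.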
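The Witten-Bell idea of reserving probability for unseen words can be shown with its simplest, unigram form. With N observed tokens and T distinct word types, a seen word w gets c(w) / (N + T), and the reserved mass T / (N + T) is shared evenly among the Z = V - T unseen types of an assumed vocabulary of size V. The vocabulary size and token stream below are hypothetical; n-gram language models usually apply the interpolated version of this formula per history.

```python
from collections import Counter


def witten_bell(tokens, vocab_size):
    """Unigram Witten-Bell estimate over an assumed vocabulary size.

    Seen word w:  P(w) = c(w) / (N + T)
    Unseen word:  P(w) = T / (Z * (N + T)), with Z = vocab_size - T
    N = number of tokens, T = number of distinct types observed.
    """
    counts = Counter(tokens)
    n = len(tokens)
    t = len(counts)
    z = vocab_size - t
    if z <= 0:
        raise ValueError("vocab_size must exceed the number of seen types")
    denom = n + t

    def prob(word):
        if word in counts:
            return counts[word] / denom
        return t / (z * denom)  # share of the reserved mass T / (N + T)

    return prob


p = witten_bell("signal drop signal loss call drop signal".split(), vocab_size=10)
print(round(p("signal"), 4), round(p("outage"), 4))  # → 0.2727 0.0606
```

Here N = 7 and T = 4, so 4/11 of the probability mass is set aside for unseen events; the more distinct types the data already shows, the more likely a new one is assumed to be.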