Weight Based Feature Subset Selection from High Dimensional Databases
DOI:
https://doi.org/10.64252/4d4tq183
Keywords:
Data mining, high-dimensional data, clustering, feature subset selection.
Abstract
The curse of dimensionality becomes apparent when data mining algorithms take a long time and consume excessive resources while processing high-dimensional databases. The more dimensions a dataset has, the more resources its processing consumes. However, some attributes may not be needed for processing. In other words, dimensionality can be reduced by identifying representative features that together cover the information in all attributes. In this paper, we propose an algorithm based on weight, entropy, and gain for identifying representative features in high-dimensional data. Entropy and gain are computed systematically to assign a weight to every attribute. Pairs of attributes are then correlated, and representative features, that is, features that represent all features in the given dataset, are selected. We built a prototype application that demonstrates the proof of concept. The empirical results show that the proposed algorithm is effective and can be used in data mining operations to improve performance by reducing dimensions through the proposed feature subset selection.
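The abstract does not give the exact formulas, so the following is only a minimal sketch of how an entropy- and gain-based weighting with pairwise correlation might look. It assumes discrete attributes and a class label, uses information gain with respect to that label as the attribute weight, and uses symmetric uncertainty as the pairwise correlation measure. These choices, the function names, and the 0.9 redundancy threshold are all illustrative assumptions, not the paper's actual algorithm.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, target):
    """Entropy of `target` minus its entropy conditioned on `feature`."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def symmetric_uncertainty(x, y):
    """Normalised correlation in [0, 1] between two discrete attributes."""
    total = entropy(x) + entropy(y)
    if total == 0:
        return 0.0  # both attributes are constant
    return 2.0 * information_gain(x, y) / total


def select_features(columns, target, threshold=0.9, min_weight=1e-9):
    """Weight attributes by gain, then greedily keep non-redundant ones.

    `threshold` and `min_weight` are hypothetical tuning parameters.
    """
    weights = {name: information_gain(col, target) for name, col in columns.items()}
    selected = []
    for name in sorted(weights, key=weights.get, reverse=True):
        if weights[name] < min_weight:
            continue  # attribute carries no information about the label
        redundant = any(
            symmetric_uncertainty(columns[name], columns[kept]) >= threshold
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return weights, selected


# Toy data: `b` duplicates `a`, `c` is unrelated to the label.
target = [0, 0, 1, 1]
columns = {
    "a": [0, 0, 1, 1],
    "b": ["x", "x", "y", "y"],
    "c": [0, 1, 0, 1],
    "d": [0, 0, 0, 1],
}
weights, selected = select_features(columns, target)
```

On this toy data, `b` is dropped as redundant with `a`, `c` is dropped because its gain is zero, and `a` and `d` remain as the representative features. A greedy pass over the attributes ranked by weight is only one possible design; a clustering of attributes by correlation, as the keywords suggest, would be an alternative.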