Biclustering (Co-clustering) of Categorical Data

Many real-world data sets are characterized by categorical features. Such features are challenging because they have no ordering relationship, and the resulting data sets are usually high-dimensional and sparse. Biclustering (or co-clustering) techniques cope well with high-dimensional data: they select and group subsets of objects and features that are locally correlated within the data matrix. This project studies HBLCoClust, a scalable approach that finds a set of maximal biclusters by exploiting hash collisions of the objects' signatures (Locality Sensitive Hashing). The algorithm will be applied to user profiling, topic modeling, and social network analysis.
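To make the hash-collision idea concrete, the following is a minimal sketch of signature-based grouping and is not the HBLCoClust algorithm itself. It assumes that each object is a non-empty set of "feature=value" strings, that signatures are MinHash values built from CRC32, and that objects colliding on any band of the signature form a candidate bicluster whose columns are their shared features. The function names, parameters, and sample profiles are hypothetical, and the step that enlarges candidates into maximal biclusters is omitted.

```python
import zlib
from collections import defaultdict


def minhash_signature(features, num_hashes=8):
    """Return a MinHash signature (tuple of ints) for a non-empty feature set."""
    return tuple(
        # CRC32 salted with the seed stands in for a family of hash functions.
        min(zlib.crc32(f"{seed}:{feat}".encode("utf-8")) for feat in features)
        for seed in range(num_hashes)
    )


def candidate_biclusters(objects, num_hashes=8, band_size=2, min_rows=2, min_cols=1):
    """Group objects whose signature bands collide and keep their shared features."""
    buckets = defaultdict(set)
    for name, features in objects.items():
        sig = minhash_signature(features, num_hashes)
        for start in range(0, num_hashes, band_size):
            # The band position is part of the key so different bands never mix.
            buckets[(start, sig[start:start + band_size])].add(name)

    found = set()
    for rows in buckets.values():
        if len(rows) < min_rows:
            continue
        # Features shared by every object in the bucket form the column set.
        cols = set.intersection(*(objects[r] for r in rows))
        if len(cols) >= min_cols:
            found.add((frozenset(rows), frozenset(cols)))
    return found


# Hypothetical categorical profiles, used only for illustration.
profiles = {
    "u1": {"color=red", "shape=round", "size=small"},
    "u2": {"color=red", "shape=round", "size=small"},
    "u3": {"color=blue", "shape=square", "size=large"},
}

biclusters = candidate_biclusters(profiles)
for rows, cols in biclusters:
    print(sorted(rows), sorted(cols))
```

Objects with similar feature sets are likely to share at least one band and therefore land in the same bucket, so candidate groups are found without comparing every pair of objects. Smaller bands make collisions more likely and the grouping more permissive, while larger bands make it stricter.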

Graduate students: Andreia Gusmão

Related publications: