ADC Keynote Speaker
James C. Bezdek
University of West Florida
Approximate Data Mining in Very Large Relational Data
Clustering is widely used in data mining to find the underlying categories of objects in a database. For example, given a database of gene products in the form of pairwise relational similarity data, clustering can reveal groups of genes with similar functionalities.
A key challenge in applying clustering techniques in practice is reducing their computational cost on large databases. This talk will describe eNERF, an extended version of non-Euclidean relational fuzzy c-means (NERF c-means) for approximate clustering in very large relational data sets.
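NERF c-means builds on relational fuzzy c-means (RFCM), which clusters directly from a square dissimilarity matrix rather than from feature vectors. The sketch below is a minimal illustration under stated assumptions, not the eNERF implementation. The function name `rfcm` and its parameters are hypothetical, D is assumed symmetric with a zero diagonal, and negative relational distances are simply clipped to a small positive value. NERF handles those negative distances with a spread transformation of D instead.

```python
import numpy as np


def rfcm(D, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Relational fuzzy c-means on a symmetric n x n dissimilarity matrix D.

    Returns a c x n fuzzy membership matrix whose columns sum to 1.
    Simplification (assumption): negative relational distances are clipped,
    whereas NERF c-means applies a spread transformation to D instead.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)  # random initial fuzzy partition

    for _ in range(max_iter):
        W = U ** m
        V = W / W.sum(axis=1, keepdims=True)  # row i: prototype weights v_i
        DV = V @ D                            # row i: (D v_i)^T, since D is symmetric
        half_self = 0.5 * np.einsum("ij,ij->i", DV, V)  # v_i^T D v_i / 2
        dist = DV - half_self[:, None]        # d_ik = (D v_i)_k - v_i^T D v_i / 2
        dist = np.maximum(dist, 1e-12)        # crude guard against d_ik <= 0
        inv = dist ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)  # standard fuzzy c-means update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U
```

When D holds squared Euclidean distances between points, RFCM reproduces ordinary fuzzy c-means. NERF exists because relational data such as gene-product dissimilarities generally do not have that Euclidean form.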
The eNERF procedure consists of four parts:
- selecting distinguished features to monitor during progressive sampling;
- progressively sampling a square relation matrix until the sample passes a goodness-of-fit test;
- clustering the sample; and
- extending the results from the sample, using an iterative procedure to compute fuzzy membership values for all of the objects not in the accepted sample.
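The extension step can be illustrated with the RFCM distance formula. The sample memberships define a weight ("prototype") vector per cluster, and each remaining object's dissimilarities to the sample objects, combined with those weights, give its relational distance to each cluster. Fuzzy memberships then follow from the usual fuzzy c-means update. This is a hypothetical sketch (`extend_memberships` is an illustrative name) with two stated simplifications: it computes memberships in a single non-iterative pass rather than with the iterative procedure described above, and it clips negative distances instead of applying NERF's spread correction.

```python
import numpy as np


def extend_memberships(U_s, D_ss, D_rs, m=2.0):
    """Extend fuzzy memberships from a clustered sample to remaining objects.

    U_s  : c x s memberships of the s sample objects (columns sum to 1)
    D_ss : s x s dissimilarities among the sample objects
    D_rs : r x s dissimilarities from each remaining object to each sample object
    Returns a c x r membership matrix for the r remaining objects.
    Simplified single pass (assumption), not the eNERF iterative procedure.
    """
    W = np.asarray(U_s, dtype=float) ** m
    V = W / W.sum(axis=1, keepdims=True)       # c x s prototype weights v_i
    D_ss = np.asarray(D_ss, dtype=float)
    D_rs = np.asarray(D_rs, dtype=float)
    half_self = 0.5 * np.einsum("is,st,it->i", V, D_ss, V)  # v_i^T D_ss v_i / 2
    dist = D_rs @ V.T - half_self[None, :]     # r x c relational distances
    dist = np.maximum(dist, 1e-12)             # crude guard against d <= 0
    inv = dist ** (-1.0 / (m - 1.0))
    U_r = inv / inv.sum(axis=1, keepdims=True) # each remaining object's memberships sum to 1
    return U_r.T                               # c x r, matching the layout of U_s
```

In use, U_s would come from clustering the accepted sample. Only the dissimilarities between remaining and sampled objects (plus the sample block) are needed, so the cost grows with r × s rather than with the full n × n matrix.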
We will demonstrate eNERF on an example of clustering gene product data.
Professor Bezdek is the William Craig Nystul Professor of Computer Science at the University of West Florida.
His research interests include pattern recognition, optimization, image processing, and medical applications. Professor Bezdek holds a PhD from Cornell University. Jim is the founding editor of the International Journal of Approximate Reasoning and the IEEE Transactions on Fuzzy Systems. His previous experience includes directing Boeing's HTC Inf. Proc. Lab and a term as head of Computer Science at the University of South Carolina.