Suggested unsupervised feature selection / extraction method for 2-class classification?
I have a set of F features, e.g. Lab color space and entropy. Concatenating all features gives a feature vector of dimension d (between 12 and 50, depending on which features are selected). I usually collect between 1000 and 5000 samples, denoted x. A Gaussian Mixture Model is then trained on these vectors, but I don't know which class each vector belongs to; all I know is that there are exactly 2 classes. From the GMM prediction I get the probability of a feature vector belonging to class 1 or class 2. My question: how do I find the subset of features (for instance, only entropy and normalized RGB) that gives the best classification accuracy? I assume this is achieved when the feature subset increases class separability. Could I use Fisher's linear discriminant analysis, since I already have the means and covariance matrices from the GMM? But wouldn't I then have to compute the score for every combination of features? It would be nice to know whether this is an unrewarding approach and I'm on the wrong track, and/or to get other suggestions.
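To make the setup concrete, here is a minimal sketch of the pipeline described above, using scikit-learn's `GaussianMixture` with 2 components (one per unknown class). The data here is randomly generated as a stand-in for the real feature vectors:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical stand-in for the real feature vectors: 1000 samples, d = 12,
# drawn from two shifted Gaussians so that two clusters actually exist.
X = np.vstack([rng.normal(0.0, 1.0, (500, 12)),
               rng.normal(3.0, 1.0, (500, 12))])

# Two components, one per (unknown) class, each with a full covariance matrix.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

# Posterior probability of each sample belonging to component 1 or 2;
# gmm.means_ and gmm.covariances_ hold the per-component parameters.
proba = gmm.predict_proba(X)
print(proba.shape)  # (1000, 2)
```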
One way of finding "informative" features is to select the features that maximise the log-likelihood, which you can evaluate with cross-validation: https://www.cs.cmu.edu/~kdeng/thesis/feature.pdf

Another idea is to use an unsupervised algorithm that selects features automatically, such as a clustering forest: http://research.microsoft.com/pubs/155552/decisionForests_MSR_TR_2011_114.pdf In that case the clustering algorithm splits the data based on information gain.

Note that Fisher LDA will not select features; it projects your original data into a lower-dimensional subspace. If you are looking into subspace methods, another interesting approach is spectral clustering, which also operates in a subspace, or unsupervised neural networks such as autoencoders.

Hope that helps.
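The first suggestion (scoring feature subsets by cross-validated log-likelihood of the GMM) can be sketched as follows. This is only an illustration of the mechanics, with made-up data and an exhaustive search over same-size subsets; the function name and the choice to compare only subsets of equal size k (per-sample log-likelihood is not directly comparable across dimensionalities) are my own:

```python
import numpy as np
from itertools import combinations
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import KFold

def cv_log_likelihood(X, cols, n_splits=3, seed=0):
    """Mean held-out per-sample log-likelihood of a 2-component GMM
    fitted on the feature columns listed in `cols`."""
    scores = []
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        gmm = GaussianMixture(n_components=2, random_state=seed)
        gmm.fit(X[np.ix_(train_idx, cols)])
        # score() returns the average log-likelihood of the held-out samples
        scores.append(gmm.score(X[np.ix_(test_idx, cols)]))
    return float(np.mean(scores))

# Hypothetical data: 300 samples, 4 candidate features; only feature 0
# is bimodal (i.e. carries cluster structure).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
X[:150, 0] += 4.0

# Exhaustively score all subsets of a fixed size k = 2.
k = 2
best = max((cv_log_likelihood(X, list(c)), c)
           for c in combinations(range(4), k))
print("best subset:", best[1])
```

For d up to 50 an exhaustive search over all subsets is infeasible, so in practice you would combine this scoring function with a greedy forward/backward search.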