classification


Suggested unsupervised feature selection / extraction method for 2 class classification?


I've got a set of F features, e.g. Lab color space and entropy. By concatenating all features, I obtain a feature vector of dimension d (between 12 and 50, depending on which features are selected).
I usually get between 1000 and 5000 new samples, denoted x. A Gaussian Mixture Model is then trained on these vectors, but I don't know which class each vector belongs to. What I do know is that there are only 2 classes. From the GMM prediction I get the probability of a feature vector belonging to class 1 or 2.
My question now is: how do I obtain the subset of features, for instance only entropy and normalized RGB, that will give me the best classification accuracy? I guess this is achieved if the feature subset selection increases the class separability.
Maybe I can use Fisher's linear discriminant analysis, since I already have the means and covariance matrices from the GMM? But wouldn't I then have to calculate the score for each combination of features?
It would be nice to know whether this is an unrewarding approach and I'm on the wrong track, and to hear any other suggestions.
One way of finding "informative" features is to pick the features that maximise the log-likelihood of the fitted model. You can estimate this on held-out data with cross-validation.
https://www.cs.cmu.edu/~kdeng/thesis/feature.pdf
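A rough sketch of that idea with scikit-learn, on toy data rather than your features. One assumption of mine that is not taken from the linked thesis: raw log-likelihoods depend on feature scale and dimension, so instead of comparing them directly the sketch scores each candidate subset by how much a 2-component GMM improves the held-out log-likelihood over a single Gaussian on the same columns. The helper name `cv_loglik_gain` and the toy features are made up for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score

# Toy data: one feature that separates two hidden groups, one pure-noise feature.
rng = np.random.default_rng(0)
n = 600
hidden = rng.integers(0, 2, size=n)          # only used to build the toy data
informative = rng.normal(loc=4.0 * hidden, scale=1.0, size=n)
noise = rng.normal(size=n)
X = np.column_stack([informative, noise])


def cv_loglik_gain(X_sub, seed=0):
    """Held-out log-likelihood gain of a 2-component GMM over one Gaussian.

    GaussianMixture.score returns the mean per-sample log-likelihood, so
    cross_val_score can be used without labels (y is left as None).
    """
    two = GaussianMixture(n_components=2, random_state=seed)
    one = GaussianMixture(n_components=1, random_state=seed)
    ll_two = cross_val_score(two, X_sub, cv=5).mean()
    ll_one = cross_val_score(one, X_sub, cv=5).mean()
    return ll_two - ll_one


# Score each candidate subset (here: single columns) and rank them.
subsets = {"informative": [0], "noise": [1]}
gains = {name: cv_loglik_gain(X[:, cols]) for name, cols in subsets.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
print(ranking, gains)
```

With 12 to 50 dimensions you would probably not enumerate every combination; scoring whole feature groups (e.g. all Lab channels together) or adding groups greedily keeps the number of GMM fits manageable.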
Another idea might be to use a different unsupervised algorithm that selects features automatically, such as a clustering forest:
http://research.microsoft.com/pubs/155552/decisionForests_MSR_TR_2011_114.pdf
In that case the clustering algorithm will automatically split the data based on information gain.
Fisher LDA will not select features; it projects your original data into a lower-dimensional subspace. If you are looking into subspace methods, another interesting approach might be spectral clustering, which also operates in a subspace, or unsupervised neural networks such as autoencoders.
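To make the Fisher point concrete, here is a minimal NumPy sketch that takes the two component means and covariances of a GMM and computes the Fisher direction. The parameter values are invented for illustration, and `fisher_direction` and `fisher_score` are hypothetical helper names. Keep in mind that the "classes" here are just the GMM components, so the score measures how well the components are separated, not the true classes.

```python
import numpy as np

# Hypothetical GMM parameters for d = 3 features (illustrative values only).
mu1 = np.array([0.0, 0.0, 0.0])
mu2 = np.array([3.0, 0.5, 0.0])
cov1 = np.diag([1.0, 1.0, 1.0])
cov2 = np.diag([1.0, 2.0, 1.0])


def fisher_direction(mu1, mu2, cov1, cov2):
    """Direction w that maximises between-class over within-class scatter."""
    s_w = cov1 + cov2                       # within-class scatter
    return np.linalg.solve(s_w, mu2 - mu1)  # w = S_w^{-1} (mu2 - mu1)


def fisher_score(mu1, mu2, cov1, cov2, idx):
    """Fisher criterion at the optimal direction, using only features idx."""
    idx = np.asarray(idx)
    diff = (mu2 - mu1)[idx]
    s_w = (cov1 + cov2)[np.ix_(idx, idx)]   # sub-block for the chosen features
    return float(diff @ np.linalg.solve(s_w, diff))


# The direction is a weighted mix of the features, i.e. a projection,
# not a selection of a subset.
w = fisher_direction(mu1, mu2, cov1, cov2)
print("w =", w)

# Scoring subsets reuses the same means and covariances; no GMM refit needed.
for idx in ([0], [1], [2], [0, 1, 2]):
    print(idx, fisher_score(mu1, mu2, cov1, cov2, idx))
```

So yes, you would still evaluate the score per candidate subset, but each evaluation is only a small linear solve on the parameters you already have.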
Hope that helps
