Machine learning classification and prediction in Weka
I am very new to machine learning. Sorry if there are any mistakes in my English. I am using the Weka J48 classifier to predict a true/false value. My training set has almost 999K instances, with 11 attributes that are a mix of nominal and numeric fields. I trained the model using cross-validation with 3 folds, which gives me an accuracy of ~84%. After storing the model, I tested it on a separate 50K dataset. The results are very bad: about 50% of the predictions are wrong. I don't know why this is happening. I have two questions: how can I train the model to perform better on the test set, and what could the possible issues be? I am using the Weka API in Java.
It most likely means that your model is overfit to your 999K training set and doesn't generalize well to your 50K test set. You should look into cross-validating with a good portion (but not all) of your 50K dataset in addition to your 999K. You may also want to try k-fold cross-validation with a k higher than 3, because k=3 may be too "coarse": each fold's model is trained on only two thirds of the data. Good luck!
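To make the effect of k concrete, here is a minimal plain-Java sketch of how k-fold cross-validation partitions a dataset. It does not use Weka; the class name `KFoldSketch` and its helper methods are made up for illustration, and the size 999,000 is only an approximation of the "almost 999K" mentioned in the question. In Weka itself the splitting is done for you by `Evaluation.crossValidateModel`, so treat this purely as an illustration of what changes when k goes from 3 to 10.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper class, not part of Weka.
final class KFoldSketch {

    /** Shuffles indices 0..n-1 and deals them into k folds whose sizes differ by at most one. */
    public static List<List<Integer>> folds(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        // Shuffle so that any ordering in the source file does not leak into the folds.
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal the shuffled indices round-robin into the k folds.
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    /** Number of instances the model is trained on when fold f is held out for validation. */
    public static int trainSize(List<List<Integer>> folds, int f) {
        int total = 0;
        for (List<Integer> fold : folds) {
            total += fold.size();
        }
        return total - folds.get(f).size();
    }

    public static void main(String[] args) {
        int n = 999_000; // assumed approximation of the training-set size in the question
        for (int k : new int[] {3, 10}) {
            List<List<Integer>> folds = folds(n, k, 42L);
            System.out.println("k=" + k + ": train on " + trainSize(folds, 0)
                    + ", validate on " + folds.get(0).size());
        }
    }
}
```

With a larger k, each of the k models sees a larger share of the data during training, so the cross-validated estimate is closer to what a model trained on the full set would achieve. Note that cross-validation only estimates accuracy on data drawn from the same distribution as the training set; if the 50K set differs systematically, a higher k alone will not close the gap.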