WEKA - Classification - Training and Test Set


I am working on a classification problem using three different classifiers: Decision Tree, Naive Bayes and IBk. I have two data sets with the same layout and attribute names, but with different values in each.
Training set example:
State, Population, HouseholdIncome, FamilyIncome, perCapInc, NumUnderPov, EducationLevel_1, EducationLevel_2, EducationLevel_3, UnemploymentRate, EmployedRate, ViolentCrimesPerPop, Crime Rate
8, 0.19, 0.37, 0.39, 0.4, 0.08, 0.1, 0.18, 0.48, 0.27, 0.68, 0.2, Low
I would like my decision tree to use the 12 attributes to predict whether the target class value is Low, Med or High. The class is based on the ViolentCrimesPerPop figure, which in this example is 0.2.
My question is: in my test set, do I just provide more unseen examples in the same format, or should I take away one of the attributes so I can see whether it has learnt anything?
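It is not a good idea to test your classifier on the same data you trained it on: your model has, hopefully, already learnt to classify those instances correctly, so the result would overstate how well it handles new data.
The usual setup is to train on the training dataset and then test on a different dataset (with the same format/structure) to see how it performs. Keep the class attribute in the test set as well, so that the predictions can be compared against the known labels.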
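As a rough illustration of that setup, here is a minimal sketch in plain Python rather than Weka. It uses a 1-nearest-neighbour rule (similar in spirit to IBk with k = 1), two of the attributes from the question, and made-up rows; the function names are hypothetical. The point is only that the test rows have the same structure as the training rows, label included, and the label is used solely for scoring.

```python
import math

def nearest_neighbour_predict(train_rows, features):
    """Return the label of the closest training row (1-NN, Euclidean distance)."""
    closest = min(train_rows, key=lambda row: math.dist(row[0], features))
    return closest[1]

def accuracy(train_rows, test_rows):
    """Fraction of test rows whose predicted label matches the known label."""
    correct = sum(
        1 for features, label in test_rows
        if nearest_neighbour_predict(train_rows, features) == label
    )
    return correct / len(test_rows)

# Made-up rows for illustration: ([HouseholdIncome, UnemploymentRate], class label)
train = [([0.37, 0.27], "Low"), ([0.20, 0.60], "High"), ([0.30, 0.40], "Med")]

# Unseen rows in the same format; the label stays so predictions can be scored
test = [([0.35, 0.30], "Low"), ([0.22, 0.58], "High")]

print(accuracy(train, test))
```

The classifier never sees the test labels while predicting; they are only compared with its output afterwards.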
It is a good idea to split your dataset into three sets: Training, Testing and Validation.
The training set is used to train each of the models that you are building. Each trained model's performance is then checked on the testing set. As you adjust the parameters of your model (for example, pruning options for decision trees, k for k-NN, or neural network parameters), you can see how well the model performs against the testing set.
Finally, once these parameters have been settled, you can run the model against the validation set to confirm that it did not over-fit to the testing data (as a result of the parameter adjustments applied to the model itself).
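The tune-then-confirm loop described above might look roughly like this. It is a sketch in plain Python, not Weka: a small k-NN classifier whose k is chosen on the testing set and then checked once on the validation set. The set names follow this answer's usage (many references swap "testing" and "validation"); the data is made up, with one deliberately noisy training point, and all function names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_rows, features, k):
    """Majority label among the k training rows closest to `features`."""
    nearest = sorted(train_rows, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_rows, eval_rows, k):
    """Fraction of rows in `eval_rows` that k-NN labels correctly."""
    hits = sum(knn_predict(train_rows, f, k) == label for f, label in eval_rows)
    return hits / len(eval_rows)

def tune_k(train_rows, testing_rows, candidate_ks):
    """Pick the k with the best testing-set accuracy (first candidate wins ties)."""
    return max(candidate_ks, key=lambda k: accuracy(train_rows, testing_rows, k))

# Made-up one-attribute data; ([0.16], "High") is a deliberately noisy point
train = [([0.10], "Low"), ([0.16], "High"), ([0.20], "Low"), ([0.30], "Low"),
         ([0.70], "High"), ([0.80], "High"), ([0.90], "High")]
testing = [([0.15], "Low"), ([0.85], "High")]      # used to choose k
validation = [([0.25], "Low"), ([0.75], "High")]   # used once, at the end

best_k = tune_k(train, testing, [1, 3, 5])
print("chosen k:", best_k)
print("validation accuracy:", accuracy(train, validation, best_k))
```

Because k was picked to look good on the testing set, only the validation score is an unbiased check of the final model.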
A further discussion of these sets may be found here.
Generally, I have used a 60-20-20 data split, although 50-25-25 is common as well. It really comes down to how much data you have to play with.
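A 60-20-20 split can be sketched as follows. This is plain Python with a hypothetical helper name and a fixed seed chosen only to make the shuffle repeatable; the fractions are parameters, so 50-25-25 works the same way.

```python
import random

def split_dataset(rows, train_frac=0.6, test_frac=0.2, seed=42):
    """Shuffle a copy of `rows` and cut it into training, testing and validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    training = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return training, testing, validation

rows = list(range(100))  # stand-in for 100 data instances
training, testing, validation = split_dataset(rows)
print(len(training), len(testing), len(validation))  # 60 20 20
```

Shuffling before cutting matters when the file is ordered (for example by State), since otherwise the three sets would not be comparable.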
I hope this helps!
