WEKA - Classification - Training and Test Set


I am working on a classification problem using three different classifiers: Decision Tree, Naive Bayes and IBk. I have two data sets that share the same layout and attribute names but contain different values.
Training set example:
State, Population, HouseholdIncome, FamilyIncome, perCapInc, NumUnderPov, EducationLevel_1, EducationLevel_2, EducationLevel_3, UnemploymentRate, EmployedRate, ViolentCrimesPerPop, Crime Rate
8, 0.19, 0.37, 0.39, 0.4, 0.08, 0.1, 0.18, 0.48, 0.27, 0.68, 0.2, Low
I would like my decision tree to use the 12 attributes to predict whether the target class value is Low, Med or High. The class is based on the ViolentCrimesPerPop figure, which in this example is 0.2.
My question is: in my test set, do I just provide more unseen examples in the same format, or should I take away one of the attributes so I can see whether it has learnt anything?
It is not a good idea to test your classifier on the same data it was trained on: the model has, hopefully, already learnt to classify those instances correctly, so the result tells you little about how it handles new data.
The usual set-up is to train on the training dataset and then evaluate on a different dataset with the same format and structure. Keep every attribute, including the class label, so that the predictions can be compared with the known answers.
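As a rough illustration of that set-up, here is a minimal Python sketch (not WEKA code). The rows, the two-attribute layout and the `nearest_neighbour` and `accuracy` helpers are all invented for the example. The only point is that the test rows have the same columns as the training rows, class label included, so each prediction can be checked against the known label.

```python
import math

# Hypothetical rows: (attribute values, class label). Both sets share one layout.
train = [
    ((0.10, 0.20), "Low"),
    ((0.15, 0.25), "Low"),
    ((0.80, 0.90), "High"),
    ((0.85, 0.75), "High"),
]
test = [
    ((0.12, 0.22), "Low"),
    ((0.90, 0.80), "High"),
    ((0.20, 0.10), "Low"),
]

def nearest_neighbour(train_rows, features):
    """Return the label of the closest training row (1-nearest-neighbour)."""
    closest = min(train_rows, key=lambda row: math.dist(row[0], features))
    return closest[1]

def accuracy(train_rows, eval_rows):
    """Fraction of eval rows whose predicted label matches the known label."""
    correct = sum(
        1 for features, label in eval_rows
        if nearest_neighbour(train_rows, features) == label
    )
    return correct / len(eval_rows)

test_accuracy = accuracy(train, test)
print(f"Accuracy on unseen test rows: {test_accuracy:.2f}")
```

In WEKA's Explorer, the corresponding choice is the "Supplied test set" test option, which expects a file with the same attributes as the training file.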
It is a good idea to split your dataset into three separate sets: Training, Testing and Validation.
The training set is used to train each of the models that you are building. Each model's performance is then checked against the testing set. As you continue to adjust the parameters of your model (for example, pruning options on Decision Trees, k for k-NN, or Neural Network parameters), you can see how well the model is performing against the testing set.
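Finally, once these parameters have been finalized for your model, you can run the model against the validation set to confirm that it did not over-fit on the testing data (due to the parameter adjustments applied to the model itself). Note that many texts swap the last two names, tuning on a "validation" set and keeping the "test" set for the final check; the procedure is the same either way.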
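The tune-then-confirm workflow can be sketched as follows, using k for k-NN as the parameter being adjusted. This is only an illustrative Python sketch under invented assumptions: the one-attribute rows, the candidate values of k and the `knn_predict` and `accuracy` helpers are hypothetical and are not taken from WEKA.

```python
import math
from collections import Counter

def knn_predict(train_rows, features, k):
    """Majority vote among the k training rows closest to `features`."""
    nearest = sorted(train_rows, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_rows, eval_rows, k):
    """Fraction of eval rows that k-NN labels correctly."""
    hits = sum(
        1 for features, label in eval_rows
        if knn_predict(train_rows, features, k) == label
    )
    return hits / len(eval_rows)

# Hypothetical rows: (attribute values, class label).
train = [
    ((0.10,), "Low"), ((0.20,), "Low"), ((0.30,), "Low"),
    ((0.50,), "Med"), ((0.55,), "Med"), ((0.60,), "Med"),
    ((0.80,), "High"), ((0.90,), "High"), ((0.95,), "High"),
]
testing = [((0.15,), "Low"), ((0.52,), "Med"), ((0.85,), "High"), ((0.42,), "Med")]
validation = [((0.25,), "Low"), ((0.58,), "Med"), ((0.92,), "High")]

# Step 1: adjust the parameter (k) using the testing set only.
candidate_ks = [1, 3, 5]
best_k = max(candidate_ks, key=lambda k: accuracy(train, testing, k))

# Step 2: confirm the chosen setting once on the untouched validation set.
validation_accuracy = accuracy(train, validation, best_k)
print(f"best k = {best_k}, validation accuracy = {validation_accuracy:.2f}")
```

The validation rows play no part in choosing k, which is what makes the final figure a fair check on the tuned model.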
A further discussion of these sets may be found here.
Generally, I have used a data split of 60-20-20, although 50-25-25 is also common; it really comes down to how much data you have to play with.
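For what it is worth, a 60-20-20 split can be sketched in a few lines of Python. The `three_way_split` helper, its parameter names and the fixed seed are assumptions made for this example; the rows are shuffled first so that any ordering in the original file does not end up concentrated in one set.

```python
import random

def three_way_split(rows, train_frac=0.6, test_frac=0.2, seed=42):
    """Shuffle rows and split them into training, testing and validation lists.

    Whatever is left after the training and testing shares goes to validation.
    """
    shuffled = list(rows)                      # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    testing = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, testing, validation

rows = list(range(100))                        # stand-in for 100 data rows
train, testing, validation = three_way_split(rows)
print(len(train), len(testing), len(validation))
```

Passing `train_frac=0.5, test_frac=0.25` gives the 50-25-25 variant.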
I hope this helps!
