WEKA - Classification - Training and Test Set


I am working on a classification problem using three different classifiers, namely Decision Tree, Naive Bayes and IBk. I have two datasets that share the same layout and attribute names but contain different values.
Training set example:
State, Population, HouseholdIncome, FamilyIncome, perCapInc, NumUnderPov, EducationLevel_1, EducationLevel_2, EducationLevel_3, UnemploymentRate, EmployedRate, ViolentCrimesPerPop, Crime Rate
8, 0.19, 0.37, 0.39, 0.4, 0.08, 0.1, 0.18, 0.48, 0.27, 0.68, 0.2, Low
I would like my decision tree to use the 12 attributes to predict whether the target class value is Low, Med or High, based on the ViolentCrimesPerPop figure, which in this example is 0.2.
My question is: on my test set, do I just provide more unseen examples in the same format, or should I remove one of the attributes so I can see whether the model has learnt anything?
It is not a good idea to test your classifier on the same data you trained it on: the model has (hopefully) learnt to classify those instances correctly, so a good score there says little about how it handles examples it has never seen.
The usual setup is to train on the training dataset and then test on a different dataset with the same format/structure, to see how the model performs.
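To make this concrete, here is a minimal Python sketch (not WEKA code) using a hand-rolled 1-nearest-neighbour classifier, roughly what IBk does with its default k = 1. The helper names `knn_predict`, `accuracy` and `make_rows` and the two-feature toy data are hypothetical. Note that the test rows keep exactly the same structure as the training rows, class label included, because the label is what the predictions are scored against.

```python
import math
import random
from collections import Counter


def knn_predict(train, x, k=1):
    """Return the majority label among the k training rows nearest to x.

    train is a list of (features, label) pairs; distance is Euclidean.
    k=1 plays the role of a 1-nearest-neighbour learner such as IBk.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, rows, k=1):
    """Fraction of rows whose label is predicted correctly by k-NN built on train."""
    hits = sum(knn_predict(train, feats, k) == label for feats, label in rows)
    return hits / len(rows)


def make_rows(n, rng):
    """Hypothetical toy data: two features in [0, 1), label derived from the first
    feature, with about 15% of labels re-drawn at random as noise."""
    rows = []
    for _ in range(n):
        feats = (rng.random(), rng.random())
        label = "Low" if feats[0] < 0.33 else "Med" if feats[0] < 0.66 else "High"
        if rng.random() < 0.15:
            label = rng.choice(["Low", "Med", "High"])
        rows.append((feats, label))
    return rows


rng = random.Random(0)
train_rows = make_rows(60, rng)
test_rows = make_rows(20, rng)  # unseen rows with the same structure, label included

train_acc = accuracy(train_rows, train_rows)  # scored on the rows it was built from
test_acc = accuracy(train_rows, test_rows)    # scored on unseen rows
print(f"training accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

With k = 1 the training accuracy is always 1.00 here, noisy labels included, because every training row is its own nearest neighbour at distance 0. Only the test figure says anything about unseen data.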
It is a good idea to split your data into three separate sets: training, testing and validation.
The training set is used to train each of the models you are building, and each model's performance is then checked on the testing set. As you adjust the parameters of a model (for example, pruning options for decision trees, k for k-NN, or neural network parameters), you can see how well it performs against the testing set.
Finally, once you have settled on the parameters, run the model against the validation set to confirm that it has not over-fitted to the testing data through those parameter adjustments.
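The train, tune and confirm workflow above can be sketched in Python, again with a hand-rolled k-NN standing in for IBk and hypothetical toy data (the names are illustrative, not WEKA API). k is chosen by its score on the testing set, and the validation set is used only once, at the end, so its score is not biased by the tuning.

```python
import math
import random
from collections import Counter


def knn_predict(train, x, k):
    """Majority label among the k training rows nearest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, rows, k):
    """Fraction of rows whose label k-NN (built on train) predicts correctly."""
    return sum(knn_predict(train, f, k) == y for f, y in rows) / len(rows)


def make_rows(n, rng):
    """Hypothetical noisy toy data for a three-class Low/Med/High problem."""
    rows = []
    for _ in range(n):
        feats = (rng.random(), rng.random())
        label = "Low" if feats[0] < 0.33 else "Med" if feats[0] < 0.66 else "High"
        if rng.random() < 0.15:
            label = rng.choice(["Low", "Med", "High"])
        rows.append((feats, label))
    return rows


# Rows are generated in random order, so plain slicing gives a 60-20-20 split.
rows = make_rows(100, random.Random(1))
train, test, valid = rows[:60], rows[60:80], rows[80:]

# Parameter adjustment: keep the k that scores best on the testing set.
candidate_ks = [1, 3, 5, 7, 9]
test_scores = {k: accuracy(train, test, k) for k in candidate_ks}
best_k = max(candidate_ks, key=lambda k: test_scores[k])

# Final check: score the chosen model once on the untouched validation set.
valid_acc = accuracy(train, valid, best_k)
print(f"k={best_k}: testing {test_scores[best_k]:.2f}, validation {valid_acc:.2f}")
```

If the validation score comes out clearly lower than the testing score, the tuning has probably fitted quirks of the testing set rather than the underlying pattern.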
A further discussion of these sets may be found here.
I generally use a 60-20-20 split, though 50-25-25 is also common; it really comes down to how much data you have to work with.
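A 60-20-20 (or 50-25-25) split can be sketched as a shuffle followed by two cuts. This is plain Python with a hypothetical `split_three_ways` helper, not one of WEKA's own filters; shuffling first matters if the source file is ordered, for example by class.

```python
import random


def split_three_ways(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle rows and cut them into training, testing and validation lists.

    fractions gives each set's share, e.g. (0.6, 0.2, 0.2) or (0.5, 0.25, 0.25);
    any remainder left by rounding down goes to the validation set.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = int(len(shuffled) * fractions[0])
    n_test = int(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]
    return train, test, valid


train, test, valid = split_three_ways(range(1000))
print(len(train), len(test), len(valid))  # 600 200 200
```

Passing `fractions=(0.5, 0.25, 0.25)` gives 500/250/250 for the same 1,000 rows. With a small dataset the testing and validation sets get small quickly, which is why the choice depends on how much data you have.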
I hope this helps!