Query about NaiveBayes Classifier


I am building a text classifier that labels reviews as positive or negative. I have a question about the Naive Bayes classifier formula:
P(label|features) = [P(label) * P(f1|label) * ... * P(fn|label)] / P(features)
As I understand it, probabilities are multiplied when we want the probability of events occurring together, e.g. the probability of A and B both occurring. Is it appropriate to multiply the probabilities in this case? I would appreciate it if someone could explain this formula in a bit more detail. I am trying to classify some reviews by hand to check algorithm-generated classifications that seem a tad off; this should let me identify the exact reason for the misclassification.
In basic probability terms, to calculate P(label|feature1,feature2) we multiply the probabilities to get the probability of feature 1 and feature 2 occurring together. But in this case I am not trying to calculate a standard probability; I want the strength of positivity/negativity of the text. So if I sum the probabilities, I get a number that can serve as a positivity/negativity quotient. This is a bit unconventional, but do you think it can give good results? I ask because the sum and the product can rank things quite differently. E.g. 2 + 2 and 3 + 1 are both 4, but 2*2 = 4 while 3*1 = 3.
The class-conditional probabilities P(feature|label) can be multiplied together if the features are statistically independent given the label. However, it has been found in practice that Naive Bayes often still produces good results even when the features are not independent. Thus, you can estimate the individual class-conditional probabilities P(feature|label) by simple counting and then multiply them together.
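To make the counting-and-multiplying step concrete, here is a minimal sketch in Python. The toy reviews, the names `TRAIN`, `train` and `posterior`, and the choice to treat each feature as "word is present in the review" are all assumptions made for illustration; they are not taken from any particular library. No smoothing is applied, so a word never seen with a label would give that label a probability of zero.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set: (set of words in the review, label).
TRAIN = [
    ({"best", "polite", "staff"}, "pos"),
    ({"best", "food"}, "pos"),
    ({"polite", "slow"}, "pos"),
    ({"slow", "rude", "staff"}, "neg"),
    ({"best", "rude"}, "neg"),
    ({"slow", "polite"}, "neg"),
]


def train(examples):
    """Count reviews per label and, per label, reviews containing each word."""
    label_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for words, label in examples:
        word_counts[label].update(words)
    return label_counts, word_counts


def posterior(words, label_counts, word_counts):
    """Return P(label|features) for every label using the formula above."""
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = n / total  # P(label)
        for w in words:
            score *= word_counts[label][w] / n  # P(word|label) by counting
        scores[label] = score
    # P(features) is the sum of the numerators over all labels.
    # It is zero if every label has a zero score (no smoothing here).
    evidence = sum(scores.values())
    return {label: s / evidence for label, s in scores.items()}


label_counts, word_counts = train(TRAIN)
result = posterior({"best", "polite"}, label_counts, word_counts)
print(result)
```

Working through it by hand for the review {"best", "polite"}: the numerator for "pos" is 1/2 * 2/3 * 2/3 = 2/9 and for "neg" it is 1/2 * 1/3 * 1/3 = 1/18, so after dividing by their sum the positive label gets 0.8.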
One thing to note is that in some applications these probabilities can be extremely small, which can cause numerical underflow. Thus, you may want to add together the logs of the probabilities rather than multiply the probabilities themselves.
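A small sketch of the underflow issue, using made-up numbers: 400 features, each with a likelihood of 0.01 under one label and 0.009 under the other. The function names `product_score` and `log_score` are hypothetical. The direct product collapses to 0.0 for both labels, while the sum of logs stays finite and can still be compared.

```python
import math


def product_score(prior, likelihoods):
    """P(label) * P(f1|label) * ... * P(fn|label), multiplied directly."""
    score = prior
    for p in likelihoods:
        score *= p
    return score


def log_score(prior, likelihoods):
    """log P(label) + sum of log P(fi|label); same ranking, no underflow."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)


# Assumed likelihoods for illustration only.
pos_likelihoods = [0.01] * 400
neg_likelihoods = [0.009] * 400

# Both products underflow to 0.0, so the labels cannot be told apart.
print(product_score(0.5, pos_likelihoods), product_score(0.5, neg_likelihoods))

# The log scores are large negative numbers but remain comparable.
print(log_score(0.5, pos_likelihoods), log_score(0.5, neg_likelihoods))
```

Because the logarithm is monotonic, the label with the largest log score is also the label with the largest product, so the classification decision is unchanged.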
I understand this when the features are different kinds of measurement, e.g. what is the probability of a person being male given a height of 170 cm and a weight of 200 pounds. Those probabilities have to be multiplied because the conditions (events) occur together. But in text classification this does not seem valid to me, as it doesn't really matter whether the events occur together. E.g. if the probability of a review being positive given the occurrence of the word "best" is 0.1, and the probability of a review being positive given the occurrence of the word "polite" is 0.05, then the probability of the review being positive given both words is not 0.1*0.05. A more indicative number would be the sum of the probabilities (which needs to be normalized).
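For what it is worth, the numbers in this example can be pushed through the formula from the question. The formula multiplies P(word|label), not P(label|word), so the per-word values first have to be converted with Bayes' rule; under the independence assumption the numerator for each label is then proportional to P(label) times the product of P(label|word)/P(label). The sketch below assumes two labels and a prior P(positive) = 0.5, which is not stated anywhere in the thread, and the function name `combine` is hypothetical.

```python
def combine(pos_given_word, prior_pos):
    """P(positive | all words) from per-word P(positive | word).

    Assumes two labels and that the words are independent given the label.
    """
    prior_neg = 1.0 - prior_pos
    pos_score = prior_pos
    neg_score = prior_neg
    for p in pos_given_word:
        pos_score *= p / prior_pos          # proportional to P(word|positive)
        neg_score *= (1.0 - p) / prior_neg  # proportional to P(word|negative)
    # Dividing by the sum plays the role of P(features) in the formula.
    return pos_score / (pos_score + neg_score)


# 0.1 for "best" and 0.05 for "polite" are the values from the example;
# the 0.5 prior is an assumption.
combined = combine([0.1, 0.05], prior_pos=0.5)
print(combined)
```

With these inputs the positive numerator is 0.5 * 0.2 * 0.1 = 0.01 and the negative one is 0.5 * 1.8 * 1.9 = 1.71, so the result is 0.01 / 1.72, roughly 0.006. That is indeed not 0.1*0.05, but the difference comes from the normalization step rather than from replacing the product with a sum: each word on its own points below the assumed prior, so together they point even further toward the negative label.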