Query about NaiveBayes Classifier


I am building a text classifier to label reviews as positive or negative. I have a question about the Naive Bayes classifier formula:
$$P(\text{label} \mid \text{features}) = \frac{P(\text{label}) \cdot P(f_1 \mid \text{label}) \cdots P(f_n \mid \text{label})}{P(\text{features})}$$
As I understand it, probabilities are multiplied when we want the probability of events occurring together, e.g. the probability of A and B both occurring. Is it appropriate to multiply the probabilities in this case? I would appreciate it if someone could explain this formula in a bit more detail. I am trying to do some manual classification to check some algorithm-generated classifications that seem a tad off, so that I can identify the exact reason for the misclassification.
In basic probability terms, to calculate P(label|feature1,feature2) we have to multiply the probabilities, because we want the probability of feature 1 and feature 2 occurring together. But in this case I am not trying to calculate a standard probability, but rather the strength of positivity/negativity of the text. So if I sum up the probabilities, I get a number that can serve as a positivity/negativity quotient. This is a bit unconventional, but do you think it can give good results? The reason I ask is that the sum and the product can behave quite differently. E.g. 2*2 = 4 but 3*1 = 3, although both pairs sum to 4.
The class-conditional probabilities P(feature|label) can be multiplied together if the features are conditionally independent given the label. In practice, however, Naive Bayes often still produces good results even when that independence assumption does not hold. Thus, you can estimate each class-conditional probability P(feature|label) by simple counting and then multiply them together.
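As a rough illustration of the counting-then-multiplying step, here is a minimal Python sketch. The toy training reviews, the function names, and the use of add-one (Laplace) smoothing are all assumptions made for this example. They are not part of the question or of any particular library.

```python
from collections import Counter

# Hypothetical toy training data: (tokens, label) pairs.
train = [
    (["best", "polite", "staff"], "pos"),
    (["best", "food"], "pos"),
    (["rude", "staff"], "neg"),
    (["worst", "food", "rude"], "neg"),
]

# Count how often each label and each (word, label) pair occurs.
label_counts = Counter(label for _, label in train)
word_counts = {label: Counter() for label in label_counts}
for tokens, label in train:
    word_counts[label].update(tokens)
vocab = {word for tokens, _ in train for word in tokens}


def prior(label):
    """P(label), estimated as the fraction of training reviews with that label."""
    return label_counts[label] / len(train)


def likelihood(word, label):
    """P(word|label) from counts; add-one smoothing (an assumption) avoids zeros."""
    total = sum(word_counts[label].values())
    return (word_counts[label][word] + 1) / (total + len(vocab))


def score(tokens, label):
    """Numerator of the formula: P(label) * P(f1|label) * ... * P(fn|label)."""
    result = prior(label)
    for word in tokens:
        result *= likelihood(word, label)
    return result


review = ["best", "polite"]
scores = {label: score(review, label) for label in label_counts}
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

The denominator P(features) is the same for every label, so comparing the numerators is enough to pick the most likely label.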
One thing to note is that in some applications, these probabilities can be extremely small, resulting in potential numerical underflow. Thus, you may want to add together the logs of the probabilities (rather than multiply the probabilities).
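The underflow issue can be seen in a small Python sketch. The list of 100 probabilities of 1e-5 each is a made-up stand-in for a long review with many rare words.

```python
import math

# Hypothetical: 100 features, each with P(feature|label) = 1e-5.
probs = [1e-5] * 100

# Multiplying directly: the true value (1e-500) is far below the smallest
# positive double, so the running product collapses to zero.
product = 1.0
for p in probs:
    product *= p

# Summing logs instead keeps the score in a representable range.
log_sum = sum(math.log(p) for p in probs)

print(product)
print(log_sum)
```

Because the logarithm is monotonically increasing, the label with the largest log score is also the label with the largest product, so the classification decision does not change.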
I understand that if the features were different attributes, e.g. the probability of a person being male given a height of 170 cm and a weight of 200 pounds, then the probabilities would have to be multiplied together, since these conditions (events) occur together. But in the case of text classification this is not valid, as it really doesn't matter whether the events occur together. E.g. if the probability of a review being positive given the occurrence of the word "best" is 0.1, and the probability of a review being positive given the occurrence of the word "polite" is 0.05, then the probability of the review being positive given the occurrence of both words (best and polite) is not 0.1*0.05. A more indicative number would be the sum of the probabilities (which would need to be normalized).
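One way to read the formula against this example: the factors being multiplied are P(word|label), not P(label|word), and the product is divided by P(features), so a small product does not by itself mean a small posterior. The sketch below reuses 0.1 and 0.05 as P(word|pos). That reinterpretation, the values for the negative label, and the equal priors are all hypothetical and chosen only for illustration.

```python
# Hypothetical priors and class-conditional probabilities P(word|label).
priors = {"pos": 0.5, "neg": 0.5}
likelihoods = {
    "pos": {"best": 0.1, "polite": 0.05},
    "neg": {"best": 0.02, "polite": 0.01},
}
words = ["best", "polite"]

# Numerator of the formula for each label: P(label) * product of P(word|label).
joint = {}
for label in priors:
    value = priors[label]
    for word in words:
        value *= likelihoods[label][word]
    joint[label] = value

# P(features): sum of the numerators over all labels (law of total probability).
evidence = sum(joint.values())

# Normalized posterior P(label|features).
posterior = {label: joint[label] / evidence for label in joint}
print(joint)
print(posterior)
```

Under these made-up numbers the product for the positive label is small in absolute terms, but it is much larger than the product for the negative label, so the normalized posterior for "pos" comes out close to 1.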
