

Query about the Naive Bayes classifier


I am building a text classifier to classify reviews as positive or negative. I have a query about the Naive Bayes classifier formula:
$$P(\text{label} \mid \text{features}) = \frac{P(\text{label}) \, P(f_1 \mid \text{label}) \cdots P(f_n \mid \text{label})}{P(\text{features})}$$
As I understand it, probabilities are multiplied when events occur together, e.g. the probability of A and B occurring together. Is it appropriate to multiply the probabilities in this case? I would appreciate it if someone could explain this formula in a bit more detail. I am trying to do some manual classification to check some algorithm-generated classifications that seem a bit off; this will let me identify the exact reason for each misclassification.
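For manual checking, the formula can be worked by hand. The sketch below uses made-up priors and word likelihoods (the words `best` and `polite` and every number are hypothetical). It multiplies the prior by each P(word|label) to get the numerator for every label, and takes P(features) as the sum of those numerators over all labels, so the results add up to 1.

```python
# Hypothetical numbers for a two-word review; none of these come from a real corpus.
priors = {"positive": 0.5, "negative": 0.5}
likelihoods = {
    "positive": {"best": 0.02, "polite": 0.01},
    "negative": {"best": 0.005, "polite": 0.004},
}

def naive_bayes_posterior(words, priors, likelihoods):
    """Return P(label | words) for each label under the Naive Bayes assumption."""
    # Numerator of the formula: P(label) * P(f1|label) * ... * P(fn|label)
    joint = {}
    for label, prior in priors.items():
        score = prior
        for w in words:
            score *= likelihoods[label][w]
        joint[label] = score
    # Denominator P(features): the sum of the numerators over all labels.
    evidence = sum(joint.values())
    return {label: score / evidence for label, score in joint.items()}

posterior = naive_bayes_posterior(["best", "polite"], priors, likelihoods)
print(posterior)  # positive ≈ 0.909, negative ≈ 0.091
```

Because P(features) is the same for every label, it only rescales the scores. Comparing the numerators alone already gives the same winning label.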
In basic probability terms, to calculate P(label|feature1, feature2), we have to multiply the probabilities to get the probability of feature 1 and feature 2 occurring together. But in this case I am not trying to calculate a standard probability, but rather the strength of positivity/negativity of the text. So if I sum the probabilities, I get a number that can indicate the positivity/negativity quotient. This is a bit unconventional, but do you think it can give good results? The reason I ask is that the sum and the product can be quite different: e.g. 2+2 = 3+1 = 4, but 2*2 = 4 while 3*1 = 3.
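The arithmetic behind the sum-versus-product point can be checked directly. The snippet scores the two pairs from the example with each rule. It only illustrates that the two rules can disagree; it does not show which one classifies better.

```python
import math

def additive_score(values):
    """Score a list of per-feature values by summing them."""
    return sum(values)

def multiplicative_score(values):
    """Score a list of per-feature values by multiplying them (Python 3.8+)."""
    return math.prod(values)

pair_a = [2, 2]
pair_b = [3, 1]
print(additive_score(pair_a), additive_score(pair_b))              # 4 4
print(multiplicative_score(pair_a), multiplicative_score(pair_b))  # 4 3
```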
The class-conditional probabilities P(feature|label) can be multiplied together if the features are conditionally independent given the label. In practice, however, Naive Bayes has been found to produce good results even when the features are not independent. So you can compute the individual class-conditional probabilities P(feature|label) by simple counting and then multiply them together.
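Here is a minimal sketch of "compute by counting, then multiply", using a tiny hypothetical training set. Each P(word|label) is estimated as a word's count within a label divided by the total word count for that label, which is a multinomial-style estimate. It also adds add-one (Laplace) smoothing through a hypothetical `alpha` parameter. The answer does not mention smoothing; it is included so that a word seen under only one label does not force the other label's product to zero.

```python
from collections import Counter

# Tiny hypothetical training set: (list of words, label).
training = [
    (["best", "service", "polite"], "positive"),
    (["best", "food"], "positive"),
    (["rude", "service"], "negative"),
    (["cold", "food", "rude"], "negative"),
]

def estimate(training, alpha=1.0):
    """Estimate P(label) and P(word|label) by counting, with add-alpha smoothing."""
    label_counts = Counter(label for _, label in training)
    word_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for words, label in training:
        word_counts[label].update(words)
        vocab.update(words)
    priors = {label: n / len(training) for label, n in label_counts.items()}
    likelihoods = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        likelihoods[label] = {
            w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab
        }
    return priors, likelihoods

def classify(words, priors, likelihoods):
    """Multiply the prior by each P(word|label) and return the best label."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for w in words:
            if w in likelihoods[label]:  # words never seen in training are skipped
                score *= likelihoods[label][w]
        scores[label] = score
    return max(scores, key=scores.get), scores

priors, likelihoods = estimate(training)
label, scores = classify(["best", "polite"], priors, likelihoods)
print(label)  # positive
```

Printing `scores` next to the counts shows which individual P(word|label) factor drives a given decision. That is the kind of inspection the question is after.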
One thing to note is that in some applications these probabilities can be extremely small, which can cause numerical underflow. In that case you may want to add the logs of the probabilities instead of multiplying the probabilities.
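The underflow problem and the log-space fix look like this on hypothetical numbers: a 500-word document with a per-word probability of 1e-3 under one label and 5e-4 under the other, plus an assumed equal prior. The direct product rounds to 0.0 for both labels, so they can no longer be compared. The sums of logs stay finite, and because log is monotonic, comparing them gives the same ranking the exact products would.

```python
import math

# Hypothetical per-word probabilities for a long review (500 words) under two labels.
pos_probs = [1e-3] * 500
neg_probs = [5e-4] * 500
prior = 0.5  # assumed equal priors

def product_score(prior, probs):
    """Direct product, as in the formula; underflows for long documents."""
    score = prior
    for p in probs:
        score *= p
    return score

def log_score(prior, probs):
    """Sum of logs; log is monotonic, so comparing these compares the products."""
    return math.log(prior) + sum(math.log(p) for p in probs)

print(product_score(prior, pos_probs), product_score(prior, neg_probs))  # 0.0 0.0
print(log_score(prior, pos_probs) > log_score(prior, neg_probs))          # True
```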
I understand that if the features were different, e.g. the probability of a person being male given a height of 170 cm and a weight of 200 pounds, then these probabilities have to be multiplied together because the conditions (events) occur together. But in text classification this is not valid, since it does not really matter whether the events occur together. For example, suppose the probability of a review being positive given the word best is 0.1, and the probability of a review being positive given the word polite is 0.05. Then the probability of the review being positive given both words (best and polite) is not 0.1*0.05. A more indicative number would be the sum of the probabilities (which needs to be normalized).
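For comparison, this sketch shows how the Naive Bayes formula itself would combine the two numbers from that example. It reads them as P(positive|word), assumes only two labels, and uses a hypothetical prior P(positive) = 0.5; none of these assumptions are stated in the thread. Bayes' rule turns each P(positive|word) back into P(word|positive). The P(word) factors are the same for both labels, so they cancel when the scores are normalized over positive and negative.

```python
# Interpreting the example's numbers as P(positive | word); binary labels and
# the prior below are assumptions, not values from the thread.
p_pos_given_best = 0.1
p_pos_given_polite = 0.05
prior_pos = 0.5  # hypothetical P(positive)

def nb_posterior_from_word_posteriors(word_posteriors, prior):
    """Combine per-word P(positive|word) under the Naive Bayes assumption.

    Bayes' rule gives P(word|label) = P(label|word) * P(word) / P(label); the
    P(word) factors are the same for both labels and cancel when normalizing.
    """
    pos = prior
    neg = 1.0 - prior
    for p in word_posteriors:
        pos *= p / prior
        neg *= (1.0 - p) / (1.0 - prior)
    return pos / (pos + neg)

result = nb_posterior_from_word_posteriors([p_pos_given_best, p_pos_given_polite], prior_pos)
print(round(result, 4))  # 0.0058
```

With this prior the result is close to, but not equal to, 0.1 * 0.05 = 0.005. The formula divides by the prior and then renormalizes against the negative label instead of simply multiplying the two numbers.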
