Consider the problem of classifying documents by their content, for example into spam and non-spam e-mails. Imagine that documents are drawn from a number of classes of documents which can be modelled as sets of words, where the (independent) probability that the i-th word of a given document occurs in a document from class C can be written as

p(w_i | C).
(For this treatment, we simplify things further by assuming that the probability of a word in a document is independent of the length of the document, or equivalently that all documents are of the same length.)
Then the probability of a given document D, given a class C, is

p(D | C) = ∏_i p(w_i | C).
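As a concrete illustration of this product, the following Python sketch computes p(D | C) for a toy document. The word-probability table, the example document, and the small floor used for unseen words are invented for illustration only and are not part of the derivation.

    word_prob_given_spam = {"viagra": 0.8, "hello": 0.3, "meeting": 0.1}

    def document_likelihood(words, word_prob):
        # p(D | C) = product over the words of the document of p(w_i | C)
        likelihood = 1.0
        for w in words:
            likelihood *= word_prob.get(w, 1e-6)  # small floor for words not in the table
        return likelihood

    print(document_likelihood(["hello", "viagra"], word_prob_given_spam))  # 0.3 * 0.8 = 0.24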
The question that we desire to answer is: "what is the probability that a given document D belongs to a given class C?" In other words, what is p(C | D)?
Now, by definition (see Probability axiom),

p(D ∩ C) = p(D | C) p(C)
and

p(D ∩ C) = p(C | D) p(D).
Bayes' theorem manipulates these into a statement of the posterior probability in terms of the likelihood:

p(C | D) = p(D | C) p(C) / p(D).
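To make the Bayes step concrete, here is a minimal Python sketch that computes p(C | D) from assumed priors p(C) and likelihoods p(D | C); all numeric values are invented placeholders, and p(D) is recovered by summing p(D | C) p(C) over the classes.

    priors = {"spam": 0.4, "non_spam": 0.6}          # assumed p(C)
    likelihoods = {"spam": 0.24, "non_spam": 0.02}   # assumed p(D | C)

    evidence = sum(likelihoods[c] * priors[c] for c in priors)               # p(D)
    posteriors = {c: likelihoods[c] * priors[c] / evidence for c in priors}  # p(C | D)

    print(posteriors)  # roughly {'spam': 0.89, 'non_spam': 0.11}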
Assume for the moment that there are only two classes, S and ¬S (for example, spam and non-spam), so that every document is in one or the other. Then

p(D | S) = ∏_i p(w_i | S)
and

p(D | ¬S) = ∏_i p(w_i | ¬S).
Using the Bayesian result above, we can write:

p(S | D) = (p(S) / p(D)) ∏_i p(w_i | S)

p(¬S | D) = (p(¬S) / p(D)) ∏_i p(w_i | ¬S)
Dividing one by the other gives:

p(S | D) / p(¬S | D) = (p(S) ∏_i p(w_i | S)) / (p(¬S) ∏_i p(w_i | ¬S))
This can be re-factored as:

p(S | D) / p(¬S | D) = (p(S) / p(¬S)) ∏_i (p(w_i | S) / p(w_i | ¬S))
Thus, the probability ratio p(S | D) / p(¬S | D) can be expressed in terms of a series of likelihood ratios. The actual probability p(S | D) can be easily computed from ln(p(S | D) / p(¬S | D)) based on the observation that p(S | D) + p(¬S | D) = 1.
Taking the logarithm of all these ratios, we have:

ln(p(S | D) / p(¬S | D)) = ln(p(S) / p(¬S)) + Σ_i ln(p(w_i | S) / p(w_i | ¬S))
Such "log-likelihood ratios" are a common technique in statistics. In the case of two mutually exclusive alternatives (such as this example), the conversion of a log-likelihood ratio to a probability takes the form of a sigmoid curve: see logit for details.
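The following Python sketch puts the pieces together: it sums the per-word log-likelihood ratios for a document, adds the log prior ratio, and converts the result to p(S | D) with the logistic function. All probability values are invented placeholders; a real classifier would estimate them from training data.

    import math

    log_prior_ratio = math.log(0.4 / 0.6)    # ln(p(S) / p(¬S)), invented prior
    word_log_ratio = {                        # ln(p(w_i | S) / p(w_i | ¬S)), invented values
        "viagra": math.log(0.8 / 0.01),
        "hello": math.log(0.3 / 0.5),
        "meeting": math.log(0.1 / 0.4),
    }

    def spam_probability(words):
        # ln(p(S | D) / p(¬S | D)) = ln(p(S) / p(¬S)) + sum of per-word log ratios
        log_ratio = log_prior_ratio + sum(word_log_ratio.get(w, 0.0) for w in words)
        # Since p(S | D) + p(¬S | D) = 1, the log ratio maps back to a probability
        # through the logistic (sigmoid) function.
        return 1.0 / (1.0 + math.exp(-log_ratio))

    print(spam_probability(["hello", "viagra"]))   # about 0.97: classified as spam
    print(spam_probability(["hello", "meeting"]))  # about 0.09: classified as non-spam

Working with sums of logarithms rather than products of probabilities also avoids numerical underflow when documents contain many words.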
In real life, the naive Bayes approach is more powerful than might be expected from the extreme simplicity of its model; in particular, it is fairly robust in the presence of non-independent attributes w_i. Recent theoretical analysis has shown why the naive Bayes classifier is so robust.