Naive Bayesian classification

Here is a worked example of naive Bayesian classification, which is an application of Bayesian inference to the document classification problem.

Consider the problem of classifying documents by their content, for example into spam and non-spam e-mails. Imagine that documents are drawn from a number of classes, each of which can be modelled as a set of words, and that the (assumed independent) probability that the i-th word of a given document occurs in a document from class C can be written as

<math>p(w_i \vert C)</math>

(For this treatment, we simplify things further by assuming that the probability of a word occurring in a document is independent of the document's length, or equivalently that all documents are of the same length.)

Then the probability of a given document D, given a class C, is

<math>p(D\vert C)=\prod_i p(w_i \vert C)</math>
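
As a rough illustration of how these quantities might be estimated in practice, the following Python sketch builds the per-word probabilities p(w_i | C) from a tiny hand-made training set (the documents and class names are purely illustrative, not taken from any real corpus) and multiplies them together to obtain p(D | C); add-one (Laplace) smoothing is added here so that words unseen in a class do not force the product to zero:

    from collections import Counter

    # Illustrative training documents: a list of words plus a class label.
    training = [
        (["cheap", "pills", "buy"], "spam"),
        (["meeting", "agenda", "monday"], "ham"),
        (["buy", "now", "cheap"], "spam"),
    ]

    # Word counts per class and the overall vocabulary.
    counts = {"spam": Counter(), "ham": Counter()}
    for words, label in training:
        counts[label].update(words)
    vocab = {w for words, _ in training for w in words}

    def p_word_given_class(word, label):
        # Add-one (Laplace) smoothed estimate of p(w | C).
        return (counts[label][word] + 1) / (sum(counts[label].values()) + len(vocab))

    def p_doc_given_class(words, label):
        # Naive independence assumption: p(D | C) is the product of p(w | C).
        p = 1.0
        for w in words:
            p *= p_word_given_class(w, label)
        return p

    print(p_doc_given_class(["cheap", "buy"], "spam"))  # about 0.053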

The question that we desire to answer is: "what is the probability that a given document D belongs to a given class C?"

Now, by the definition of conditional probability (see Probability axiom),

<math>p(D\vert C)={p(D\cap C)\over p(C)}</math>

and

<math>p(C\vert D)={p(D\cap C)\over p(D)}</math>

Bayes' theorem manipulates these into a statement of the posterior probability in terms of the likelihood:

<math>p(C\vert D)={p(C)\over p(D)}\,p(D\vert C)</math>
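
Spelling the step out: both definitions involve the same joint probability p(D ∩ C), so

<math>p(C\vert D)\,p(D)=p(D\cap C)=p(D\vert C)\,p(C)</math>

and dividing both sides by p(D) gives the statement above.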

Assume for the moment that there are only two mutually exclusive classes, S and ¬S (for example, spam and non-spam), so that

<math>p(D\vert S)=\prod_i p(w_i \vert S)</math>

and

<math>p(D\vert\neg S)=\prod_i p(w_i\vert\neg S)</math>

Using the Bayesian result above, we can write:

<math>p(S\vert D)={p(S)\over p(D)}\,\prod_i p(w_i \vert S)</math>

<math>p(\neg S\vert D)={p(\neg S)\over p(D)}\,\prod_i p(w_i \vert\neg S)</math>

Dividing one by the other (the factor p(D), which we do not need to know, cancels) gives:

<math>{p(S\vert D)\over p(\neg S\vert D)}={p(S)\,\prod_i p(w_i \vert S)\over p(\neg S)\,\prod_i p(w_i \vert\neg S)}</math>

This can be re-factored as:

<math>{p(S)\over p(\neg S)}\,\prod_i {p(w_i \vert S)\over p(w_i \vert\neg S)}</math>

Thus, the probability ratio p(S | D) / p(¬S | D) can be expressed as a product of likelihood ratios. The actual probability p(S | D) can easily be computed from ln(p(S | D) / p(¬S | D)), based on the observation that p(S | D) + p(¬S | D) = 1.
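
Concretely, writing r for the ratio p(S | D) / p(¬S | D), that observation gives

<math>p(S\vert D)={r\over 1+r}={1\over 1+e^{-\ln r}}</math>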

Taking the logarithm of all these ratios, we have:

<math>\ln{p(S\vert D)\over p(\neg S\vert D)}=\ln{p(S)\over p(\neg S)}+\sum_i \ln{p(w_i\vert S)\over p(w_i\vert\neg S)}</math>

Such log-likelihood ratios are a common tool in statistics. In the case of two mutually exclusive alternatives (such as this example), the conversion of a log-likelihood ratio to a probability takes the form of a sigmoid curve: see logit for details.
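
As a hedged end-to-end sketch, the Python fragment below (the word likelihoods p(w | S), p(w | ¬S) and the prior p(S) are made-up numbers, not estimates from real data) sums the log-likelihood ratios of the words in a document and maps the resulting log-odds back to p(S | D) with the sigmoid function:

    import math

    # Made-up word likelihoods (p(w | S), p(w | not S)) and a made-up prior p(S).
    likelihood = {
        "viagra": (0.02, 0.0001),
        "meeting": (0.001, 0.01),
        "free": (0.015, 0.003),
    }
    p_spam = 0.4

    def spam_probability(words):
        # Start from the log prior odds ln(p(S) / p(not S)).
        log_odds = math.log(p_spam / (1.0 - p_spam))
        # Add one log-likelihood ratio per word; words not in the table are skipped.
        for w in words:
            if w in likelihood:
                p_w_s, p_w_ns = likelihood[w]
                log_odds += math.log(p_w_s / p_w_ns)
        # Because p(S | D) + p(not S | D) = 1, the log-odds convert to a
        # probability through the sigmoid (inverse logit) function.
        return 1.0 / (1.0 + math.exp(-log_odds))

    print(spam_probability(["free", "viagra"]))  # close to 1
    print(spam_probability(["meeting"]))         # close to 0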

In real life, the naive Bayes approach is more powerful than might be expected from the extreme simplicity of its model; in particular, it is fairly robust in the presence of non-independent attributes w_i. Recent theoretical analysis has shown why the naive Bayes classifier is so robust.



