The five most common words appearing in spam emails are shipping!, today!, here!, available, and fingertips!. Many spam filters separate spam from ham (email not considered to be spam) through application of Bayes' theorem. Suppose that for one email account, in every messages is spam and the proportions of spam messages that have the five most common words in spam email are given below.
shipping! | 0.050 |
today! | 0.047 |
here! | 0.034 |
available | 0.016 |
fingertips! | 0.016 |
Also suppose that the proportions of ham messages that have these words are
shipping! | 0.0016 |
today! | 0.0021 |
here! | 0.0021 |
available | 0.0041 |
fingertips! | 0.0010 |
Round your answers to three decimal places.
a. If a message includes the word shipping!, what is the probability the message is spam?
If a message includes the word shipping!, what is the probability the message is ham?
Should messages that include the word shipping! be flagged as spam?
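The posterior probabilities asked for above follow directly from Bayes' theorem, and a minimal Python sketch may make the arithmetic easier to check. The function name `posterior_spam` and the constant `ASSUMED_PRIOR_SPAM` are hypothetical names introduced here. The prior value is a placeholder assumption only, because the spam rate in the problem statement is incomplete; substitute the actual rate before using the result.

```python
def posterior_spam(prior_spam, p_word_given_spam, p_word_given_ham):
    """Return P(spam | word) using Bayes' theorem."""
    prior_ham = 1.0 - prior_spam
    # Joint probability of "spam and contains the word"
    numerator = p_word_given_spam * prior_spam
    # Total probability that a message contains the word
    evidence = numerator + p_word_given_ham * prior_ham
    return numerator / evidence


# ASSUMPTION: placeholder prior, NOT taken from the problem statement.
ASSUMED_PRIOR_SPAM = 0.5

# Proportions for "shipping!" from the two tables above
p_spam = posterior_spam(ASSUMED_PRIOR_SPAM, 0.050, 0.0016)
p_ham = 1.0 - p_spam  # a message is either spam or ham

print(f"P(spam | shipping!) = {p_spam:.3f}")
print(f"P(ham  | shipping!) = {p_ham:.3f}")
```

The same function can be reused for the other four words by swapping in their proportions.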
b. If a message includes the word today!, what is the probability the message is spam?
If a message includes the word here!, what is the probability the message is spam?
Which of these two words is a stronger indicator that a message is spam?
Why?
Because the probability is
c. If a message includes the word available, what is the probability the message is spam?
If a message includes the word fingertips!, what is the probability the message is spam?
Which of these two words is a stronger indicator that a message is spam?
Why?
Because the probability is
d. What insights do the results of parts (b) and (c) yield about what enables a spam filter that uses Bayes' theorem to work effectively?
Explain.
It is easier to distinguish spam from ham when a word occurs more often in spam and less often in ham.
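One way to see this is to compare, for each word, the ratio of its proportion in spam to its proportion in ham. For any fixed spam rate, the Bayes posterior P(spam | word) increases with this ratio, so the ranking below does not depend on the prior. The sketch is illustrative only; the dictionary names `spam_rate`, `ham_rate`, and `ratios` are hypothetical, and the values are copied from the two tables in the problem.

```python
# Proportion of spam messages containing each word (from the problem)
spam_rate = {
    "shipping!": 0.050,
    "today!": 0.047,
    "here!": 0.034,
    "available": 0.016,
    "fingertips!": 0.016,
}

# Proportion of ham messages containing each word (from the problem)
ham_rate = {
    "shipping!": 0.0016,
    "today!": 0.0021,
    "here!": 0.0021,
    "available": 0.0041,
    "fingertips!": 0.0010,
}

# Likelihood ratio: how many times more often the word appears in spam
ratios = {word: spam_rate[word] / ham_rate[word] for word in spam_rate}

# Strongest spam indicator first
ranked = sorted(ratios, key=ratios.get, reverse=True)
for word in ranked:
    print(f"{word:12s} {ratios[word]:6.2f}")
```

Note that available and fingertips! appear in the same share of spam (0.016), yet fingertips! has a much larger ratio because it is rarer in ham.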