St Louis

lol that is an epic reply.

Detroit, Michigan

Last edited by snesei (Mar 5, 2015 9:39 pm)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Saturday, 8 January 2005 22:55, Fajar Priyanto wrote:
> On Sunday 09 January 2005 04:47 am, Matt Kettler wrote:
[..]

> > Train spam as spam, train ham as ham. Let the statistics deal with the
> > overlap. By trying to avoid training "spamish" ham or "hamish" spam
> > you're just doing your training a big disservice by making it
> > unrealistic.
>
> Thanks Matt,
> So, statistically speaking, does that mean I have to train SA on as much
> 'ham' as 'spam'? Right now, I train SA mostly on spam.

You must train both ham and spam. How should the Bayes filter know what ham
is if you haven't trained it on any?

As far as I understand it, the Bayes filter searches for tokens in the email.
If a token was found in 30 spam and 10 ham mails, then the probability of
being spam is 75%. But if you only train spam, the Bayes filter would say: I
have learned 30 spam mails but no ham, so the probability of being spam is
100%.
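The simple count ratio described above can be sketched in a few lines of Python. This only illustrates the arithmetic in this message, assuming the probability is spam hits divided by total hits; `token_spam_probability` is a hypothetical helper, not SpamAssassin's actual Bayes code.

```python
def token_spam_probability(spam_hits: int, ham_hits: int) -> float:
    """Share of trained messages containing the token that were spam.

    Hypothetical helper illustrating the simple count ratio only;
    real Bayes filters use more elaborate estimates.
    """
    total = spam_hits + ham_hits
    if total == 0:
        raise ValueError("token was never seen in training")
    return spam_hits / total


# Token seen in 30 spam and 10 ham mails.
print(token_spam_probability(30, 10))  # 0.75
# Only spam was trained: the same token now looks like pure spam.
print(token_spam_probability(30, 0))   # 1.0
```

With no ham trained, every token the filter knows scores 1.0, which is exactly why spam-only training drives up false positives.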

(The Bayes calculation is done using a number of the message's ham/spam
tokens; I don't know how many tokens are taken into account.)
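How per-token probabilities get combined into one score is left open here. As one well-known illustration (not necessarily what SpamAssassin does), Paul Graham's product rule combines only the n tokens whose probabilities are farthest from a neutral 0.5. The function name `combine`, the default n=15 and the 0.01/0.99 clamp are assumptions for this sketch.

```python
import math


def combine(token_probs: list[float], n: int = 15) -> float:
    """Combine per-token spam probabilities with Graham's product rule.

    n=15 and the clamp bounds below are hypothetical defaults for this
    sketch, not SpamAssassin settings.
    """
    # Clamp so a single 0.0 or 1.0 token cannot force a 0/0 division.
    clamped = [min(max(p, 0.01), 0.99) for p in token_probs]
    # Keep the n tokens farthest from 0.5, i.e. the strongest signals.
    strongest = sorted(clamped, key=lambda p: abs(p - 0.5), reverse=True)[:n]
    spam = math.prod(strongest)
    ham = math.prod(1 - p for p in strongest)
    return spam / (spam + ham)


print(combine([0.75, 0.75]))  # two spammy tokens reinforce each other (about 0.9)
print(combine([0.75, 0.25]))  # opposing tokens cancel out (0.5)
```

The choice of n is exactly the open question in the message: a small n lets a few strong tokens decide, a large n dilutes them with weaker ones.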

If you only or mostly train spam, this will poison your database and the
number of false positives will grow. To keep false positives low, you should
teach SA all of your ham.

It's unlikely that you will train as much ham as spam, because there is more
spam. But that in itself does no harm. The Bayesian filter works on the tokens
it finds. Let's assume you have taught it 200 spam and 100 ham, and that 100
spam and 100 ham contained the token x. If x is found in a new message, then
the spam probability is 50%, even though x appeared in 100% of the ham
messages.

If you teach only half of the ham messages, the spam-to-ham count would be
100 to 50, which gives a probability of about 67% for being spam.
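The worked example above can be checked in Python. `raw_ratio` reproduces the message's count-based arithmetic. For comparison, `normalized_ratio` first divides each count by the number of messages trained in that class, the per-class frequency idea behind Paul Graham's and Gary Robinson's spam filters (simplified here, without their extra ham weighting or smoothing). Both function names are hypothetical, and neither is claimed to be SpamAssassin's exact formula.

```python
def raw_ratio(spam_hits: int, ham_hits: int) -> float:
    """Spam share of the raw token counts, as in the worked example."""
    return spam_hits / (spam_hits + ham_hits)


def normalized_ratio(spam_hits: int, n_spam: int, ham_hits: int, n_ham: int) -> float:
    """Spam share after turning counts into per-class frequencies."""
    spam_freq = spam_hits / n_spam  # fraction of trained spam containing x
    ham_freq = ham_hits / n_ham     # fraction of trained ham containing x
    return spam_freq / (spam_freq + ham_freq)


# 200 spam and 100 ham trained; token x in 100 spam and all 100 ham.
print(raw_ratio(100, 100))                   # 0.5
print(normalized_ratio(100, 200, 100, 100))  # about 0.333
# Only half the ham trained (50 messages, all containing x).
print(raw_ratio(100, 50))                    # about 0.667
print(normalized_ratio(100, 200, 50, 50))    # about 0.333, unchanged
```

Under raw counts, undertraining ham shifts x toward spam; under per-class frequencies, the estimate stays the same as long as the ham you do train is representative. Either way, the filter needs some ham to have anything to compare against.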


Regards

Thomas

- --
icq:133073900
http://www.t-arend.de
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.0 (GNU/Linux)

iD8DBQFB4RLeHe2ZLU3NgHsRAjgSAKCHYwQWLMJExHdtrgb0OLXHHy00XwCeKIyw
Y7oZeRBZ22sOlpZFmc5Ln7M=
=i9Cw
-----END PGP SIGNATURE-----

Atlanta, GA

10/10

Minneapolis, MN

SELECT lol_wut,
       RX8.idk
FROM   chipmusicdotorg.RUBIXCUBE8 RX8
JOIN   chipmusicdotorg.why        why
  ON   why.idk = RX8.idk;