"Roaring Penguin's CanIt® software solutions will stop spam before it reaches your mail server. CanIt products provide per user Bayesian analysis including access to the RPTN Bayesian training database."
Source: Roaring Penguin Software

The Roaring Penguin Training Network

Abstract

This white paper describes the Roaring Penguin Training Network (RPTN), a mechanism for sharing Bayes votes among multiple CanIt installations.

1 Introduction

A recent paper at the 2004 USENIX Large Installation System Administration conference by Jeremy Blosser and David Josephsen presented evidence that Bayesian filtering can be quite effective with a shared Bayes database, even across hundreds or thousands of different users. This work led to the Roaring Penguin Training Network (RPTN), a mechanism for sharing Bayes votes among different CanIt customers.

The theory of RPTN is as follows:

  1. Each CanIt installation whose administrator volunteers to submit data to RPTN keeps track of messages that are hand-trained.
  2. Periodically, the CanIt installation contacts the central RPTN server and authenticates itself for the purpose of RPTN data upload.
  3. The central RPTN server grants permission to the CanIt installation to upload a certain number of spam and non-spam signatures.
  4. The CanIt installation randomly samples the hand-trained mail and sends up to the requested number of signatures to the RPTN server. This is called an RPTN Report.
  5. Periodically, the RPTN server runs quality-assurance checks over the submitted signature sets, and aggregates those that pass the QA stage.
  6. The aggregated data is made available to CanIt installations. Periodically, a CanIt installation can download the aggregated Bayes statistics. Administrators (in the case of CanIt-PRO, stream owners) can choose to inherit from the RPTN Bayes data. In this case, the RPTN Bayes statistics are added to the individual CanIt installation (or stream) statistics, and the combined statistics are used for Bayesian analysis.
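
Item 6 amounts to adding counts together. The sketch below is a minimal illustration. It assumes a hypothetical BayesStats record holding per-token spam and non-spam counts plus message totals. It also uses a common frequency-normalized token probability; the paper does not say how CanIt scores tokens internally.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BayesStats:
    """Hypothetical per-stream Bayes statistics."""
    spam_tokens: Counter = field(default_factory=Counter)  # token -> spam messages containing it
    ham_tokens: Counter = field(default_factory=Counter)   # token -> non-spam messages containing it
    spam_msgs: int = 0
    ham_msgs: int = 0


def inherit(local: BayesStats, rptn: BayesStats) -> BayesStats:
    """Combine a stream's own statistics with the RPTN statistics by adding the counts."""
    return BayesStats(
        spam_tokens=local.spam_tokens + rptn.spam_tokens,
        ham_tokens=local.ham_tokens + rptn.ham_tokens,
        spam_msgs=local.spam_msgs + rptn.spam_msgs,
        ham_msgs=local.ham_msgs + rptn.ham_msgs,
    )


def token_spam_probability(stats: BayesStats, token: str) -> Optional[float]:
    """Frequency-normalized spam probability for one token; None if the token was never seen."""
    p_spam = stats.spam_tokens[token] / stats.spam_msgs if stats.spam_msgs else 0.0
    p_ham = stats.ham_tokens[token] / stats.ham_msgs if stats.ham_msgs else 0.0
    if p_spam + p_ham == 0:
        return None
    return p_spam / (p_spam + p_ham)


local = BayesStats(Counter({"muffins": 1}), Counter({"muffins": 4}), spam_msgs=10, ham_msgs=10)
rptn = BayesStats(Counter({"muffins": 30}), Counter({"muffins": 2}), spam_msgs=100, ham_msgs=100)
combined = inherit(local, rptn)
print(token_spam_probability(local, "muffins"))     # 0.2
print(token_spam_probability(combined, "muffins"))  # ~0.838
```

The numbers here are made up. With the local data alone, "muffins" looks like non-spam (0.2). Once the RPTN counts are inherited, it looks like spam (31/37, about 0.84). That is the effect inheritance should have when local training is sparse.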

2 Submitting to RPTN

By default, CanIt will not submit information to RPTN. There may be some privacy concerns, because tokens from e-mail messages are submitted to Roaring Penguin's server, and they may appear in the aggregated statistics.

For example, consider the following e-mail message:

Subject: Meeting tomorrow

Hi, everyone.

We'll meet tomorrow in the boardroom to discuss
RPTN. Please bring the notes from the design
meeting last week; I'll supply coffee and muffins.

Regards,

David.

The message would result in the following signature being submitted to RPTN:

David:1;I'll:1;I'll+supply:1;Please:1;Please+bring:1;RPTN:1;RPTN+Please:1;Regards:1;Regards+David:1;We'll:1;We'll+meet:1;boardroom:1;boardroom+discuss:1;bring:1;bring+notes:1;coffee:1;coffee+muffins:1;design:1;design+meeting:1;discuss:1;discuss+RPTN:1;everyone:1;everyone+We'll:1;last:1;last+week:1;meet:1;meet+tomorrow:1;meeting:1;meeting+last:1;muffins:1;muffins+Regards:1;notes:1;notes+design:1;s*Meeting:1;s*Meeting+tomorrow:1;s*tomorrow:1;supply:1;supply+coffee:1;tomorrow:1;tomorrow+boardroom:1;week:1;week+I'll:1
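
The paper does not spell out how a message becomes this signature, but the example is consistent with a few simple rules:

  • words are kept, including apostrophes;
  • a handful of very common words are dropped;
  • adjacent pairs of the remaining words are joined with "+";
  • subject-line tokens are prefixed with "s*";
  • the result is sorted.

The sketch below reproduces the signature above under those assumptions. The STOPWORDS set is hypothetical and was picked only to match this one example. Counter counts occurrences, but every feature in this message occurs once, so the example cannot show whether CanIt records occurrences or mere presence.

```python
import re
from collections import Counter

# Hypothetical stop list, chosen only so this sketch reproduces the example;
# CanIt's real tokenizer rules are not described in this paper.
STOPWORDS = {"hi", "in", "the", "to", "from", "and"}
WORD_RE = re.compile(r"[A-Za-z0-9']+")  # keeps contractions such as "We'll"


def words(text: str) -> list[str]:
    """Split text into words, dropping stop words (case-insensitively)."""
    return [w for w in WORD_RE.findall(text) if w.lower() not in STOPWORDS]


def features(ws: list[str], prefix: str = "") -> list[str]:
    """Single words plus adjacent word pairs joined with '+', all with an optional prefix."""
    singles = [prefix + w for w in ws]
    pairs = [f"{prefix}{a}+{b}" for a, b in zip(ws, ws[1:])]
    return singles + pairs


def signature(subject: str, body: str) -> str:
    """Build a 'token:count;...' signature, sorted by token (code-point order)."""
    counts = Counter(features(words(subject), prefix="s*") + features(words(body)))
    return ";".join(f"{tok}:{n}" for tok, n in sorted(counts.items()))


BODY = """Hi, everyone.

We'll meet tomorrow in the boardroom to discuss
RPTN. Please bring the notes from the design
meeting last week; I'll supply coffee and muffins.

Regards,

David."""

sig = signature("Meeting tomorrow", BODY)
print(sig)
```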

As you can see, the words and word pairs from the original message are visible. Although reconstructing the original message from a signature is somewhat difficult, the information a signature reveals may still be a privacy concern. If you feel that submitting signatures to RPTN poses a privacy risk, please do not enable RPTN submission.

Naturally, the more people who submit to RPTN, the better; if you do not submit data, your "votes" don't count. It is up to each CanIt administrator to weigh the privacy risks before deciding to enable RPTN submissions. Roaring Penguin will guard the signatures very carefully, but our personnel will have access to the cleartext signatures. Note that if GNU Privacy Guard (GPG) is installed, RPTN reports are encrypted with our RPTN public key before being e-mailed. This makes it practically impossible for an eavesdropper to obtain the signatures.

The privacy risk from the aggregated statistics is much lower, because the aggregated statistics consist of tokens from thousands of signatures from many different submitters, and there is no indication of which installation submitted which tokens.

3 Authentication

Before a CanIt installation is allowed to submit data to RPTN, it must authenticate itself. The installation connects via HTTPS to the RPTN server and logs in using your Roaring Penguin download username and password. If authentication is successful, the RPTN server returns four values to the CanIt installation:

  1. Ns, the maximum number of "spam" votes the server will accept in the next RPTN submission.
  2. Nn, the maximum number of "non-spam" votes the server will accept in the next RPTN submission.
  3. C, a 160-bit random cookie that identifies the RPTN submission.
  4. S, a 160-bit secret used to authenticate the RPTN submission.
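
Here is a minimal sketch of the values involved. It uses hypothetical names (RptnGrant, issue_grant, sample_votes) and illustrative quota values, and it does not show the HTTPS login. The two 160-bit values come from Python's secrets module. The client performs the random sampling from step 4 of the Introduction, capped at Ns spam and Nn non-spam signatures.

```python
import random
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class RptnGrant:
    """Hypothetical record of what the RPTN server returns after a successful login."""
    ns: int        # Ns: max spam votes accepted in the next submission
    nn: int        # Nn: max non-spam votes accepted in the next submission
    cookie: bytes  # C: 160-bit random cookie identifying the submission
    secret: bytes  # S: 160-bit secret used to authenticate the submission


def issue_grant(ns: int, nn: int) -> RptnGrant:
    """Server side: 160 bits = 20 bytes from a cryptographically secure RNG."""
    return RptnGrant(ns, nn, secrets.token_bytes(20), secrets.token_bytes(20))


def sample_votes(spam_sigs: list[str], ham_sigs: list[str],
                 grant: RptnGrant, rng: random.Random) -> tuple[list[str], list[str]]:
    """Client side: randomly choose up to Ns spam and Nn non-spam hand-trained signatures."""
    spam = rng.sample(spam_sigs, min(grant.ns, len(spam_sigs)))
    ham = rng.sample(ham_sigs, min(grant.nn, len(ham_sigs)))
    return spam, ham


grant = issue_grant(ns=2, nn=3)
spam_pick, ham_pick = sample_votes(["s1", "s2", "s3", "s4"], ["h1", "h2"], grant, random.Random(0))
print(len(spam_pick), len(ham_pick))  # 2 2: only two non-spam signatures were available
```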
 

4 Submission

RPTN reports are submitted back to RPTN via e-mail. E-mail was chosen for the following reasons:

  • CanIt installations must already be able to send out e-mail, so submitting via e-mail does not require any firewall changes.
  • E-mail is robust in the face of server failures; if the RPTN server is temporarily unavailable, the RPTN reports will queue until it comes back up.
  • GPG is a free, robust and well-known mechanism for encrypting e-mail, and shipping our public key with CanIt makes it simple to securely encrypt RPTN reports.

A CanIt installation constructs an RPTN report as follows:

  1. It places the cookie C in the Subject: header.
  2. It gathers all the signatures for submission and compresses them with bzip2 to yield data D.
  3. If GNU Privacy Guard is installed, it encrypts D with our RPTN public key to yield ciphertext E.
  4. It calculates the SHA1 hash of S (the 160-bit secret) concatenated with E (or D if GPG is not installed). This result is placed in an X-Message-Authenticator: header. The authenticator is designed to stop anyone from tampering with the report while it travels over SMTP to the RPTN server.
  5. The data E (or D if GPG is not installed) is Base-64 encoded and placed in the message body.
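
The construction steps can be sketched with Python's standard library. GPG is not part of the standard library, so encryption is a pluggable hook (encrypt) that is left unset here. The paper does not specify the following details, so each is an assumption:

  • C and the authenticator are hex-encoded;
  • signatures are separated by newlines before compression;
  • the destination address is a placeholder.

```python
import base64
import bz2
import hashlib
from email.message import EmailMessage
from typing import Callable, Optional


def build_report(cookie: bytes, secret: bytes, signatures: list[str], to_addr: str,
                 encrypt: Optional[Callable[[bytes], bytes]] = None) -> EmailMessage:
    """Assemble an RPTN report e-mail from the cookie C, secret S and sampled signatures."""
    data = bz2.compress("\n".join(signatures).encode("utf-8"))  # step 2: D
    payload = encrypt(data) if encrypt else data                 # step 3: E, or D without GPG
    authenticator = hashlib.sha1(secret + payload).hexdigest()   # step 4: SHA1(S || E)

    msg = EmailMessage()
    msg["To"] = to_addr
    msg["Subject"] = cookie.hex()                                 # step 1: cookie C
    msg["X-Message-Authenticator"] = authenticator
    msg.set_content(base64.encodebytes(payload).decode("ascii"))  # step 5: Base-64 body
    return msg


COOKIE, SECRET = bytes(range(20)), bytes(range(20, 40))  # stand-ins for C and S
report = build_report(COOKIE, SECRET, ["sig-one", "sig-two"], "rptn-reports@example.com")
print(report["Subject"], report["X-Message-Authenticator"])
```

A secret-prefix hash such as SHA1(S || E) is open in principle to length-extension attacks. A present-day design would more likely use HMAC (for example, hmac.new(S, E, hashlib.sha256)). The sketch keeps the construction this paper describes.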

The RPTN report is then e-mailed to a special address that accepts RPTN report submissions.

Before accepting an RPTN report, the RPTN server takes the following actions:

  1. It looks up the cookie C from the Subject: header. If no such cookie is found, or if a report using the same cookie has already been submitted, the report is rejected.
  2. It retrieves the secret S associated with C and validates the message hash. If the message cannot be validated, it is rejected.
  3. If the signatures have been encrypted, they are decrypted using our RPTN private key.
  4. The server notes that cookie C can no longer be reused and places the report in an incoming spool area. It also stores the login name of the user who submitted the report (obtained by looking up C in the authentication database).
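
The server-side checks can be sketched the same way. RptnInbox and its methods are hypothetical. The cookie table stands in for the authentication database, and decryption is a hook. Both C and the authenticator are assumed to arrive hex-encoded. The hash comparison uses hmac.compare_digest so that it does not leak timing information.

```python
import base64
import bz2
import hashlib
import hmac
from typing import Callable, Optional


class RptnInbox:
    """Hypothetical server-side state for accepting RPTN reports."""

    def __init__(self) -> None:
        self.sessions: dict[str, tuple[bytes, str]] = {}  # cookie (hex) -> (secret S, login name)
        self.used: set[str] = set()                        # cookies that may not be reused
        self.spool: list[tuple[str, list[str]]] = []       # accepted (login, signatures)

    def register(self, cookie: bytes, secret: bytes, login: str) -> None:
        """Record C and S when they are issued at login time."""
        self.sessions[cookie.hex()] = (secret, login)

    def accept(self, subject: str, authenticator: str, body: str,
               decrypt: Optional[Callable[[bytes], bytes]] = None) -> bool:
        cookie = subject.strip()
        # Step 1: unknown or already-used cookies are rejected.
        if cookie not in self.sessions or cookie in self.used:
            return False
        secret, login = self.sessions[cookie]
        payload = base64.b64decode(body)
        # Step 2: recompute SHA1(S || payload) and compare in constant time.
        expected = hashlib.sha1(secret + payload).hexdigest().encode("ascii")
        if not hmac.compare_digest(expected, authenticator.strip().lower().encode("utf-8")):
            return False
        # Step 3: decrypt if the report was GPG-encrypted (hook not implemented here).
        data = decrypt(payload) if decrypt else payload
        signatures = bz2.decompress(data).decode("utf-8").split("\n")
        # Step 4: burn the cookie and spool the report with the submitter's login name.
        self.used.add(cookie)
        self.spool.append((login, signatures))
        return True


inbox = RptnInbox()
C, S = bytes(range(20)), bytes(range(20, 40))
inbox.register(C, S, "example-user")
payload = bz2.compress(b"sig-one\nsig-two")
body = base64.encodebytes(payload).decode("ascii")
good_auth = hashlib.sha1(S + payload).hexdigest()

first = inbox.accept(C.hex(), good_auth, body)   # accepted
replay = inbox.accept(C.hex(), good_auth, body)  # rejected: cookie already used

C2 = bytes(20)
inbox.register(C2, S, "example-user")
tampered = inbox.accept(C2.hex(), "0" * 40, body)  # rejected: authenticator does not match
```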
 

5 Quality Assurance

Because RPTN distributes Bayes data to many customers, we need to ensure that it's difficult for bad data to contaminate the aggregated statistics. Quality assurance is done using two methods: static analysis and dynamic analysis.

5.1 Static Analysis

In the static analysis phase, we extract the list of signatures submitted by each user and use them to run Bayesian analysis against a corpus of spam and non-spam messages. Signature sets that have an unacceptably high error rate are discarded.
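
Here is a minimal sketch of the static check. It assumes a signature set is a list of (is_spam, tokens) votes and that the reference corpus has the same shape. The paper names neither the classifier nor the error threshold. The simplified naive Bayes scorer below is Laplace-smoothed and looks only at tokens present in the message; both it and MAX_ERROR are illustrative choices.

```python
import math
from collections import Counter

Vote = tuple[bool, set[str]]  # (is_spam, tokens); same shape for submitted votes and corpus

MAX_ERROR = 0.10  # hypothetical cut-off; the paper does not give a number


def train(votes: list[Vote]) -> tuple[dict[bool, Counter], Counter]:
    """Per-class document frequency of each token, plus per-class message totals."""
    counts: dict[bool, Counter] = {True: Counter(), False: Counter()}
    totals: Counter = Counter()
    for is_spam, toks in votes:
        counts[is_spam].update(toks)
        totals[is_spam] += 1
    return counts, totals


def classify(model: tuple[dict[bool, Counter], Counter], toks: set[str]) -> bool:
    """Simplified naive Bayes over the tokens present, with Laplace smoothing."""
    counts, totals = model
    n = totals[True] + totals[False]
    score = {}
    for label in (True, False):
        s = math.log((totals[label] + 1) / (n + 2))
        for t in toks:
            s += math.log((counts[label][t] + 1) / (totals[label] + 2))
        score[label] = s
    return score[True] > score[False]


def error_rate(votes: list[Vote], corpus: list[Vote]) -> float:
    """Fraction of reference messages misclassified by a model trained on one signature set."""
    model = train(votes)
    wrong = sum(classify(model, toks) != label for label, toks in corpus)
    return wrong / len(corpus)


def passes_static_qa(votes: list[Vote], corpus: list[Vote]) -> bool:
    return error_rate(votes, corpus) <= MAX_ERROR


corpus = [(True, {"cheap", "pills"}), (False, {"meeting", "coffee"})]
good = [(True, {"cheap", "pills"}), (True, {"cheap", "winner"}),
        (False, {"meeting", "notes"}), (False, {"coffee", "meeting"})]
bad = [(not label, toks) for label, toks in good]  # every vote mislabeled
print(passes_static_qa(good, corpus), passes_static_qa(bad, corpus))  # True False
```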

5.2 Dynamic Analysis

In dynamic analysis, we compare each signature set against all the other signature sets. If a signature from customer i is marked as spam, but the data from the other N-1 customers would rate it as non-spam, we count a false positive. Similarly, if a signature from customer i is marked as non-spam, but the data from the other customers would rate it as spam, we count a false negative. Signature sets with an unacceptably high false-positive or false-negative rate are discarded. When deciding whether to drop a signature set, we weight false positives more heavily than false negatives.
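
The dynamic check is a leave-one-out comparison. In the sketch below, each customer's votes are scored against the pooled votes of the other N-1 customers. The scorer (mean per-token spam fraction), the 3:1 false-positive weighting and the threshold are hypothetical. The paper says only that false positives weigh more than false negatives. Votes whose tokens no other customer has seen are skipped rather than counted as errors.

```python
from collections import Counter
from typing import Optional

Vote = tuple[bool, set[str]]  # (marked as spam?, tokens in the signature)

FP_WEIGHT, FN_WEIGHT = 3.0, 1.0  # hypothetical: false positives count three times as much
MAX_WEIGHTED_ERROR = 0.2         # hypothetical cut-off


def token_counts(votes: list[Vote]) -> tuple[Counter, Counter]:
    """Count, per token, how many spam and non-spam votes contain it."""
    spam, ham = Counter(), Counter()
    for is_spam, toks in votes:
        (spam if is_spam else ham).update(toks)
    return spam, ham


def rate_as_spam(spam: Counter, ham: Counter, toks: set[str]) -> Optional[bool]:
    """Mean per-token spam fraction > 0.5 means spam; None if no token has been seen."""
    probs = [spam[t] / (spam[t] + ham[t]) for t in toks if spam[t] + ham[t] > 0]
    if not probs:
        return None
    return sum(probs) / len(probs) > 0.5


def dynamic_qa(sets: dict[str, list[Vote]]) -> dict[str, bool]:
    """Return, per customer, whether their signature set is kept."""
    keep = {}
    for i, votes in sets.items():
        others = [v for j, vs in sets.items() if j != i for v in vs]  # the other N-1 customers
        spam, ham = token_counts(others)
        fp = fn = judged = 0
        for is_spam, toks in votes:
            verdict = rate_as_spam(spam, ham, toks)
            if verdict is None:
                continue  # the others have no opinion on this vote
            judged += 1
            if is_spam and not verdict:
                fp += 1   # i says spam, the others say non-spam
            elif not is_spam and verdict:
                fn += 1   # i says non-spam, the others say spam
        weighted = (FP_WEIGHT * fp + FN_WEIGHT * fn) / judged if judged else 0.0
        keep[i] = weighted <= MAX_WEIGHTED_ERROR
    return keep


honest = [(True, {"cheap", "pills"}), (False, {"meeting", "coffee"})]
SETS = {
    "a": honest,
    "b": honest,
    "c": honest,
    "d": [(False, {"cheap", "pills"}), (True, {"meeting", "coffee"})],  # labels flipped
}
print(dynamic_qa(SETS))  # {'a': True, 'b': True, 'c': True, 'd': False}
```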

6 Aggregation and Download

Those signature sets that survive the quality assurance phase are aggregated and dumped to a CSV file. The file is compressed with bzip2, signed with GPG and made available for CanIt installations to download.
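
Aggregation can be sketched as summing token counts across the surviving signature sets and writing one CSV row per token. Signatures are parsed in the "token:count;..." form shown in Section 2. The paper does not describe the file format, so the column layout (token, spam, nonspam) is hypothetical. GPG signing is omitted.

```python
import bz2
import csv
import io
from collections import Counter

Vote = tuple[bool, str]  # (is_spam, "token:count;token:count;..." signature)


def aggregate(signature_sets: list[list[Vote]]) -> tuple[Counter, Counter]:
    """Sum spam and non-spam counts per token over every surviving signature set."""
    spam, ham = Counter(), Counter()
    for votes in signature_sets:
        for is_spam, sig in votes:
            target = spam if is_spam else ham
            for item in sig.split(";"):
                token, _, count = item.rpartition(":")  # rpartition: tokens may contain ':'
                target[token] += int(count)
    return spam, ham


def to_compressed_csv(spam: Counter, ham: Counter) -> bytes:
    """One row per token (hypothetical columns token, spam, nonspam), bzip2-compressed."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["token", "spam", "nonspam"])
    for token in sorted(set(spam) | set(ham)):
        writer.writerow([token, spam[token], ham[token]])
    return bz2.compress(buf.getvalue().encode("utf-8"))


surviving = [
    [(True, "cheap:1;cheap+pills:1;pills:1"), (False, "coffee:1;meeting:1")],
    [(True, "cheap:1;winner:1")],
]
blob = to_compressed_csv(*aggregate(surviving))
rows = list(csv.reader(io.StringIO(bz2.decompress(blob).decode("utf-8"))))
print(rows[1])  # ['cheap', '2', '0']
```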

When a CanIt installation notices a new RPTN data set available (indicated with a special DNS lookup), it downloads the aggregated statistics, verifies the GPG signature, uncompresses the file, and loads the statistics into the Bayes database. The statistics are loaded into a stream called @@RPTN. Streams can choose to inherit Bayes data from this stream if they wish to use RPTN statistics.
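
The download side can be sketched in the same spirit. The sketch does not show the DNS lookup that announces a new data set, nor the GPG signature check. published_serial stands in for whatever the DNS answer conveys, and fetch is a hypothetical download callback. The CSV columns are assumed to be token, spam, nonspam. Loading a new data set is assumed to replace the previous contents of @@RPTN.

```python
import bz2
import csv
import io
from typing import Callable

RPTN_STREAM = "@@RPTN"


def load_rptn(blob: bytes, db: dict) -> None:
    """Decompress an RPTN CSV and replace the @@RPTN stream's token statistics."""
    text = bz2.decompress(blob).decode("utf-8")
    rows = csv.DictReader(io.StringIO(text))
    db[RPTN_STREAM] = {r["token"]: (int(r["spam"]), int(r["nonspam"])) for r in rows}


def maybe_update(db: dict, loaded_serial: int, published_serial: int,
                 fetch: Callable[[], bytes]) -> int:
    """Download and load only if a newer data set has been published."""
    if published_serial <= loaded_serial:
        return loaded_serial
    blob = fetch()
    # A real client would verify the GPG signature on blob before loading it.
    load_rptn(blob, db)
    return published_serial


db: dict = {"default": {}}
blob = bz2.compress(b"token,spam,nonspam\r\ncheap,2,0\r\nmeeting,0,1\r\n")
serial = maybe_update(db, loaded_serial=0, published_serial=7, fetch=lambda: blob)
again = maybe_update(db, loaded_serial=serial, published_serial=7, fetch=lambda: b"")
```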


©2012 Technology Evaluation Centers Inc. All rights reserved.