Talk:Spam Patrol
From Katrina Help Info
SpammerBlockPattern
- Pending and Past Updates for SpammerBlockPattern
- When a user attempts to save a page containing filtered text, the user is redirected to MediaWiki:Spamprotectiontext, which links to Spam Protection Comment.
- SpammerBlockPattern latest version
- Get latest version: http://www.myseattle.com/mediawiki/wgSpamRegex.txt
- Test latest version: http://www.myseattle.com/mediawiki/index.php/More:Spam_Box
- As of 3-17-2006, updates are processed daily by a cron job [1] (http://groups.yahoo.com/group/admin-katrina/message/1216)
- SpammerBlockPattern contacts at KatrinaHelp.info
- Administrator: user:jwalling
- Tech support: Rudi Cilibrasi and AnnaLissa Cruz
- Post request at http://groups.yahoo.com/group/admin-katrina/
- Send request to mailto:admin-katrina@yahoogroups-dot-com (yahoogroups.com)
LocalSettings.php
To prevent a spammer from saving wiki edits with problematic content, set the variable '$wgSpamRegex' in LocalSettings.php, which overrides the default value in DefaultSettings.php. Set it to a regular expression (RegEx) that matches any URLs (or parts of URLs) that you do not want users to link to. The expression can also match any other content you wish to ban. When an edit is rejected, the user is shown an explanatory message indicating which part of the edit text is not allowed.
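Where the pattern is maintained outside LocalSettings.php (for example, refreshed by a scheduled job), one option is to read it from a local file. The following is only a sketch: the path and the `$spamRegexFile` name are hypothetical, and it assumes the file holds one complete regular expression, delimiters included.

```php
<?php
// LocalSettings.php (fragment)
// Assumption: a scheduled job saves the shared pattern to this hypothetical
// path, and the file contains one complete regex such as /height:\s*\dpx/.
$spamRegexFile = '/etc/mediawiki/wgSpamRegex.txt';

if ( is_readable( $spamRegexFile ) ) {
    $pattern = trim( file_get_contents( $spamRegexFile ) );

    // Use the pattern only if it is non-empty and compiles. preg_match()
    // returns false for a malformed regex; @ hides the resulting warning.
    if ( $pattern !== '' && @preg_match( $pattern, '' ) !== false ) {
        $wgSpamRegex = $pattern;
    }
}
```

If the file is missing or the pattern does not compile, `$wgSpamRegex` keeps whatever value it already had, so a bad update does not break page saves.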
RegEx Examples
- Sample Regular Expressions to block common spam fragments
$wgSpamRegex = "/overflow:\s*auto;\s*height:\s*\dpx/"; #big net
$wgSpamRegex = "/height:\s*\dpx/"; #bigger net
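Only one value of '$wgSpamRegex' is in effect at a time, so a later assignment replaces an earlier one. To filter on several fragments at once, they can be joined with alternation. This is a minimal sketch: `$spamFragments` is a hypothetical helper name, and the domain entry is a made-up placeholder, not a real spam source.

```php
<?php
// LocalSettings.php (fragment)
// Each entry is a regex fragment without the surrounding "/" delimiters.
// A literal "/" inside a fragment would need to be escaped as "\/".
$spamFragments = array(
    'height:\s*\dpx',         // single-digit pixel height, as in the sample above
    'spam-example\.invalid',  // hypothetical placeholder domain
);

// Join the fragments with "|" so that any one of them triggers the filter.
$wgSpamRegex = '/(' . implode( '|', $spamFragments ) . ')/';
```

Note that `\d` matches exactly one digit, so the height fragment catches text such as `height: 1px` but not `height: 100px`.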
Regular Expressions Howto
The SpammerBlockPattern is formatted using Regular Expression (RegEx) syntax.
- Resources for learning RegEx
- http://en.wikipedia.org/wiki/Regular_expression
- http://etext.lib.virginia.edu/services/helpsheets/unix/regex.html
- http://www.regular-expressions.info/
Spam Blacklist Extension
The above approach may become too cumbersome. Another approach is to have a long blacklist identifying many known spamming URLs and spam terms, in a more readable format (not a single regular expression). With the Spam Blacklist extension you can allow some of your users to edit the blacklist on a wiki page, and you can fetch updates from external sources.
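As a rough sketch, older releases of the extension were enabled from LocalSettings.php along the following lines. The setting name and the raw-page URL shown here are assumptions based on those older releases and may not match the installed version, so check the extension's own documentation before using them.

```php
<?php
// LocalSettings.php (fragment)
// Assumption: the extension is unpacked under extensions/SpamBlacklist and
// uses the older require_once loading style.
require_once( "$IP/extensions/SpamBlacklist/SpamBlacklist.php" );

// Assumption: blacklist sources are listed in $wgSpamBlacklistFiles.
// Each source is plain text with one regex fragment per line.
$wgSpamBlacklistFiles = array(
    "http://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1",
);
```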
- Resources
- http://meta.wikimedia.org/wiki/SpamBlacklist_extension
- http://meta.wikimedia.org/wiki/Spam_blacklist
- http://cvs.sourceforge.net/viewcvs.py/wikipedia/extensions/SpamBlacklist/
- http://cyber.law.harvard.edu/globalvoices/wiki/index.php/Spam_blacklist
- http://cyber.law.harvard.edu/dyn/globalvoices/wiki/index.php/User_talk:Sj#Spam_blacklist
More External Resources to Fight Wiki Spam
- Wikimedia: Wiki Spam (http://meta.wikimedia.org/wiki/Wiki_Spam)
- Wikimedia: Anti-spam Features (http://meta.wikimedia.org/wiki/Anti-spam_Features)
- Wikipedia: Link spam (http://en.wikipedia.org/wiki/Link_spam)
- Spam Chongqing (http://chongq.blogspot.com/) [2] (http://chongqed.org/chongqed.html) [3] (http://chongqed.blogspot.com/2004/11/two-reasons-why-indexing-kept-pages-is.html)
- Interview with a link spammer (http://www.theregister.co.uk/2005/01/31/link_spamer_interview/)
- Bad Behavior (http://www.ioerror.us/software/bad-behavior/) is a set of PHP scripts that prevent spambots from accessing your site by analyzing their actual HTTP requests and comparing them to profiles of known spambots. It goes well beyond checking User-Agent and Referer. Bad Behavior is available for several PHP-based software packages and can also be integrated in seconds into any PHP script.
- Installing and Using Bad Behavior on MediaWiki (http://www.ioerror.us/software/bad-behavior/installing-and-using-bad-behavior/on-mediawiki/)
- Fighting spam in Wikka (http://wikka.jsnx.com/WikkaSpamFighting?show_comments=1&showall=1)
- PHP Naive Bayesian Filter (http://www.phpgeek.com/pragmacms/index.php?layout=main&cslot_1=14)
RSS checkpoint
This is an RSS checkpoint.
- --jwalling 21:46, 15 Mar 2008 (CET)
- --jwalling 20:43, 19 Mar 2008 (CET)
- --jwalling 10:34, 1 Apr 2008 (CEST)
- --jwalling 23:03, 14 Apr 2008 (CEST)
- --jwalling 22:38, 29 Apr 2008 (CEST)
- --jwalling 04:15, 3 May 2008 (CEST)
- --jwalling 08:14, 9 May 2008 (CEST)
- --jwalling 20:57, 9 May 2008 (CEST)
- --jwalling 10:00, 3 Jun 2008 (CEST)
- --jwalling 21:47, 12 Jun 2008 (CEST)
- --jwalling 20:37, 20 Jun 2008 (CEST)
- --jwalling 07:45, 2 Jul 2008 (CEST)
- --jwalling 21:11, 7 Jul 2008 (CEST)
- --jwalling 21:38, 20 Jul 2008 (CEST)

