CAPTCHAs and Geeklog - Another tool for combating spam bots?
- Saturday, September 16 2006 @ 01:21 pm EDT
- Contributed by: mevans
There has been a lot of discussion here recently regarding strange users registering on my site. There have been several potential solutions discussed as well. One of the solutions discussed is to use CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) to prevent spam bots from registering on your site. To address this need, I have released gl-captcha-1.0, a CAPTCHA implementation for Geeklog utilizing the custom registration feature.
gl-captcha-1.0 combines the previous beta releases and includes both dynamic and static image support. This version also adds support for a language file and improves the memberdetail.thtml template so that users can refresh the CAPTCHA image and email the administrator if they have difficulty registering.
Why another CAPTCHA implementation?
I spent a lot of time logging and reviewing how spam bots were registering on my sites. What I found is that most of them completely bypass the users.php registration screen; instead they call the users.php module directly, posting the required variables. This is easy to do with a tool such as curl, which can create an account automatically. For example:
curl -d mode=create -d username=somename -d email=somewhere@email.com http://www.geeklog.net/users.php
This command will usually create an account on any standard Geeklog install. Even with the Bad Behavior plugin installed, many of these requests will still get through. My conclusion was that any solution relying solely on the registration screen would fail as a protection method, since the registration screen can be completely bypassed.
What I've done is develop a CAPTCHA implementation that stores the CAPTCHA string in a PHP session variable. During registration processing (the HTTP POST to users.php, i.e., when the submit button is pressed), I validate that the CAPTCHA string the user entered matches the string stored in the PHP session variable. If the PHP session variable is NULL (empty) or the user-entered CAPTCHA string is NULL, I force the user back to the registration screen. This prevents bots from bypassing the user registration screen and posting directly to users.php. So far, this has been a successful method of preventing spam bots from registering on my sites.
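To make the check described above concrete, here is a minimal sketch of the logic in Python. It is not the gl-captcha PHP code: the session is modeled as a plain dictionary standing in for PHP's session storage, and the names new_captcha, check_registration, ALPHABET, the "captcha" key and the "create"/"show_form" return values are all hypothetical. Clearing the stored string after each attempt is also my assumption, not something the plugin is known to do.

```python
import secrets

# Hypothetical alphabet; easily confused characters (0/O, 1/I/L) are left out.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"


def new_captcha(session, length=6):
    """Called when the registration screen is rendered.

    Generates a random string, stores it in the session, and returns it
    so that it can be drawn into the CAPTCHA image.
    """
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    session["captcha"] = text
    return text


def check_registration(session, post):
    """Called when the registration form is POSTed.

    Returns "create" if account creation may proceed, otherwise "show_form"
    to send the visitor back to the registration screen.
    """
    # Assumption: the stored string is single use, so remove it on every attempt.
    expected = session.pop("captcha", None)
    entered = (post.get("captcha") or "").strip()

    # Empty session value: the registration screen was never loaded (direct POST).
    # Empty user value: nothing was typed into the CAPTCHA field.
    if not expected or not entered:
        return "show_form"
    if entered != expected:
        return "show_form"
    return "create"


if __name__ == "__main__":
    # A bot posting directly never loaded the form, so its session is empty.
    print(check_registration({}, {"mode": "create", "username": "somename"}))

    # A visitor who loaded the form and typed the string correctly.
    session = {}
    shown = new_captcha(session)
    print(check_registration(session, {"captcha": shown}))
```

Run as a script, this prints show_form for the direct POST and create for the visitor who went through the registration screen. The point of the design is that the expected value lives on the server side, so a request that skips the form has nothing to match against.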
Whether or not a CAPTCHA implementation is the correct solution for your needs is a question only you can answer. CAPTCHAs do have drawbacks; the main one is that any CAPTCHA implementation is almost impossible for visually impaired individuals to use. In some cases, even users who are not visually impaired may have a difficult time reading the CAPTCHA string, since the strings are designed to be difficult to read. Also, there may be accessibility laws in your area that you must conform to.
To minimize these drawbacks, gl-captcha-1.0 will provide a link on the custom registration screen to allow a potential new user to email the site admin with a request for registration ($_CONF['emailuserloginrequired'] must be set to 1 in Geeklog's config.php for this feature to work). It also states on the registration screen that a screen refresh will provide another CAPTCHA string, giving users the ability to try again if they are having difficulty in reading the current string.
CAPTCHAs are not foolproof and they are not a final solution against spam bots. OCR (Optical Character Recognition) has been used to break many CAPTCHA implementations. I have tried to use various fonts and background noise in generating the CAPTCHA images to minimize the risk, but there is no assurance that a determined spammer cannot use OCR to break this implementation, although I believe the chances are slim. Also, there have been reports of cheap 'sweat shop' labor being used to get around CAPTCHA implementations by having people perform the registrations en masse. See Wikipedia for a more detailed discussion of the drawbacks and how CAPTCHAs can be circumvented.
For me, using the Bad Behavior plugin, Dirk's SLV Spam-X class, trackback validation and gl-captcha-1.0 has proven to be a very successful arsenal against the various types of spam we Geekloggers face. I have no doubt that the spammers will continue to improve their technology and that the Geeklog community will also continue to answer the challenge and evolve our protections.