Welcome to Geeklog Tuesday, December 12 2017 @ 01:10 pm EST


remy

Forum User
Full Member
Registered: 09/06/2003
Posts: 141
Location:Rotterdam & Bonn
In GL 2.1 there is a new and highly appreciated logfile called 404.log.
This file contains all 404 errors encountered (dead links). Some of them are caused by normal users who simply mistype a real link. Others are caused by bots scraping incorrect links from somewhere else.

However, I do see a lot of 404s listed that have an encoded link, with '%3F' for the question mark and '%26' for the ampersand. It seems that Apache does not properly handle URL-encoded requests sent by the browser: the encoded '%3F' is treated as part of the path rather than as the start of the query string.
Then I see a lot of 404s starting with 'RK=0/RS='. These seem to come from Yahoo search results.

The following is a quick patch to .htaccess to correct the behaviour of Apache. There is a lot of information on the Internet on the subject; just search for '%3F' or 'RK=0'.
The patch below did the job for me.

PHP Formatted Code

RewriteEngine On

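# Strangely behaving bots: these are URLs scraped from Yahoo (bots scraping for links; Yahoo search links contain RK and RS). tenants modification:
RewriteRule ^(.*)RK=0/RS= /$1 [L,NC,R=301]
# Also catch links where only an RS= part was appended.
RewriteRule ^(.*)RS= /$1 [L,NC,R=301]

# An encoded %3F reaches the rule as a literal '?' in the path; turn whatever follows it back into the query string.
RewriteRule ^(.*)\?(.*)$ /$1?$2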
 

remy

Forum User
Full Member
Registered: 09/06/2003
Posts: 141
Location:Rotterdam & Bonn
Now I am seeing other 404s because /RK=0/RS=... is appended to the query string. The root cause is dumb bots scraping links from Yahoo. Apache is not so smart in dealing with this 'trick'.
The following snippet eliminated the 404s for me:
PHP Formatted Code

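# If the query string contains /RK..., keep only the part before it (%1) and redirect.
RewriteCond %{QUERY_STRING} ^(.*)/RK(.*)$
RewriteRule ^(.*)$ /$1?%1 [L,NC,R=301]

# Same for a query string that contains /RS...
RewriteCond %{QUERY_STRING} ^(.*)/RS(.*)$
RewriteRule ^(.*)$ /$1?%1 [L,NC,R=301]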
 

Just add it to .htaccess.
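As a sketch only: the two rule pairs above could probably be folded into a single pair. This variant is an assumption and has not been tested in this thread, and the request shown in the comments (article.php?story=foo) is a made-up example. It assumes the rules live in the .htaccess of the document root.

```apache
RewriteEngine On

# Hypothetical request:  /article.php?story=foo/RK=0/RS=abc123
# The lazy (.*?) stops at the first /RK= or /RS=, so %1 holds "story=foo".
RewriteCond %{QUERY_STRING} ^(.*?)/R[KS]= [NC]
# Redirect to the same path with only the clean part of the query string:
#   /article.php?story=foo
RewriteRule ^(.*)$ /$1?%1 [L,R=301]
```

The lazy quantifier matters here: a greedy (.*) would stop at the last /RS= and leave /RK=0 in the query string, so a second redirect would be needed to clean it up.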

Laugh

Site Admin
Admin
Registered: 27/09/2005
Posts: 1244
For those of you who don't realize it, Geeklog comes with a file called 404.php that you can set your server to use when a 404 error is generated (i.e. when a visitor requests a file that doesn't exist on your server). This generates a nicer error message than the one the server will produce.

This is the same 404 message that Geeklog returns when it is asked for something that does not exist (i.e. a story or staticpage).
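On Apache, pointing the server at that file is usually a one-line setting. A minimal sketch, assuming Geeklog's public directory is the document root (so the file is reachable as /404.php) and that the host allows this directive in .htaccess:

```apache
# Assumption: 404.php sits in the document root.
# If Geeklog is installed in a subdirectory, adjust the path to match.
ErrorDocument 404 /404.php
```

Use a local path as shown rather than a full http:// URL. With a full URL Apache sends a redirect to the error page, so the client no longer receives the original 404 status.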
One of the Geeklog Core Developers.
