Geeklog Forums
avoid 404 errors due to apache not decoding url
Status: offline
remy
Forum User
Full Member
Registered: 06/09/03
Posts: 162
Location:Rotterdam & Bonn
In GL 2.1 there is a new and highly appreciated logfile called 404.log.
This file contains all 404 errors encountered (dead links). Some of them are caused by normal users who simply mistype a real link. Others are caused by bots scraping incorrect links from somewhere else.
However, I do see a lot of 404s listed with a URL-encoded link: '%3F' for the question mark and '%26' for the ampersand. It seems that Apache does not properly handle requests that arrive URL-encoded like this.
Then I see a lot of 404s starting with 'RK=0/RS='. These seem to come from Yahoo search results.
The following is a quick patch to .htaccess to correct Apache's behaviour. There is a lot of information on the Internet on the subject; just search for %3F or 'RK=0'.
The patch below did the job for me.
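RewriteEngine On
# Strangely behaving bots: these URLs are scraped from Yahoo (bots scraping for links; Yahoo search links contain RK and RS). tenants modification:
RewriteRule ^(.*)RK=0/RS= /$1 [L,NC,R=301]
# The caret is escaped here on the assumption that it is meant to match a literal '^' after 'RS='; unescaped, it acts as an anchor and the rule never matches.
RewriteRule ^(.*)RS=\^ /$1 [L,NC,R=301]
# An encoded %3F reaches this rule as a literal '?' inside the path; move what follows it back into the query string.
RewriteRule ^(.*)\?(.*)$ /$1?$2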
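To make the encoded question mark case concrete: a request such as /article.php%3Fstory=abc%26mode=print (a hypothetical URL) has no real query string, so Apache decodes the whole thing into a path containing a literal '?' and looks for a file of that name, which produces the 404. The sketch below is a variant of the last rule that sends an external redirect instead of rewriting internally, so the corrected URL shows up in the access log. It assumes the rules sit in the .htaccess of the document root; the [L,R=301] flags are an addition and not part of the patch above. As far as I know, Apache 2.4.60 and later refuse this kind of rewrite unless the UnsafeAllow3F flag is added, so check the mod_rewrite documentation for your version first.

```apache
RewriteEngine On
# Request:      /article.php%3Fstory=abc%26mode=print   (hypothetical example URL)
# Decoded path: article.php?story=abc&mode=print        (what the pattern is matched against)
# Captures:     $1 = article.php   $2 = story=abc&mode=print
# The redirect target should then be /article.php?story=abc&mode=print
RewriteRule ^(.*)\?(.*)$ /$1?$2 [L,R=301]
```

Use either this variant or the internal rewrite above, not both.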
Status: offline
remy
Forum User
Full Member
Registered: 06/09/03
Posts: 162
Location:Rotterdam & Bonn
Now I am seeing other 404s, caused by /RK=0/RS=... being appended to the query string. The root cause is dumb bots scraping links from Yahoo. Apache is not so smart in dealing with this 'trick'.
The following snippet eliminated the 404s for me:
RewriteCond %{QUERY_STRING} ^(.*)/RK(.*)$
RewriteRule ^(.*)$ /$1?%1 [L,NC,R=301]
RewriteCond %{QUERY_STRING} ^(.*)/RS(.*)$
RewriteRule ^(.*)$ /$1?%1 [L,NC,R=301]
Just add it to .htaccess.
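The two condition/rule pairs can probably be folded into one. This is only a sketch: it assumes the junk always begins with /RK= or /RS= and that everything after it can be thrown away. The non-greedy (.*?) and the [KS] character class are changes relative to the snippet above, and the example query string in the comments is made up.

```apache
RewriteEngine On
# Query string: story=abc/RK=0/RS=xyz   (hypothetical example)
# %1 captures everything before the first /RK= or /RS=, here: story=abc
RewriteCond %{QUERY_STRING} ^(.*?)/R[KS]=
# Redirect to the same path with only the cleaned query string.
RewriteRule ^(.*)$ /$1?%1 [L,R=301]
```

Once the redirect has happened the query string no longer contains /RK= or /RS=, so the condition does not match a second time.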
Status: offline
Laugh
Site Admin
Admin
Registered: 09/27/05
Posts: 1470
Location:Canada
For those of you who don't realize it, Geeklog comes with a file called 404.php that you can set your server to use when a 404 error is generated (i.e. when a visitor requests a file that doesn't exist on your server). This generates a nicer error message than the one the server produces by default.
This is the same 404 message that Geeklog returns when it is asked for something that does not exist (e.g. a story or static page).
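On Apache, a minimal sketch of that server setting uses the ErrorDocument directive in .htaccess. The path /404.php assumes Geeklog's public files are installed in the document root; adjust it if your site lives in a subdirectory.

```apache
# Hand every 404 that Apache generates to Geeklog's 404.php.
# Assumption: 404.php sits directly in the document root.
ErrorDocument 404 /404.php
```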
One of the Geeklog Core Developers.