
Geeklog Forums

robots.txt to exclude "Print Format Page" of Stories?


tokyoahead

Anonymous
How would I write a robots.txt so that robots do not index the print layout of my stories? I found out that users search for things on Google and find my site, but they land on the print layout, without menus etc.

thanks

Dirk

Site Admin
Admin
Registered: 01/12/02
Posts: 13073
Location: Stuttgart, Germany
That's not possible since the original robots.txt standard does not support regular expressions or the like. It only matches the beginning of a URL path, so you can exclude URLs that start with a certain string but not those that end in some string (like .../print for the printable pages).
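
To illustrate the prefix matching described above, here is a minimal PHP sketch. The function name and the example paths are made up for illustration, and the paths assume URL rewriting is on so that printable pages end in /print.

```php
<?php
// Hypothetical helper: true if a Disallow rule, read the way the original
// robots.txt standard reads it, would exclude the given URL path.
// Only the beginning of the path is compared.
function robots_prefix_blocks(string $rule, string $path): bool
{
    if ($rule === '') {
        return false; // an empty Disallow excludes nothing
    }
    return strncmp($path, $rule, strlen($rule)) === 0;
}

// A prefix rule cannot single out the print pages:
var_dump(robots_prefix_blocks('/article.php/', '/article.php/some-story/print')); // print page is excluded
var_dump(robots_prefix_blocks('/article.php/', '/article.php/some-story'));       // but so is the normal page
var_dump(robots_prefix_blocks('/print', '/article.php/some-story/print'));        // a suffix is never matched
```

So a rule broad enough to catch the print pages also hides the stories themselves.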

It may be possible to do something about this with some .htaccess magic (anyone?) or you could disable the link to the printable format altogether.

And, yes, the bots love the printable pages (no clutter, just text). There's a link back to the article at the bottom of that page - you may want to make that more obvious for your human visitors.

bye, Dirk

drshakagee

Forum User
Full Member
Registered: 10/01/03
Posts: 231
I don't know how well it works, but you can add the rel="nofollow" attribute to your link and Google at least shouldn't follow it.
Yes I am mental.

eg0master

Forum User
Regular Poster
Registered: 07/21/05
Posts: 73
Location: Stockholm
I've solved it using a hack in article.php and staticpages/index.php.

This is the code added in article.php:
Text Formatted Code
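// Send print-view requests that did not come from this site (search engine
// hits, but also requests without any referer) to the regular article page.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (0 == strcmp($mode, 'print') &&
    0 != strncmp($referer, $_CONF['site_url'], strlen($_CONF['site_url']))) {
    echo COM_refresh($_CONF['site_url'] . '/article.php?story=' . $story);
    exit();
}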
 


Maybe something like this should be in the Geeklog distribution, with a switch in config.php to control it.
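
A config switch along those lines might look like the sketch below. The option name 'redirect_external_print' and the function name are hypothetical, neither is taken from Geeklog itself, and http://www.example.com stands in for the real site URL.

```php
<?php
// Hypothetical helper, not part of Geeklog: decides whether a request for
// the print view should be sent back to the regular article page.
function should_redirect_print(array $conf, string $mode, string $referer): bool
{
    // 'redirect_external_print' is an assumed option name for config.php.
    if (empty($conf['redirect_external_print'])) {
        return false; // feature switched off
    }
    if ($mode !== 'print') {
        return false; // only the print view is affected
    }
    // Redirect unless the referer starts with the site's own URL.
    $site = $conf['site_url'];
    return strncmp($referer, $site, strlen($site)) !== 0;
}

// Example values only.
$conf = array(
    'site_url'                => 'http://www.example.com',
    'redirect_external_print' => true,
);
var_dump(should_redirect_print($conf, 'print', 'http://www.google.com/search?q=x')); // external referer
var_dump(should_redirect_print($conf, 'print', 'http://www.example.com/index.php')); // own site
```

In article.php the inline condition would then become a call to this function with $_CONF, $mode and the referer (or an empty string if none was sent).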
Geeklog Plugins: http://plugincms.com

LWC

Forum User
Full Member
Registered: 02/19/04
Posts: 818
Google (and other search engines?) does support some non-standard pattern extensions in robots.txt, some of which may help you. For example:
[QUOTE robots.txt]
User-agent: Googlebot
Disallow: /*.php/*/print$
[/QUOTE]
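
To check what such a pattern would catch before putting it live, the two extensions can be approximated in PHP: * stands for any run of characters and a trailing $ marks the end of the URL. This is only a sketch of those two rules, not Googlebot's actual matcher, and the function name and story paths are made up.

```php
<?php
// Hypothetical helper: tests a URL path against a robots.txt pattern that
// uses the "*" wildcard and the trailing "$" end anchor.
function robots_wildcard_matches(string $pattern, string $path): bool
{
    // A trailing "$" means the pattern must match up to the end of the URL.
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    // Escape everything, then turn the escaped "*" back into ".*".
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));
    // Without "$" the pattern only has to match the beginning of the path.
    $regex = '#^' . $regex . ($anchored ? '$' : '') . '#D';
    return preg_match($regex, $path) === 1;
}

$rule = '/*.php/*/print$';
var_dump(robots_wildcard_matches($rule, '/article.php/some-story/print')); // print page
var_dump(robots_wildcard_matches($rule, '/article.php/some-story'));       // normal page
```

Note that the pattern only covers rewritten URLs ending in /print; print links with a query string would need their own rule.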