
Question: Geeklog and search engines

Answer: Overall, Geeklog should be indexed just fine by the popular search engines. But, of course, there is always room for improvement.

The main rules for any site (powered by Geeklog or not) are still

  • have good, interesting content
  • have links pointing to your site

The latter tends to follow automatically from the former.

Google has a list of guidelines that also apply to Geeklog sites.

Here are some additional tips:


Don't bother with meta tags

Putting the site's keywords in meta tags sounded like a good idea when the web was still in its infancy. These days, however, keyword meta tags have been misused by too many sites ("keyword spamming") and are therefore ignored by all major search engines.

One exception, though, is the meta description that Google sometimes picks up. So it can't hurt to have a short summary of your site's content there (in the <head> section of header.thtml):

<meta name="description" content="Geeklog is an open-source content management system.">


Use URL rewriting

Activate Geeklog's URL rewriting feature (in config.php):

$_CONF['url_rewrite'] = true;

This will make URLs to stories and static pages look a little nicer and more likely to be visited by the spiders. So instead of
http://www.geeklog.net/article.php?story=20040101123456321
http://www.geeklog.net/staticpages/index.php?page=20040101123456321

URLs will look like
http://www.geeklog.net/article.php/20040101123456321
http://www.geeklog.net/staticpages/index.php/20040101123456321

You can also edit the ID and change it into something that summarizes the content of the story or static page, e.g.
http://www.geeklog.net/article.php/geeklog-1.3.10
http://www.geeklog.net/staticpages/index.php/support-options
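
When choosing such an ID by hand, the usual approach is to lowercase the title and replace anything that is not a letter or digit with hyphens. A minimal sketch of such a helper (hypothetical, not part of Geeklog itself):

```python
import re

def slugify(title):
    """Turn a story title into a URL-friendly ID.

    A hypothetical helper for illustration only: lowercase the title,
    collapse runs of non-alphanumeric characters into single hyphens,
    and trim hyphens from both ends.
    """
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")

print(slugify("Geeklog 1.3.10 Released!"))  # geeklog-1-3-10-released
```

Whatever scheme you use, keep the IDs short, descriptive, and stable, since they become part of the URL.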


Please note that the URL rewriting feature does not currently work with Microsoft's IIS web server. If you're having problems with URL rewriting, please see this FAQ entry.


Add links to the article page

To make the spiders index the article.php page (i.e. the page that has the entire story text and not only the intro), it helps to make the story title a link to the actual story (in storytext.thtml and featuredstorytext.thtml):

<a href="{article_url}" class="non-ul">{story_title}</a>

Since this would normally make the story title underlined, you can add this to your stylesheet (style.css):
    .non-ul {
        text-decoration: none;
    }

You can also take this one step further and make the story's headline an actual headline by using the <h1> tag:

<h1><a href="{article_url}" class="non-ul">{story_title}</a></h1>

Again, this may make the headline stand out too much (the effect depends on the theme you're using), so you may want to add this to your stylesheet:
    h1 {
        font-size: 100%;
        display: inline;
    }

The CSS changes are only for the human visitors of your site - experiment with them to make the headline look good with your theme. The search engine spiders don't care about the CSS - but they are very interested in headlines and links ...

Note: The Professional theme (Geeklog's default theme as of Geeklog 1.3.10) already uses all these tips.


Use the site index script

Tom Willet wrote a script that provides a site index, i.e. a list of all the stories on a site. While this can be useful for human visitors, it is also a feast for search engine spiders.
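
In case that script is unavailable, the idea itself is simple: render one plain link per story so spiders can reach every article from a single page. A minimal, hypothetical sketch (the story IDs and titles below are made up for illustration):

```python
from html import escape

def site_index(stories):
    """Render an HTML list of story links from (id, title) pairs.

    Illustrative only: assumes url_rewrite-style URLs of the form
    /article.php/<story-id>, as shown earlier in this FAQ.
    """
    items = [
        '<li><a href="/article.php/{0}">{1}</a></li>'.format(
            escape(story_id), escape(title)
        )
        for story_id, title in stories
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"

print(site_index([
    ("geeklog-1.3.10", "Geeklog 1.3.10 released"),
    ("support-options", "Support options"),
]))
```

A real version would pull the IDs and titles from the Geeklog database instead of a hard-coded list.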


Excluding pages

Strange as this may sound, it can even make sense to exclude certain pages from being indexed. Geeklog's submit form for stories, links, and events is such an example. It simply doesn't make sense for these pages to be indexed.

Spiders (also called robots) will look for a file called robots.txt that contains rules about which files and directories they should be indexing or not. The minimal robots.txt file looks like this:
    User-agent: *
    Disallow:

This means that all robots are welcome (line 1) and that there are no files or directories that they shouldn't index (line 2). Put this in a simple text file (named, of course, robots.txt) and upload it to the document root of your site (normally, that's the same directory that Geeklog's index.php resides in).

Now, to exclude files, simply add a line for each file or directory you want to exclude:
    User-agent: *
    Disallow: /submit.php
    Disallow: /admin/

(Note: Spiders won't usually see a link to the admin directory on a Geeklog site - this line is there for illustration purposes only)

Also see this story for more information about the robots.txt file.
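
If you want to verify how a well-behaved robot will interpret your rules before uploading the file, Python's standard library includes a robots.txt parser. A small sketch, feeding it the example rules from this FAQ directly (a live check would use set_url() and read() against your site instead):

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, parsed in-memory rather than fetched.
rules = [
    "User-agent: *",
    "Disallow: /submit.php",
    "Disallow: /admin/",
]

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url) reports whether the rules allow the URL.
print(parser.can_fetch("*", "/submit.php"))       # False
print(parser.can_fetch("*", "/admin/index.php"))  # False
print(parser.can_fetch("*", "/article.php/foo"))  # True
```

This only tells you what the rules say; whether a given spider actually honors them is up to the spider.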

