Welcome to Geeklog, Anonymous Thursday, March 28 2024 @ 09:32 am EDT

Geeklog Forums

Feature Requests: Search Engine Optimization


martingale

Anonymous

There are a few things Geeklog does that make it a bad choice for a site you'd like to rank well in Google. They're easy to fix, and in my copy I have fixed them. I just want to list them here in the hope that future versions of Geeklog might work this way by default:

1. Use the URL rewriting stuff in search results, etc. I went through my copy and hand-edited the URLs in a lot of the PHP modules so that they produce the URL-rewritten versions. It's important that all links to a page use the same URL as the one Google is going to index. Two different URLs will look like two different pages to Google, and so rank lower.

2. Delete the site name from the title. The title should be just the page name, or at the very least the site name should come AFTER the title.

3. I didn't do this, but it would be great: have a short description / keywords field for each article when you submit it, which you could put in the meta tags and early on the page. The goal would be to make this the text the search engine shows on the results page.

4. Create a sitemap listing all the articles on the site in a Google-friendly way. Someone made a hack of one and I copied it. I'd like it to be prettier: I'd like it to show each topic on the site as a heading, with a list of the articles under it. Preferably two columns or something.

5. Some randomly varying text on each page. If you don't randomly vary the text on your page and you get popular, there are 301 redirect hacks people can use to hijack your rank in Google. Randomly varying the content on each page somehow (quotes? dynamic content?) hardens your site against this kind of attack. I know you can plug in some quotes thing, but I think this should be a feature in every Geeklog: the idea is to have a block with a bunch of different messages it can show.

6. Support for cloaking (see my other post). You want to be able to return slightly different content to the spiders than to regular users. You don't need to change the main content on the screen (and doing that would probably get you banned) but you do want to suppress some things that would confuse the spider, and include some helpful keyword texts, etc.

7. Add "nofollow" to the links to a user's profile, or search for more by user. Otherwise Google will think the profile page of your main story posters is the most important page on your site and make it the top result. You want your homepage to be the top result, not the profile of the site maintainer.

8. Add "nofollow" to any URLs posted by users in the comments and forum sections, so that you don't bleed PageRank to comment spammers. Alternatively, translate every link into one that uses the built-in link module instead of a direct URL.

I'm sure there's more. I hope this helps. Geeklog is an excellent engine, and none of the others do this any better. A few tweaks are all it takes to turn Geeklog into something that can rank well.
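As a rough illustration of point 8, here's a minimal sketch (in Python, just to show the idea; the function name and regex are mine, not Geeklog's, and a real HTML parser would be safer than a regex):

```python
import re

def nofollow_links(html):
    """Add rel="nofollow" to every <a> tag that doesn't already
    carry a rel attribute, so user-posted links pass no PageRank."""
    def fix(match):
        tag = match.group(0)
        if 'rel=' in tag:
            return tag  # leave tags that already set rel alone
        return tag[:-1] + ' rel="nofollow">'
    return re.sub(r'<a\b[^>]*>', fix, html)

comment = '<p>See <a href="http://spam.example/">this</a></p>'
print(nofollow_links(comment))
# <p>See <a href="http://spam.example/" rel="nofollow">this</a></p>
```

The same filter could be run over comment and forum post bodies just before output.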

martingale

Anonymous
Oh, and some that are just template issues but should be fixed in Professional, etc.:

1. Use H1, H2, etc., for headings, not div/font styling. Googlebot doesn't know what fonts look like, but it knows H1 means important.

2. Get the page title repeated early on the page, before all the administrative junk.

3. Move the administrative junk to the right block so it is lexically last on the page, leaving the content-oriented stuff (with keywords and things to be spidered) on the left so that it is lexically earlier. Google treats the first things you say as more relevant than the last; you don't want Google to think your page is about "Login" or "Forum Menu". Move the search box to the right/bottom for the same reason.

4. Make the site slogan a link to the sitemap or somewhere useful; the slogan likely has useful keywords in it, so it should be anchor text.


martingale

Anonymous
Something related.... and tricky and complicated... this one is designed to make the site maximally relevant to someone who comes in off a SERP:

If they hit the homepage with Google, MSN, or Yahoo search as the referrer, you can detect that, extract the query term from the search, feed that query into Geeklog's internal search, and make sure any matching articles appear on the main page for that user. Or at least the "featured article" could be chosen for them based on their query term.

People coming in from a SERP look at a page for maybe 10 seconds before clicking away; they probably won't search your site, so searching it in advance helps them determine more quickly whether you have relevant content.
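A sketch of the referrer check in Python (the host names and query parameters here are illustrative; each engine's real SERP URLs vary by country and change over time):

```python
from urllib.parse import urlparse, parse_qs

# Which query-string parameter holds the search terms on each
# engine's result pages (illustrative, not exhaustive)
ENGINES = {
    'www.google.com': 'q',
    'search.msn.com': 'q',
    'search.yahoo.com': 'p',
}

def serp_query(referrer):
    """Return the visitor's search phrase if the referrer is a
    known SERP, else None."""
    parts = urlparse(referrer)
    param = ENGINES.get(parts.netloc)
    if not param:
        return None
    values = parse_qs(parts.query).get(param)
    return values[0] if values else None

print(serp_query('http://www.google.com/search?q=geeklog+seo'))
# geeklog seo
```

The result could then be fed straight into Geeklog's internal search to pick the featured article.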

Status: offline

beewee

Forum User
Full Member
Registered: 08/05/03
Posts: 969
Location:The Netherlands, where else?
H1 and H2 work fine indeed; it's also important to make the story titles link to the topic. And use the topic icon (linked to the topic) in the stories.

Most important is to put a description of your site at the top of your page, in H1/H2 tags. You can adjust the font size of these tags in the stylesheet.

You can also place blocks with appropriate text in the left column. And adjust config.php to put at least 10 stories on the front page. Make appropriate story titles, and edit the story ID as well.

And use the site-index file, with a link to it at the top of your page.
Dutch Geeklog sites about camping/hiking:
www.kampeerzaken.nl | www.campersite.nl | www.caravans.nl | www.caravans.net

Status: offline

tstockma

Forum User
Full Member
Registered: 07/22/03
Posts: 169
These are a bunch of great suggestions. I have some work to do on my web site now, and this thread gets bookmarked...thanks!
Tom
www.southparkcity.com

Status: offline

jlawrence

Forum User
Chatty
Registered: 12/30/04
Posts: 49
Location:Plymouth, Devon, UK
Quote by martingale:
1. Use the URL rewriting stuff in search results, etc.

Yes, using url_rewrite = true does some of the rewriting, but not all.
Although for Google it's not such a big deal, as Googlebot understands ?'s.
Changing the external link section is important - including link categories.

Quote by martingale:
2. Delete the site name from the title. The title should be just the page name, or at the very least the site name should come AFTER the title.

That should be pretty easy to fix in a template.

Quote by martingale:
3. I didn't do this, but it would be great: have a short description / keywords field for each article when you submit it, which you could put in the meta tags and early on the page. The goal would be to make this the text the search engine shows on the results page.


That should be doable - you also want different meta tags per page.
I'm in the process of adding a meta tag input box to some of the page creation forms - topics & staticpages.

Quote by martingale:
4. Create a sitemap listing all the articles on the site in a google friendly way

You could create this pretty easily as a static PHP page.
It would be useful for getting things quickly into the SE indexes.

Quote by martingale:
5. Some randomly varying text on each page. If you don't randomly vary the text on your page and you get popular, there are 301 redirect hacks people can use to hijack your rank in Google.

It's not just 301 redirect hacks. GoogleBot prefers sites with changing content and will visit more often.

Quote by martingale:
6. Support for cloaking (see my other post).

Avoid like the plague. Cloaking is a BIG no-no. If Google finds that you're cloaking, you'll at the minimum get penalised; at worst, be dropped from the index.

Quote by martingale:
7. Add "nofollow" to the links to a users profile

Every profile should have a link back to the main index page.

Quote by martingale:
8. Add "nofollow" to any URLs posted by users in the comments and forum sections, so that you don't bleed PageRank to comment spammers. Alternatively, translate every link into one that uses the built-in link module instead of a direct URL.

I don't think that bleeding PageRank is the issue. OK, you don't want to add to a spammer's PageRank, but doing so won't 'bleed' rank from your site as such: you don't lose rank by adding to another's. The problem with comment spam is that if you don't delete it quickly enough, you get penalised.
Allowing Googlebot to index comments helps with the regularly-changing-content part. If there are links out from the comments, all well and good: links out from a site are important (not quite as important as links in, but still important).
www.plymouthcricketclub.com - providing cricket for all ages in the Plymouth area.

martingale

Anonymous
jlawrence:

rewriting: the problem isn't that Googlebot won't follow the link; the problem is you have two different URLs for the same content, so Google won't realize that two different links point to it. PageRank suffers. All links to the same content should use the same URL. (This is also why you should 301-redirect the non-www version of your site to the www.sitename.com version, so that links to sitename.com get counted as links to www.sitename.com.)
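For the non-www redirect, the usual Apache approach is a couple of lines in .htaccess (assuming mod_rewrite is available; sitename.com is a placeholder for your own domain):

```apache
RewriteEngine On
# Permanently redirect sitename.com/... to www.sitename.com/...
# so all inbound links count toward a single set of URLs
RewriteCond %{HTTP_HOST} ^sitename\.com$ [NC]
RewriteRule ^(.*)$ http://www.sitename.com/$1 [R=301,L]
```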

page title: my understanding is it's PHP code that computes the title, and all you can do in the stylesheet is position it; is my understanding wrong? I just have a var called "page_title" in my title; I had to edit the .php to get it SEO'd.

301 hack: if your site gets popular you will eventually learn about the 301 hack. Google has a broken algorithm in its "duplicate content" filter whereby, with the appropriate 301 redirect and cloaking, a hijacker can fool Google into thinking their site is the original and yours is the copy. To defeat this you need every page on your site to be changing all the time, so that they can never get an exact copy of your page.

cloaking: Google accepts some cloaking so long as the content you return to the user is basically the same as what you return to the SE. They're on record as saying it is OK to optimize your page by returning meta tags only to spiders, e.g., to save bandwidth. Also, replacing an ad block with an ad for your site wouldn't violate anything; the ad content rotates anyway. Cloaking the entire content, so you return porn to surfers but recipes to Googlebot, will get you banned fast.

nofollow on comments: OK, "bleed PageRank" is wrong. But Google does penalize a site if it links to a "bad neighborhood", and of course almost all comment spam links to a bad neighborhood. So if you allow Googlebot to follow these links you risk winning yourself a big fat 0 PageRank as your penalty.

I also think that "administrative" links should be nofollow'd as well, like links to profiles and such, because you don't want Google to think those links are important. This is my own theory; maybe it is BS. The stuff I've posted above is all pretty generally accepted on SEO forums.


Status: offline

jlawrence

Forum User
Chatty
Registered: 12/30/04
Posts: 49
Location:Plymouth, Devon, UK
Agreed, there should only ever be one URL that gets you to a certain page - at least as far as the SEs are concerned - and yes, always 301 the non-www URL.

I've not yet looked into the title, but I'm pretty sure you're right. E.g., for topics, it puts the topic name in the title. That shouldn't take too much changing.

I'd be interested to know more about this 301 hack. Don't you mean 302? Smile

Comment spam is a problem; what you do about it would depend on your good/spam comment ratio. I personally like having comments indexed. It might be worth banning HTML in comments?
www.plymouthcricketclub.com - providing cricket for all ages in the Plymouth area.

Status: offline

beewee

Forum User
Full Member
Registered: 08/05/03
Posts: 969
Location:The Netherlands, where else?
Quote by jlawrence:
Yes, using url_rewrite = true does some of the rewriting, but not all. Although for Google it's not such a big deal, as Googlebot understands ?'s.


MSN also spiders my site all the way down... sometimes 3 times a day... without any URL rewriting...

It's also worth trying the new-stories-by-topic block at the top of the left column, especially with url_rewrite = true.
Dutch Geeklog sites about camping/hiking:
www.kampeerzaken.nl | www.campersite.nl | www.caravans.nl | www.caravans.net

Status: offline

ByteEnable

Forum User
Full Member
Registered: 10/20/03
Posts: 138
To help people coming in off a SERP, we need a real "What's Related" block at the end of the story/article, where other article links on the site that are really related are shown.

As it stands, the "What's Related" block should be named "Story Links".

martingale

Anonymous
One more request, a tricky one to implement, but highly desired:

Instead of this:

www.mysite.com/article/ArticleID

I want this:

www.mysite.com/topic/ArticleID

I use the wonderful (thank you!) new feature of being able to set my article ID to a valid keyword, so if we could get the above done, essentially I could have URLs like this for the spider:

www.mysite.com/main-keyword/sub-keyword

I also think this benefits not just SEO but also users. The topic URLs would be quite memorable. If I can't have what I really want, at least give me this:

www.mysite.com/articles/topic/story-id

The goal here is to get the topic (an important keyword on almost any site) into the URL.

Why? Because now, when anyone links to that story, they are sticking my important keywords right into their link, no matter what anchor text they use. It's good stuff. Google likes it.
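One way to approximate this without patching the PHP would be an Apache rewrite that accepts the keyword-bearing URL and maps it onto the existing article script (a sketch only; the path layout and article.php's PATH_INFO handling are assumptions about a particular install):

```apache
# Serve /articles/some-topic/story-id from article.php/story-id;
# the topic segment exists purely to put a keyword in the URL
RewriteEngine On
RewriteRule ^articles/[^/]+/([^/]+)/?$ /article.php/$1 [L]
```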

martingale

Anonymous
OK, ya, 302 hack. For those interested there is an epic thread about page jacking here: http://www.webmasterworld.com/forum3/25638.htm

Re-reading it, you're right, it's a 302. All I remembered was that it was a redirect, and that the defense against it is to make sure your site never looks like a duplicate of what they put on their cloaked page. One way to do that is to have some randomly varying text, so the page Googlebot gets is almost certainly going to be a bit different from the one they have up.

Of course, nobody knows for sure exactly how to defend against it, because Google won't tell exactly how their algorithm works.

Status: offline

RickW

Forum User
Full Member
Registered: 01/28/04
Posts: 240
Location:United States
Quote by martingale:
1. Use the URL rewriting stuff in search results

2. Delete the site name from the title.

3. I didn't do this, but it would be great: have a short description / keywords field for each article when you submit it, which you could put in the meta tags and early on the page.

4. Create a sitemap listing all the articles on the site in a google friendly way.

5. Some randomly varying text on each page.

6. Support for cloaking (see my other post). You want to be able to return slightly different content to the spiders than to regular users.

7. Add "nofollow" to the links to a user's profile, or search for more by user.

8. Add "nofollow" to any URLs posted by users in the comments and forum sections


1. I had to make a lot of corrections to the code too, in order to make static links consistent across the entire site. I've pointed this out before and they know about the problem.

2. I don't really agree... although you can improve your keyword density, you also lose name branding within the search results. As important as the title tag is to results and page rank, sometimes it's best to design based on what the user would like and not just search engines. In the docs you'll find alternative variables to use so you can just put the article title into the title tag if you want - so #2 is already taken care of.

3. I've suggested a delimited keywords field too, so you can populate the meta keywords. But they made a good point, that the major search engines don't look at this - but to me every bit helps. I think this feature would be most powerful if you setup a "related articles" routine to list x number of articles that also share those keywords.

4. I agree. Like you pointed out, there is already a hack available. But it would be nice to have a fully functional site map as part of the geeklog release, to replace the site stats. Or combine the two.

5. I have no idea what you're talking about. Sad

6. Cloaking is bad. I've seen some pretty heated arguments on SEO forums about this. At the very least it is a spammer's technique. You should really focus on providing content to your visitors, and the good search engines will weight that content accordingly. If you're cloaking, then you're covering up a flaw in your writing technique and/or site design.

7. & 8. Easily accomplished - and more efficient - with robots.txt. Here is mine:

Text Formatted Code
User-agent: *
Disallow: /comment.php
Disallow: /submit.php
Disallow: /profiles.php
Disallow: /calendar.php
Disallow: /usersettings.php
Disallow: /forum/createtopic.php
 

www.antisource.com

Status: offline

RickW

Forum User
Full Member
Registered: 01/28/04
Posts: 240
Location:United States
Quote by martingale: One more request, a tricky one to implement, but highly desired:

www.mysite.com/main-keyword/sub-keyword

www.mysite.com/articles/topic/story-id


I wanted the EXACT same thing, and it's not hard to implement at all. The code for reading those static-looking links is actually only a few lines; it parses out the first token after article.php/. I don't think it would be hard to change it to take the 2nd token. But then I realized that this isn't to our benefit!

It was while I was tweaking the forums plugin for better seo. Originally the links were generated as:

/forum/viewtopic.php?forum=2&showtopic=4692

Forum then topic. Blaine made a change in his next release so you could access the thread with just the topic ID:

/forum/viewtopic.php?showtopic=4692

That made the link shorter and less messy looking; plus, Google will pay more attention to a link if it only has 1 dynamic variable rather than several. But then it hit me: the advantage is that if you move that topic to another forum category, the link stays the same!

Apply that to your article links. Let's say I have a generic "Antispam" topic. At some point I may have 200 articles, and I decide to get more granular. So then I create the topics "Antispam Software", "Antispam Appliances", "Antispam Services", "Phishing", "Realtime Blacklists", "Spam News" and move my articles into them accordingly. The links stay the same in the search results, and I don't have to worry about breaking any referrals I've gained. Wink
www.antisource.com

martingale

Anonymous
RickW, good points, just two comments:

1- The nofollow thing needs to be on the links IN the comments. Comments get pasted at the end of stories and other places, and the search engine will still find them. Yes, you should delete them fast, etc., but you never know when Google is going to update, and if you didn't catch the comments fast enough, now Google thinks you are linking to a "bad neighbourhood".

2- Cloaking is NOT an automatic ban. It depends WHAT you cloak. If you cloak in order to help the bot properly categorize your page you won't get banned. For example, if you return the "printer friendly" version of your page to bots you won't get banned. The printer friendly wouldn't have all the ads, login links, etc., that would only confuse a bot, but otherwise identical content to the user page. What WILL get you banned is cloaking your CONTENT.

Stated this way, without the word "cloaking" that gets everyone's back up, it's less controversial: "I only return feature and media rich pages to IE, Firefox, and Opera. All other user agents get the printer-friendly version." That is not abuse. That is helping blind people, people with text only browsers, and bots, access your website.

Status: offline

RickW

Forum User
Full Member
Registered: 01/28/04
Posts: 240
Location:United States
Quote by martingale: RickW, good points, just two comments:

1- The nofollow thing needs to be on the links IN the comments. Comments get pasted at the end of stories and other places, and the search engine will still find them. Yes, you should delete them fast, etc., but you never know when Google is going to update, and if you didn't catch the comments fast enough, now Google thinks you are linking to a "bad neighbourhood".


Oh I see what you're saying, you're talking about Google's new tag. In case others don't know about this, this is what the new tag looks like:

Text Formatted Code
<a href="http://www.geeklog.net/" rel="nofollow">The GeekLog CMS</a>

 


The initiative is being supported by other search engines. It's a good idea.

2- Cloaking is NOT an automatic ban. It depends WHAT you cloak. If you cloak in order to help the bot properly categorize your page you won't get banned. For example, if you return the "printer friendly" version of your page to bots you won't get banned. The printer friendly wouldn't have all the ads, login links, etc., that would only confuse a bot, but otherwise identical content to the user page. What WILL get you banned is cloaking your CONTENT.


Well we can agree to disagree. Smile I think if you want to help the bot categorize your page, you use h1 and bold where needed, keep your most important content closer to the top of the page, and assign alt text to images accordingly. Or, add this to your article template:

Text Formatted Code
<meta name="googlebot" content="noindex, nofollow, noarchive">
<meta name="msnbot" content="noindex, nofollow">

 


And then let the spiders follow your printer friendly page. Razz

Stated this way, without the word "cloaking" that gets everyone's back up, it's less controversial: "I only return feature and media rich pages to IE, Firefox, and Opera. All other user agents get the printer-friendly version." That is not abuse. That is helping blind people, people with text only browsers, and bots, access your website.


If you really want to do it this way, you could create a 2nd theme that is the SEO-friendly/text-only/blind-friendly version, then put in the PHP that determines whether it's a spider crawling your site, and serve them that theme instead. Although I'm not sure what PHP you could use to determine whether a blind person is visiting your site. Neutral
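The spider check itself is simple enough to sketch (in Python for brevity; the signature list is illustrative and would need to track real crawler User-Agent strings):

```python
# User-Agent substrings for the major crawlers and text browsers
BOT_SIGNATURES = ('googlebot', 'msnbot', 'slurp', 'lynx')

def is_text_agent(user_agent):
    """True if the request likely comes from a spider or a
    text-only browser, so the plain theme can be served."""
    ua = (user_agent or '').lower()
    return any(sig in ua for sig in BOT_SIGNATURES)
```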
www.antisource.com

martingale

Anonymous
Blind users can be identified by their user agent, just like Googlebot; maybe by some other headers sent as well.

As for cloaking... I have sent a query to Google search asking whether they would consider swapping the ads for other materials to be Black Hat territory. Once I get a response I will post it here (G is usually pretty good about answering emails; I've mailed them before and usually get a response in 5-6 days).

I agree with the comments about metas... mostly ignored these days. The main optimization that can hopefully be communicated to theme designers is to stop using div/class for titles and headlines and to go back to regular old H1, H2 that the spiders understand. Also, the first link on every page should be a link to either a sitemap or the home page, etc. These are all trivial template edits... but it would be nice to have them out of the box. Many ppl probably don't know to make these changes, so GL ought to ship with them.

Status: offline

knuff

Forum User
Full Member
Registered: 12/17/04
Posts: 340
Location:Sweden
Quote by martingale: The main optimization that can hopefully be communicated to theme designers is to stop using div/class for titles and headlines and to go back to regular old H1, H2 that the spiders understand.


Isn't this a little chicken-and-egg?
The major search engines are continuously updating the way they index and rank pages by their content.
In the old days, H1 and H2 were considered the most important part of getting indexed; then at a certain point the value of these tags was seriously limited and the focus moved to the actual content, with pages that contained too many of these tags and no actual content actually being ignored.

Maybe today it will again give you some serious advantage, but unless you are a major player fighting for ranks 1 and 2 on very popular search words, the use of H1 will only give you a minor advantage, in my opinion (I must admit I gave up on spider optimization a while ago, and only do the basic stuff).

So if, as a theme designer, you have to keep these in mind, and in the near future they get penalized by the spider bots, you have to adapt them again (the chicken and the egg).

So why adapt to the spiders, since the spiders are continuously adapting to the changing ways webpages are created and the ways people try to spoof them?

Anyway, the default theme Professional already does the job and includes the H1 tags around the title.

I just think that if you want to play with CSS-driven templates (especially the second generation), the use of div is easier and more flexible, and most probably in due time the spiders will adapt (if not already).

Quote by martingale: Also that the first link on every page be a link to either a sitemap or the home page, etc ...


Just as Professional comes with H1 for titles, most other themes have the menu bar under the title graphic, which is in fact a direct link to your index page.

So if the Professional theme would move the title menu under the Geeklog logo and add the siteindex plugin to the footer, it would all come out of the box Mr. Green

Anyway, I think Geeklog already does a great job with regard to getting ranked in search engines. I recently moved a static site with content that has been online for a few years to a Geeklog-enhanced site, and the ranking rocks. From nowhere to number 2 for "pizzabullar".

So although I agree it can be optimized (what can't?), I do think a vanilla Geeklog does a great job already.

Just my two cents
Vanrillaer.com - our Family Portal

martingale

Anonymous
So I got a complicated answer to my email to Google about whether it would be OK to cloak AdSense ads. The answer is this:

-- Dropping the AdSense ads probably would be OK with Google search, provided the other content stayed the same

-- Adding any other content (describing my site) would NOT be OK (probably)

"Probably" because they don't comment on specific sites, but their answer seemed to imply all of that.

However, they also forwarded my query to the AdSense folks, and the AdSense people frowned on the idea of AdSense not being returned to Googlebot, but didn't actually come right out and say it violated their TOS. Basically, you don't screw with AdSense, so that's really a "don't do it".

However, you might be able to do this with some OTHER ad service, since it was the Google AdSense people who had a problem with this, not the Google search people.

