
Geeklog Forums

Google sitemaps for Geeklog?


jlhughes

Forum User
Full Member
Registered: 04/25/02
Posts: 154
curious
Google has announced a new service that lets site owners direct its crawlers when indexing sites. You can read more about it at https://www.google.com/webmasters/sitemaps/docs/en/about.html.

Would it be possible to adapt the Site Map plugin to work with Google's sitemap feature?

tagstar

Anonymous
You beat me to the post. Yes, absolutely need this feature. Google crappily indexes my site!

ScurvyDawg

Forum User
Full Member
Registered: 11/06/02
Posts: 523
Well Google indexes my site perfectly but anything that would assist them in doing so better would be a good thing.

btw: If you have trouble with Google indexing your GL you may need to make changes to your theme. There are many good tips on this in these forums.

I look forward to hearing how other people's tests with this new Google tool go. Unfortunately, I will not be able to test it for at least a week.

By then I expect others will have had their say about it already.


Anonymous coward

Anonymous
Wow -- I had just come here to write about this topic, and everybody beat me to it.

I will be one of those working on some cheap implementation. I am sure somebody will do it better, but it will be a fun weekend project.

THEMike

Forum User
Moderator
Registered: 07/25/03
Posts: 141
Location:Sheffield, UK
I've been looking at the possibility of using the 1.3.12 (CVS) syndication system to generate the sitemaps. Not sure if it's going to work, or whether it could be done via a plugin. I suspect it needs to be part of core if it's included.

tagstar

Anonymous
Google sucks at indexing, so the sooner this can be added to Geeklog the better.

realpanama

Anonymous
Here is my admittedly temporary solution for generating Google sitemaps. Just my two cents. It only works for articles, nothing else yet... read on for my fix to that.

Because I am very new to Geeklog, and hardly know any PHP, I cobbled together this procedure.

I tried using the automated sitemap creator tools... they don't seem to like GL and do not index. Using the google-supplied script is useless too, so...

My solution has a hundred ugly things about it, but it provides a starting point for the creation of a true sitemap solution by a real programmer...

First, create a static page:
Title: whatever you want, I named mine xmlsitemap
Add to Menu: leave unchecked
Label: whatever you want - it won't be used.
Page format: BLANK PAGE
ID: sitemap.xml
(I use the rewrite URL option... I thought this would be good, then I learned Google Sitemap wants the file on the ROOT of the website... we'll get to that later.)

This code refers to my website, www.realpanama.org, make the appropriate changes using your site name and table names. For some reason, if I use variables for site URL, etc., I get SQL errors so I resorted to using the actual table names. Somebody care to enlighten me?
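
One possible explanation for those SQL errors, offered as an assumption rather than a confirmed diagnosis: if the static page code is evaluated inside a function, Geeklog's global arrays such as $_TABLES and $_CONF are not visible there unless they are declared with `global`, so the table name comes out empty. The sketch below reproduces that effect on its own; `run_static_page` and the sample array are hypothetical stand-ins, not Geeklog code.

```php
<?php
// Hypothetical stand-in for Geeklog's global table-name array.
$GLOBALS['_TABLES'] = ['stories' => 'gl_stories'];

// Stand-in for a CMS that runs page code with eval() inside a function.
function run_static_page(string $code)
{
    return eval($code);
}

// Without "global", $_TABLES is an undefined local variable inside the function,
// so the query ends up with no table name.
$without = 'return "FROM " . (isset($_TABLES) ? $_TABLES["stories"] : "");';

// With "global", the page code sees the real array.
$with = 'global $_TABLES; return "FROM " . $_TABLES["stories"];';

echo "without global: [" . run_static_page($without) . "]\n";
echo "with global:    [" . run_static_page($with) . "]\n";
```

If this is the cause, adding a line such as `global $_CONF, $_TABLES;` at the top of the static page code would be the thing to try before falling back to hard-coded names.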

The original code I "stole" from pigstye.net. Thanks to TomW who created it... My original intention was to create an HTML page as the first page (www.realpanama.org/index.html), because it is totally different from the rest of the site... but I still wanted to get a listing of stories and to feature the top 4 stories on the site. And because I had trouble getting a static page to do it... this came about.

Text Formatted Code
    // Runs inside a Geeklog static page set to EXECUTE PHP; the page returns $retval.
    // The site URL and table name (gl_stories) are hard-coded; change them for your site.
    $eol = chr(13) . chr(10);
    $site = 'http://www.realpanama.org';
    $result = DB_query("SELECT sid, title, UNIX_TIMESTAMP(date) AS day FROM gl_stories WHERE (date <= NOW()) AND (draft_flag = 0) ORDER BY date DESC");
    $nrows = DB_numRows($result);
    $retval = '<?xml version="1.0" encoding="UTF-8"?>' . $eol;
    $retval .= '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84"' . $eol;
    $retval .= 'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"' . $eol;
    $retval .= 'xsi:schemaLocation="http://www.google.com/schemas/sitemap/0.84' . $eol;
    $retval .= 'http://www.google.com/schemas/sitemap/0.84/sitemap.xsd">' . $eol;

    // The front page is not a story, so it is added by hand with today's date.
    $retval .= '<url>' . $eol;
    $retval .= '<loc>' . $site . '</loc>' . $eol;
    $retval .= '<lastmod>' . date('Y-m-d') . '</lastmod>' . $eol;
    $retval .= '<changefreq>daily</changefreq>' . $eol;
    $retval .= '<priority>1.0</priority>' . $eol;
    $retval .= '</url>' . $eol;

    // One <url> entry per published story.
    for ($i = 1; $i <= $nrows; $i++) {
        $A = DB_fetchArray($result);
        $retval .= '<url>' . $eol;
        // htmlspecialchars() keeps characters such as & from breaking the XML.
        $retval .= '<loc>' . $site . '/article.php/' . htmlspecialchars($A['sid']) . '</loc>' . $eol;
        $retval .= '<lastmod>' . date('Y-m-d', $A['day']) . '</lastmod>' . $eol;
        $retval .= '<changefreq>weekly</changefreq>' . $eol;
        $retval .= '<priority>0.5</priority>' . $eol;
        $retval .= '</url>' . $eol;
    }
    $retval .= '</urlset>' . $eol;

    return $retval;
 


Centerblock: leave unchecked.
In a block: leave unchecked.
Set to: EXECUTE PHP
Exit type: leave unchecked.

Save your static page.

Now, in my site I get the following static page:

sitemap.xml

I thought I was done and was ready to submit to Google when I realized that Google requires the pages listed in a sitemap to be in the same folder as the sitemap file, or below it. Not good!
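
The location rule described above can be checked before submitting. The following is only a sketch: `urls_outside_sitemap_folder` is a hypothetical helper, the example.com addresses are placeholders, and the check is a plain string-prefix comparison rather than anything Google publishes.

```php
<?php
// Returns the URLs from $locs that do not live at or below the sitemap's folder.
// Hypothetical helper for illustration; not part of Geeklog or Google's tools.
function urls_outside_sitemap_folder(string $sitemapUrl, array $locs): array
{
    // Folder of the sitemap, e.g. http://www.example.com/staticpages/index.php/
    $base = substr($sitemapUrl, 0, strrpos($sitemapUrl, '/') + 1);
    $outside = [];
    foreach ($locs as $loc) {
        // Append '/' so a bare "http://host" still matches "http://host/".
        if (strpos(rtrim($loc, '/') . '/', $base) !== 0) {
            $outside[] = $loc;
        }
    }
    return $outside;
}

$locs = [
    'http://www.example.com',
    'http://www.example.com/article.php/20050718',
];

// Sitemap at the site root: every listed page is at or below it.
print_r(urls_outside_sitemap_folder('http://www.example.com/sitemap.xml', $locs));

// Sitemap served from the static pages URL: both pages fall outside its folder.
print_r(urls_outside_sitemap_folder('http://www.example.com/staticpages/index.php/sitemap.xml', $locs));
```

Because the comparison includes the host name, a sitemap address without "www" would also flag pages that are listed with "www".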

I lack the ability to change the URL, so I had to do a Server Side Include trick.

Create a blank file named "sitemap.shtml", and put in the following content (should work on most servers).


Text Formatted Code
<!--#include virtual="/staticpages/index.php/sitemap.xml" -->
 
(In Firefox it looks ugly, but it looks OK in IE... go figure... probably something that I did wrong, but I am sure someone will point it out.)

Google doesn't seem to care if you offer a file named "sitemap.shtml", and it accepted my offering.

BUT... if you really want it to be named something like "sitemap.xml", then you must modify your .htaccess file on your root directory and make it look like this:

Text Formatted Code
ErrorDocument 404 /siteindex.php
AddType text/x-server-parsed-html .html .htm .shtml .xml
 


(btw, the first line has nothing to do with this, it is my way of sending people that might type a wrong or dead address to the GL sitemap page)

That modified .htaccess file now allows for server parsing of html, htm, shtml and XML files... you see where I am going with this?

Rename your sitemap.shtml file to sitemap.xml and you are done.

Submit it to Google Sitemaps.

There, that should do it.

I have checked MY xml file with xml checkers and they all say the file checks out.

I am sorry I need to do this in such a roundabout way... I just don't know a way to get it to do this *cleanly*... perhaps some more knowledgeable people in this forum will review this and update it.

If I was to add other things, I would fool around with the SQL query that gives life to this thing, but for now, all I wanted or need indexed is the articles themselves and the main page.

You will notice the main page, www.realpanama.org itself would not be generated by the code. I hacked that by simply adding it manually. It is the first section of the code. I use the php date function to make sure it always says the day's date and gave it a priority of 1.0 because it is the most important page. Everything else is set to 0.5, but you make the call and change it to whatever suits you.

If it works for you, I'd appreciate it if you posted about it in this thread. It is very much a work in progress. It still needs several things, like making sure bad characters don't get through. I will add them, and when I have them working, I will update this thread.

If I wanted to do it cleanly I would:

1. make it a real php script.
2. use php to write out a file from the return value and put it wherever I wanted.
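
A minimal sketch of those two steps, under stated assumptions: the story rows are passed in as a plain array (in a real script they would come from the database query), and `build_sitemap`, `sitemap_entry`, the example.com URL and the output path are all hypothetical names chosen for illustration.

```php
<?php
// Builds a Sitemap 0.84 document from story rows and returns it as a string.
// $stories: list of ['sid' => string, 'day' => Unix timestamp]; the key names are
// assumptions that mirror the columns selected in the static page code.
function build_sitemap(string $siteUrl, array $stories): string
{
    $xml = '<?xml version="1.0" encoding="UTF-8"?>' . "\n"
         . '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">' . "\n";
    // Front page first, dated today, with the highest priority.
    $xml .= sitemap_entry($siteUrl, time(), 'daily', '1.0');
    foreach ($stories as $story) {
        $loc = $siteUrl . '/article.php/' . rawurlencode($story['sid']);
        $xml .= sitemap_entry($loc, $story['day'], 'weekly', '0.5');
    }
    return $xml . '</urlset>' . "\n";
}

function sitemap_entry(string $loc, int $timestamp, string $freq, string $priority): string
{
    // Escape &, <, > and quotes so "bad characters" cannot break the XML.
    $loc = htmlspecialchars($loc, ENT_QUOTES | ENT_XML1, 'UTF-8');
    return "<url>\n<loc>$loc</loc>\n<lastmod>" . gmdate('Y-m-d', $timestamp)
         . "</lastmod>\n<changefreq>$freq</changefreq>\n<priority>$priority</priority>\n</url>\n";
}

$stories = [['sid' => '20050718123456', 'day' => 1121688000]];
$xml = build_sitemap('http://www.example.com', $stories);

// Written to the temp directory here; in practice this would be a path in the web root.
$target = sys_get_temp_dir() . '/sitemap.xml';
file_put_contents($target, $xml);
echo $xml;
```

Writing a real file means the server hands out sitemap.xml like any other static file, so the SSI and .htaccess steps would no longer be needed; the script would have to be re-run (for example from cron) whenever stories change.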

Someday...

Finally, this code is working on a GL 1.3.11. Have to add that "sr1" soon...

Thanks.

realpanama.org


realpanama

Anonymous
Forgot to add/meant to say.

That file, sitemap.shtml or sitemap.xml must be saved at the root of your site.

That way, Google can index all pages. If you put it somewhere else, Google won't like it.

Thanks

www.realpanama.org

ronack

Forum User
Full Member
Registered: 05/27/03
Posts: 612
I found this third-party PHP Google Sitemap Generator, which seems to work pretty well. I first tried the RC1 version, which didn't work too well, so I then tried the stable version, which worked too well. But at least it works, and here are my suggestions.

First, follow the directions for installation. Basically, unzip it and put the directory in your Admin directory, then copy sitemap.xml to your root. That's all there is to installation.

Next, navigate to the new directory http://Your_domain.com/admin/phpSitemapNG/index.php.

Make sure the paths are correct.

It has a list of directories and files to disallow, and here is my suggestion: start out by disallowing everything, run it, and take a look at sitemap.xml in your root. Then go back and allow the files and directories you think you want, and run it again until you have what you want listed. I ran it with almost everything allowed and maxed it out. It will capture everything. That is why I suggest disallowing everything first.

Once you have what you want in the site map, you then will have to submit it to Google.

realpanama

Anonymous
Update -- My Google XML feed was successfully picked up.

At least we know it works.

If you don't want to go through all my steps, here is another solution that you do not have to install at all on your server or change files or... do a lot of stuff.

http://www.auditmypc.com/free-sitemap-generator.asp

This will run a Java applet on your machine and index your entire site. Very neat, and it worked for both a Miva Merchant site with 220 product pages that I am working on and an old-fashioned html website with 1200+ pages that I can only access through FrontPage and FTP, but not telnet.

The html site runs on an NT server that I don't have much control over, and I could not get the php scripts to execute, so this one did a bang up job of indexing.

It did it over my adsl connection of 256K in just a few minutes for the small site and a little longer for the 1200 page site.

It generates both an XML and a TXT file for you.

Google took my submission and 12 hours later had accepted it (OK).

My first submission had an "error"... The code was good, but when I went to the Google Sitemaps page, I entered my site as "http://realpanama.org" instead of "http://www.realpanama.org". It rejected the site because all the pages listed in the XML file had the www prefix. Important to know!!!

Take care.

tagstar

Anonymous
I was hoping for an easy generator like the RSS feed generator, altered slightly to make sitemaps. Is that a better way to go?

tagstar

Anonymous
Noticed that whatever Google Sitemaps does, it didn't index your images.

ronack

Forum User
Full Member
Registered: 05/27/03
Posts: 612
All my images were indexed when I used the phpsitemap generator.

Larry

Anonymous
Why bother putting images in the sitemap? If your site is graphics oriented perhaps this makes sense. People looking for images in search engines normally do so because they want to use/steal them.

I created a sitemap nine days ago. Google checks this file regularly. I don't see where it has improved the search engine's indexing on my site. Their bot takes pages assigned lower priorities before those with higher ones and pages marked as changing yearly over those which change daily. It also continues to request pages I purposely left out of the sitemap before finishing those which are in it.

Maybe eventually this feature will prove useful. From what I've watched so far, I don't think it matters if a site has one of these sitemaps or not.

ronack

Forum User
Full Member
Registered: 05/27/03
Posts: 612
I should qualify what I said, the PHP Sitemap Generator initially indexed my images. I have since included the Image Directory in the disallowed list.

I don't see the need to index images either, unless you are offering those images for download. I have one person who creates graphic pictures and he allows folks to download them, so I could understand his site being indexed for images.

frisco3

Forum User
Junior
Registered: 02/06/04
Posts: 23
Location:Burlington Vermont
I must be missing something. I tried using the phpsitemapNG and it created a sitemap of all my geeklog files, NOT my articles. Clearly, I want it to do the opposite.

Since the articles live on a dB, not as files in my directory, does this tool even work for articles?

ronack

Forum User
Full Member
Registered: 05/27/03
Posts: 612
Yeah, that's a problem with many search engines, especially Google. Although Google will spider some articles, it has a hard time of it. It has a lot to do with the "?" and "=" in the URLs, and Google also doesn't like more than 3 arguments.

What I found was that Google would hit my site, index one page, then leave, only to return a few minutes later, do another one, then leave. Google and other SEs spent HOURS on my site.

One way around it is the Apache rewrite, but I didn't have that available because I'm not using Apache, so I couldn't begin to tell you how to set that up.

Now what I did, which seems to be working at least for now, is the PHP page redirect. I have a directory with about 400 redirects which I include in the sitemap. Of course, all of those are for the shopping cart, but it would work for stories too. I use it as a short URL, like the tinyurl site.

This is what it looks like.

Text Formatted Code
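<?php
// Redirect stub: forwards this short address to the real query-string URL.
$shorturl = 'http://www.nc-firefighter.com/staticpages/index.php?page=photogallery';
header("Location: $shorturl");
exit; // stop here so nothing is sent after the redirect header
?>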
 
So far it seems that Google doesn't mind this type of redirect. Of course, the Google folks change things daily, and what works today may not work tomorrow.

And you could probably develop some code to automagically create the files.
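
As a rough sketch of what such generating code might look like: the function name `write_redirect_stubs`, the short-name-to-URL table and the example.com address are all hypothetical, and the output directory is a temp folder rather than a real web directory.

```php
<?php
// Writes one redirect stub per entry: "<name>.php" redirects to the long URL.
// $map is a hypothetical short-name => long-URL table; returns the files written.
function write_redirect_stubs(string $dir, array $map): array
{
    $written = [];
    foreach ($map as $name => $url) {
        // Keep only safe file-name characters in the short name.
        $file = $dir . '/' . preg_replace('/[^A-Za-z0-9_-]/', '', $name) . '.php';
        // var_export() quotes the URL as a valid PHP string literal.
        $stub = "<?php\nheader('Location: ' . " . var_export($url, true) . ");\nexit;\n";
        file_put_contents($file, $stub);
        $written[] = $file;
    }
    return $written;
}

// Temp folder for the demonstration; a real run would target a web-visible directory.
$dir = sys_get_temp_dir() . '/redirects_' . uniqid();
mkdir($dir);
$files = write_redirect_stubs($dir, [
    'photogallery' => 'http://www.example.com/staticpages/index.php?page=photogallery',
]);
echo file_get_contents($files[0]);
```

The same table of short names could then be reused to list the short URLs in the sitemap.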

realpanama

Anonymous
Frisco et al,

Quote by frisco3: I must be missing something. I tried using the phpsitemapNG and it created a sitemap of all my geeklog files, NOT my articles. Clearly, I want it to do the opposite.

Since the articles live on a dB, not as files in my directory, does this tool even work for articles?


That's specifically why I "wrote" (hacked, stole code to make it work) my solution. I never even thought phpsitemapNG or a tool like it would actually do it, unless it queries the server by "connecting to it from outside".

By the way... My Google PR was 0 a few days ago. Now it is reporting "3", and this tool, Future PageRank (http://www.seochat.com/seo-tools/future-pagerank/) from www.seochat.com, says it is going to be 5 very soon.

But then again, I have talked important websites with high rankings into listing me in exchange for a link. We are very specialized in our subject area (prisoners in Panama), so that also has something to do with it, IMHO.


Larry

Anonymous
The sitemap does not help PR. Links from other sites, especially high ranking ones, definitely do. Thanks for the SEO tool link.

I didn't use phpsitemapNG. Although it's a nice tool, as you pointed out it doesn't create urls for dynamically generated content. What I did was write my own script which queries the database then creates urls to such content. I've made so many changes to Geeklog this version wouldn't work on a regular setup.

ronack

Forum User
Full Member
Registered: 05/27/03
Posts: 612
I went ahead and tried your solution and it didn't work. I don't know if I have something set wrong or it's my server.

First my static page looks like this.

http://www.miplanet.com/staticpages/index.php?page=sitemap.xml

Not this

http://www.miplanet.com/staticpages/index.php/sitemap.xml

When I go to the page all I get is this

http://www.miplanet.com 2005-07-18 daily 1.0 http://www.miplanet.com/article.php/200501252251582 2005-01-25 weekly 0.5

It doesn't look like the XML needed for Google. It doesn't look like it's picking up the formatting.