Geeklog Forums
Desperately Need mod_rewrite hack for GL 1.4.1
Status: offline
winnerdk
Forum User
Full Member
Registered: 04/24/05
Posts: 339
Location:Panama City, Republic of Panama
I'm looking into the need for an Apache mod_rewrite hack. The most recent information I can find goes back to 2004, and I need something more current.
I own and have been working to build http://www.panama-guide.com since August 2004 and now have more than 3,700 articles in the database. The site is very popular and regular fans keep returning. I've got more information in one place about Panama in English than any other site on the Internet.
But Google is not indexing my site well, and my PageRank goes up and down, usually between 4 and 5.
So the obvious question is: if I have such a content-rich site, how come I'm not getting rewarded in the Google, Yahoo, MSN (etc.) search engine game? I've got hundreds of times more and better content than most of the other sites out there (in English, about the Republic of Panama), but many nickel-and-dime sites are beating me every day in the rankings.
The reason, I'm learning, is that as far as Google and most of the other search engines are concerned, most of the articles on my site simply don't exist. Or, better said, the search engines are not indexing them well.
For example, I've got more than 80 articles in the "Law and Lawyers" topic category. But unless Google happens to index my site while those articles are on the front page, they basically don't exist as far as Google is concerned. When I examine the site with spider and crawler simulators, they can only "see" the links and articles that are on the homepage. Most of the site is ignored and very poorly indexed.
I know this has been an issue or a problem for years. I feel this should be a critically important issue for the future of Geeklog. Or better said, it's a critically important issue to me and my continued use of Geeklog. I'm doing everything I can think of to work around this issue: robots.txt files, site maps, site tweaks with IBP and Arelis, meta tag analysis, etc. Nothing I do works, and it can't work, because the problem is apparently built into the Geeklog CMS.
In short, I either have to get it fixed or (unfortunately) migrate to another CMS that has addressed this problem. As much as I like Geeklog, and as much as I don't want to migrate, I will have to unless I can figure out a workaround or fix. I have invested literally thousands of hours in building http://www.panama-guide.com and I've been on Geeklog since day one. The thought of possibly risking my database by having to go through a migration scares the heck out of me. I would much rather have a Geeklog fix.
Can anyone help? I need an update to the old plug-in/hack from 2004 that I ran into on a demo site somewhere in the past couple of days. It was really little more than a packaged mod_rewrite hack, and it came out in about the 1.3.9 timeframe. I looked at the files but was afraid to stick it into 1.4.1.
And I don't know enough to hack the mod_rewrite myself. I've spent days confirming my ignorance. And I really need to get this fixed.
Can anyone tell me what I need to do? Help! :pray:
Status: offline
jmucchiello
Forum User
Full Member
Registered: 08/29/05
Posts: 985
Check your config.php. I know this has been around since 1.3.11 at least. What about this doesn't work for you?
// This feature, when activated, makes some of Geeklog's URLs more crawler
// friendly, i.e. more likely to be picked up by search engines.
// Only implemented for stories, static pages, and portal links right now.
//
// Note: Works with Apache (Linux and Windows successfully tested).
// Unresolvable issues with systems running IIS; known PHP CGI bug.
$_CONF['url_rewrite'] = false; // false = off, true = on
Status: offline
winnerdk
Forum User
Full Member
Registered: 04/24/05
Posts: 339
Location:Panama City, Republic of Panama
Sure, I have that turned on (always has been.) The problem I'm having is that it's "Only implemented for stories, static pages, and portal links right now."
So, what happens in effect is that as I produce and publish more content on the website, the older stuff gets "pushed off" into the topic categories. If Google or the other search engines don't happen to index my site while those articles are on the front page, then they (literally) never see them. And once the articles go off the front page and are behind the ? in the URLs for the topics, as in http://www.panama-guide.com/index.php?topic=law, the spiders can't get down into the article archives to index the entire site.
I use search engine simulators to see what the different spiders, bots, and crawlers can see, and sure enough they are only getting to about 4% of my website. Most of the database is practically invisible to them. Each individual article URL is rewritten (through the config file), but once the articles are off the front page the only way the search engines can get there is by following the topic category links. In trying to do that, they stop at the ? in the dynamically generated URL.
The same thing happens with the images in the system:
http://www.panama-guide.com/coppermine/thumbnails.php?album=12
So, I need a mod_rewrite hack to put into my .htaccess file that changes the ? to a / but I don't have the skills to write it myself.
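A rule of the kind being asked for here might look roughly like the following. This is a minimal, untested sketch and was not posted by anyone in this thread: the `topic/` and `album/` path prefixes are made-up names chosen for illustration, and the character classes are guesses at what topic IDs and album numbers contain. It assumes the .htaccess file sits in the site's document root and that mod_rewrite is enabled.

```apache
# Hypothetical sketch, not tested against Geeklog 1.4.1 or Coppermine.
<IfModule mod_rewrite.c>
    RewriteEngine On

    # Assumes the site lives at the web root; adjust if it is in a subdirectory.
    RewriteBase /

    # /topic/law  is served internally by  /index.php?topic=law
    # ("topic/" is an invented prefix; QSA keeps any extra query parameters.)
    RewriteRule ^topic/([A-Za-z0-9_-]+)/?$ index.php?topic=$1 [L,QSA]

    # /album/12  is served internally by  /coppermine/thumbnails.php?album=12
    # ("album/" is likewise an invented prefix.)
    RewriteRule ^album/([0-9]+)/?$ coppermine/thumbnails.php?album=$1 [L,QSA]
</IfModule>
```

Rules like these only make the friendly URLs resolve. Geeklog and Coppermine would still print the original `?topic=` and `?album=` links in their pages, so crawlers will only find the new URLs if the site's own links (or a site index page) point at them. Relative paths to images and stylesheets may also need checking once the URL depth changes.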
Status: offline
Dirk
Site Admin
Admin
Registered: 01/12/02
Posts: 13073
Location:Stuttgart, Germany
The article directory (which you don't seem to have linked on your site) should help a bit with that problem. As should Tom's site index script.
Check the FAQ: Geeklog and search engines
There's also a Google sitemap generator. And then there's this, but it's old and I never tried it myself ...
bye, Dirk
Status: offline
Laugh
Site Admin
Admin
Registered: 09/27/05
Posts: 1470
Location:Canada
I'm surprised Google doesn't find your articles. Google has always been able to find all pages on my sites. Of course, I have always had a generated site index page (I use a modified version of this: http://www.geeklog.net/filemgmt/index.php?id=128). A good thing to install is the GUS plugin (for stats). It will show you all the pages Google and other search engines spider.
Another useful tool is Google's Webmaster Tools
https://www.google.com/webmasters/tools/docs/en/about.html
This is where you can specify all sorts of things about your site for the Google spider, etc. The more info Google has on your site, in my opinion, the better. You may also want to try out Google Analytics as well.
It would be nice to get index.php to support URL rewrite. I believe someone has made a feature request for this already.
On a side note, I have done a few small experiments and have found that Google does seem to give higher PageRank to pages that have URL rewrite turned on. On a PR 3 site, the static pages went from PR 0 to PR 2 when URL rewrite was turned on and then back to PR 0 when it was turned off (this all took place over a few months). I don't know if it sends much more traffic, though. My major site has URL rewrite turned off. I would like to turn it on, but I am worried about the results of Google respidering the site, duplicate content issues, etc. I'm pretty sure it would help turned on, but this is a site-wide change, and if by chance it doesn't work out, I would be in trouble!
One of the Geeklog Core Developers.
Status: offline
winnerdk
Forum User
Full Member
Registered: 04/24/05
Posts: 339
Location:Panama City, Republic of Panama
Thanks. I just installed GUS (nice).
I've been looking at all of the Google Webmaster Tools and Analytics as part of my continuing effort to get all of the search engines to index my site better. What's driving me nuts is there are other CMS platforms out there that have this all figured out, and they have written in features that should be in Geeklog. For example, it would be nice (sweet) if the system would generate article specific meta keywords for every item posted, and identify them as such.
My main problem is that, as far as Google is concerned, my website is brand new practically every time I post an article. Crusty old webpages built with DOS 2.0 beat me in the rankings because they've been sitting there collecting dust for years. Meanwhile, I'm collecting up relevant data from all over the web and (basically) getting penalized for updating every day.
So, the good news is that I'm using a great CMS that can dynamically generate content with all the bells and whistles. The bad news is that the search engine algorithms don't like the way the articles are presented, and that's especially true for the data that's dropping off the homepage and into the topics.
I've thought about putting like 15 or 20 of the most popular articles on the homepage as static pages, and making the rest available in the topics as daily generated content. The search engines would love it, but I'd be throwing a curve ball at the 3,000 people or so who visit the site every day. I'll probably end up doing a combination of the two, like a migration to a different presentation that's both Geeklog and Google friendly. But with 3,700 articles in the database on a very tightly focused area, I should be kicking butt.
And, what I really (really) still want is a simple hack for the mod_rewrite thing that would fix my problem once and for all. I guess I'm just going to have to bang my way through it. If the lights go out in the North East, at least you'll know why.
Don
http://www.panama-guide.com
Status: offline
winnerdk
Forum User
Full Member
Registered: 04/24/05
Posts: 339
Location:Panama City, Republic of Panama
I'm on a server running Apache right now, and it's all set up to do the mod_rewrite (and it works) but the only thing I don't know how to do is to write the actual script for the .htaccess file to make the changes I need. At this point I'm working to learn enough about the different mod_rewrite scripts and commands to figure out what I need to do. Unfortunately I don't have any experience in this kind of scripting so I'm starting from step one and figuring it out as I go. I was hoping that someone else had already overcome this issue using a .htaccess script, but from the lack of response it looks like I'm going to have to figure out how to write it myself.
Don
http://www.panama-guide.com
Status: offline
jmucchiello
Forum User
Full Member
Registered: 08/29/05
Posts: 985
I don't use mod_rewrite, so I can't help you with actual samples. You need to learn about regular expressions (regex) in order to understand how mod_rewrite works, though. Google "regex tutorial mod_rewrite" for some promising examples.
Status: offline
winnerdk
Forum User
Full Member
Registered: 04/24/05
Posts: 339
Location:Panama City, Republic of Panama
Thanks. I'm punching through it today. Hopefully by this afternoon I'll have something up that works.
Don
http://www.panama-guide.com
Status: offline
jmucchiello
Forum User
Full Member
Registered: 08/29/05
Posts: 985
So why not send them an email and ask if they can send you a copy of their rewrite rules?
Status: offline
winnerdk
Forum User
Full Member
Registered: 04/24/05
Posts: 339
Location:Panama City, Republic of Panama
I wrote to them, hoping for a response. The problem is that the site appears to have been pretty much abandoned since about mid-2005 and is probably running an old version of Geeklog. I need something that will work on the most recent version.
Don
http://www.panama-guide.com
Status: offline
Blaine
Forum User
Moderator
Registered: 07/16/02
Posts: 1232
Location:Canada
Don, I know you are using glMenu on your site and wanted to suggest that glMenu 2.x may work better as it's a pure CSS based version of the menu plugin so all the menu contents (links) are in the page content and will be indexed better by the search engines. The new version glMenu 2.1 is available as free download for all v1.x owners.
Geeklog components by PortalParts -- www.portalparts.com
Status: offline
winnerdk
Forum User
Full Member
Registered: 04/24/05
Posts: 339
Location:Panama City, Republic of Panama
Really? Cool. So, I'll kill the "Sections" section and link to all topics through glmenu. Rockin', like Janet Reno...
I'm tracking the search engines pretty closely with IBP and Arelis so I should be able to spot the difference (if there is one.) I'll let you know. Off to download the upgrade now. 1.0.7 was java, right?
Don
http://www.panama-guide.com
Yeh, glMenu 1.x uses the Milonic Javascript library which also needed to be licensed and effectively doubled the cost of the plugin. Milonic is a very flexible library and can give you a lot of control because it used the full power of Javascript but as seen on my site, the CSS Based menus that glMenu2 creates are not your basic rollover style menus. You can even move your mouse off the menu item (as in the drop down or slide out menu) and the menu stays open.
[begin commercial]
So it's faster, cleaner, and cheaper, but still has the flexibility of the advanced online administration and permission-based control for even the non-techie to create a custom menu for their site.
[/end commercial]
Geeklog components by PortalParts -- www.portalparts.com