Geeklog Forums

Calendar? What calendar?

02/06/05 07:19am (Read 16,997 times)

LWC

Forum User

Full Member

Registered: 02/19/04

Posts: 818

I've just taken a look at my site's log and saw that my most popular Geeklog page is my calendar.

I don't use my calendar...

Is there some known security flaw in Geeklog's calendar which makes spammers look for it or something?

02/06/05 07:29am

Dirk

Site Admin

Admin

Registered: 01/12/02

Posts: 13073

Location:Stuttgart, Germany

Did you check who was visiting the calendar? Googlebot seems to love it, for example ...

bye, Dirk

02/06/05 09:40am

LWC

Forum User

Full Member

Registered: 02/19/04

Posts: 818

Actually, today alone, my referer log looks something like this:

Text Formatted Code

http://www.google.com/search?q=... -> /article.php/...
http://my-own-domain/article.php/... -> /images/speck.gif
http://www.google.com/search?q=... -> /article.php/...
http://www.google.com/search?q=... -> /article.php/...
- -> /calendar.php
- -> /calendar.php
- -> /calendar.php

About 5 pages of "- -> /calendar.php" later...

- -> /calendar.php
- -> /calendar.php
- -> /calendar.php

And so on...

Until finally, when I checked it out myself:

http://my-own-domain/ -> /calendar.php

That's insane...

Also, since there's no referer, it's not even internal traffic (an internal hit would show my-own-domain as the referer, like the final entry). It's as if someone keeps going there manually or via their favorites!

Let's say I'm wrong and it is internal: surely Geeklog doesn't load it every time anyone visits ANY Geeklog page? Otherwise, why would it take up about 90% of my referer log?

02/06/05 10:16am

Dirk

Site Admin

Admin

Registered: 01/12/02

Posts: 13073

Location:Stuttgart, Germany

Don't you have access to your webserver's logfiles? Without that, it's all speculation ...

As I said, Googlebot and the other search engine spiders seem to love the calendar:

Text Formatted Code
lj2248.inktomisearch.com - - [05/Feb/2005:13:28:51 +0100] "GET /calendar.php?view=day&mode=&day=10&month=10&year=2003 HTTP/1.0" 200 17011 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
lj2035.inktomisearch.com - - [05/Feb/2005:17:00:20 +0100] "GET /calendar.php?month=1&year=2004 HTTP/1.0" 200 27367 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
lj2411.inktomisearch.com - - [05/Feb/2005:18:50:57 +0100] "GET /calendar.php?mode=&view=week&month=9&day=7&year=2003 HTTP/1.0" 200 11751 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

Etc., spread out over the entire day.

That's from a randomly selected logfile of one of my sites. It's Yahoo's spider in this case, but I've seen similar "walkthroughs" from other spiders, including Googlebot and msnbot.

bye, Dirk

02/06/05 11:32am

LWC

Forum User

Full Member

Registered: 02/19/04

Posts: 818

What do you mean? This was my webserver's logfile...
It's called referer.log and it's formatted like this:
left side -> right side

where "left side" is the referer and "right side" is the requested page.

So if "left side" is empty, there's no "official" referer. That means someone came in manually or via favorites, or it's someone who turned off his/her Referer header.
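A minimal Python sketch of tallying a referer log in the `referer -> page` layout described above, counting which pages are requested with an empty (`-`) referer. It assumes one entry per line, with the requested page being everything after the last `->`; the sample lines are taken from the excerpt earlier in this thread.

```python
from collections import Counter

def parse_referer_line(line):
    """Split one 'referer -> page' entry into (referer, page), or return None.

    Assumes the page is everything after the last '->' and that an empty
    referer is logged as '-'.
    """
    referer, sep, page = line.rpartition("->")
    if not sep:
        return None  # blank line or not a log entry
    return referer.strip(), page.strip()

def count_empty_referers(lines):
    """Count requested pages whose referer field is '-' (no Referer header sent)."""
    counts = Counter()
    for line in lines:
        entry = parse_referer_line(line)
        if entry is not None and entry[0] == "-":
            counts[entry[1]] += 1
    return counts

sample = [
    "http://www.google.com/search?q=... -> /article.php/...",
    "- -> /calendar.php",
    "- -> /calendar.php",
    "http://my-own-domain/ -> /calendar.php",
]
print(count_empty_referers(sample))  # Counter({'/calendar.php': 2})
```

A log in this layout can show *which* pages get referer-less hits, but not *who* made them; that needs the user agent, which this format keeps in a separate file.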

02/06/05 12:07pm

Dirk

Site Admin

Admin

Registered: 01/12/02

Posts: 13073

Location:Stuttgart, Germany

Quote by LWC: What do you mean? This was my webserver's logfile...

Sorry, never seen a logfile like that. Doesn't it list the user agent (aka browser)?

Quote by LWC: So if "left side" is empty, that means there's no "official" referer - which means someone who came in manually or via favorites, but also someone who shut off his/her referer's header.

If you check what I quoted from my logfiles above, you'll see that those also came in without any referrer (that's the "-" bit), which is common for search engine robots.

bye, Dirk

02/06/05 01:25pm

LWC

Forum User

Full Member

Registered: 02/19/04

Posts: 818

Your format is the "combined" one, which I guess is the most popular, but there are others like mine: two separate logs, a referrer log and a browser log.

But if they have no referer, what's inktomisearch.com?

Anyway, so what you're saying is that spiders, respected (e.g. Google, MSNBot, etc.) or otherwise (e.g. spammers), have no referers.
And they all come to stare at my blank calendar...

Well, maybe if I start filling in my calendar, my site will become popular.

02/06/05 01:48pm

Dirk

Site Admin

Admin

Registered: 01/12/02

Posts: 13073

Location:Stuttgart, Germany

Quote by LWC: what's inktomisearch.com?

They're owned by Yahoo and provide the search results for them.

Everyone's talking about Google vs. MSN, but Yahoo has quietly acquired a lot of search engine technology (Inktomi, Overture, AltaVista, ...) to do their own thing ...

Quote by LWC: Anyway, so what you're saying is that spiders, respected (e.g. Google, MSNBot, etc.) or otherwise (i.e. spammers) have no referers.

Actually, spammers DO have referrers. Have you never wondered about all those porn and poker sites supposedly linking to you? Referrer spam is also very popular these days.

bye, Dirk

02/06/05 03:23pm

LWC

Forum User

Full Member

Registered: 02/19/04

Posts: 818

That's all nice and well, but, as you could see, when it comes to the calendar, the referers are empty.

Of course the log contained spam referers, but none that came to the calendar (usually just to the front page).

Is it a coincidence that usually the referer is logged, unless the accessed page is the calendar?

For example, if someone had kept it in his/her favorites and clicked it 500 times, that would have explained it.

Like you said, if it was Googlebot, it would say something like:

Text Formatted Code

http://googlebot.google.com - -> /calendar.php

02/06/05 03:38pm

Dirk

Site Admin

Admin

Registered: 01/12/02

Posts: 13073

Location:Stuttgart, Germany

Quote by LWC: That's all nice and well, but, as you could see, when it comes to the calendar, the referers are empty.

Yes, because that's what the search engine spiders do. I thought we already covered that?

bye, Dirk

02/06/05 07:35pm

LWC

Forum User

Full Member

Registered: 02/19/04

Posts: 818

Then I assume "inktomisearch.com" was the visitor and not the referer (which I mistook it for in my earlier posts).
So in my case, it would have appeared only in the browser log.

Basically, that means that with separate logs like mine, one can never match referers with visitors.

Which means one can't pinpoint the visitors (those with empty referers) who visit the calendar...
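In the combined format Dirk posted, the host, referrer, and user agent all sit on the same line, so empty-referrer hits can be attributed to a client. A minimal Python sketch, assuming Apache's combined log layout; the sample lines are two of the Yahoo! Slurp entries quoted earlier in the thread.

```python
import re
from collections import Counter

# Apache "combined" layout:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def empty_referer_hits(lines, script="/calendar.php"):
    """Count requests for `script` that had no referrer, keyed by user agent."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m is None:
            continue  # not a combined-format line
        path = m.group("path").split("?", 1)[0]  # drop the query string
        if path == script and m.group("referer") == "-":
            counts[m.group("agent")] += 1
    return counts

SLURP = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
log = [
    'lj2248.inktomisearch.com - - [05/Feb/2005:13:28:51 +0100] '
    '"GET /calendar.php?view=day&mode=&day=10&month=10&year=2003 HTTP/1.0" '
    '200 17011 "-" "' + SLURP + '"',
    'lj2035.inktomisearch.com - - [05/Feb/2005:17:00:20 +0100] '
    '"GET /calendar.php?month=1&year=2004 HTTP/1.0" 200 27367 "-" "' + SLURP + '"',
]
for agent, hits in empty_referer_hits(log).items():
    print(hits, agent)
```

Both calendar hits here carry no referrer and are attributed to Yahoo! Slurp, which is the kind of match a split referer/browser log can't make.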

03/07/05 12:56pm

tstockma

Forum User

Full Member

Registered: 07/22/03

Posts: 169

Some answers, and a question

Spiders and search engines try to find all your pages, and every day on your calendar is a link, even if you have nothing listed. They'll cheerfully spend eternity following all the "links" your empty calendar contains.
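To put a rough number on "eternity": a back-of-the-envelope Python sketch of how many distinct calendar URLs a spider can reach. It assumes, hypothetically, one day-view URL and one week-view URL per date plus one month-view URL per month, matching the `view=day`, `view=week`, and month URLs in Dirk's log excerpt; extra parameters such as `mode=` would only multiply the count, and "next month" links keep the range open-ended.

```python
import calendar

def calendar_url_count(first_year, last_year):
    """Rough count of distinct calendar.php URLs between two years, inclusive.

    Hypothetical model: one day-view URL and one week-view URL per date,
    plus one month-view URL per month.
    """
    years = range(first_year, last_year + 1)
    dates = sum(366 if calendar.isleap(y) else 365 for y in years)
    months = 12 * len(years)
    return 2 * dates + months

print(calendar_url_count(2002, 2005))  # 2970
```

Even under this conservative model, four years of an empty calendar come to roughly three thousand crawlable pages.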

You can use robots.txt to exclude search engines from your calendar entirely, but if you have some events, you'll exclude those events from the SEs.

My question: how can I turn off the link to a day on the main calendar page if there's nothing listed on that day? That link is what Calendar currently uses to allow entry of a new event. Could we instead set up a more generic "add calendar event" function that we could exclude with robots.txt, where you specify the date once you're in that process?

That would eliminate the SE hits we currently see. (It would also help my "broken links" searcher considerably, it's stupid and never stops looking at empty calendar days.)
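One way to implement Tom's idea, as a minimal Python sketch rather than Geeklog's actual calendar code: only days with events get a link, and adding an event goes through a single date-less link that robots.txt can block. The `event_days` set and the date-less `submit.php?type=event` link are assumptions for illustration; the day-view URL pattern is the one visible in the logs in this thread.

```python
import calendar
from html import escape

def month_cells(year, month, event_days):
    """Return one HTML table cell per day; only days with events get a link.

    `event_days` is a hypothetical set of day numbers that have at least one event.
    """
    cells = []
    last_day = calendar.monthrange(year, month)[1]
    for day in range(1, last_day + 1):
        if day in event_days:
            url = f"calendar.php?view=day&day={day}&month={month}&year={year}"
            cells.append(f'<td><a href="{escape(url)}">{day}</a></td>')
        else:
            cells.append(f"<td>{day}</td>")  # plain text: nothing for a spider to follow
    return cells

# Hypothetical single entry point for adding events; the date would be chosen
# on the form itself, so "Disallow: /submit.php" blocks it in one line.
ADD_EVENT_LINK = '<a href="submit.php?type=event">Add calendar event</a>'

print("".join(month_cells(2005, 3, {8})))
print(ADD_EVENT_LINK)
```

With this layout, a month with one event exposes one day link instead of thirty-one.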

Sorry if this has been asked before...thanks for any comments!

Tom
www.southparkcity.com

03/08/05 02:58pm

LWC

Forum User

Full Member

Registered: 02/19/04

Posts: 818

First of all, Dirk will be glad to know that my access log now also shows referrers (in addition to the specialized referrers log...). And I think you're right: a sample check showed that Googlebot is the main visitor of the calendar.

But the sample check also revealed a MUCH (and I mean MUCH) worse file: submit.php! And again, it's Googlebot that won't stop visiting it!

Which brings us to tstockma...
Here's a snippet from my robots.txt:

Text Formatted Code
User-agent: *
...
Disallow: /calendar.php
...
Disallow: /submit.php
...

And yet, these two are bombarded with hits...

Before you blame the file: I test my robots.txt with a personal search engine from time to time to make sure it's valid.
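Python's standard library ships a robots.txt parser, `urllib.robotparser`, which gives a quick offline check of what a file blocks. A minimal sketch using only the two `Disallow` lines visible in the snippet above (the rest of that file is elided there); `example.com` stands in for the real host. Note that this parser does not implement Google's `*` and `$` path wildcards.

```python
from urllib.robotparser import RobotFileParser

# Only the lines visible in the snippet above; the elided rules are left out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /calendar.php
Disallow: /submit.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(path, agent="Googlebot"):
    """True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, "http://example.com" + path)

for path in ("/calendar.php?month=1&year=2004",
             "/submit.php?type=event&month=04&day=11&year=2002&hour=16",
             "/article.php"):
    print(path, "allowed" if allowed(path) else "blocked")
```

`Disallow` is a prefix match, so the query strings the spiders append don't get around it.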

So if even a "respected" bot like Google's ignores robots.txt anyway, I don't know if it's worth coming up with a solution to your request.

BTW, I've known for a long time that Google ignores my robots.txt file (and I've never managed to change that), but what I didn't know was how much traffic its bot sends to those forbidden files!

03/08/05 03:12pm

Dirk

Site Admin

Admin

Registered: 01/12/02

Posts: 13073

Location:Stuttgart, Germany

Quote by LWC: BTW, I've known for a long time that Google ignores my robots.txt file (and never managed to change that), but what I didn't know was how much traffic its bot causes those forbidden files!

Hmm, I would be surprised if it ignored the robots.txt.

Are you sure your robots.txt is syntactically correct? Did you check it?

Also, are you sure it's really Googlebot and not some other bot claiming to be Googlebot? Check the IPs it's coming from - they should all belong to Google.
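Google documents a two-step DNS check for this: a reverse lookup of the IP must give a host under googlebot.com or google.com, and a forward lookup of that host must return the same IP. A minimal Python sketch of that check; the resolver functions are parameters (defaulting to the standard `socket` calls) only so it can be tried offline with fake lookups.

```python
import socket

GOOGLE_DOMAINS = (".googlebot.com", ".google.com")

def is_real_googlebot(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Reverse-then-forward DNS check for an IP that claims to be Googlebot.

    1. The PTR name for `ip` must end in googlebot.com or google.com.
    2. That name must resolve back to `ip`, so a forged PTR record fails.
    """
    try:
        host = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:                    # socket.herror / gaierror are OSErrors
        return False
    if not host.endswith(GOOGLE_DOMAINS):
        return False
    try:
        addresses = forward(host)[2]   # (hostname, aliases, addresses)
    except OSError:
        return False
    return ip in addresses
```

Called as `is_real_googlebot("66.249.65.137")` on a machine with DNS access, it performs the real lookups.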

bye, Dirk

03/08/05 04:33pm

LWC

Forum User

Full Member

Registered: 02/19/04

Posts: 818

A sample from the access log:

Text Formatted Code
crawl-66-249-65-137.googlebot.com - - [06/Mar/2005:00:06:19 -0700] "GET /submit.php?type=event&mode=&month=04&day=11&year=2002&hour=16 HTTP/1.1" 200 14840 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
crawl-66-249-65-137.googlebot.com - - [06/Mar/2005:00:21:17 -0700] "GET /submit.php?type=event&mode=&month=05&day=07&year=2002&hour=14 HTTP/1.1" 200 14840 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
crawl-66-249-65-137.googlebot.com - - [06/Mar/2005:00:28:14 -0700] "GET /submit.php?type=event&mode=&month=07&day=26&year=2002&hour=23 HTTP/1.1" 200 14840 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

That goes on for pages!
Hmm, looking at it, Googlebot never uses referrers.

And your site has approved my robots.txt
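To measure how much of that traffic goes to disallowed URLs, log lines can be checked against the rules. A minimal Python sketch combining a regex for the request and user-agent fields with `urllib.robotparser`; it assumes combined-format lines like the sample above and a cut-down robots.txt holding only the two rules quoted earlier, and `example.com` is a placeholder host. It only checks what the rules say, not when the crawler last fetched robots.txt.

```python
import re
from urllib.robotparser import RobotFileParser

# The request path and the last quoted field (the user agent) of a combined-format line.
REQUEST = re.compile(r'"(?:GET|HEAD|POST) (?P<path>\S+)[^"]*".*"(?P<agent>[^"]*)"$')

def disallowed_hits(log_lines, robots_lines, crawler="Googlebot"):
    """Paths that `crawler` requested even though robots.txt disallows them."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    hits = []
    for line in log_lines:
        m = REQUEST.search(line)
        if m is None or crawler.lower() not in m.group("agent").lower():
            continue  # not a request line, or a different client
        if not parser.can_fetch(crawler, "http://example.com" + m.group("path")):
            hits.append(m.group("path"))
    return hits

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
log = [
    'crawl-66-249-65-137.googlebot.com - - [06/Mar/2005:00:06:19 -0700] '
    '"GET /submit.php?type=event&mode=&month=04&day=11&year=2002&hour=16 HTTP/1.1" '
    '200 14840 "-" "' + GOOGLEBOT + '"',
    'crawl-66-249-65-137.googlebot.com - - [06/Mar/2005:00:21:17 -0700] '
    '"GET /submit.php?type=event&mode=&month=05&day=07&year=2002&hour=14 HTTP/1.1" '
    '200 14840 "-" "' + GOOGLEBOT + '"',
]
robots = ["User-agent: *", "Disallow: /calendar.php", "Disallow: /submit.php"]
print(len(disallowed_hits(log, robots)), "disallowed hits")
```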

03/08/05 04:49pm

Dirk

Site Admin

Admin

Registered: 01/12/02

Posts: 13073

Location:Stuttgart, Germany

Yep, that's Googlebot. In that case I think you should email them and tell them that Googlebot has been a bad boy.

If you go to this URL, it already has an option "Googlebot is overloading my servers".

Keep us posted, I'd be interested in their response.

bye, Dirk

03/08/05 05:31pm

Status: offline

LWC

Forum User

Full Member

Registered: 02/19/04

Posts: 818

I feel there's too much risk involved.

The "overloading" form says that submitting it will make Google come to my site less often.
That's not what I want at all! My site isn't updated often enough in Google as it is (the problems are that it ignores my robots.txt and never drops outdated pages)...

These are probably automated forms, so even if I explain that, the only one who'd read it is some robot, which won't care about anything I write other than that I filled in a "please send me less traffic" form.

03/08/05 10:56pm

RickW

Forum User

Full Member

Registered: 01/28/04

Posts: 240

Location:United States

Do what I did and remove the calendar files:

http://www.geeklog.net/forum/viewtopic.php?forum=3&showtopic=38627

I had the same problem: I wasn't using the calendar, it was just an empty feature I didn't want visitors to see, and Google was spidering it like crazy. I had 1000+ calendar pages in link:to-my-site.com, but they had zero PageRank because it was all redundant, and I had a feeling it was actually penalizing my site (Google is sensitive to spamming).

This is what my robots.txt file looks like:

Text Formatted Code
User-agent: *
Disallow: /comment.php
Disallow: /submit.php
Disallow: /profiles.php
Disallow: /calendar.php
Disallow: /usersettings.php
Disallow: /forum/createtopic.php

Even with a proper robots.txt file, and with the calendar removed, if you want to cover all your bases you could add this to your header template along with the other meta tags:

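Text Formatted Code

<?php
// Name of the script being served, e.g. "calendar.php".
// SCRIPT_NAME is used rather than PHP_SELF, because PHP_SELF also includes
// any path info (e.g. /article.php/some-story), which would change the basename.
$this_page = basename($_SERVER['SCRIPT_NAME']);

switch ($this_page) {
    // Pages that spiders should not index, follow, or cache
    case "comment.php":
    case "submit.php":
    case "profiles.php":
    case "calendar.php":
    case "usersettings.php":
    case "createtopic.php": // forum/createtopic.php
        echo "<meta name=\"robots\" content=\"noindex, nofollow, noarchive\">";
        break;
    // Everything else may be indexed normally
    default:
        echo "<meta name=\"robots\" content=\"index, follow\">";
}
?>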

It works; just for kicks, I'm testing it on my temp site now.

www.antisource.com

03/08/05 11:36pm

ronack

Forum User

Full Member

Registered: 05/27/03

Posts: 612

I just wanted to share my robots.txt file. I watched that blasted Googlebot and really don't want it in some areas. Oh, and MSNbot was the worst: I had to disallow MSNbot completely. When MSNbot hit my sites, it stayed for a very long time and hit every day in the calendar and every photo (one of my sites has over 1000 photos). My server slowed to a crawl and sometimes froze up. Since I added the robots.txt file, I rarely have a slowdown or lockup.

Text Formatted Code
User-agent: *
Disallow: /comment.php
Disallow: /submit.php
Disallow: /forum/createtopic.php
Disallow: /calendar.php
Disallow: /admin/
Disallow: /layout/
Disallow: /images/
Disallow: /stats/
Disallow: /search.php

User-agent: msnbot
Disallow: /

03/09/05 05:59am

LWC

Forum User

Full Member

Registered: 02/19/04

Posts: 818

Well, if we're all sharing, here's my full robots.txt:

Text Formatted Code

User-agent: Googlebot
Disallow: /*.php/*/print$

User-agent: *
Disallow: /calendar.php
Disallow: /comment.php
Disallow: /index.php?topic=
Disallow: /pollbooth.php?qid=
Disallow: /portal.php
Disallow: /profiles.php
Disallow: /search.php
Disallow: /submit.php
Disallow: /stats.php
Disallow: /users.php
Disallow: /admin/
Disallow: /chatterblock/
Disallow: /filemgmt/brokenfile.php
Disallow: /filemgmt/downloadhistory.php
Disallow: /filemgmt/ratefile.php
Disallow: /filemgmt/viewcat.php
Disallow: /filemgmt/visit.php

Update:
I've just checked, and most of these are indexed by Google despite the fact that they're listed in my robots.txt!

OK, thinking about it, I recently moved my site from http://lior.weissbrod.com/lior (synonymous with http://www.weissbrod.com/lior, i.e. just a simple redirect) to simply http://lior.weissbrod.com (virtually simulating a stand-alone site).
And it turns out that most of the forbidden pages are from the former, now outdated, site!

When I made the switch, I assumed Google would be smart enough to omit all the outdated "/lior/" results by itself, because they all now return 404 errors. I guess I was wrong...

Well, before you think this explains everything, this is just the general case. In some cases Google indexes them on the new site (for example, the calendar), so it really does ignore my robots.txt (at least sometimes).
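One thing worth checking in the file above: under the Robots Exclusion Protocol as Google documents it (later standardized as RFC 9309), a crawler obeys only the group whose `User-agent` line matches it most specifically, and falls back to `User-agent: *` only when no named group matches. If so, the `User-agent: Googlebot` group here would mean Googlebot never reads the `Disallow: /calendar.php` line at all. A minimal sketch with Python's `urllib.robotparser`, which also uses the `*` group only when no named group matches; it does not understand the `*` and `$` path wildcards, so it is used here only to show which group applies.

```python
from urllib.robotparser import RobotFileParser

# Cut-down version of the robots.txt above: a Googlebot group and a catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /*.php/*/print$

User-agent: *
Disallow: /calendar.php
Disallow: /submit.php
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "msnbot"):
    ok = parser.can_fetch(agent, "http://example.com/calendar.php")
    print(agent, "may fetch /calendar.php" if ok else "is blocked from /calendar.php")
```

If that is what's happening, repeating the `Disallow` lines inside the Googlebot group (or dropping that group) should put Googlebot under the same rules as everyone else.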
