Welcome to Geeklog Friday, November 24 2017 @ 06:11 am EST


Status: offline

RickW

Forum User
Full Member
Registered: 28/01/2004
Posts: 240
Location:United States
Quote by LWC:
When I did the switch, I assumed that Google would be smart enough to omit all the outdated /lior/ results by itself. I guess I was wrong...


Why would Google consider that smart? It crawls links, indexes them, then reindexes what it already has in its database over and over. You never explicitly told Google to drop those pages from its index. Over time, Google would have seen the duplicate content (neither copy is outdated, because they are the same content), and it would first have dropped the page rank to 0, then eventually might have dropped them from the results.

Just admit that you made a web design mistake, stop trying to blame everyone and everything but yourself, learn and move on.

Put this in your .htaccess file (create one if you don't have one):

PHP Formatted Code

Redirect 301 /lior/ http://lior.weissbrod.com/

 


If Google tries to crawl any of your pages in the subdirectory instead of through the subdomain, it's going to be redirected to your subdomain and told that the page has permanently moved - and any of those pages in its index will be dropped really quickly.
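For readers unfamiliar with mod_alias, here is a rough sketch of how that directive is expected to behave, assuming mod_alias is enabled and .htaccess overrides are permitted on the server. `Redirect` matches by URL-path prefix and appends the rest of the requested path to the target; `article.php` below is a hypothetical file name used only for illustration.

```apache
# Sketch only. Assumes mod_alias is enabled and that .htaccess files are
# allowed to use it (AllowOverride FileInfo or broader).
# "Redirect 301" and "Redirect permanent" are two spellings of the same thing.
Redirect permanent /lior/ http://lior.weissbrod.com/

# Expected mapping (article.php is a hypothetical file name):
#   /lior/              ->  http://lior.weissbrod.com/
#   /lior/article.php   ->  http://lior.weissbrod.com/article.php
```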
www.antisource.com

Status: offline

LWC

Forum User
Full Member
Registered: 19/02/2004
Posts: 818
Ah, but you underestimate me...
PHP Formatted Code

RewriteCond %{HTTP_HOST} !^lior\.weissbrod\.com$ [NC]
RewriteRule ^.*$ http://lior.weissbrod.com/ [R,L]

 

(from my old .htaccess)
See? Google has no excuse...

Alright, alright, I'm willing to admit that I think the default R is 302 (MOVED TEMPORARILY) while you pointed out 301 (or, as I've found out, the easier to remember word "permanent").

I've just changed it into
PHP Formatted Code

RewriteCond %{HTTP_HOST} !^lior\.weissbrod\.com$ [NC]
RewriteRule ^.*$ http://lior.weissbrod.com/ [R=permanent,L]


 

Let's see if Google catches on.
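A small sketch of the flag spellings discussed here, assuming mod_rewrite is enabled. A bare [R] sends a 302; the status can be given as a number or as one of the symbolic names mod_rewrite documents (temp, permanent, seeother). The last RewriteCond before a rule normally carries no [OR], since there is no further condition to chain to.

```apache
# Sketch only; assumes mod_rewrite and a root-level .htaccess.
#   [R] with no code           -> 302 (temporary), the default
#   [R=302] or [R=temp]        -> 302, spelled out
#   [R=301] or [R=permanent]   -> 301 (permanent)
RewriteEngine On

# Any host name other than the canonical one (case-insensitive)...
RewriteCond %{HTTP_HOST} !^lior\.weissbrod\.com$ [NC]
# ...is sent to the canonical front page with a permanent redirect.
RewriteRule ^ http://lior.weissbrod.com/ [R=301,L]
```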

Status: offline

RickW

Forum User
Full Member
Registered: 28/01/2004
Posts: 240
Location:United States
Quote by LWC:
I've just changed it into
(a snippet from my new .htaccess)
RewriteCond %{HTTP_HOST} !^lior\.weissbrod\.com$ [NC]
RewriteRule ^.*$ http://lior.weissbrod.com/ [R=permanent,L]

Let's see if Google catches on.

What is it you are accomplishing with that code?

edit:
Okay, if I understand what you're doing: if I try to access the domain in any other way (just weissbrod.com or www.weissbrod.com), then I will be redirected to lior.weissbrod.com.

What I'm confused about is that above you made the statement:

I've recently switched my site from http://lior.weissbrod.com/lior to simply http://lior.weissbrod.com


Maybe you mean that you switched from WWW.weissbrod.com/lior to lior.weissbrod.com? If that is the case, then yes, by using a 302 temporary redirect you are actually telling Google NOT to visit your new subdomain, and if Googlebot does happen to crawl the new links, it will give precedence to your old links.

Oops.
www.antisource.com

Status: offline

LWC

Forum User
Full Member
Registered: 19/02/2004
Posts: 818
I used to redirect outdated pages with a 302 status ("temporary redirect") and from now on, thanks to you, I'll redirect them with a 301 status ("permanent redirect").

If what you don't understand is the entire command, then it's called "mod_rewrite". Actually, it is your command that I'm not familiar with, but "rewrite" has many redirect options and they're not only based on http_host.
For example, I use it to (now permanently...) redirect referrer spam (BTW, I can't use "storyid:" because it adds article.php to the beginning of the URL).

Status: offline

RickW

Forum User
Full Member
Registered: 28/01/2004
Posts: 240
Location:United States
See above - you beat me to a reply before I finished my post edit.
www.antisource.com

Status: offline

LWC

Forum User
Full Member
Registered: 19/02/2004
Posts: 818
Actually, I've made a mistake, but not the one you think.

"www.weissbrod.com" (or "weissbrod.com") is a site of its own (BTW, I've set it up so that the latter redirects to the former via the same mod_rewrite method in .htaccess).

"lior.weissbrod.com" is a subsite of that domain, but the ISP's server is smart enough to make it virtually a standalone site (it has its own robots.txt, .htaccess, etc.).

However - and here lies the problem - it wasn't always like that. Before my ISP upgraded to their smart server, I had to resort to a little CGI script (called "DomainDirector") that just made a simple redirect to "www.weissbrod.com/lior".
To make things even more complicated, a new version of that CGI script soon came out and made it look better by redirecting to "lior.weissbrod.com/lior" (the first "lior" was fake; unlike now, the "subsite" was still "www").

...So when I quoted my .htaccess in the previous post (after upgrading it using your suggestion), I only fixed the "www.weissbrod.com/lior" problem!

Alas, since the new version of that CGI script came out almost as soon as I started to use the script in the first place, that version of the site didn't last long - so it barely matters anyway (only 1 match in Google...).

The big problem is "lior.weissbrod.com/lior" and my .htaccess currently has no solution for that.

But since I no longer use "/lior/", every page that Google has with it throws back a 404 message!
This time don't blame me - Google's own FAQ states that "you don't need to bother us to remove your pages - just throw back 404 messages!"
Yet it has been a long time now and tons of "/lior/" pages still show up!

So my question is this - should I just give up on those 404 messages and use
PHP Formatted Code

RewriteCond %{HTTP_HOST} !^lior\.weissbrod\.com$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/lior/ [NC]
RewriteRule ^.*$ http://lior.weissbrod.com/ [R=permanent,L]

 
?

In other words, 404 (like Google supposedly wants, but doesn't seem to respect) or 301?
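If the old pages still exist at the same relative paths on the new host (an assumption; the thread does not say so), a 301 can carry the visitor to the matching page instead of the front page. The sketch below is one possible way to write that with mod_rewrite; `article.php` is a hypothetical file name.

```apache
# Sketch only. Assumes mod_rewrite and a root-level .htaccess.
RewriteEngine On

# 1) Old /lior/ prefix, on any host: drop the prefix, keep the rest.
#    /lior/article.php -> http://lior.weissbrod.com/article.php
#    (in .htaccess context the pattern sees the path without its
#    leading slash)
RewriteRule ^lior/(.*)$ http://lior.weissbrod.com/$1 [R=301,L]

# 2) Any other request on a non-canonical host: same path, canonical host.
RewriteCond %{HTTP_HOST} !^lior\.weissbrod\.com$ [NC]
RewriteRule ^(.*)$ http://lior.weissbrod.com/$1 [R=301,L]
```

Splitting the two cases into separate rules avoids the [OR] chain and makes it easier to change the treatment of one case later without touching the other.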

Status: offline

RickW

Forum User
Full Member
Registered: 28/01/2004
Posts: 240
Location:United States
Quote by LWC:
So now my question is this - is the new robots.txt enough?
Or do you think I should use the new robots.txt (now with "/lior/" in it), but also use
(in .htaccess)
RewriteCond %{REQUEST_URI} ^/lior/ [NC]
RewriteRule ^.*$ http://lior.weissbrod.com [R=permanent,L]
?

I would use the 301 redirect. I prefer the code I mentioned; it's just much cleaner looking. Also, I'd suggest the PHP meta robots script I worked up for you, to ensure your calendar and other files aren't getting indexed in case the robots.txt file isn't working 100%. The noindex and noarchive will promptly drop those pages from Google.

I just checked out your 404 error via one of those links still in Google's index. It loads up Geeklog's 404.php page. Perhaps that is the problem? The page says 404, but the redirect that Geeklog uses might not indicate that to a search engine.

Dirk, if you're still following this thread, can you elaborate on the 404? Where is that code stored?

www.antisource.com

Status: offline

LWC

Forum User
Full Member
Registered: 19/02/2004
Posts: 818
First of all, I've updated my post (and also the one where I gave my entire robots.txt).
Basically, I've realized that it's absurd to enter "/lior/" in my robots.txt because robots.txt is supposed to store existing links for files and folders.
Anyway, if you look at my new .htaccess quote, you'd see why it's probably better with "rewrite" (because it's two lines and uses "or" - can you do this with your method?).

Now, Dirk has nothing to do with 404 (other than providing nice text). Although some sites probably default to 404.php, you'd still better ensure this with, again, .htaccess:
(another snippet from my .htaccess)
ErrorDocument 404 /404.php

So it has to send back a real 404 error. Do you have a site or software that shows you HTTP responses (meaning the official status codes and not just the text output)? I've tried one and indeed it officially said "404".
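One detail worth checking when a custom error page is involved: according to the Apache documentation, the form of the ErrorDocument target decides what status the client sees. A rough comparison, with the second form commented out and its URL purely illustrative:

```apache
# Sketch only; the second form is shown commented out for contrast.

# Local path: Apache serves /404.php internally and the client still
# receives the 404 status code.
ErrorDocument 404 /404.php

# Full URL: Apache answers with a redirect to that URL instead, so the
# client never sees the 404 status for the missing page.
# ErrorDocument 404 http://lior.weissbrod.com/404.php
```

Even with the local form, the script itself can change the status by sending its own headers, so inspecting the actual response is still worthwhile.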

Status: offline

RickW

Forum User
Full Member
Registered: 28/01/2004
Posts: 240
Location:United States
Quote by LWC: First of all, I've updated my post (and also the one where I gave my entire robots.txt).
Basically, I've realized that it's absurd to enter "/lior/" in my robots.txt because robots.txt is supposed to store existing links for files and folders.
Anyway, if you look at my new .htaccess quote, you'd see why it's probably better with "rewrite" (because it's two lines and uses "or" - can you do this with your method?).

Now, Dirk has nothing to do with 404 (other than providing nice text). Although some sites probably default to 404.php, you'd still better ensure this with, again, .htaccess:
(another snippet from my .htaccess)
ErrorDocument 404 /404.php

So it has to send back a real 404 error. Do you have a site or software that shows you HTTP responses (meaning the official status codes and not just the text output)? I've tried one and indeed it officially said "404".

It's not absurd, because if Google still has those pages in its index, it's going to try to recrawl them. For some reason the cached version is not getting dropped, which is why they're still there. That's what the noarchive is for.

And I'm under the impression that Geeklog does do something with the 404.php page, because it forwards you there if an article ID doesn't exist. The code used to parse out the "static" looking links allows for all sorts of fake directories to be inserted in the path if you wanted. I could be wrong.

I don't know why you think your 3 lines of code are more efficient than:

Redirect 301 /lior/ http://lior.weissbrod.com/

Personal preference I guess.
www.antisource.com

Status: offline

LWC

Forum User
Full Member
Registered: 19/02/2004
Posts: 818
robots.txt is not supposed to be matched against a (search engine's) predefined list, but against links found in the current crawl.
For example, Googlebot comes into my site, sees a link, checks if robots.txt allows it to be indexed, if so - indexes it, if not - ignores it, checks another link and so on...at least that's how it's supposed to be.

What article ID? Try just article. An article, dynamic or not, is still a page. If a page is not there, says .htaccess, show them the page called 404.php

I think you got confused by the fact that 404.php mentions which page is missing. It just takes that from http_referer. Before I entered it in .htaccess, my site just showed an internal default text ("file x is missing"). Ok, maybe, just maybe, Googlebot somehow ignores the 404 and thinks 404.php is just a - probably temporary - redirect.

And Google's cached versions are Google pages - not mine (except images, because Google rudely hotlinks them from the original pages).


Redirect 301 /lior/ http://lior.weissbrod.com/

And what about "www.lior.weissbrod.com", etc.? I take fewer chances by just using an "if not my chosen URL" statement.

Status: offline

Dirk

Site Admin
Admin
Registered: 12/01/2002
Posts: 13073
Location:Stuttgart, Germany
Quote by LWC: But since I no longer use "/lior/", every page that Google has with it throws back a 404 message!
This time don't blame me - Google's own FAQ states that "you don't need to bother us to remove your pages - just throw back 404 messages!"
Yet it has been a long time now and tons of "/lior/" pages still show up!

In my experience, 404s don't really help. I've resorted to 410 ("Gone") in cases where I really wanted to get rid of the old URL and use "redirect permanent" (301) everywhere else.
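The split described here could look roughly like this with mod_alias, assuming the module is enabled; `/lior/old-page.html` is a hypothetical path, not one from the site.

```apache
# Sketch only. Assumes mod_alias; /lior/old-page.html is a hypothetical path.

# 410 Gone for one retired URL (no target URL is given with "gone").
Redirect gone /lior/old-page.html

# 301 for everything else under the old prefix.
Redirect permanent /lior/ http://lior.weissbrod.com/
```

mod_alias applies the first matching Redirect in a given context, which is why the more specific line comes first.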

bye, Dirk

Status: offline

Dirk

Site Admin
Admin
Registered: 12/01/2002
Posts: 13073
Location:Stuttgart, Germany
Quote by LWC: Ok, maybe, just maybe, Geeklog somehow ignores the 404 and thinks 404.php is just a - probably temporary - redirect.

I assume you meant "Googlebot" here, not "Geeklog"?

When set up with ErrorDocument 404 /404.php, Googlebot will still get a 404 response code (instead of a 200) for Geeklog's 404 page. There's no way it could mistake that for a redirect.

bye, Dirk

Status: offline

LWC

Forum User
Full Member
Registered: 19/02/2004
Posts: 818
Alright, I've started to use the aforementioned new .htaccess (that permanently redirects "/lior/" URLs into the new main page too).

About 404 (fixed that error - thanks...), that's what I thought, but what can I tell you? Google just won't remove my dead pages...

Status: offline

RickW

Forum User
Full Member
Registered: 28/01/2004
Posts: 240
Location:United States
Quote by LWC:What article ID? Try just article. An article, dynamic or not, is still a page. If a page is not there, says .htaccess, show them the page called 404.php

I think you got confused by the fact that 404.php mentions which page is missing. It just takes that from http_referer. Before I entered it in .htaccess, my site just showed an internal default text ("file x is missing"). Ok, maybe, just maybe, Geeklog somehow ignores the 404 and thinks 404.php is just a - probably temporary - redirect.


Yep, you're right about Geeklog and the 404, my bad. I just took it out of the .htaccess to test it. With it out, if I put in a bad article ID then I get redirected to the main page, or if I put in a bad path then I get a normal 404 error.

I can see what you're saying about the www.

Here's a theory about why Google won't remove those pages. Those pages no longer have links going to them, so Googlebot isn't crawling them and never encounters the 404, and even a meta noarchive won't help because the pages are never fetched. Your robots.txt is also telling Google not to revisit those pages, so they're going to remain forever stale in its index.

In the bottom of your main page, put in a really small link that nobody will notice that points to the old main page you want to get rid of. Take it out of your robots.txt for now. Googlebot will crawl it and go OOPS, 404, delete from index.
www.antisource.com

Status: offline

LWC

Forum User
Full Member
Registered: 19/02/2004
Posts: 818
RickW, quoting is for specific relevant issues. Don't abuse it...why quote my entire posts - especially when you quote them only one post after I posted them (pardon the pun)? This topic is long enough as it is. Besides, I constantly update my posts anyway. So can you please edit some of your posts and erase some of those long quotes (when you quote an entire post)? Believe me, you'll thank me later if you come back to this topic.

Status: offline

RickW

Forum User
Full Member
Registered: 28/01/2004
Posts: 240
Location:United States
angry
Quote by LWC:Believe me, you'll thank me later if you come back to this topic.


I don't plan on coming back to this topic, I've lost my patience talking to you.
www.antisource.com

Status: offline

LWC

Forum User
Full Member
Registered: 19/02/2004
Posts: 818
RickW, why?

Hmm, strange. If someone tries to access story X (with "/lior/"), they now land on the topic the story is in, and the location bar says "lior.weissbrod.com/?story=X".

Maybe I should just use G (gone) instead of 301, like Dirk has suggested:
PHP Formatted Code

RewriteCond %{HTTP_HOST} !^lior\.weissbrod\.com$ [NC,OR]
RewriteCond %{REQUEST_URI} ^/lior/ [NC]
RewriteRule ^.*$ http://lior.weissbrod.com/ [G,L]
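
A note on the rule above: with [G], mod_rewrite returns 410 and does not redirect, so the substitution URL goes unused, and because the two conditions are OR'd, a request for a wrong host name alone would also get the 410. If the intent is 410 only for the retired /lior/ URLs, a sketch along these lines may be closer (untested, assuming mod_rewrite and a root-level .htaccess):

```apache
# Sketch only. Assumes mod_rewrite and a root-level .htaccess.
RewriteEngine On

# Old /lior/ URLs: answer 410 Gone. "-" means "no substitution";
# [G] also ends rule processing, so [L] is not needed.
RewriteRule ^lior/ - [G,NC]

# Wrong host name (e.g. www.lior.weissbrod.com): still a 301 to the
# canonical host, so only the retired /lior/ URLs get the 410.
RewriteCond %{HTTP_HOST} !^lior\.weissbrod\.com$ [NC]
RewriteRule ^(.*)$ http://lior.weissbrod.com/$1 [R=301,L]
```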

 

Update:
As for your theory, RickW (are you still here? Don't go...) - if that were true, Google would be completely worthless. When Google updates the index, it divides the work into 2 parts: verifying existing pages and crawling in search of brand new pages. If they only did the latter part, they'd have not 8 billion, but 8 googol pages in their index...of course, the first part is not perfect.

Status: offline

RickW

Forum User
Full Member
Registered: 28/01/2004
Posts: 240
Location:United States
worried
Quote by LWC: RickW, why?


Sorry. It just seems like this entire thread has been a one-way discussion: you're right and everyone else, Google included, is wrong. Maybe that's not your intention, so I apologize for being rude. Just acknowledge that people are helping you instead of disagreeing with everything - if you knew all the answers then you wouldn't have started the thread.
www.antisource.com

Status: offline

LWC

Forum User
Full Member
Registered: 19/02/2004
Posts: 818
Not sure what you mean...I've applied the 301 instead of 302 suggestion (and thanked you for it). Now I've applied the 410 instead of 301 (and thanked Dirk for it). What exactly wasn't I convinced by? robots.txt? Because I didn't agree it was used for pre-defined lists? Well, sorry, I still don't think so. Or that 404.php is not called upon by Geeklog? But you admitted I was right.
I think what really happened is that you got upset over my "suggestion" not to quote whole posts. If that's so, I apologize if I sounded rude. I really didn't mean to.

Status: offline

RickW

Forum User
Full Member
Registered: 28/01/2004
Posts: 240
Location:United States
Quote by LWC:If that's so, I apologize if I sounded rude. I really didn't mean to.



www.antisource.com

