Posted on: 11/03/05 10:10am
By: Anonymous (Munazip)
I submitted my Geeklog site, Zimdaily.com, to Google News for crawling, and this is the response I have just received:
Thank you for your inquiry regarding inclusion in Google News. After some
investigation, we've found that our system cannot crawl your articles
because of the format of their URLs. In order to have your articles
crawled by Google News, their URLs must contain a number consisting of at
least three digits.
For example, our news crawler would not crawl articles with the following
URLs:
www.google.com/news/article23.html
www.google.com/lemurs_in_the_mist.html
It would crawl these pages:
www.google.com/news/08112003/article.html
www.google.com/news/lemurs_in_the_mist/23467.html
An example of a site that we are able to crawl successfully is
http://english.chosun.com. Please note that each article on this site has
a highly unique URL.
We apologize for this limitation of our system. If you are able to make
changes on your end to allow us to crawl your content, please let us know.
Regards,
The Google Team
Someone out there please help
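One literal reading of the rule in Google's email is that a URL must contain a run of at least three consecutive digits somewhere. Below is a minimal PHP sketch of that reading, run against the four example URLs from the email plus the Zimdaily link. The function name has_three_digit_run is made up for illustration, and whether the real crawler applies the rule this way is an assumption.

```php
<?php
// Hypothetical helper: true if the URL contains a run of three or more
// consecutive digits anywhere. This is only one reading of Google's email.
function has_three_digit_run(string $url): bool
{
    return preg_match('/\d{3,}/', $url) === 1;
}

$examples = [
    'www.google.com/news/article23.html',
    'www.google.com/lemurs_in_the_mist.html',
    'www.google.com/news/08112003/article.html',
    'www.google.com/news/lemurs_in_the_mist/23467.html',
    'http://zimdaily.com/news2/article.php/migrants_zim',
];

foreach ($examples as $url) {
    echo $url, ' => ', has_three_digit_run($url) ? 'has 3+ digits' : 'no 3+ digit run', "\n";
}
```

Under this reading the two URLs Google lists as crawlable pass, and the Zimdaily URL fails because "news2" carries only a single digit.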
Google News Cant Crawl My Site
Posted on: 11/03/05 11:12am
By: eyecravedvd
Google News Cant Crawl My Site
Posted on: 11/03/05 12:40pm
By: Anonymous (Munazip)
I have done all that. You can check my links: I have done what your answer says.
Google News Cant Crawl My Site
Posted on: 11/03/05 12:54pm
By: eyecravedvd
Interesting that other people's sites can get added, but yours won't. I wonder if it has to do with your naming of the stories?
Google News Cant Crawl My Site
Posted on: 11/03/05 01:02pm
By: Anonymous (Munazip)
Google sent me the e-mail above. I am trying to understand what they mean by:
It would crawl these pages:
www.google.com/news/08112003/article.html
www.google.com/news/lemurs_in_the_mist/23467.html
and
For example, our news crawler would not crawl articles with the following
URLs:
www.google.com/news/article23.html
www.google.com/lemurs_in_the_mist.html
Here is an example link from my site: http://zimdaily.com/news2/article.php/migrants_zim
How should I name the articles to keep Google News happy? I named this one migrants_zim. Any suggestions?
Google News Cant Crawl My Site
Posted on: 11/03/05 02:27pm
By: Dirk
Sounds like they insist on numbers in the story's URL (which I find odd, but then again I've never tried to submit a site to Google News).
bye, Dirk
Google News Cant Crawl My Site
Posted on: 11/03/05 02:50pm
By: Anonymous (Munazip)
Dirk, do you mean I need the numbers? And the other thing: does the .html at the end make any difference? It looks like the examples they have given me all have .html, and Geeklog is PHP.
Google News Cant Crawl My Site
Posted on: 11/03/05 03:10pm
By: Dirk
[QUOTE BY= Munazip] In order to have your articles crawled by Google News, their URLs must contain a number consisting of at least three digits.[/QUOTE]
At first I thought this meant that they wanted numbers in your URLs, e.g. http://zimdaily.com/news/article.php/migrants_zim12345 or something like that.
But I just noticed the "news2" in your URL - maybe that's what they mean: They want at least three digits there. "news123" would be okay, "news12" wouldn't.
No guarantees. If in doubt, try asking them again.
bye, Dirk
Google News Cant Crawl My Site
Posted on: 11/03/05 03:44pm
By: Anonymous (Munazip)
Thanks, Dirk. I have sent them an e-mail and will update you on what they say.
Thank you again
Google News Cant Crawl My Site
Posted on: 11/04/05 04:16am
By: Marites
I have been battling with this one for over a year. Our site posts 50 or so news items daily, and news.google cannot use the GL URLs, which is not good as we are the oldest and busiest news web site in our country.
Google (the search engine side), however, has no problem with the GL URLs, and our stories (25,000 plus) can be located on that part of Google without problem.
We spent a great deal of time and many exchanges of mail with news.google and in the end had a custom PHP page written, which the Google bot visits many times a day to pick up URLs.
Now here is a result from a Google search using the following:
"security in Philippine skies"
and the result
30 - Security nightmare with unguarded Philippine skies ... Unguarded Philippine skies is no longer a wishful thinking. For military and strategic defense ...
news.balita.ph/html/article.php/20051030144608086 -
25k - Cached - Similar pages - Filter
I did not respond straight away to your posting as I emailed a friend at Google first, and he says their bot has no problem with any URL, whether numbers, letters, or a combination of letters and numbers. So I assume you are trying to get links on news.google, which, as I say, is not as cut and dried as with plain Google.
I realise this reply does not address your problem directly; it only gives you an insight into how we approached it.
I will say I upgraded the site lately and all the URLs changed. Google put these changed URLs in place within 24 to 36 hours, and I did not bother to write the script some said would be required for Google to show the new URLs.
I do understand your frustrations; as they say, I have been there and bought the T-shirt.
Can I suggest you go to my site[*2], look at the top and click Headlines. It is a page similar to this which Google accesses each day to extract our news.
It works for us so should in theory work for you also.
Marites
Google News Cant Crawl My Site
Posted on: 11/04/05 12:08pm
By: Anonymous (Munazip)
Marites, thank you for your post. I have been to your site and it looks fine. What do you suggest I do with my articles so that Google News can be happy?
Google News Cant Crawl My Site
Posted on: 11/04/05 01:03pm
By: Marites
With Google.com (the search engine) there is no problem: they will find and link you.
news.google is a different matter, as they say they cannot read the GL URLs. All I can suggest is that you set up a page similar to our 'Headlines' and ask Google News if they will use it.
Over the past few weeks out of 500 plus stories these are the only ones news.google has picked up:
See here[*3]
I did bring this up with Dirk over a year ago. As there are a limited number of users of GL that have the need to have their items listed on news.balita, I guess it is difficult for Dirk to know what news.google need in order to read GL URLs. I really do not understand why or what the difference is with the URL on google.com and news.google.
All I can do is wish you luck and hope you have more success than I have.
Any thoughts Dirk ? or better still any suggestions.
I am always around.
Late note: I have just looked back over my correspondence with news.google, and it seems they prefer a URL to be like this:
h**p://news.balita.ph/html/article//20051104110320755.(extension)
rather than:
h**p://news.balita.ph/html/article.php/20051104110320755 as with GL.
Perhaps it is just the .(extension) that is missing (ie htm, http, php .... )
Marites
Google News Cant Crawl My Site
Posted on: 11/04/05 03:12pm
By: Dirk
[QUOTE BY= Marites] Perhaps it is just the .(extension) that is missing (ie htm, http, php .... )[/QUOTE]
There should be nothing stopping you from using ".html" or ".php" as part of the story's ID ...
bye, Dirk
Google News Cant Crawl My Site
Posted on: 11/04/05 05:01pm
By: samstone
OK, here is my little help with what my little brain can come up with.
I think Google has given you guys a canned answer without investigating specifically why their system can't list you. They list my site with no problem so far. I use a basic GL installation without even having URL rewrite turned on. The only tweak is removing the hyperlink from the full article's title.
- For both Zimdaily.com and Balita.com, I think the problem could be that you forward your domain root to a sub directory (news2 or html).
- For Balita.com, you would need to remove the hyperlink from the full article title.
Other than that I don't see any problem.
Sam
Google News Cant Crawl My Site
Posted on: 11/04/05 06:09pm
By: Marites
Dirk, a year ago I asked you about using .php or .html and how this could be done automatically. You may or may not remember you told me that this was not possible. Bear in mind we do not post stories by hand; they are posted automatically, so renaming 50 to 100 stories a day would not be practicable. (I run 6 web sites, all news and GL based.)
So now you are saying there is nothing to stop this - can you say how - we can hack lib.common if need be.
Tess
Google News Cant Crawl My Site
Posted on: 11/04/05 06:14pm
By: Marites
Can't vouch for zimdaily, but Balita worked with news.google on this one for almost 3 to 4 months as they wanted our news. We set up a test rig with Mambo and another paid CMS, and both were readable by their bot. For various reasons our organization and writers prefer GL and want to stick with it.
I don't think it is the sub-directory structure, as our other sites use differing methods of layout (i.e. without the html) and it does not make the slightest difference to the news.google bot.
Tess
Google News Cant Crawl My Site
Posted on: 11/04/05 06:26pm
By: samstone
How about the hyperlink on the full story, which could get the bot caught in a loop, since it is linked to itself?
I am just helping you brainstorm. Since mine works out of the box with the hyperlink on the full story title, it makes me think it might be something you did, rather than something you didn't, that changes the course.
Sam
Google News Cant Crawl My Site
Posted on: 11/04/05 08:00pm
By: Marites
Bit lost on this one ... surely each title has a hyperlink and that should be sufficient. Our government partner has their hyperlink on 'see more' also using a CMS (not GL) that can be read by news.google.
As I say, with news.google we tried dozens of methods. Whereas zimdaily may have got a 'canned' answer, we did not: news.google even tried to write scripts to help. They are adamant GL is at fault. I am not certain what the problem is, but I would have thought if there was a solution news.google programmers would have found it for us.
Having said that, both CNN and the BBC are able to read our URLs and often link to our news stories.
Thanks. Tell me what you had in mind to hyperlink and I will get someone to try.
Tess
Google News Cant Crawl My Site
Posted on: 11/04/05 09:37pm
By: ByteEnable
[QUOTE BY= Marites]
As I say, with news.google we tried dozens of methods. Whereas zimdaily may have got a 'canned' answer, we did not: news.google even tried to write scripts to help. They are adamant GL is at fault. I am not certain what the problem is, but I would have thought if there was a solution news.google programmers would have found it for us.
[/QUOTE]
Your site has numerous html problems:
http://news.balita.ph/html/article.php/20051104110320755
# Error Line 318 column 138: end tag for element "A" which is not open.
# Error Line 361 column 141: there is no attribute "FRAMESPACING".
# Error Line 361 column 160: value of attribute "FRAMEBORDER" cannot be "NO"; must be one of "1", "0".
# Error Line 364 column 172: value of attribute "FRAMEBORDER" cannot be "NO"; must be one of "1", "0".
# Error Line 364 column 232: there is no attribute "ALLOWTRANSPARENCY".
# Error Line 367 column 194: end tag for element "A" which is not open.
# Error Line 368 column 29: end tag for element "NOLAYER" which is not open.
# Error Line 369 column 100: end tag for element "ILAYER" which is not open.
# Error Line 378 column 11: there is no attribute "SRC".
# Error Line 378 column 116: there is no attribute "WIDTH".
# Error Line 378 column 129: there is no attribute "HEIGHT".
# Error Line 378 column 146: there is no attribute "VISIBILITY".
# Error Line 378 column 162: there is no attribute "ONLOAD".
# Error Line 378 column 269: element "LAYER" undefined.
# Error Line 461 column 29: required attribute "TYPE" not specified.
# Error Line 471 column 71: required attribute "TYPE" not specified.
# Error Line 480 column 175: value of attribute "FRAMEBORDER" cannot be "NO"; must be one of "1", "0".
# Error Line 592 column 188: value of attribute "FRAMEBORDER" cannot be "NO"; must be one of "1", "0".
# Error Line 647 column 11: end tag for element "TABLE" which is not open.
Also on that page:
Your story title should be:
<h1>04 - Rescuers unable to find plane wreckage (with earlier report)</h1>
Not a url. Also your html is full of javascript and lots of unused whitespace.
Also your page is 167K in size. Google is not going to wait around too long for a page to load.
Byte
LinuxElectrons[*4]
Google News Cant Crawl My Site
Posted on: 11/05/05 04:31am
By: Dirk
[QUOTE BY= Marites] So now you are saying there is nothing to stop this - can you say how - we can hack lib.common if need be.[/QUOTE]
What I meant was that you could simply add ".php" or ".html" to the end of your story's ID (assuming this is really what Google News is looking for).
You didn't say how you post those news automatically. If you're relying on Geeklog to give them a story id (i.e. one of those numeric IDs like 20051105102630123), then you could hack COM_makesid (in lib-common.php) to add that extension (although that would then also apply to IDs for events and links).
Otherwise, you could modify whatever you use to post the news to add the extension there.
Again, I have no experience with getting a site into Google News so I don't know if this is really your problem. Others have pointed out other possible issues with your site, maybe you should look into those first ...
bye, Dirk
Google News Cant Crawl My Site
Posted on: 11/05/05 07:13am
By: Marites
The above blank posting actually showed the posting below to me, but when I pressed submit it came up blank. I cut and pasted it a 2nd time and it showed up.
Thanks ... basically I am using the Professional theme and, as a matter of interest, just set up an out-of-the-box install of 1.3.11, ran that through HTML Validate and got similar errors, so it would seem that the theme has these errors already in place ... which is not good. Did you run it on Zim? My validator is throwing up many errors there also. If you are running a site based on the Professional theme, perhaps a similar test on that may also throw up errors.
The page size is not much more than that of this site, and Google search does not run away from its size: within an hour Google search (not news) has the article referenced. I do consider the error issue very serious, though, and I will see it is addressed early next week.
I sincerely thank you for running validate, something I should have done rather than taking it for granted that the Professional theme was error free.
Regards
Tess
I have just run the W3 validator and would make these comments ... 90% of the errors you relate come from the advertising banner code, which uses FRAMES. The majority of the other errors are actually caused by character problems, as the origin of our text is Asia (e.g. non SGML character number 147).
Our articles are emailed in from journalists using Macs, PCs and such with the language set to who knows what. Our server, on the other hand, is set up correctly for US English/English; in consequence, differences in character codes are thrown up on a regular basis ... so I have discounted this type of error. The other 5 or 6 are sloppy code either by the theme or ourselves and we shall address them.
Funny, news.google have never said the problem is W3 errors; they concentrate only on URL problems. I have written to the head of news.google, who I have been dealing with, attaching a copy of your posting, and will come back when he comments.
Google News Cant Crawl My Site
Posted on: 11/05/05 02:22pm
By: Marites
Thank you Dirk
I will get someone to check the addition of extensions in lib.common and see what happens. Not sure if a change at this time, with 25000 stories in the archive, would make sense, as most would be without extension and others with.
We have checked the errors. Most, as said, are language related; the 5 or 6 real errors, some caused by us and some in the theme templates, have been corrected.
news.google say it is not our page size (average 63) which is the problem, as compared with CNN and BBC ours is quite compact. They also say the W3 errors should not raise a problem; they are still adamant the problem is how GL produces the URLs. As GL seems to be the only PHP type CMS that is not conforming to convention, it is not worth news.google adjusting the crawler to conform, as they are aware we are the only major organization (their terminology) using GL with such an input and output.
Our page hits for October were just short of 1.8 million with 700 plus posted news articles. 95% of our articles are emailed into GL through Sendmail and straight into the MySQL database and, as stated, originate from a variety of users set at who knows what language.
It is because of our unique content that news.google have spent so much time trying to get everything to work.
Dirk, if you are interested I can send you the various technical responses Google have sent ... but as I say, the alterations needed are just not worth the effort from a GL development point of view.
Despite the problems, Dirk, I am sticking with GL. My additions to this thread were not by way of complaint but just to respond to our Zimbabwe colleague, who I felt was wasting his time expecting news.google to index his site after the problems we have had.
Thanks as always Dirk.
Marites
Google News Cant Crawl My Site
Posted on: 11/05/05 02:22pm
By: Anonymous (Munazip)
Thanks, Dirk, for your answer. I think Google News want a .html at the end of the story ID; that's exactly what they are saying. I have now been to COM_makesid in lib-common.php and this is what I got:
/**
* Makes an ID based on current date/time
*
* This function creates a 17 digit sid for stories based on the 14 digit date
* and a 3 digit random number that was seeded with the number of microseconds
* (.000001th of a second) since the last full second.
* NOTE: this is now used for more than just stories!
*
* @return string $sid Story ID
*
*/
function COM_makesid()
{
    $sid = date( 'YmdHis' );
    srand(( double ) microtime() * 1000000 );
    $sid .= rand( 0, 999 );
    return $sid;
}
Now I need to add the .html automatically in COM_makesid so that I won't have to add it manually every time I post a story. Please help me with the hack.
Thank you in advance
Google News Cant Crawl My Site
Posted on: 11/05/05 02:43pm
By: Dirk
[QUOTE BY= Munazip] Now i need to add the .html automatically in COM_makesid so that i wont have to add everytime i post a story manually.[/QUOTE]
I have to correct myself here: Changing COM_makesid would, as mentioned above, also change the IDs used for links and events. The problem there is that those are too short (20 characters, while an ID with attached ".html" would be up to 23 characters long).
So, plan B is to change it where COM_makesid is called. In admin/story.php:
$A['sid'] = COM_makesid() . '.html';
And in submit.php:
$A['sid'] = COM_makeSid() . '.html';
and a second time:
$A['sid'] = addslashes (COM_makeSid () . '.html');
Note that there are further calls to COM_makeSid in submit.php, but those are for events ($A['eid'] = ...) and links ($A['lid'] = ...) and should be left unchanged.
bye, Dirk
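To see what the change above produces, here is a standalone sketch that does not need Geeklog. make_sid() is a stand-in modelled on the COM_makesid listing earlier in the thread, and make_story_sid() is a hypothetical wrapper, not a Geeklog function. It also prints the ID lengths, since length is what matters for the 20-character limit on link and event IDs mentioned above.

```php
<?php
// Stand-in for Geeklog's COM_makesid(): 14-digit timestamp plus a random
// number from 0 to 999 (not zero-padded, so 15 to 17 digits in total).
function make_sid(): string
{
    return date('YmdHis') . rand(0, 999);
}

// Hypothetical wrapper: only story IDs get the extension appended.
function make_story_sid(string $extension = '.html'): string
{
    return make_sid() . $extension;
}

$storyId = make_story_sid();   // e.g. 20051105102630123.html
$eventId = make_sid();         // events and links keep the plain ID

printf("story id: %s (%d chars)\n", $storyId, strlen($storyId));
printf("event id: %s (%d chars)\n", $eventId, strlen($eventId));
```

Keeping the extension in a separate wrapper mirrors plan B: story IDs can exceed 20 characters once the extension is attached, while event and link IDs stay as short as before.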
Google News Cant Crawl My Site
Posted on: 11/05/05 03:01pm
By: Dirk
[QUOTE BY= Marites] Dirk, if you are interested I can send you the various technical responses Google have sent ... but as I say, the alterations needed are just not worth the effort from a GL development point of view.[/QUOTE]
If you have any definitive information about what it is that Google News doesn't like about Geeklog's URLs, I'd like to hear it. At the very least, we could make an FAQ entry for it.
bye, Dirk
Google News Cant Crawl My Site
Posted on: 11/05/05 03:35pm
By: Anonymous (Munazip)
Thanks, Dirk. I am trying to understand where you say "and a second time".
Where do i add that?
Thank you in advance
Google News Cant Crawl My Site
Posted on: 11/05/05 03:37pm
By: Marites
Will put the relevant emails together and mail to you personally early next week.
Regards
- Tess -
Google News Cant Crawl My Site
Posted on: 11/05/05 03:44pm
By: Marites
Munazip ... Earlier this year on a test site we added .html extensions by hand to about 50 articles, and Google still could not read them. I see you have added an extension to the first story on your site; it will be interesting to see if news.google say they can read it or not.
- Tess -
Google News Cant Crawl My Site
Posted on: 11/05/05 03:46pm
By: Dirk
[QUOTE BY= Munazip] Where do i add that?[/QUOTE]
Just search for "makesid" (or "makeSid") in the two files and you will see.
bye, Dirk
Google News Cant Crawl My Site
Posted on: 11/05/05 03:51pm
By: Anonymous (Munazip)
Thanks, Tess and Dirk. I have made the necessary amendments, but for some reason I am failing to see where I can add this:
$A['sid'] = addslashes (COM_makeSid () . '.html');
It looks like it works fine without the line above.
And the other thing is my images now have a .html extension as well. Will that affect anything?
Google News Cant Crawl My Site
Posted on: 11/05/05 03:59pm
By: Dirk
[QUOTE BY= Munazip] for some reason am failing to see where i can add this
$A['sid'] = addslashes (COM_makeSid () . '.html');
[/QUOTE]
This should be the second hit for "makeSid" in submit.php (assuming you're on Geeklog 1.3.11sr2 - haven't checked with any older versions). It's called when normal users (not admins) submit stories.
[QUOTE BY= Munazip] And the other thing is my images now have a .html extension as well, will that affect anything??[/QUOTE]
I have to admit that I didn't think of this, but then again, they should have two extensions now (20051105215730123.html_1.gif etc.), and that should be fine.
bye, Dirk
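A quick way to check the point about the double extension: PHP's pathinfo() treats only the part after the last dot as the extension, so the image name from the example above still resolves to gif. How a given browser or web server decides the file type is outside this sketch; it only shows what the file name itself looks like to PHP.

```php
<?php
// File name taken from the example in the post above.
$image = '20051105215730123.html_1.gif';

// pathinfo() returns the text after the last dot as the extension.
$extension = pathinfo($image, PATHINFO_EXTENSION);

echo $extension, "\n"; // gif
```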
Google News Cant Crawl My Site
Posted on: 11/05/05 05:00pm
By: Anonymous (ByteEnable)
The Google bot can crawl through bad HTML, but it's always best to have good HTML. You never know what will make the crawler fail. My site LinuxElectrons[*4] has no problems with news.google.
The only major difference between my site and your site is that your story page still has the story title as a URL. It's very hard to believe that the crawler wants a .html extension at the end. If this is true, it must be due to your story page title being a URL, instead of a real story title:
Yours:
My Story
Mine: My Story
Byte
Google News Cant Crawl My Site
Posted on: 11/05/05 05:02pm
By: Anonymous (ByteEnable)
Yours: <h2><a href="/article.php/2343243">My Story</a></h2>
Mine: <h1>My Story</h1>
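To check a page for the pattern described above, a rough sketch like the following can flag headings whose content starts with a link. The function name heading_is_link is hypothetical, and a regular expression is only a crude approximation of real HTML parsing, so treat the result as a hint rather than a verdict.

```php
<?php
// Hypothetical check: does the markup contain an <h1>..<h6> heading whose
// content begins with an <a> element? Regex-based, so only approximate.
function heading_is_link(string $html): bool
{
    return preg_match('#<h[1-6]\b[^>]*>\s*<a\b#i', $html) === 1;
}

var_dump(heading_is_link('<h2><a href="/article.php/2343243">My Story</a></h2>')); // bool(true)
var_dump(heading_is_link('<h1>My Story</h1>'));                                     // bool(false)
```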
Google News Cant Crawl My Site
Posted on: 11/05/05 08:55pm
By: Marites
I just did a search and I find the uptake by news.google of your stories is no larger than ours: I counted one from your front page. As I stated earlier, news.google can pick up some items but not all. Putting an extension on will make no difference whatsoever; we set up a test site earlier this year where each URL had an html extension, and the pickup rate was 2 or 3 from the 50.
Maybe we are all blaming the URL when it is something else. I have tried one part story to a page and 20 and that makes no difference. I spent weeks with Google trying to find out what the problem was, and at no time could 100% of the articles be picked up in a crawl. At Google's suggestion I set up the same test site in Mambo (the paid version) and there was no problem: every item was picked up, as it was with ArticleLive and others.
Sadly I am not a programmer, so I can only stab in the dark at solutions and causes, and no doubt all are wrong.
Could there be something between the URL on (we'll call it) page 1 and the link to the full article? I do find the smaller articles which don't require a 'Read More' do get picked up every time.
I reiterate, google.com has no such problem and every article is picked up. It sounds crazy, but what is the difference between the crawler on google.com and news.google.com?
- Tess -
Google News Cant Crawl My Site
Posted on: 11/05/05 09:11pm
By: samstone
Aha, it is always the big guy that calls the shots!
See, as you said, CNN and BBC have no problem picking up the news. But Google never thought about spanking their bot for not picking up the news faithfully. Isn't it a little arrogant that they would rather try to fix your Geeklog than look into their own bot?
Well, I feel for your problem, so I am just joining the rant. Sometimes Google deserves a stick.
Sam
Google News Cant Crawl My Site
Posted on: 11/05/05 09:14pm
By: Anonymous (ByteEnable)
Google News only retains news for 30 or 60 days. I can't remember exactly which one. I just did a news.google.com search and got:
Results 1 - 10 of about 333 for linuxelectrons
Search URL.
http://news.google.com/news?hl=en&ned=us&q=linuxelectrons&btnG=Search+News
On Yahoo I get:
NEWS STORIES - Results 1 - 10 of about 62 for linuxelectrons.
I did a search on your domain, and not one search engine has you listed, except for the one link on www.google.com.
I even did a search on www.google.com.ph, and there are no links. It probably has something to do with all that JavaScript breaking the crawler, got to be.
Byte
Google News Cant Crawl My Site
Posted on: 11/05/05 09:21pm
By: Marites
Try this URL
Look here[*5]
I guess it depends on the search criteria.
It's 30 days btw.
Tess
Google News Cant Crawl My Site
Posted on: 11/06/05 02:06pm
By: Marites
Whether it is because I am of the opposite sex or not, every time I get involved in a thread on this forum I receive disgusting suggestions from another member saying that I should not be running a web site but a home instead. (Kept polite)
I know who this is, as does he ... he has participated in this thread.
For everyone's interest, including his, Balita is a part-government owned site that has been online for 11 years now, and yes, we are able to work with people at news.google.
We are not a site that needs to make itself known to every search engine, nor are we after accolades. We get that from our visitors.
I participated in this discussion to help a fellow GL'er who was experiencing similar problems to what our site had experienced.
Zimdaily does not carry Java, which was suggested as the reason our site was not crawled ... as I pointed out, our site was crawled hourly, but it was the URLs that could not be captured, the same problem as Zimdaily had.
news.google have provided me with the explanation why our site cannot be crawled, and it is not HTML errors, Java or page sizes. I understand the same reason applies to ZimDaily (if he contacts me privately I will explain).
Enjoy your little chauvinist clique. I for one will read but will not participate in this forum again.
A million thanks go to Dirk and all the other polite members of the forum I have enjoyed their company.
Marites
Google News Cant Crawl My Site
Posted on: 11/06/05 02:34pm
By: ByteEnable
[QUOTE BY= Marites] Whether it is because I am of the opposite sex or not, every time I get involved in a thread on this forum I receive disgusting suggestions from another member saying that I should not be running a web site but a home instead. (Kept polite)
I know who this is as does he ... he has participated in this thread.
[/quote]
You should inform Dirk so he can take care of this problem.
[QUOTE BY= Marites] I participated in this discussion to help a fellow GL'er who was experiencing similar problems to what our site had experienced.[/QUOTE]
Hmmm....looked like you were asking for help.
For others, Google says[*6] this:
# Make sure that your TITLE and ALT tags are descriptive and accurate.
# Check for broken links and correct HTML.
# If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them few.
[QUOTE BY= Marites] Enjoy your little chauvinist clique. I for one will read but will not participate in this forum again.[/QUOTE]
Don't let one bad apple spoil it. Your posts brought out a discussion we have never had here before, which is great!
Google News Cant Crawl My Site
Posted on: 11/06/05 08:18pm
By: ByteEnable
I've been looking at this lately because these posts got me thinking again. For a while I was diving into SEO techniques and spent quite a bit of time on research. In the end, trying to be crafty required lots of time monitoring various search engines, with very little payback. In one of Google's FAQs they say don't try to get too crafty, just build a good website and they will come.
At any rate, a previous post in this thread mentioned that Google requires a ".extension" at the end of a URL. I still find this hard to believe. Why would such a high-tech outfit as Google have their bread-and-butter algorithm fail because of a .extension? I guess it's possible. I'm just in shock and disbelief. However, lightly scanning through some URLs at news.google, most of them do have a .extension.
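For anyone who wants to test their own links, here is a rough Python sketch of the two URL rules that have come up in this thread: the "number consisting of at least three digits" from Google's e-mail in the first post, and the ".extension" idea. It is only a guess at how to read those rules. I am assuming the three digits must be consecutive and must appear in the path, and `check_article_url` is a made-up name, not anything Google or Geeklog provides.

```python
import re
from posixpath import splitext
from urllib.parse import urlsplit


def check_article_url(url: str) -> dict:
    """Check a URL against two rules of thumb discussed in this thread."""
    # The examples in Google's e-mail have no scheme, so add one for parsing
    if "://" not in url:
        url = "http://" + url
    path = urlsplit(url).path
    return {
        # Assumption: "at least three digits" means three or more in a row
        "has_3_digit_number": re.search(r"\d{3,}", path) is not None,
        # True when the last path segment ends in something like ".html"
        "has_extension": splitext(path)[1] != "",
    }


if __name__ == "__main__":
    for u in (
        "www.google.com/news/article23.html",
        "www.google.com/news/lemurs_in_the_mist/23467.html",
        "http://zimdaily.com/news2/article.php/migrants_zim",
    ):
        print(u, check_article_url(u))
```

Under this reading, the Zimdaily link fails both checks, while the two "would crawl" examples from the e-mail pass both.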
I also found another reference[*7] saying that the headline on the article page should not be an active link, just a plain h_tag:
we found the reason why our system may not crawl some your articles because your articles' headlines are still active links on the article page. If it is feasible, we recommend making the article headline inactive on the article page but not on the hubpage.
Google News Cant Crawl My Site
Posted on: 11/06/05 09:35pm
By: ByteEnable
Most search-engine-friendly URL FAQs say to stay away from "?" in the URL. I also found an article that's over four years old that talks about search-engine-friendly URLs and specifically applies to how GeekLog does its url-rewrite.
There was previously one major drawback to this method. Google, and perhaps other search engines, would not index pages set up in this manner, as they interpreted the URL as being malformed. I contacted a Software Developer at Google and made them aware of the problem and I am happy to announce that it is now fixed.
That was four years[*8] ago and, according to the author, it was fixed.
So far all the articles I have run into pretty much do it the way GeekLog does it. Except GeekLog does not perform rewrites on topics, such as
h**p://www.myglsite.com/index.php?topic=Google
I guess it's possible that this may throw off a bot.
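To show what a rewrite of that kind of URL could look like, here is a small Python sketch that moves query-string values into the path, in the same style as the article links earlier in this thread (article.php/migrants_zim). This is not GeekLog's actual code. The function name `to_path_style` is made up, and the `story` parameter in the second example is my assumption about what the un-rewritten article URL looks like.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, quote


def to_path_style(url: str) -> str:
    """Turn 'script.php?key=value' into 'script.php/value' (illustration only)."""
    parts = urlsplit(url)
    # Keep only the values, percent-encoding anything unsafe for a path segment
    values = [quote(v, safe="") for _, v in parse_qsl(parts.query)]
    if not values:
        return url  # nothing to rewrite
    new_path = parts.path.rstrip("/") + "/" + "/".join(values)
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", parts.fragment))


if __name__ == "__main__":
    print(to_path_style("http://www.myglsite.com/index.php?topic=Google"))
    print(to_path_style("http://zimdaily.com/news2/article.php?story=migrants_zim"))
```

Note that the parameter names are dropped, so the script on the server side has to know which position means what. Generating the link is only half the job. The site also has to accept the path-style form.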
More on the subject of url[*9]:
In addition to being more difficult to index, the URLs with the question marks in them do not take advantage of search engine algorithms that give points for search terms found in the actual URL. A URL like http://www.bookstore.com/books/legal_thriller/john_grisham/best_seller.html will receive extra points over a generic URL like http://www.bookstore.com/book.asp?ID=54321 when searching for "John Grisham books" as the keywords searched for are included in the URL.
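The idea in that quote can be illustrated with a toy Python sketch that lists which search terms also show up as words in the URL path. How any real search engine scores URLs is not public, so this only shows the principle. `url_keyword_hits` is a made-up name, and the two URLs are the bookstore examples from the quote.

```python
import re
from urllib.parse import urlsplit


def url_keyword_hits(url: str, search: str) -> list[str]:
    """Return the search terms that appear as whole words in the URL path."""
    # Split the path on anything that is not a letter or digit
    path_words = set(re.findall(r"[a-z0-9]+", urlsplit(url).path.lower()))
    search_words = re.findall(r"[a-z0-9]+", search.lower())
    return [w for w in search_words if w in path_words]


if __name__ == "__main__":
    search = "John Grisham books"
    print(url_keyword_hits(
        "http://www.bookstore.com/books/legal_thriller/john_grisham/best_seller.html",
        search,
    ))
    print(url_keyword_hits("http://www.bookstore.com/book.asp?ID=54321", search))
```

The descriptive URL matches all three search words, while the ID-based URL matches none, since only the path is examined and "book" is not the same word as "books".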
Byte
Google News Cant Crawl My Site
Posted on: 11/28/05 12:10am
By: Anonymous (Timucin)
Canned answers... yes, I agree with you. One of my GL news sites was indexed by Google News for more than 2 years without any issue.
They dropped me 2 months ago because the site was not updated for a couple of days due to a technical problem.
Now they are telling me they are not able to add it. They passed the issue to their technical guys...
Thanks
Timucin
SEO Company[*10]
[QUOTE BY= samstone] OK, here is my little help with what my little brain can come up with.
I think Google has given you guys a canned answer without investigating specifically why their system can't list you. They list my site with no problem so far. I use a basic GL installation without even having url rewrite turned on. The only tweak is removing the hyperlink from the full article's title.
- For both Zimdaily.com and Balita.com, I think the problem could be that you forward your domain root to a subdirectory (news2 or html).
- For Balita.com, you would need to remove the hyperlink from the full article title.
Other than that I don't see any problem.
Sam
[/QUOTE]
herehere[*11]
Google News Cant Crawl My Site
Posted on: 02/22/06 09:38am
By: RichardTowler
I've just upgraded to 1.4sr1 from 3.11, and now Google News has stopped crawling my site. I use the same code as before to generate the headline URL link and name, and it's the same news template. It looks like it's working, but Google News doesn't pick the news up anymore... any ideas?