Search engine crawlers dig up way too much

Search engine spiders crawling the Web are increasingly stumbling upon passwords, credit card numbers, classified documents and even computer vulnerabilities that can be exploited by hackers.

The problem is not new, security analysts say: ever since search robots began indexing the Web years ago, Web site administrators have found pages not meant for public consumption exposed in search results.

But a new tool built into the Google search engine to find a variety of file types in addition to traditional Web documents is highlighting and in some cases exacerbating the problem.

With Google's new file-type search tool, a wide array of files formerly overlooked by basic search engine queries are now just a few clicks from the average surfer -- or the novice hacker.

The files include Adobe PostScript; Lotus 1-2-3 and WordPro; MacWrite; Microsoft Excel, PowerPoint, Word, Works and Write; and the Rich Text Format.

"The overall problem is worse than it was in the early days, when you could do AltaVista searches on the word password and up come hundreds of password files," said Christopher Klaus, founder and chief technology officer of Internet Security Systems, a provider of information-security systems. "What's happening with search engines like Google adding this functionality is that there are a lot more targets to go after."

Since Google's new tool launched earlier this month, surprised Web site owners have been busy pulling down or securing sensitive pages that have turned up in Google results.

Google disavows responsibility for the security problem. But at the same time, the company has begun devising ways to catch sensitive pages before they wind up exposed to public view.

"Our specialty is discovering, crawling and indexing publicly available information," said Google spokesman David Krane. "We define public as anything placed on the public Internet and not blocked to search engines in any way. The primary burden falls to the people who are incorrectly exposing this information. But at the same time, we're certainly aware of the problem, and our development team is exploring different solutions behind the scenes."

Viral threats

In addition to giving malicious hackers a handy tool for scouting out sensitive information or vulnerable computers, Google's file-type search could pose a risk to searchers who click on file types that are more susceptible than Web pages to viruses and other hostile code.

"The security issue was a top thing I thought of when the new types were released," Danny Sullivan, editor of SearchEngineWatch.com, wrote in an e-mail interview. "It's great to have the additional coverage, but people might not realize when they click on a link that they could expose themselves to viruses. It's not something we've encountered with search engines before because HTML files are pretty safe," though JavaScript can be used in some exploits.

Google searchers concerned about viral threats have the option of selecting the "View HTML" version of non-HTML file types.

Search engines already go to some pains not to crawl where they are unwelcome. Web site administrators can add to their sites a simple "robots.txt" file that asks crawling bots to stay away. Google also maintains a site for Webmasters giving them several options for curtailing or turning away search crawlers.

But the consent-based option has its share of loopholes. Asking Web crawlers not to index a page does not make it inaccessible to the outside world. A robots.txt file can only succeed in turning away compliant search bots, leaving the door wide open to malicious crawlers. In addition, the robots.txt "keep out" sign could serve as an advertisement to hackers that valuable or sensitive information lies behind it.

Security analysts concerned about the use of search engines for bad ends point to two problems. One is the exposure of sensitive, unsecured information such as passwords and credit card numbers. The second is the use of search engines to find Web sites running programs, such as CGI (Common Gateway Interface) scripts, with known vulnerabilities.

Hackers find a way

Still, analysts are quick to say that even without Google and its peers, hackers have tools at their disposal for crawling the Web. Recent Internet worms such as Code Red and Nimda prove that massive, automated hacking exploits have no need of search engines to find vulnerable computers.

"Intruders have their own search engines that bypass the robot-ignore feature and would still find the same sensitive documents with passwords or known flawed CGI script or what have you," said Internet Security Systems' Klaus. "And a robots.txt file could be a flag for intruders to say, this must be interesting if robots are being told not to look at it.

"The underlying issue is that the infrastructure of all these Web sites aren't protected."

Webmasters queried about the search engine problem said precautions against overzealous search bots are of fundamental concern.

"Webmasters should know how to protect their files before they even start writing a Web site," wrote James Reno, chief executive of Amelia, Ohio-based ByteHosting Internet Services. "Standard Apache Password Protection handles most of the search engine problems -- search engines can't crack it. Pretty much all that it does is use standard HTTP/1.0 Basic Authentication and checks the username based on the password stored in a MySQL Database."

But other critics said Google bears its share of the blame.

"We have a problem, and that is that people don't design software to behave itself," said Gary McGraw, chief technology officer of software risk-management company Cigital, and author of a new book on writing secure software. "The guys at Google thought, 'How cool that we can offer this to our users' without thinking about security. If you want to do this right, you have to think about security from the beginning and have a very solid approach to software design and software development that is based on what bad guys might possibly do to cause your program grief."
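
The robots.txt loophole described above can be illustrated with a short sketch. It uses Python's standard urllib.robotparser module; the robots.txt contents, the "ExampleBot" name, the example.com URLs and the disallowed_paths helper are all invented for illustration and do not come from any site mentioned in the article. The sketch shows that a compliant crawler has to ask before fetching, while the same file lists its "keep out" paths for anyone to read.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary site; the paths are invented.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set a polite crawler can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def disallowed_paths(robots_txt: str) -> list[str]:
    """Hypothetical helper: list the paths the file asks crawlers to avoid."""
    paths = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


rules = load_rules(ROBOTS_TXT)

# A compliant crawler checks each URL and skips whatever is disallowed.
print(rules.can_fetch("ExampleBot", "http://example.com/private/passwords.txt"))
print(rules.can_fetch("ExampleBot", "http://example.com/index.html"))

# Nothing forces a crawler to ask. The file is public, so it can also be
# read as a list of places the site owner would rather keep out of view.
print(disallowed_paths(ROBOTS_TXT))
```

Nothing in this check runs on the server: the file is only a request, which is why the webmasters quoted above point to password protection rather than robots.txt for anything sensitive.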
