Cyber porn and search engines - Report

First, the porn myth. According to a study conducted by Dr. Steve Lawrence and Dr. C. Lee Giles for the NEC Research Institute, the Web contains about 800 million pages, encompassing about 15 terabytes of data and about 180 million images. Contrary to the popular view of the Web as a haven for porn, though, the study found that only 1.5 percent of Web sites contain pornographic content. "The sex sites were much less than you would have thought," Lawrence said.
In fact, the study, which will be published in the July 8 issue of Nature magazine, found that commercial sites have taken over the Web: 83 percent of sites contain commercial content and 6 percent contain scientific or educational content. Lawrence said the study gauged the Web's content by random sample. The researchers manually surveyed and categorised the content of 2,500 sites whose IP addresses had been randomly selected.
The study's other key finding won't be news to regular search engine or portal users. According to the study, search engine coverage of the Web has decreased substantially since December 1997, with no search engine indexing more than 16 percent of the Web's indexable sites. That means that, for surfers navigating the Web via search engines, the Web's 15 terabytes of data are more than ever like an iceberg: largely submerged.
For e-commerce sites, not being indexed by the search engines could be the difference between sinking and swimming. "That could have a substantial impact on their economic viability," Lawrence said, "because the situation now is relatively unequal, in the sense that ... the more well-known sites are the ones getting indexed."
Lawrence says the reason for the decreasing coverage is simple: the search engines just can't keep up with the explosive growth in indexable pages. But, he assures, "that trend is going to reverse." Lawrence explained: "At the moment you have a lot of information out there that's not available on the Web."
Once all that information is available on the Web, the avalanche of newly posted indexable material will slow, allowing the search engines to catch up. How long will it take for that avalanche to ease? Lawrence hasn't done precise calculations, but hazards an educated guess: "10, 20 years." "Engines will be able to improve their coverage over time," he added, "but the question is, will they really want to?"
Other findings in the study:
  • Search engines are more likely to index sites that have more links to them (more 'popular' sites).
  • They are more likely to index U.S. sites.
  • Search engines are more likely to index commercial sites than educational sites.
  • Indexing of new or modified pages by just one of the major search engines can take months.
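The 1.5 percent figure is an estimate drawn from a random sample of 2,500 sites, so it carries sampling error. The article does not say how, or whether, the study reported that uncertainty. The Python sketch below rests on two assumptions of our own. First, addresses are drawn uniformly from the whole IPv4 space. Second, the error can be summarised with a textbook normal-approximation (Wald) confidence interval under simple random sampling. The function names `random_ipv4`, `sample_addresses` and `proportion_ci` are hypothetical. The sketch omits the step where each address is probed for a Web server and categorised by hand.

```python
import ipaddress
import math
import random


def random_ipv4(rng: random.Random) -> ipaddress.IPv4Address:
    """Draw one IPv4 address uniformly from the full 32-bit space.

    Assumption: a real survey would skip reserved and private ranges
    and keep only addresses where a Web server actually answers.
    """
    return ipaddress.IPv4Address(rng.getrandbits(32))


def sample_addresses(n: int, seed: int = 0) -> list[ipaddress.IPv4Address]:
    """Draw n random addresses reproducibly (hypothetical helper)."""
    rng = random.Random(seed)
    return [random_ipv4(rng) for _ in range(n)]


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a sample proportion.

    z = 1.96 gives an approximate 95 percent interval; bounds are clipped to [0, 1].
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


if __name__ == "__main__":
    print([str(a) for a in sample_addresses(5)])
    # The article's figures: 1.5 percent of a 2,500-site sample.
    lo, hi = proportion_ci(0.015, 2500)
    print(f"{lo:.2%} to {hi:.2%}")  # prints "1.02% to 1.98%"
```

With the article's figures, the interval runs from roughly 1.02 to 1.98 percent. Under these assumptions, that supports the point that sex sites make up a small share of the Web, even allowing for sampling error.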
