Report criticises Google's porn filters

Children using Google's SafeSearch feature, designed to filter out links to Web sites with adult content, may be shielded from far more than their parents ever intended.

A report released this week by the Harvard Law School's Berkman Center for Internet & Society says that SafeSearch excludes many innocuous Web pages from search-result listings, including ones created by the White House, IBM, the American Library Association and clothing company Liz Claiborne.

The omissions occur because of the way Google designed the feature, which can be enabled or disabled through a preferences page. The feature uses a proprietary algorithm that automatically analyses the pages and makes an educated guess, without intervention by Google employees. That technique reduces the cost of the SafeSearch service, but it can lead to odd results.

It's perhaps unlikely that many humans would have classified a BBC News report on East Timor, Mattel's site about its Scrabble game -- the URL includes the word "adults" -- or the Nashville Public Library's teen health issues page as unsuitable for minors. Some articles from CNET News.com, a sister publication of ZDNet UK, and CNET Software are also invisible to SafeSearch users.

"If Google put some of its smart people on this task, they could do a much better job than they have so far," said Ben Edelman, the student fellow at the Berkman Center who performed the research. "They've got a lot of smart people. It would be shocking if their great engineers couldn't do better. The question is whether that's a priority for Google."

Google admits that the thousands of innocuous sites listed by the Berkman Center's report are invisible to SafeSearch users. But the company challenged the methodology of the study, saying that some of the sites are missing because their Webmasters employ a device called the "robots.txt" file, which is designed to limit automated Web crawlers in various ways. Such a file might, for example, ask Web crawlers not to visit a certain area of the site because repeated visits would slow down the server considerably. Social etiquette dictates that crawlers should obey a robots.txt file. Google chooses not to include pages that use such files in SafeSearch listings because its crawler can't explore the entire site and thus, the company says, can't be expected to judge the site's content.

Edelman said he was unaware of the robots.txt exclusion when he conducted the study, and revised his report on Thursday to include a discussion of the issue. The report was originally released Wednesday.

Edelman said only 11.3 percent of the sites listed in his study are filtered because their Webmasters created robots.txt files. Those include sites from IBM, Apple Computer, the City University of New York, Groliers, and the Library of Congress.

"It doesn't matter whether SafeSearch omits a site because the site has a robots.txt file or because SafeSearch is imperfect," Edelman said in an interview. "Either way, the site would have been relevant but disappears from results."

Some of the thousands of non-pornographic sites without robots.txt files that are filtered include offerings from the Vermont Republican Party, the Stonewall Democrats of Austin, a UK government site on vocational training and the Pittsburgh Coalition Against Pornography. News sites take a hit too, with articles from Fox News, Wired News, The Baton Rouge Daily News and some Web logs affected.

Google argues that SafeSearch is designed to err on the side of caution. David Drummond, Google's vice president for business development, said: "The design was meant to be overinclusive. The thinking was that SafeSearch was an opt-in feature. People who turn it on care a lot more about something sneaking through than they do about something getting filtered out."

Drummond said that the list of off-limits sites is created "in an automated way" without human intervention. "It looks at keywords, it looks at certain words, the content of the page, the weighting of certain words that are likely to be found on something that's a bad site," Drummond said. An employee becomes involved when Google receives a complaint about a legitimate site that should have been visible or a pornographic one that was, Drummond added.

Google is hardly alone in encountering problems when separating the wheat from the chaff on the Internet. In fact, filtering software is so problematic that Edelman, with Harvard professor Jonathan Zittrain, has made something of a career out of documenting overblocking and underblocking flaws in the programs. A federal appeals court relied on that research when deciding that Congress' attempt to force filters on public librarians was unconstitutional. That decision is on appeal to the US Supreme Court.

There seem to be few consistent patterns in SafeSearch's overblocking, but one that does appear is that Web pages about Edelman and other Harvard researchers who have written about filtering software's problems are blocked too. "It might be difficult for an AI (artificial intelligence-based) system to figure out that this is a site about regulating pornography on the Internet instead of actual pornography," Edelman said.

Google's "SafeSearch Help" carries this disclaimer: "While no filter is 100 percent accurate, Google's filter uses advanced proprietary technology that checks keywords and phrases, URLs and Open Directory categories... Google strives to keep the filtering information as current and comprehensive as possible through continual crawling of the Web and by incorporating updates from user suggestions."
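To make the two causes of omission described in this article concrete, here is a minimal sketch of a keyword-weighted filter that also hides sites whose robots.txt file restricts crawlers. It is not Google's algorithm, which is proprietary. The term weights, the threshold, the function names and the example.com / example.org URLs are all assumptions chosen for illustration.

```python
import re

# Hypothetical term weights and cut-off; the real SafeSearch values are not public.
TERM_WEIGHTS = {"adults": 2.0, "porn": 3.0, "pornography": 3.0, "xxx": 4.0}
THRESHOLD = 2.0  # assumed to be low, mimicking an "overinclusive" design


def restricts_crawlers(robots_txt: str) -> bool:
    """Return True if the robots.txt text contains any non-empty Disallow rule."""
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        # Ignore trailing comments; an empty Disallow value restricts nothing.
        if field.strip().lower() == "disallow" and value.split("#")[0].strip():
            return True
    return False


def keyword_score(url: str, text: str) -> float:
    """Sum the weights of flagged words found in the URL and the page text."""
    words = re.findall(r"[a-z]+", (url + " " + text).lower())
    return sum(TERM_WEIGHTS.get(word, 0.0) for word in words)


def is_hidden(url: str, text: str, robots_txt: str = "") -> bool:
    """Hide a page if its site restricts crawlers or its score reaches the cut-off."""
    if restricts_crawlers(robots_txt):
        return True
    return keyword_score(url, text) >= THRESHOLD


if __name__ == "__main__":
    # A harmless games page is hidden only because its URL contains "adults".
    print(is_hidden("http://example.com/scrabble/adults/", "Word game rules"))
    # An anti-pornography group is hidden because its name contains a flagged word.
    print(is_hidden("http://example.org/coalition", "Coalition Against Pornography"))
    # A clean page is hidden because the site's robots.txt limits crawlers.
    print(is_hidden("http://example.org/hours", "Library opening hours",
                    robots_txt="User-agent: *\nDisallow: /private/"))
```

In this sketch, lowering the threshold catches more objectionable pages at the cost of hiding more innocent ones. That is the trade-off Drummond describes when he calls the design overinclusive.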
