Porn outsmarts search filters

Daily Newsletters

Sign up to ZDNet UK's daily newsletter.

Search companies are increasingly turning to censorware to court G-rated customers such as corporations, schools and parents, but they're still showing too much skin.

The shortcomings of porn filters were on display last week when Google launched a test version of a search engine for images with an optional filter for what it terms "inappropriate adult content". Even with the filter turned on, Google is serving a healthy dose of pornographic images, often for keywords with primarily nonsexual meanings.

"The filter removes many adult images, but it can't guarantee that all such content will be filtered out," Google acknowledges on its Web site. "There is no way to ensure with 100 percent accuracy that all adult content will be removed from image search results using filters."

Google is hardly alone in the uphill battle to filter pornographic and other sensitive images. Technology companies devoted to image recognition acknowledge that the state of the art is still crude: it yields inexact results and consumes a great deal of computing power.

While technologists struggle to improve their tools, the size of the market for image filtering is in dispute. Google cites the need to protect its "sensitive" users, while search destination AltaVista touts its own filter as indispensable.

"A picture says a thousand words, so we want to make sure that the image search is filtered by default," said AltaVista spokeswoman Kristi Kaspar. "We find that quite a few people are using the image search database for school. And what a huge turnoff if we're in an education market with a great product and we couldn't figure out how to provide a family filter."

In another sign of potential demand for better image-filtering technology, Lycos deemed the available technology so inadequate that the site's parental controls disable multimedia search altogether.
Some in the image-recognition business see a burgeoning corporate need to identify what kinds of images employees are downloading, while others are extending the technology to e-commerce applications that can recognize a product, such as an article of clothing, and find similar items for sale elsewhere.

But according to at least one image search provider, actual use has not lived up to perceived demand. "Image filtering is something where we're investing a lot of [research and development] because we think it's going to be an essential feature," said Tom Wilde, vice president of marketing at Fast Search & Transfer, an Oslo, Norway-based company that is the search technology provider for Lycos.com and other Web portals. "But there's a difference between the perception of growing market demand and what's actually happening. At our All The Web portal, 98.6 percent of our visitors are using the image search without the content filter on."

Whatever the demand for filtered image searching, several companies are struggling to get a handle on the problem. Google noted that its image filter is still in beta and said engineers are working to improve the product. But company representatives acknowledged that they face a daunting task.

"It's a real challenge to do this effectively for a lot of different reasons," said Susan Wojcicki, product manager for Google search. "There is a lot of pornography out there on the Web. If all the porn were in one place, we could cut it out. But it's everywhere. Also, the definition of porn is not very clear."

Even if everyone agreed on a definition of pornography, technologists would have their work cut out for them. Current techniques fall into three categories.

The first attempts to filter images by analyzing the text that names and surrounds them on a Web page. This method runs into several problems. For example, many words in the pornographer's lexicon also appear in birders' dictionaries, guides to animal husbandry and hardware catalogs.
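A minimal sketch of a text-based filter, written in Python, shows how a word list produces both kinds of error. The blocklist, the sample captions and the function names are hypothetical and do not represent any vendor's actual filter. The sketch only illustrates one behavior: a filter that reads words rather than pixels flags innocent birding and hardware pages, yet passes an image whose filename contains no telling words at all.

```python
import re

# Hypothetical blocklist for illustration only; real filters use far larger,
# curated term lists.
BLOCKED_TERMS = {"tit", "tits", "screw", "screws", "xxx"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def is_blocked(surrounding_text: str) -> bool:
    """Flag an image if any word in its name or surrounding text is on the blocklist."""
    return any(token in BLOCKED_TERMS for token in tokenize(surrounding_text))


# Hypothetical image captions and filenames.
samples = {
    "birding guide": "Wren tits nesting in coastal scrub",
    "hardware catalog": "Brass wood screws, 40 mm, box of 100",
    "explicit image, generic filename": "IMG_0042.jpg",
}

for label, text in samples.items():
    print(f"{label}: {'blocked' if is_blocked(text) else 'allowed'}")
# birding guide: blocked                     (false positive)
# hardware catalog: blocked                  (false positive)
# explicit image, generic filename: allowed  (false negative)
```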
Because the vocabularies overlap, text-based analysis turns up a high proportion of both false positives and false negatives, screening out wren tits and wood screws while admitting more salacious content. Foreign-language pornography poses further problems for the text-based approach. For now, the Google filter works only on English-language pages.

The second method screens out images drawn from blacklisted Web addresses where pornography is likely to turn up. But pornography has proved too fast-moving a target for such lists to keep up with.

"Most of the firewalls have lists of URLs, but porno sites change their URLs regularly," said Bill Armitage, chief executive of Bulldozer Software, a US-based image-indexing and search technology provider that operates the Diggit search engine. "Those lists are always out of date. At any given time they're only 60 to 80 percent accurate. The remaining 40 to 20 percent of the time, you need another filtering mechanism to keep those things from coming in."

For that extra layer of protection, many search engines are pinning their hopes on the third and most complex method, which analyzes the image itself for "flesh" tones and body shapes. But this method produces its own share of false negatives, which let pornography in, and false positives, which block innocuous images.

"I'll tell you what slips through: baby pictures slip through," said JJ Wallia, head of sales and business development for LookThatUp, a Paris-based company with offices in California. "That's a false positive. Babies tend to be showing a lot of skin. This is something the industry has just not been able to get around."

Perhaps more damning than the occasional excluded infant is the toll that image analysis exacts on central processing units (CPUs). "The state of the art on image searching is such that there is no surefire pornography detection available," said Fast Search & Transfer's Wilde.
"The big search engines have not yet done that because it's not scalable enough to keep up with the growth of the Internet. It's incredibly CPU-intensive to do image processing. We have 70 million images in our index. The image detection software that's available now gets absolutely crushed by that."

Wilde estimates that the image-recognition industry is six to 12 months away from providing an adequate product. Even then, he warns, problems will remain.

"If you do some sort of flesh detector, what color is flesh?" Wilde asked rhetorically. "It's really that complex. And then what's pornographic? You have different sensitivities, especially internationally. Then there's hate, weapons and violence. It's a really, really difficult problem to solve."
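Armitage describes the second and third methods as layers: a cheap URL lookup runs first, and image analysis catches whatever the list misses. The Python sketch below illustrates that arrangement under several stated assumptions. It uses a hypothetical hostname blacklist and represents each image as a toy list of RGB tuples. The 40 percent skin-pixel threshold was chosen purely for illustration. The skin test is one commonly cited fixed RGB rule of thumb for daylight skin. Commercial products use more elaborate models of color and body shape, and none of the names here come from any vendor mentioned above.

```python
from urllib.parse import urlparse

# Hypothetical blacklist of hostnames; real lists are much larger and, as the
# article notes, go out of date quickly.
BLACKLISTED_HOSTS = {"blocked-example.com"}

# Assumed threshold: flag an image if at least 40% of its pixels look like skin.
SKIN_RATIO_THRESHOLD = 0.4

Pixel = tuple[int, int, int]  # (red, green, blue), each 0-255


def host_is_blacklisted(url: str) -> bool:
    """Layer 1: cheap lookup of the image's host and its parent domains."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try "www.site.com", then "site.com", then "com", so subdomains are caught.
    return any(".".join(parts[i:]) in BLACKLISTED_HOSTS for i in range(len(parts)))


def looks_like_skin(pixel: Pixel) -> bool:
    """One fixed RGB rule of thumb for skin in daylight (an assumption, not a standard)."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_ratio(pixels: list[Pixel]) -> float:
    """Layer 2: fraction of pixels the rule classifies as skin (touches every pixel)."""
    if not pixels:
        return 0.0
    return sum(looks_like_skin(p) for p in pixels) / len(pixels)


def should_block(url: str, pixels: list[Pixel]) -> bool:
    """Run the cheap URL check first, then fall back to the per-pixel scan."""
    if host_is_blacklisted(url):
        return True
    return skin_ratio(pixels) >= SKIN_RATIO_THRESHOLD


# Toy 100-pixel "images" as flat pixel lists (hypothetical data).
baby_photo = [(224, 172, 150)] * 90 + [(40, 60, 200)] * 10  # mostly light skin tone
sky_photo = [(120, 170, 230)] * 100                          # blue sky
dark_skin = [(90, 60, 45)] * 100                             # darker skin tone

print(should_block("http://www.blocked-example.com/a.jpg", sky_photo))  # True (URL list)
print(should_block("http://photos.example.org/baby.jpg", baby_photo))  # True (false positive)
print(should_block("http://photos.example.org/sky.jpg", sky_photo))    # False
print(skin_ratio(dark_skin))                                             # 0.0: rule misses it
```

The toy inputs reproduce the problems quoted in the article. A skin-heavy baby photo is blocked, which is a false positive. Because the rule requires a red value above 95, it never counts darker skin tones, or skin photographed in dim light, as skin at all: a small-scale version of Wilde's "what color is flesh?" problem. The second layer also has to visit every pixel of every image, which suggests why the approach strains CPUs at the scale of a large index.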
