ZDNet UK


Porn outsmarts search filters

Paul Festa, CNET News.com

Published: 02 Jul 2001 14:11 BST


Search companies are increasingly turning to censorware to court G-rated customers such as corporations, schools and parents, but they're still showing too much skin.

The shortcomings of porn filters were on display last week when Google launched a test version of a search engine for images with an optional filter for what it terms "inappropriate adult content". Even with the filter turned on, Google is serving a healthy dose of pornographic images, often for keywords with primarily nonsexual meanings.

"The filter removes many adult images, but it can't guarantee that all such content will be filtered out," Google acknowledges on its Web site. "There is no way to ensure with 100 percent accuracy that all adult content will be removed from image search results using filters."

Google is hardly alone in the uphill battle to filter pornographic and other sensitive images. Technology companies devoted to image recognition acknowledge that the state of the art is still crude, yielding inexact results at a high cost in computing power.

While technologists struggle to improve their tools, the market for image filtering is the subject of dispute. Google cites the need to protect its "sensitive" users, while search destination AltaVista touts its own filter as indispensable.

"A picture says a thousand words, so we want to make sure that the image search is filtered by default," said AltaVista spokeswoman Kristi Kaspar. "We find that quite a few people are using the image search database for school. And what a huge turnoff if we're in an education market with a great product and we couldn't figure out how to provide a family filter."

In another demonstration of potential demand for better image-filtering technology, Lycos deemed the available technology so inadequate that the site's parental controls disable multimedia search altogether.

Some in the image-recognition business see a burgeoning corporate need to identify what kind of images their employees are downloading, while others extend the technology to e-commerce applications that can recognize a product such as an article of clothing and find similar examples for sale elsewhere.

But according to at least one image search provider, actual use has not lived up to perceived demand.

"Image filtering is something where we're investing a lot of [research and development] because we think it's going to be an essential feature," said Tom Wilde, vice president of marketing at Fast Search & Transfer, an Oslo, Norway-based company that is the search technology provider for Lycos.com and other Web portals.

"But there's a difference between the perception of growing market demand and what's actually happening. At our All The Web portal, 98.6 percent of our visitors are using the image search without the content filter on."

Regardless of demand for filtered image searching, several companies are struggling to get a handle on the problem.

Google noted that its image filter is still in beta and said engineers are working to improve the product. But company representatives acknowledged that they face a daunting task.

"It's a real challenge to do this effectively for a lot of different reasons," said Susan Wojcicki, product manager for Google search. "There is a lot of pornography out there on the Web. If all the porn were in one place, we could cut it out. But it's everywhere. Also, the definition of porn is not very clear."

Even with consensus on a pornography definition, technologists have their work cut out for them. Current techniques fall into three categories. The first attempts to filter images by analyzing the text that names and surrounds them on a Web page.

This method runs into several problems. For example, many words in the pornographer's lexicon also appear in birders' dictionaries, guides to animal husbandry and hardware catalogs. As a result, text-based analysis turns up a high proportion of both false positives and false negatives, screening out wren tits and wood screws while admitting more salacious content.
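The ambiguity problem can be illustrated with a minimal sketch of a keyword filter. This is not any search engine's actual implementation; the blocklist, the `is_blocked` helper and the sample captions are all hypothetical, chosen only to show how a word list produces both kinds of error.

```python
import re

# Hypothetical blocklist, invented for this illustration only.
BLOCKED_TERMS = {"tits", "screws"}

def is_blocked(caption: str) -> bool:
    """Return True if any word in the caption appears on the blocklist."""
    words = re.findall(r"[a-z]+", caption.lower())
    return any(word in BLOCKED_TERMS for word in words)

# Made-up captions standing in for the text that names and surrounds an image.
captions = [
    "Wren tits nesting in coastal scrub",  # birding page, wrongly blocked
    "Brass wood screws, box of 100",       # hardware catalog, wrongly blocked
    "Hot pics, members only",              # adult page with no listed word, admitted
]

for caption in captions:
    print(caption, "->", "blocked" if is_blocked(caption) else "allowed")
```

The first two captions are false positives and the third is a false negative. Adding more words to the list reduces one kind of error only by increasing the other.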

More problems with the text-based approach accompany foreign-language pornography. For now, the Google filter works only on English-language pages.

After text filtering, the second avenue of attack screens out images gleaned from blacklisted Web addresses where pornography is deemed likely to turn up.

But pornography has proved to be a faster-moving target than such lists can track.

"Most of the firewalls have lists of URLs, but porno sites change their URLs regularly," said Bill Armitage, chief executive of Bulldozer Software, a US-based image-indexing and search technology provider that operates the Diggit search engine. "Those lists are always out of date. At any given time they're only 60 to 80 percent accurate. The remaining 40 to 20 percent of the time, you need another filtering mechanism to keep those things from coming in."
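A rough sketch of the blacklist approach shows why a stale list leaks. The host names and the `is_blacklisted` helper below are hypothetical and do not describe any vendor's product; the only point is that a site that moves to a new address passes straight through until the list is updated.

```python
from urllib.parse import urlparse

# Hypothetical blacklist of host names; real lists would be far longer.
BLACKLISTED_HOSTS = {"adult-example.test"}

def is_blacklisted(url: str) -> bool:
    """Return True if the URL's host, or a parent domain of it, is listed."""
    host = urlparse(url).hostname or ""
    return any(
        host == listed or host.endswith("." + listed)
        for listed in BLACKLISTED_HOSTS
    )

urls = [
    "http://adult-example.test/img/1.jpg",      # listed host: caught
    "http://www.adult-example.test/img/2.jpg",  # subdomain of listed host: caught
    "http://adult-example2.test/img/1.jpg",     # same site after moving: missed
]

for url in urls:
    print(url, "->", "blocked" if is_blacklisted(url) else "allowed")
```

By Armitage's figures, a list that is 60 to 80 percent accurate leaves 20 to 40 percent of such addresses uncaught, which is the gap a second filtering mechanism is meant to cover.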

For that extra layer of protection, many search engines are pinning their hopes on the third and most complex method, which analyzes the image itself for "flesh" tones and body shapes. But this method returns its own share of false negatives -- letting pornography in -- and false positives, blocking more innocuous images.
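The "flesh" tone idea can be sketched as a pixel-counting heuristic. The article does not say which algorithm any of these companies uses, so everything here is an assumption: the RGB thresholds are an illustrative rule of thumb, the 50 percent cutoff is arbitrary, and the pixel lists are synthetic stand-ins for real images.

```python
def looks_like_skin(pixel):
    """Crude RGB test for a 'flesh' tone. Thresholds are illustrative only."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(pixel) - min(pixel) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )

def skin_fraction(pixels):
    """Share of pixels that pass the skin-tone test (0.0 for an empty image)."""
    if not pixels:
        return 0.0
    return sum(1 for p in pixels if looks_like_skin(p)) / len(pixels)

def flag_image(pixels, threshold=0.5):
    """Flag an image when at least `threshold` of its pixels look like skin."""
    return skin_fraction(pixels) >= threshold

# Synthetic images as flat lists of (R, G, B) tuples.
baby_photo = [(220, 170, 140)] * 7 + [(30, 90, 200)] * 3  # mostly skin tones
landscape = [(30, 90, 200)] * 6 + [(40, 140, 60)] * 4     # sky and grass

print("baby photo flagged:", flag_image(baby_photo))
print("landscape flagged:", flag_image(landscape))
```

A filter like this flags the harmless baby photo simply because most of its pixels are skin-colored. It also has no notion of body shape, and a darker tone such as (90, 60, 45) fails the `r > 95` test entirely, so the rule misses images it was meant to catch.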

"I'll tell you what slips through -- baby pictures slip through," said JJ Wallia, head of sales and business development for LookThatUp, a Paris-based company with offices in California. "That's a false positive. Babies tend to be showing a lot of skin. This is something the industry has just not been able to get around."

Perhaps more damning than the occasional excluded infant is the toll that image analysis exacts on central processing units (CPUs).

"The state of the art on image searching is such that there is no surefire pornography detection available," said Fast Search & Transfer's Wilde. "The big search engines have not yet done that because it's not scalable enough to keep up with the growth of the Internet. It's incredibly CPU-intensive to do image processing. We have 70 million images in our index. The image detection software that's available now gets absolutely crushed by that."
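A back-of-the-envelope calculation gives a sense of the scale Wilde describes. The 70 million figure is his; the per-image processing time is purely an assumption made for illustration, since the article gives no such number.

```python
SECONDS_PER_DAY = 86_400

def cpu_days(image_count, seconds_per_image):
    """Total processing time, in days, on a single CPU."""
    return image_count * seconds_per_image / SECONDS_PER_DAY

def wall_clock_days(image_count, seconds_per_image, machines):
    """Elapsed days if the work divides evenly across `machines` CPUs."""
    return cpu_days(image_count, seconds_per_image) / machines

INDEX_SIZE = 70_000_000          # images in the index, per Wilde
ASSUMED_SECONDS_PER_IMAGE = 0.5  # hypothetical cost of analyzing one image

total = cpu_days(INDEX_SIZE, ASSUMED_SECONDS_PER_IMAGE)
print(f"One pass over the index: about {total:.0f} CPU-days")
print(f"Spread over 100 machines: about "
      f"{wall_clock_days(INDEX_SIZE, ASSUMED_SECONDS_PER_IMAGE, 100):.1f} days")
```

Under that assumed half-second per image, a single pass takes roughly 405 CPU-days, and the index keeps growing while the pass runs.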

Wilde estimates that the image recognition industry is between six and 12 months away from providing an adequate product.

Even then, he warns, problems will remain.

"If you do some sort of flesh detector, what color is flesh?" Wilde asked rhetorically. "It's really that complex. And then what's pornographic? You have different sensitivities, especially internationally. Then there's hate, weapons and violence. It's a really, really difficult problem to solve."
