AOL sorry for search data 'screw-up'


AOL apologised on Monday for releasing search log data on subscribers that had been intended for use with the company's newly launched research site.

The randomly selected data, which covered 658,000 subscribers and was posted 10 days ago, was among the tools intended for use on the recently launched AOL Research site. The Internet giant has since removed the search logs from public view.

"This was a screw-up, and we're angry and upset about it. It was an innocent enough attempt to reach out to the academic community with new research tools, but it was obviously not appropriately vetted, and if it had been, it would have been stopped in an instant," AOL, a unit of Time Warner, said in a statement. "Although there was no personally identifiable data linked to these accounts, we're absolutely not defending this. It was a mistake, and we apologise. We've launched an internal investigation into what happened, and we are taking steps to ensure that this type of thing never happens again."

Although AOL had identified subscribers in the search logs by number rather than by name or user ID, that did not quell the concerns of privacy advocates, who said that any of the 658,000 could easily be identified from the searches they conducted.

"We think it's a major privacy concern, and we're glad to see AOL is taking it seriously," said Ari Schwartz, deputy director of the Center for Democracy and Technology. "Companies that deal in search results have to understand that they carry very sensitive information, even if it doesn't have what we would traditionally consider to be personally identifiable information involved."

Schwartz and other privacy advocates noted that individual pieces of information could be combined into a "mosaic" that might eventually allow someone to identify the person in question.

"Sometimes what people are searching for may be an indicator of who they are and who they know," said Richard Smith, founder of Internet security and privacy consulting firm Boston Software Forensics.

One search log in the data set included terms such as "how to tell your family you're a victim of incest", "casey middle school", "surgical help for depression", "can you adopt after a suicide attempt", "Fishman David Dr - 2.6 miles NE - 160 E 34th St, New York, 10016 - (212) 731-5345", "gynecology oncologists in new york city" and "how long will the swelling last after my tummy tuck".

Some researchers, however, contend the information serves a valuable purpose in helping to develop better information retrieval technology.

"Researchers at universities or small companies don't have access to this type of data. I think the (AOL) researchers were trying to do a good thing by making this available to the research community," said Steve Beitzel, who holds a doctoral degree in computer science from the Illinois Institute of Technology with a specialisation in information retrieval. Beitzel, who is an affiliated researcher with the university's Information Retrieval Lab, once served as an intern at AOL, but was not involved with the release of the search log data.

For his doctoral thesis, Beitzel used a different set of AOL search data, unrelated to this incident, to track trends in search query strings.

"It's a hot... research problem that people are trying to solve," he said.

Beitzel noted that the former Excite released smaller data sets of its users' search results in 1999 and 2001, and that AltaVista did something similar about five or six years ago.

Both Excite and AltaVista withheld users' names and IP addresses and used anonymous identifiers instead.

"They released the data sets more than five years ago, and it hasn't hurt anyone," Beitzel said. "The bloggers say what AOL did was evil and a violation of privacy. But this may be an overreaction... a nine-digit number in a search box with no name attached is meaningless."

Kurt Opsahl, a staff attorney for the Electronic Frontier Foundation, pointed to other means to make the information available to the research community without making it open to the public.

"There are ways of conducting research into search technology, without making individuals' search terms public," Opsahl said. "Universities could abide by AOL's privacy laws and various laws for privacy...They could get consent from users before handing out the information to third parties."

While Beitzel agreed that other methods could be adopted to aid researchers and the search community, he advised against applying filters to screen out information such as names or Social Security numbers.

"If you alter the collection, then it is no longer representative," he said.

The release of the search logs runs counter to a court ruling in March, when a federal judge rejected efforts by the Department of Justice to gain access to Google users' search logs. The court, however, determined the Justice Department could have limited access to Google's index of Web sites.

Google was the only search engine to fight the Justice Department; Yahoo, Microsoft's MSN and AOL all turned over their users' search data.

"All search engines collect this kind of user data, and it's valuable to marketers, insurance companies, people involved in divorce and custody battles," said Rebecca Jeschke, a spokeswoman for the EFF. "If this information is available, there is a lot of temptation to release it."

Smith, meanwhile, noted that the information AOL provided is similar to the type of search string information the Justice Department sought in connection with the Child Online Protection Act.

The search log data, culled from March to May, represents approximately 1.5 percent of AOL's search network in May. The data applied only to US searches by AOL subscribers using the company's client software.

A number of blogs are pointing to mirror sites to let people take a peek at the search logs of AOL users.

CNET News.com's Declan McCullagh contributed to this report.

Talkback

What gives you the right, in reporting the story of AOL's screw-up, to divulge details of one of the search items, being the name, address and telephone number of an individual? Did you amend the data in any way to protect the identity of the individual? You may argue that such information is in the public domain, but did you check to see if this was the case? The number could have been delisted.
How about an apology from you?

via Facebook 22 August, 2006 19:13