Big data and the big privacy problem

Daily Newsletters

Sign up to ZDNet UK's daily newsletter.

About this blog

Coretech

500 words into the future

Unapologetically opinionated views on technology, in the office and out

If privacy is dead (as a number of technology executives, in whose interest it is for us not to care about privacy, have opined), there wouldn't have been much fuss the latest time researchers discovered that iPhones - like pretty much every other phone in the world - track your location and use it to build up maps and traffic information. That's often where those handy green and red traffic lines on the maps come from: working out how many people are driving and how fast (because digging up the roads to put sensors in is really expensive compared to scrubbing the identity off the incoming flood of location data from phones). Dash - now owned by RIM - used to boast that, thanks to just a handful of drivers using its devices, it put the new road on its map the day the Google off-ramp opened in Silicon Valley, rather than having to wait a month for an update. Better information faster; that's what we all want, so what's the problem?
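
Here is a minimal Python sketch of the traffic idea above. It assumes each phone's position has already been matched to a named road segment. The sketch strips the device identity, groups speeds by segment, and publishes only segments that enough distinct devices have reported. The `Ping` record, the segment names, the three-device threshold and the "below half of free-flow speed is red" rule are all hypothetical. This is not how Apple, Google, TomTom or Dash actually compute their traffic layers.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Ping:
    """One location report from a phone (hypothetical format)."""
    device_id: str    # identifying; never appears in the published output
    segment: str      # road segment the position was matched to (assumed done upstream)
    speed_kmh: float


def segment_speeds(pings, min_devices=3):
    """Average speed per road segment, published without device identifiers.

    Segments reported by fewer than `min_devices` distinct devices are
    suppressed so that a lone driver's trip isn't exposed.
    """
    speeds = defaultdict(list)
    devices = defaultdict(set)
    for p in pings:
        speeds[p.segment].append(p.speed_kmh)
        devices[p.segment].add(p.device_id)  # used only to count reporters
    return {
        seg: round(mean(values), 1)
        for seg, values in speeds.items()
        if len(devices[seg]) >= min_devices
    }


def traffic_colour(avg_kmh, free_flow_kmh):
    """Hypothetical rule: slower than half of free-flow speed shows as red."""
    return "red" if avg_kmh < 0.5 * free_flow_kmh else "green"


pings = [
    Ping("a", "M6-J20", 30.0),
    Ping("b", "M6-J20", 40.0),
    Ping("c", "M6-J20", 35.0),
    Ping("d", "A49", 60.0),  # only one device, so A49 is suppressed
]
speeds = segment_speeds(pings)
print(speeds)                                   # {'M6-J20': 35.0}
print(traffic_colour(speeds["M6-J20"], 110.0))  # red
```

Suppressing sparsely observed segments matters because a road with only one reporting phone is, in effect, that one driver's location trace.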

In the case of the iPhone location trail, the problem is, first, that the average user didn't dig through umpteen pages of the iTunes EULA to see that Apple was asking for this information (Microsoft identity architect Kim Cameron has been particularly scathing about the impenetrability of the iTunes EULA on his personal blog for the last year) and, second, that leaving the location traces on the phone in a poorly protected file looks like an invitation to snoopers (legal and illegal) to grab information directly from the phone. That's less data mining and more data strip mining… The news that Android and Windows Phone also collect and anonymise location information isn't a surprise (and Microsoft yet again surprises critics by turning out to have the strongest privacy policy; it hardly matters whether that's poacher-turned-gamekeeper syndrome, an honest belief that privacy matters or a cynical calculation that privacy is a good way to compete). But while location gets a lot of attention because it's such a personal, private thing, there are many more data sets out there that the industry needs to be having a conversation about.

One of those conversations is going on in the US Supreme Court right now, in a case about whether data mining companies can sell pharmaceutical companies the (anonymised) information they've gathered about prescriptions of brand-name drugs instead of cheaper generics. The Supreme Court is looking at how the case affects commercial free speech and whether the State of Vermont is just trying to stop drug marketing, but the decision could set a precedent for wider questions about turning masses of data into useful information, which was a key part of the agenda at the Data 2.0 conference a few weeks ago.

Thanks to GPU computing, grids and commoditised high-performance computing, we can process in minutes or hours what used to take months or years. That's a huge benefit, in medicine and other areas. A few years ago experts suggested we'd reached peak oil and there were no major new 'elephant' fields left to find; most of the elephant fields found since then have come from re-examining survey data gathered years ago with faster computing techniques. The recent resurgence of fundamental research in AI and computer vision has been driven by the fact that a cheap graphics card or four can give you the power of a hugely expensive Unix workstation; Nvidia's conferences are split between hard-core gamers and hard-core researchers these days. And Google, Microsoft and, to a lesser extent, Apple are building services based on machine learning driven by huge data sets. Those range from the Internet itself to your smartphone's location history, your search history, hours of voice recordings, pages of handwriting, thousands of scanned books or information from the sensors recently nicknamed the Internet of Things. The way Kinect can tell what's your hand, what's your hip and what you're shouting over a game at full volume? The uncannily accurate spelling correction and word prediction on Windows Phone? Google's translation of Web pages? The immediate spelling correction in Google Wave? The location information in Google Maps (and that fake village in Lancashire a few miles from where I grew up)? Machine learning.
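
Spelling correction is the easiest of these to show in miniature. The sketch below is the classic word-frequency corrector (the approach Peter Norvig popularised in a well-known essay). It is not what Windows Phone or Google actually ship. The only thing it "learns" is word counts from a corpus. It then picks the most frequent known word within one edit of the input. The tiny corpus is made up. The point is that the quality comes from the data, so more text tends to mean better guesses.

```python
import re
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def train(text):
    """'Learn' from data: here the whole model is just word frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def edits1(word):
    """All strings one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def correct(word, counts):
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    if word in counts:
        return word
    candidates = [w for w in edits1(word) if w in counts]
    return max(candidates, key=counts.__getitem__) if candidates else word


# A made-up corpus; a real system would train on millions of documents.
counts = train("the phone tracks the location; the location data builds the map")
print(correct("teh", counts))      # the
print(correct("locaton", counts))  # location
```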

The technique of taking vast amounts of data and feeding it into a system that uncovers patterns and correlations isn't new (the mathematics that underlies it goes back to George Boole in the 1840s); what is new is the power to do it quickly, the accessible data sets to feed it and where those data sets come from. When Microsoft developed the handwriting recognition algorithms for the Tablet PC a decade ago, it got handwriting samples from thousands of volunteers who knew what they were for. When you use Gmail you probably do know that your email is being mined to teach Google's ad system about language and ideas so it knows what ads to show you. When you drive around with a TomTom GPS, you probably didn't know that aggregate traffic patterns and speeds were being sold as a data set, which police in the Netherlands bought and used to put speed cameras on roads that are both dangerous and routinely driven at over the speed limit. It's the same issue as Bing using the 'clickstream' of where you go and what you click, and ending up replicating a small percentage of (false and deliberately created) Google results: who has the rights to the information that comes out of aggregating data?

And what about when it's not actually that anonymous? Again and again at Data 2.0, companies whose business is aggregating and selling information talked about what they were doing - and found themselves talking about privacy. Visit a Web site that uses the Triggit ad system and it grabs your IP address and looks up what it knows about you - including where the Quova IP geolocation service thinks you are - and works out which advertisers you're valuable to. Are you someone Amazon will want to show an ad to? "In about 120 milliseconds," said Triggit CEO Zach Coelius, "real-time in the background there's a marketplace and they bid in the auction to assign this ad to you." Martin Wesley of BrightTag, which handles the tracking tags that help advertisers follow you from site to site (in a way that lets Web sites choose what data is collected and which marketing partner gets to see it), sounded a cautionary note, harking all the way back to 1999, when the DoubleClick ad service bought Abacus (before Google bought DoubleClick) and privacy advocates worried about all that personal data being used to serve ads. "Make sure your privacy policy lines up with what you're doing with the data. Everyone in the industry needs to handle this with care or this will set the industry back."
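
Coelius describes a marketplace that bids on each ad impression in about 120 milliseconds. That is a real-time bidding auction. Below is a minimal Python sketch of one round. It assumes a sealed-bid second-price rule, which is a common design for ad exchanges, though the article doesn't say which rule Triggit uses. The `Profile` fields, the bidder names and the reserve price are hypothetical. In a real exchange the bidders are remote services working to a hard deadline; here they are just local functions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Profile:
    """Hypothetical signals an ad system might attach to a page view."""
    region: str                  # e.g. from an IP geolocation lookup
    recent_interests: frozenset  # e.g. inferred from tracking cookies


Bidder = Callable[[Profile], float]


def run_auction(profile: Profile, bidders: dict[str, Bidder], reserve: float = 0.0):
    """One sealed-bid, second-price auction for a single ad slot (an assumed rule).

    Each bidder sees the profile and returns a bid; the highest bid at or above
    the reserve wins and pays the runner-up's bid, or the reserve if it is alone.
    Returns (winner, price), or None if nobody meets the reserve.
    """
    bids = sorted(((bid(profile), name) for name, bid in bidders.items()), reverse=True)
    bids = [(amount, name) for amount, name in bids if amount >= reserve]
    if not bids:
        return None
    _, winner = bids[0]
    price = bids[1][0] if len(bids) > 1 else reserve
    return winner, price


# Hypothetical advertisers whose bids depend on what is known about the visitor.
bidders = {
    "bookshop": lambda p: 2.5 if "books" in p.recent_interests else 0.1,
    "travel": lambda p: 1.8 if p.region == "London" else 0.2,
    "shoes": lambda p: 0.9,
}
print(run_auction(Profile("London", frozenset({"books"})), bidders, reserve=0.5))
# ('bookshop', 1.8)
```

The second-price rule means the winner pays what it took to beat the runner-up. That is why the bidders have every incentive to know as much about the visitor as possible before bidding.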

Is it my data in the first place, someone asked? How do I get my cut? That's hard, said Miten Sampat of Quova, and besides, you're already getting rewarded. "The answer is yes, it would be nice if there was a way for consumers to be compensated for opting in to giving data to the ecosystem; it's just a hard problem to solve right now. There's so much confusion about what opting out is... This vision will come from somebody who creates a plugin that fits in a browser that understands all the data I'm sharing. But what you're getting already is free content - that really is what you're getting in exchange."

That could be creepy, said Sam Ramji of API aggregator platform Apigee (creepy is the line Google tries to get right up to but not cross, as Eric Schmidt put it a few months ago). "The challenge is the boundary of personalisation and privacy. It's wonderful to have a personalised web experience, it's even better to have a personalised app experience, but I'm worried about what's happening to the data. I haven't found a way to say to Facebook 'I want you to be able to express this info about me to my app without telling them who I am'. That's pretty creepy; I don't want to live in that world. How can we enrich the database with our data and prevent the creepy factor or privacy violations creeping in as we try to create a personalised experience?"
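
What Ramji is asking for has a well-understood technical shape. The identity provider gives each app a different but stable pseudonym for the same person and releases only the attributes the user has agreed to share. Two apps then can't join their records on a shared ID. Below is a minimal sketch. It assumes an HMAC-derived per-app identifier, which is the same idea as the "pairwise" subject identifiers in the OpenID Connect specification. The key, the `release_to_app` helper and the field names are hypothetical, and this is not Facebook's API.

```python
import hashlib
import hmac

# Hypothetical provider-side secret; a real one would be random and kept in a vault.
PROVIDER_KEY = b"example-secret-key"


def app_pseudonym(user_id: str, app_id: str, key: bytes = PROVIDER_KEY) -> str:
    """Stable per-app ID for a user; apps can't link IDs across apps without the key."""
    # NUL separator so ("ab", "c") and ("a", "bc") produce different messages.
    msg = app_id.encode() + b"\x00" + user_id.encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()[:16]


def release_to_app(user_id, profile, app_id, consented_fields):
    """Hand an app only the fields the user agreed to, under its own pseudonym."""
    shared = {f: profile[f] for f in consented_fields if f in profile}
    shared["subject"] = app_pseudonym(user_id, app_id)  # set last so it can't be overridden
    return shared


# Hypothetical user and apps.
profile = {"name": "Alex", "city": "Leeds", "likes": ["cycling"]}
print(release_to_app("user-42", profile, "recipe-app", ["likes"]))
print(release_to_app("user-42", profile, "travel-app", ["city"]))
```

A pseudonym only goes so far. If the released attributes identify you on their own, the app can still work out who you are.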

Terry Jones of cloud data service Fluidinfo wanted users to be involved: "We as normal people are using apps and they're storing data on our behalf. They shouldn't have the last word on our data; we should be able to add to it or edit it…" Interestingly, the Cabinet Office recently launched the Better Choices, Better Deals strategy for mining data about which services are good value (think of the information that helps you switch electricity provider, on steroids and across multiple services), promising a new service called 'mydata' "which will enable consumers to access, control and use data currently held about them by businesses". And there's a privacy-first social network called Connect.Me launching soon that promises to let you choose who gets to see what information about you (as well as letting you vouch for the identity of people you know personally). The question is whether people care enough - once the latest scare story has passed - to curate their own data.
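
Stripped down, what Jones, the mydata plan and Connect.Me describe is a record that the user can edit and whose visibility the user sets for each audience. Below is a minimal sketch with a hypothetical `PersonalRecord` class and made-up audience names. It isn't based on the design of any of those services.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalRecord:
    """Hypothetical user-curated record: every value carries its allowed audiences."""
    values: dict = field(default_factory=dict)
    audiences: dict = field(default_factory=dict)

    def put(self, key, value, *audiences):
        """Add or replace a field and say who may see it."""
        self.values[key] = value
        self.audiences[key] = set(audiences)

    def edit(self, key, value):
        """The user, not the service, has the last word on existing data."""
        if key not in self.values:
            raise KeyError(key)
        self.values[key] = value

    def revoke(self, key, audience):
        """Stop sharing one field with one audience."""
        self.audiences[key].discard(audience)

    def view_for(self, audience):
        """What a given audience is allowed to see."""
        return {k: v for k, v in self.values.items() if audience in self.audiences[k]}


record = PersonalRecord()
record.put("email", "me@example.com", "friends")
record.put("postcode", "AB1 2CD", "energy-comparison")
record.put("tariff", "Standard", "energy-comparison", "friends")
print(record.view_for("energy-comparison"))  # {'postcode': 'AB1 2CD', 'tariff': 'Standard'}
print(record.view_for("advertisers"))        # {}
```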

Andreas Weigend, Amazon's former chief scientist, doesn't think so - and he started his presentation by showing his Stasi record. "It's a myth that people are interested in privacy. Look anywhere you want to look and it's maybe just some politicians interested in privacy. When you give people an opportunity to share, they will share." But not only is it in Amazon's interest to get you to share; that attitude also leaves out all the questions about whether people know what they're sharing and with whom.

Just a few weeks before the whole smartphone location issue blew up, Jay Adelson, CEO of location programming service SimpleGeo, told the Data 2.0 conference something that now sounds slightly naive: "In general most of the users who have access [to location permissions in apps] are aware of what's being collected. I'm not sure we've really run into abuse of that data."

And Chris Palmer of the Electronic Frontier Foundation put it bluntly: "To get to a unique id, if I have your birthday and zip code I know who you are - the end."
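
Palmer's point is that a handful of ordinary attributes, often called quasi-identifiers, can single someone out even after names are stripped. One standard way to measure this is k-anonymity. Group the records by the quasi-identifiers and look at the size of the smallest group; k = 1 means somebody is unique. Below is a minimal sketch with made-up records. The coarsening rule (birth year plus a three-digit ZIP prefix) is an arbitrary assumption.

```python
from collections import Counter


def k_anonymity(records, quasi_ids):
    """Size of the smallest group of records sharing the same quasi-identifier values.

    k == 1 means at least one person is unique, so anyone who knows those
    attributes about them can pick out their row.
    """
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values())


def generalise(record):
    """Assumed coarsening rule: keep the birth year and a three-digit ZIP prefix."""
    return {**record, "birthday": record["birthday"][:4], "zip": record["zip"][:3] + "**"}


# Made-up records with names already removed.
people = [
    {"birthday": "1970-03-14", "zip": "02138", "drug": "A"},
    {"birthday": "1970-07-02", "zip": "02139", "drug": "B"},
    {"birthday": "1970-11-30", "zip": "02134", "drug": "A"},
    {"birthday": "1981-05-05", "zip": "02141", "drug": "C"},
    {"birthday": "1981-09-19", "zip": "02142", "drug": "B"},
]
print(k_anonymity(people, ["birthday", "zip"]))                           # 1
print(k_anonymity([generalise(p) for p in people], ["birthday", "zip"]))  # 2
```

k-anonymity is a floor rather than a guarantee. If everyone in a group shares the same sensitive value, such as the same prescription, the group still gives that value away.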

Anonymising data, adding value: the boundaries aren't clear, so Palmer suggested some useful principles. "The challenge is to make sure data mining doesn't become data strip mining - that we don't burn down the forest to make a lot of money quick but with no long term value. In a lot of business models today, the issue is that the value proposition is vague... Everyone is skimming off that ambiguity; minimising customer surplus and maximising their own. Without trust, without trustworthy behaviour, it's strip mining. If you can't say what you do for a living in one sentence, it's probably illegal. If you can't say to the consumer what it is you do in a way they can understand - maybe you should reconsider what you do."

Mary Branscombe

Talkback

I understood everything except the issue with Triggit; this kind of location-based ad serving has existed for a long time. There are many web sites like http://ip-address-lookup-v4.com/ that even make this data available to everyone free of charge. Does it really violate my privacy if they know that I'm based in Montreal?

Yldar Hakimo via Facebook 2 May, 2011 21:27

It's less a specific issue with Triggit and more a question of the scale of the tracking information that's being gathered - whether or not it can be correlated, which is when real personal information gets revealed - and whether this is being disclosed responsibly, in a way that means people really understand what information they're sharing. If it's protected, anonymised and professionally handled, it would be a shame for worries about what's being disclosed to limit services - and if it's not being responsibly handled, then it ought to be...

M
