Clicks: where did you think the data was coming from?


About this blog

Coretech

500 words into the future

Unapologetically opinionated views on technology, in the office and out

The furore over whether Bing should have surgically excised the Google clickstream from the data it gets about what IE and Bing Bar users do online (one Bing developer has offered an interesting perspective on this), or perhaps have engineered in some checking on results going into the index based on only a single clickstream 'signal', raises some interesting points about what counts as fair competition, what counts as unfair behaviour and how machine learning is dominating development (at Microsoft, Google and elsewhere).

If Bing shouldn't be able to use information about what users do on the Google site (which might seem a reasonable prohibition at first thought), should they be able to use the results of your pattern of behaviour on Blekko, or on Ask or the BBC or on Amazon? Bing and Google both know things about where some products are on the Amazon site that the Amazon search engine doesn't, as I've found when checking the Amazon UK selling price for some devices - a Google or Bing search for the product name plus the keyword phrase 'Amazon UK' will give you the Amazon UK page for some products the search box on Amazon UK just can't find. I find that really useful.

We give our social graph to Facebook and LinkedIn and Twitter for free, in exchange for a convenient place to have conversations, and they use it themselves to develop new services - and sometimes they share it or sell it. Twitter thinks the 'firehose' of tweets in transit is both public and something it's happy to share; any search engine can negotiate for or buy access to use it as a way of understanding links and content and semantics and anything else it thinks it can learn. But if you restrict search engines and services to only getting information from sites they've negotiated an agreement with, searching Amazon and Twitter and the BBC from Bing might carry on improving, but a million smaller sites would fall by the wayside.

And the precedent of saying that a site can keep the anonymised but publicly available details of how users use the site for only its own information and development has other implications for all sorts of services we've come to expect to be freely available. How much of the metadata of the way a site is built and used should be proprietary? There's no benefit to a hotel site that has a French version of its English pages in having Google scrape the parallel pages to use as the basis of its machine learning-based translation tools between French and English; indeed it might help a competing hotel that hasn't paid for translations into French because it knows visitors can use the Google service. But then that French/English hotel site probably isn't complaining, because the translation service lets Italian visitors read a machine translation of its pages, and the Italian/English sites the service has learned from get the benefit of machine translations into Greek, and so on…

The CAPTCHAs you have to type in to comment on blogs like this? A lot of them are used to check the OCR'd manuscripts going into Google Books. Google learns from scanning your email in Gmail, and from the click patterns on all the sites that use Google Analytics; it's more likely to be using that kind of information to place ads on pages than to place results on search pages, but it's still using it.

Microsoft has been using telemetry for years in just the same way (probably one reason the Bing team sounds taken aback to be called on using telemetry; it's practically a religion at Microsoft). The Office team has long used what it learns from the anonymous, opt-in Customer Experience Improvement Program (CEIP) about which commands Office users click the most to shape the product: Paste, Save and Copy are the most common in Word, followed by Undo - and Paste is so frequently followed by Undo that the Paste Options popup was designed to stop you having to undo.

Some major features in Office have come out of telemetry, according to Steven Sinofsky (who used to run the Office team) - as well as more minor changes. "We learned that a very significant amount of time the first suggestion in the spelling dictionary was the right correction (hence autocorrect). We learned that no one ever read the tip of the day (“Don’t run with scissors”)." Lots of applications have autocorrection now, from OpenOffice to Google Docs; if any of them were inspired by the Office team's discovery that spell checking was good enough to be useful, that's your clickstream data out there having an influence.

Telemetry quite literally made a lot of Windows 7 what it is, as Sinofsky explained repeatedly and publicly in the Engineering Windows 7 blog - and summed up nicely at PDC 2009. "Anytime you plug a device into Windows, we can have the opportunity to get diagnostics to learn what device is plugged in, what drivers were loaded, did the drivers come from you or from a local machine, 32- or 64-bit, was the installation of those successful? ...Another element of telemetry is what we call the software quality monitor... SQM is our way of understanding what features of Windows or any software that Microsoft makes are you using. What are the buttons you're clicking on, are you using keyboard accelerators or the sequence of events. Well, with all of these telemetry items, they'll all respect your privacy, they're all voluntary, and they're all opt-in... But it turns out that over 80% of our customers voluntarily opt-in to sending us this information." The 100 million SQM sessions a month that beta users generated had a huge impact. Windows 7 is Microsoft's most popular OS in a very long time in large part because it does make it easier to do things the way people actually work - and because the telemetry gathered from beta users let the Windows team do things like working on performance until users on PCs out in the world were seeing the Start menu open in the target 50-100 milliseconds.

For IE 9, as Dean Hachamovitch put it last year, "We use many, many data sources from customer to inform what we build and how we build it. The Connect database is one of several sources; we have a SQM database, telemetry, error reporting - all these different sources of data. When you connect data from hundreds of millions of users around what they actually do and how actually do it that is extremely powerful." It's not just finding which APIs Web sites use that IE 8 didn't support and adding them, or looking at sites to see what browser subsystems need to speed up to make them load faster. IE 9 also takes the anonymised list of what executable files are being downloaded through the browser and uses them as part of building a reputation service for applications so you only see warnings for downloads that are likely to be dangerous.

Similarly, the specific malware that the Malicious Software Removal Tool looks for each month is based on what's been showing up in the telemetry from systems running Windows Defender and Security Essentials. Microsoft shares that information with the security vendors in the Microsoft Virus Information Alliance, but many security vendors gather their own telemetry. ZoneAlarm is free because it gathers information about security problems that makes Check Point more valuable to paying customers. The spam button you click in your email package doesn't just get the message out of your inbox; it can end up marking the sender as a spammer on a blocklist that I can use on my mail server.

Got your iPhone turned on as you walk around? You're probably sending location and cellular network data back to Apple; "We may collect information such as occupation, language, zip code, area code, unique device identifier, location, and the time zone where an Apple product is used so that we can better understand customer behaviour and improve our products, services, and advertising," it says. Nokia phones and BlackBerrys collect anonymised location information to use in Ovi Maps and RIM's ETA service respectively; Vodafone takes the aggregated movements of phone users and turns them into data about traffic speeds. (There's on-going debate about whether Google gathers too much information from phones and StreetView cars when it uses your location for Google Maps on your phone.) Treating phones as 'sensors' like weather stations is the basis for what's been called the Internet of Things; that's an extension of the fact that looking at what people in general do is a good way of understanding both what people in general want and what's going on in the world.

Collectively gathering, anonymising and learning from user behaviour is what many of the technology tools we use every day are built on. The assumption is that we own our online behaviour, not that the sites that we visit own it. We want it to be anonymised and not to breach our privacy, but we give it to vendors for free - knowingly or unknowingly depending on whether we've bothered to read the licence agreements and the privacy agreements - and in return we get a service (or sometimes the opportunity to buy a service, which doesn't feel as fair). The very best thing that could come out of this heated discussion - along with the on-going discussion of Web search quality that it disrupted - would be a wider awareness of the information users contribute to technology companies and some discussion about who gets to control that and whether we want to give up the services that use machine learning to extract information from that data (because while it's valuable data, it's only data - not information or knowledge).

Right now anything that anyone does online is up for tracking as long as whoever tracks it protects their privacy. If that rule is going to change, it has to change for everyone.

Mary Branscombe


