... fraud, but they figured that if they could put their customers, their employees and the people they already knew were fraudsters into one database, it would help them reduce that fraud, maybe by $10m.
But the risk of putting all of your customers and all of your employees in one database, the risk of having that escape and get away — if somebody hacks a system or an employee goes bad — the risk of that to their brand was more than $10m. So they chose not to do it.
So this (anonymisation) technique would allow them to analyse the data that they have every legal right to analyse, but to do it in a way that reduces the risk of unintended disclosure, such as somebody hacking the system and stealing information.
Many people are concerned about the amount of information on them that's collected. Can this technology be used beyond law enforcement or fraud detection?
My view of how relevant this technology is has been changing over time. I've come to this new conclusion: if a company is sharing its sensitive data, like its customers or employees, and you told the company they could share it in an anonymised form and get a materially similar result, why would a company want to share it any other way? I've been passing that around to see what people think about it, and I'm getting a lot of agreement. I'm starting to think that what has been created here is bigger than I was originally thinking.
It is relevant to health care. It is relevant to how companies share data for marketing purposes. It is relevant to how government would share information. I'm starting to see more and more places where I'm like, "Wow, you could apply it there as well."
How does this anonymisation technology work? How is it different from encryption?
That's a good question. So with encryption, I would normally encrypt my data and send it to you, and you would decrypt it to use it. The technique that we've developed is: I encrypt and you encrypt, and all of the analysis is done using only the encrypted data. It's not decrypted first. Historically, you first decrypt it to analyse it. We figured out how to do deep analytics while it is encrypted.
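As a rough illustration of analysing data without decrypting it, the sketch below has two parties each turn their identifiers into keyed one-way digests and then compare only the digests. This is not the technique described in the interview, whose details are not given here; HMAC-SHA256, the `SHARED_KEY` value, the `anonymise` helper and the sample email addresses are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret agreed by both parties in advance (an assumption for this sketch).
SHARED_KEY = b"example-shared-key"


def anonymise(value: str) -> str:
    """Return a keyed one-way digest (HMAC-SHA256) of an identifier."""
    return hmac.new(SHARED_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Each party anonymises its own records before sharing anything.
party_a = {anonymise(v) for v in ["alice@example.com", "bob@example.com"]}
party_b = {anonymise(v) for v in ["bob@example.com", "carol@example.com"]}

# The comparison runs only on digests; the raw values are never exchanged.
overlap = party_a & party_b
print(len(overlap))  # 1
```

The only thing either side learns from the comparison is which digests they have in common; a digest that does not match reveals nothing readable about the record behind it.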
Privacy, it seems, has at least as much to do with the right practices as the right technology. Do you coordinate with IBM's chief privacy officer (Harriet Pearson)?
Absolutely. How could I possibly be wearing a privacy hat without a regular dialogue?
I also have a privacy strategist that reports to me. This is rather unusual for most companies. But we've got a guy named John Bliss that works with me on privacy strategies. It kind of highlights the importance with which we see baking privacy into the creations. I don't invent something for our group, pass it off to engineering and later figure out how to make it protect privacy — or think about how we're going to message it to be privacy-protective.
John Bliss and I worked very closely together and with others in IBM, including IBM Research — they have some incredibly smart people doing work in the privacy areas in Almaden, Zurich and Watson. So by the time I bake something up for engineering, it has the best notion of, at that time, what would be a responsible way to deploy it.
Since you've been at IBM, have you found more awareness of this identity resolution technology by corporate customers?
They've just got a growing awareness. I'll tell you a funny thing. The first year that I invented it, I was generally told that it was impossible.
How come?
Well, one-way hashes are infinitely sensitive. "Bob" and "Bob " (with a space) produce entirely different hashes. So identity data is almost never exactly the same on the left hand and the right hand, because one's got a period after the middle initial and the other one doesn't. So it was believed that it would be too sensitive to have any real practical use.
But now that I've been presenting the techniques and we have some trade secrets, some pending patents, we're encouraged that others are going after this as well. We think it'll be a real market. We moved from a year of, "That's not even possible, you're lying" to "OK, OK, you can do it; it'll work".
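The "Bob" versus "Bob " point is easy to reproduce. The sketch below uses SHA-256 as a stand-in one-way hash and shows that a single trailing space changes the digest. The `normalise` helper is purely hypothetical: it shows one generic way that cleaning values before hashing can make near-identical records line up, and it should not be read as the approach referred to in the interview, which is described only as trade secrets and pending patents.

```python
import hashlib


def digest(value: str) -> str:
    """One-way SHA-256 digest of a string, as hex."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# A single trailing space yields a completely different digest.
print(digest("Bob") == digest("Bob "))  # False


def normalise(value: str) -> str:
    """Hypothetical cleanup: drop full stops, lowercase, collapse whitespace."""
    return " ".join(value.replace(".", "").lower().split())


# After the same cleanup on both sides, the digests agree.
left = digest(normalise("Robert J. Smith"))
right = digest(normalise("robert j smith "))
print(left == right)  # True
```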






