
Social-networking site MySpace may have slipped behind Facebook, but it still handles up to six billion visitor records a day.
MySpace, which Rupert Murdoch's News Corp snapped up for $580m (then £332m) in 2005, has a major revamp due later this year, designed to help it make up ground on its rival. The company is predicting a big jump in activity on the site and a corresponding surge in the number of records it handles daily, to as many as 10 billion. The man charged with the sizable task of making sense of that information is chief data architect Don Watters.
ZDNet UK caught up with him recently to ask him about MySpace's relaunch plans and the main issues with managing and analysing substantial volumes of information.
Q: How would you describe your job and the main challenges you face?
A: As chief data architect for MySpace, I pretty much have tactical purview over the entire data platform. Not only the data warehouse, but also the data-mining platform and the data development platform, which is all the front-end data you see at MySpace — so anything to do with your profile or music and video data. It's my responsibility to ensure the data is secure, safe, reliable and on time in real-time.
The biggest challenges we have are to do with scale. We have been doing it a while, but it's still not easy to deal with billions of records a day and still maintain some kind of coherency within the system. We have to deal with that amount of data every day. We're still struggling with new data as it comes in and [with] integrating it and making it available not only to internal users but to our customers.
Q: Can you give an example of the information you provide to customers?
A: The easiest concrete example is something like an artist's dashboard, where we've given the artists who are on MySpace information about what their user base looks like. Only demographically: they don't get to see any detailed user data.
By demographic, we show artists over time what's happening on their part of the site so that they can get a better understanding of who is actually doing what. Then they can either adjust their message or their site, or maybe go to the towns where they're seeing a lot of activity.
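The kind of per-artist demographic rollup described here can be sketched as follows. This is a minimal illustration under stated assumptions, not MySpace's implementation: the event fields (`artist_id`, `user_id`, `week`, `age_band`, `town`), the `ActivityEvent` and `artist_dashboard` names, and the minimum group size `MIN_GROUP` are all hypothetical. The point is that the artist receives only counts of distinct users per group, never individual records.

```python
from typing import Iterable, NamedTuple


class ActivityEvent(NamedTuple):
    # Hypothetical event shape; the real MySpace schema is not described.
    artist_id: str
    user_id: str
    week: str      # e.g. "2010-W35"
    age_band: str  # e.g. "18-24"
    town: str


# Hypothetical threshold: groups with fewer distinct users are suppressed,
# so the dashboard exposes only aggregates, never identifiable individuals.
MIN_GROUP = 3


def artist_dashboard(events: Iterable[ActivityEvent], artist_id: str) -> dict:
    """Count distinct users per (week, age_band, town) for one artist."""
    users_per_group: dict[tuple[str, str, str], set[str]] = {}
    for e in events:
        if e.artist_id != artist_id:
            continue  # only this artist's part of the site
        key = (e.week, e.age_band, e.town)
        users_per_group.setdefault(key, set()).add(e.user_id)
    # Return counts only, and only for groups large enough to report.
    return {key: len(users) for key, users in users_per_group.items()
            if len(users) >= MIN_GROUP}


events = [ActivityEvent("band1", f"u{i}", "2010-W35", "18-24", "Leeds")
          for i in range(4)]
events.append(ActivityEvent("band1", "u9", "2010-W35", "35-44", "York"))
print(artist_dashboard(events, "band1"))
# {('2010-W35', '18-24', 'Leeds'): 4}
```

The single York visitor falls below the assumed threshold and is dropped, which is one simple way to keep "detailed user data" out of an aggregate view.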
We do an incredible amount of data crunching to be able to figure that out, because it's not always easy to take in information from users. They may say they are 103 years old and live at the North Pole, and we just have to believe them.
Or we can do the opposite, and do some introspection and try to figure out what [the artist's] friends look like and who that person is, based on other information. We use crowdsourcing, where you take multiple sources and try to build a single point of view of what's going on from that crowd.
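One simple reading of this crowdsourcing idea is sketched below: trust a plausible self-reported age, and otherwise estimate it from the ages a user's friends report. This is an illustration only, not MySpace's method; the plausibility bounds `MIN_AGE`/`MAX_AGE`, the choice of the median as the aggregate, and the function names are hypothetical.

```python
from statistics import median
from typing import Optional

# Hypothetical plausibility bounds for a self-reported age.
MIN_AGE, MAX_AGE = 13, 100


def plausible(age: Optional[int]) -> bool:
    """True if an age was reported and falls inside the assumed bounds."""
    return age is not None and MIN_AGE <= age <= MAX_AGE


def estimate_age(self_reported: Optional[int],
                 friend_ages: list[Optional[int]]) -> Optional[float]:
    """Use a plausible self-report; otherwise fall back to the median of
    the friends' plausible ages. Returns None if there is no usable signal."""
    if plausible(self_reported):
        return float(self_reported)
    usable = [a for a in friend_ages if plausible(a)]
    if not usable:
        return None
    # The median resists outliers such as other users' joke ages.
    return float(median(usable))


print(estimate_age(103, [19, 21, 22, None, 150]))  # 21.0
```

The median is used here because a few friends with implausible or joke profiles should not drag the estimate far; a production system would likely weigh many more signals.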
Q: Can you provide a sense of the scale of the number crunching?
A: It's a massive amount of information. We're processing somewhere in the order of three to six billion records a day.
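To put those daily figures in perspective, the volumes quoted in this interview (three to six billion records a day now, and up to 10 billion expected after the relaunch) can be converted into an average per-second rate. This sketch assumes load is spread evenly over 24 hours; real traffic peaks would be considerably higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def avg_records_per_second(records_per_day: float) -> float:
    """Average ingest rate, assuming records arrive evenly across the day."""
    return records_per_day / SECONDS_PER_DAY


for daily in (3e9, 6e9, 10e9):
    print(f"{daily / 1e9:.0f} billion/day ≈ "
          f"{avg_records_per_second(daily):,.0f} records/s")
# 3 billion/day ≈ 34,722 records/s
# 6 billion/day ≈ 69,444 records/s
# 10 billion/day ≈ 115,741 records/s
```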
As MySpace changes over the next six months to reinvigorate our brand, we are going to do things that will change the front end and make even more activity happen. If you think about what Twitter does and what Facebook does, you'll see a lot of things that are similar in concept, but not similar in product.
So today on MySpace, the centre panel is called the activity stream. You can filter it in many different ways, which is something that nobody else really does. But doing that in real time is actually quite a challenge. Then recording what's happening on the site, so that people understand which features are and aren't being used, how they are being used and the ways users traverse the site, including the pages they hit along the way, takes an incredible amount of data.
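Recording how users traverse a site usually comes down to logging page-view events and aggregating them afterwards. The sketch below assumes a hypothetical event log of `(session_id, timestamp, page)` tuples and counts page-to-page transitions within each session; the function name and log format are illustrative only, since MySpace's actual logging pipeline is not described here.

```python
from collections import Counter, defaultdict
from typing import Iterable


def page_transitions(events: Iterable[tuple[str, int, str]]) -> Counter:
    """Count (from_page, to_page) hops within each session.

    events: hypothetical (session_id, timestamp, page) tuples, in any order.
    """
    by_session: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for session_id, ts, page in events:
        by_session[session_id].append((ts, page))
    hops: Counter = Counter()
    for views in by_session.values():
        views.sort()  # order each session's page views by timestamp
        for (_, src), (_, dst) in zip(views, views[1:]):
            hops[(src, dst)] += 1
    return hops


log = [("s1", 1, "home"), ("s1", 2, "profile"), ("s1", 3, "music"),
       ("s2", 5, "profile"), ("s2", 4, "home")]
print(page_transitions(log).most_common(1))  # [(('home', 'profile'), 2)]
```

Grouping by session before sorting means events can arrive out of order, as they often do when logs are collected from many front-end servers.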
MySpace is going to go through a giant product relaunch towards the end of this year, and that means the way we are doing business on the front end...