All Those 140-Character Twitter Messages Amount To Petabytes Of Data Every Year
Every day, Twitter users generate tons of data. According to the Technology Review Editors' Blog, all those 140-character messages "add up to 12 terabytes of data every day."
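As a rough check on the headline, the daily figure can be scaled up to a year. The sketch below assumes a 365-day year and decimal units (1 petabyte = 1,000 terabytes); neither assumption comes from the original report.

```python
# Daily volume quoted in the article.
TB_PER_DAY = 12

# Assumptions for this estimate, not figures from the report.
DAYS_PER_YEAR = 365
TB_PER_PB = 1000  # decimal petabyte

tb_per_year = TB_PER_DAY * DAYS_PER_YEAR
pb_per_year = tb_per_year / TB_PER_PB

print(f"{tb_per_year} TB per year, or about {pb_per_year:.2f} PB")
```

At that rate the yearly total comes to a little over four petabytes.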
"This wealth of data seems overwhelming but Twitter believes it contains a lot of insights that could be useful to it as a business," Erica Naone writes.
So, what does the company do with all that information?
According to the company's analytics lead, Kevin Weil, Twitter "tracks when users shift from posting infrequently to becoming regular participants, and looks for features that might have influenced the change."
Weil is responsible for "managing and mining big data at Twitter." At the Web 2.0 Expo in New York, he said "the company has also determined that users who access the service from mobile devices typically become much more engaged with the site," Naone reports.
Twitter is also asking some more open-ended questions. Weil said the company is interested in what influences retweets (posts from one user that are reposted by another). And Twitter has discovered that it can make good guesses about the topics a user is interested in by looking at the accounts that user follows that do not follow back.
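The idea of guessing interests from one-way follows can be illustrated with a small sketch. This is not Twitter's method, which the report does not describe; the function names, the sample accounts, and the topic labels are all hypothetical, and the scoring is a simple count chosen for illustration.

```python
from collections import Counter

def one_way_follows(user, follows):
    """Return the accounts `user` follows that do not follow `user` back."""
    return {
        other
        for other in follows.get(user, set())
        if user not in follows.get(other, set())
    }

def guess_topics(user, follows, account_topics, top_n=3):
    """Rank topics by how many one-way-followed accounts are tagged with them."""
    counts = Counter()
    for account in one_way_follows(user, follows):
        counts.update(account_topics.get(account, []))
    return [topic for topic, _ in counts.most_common(top_n)]

# Hypothetical follow graph: each key follows the accounts in its set.
follows = {
    "alice": {"bob", "nasa", "spacex", "espn"},
    "bob": {"alice"},  # bob follows alice back, so he is ignored
}

# Hypothetical topic labels for each account.
account_topics = {
    "nasa": ["space"],
    "spacex": ["space"],
    "espn": ["sports"],
    "bob": ["cooking"],
}

print(guess_topics("alice", follows, account_topics))
```

Mutual follows are set aside here on the assumption that they more often reflect personal ties than subject interest.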
Asking such questions of huge quantities of data is a common challenge for successful Web companies. Weil explained that Twitter benefits from a variety of open-source software developed by companies such as Google, Yahoo, and Facebook. These tools are designed to store and process data that is too voluminous to manage on even the largest single machine.
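One common pattern behind such tools is to split the data into partitions, compute a partial result on each, and then merge the partial results. The sketch below imitates that pattern in a single process with made-up messages; it is an illustration of the general idea, not the software Weil described, and the function names are hypothetical.

```python
from collections import Counter
from functools import reduce

def count_partition(messages):
    """Count messages per user within one partition of the data."""
    return Counter(user for user, _text in messages)

def merge_counts(left, right):
    """Combine the partial counts from two partitions."""
    return left + right

# Made-up sample data; in a real cluster each partition would sit on a different machine.
partitions = [
    [("alice", "hello"), ("bob", "hi")],
    [("alice", "again"), ("carol", "hey")],
    [("alice", "more"), ("bob", "bye")],
]

partial_counts = [count_partition(p) for p in partitions]
total_counts = reduce(merge_counts, partial_counts, Counter())

print(dict(total_counts))
```

Because each partition is counted independently, the per-partition step could run on many machines at once, with only the small partial counts sent over the network for merging.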