Twitter users should be able to download all of their old Tweets by the end of the year, according to the company’s CEO, Dick Costolo, who has reiterated his intention to give Twitter users access to their personal archives before 2012 comes to a close.
Costolo’s talk at the University of Michigan in Ann Arbor in mid-November, while previously reported, was recently posted online as a video and text transcript, and mirrors one he gave in September in San Francisco, Twitter’s home city.
But in his latest remarks on downloadable Tweet functionality, Costolo went into more detail about what has kept the company from implementing such capabilities sooner, and why even his end-of-year deadline might prove too ambitious. In short: Twitter has been so busy keeping up with the astronomical growth in users and new Tweets (now typically up to 350 million Tweets per day, according to the CEO) that it hasn’t had time to build a solid archive retrieval feature.
Indeed, as Twitter users can attest, the website’s very design makes hunting for old Tweets a chore. Twitter streams, or “timelines,” as Costolo refers to them in his latest talk, are single columns displaying Tweets in reverse-chronological order, newest first. Just scrolling through a user’s Tweets beyond the latest 20 to 50 or so takes time, especially as Twitter pauses to load each new batch of Tweets.
Costolo described the situation when responding to an audience member’s question: “Can you talk about why users aren’t allowed to download their own tweet history?”
Costolo replied, according to the transcript (.txt file):
So the question is, “Why are users not allowed to download their tweet history?” It’s funny, the question makes it sound like I won’t let them. [Laughter] So here’s the deal, so during the night of the presidential election, there was a point at which we were serving 1.3 million timelines. A timeline is my home timeline of all the tweets of the people I’m following; 1.3 million timelines per second. So keep in mind that’s every second 1.3 million timelines going out that are threading together every single tweet that’s coming in from around the world at 15,000 tweets per second, and organizing them in chronological order. So that architecture is really, really, really, really well-suited to real-time search and real-time distribution.
Costolo continued:
It’s really horribly suited to archive search and archive distribution. So if you wanted to do a search against our user database, our user DB for that entire history, it would be so slow that it would slow down the rest of the real-time distribution of things. So what we’re doing to enable users to download the entire archive history of their tweets is, as you can imagine, creating a different kind of archival system for these tweets. We’re in the process of doing that now. And by the end of the year I’ve already promised this, so the engineers — when I promised it publicly they’re already mad at me so they can keep being mad at me. By the end of the year you’ll be able to download the archive history of your entire tweets; you know, your entire tweet archive. [Applause] Yes. It’s a commonly requested feature. Now, again, once again, I caveat this with the engineers who are actually doing the work don’t necessarily agree that they’ll be done by the end of the year, but we’ll just keep having that argument and we’ll see where we end up year end.
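To illustrate the distinction Costolo is drawing, here is a minimal Python sketch of the two access patterns. It is a toy model under our own assumptions, not Twitter’s actual architecture: the `Tweet`, `home_timeline` and `Archive` names are hypothetical, and the real systems are distributed across many machines. The first function merges the recent Tweets of everyone a user follows into a single newest-first timeline; the second structure keeps a separate per-author store that can be exported in full without touching the real-time path.

```python
import heapq
import itertools
from collections import defaultdict
from typing import NamedTuple


class Tweet(NamedTuple):
    timestamp: int  # toy integer clock; larger means newer
    author: str
    text: str


def home_timeline(recent_by_author, following, limit=20):
    """Merge each followed author's recent Tweets into one newest-first list.

    Assumes every per-author list is already sorted newest first.
    """
    streams = [recent_by_author.get(author, []) for author in following]
    merged = heapq.merge(*streams, key=lambda t: t.timestamp, reverse=True)
    return list(itertools.islice(merged, limit))


class Archive:
    """Hypothetical separate store: every Tweet kept per author for bulk export."""

    def __init__(self):
        self._by_author = defaultdict(list)

    def append(self, tweet):
        self._by_author[tweet.author].append(tweet)

    def export(self, author):
        # Full history, oldest first, read without touching the timeline path.
        return sorted(self._by_author.get(author, []), key=lambda t: t.timestamp)


recent = {
    "alice": [Tweet(5, "alice", "a2"), Tweet(1, "alice", "a1")],
    "bob": [Tweet(4, "bob", "b2"), Tweet(2, "bob", "b1")],
}
timeline = home_timeline(recent, ["alice", "bob"], limit=3)

archive = Archive()
for tweets in recent.values():
    for tweet in tweets:
        archive.append(tweet)
```

The timeline only ever looks at a small window of recent Tweets, which is why it can be fast; a full-history export has to read everything a user ever posted, which is the workload Costolo says needs its own system.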
Although Twitter does seem intent on offering downloadable personal archives in the near future, the company has yet to explain other key details about how the feature would work, such as what file format Tweets would be available in, whether Direct Messages (private messages between two users) would be downloadable as well, and whether users could select specific Tweets for download and not others.
Twitter also hasn’t mentioned a project that would seem to be closely related: a massive Twitter research archive at the U.S. Library of Congress.
Twitter first announced in April 2010 that it was working with the agency on an archive of all public Tweets ever posted to the website for “internal library use, for non-commercial research, public display by the library itself, and preservation.”
At that time, just 55 million Tweets were published every day, compared to the 350 million posted on an average day at present. Back then, Twitter and Google also had a search deal allowing Google’s Realtime search to scrape public Tweets and display them alongside other links in the search results, but that deal ended in June 2011 and hasn’t been rekindled since.
By June 2011, the Library of Congress Twitter archive was still just “under construction” and Twitter was seeing 140 million Tweets per day, according to O’Reilly Radar.
One year later, in the summer of 2012, one Canadian researcher said that the Library of Congress had actually quietly given up on the project, due in part to Twitter’s recent changes to consolidate its users’ content and increase monetization of that content, primarily through ad sales. Following up, BuzzFeed found that the Library of Congress was still aiming to create its Twitter archive, but that the technical challenges were proving to be greater than expected and that there was no official timeline (no pun intended) for its launch.
Asked about the status of the Library of Congress archive now, a Twitter spokesperson provided TPM the following statement:
“As for Library of Congress, Twitter sees a billion Tweets every 2.5 days. As you can imagine, there’s a fair amount of infrastructure needed to handle all this public data, and the Library of Congress is working on this now. We don’t have an update on timing on that project.”
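The figures quoted in this story are easier to compare once converted to common units. The short Python sketch below is our own back-of-the-envelope arithmetic, not anything Twitter published; it assumes the quoted numbers are simple averages, and the function and variable names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tweets_per_day(total_tweets, days):
    """Average daily volume implied by a total over some number of days."""
    return total_tweets / days


def tweets_per_second(daily_tweets):
    """Average per-second rate implied by a daily volume."""
    return daily_tweets / SECONDS_PER_DAY


# Figures as quoted in the article.
ceo_daily = 350_000_000                 # "up to 350 million Tweets per day"
spokesperson_daily = tweets_per_day(1_000_000_000, 2.5)  # "a billion every 2.5 days"
election_peak_per_second = 15_000       # election-night rate cited by Costolo
daily_april_2010 = 55_000_000           # volume when the archive was announced

average_per_second = tweets_per_second(ceo_daily)
peak_vs_average = election_peak_per_second / average_per_second
growth_since_2010 = ceo_daily / daily_april_2010

print(f"Spokesperson's figure: {spokesperson_daily:,.0f} Tweets per day")
print(f"Average rate at 350M/day: {average_per_second:,.0f} Tweets per second")
print(f"Election-night peak vs. average: {peak_vs_average:.1f}x")
print(f"Growth since April 2010: {growth_since_2010:.1f}x")
```

By this rough math, the spokesperson’s figure works out to about 400 million Tweets per day, somewhat above the “up to 350 million” Costolo cited, and the election-night rate was roughly 3.7 times the everyday average.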
TPM has also reached out to the Library of Congress for more about its end of the project and will update when we receive a response.
(H/T: TechCrunch)