How to get old #jan25 tweets from Topsy

As we pass the first anniversary of #jan25, we’ve been getting a lot of support requests at Topsy asking how to get old tweets. So we thought we’d explain with a blog post.

Can I get all my old tweets / old tweets for a query?

Short answer: it’s not very simple, but it’s possible.

Topsy has tweets going back to mid-2008. However, our searchable index includes only tweets that contain links, tweets that have been retweeted, and retweets of other tweets.

The user-friendly way

Use Topsy Advanced Search. Enter your query term, e.g. #jan25 or from:r2g2, and a date range, e.g. Jan 24, 2011 to Jan 25, 2011 (dates are interpreted in Pacific Time). Then go! Results are sorted by relevance, which is weighted by the number of tweets and retweets and by the influence of the tweeters. You can choose to sort by date and to limit results to tweets (excluding websites and images linked to in tweets). If you want results in ascending date order (oldest first), you will have to hand-edit the URL, adding &timeline_order=-date

I want it all: computer-friendly is ok

To get a time-sorted (oldest-first) list of tweets that can be parsed by a simple script, use Topsy’s Otter API. Use the /search call, specify mintime and maxtime in Unix epoch time, specify perpage=100, and iterate through the pages with page=1, page=2, etc. You can get up to 10 pages, which means at most 1000 results per query. But there were 23658 #jan25 tweets in our index on Jan 25, 2011. To get them all, you will have to specify smaller time windows, e.g. one hour at a time. The ‘total’ field in the ‘response’ gives the number of results within the chosen time window; as long as it is less than 1000, you can retrieve all of them.
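The windowing and paging described above can be sketched in Python roughly as follows. This is only an illustration under a few assumptions: fetch_page is a hypothetical function you would write yourself to call /search and return the parsed ‘response’ object, the ‘list’ key holding the results is assumed rather than taken from this post, and the fixed Pacific Standard Time offset simply mirrors the Advanced Search convention for a January date.

```python
from datetime import datetime, timedelta, timezone

PER_PAGE = 100                       # perpage value suggested in the post
MAX_PAGES = 10                       # the API serves at most 10 pages per query
MAX_RESULTS = PER_PAGE * MAX_PAGES   # so at most 1000 results per time window

# Fixed UTC-8 offset: correct for Pacific Standard Time (e.g. January),
# not for dates that fall in daylight saving time.
PST = timezone(timedelta(hours=-8))


def to_epoch(year, month, day, tz=PST):
    """Unix epoch seconds for midnight at the start of the given day."""
    return int(datetime(year, month, day, tzinfo=tz).timestamp())


def windows(mintime, maxtime, step=3600):
    """Yield consecutive (start, end) windows of at most `step` seconds."""
    start = mintime
    while start < maxtime:
        end = min(start + step, maxtime)
        yield start, end
        start = end


def collect(query, mintime, maxtime, fetch_page, step=3600):
    """Gather results window by window, page by page.

    `fetch_page` is a hypothetical callable supplied by the caller: it takes
    a dict of query parameters and returns the parsed 'response' object.
    """
    results = []
    for start, end in windows(mintime, maxtime, step):
        for page in range(1, MAX_PAGES + 1):
            params = {"q": query, "mintime": start, "maxtime": end,
                      "perpage": PER_PAGE, "page": page}
            response = fetch_page(params)
            if response["total"] > MAX_RESULTS:
                raise ValueError(f"window {start}-{end} too large; use a smaller step")
            results.extend(response["list"])  # 'list' key is an assumption
            if page * PER_PAGE >= response["total"]:
                break
    return results
```

For example, collect("#jan25", to_epoch(2011, 1, 25), to_epoch(2011, 1, 26), fetch_page) would walk 24 one-hour windows. Adjacent windows share a boundary timestamp, and this post does not say whether maxtime is inclusive, so you may need to de-duplicate the results.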

Use /search.json for JSON output; /search.txt for human-readable formatted JSON; and /search.dummy for an HTML table that can be copied and pasted into a spreadsheet.

You can use any query in the ‘q’ field, such as ‘#jan25’ or, to get all tweets from a user (such as @r2g2, that’s me), ‘from:r2g2’.
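If you build the request URL in a script, remember that characters like # and : need to be URL-encoded in the ‘q’ value. Here is a minimal Python sketch; the host name is a made-up placeholder (this post does not give the API’s address), and search_url is just an illustrative helper name.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real Otter API host.
BASE_URL = "https://otter.example.invalid"


def search_url(query, mintime, maxtime, page=1, fmt="json", base_url=BASE_URL):
    """Build a /search URL; fmt selects the output (json, txt or dummy)."""
    params = {
        "q": query,          # e.g. "#jan25" or "from:r2g2"
        "mintime": mintime,  # Unix epoch seconds
        "maxtime": maxtime,  # Unix epoch seconds
        "perpage": 100,
        "page": page,
    }
    # urlencode percent-escapes '#' as %23 and ':' as %3A
    return f"{base_url}/search.{fmt}?{urlencode(params)}"
```

Calling search_url("from:r2g2", mintime, maxtime, fmt="txt") would give the human-readable variant of the same query.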

Enjoy!
