Twitter’s Firehose Myth – What You Need To Know About the Twitter Firehose and APIs
Today’s big news: Twitter is developer-focused and is expanding its policy for access to its data. The Twitter firehose is opening up in 2010.
I wanted to share my perspective on the Twitter APIs and interfaces for developers.
Drinking from the Firehose – Not Until 2010
There are a lot of places where people seem to think they’ve got the firehose, but what they’re really running are downsampled versions ("gardenhose" or "spritzer"). You can get "spritzer" amounts of data without any special arrangement. "Gardenhose" is more data, and a number of university projects seem to be using it.
Today, Ryan Sarver is reporting from Europe that Twitter has received over 50,000 registered applications from developers using the Twitter API. Ryan stressed that they want to be transparent about their policies and intentions for their APIs. Additionally, they want to be proactive in communicating with developers on key activities.
Twitter’s success will depend on its ability to provide great technology via its API and, more importantly, to make sure there is a profitable ecosystem of developers.
The big announcement is that the firehose will be open to developers and that Twitter will put on a developer conference called Chirp, with both the firehose and Chirp happening in 2010. Twitter also said that anyone using OAuth will get a 10x rate limit increase.
The Firehose Myth
The REST API and the search and streaming interfaces do just fine. Here is my Angle from working with Twitter via the API for the past year.
For the Twitter API, there is a form on the web site to fill out. Once they approve it, you get 20K requests per hour. The search API has a higher rate limit, which seems to vary. The streaming interface is new.
Getting Whitelisted
Developers who are serious have to go through a "whitelist" process. Twitter provides a form for this, and it covers the Twitter REST API. Historically, they seem to be pretty liberal about providing access to developers who look like they’re doing something plausible.
Turnaround time on Twitter API whitelisting varies. It just depends on how busy they are and whether they think you’re an actual developer (as opposed to someone wanting to run a spam bot). In the past, a few days to a week was typical. I haven’t seen a lot of complaints, so it’s probably similar now.
Once you’re whitelisted you can make 20K requests per hour. The lead time varies from a few days to a few weeks depending on how busy they are, because someone actually looks at each request. They have a pretty severe spam problem, and I think they want to avoid people who are obviously going to cause problems, hence the whitelist process.
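To put the 20K requests per hour figure in working terms, here is a minimal sketch of client-side pacing. The `Pacer` class and its parameter names are hypothetical and not part of any Twitter library; it only spaces calls evenly so a client stays under an hourly quota, and it assumes the quota is counted per clock hour of requests.

```python
import time


class Pacer:
    """Spaces calls evenly so a client stays under an hourly request quota.

    Hypothetical helper for illustration; not part of any Twitter library.
    """

    def __init__(self, requests_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / requests_per_hour  # seconds between calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed, then reserve the slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


# The whitelisted figure quoted above: 20K requests per hour.
pacer = Pacer(requests_per_hour=20_000)
print(f"{pacer.min_interval:.2f} seconds between requests")
```

A client would call `pacer.wait()` before each REST request. At 20K per hour that works out to one request roughly every 0.18 seconds, which is plenty for most profile, friend, and follower lookups.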
Search API and Streaming Interface
The search API and streaming interface are separate from the REST API. There’s no whitelisting process for Twitter search. They allow fairly heavy use of the service and adjust the rate limit up and down, probably depending on load and policy.
The streaming interface doesn’t require any agreement at all to use the "spritzer", a subsampled set of Twitter status messages. Many developers and users seem to be using spritzer even though they say (or perhaps think) they’re using the firehose.
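To make "subsampled" concrete, here is a small sketch of how a full stream might be thinned to a fixed fraction. The one-in-20 rate, the `downsample` function, and the modulo-on-id rule are illustrative assumptions; nothing here says how Twitter actually selects spritzer or gardenhose messages, or what their real rates are.

```python
def downsample(status_ids, keep_one_in):
    """Keep roughly one in every `keep_one_in` statuses, chosen by id.

    Hypothetical selection rule, used only to illustrate sampling.
    """
    return [sid for sid in status_ids if sid % keep_one_in == 0]


# Stand-in for the full stream: 1,000 made-up status ids.
full_stream = range(1, 1001)
sample = downsample(full_stream, keep_one_in=20)
print(len(sample), "of", len(full_stream))  # prints "50 of 1000"
```

A consumer of the thinned stream still sees real status messages, just far fewer of them, which may be why a sample is easy to mistake for the full feed.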
Most (almost all) developers do not have access to the "firehose", and getting it requires a separate discussion from the Twitter API whitelisting application. Many applications are using a combination of spritzer, track (which matches keywords from the firehose), and search to achieve most of what they would get from the full data stream.
Twitter recently announced that it would be providing the firehose to Y Combinator startups, and as announced today, there will probably be more "real" firehose applications to come out next week.
My advice for developers is to use the whitelist form for the REST API. If you’re starting a serious project you’d probably want that. That lets you look up profiles/friends/followers etc.
The streaming interface is separate, and you can get at a lot of things without the full stream by using spritzer, track, shadow, and search. "Track" is sort of like a streaming search, and "shadow" is like a streaming timeline feed. You open a connection and they push matching items at you as they arrive.
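The push model described above can be sketched roughly as follows. This is an illustration only: the newline-delimited JSON framing, the `text` field name, and the `track` function are assumptions for the example, and an in-memory list of lines stands in for the open connection so the sketch runs without network access. In the real service the keyword matching happens on Twitter’s side; here it is done locally just to show the shape of the consuming loop.

```python
import json


def track(lines, keywords):
    """Yield statuses whose text mentions any keyword, as they arrive.

    `lines` is any iterable of newline-delimited JSON strings; in a real
    client it would be the open streaming connection (assumed framing).
    """
    wanted = [k.lower() for k in keywords]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between messages
        try:
            status = json.loads(line)
        except ValueError:
            continue  # skip anything that is not valid JSON
        if not isinstance(status, dict):
            continue  # skip JSON values that are not status objects
        text = str(status.get("text", "")).lower()
        if any(k in text for k in wanted):
            yield status


# Stand-in for a live connection: three made-up status messages.
fake_stream = [
    '{"id": 1, "text": "Chirp conference announced for 2010"}',
    "",
    '{"id": 2, "text": "Lunch was great"}',
    '{"id": 3, "text": "Is the firehose really opening up?"}',
]

matches = list(track(fake_stream, ["firehose", "chirp"]))
print([s["id"] for s in matches])  # prints [1, 3]
```

Because `track` is a generator, the caller handles each match as soon as it arrives instead of waiting for a batch, which mirrors the "they push matching items at you" behavior.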
The firehose is not open to everyone today, and it is often the case that developers are really running downsampled versions ("gardenhose" or "spritzer").
Bottom Line Angle
If Twitter opens up the firehose feed, take it, but you might not necessarily need or want it. There’s a lot more data that comes with the firehose, so it can be more useful to have Twitter filter for you (via its search and streaming API) and then process the result.
The firehose could drown you so be careful.
You don’t need the firehose to be successful using and scaling with Twitter; the search and streaming interfaces are fine. But if the firehose opens up, take it. Beware, however, that the amount of data that comes with the firehose can be overwhelming and can require data "cleanup" and processing.