Twitter Streaming API from Python
I'm playing around with Twitter's streaming API for a (personal) project. tweetstream is a simple wrapper for it that seemed handy. Unfortunately, it has a known issue: the HTTP library it uses (urllib2) buffers the file object it creates for the response, which means that responses for low-volume streams (e.g. when using the follow method) are not delivered immediately. The culprit appears to be this line from urllib2.py (in the AbstractHTTPHandler class's do_open method):
fp = socket._fileobject(r, close=True)
socket._fileobject does have a bufsize parameter, and by default it buffers 8192 bytes. Unfortunately, AbstractHTTPHandler doesn't make it easy to override the file object creation. As is pointed out in the bug report, using httplib directly would allow this to be worked around, but that would mean losing all of the 401 response/HTTP Basic Auth handling that urllib2 has.
Instead, while holding my nose, I chose the following monkey patching solution:
import socket
import urllib2

# Wrapper around socket._fileobject that forces the buffer size to be 0
_builtin_socket_fileobject = socket._fileobject

class _NonBufferingFileObject(_builtin_socket_fileobject):
    def __init__(self, sock, mode='rb', bufsize=-1, close=False):
        _builtin_socket_fileobject.__init__(
            self, sock, mode=mode, bufsize=0, close=close)

# Wrapper around urllib2.HTTPHandler that monkey-patches socket._fileobject
# to be a _NonBufferingFileObject so that buffering is not used in the response
# file object
class _NonBufferingHTTPHandler(urllib2.HTTPHandler):
    def do_open(self, http_class, req):
        socket._fileobject = _NonBufferingFileObject
        try:
            # urllib2.HTTPHandler is a classic class, so we can't use super()
            return urllib2.HTTPHandler.do_open(self, http_class, req)
        finally:
            # Restore the original file object class even if do_open raises
            socket._fileobject = _builtin_socket_fileobject
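The swap-and-restore pattern used in do_open above can also be packaged as a small context manager, which guarantees the original attribute comes back even when the wrapped call raises. This is only a sketch of the general pattern: patched_attr and FakeModule are hypothetical names invented for illustration, and FakeModule merely stands in for a module such as socket so the example runs anywhere.

```python
from contextlib import contextmanager


@contextmanager
def patched_attr(obj, name, replacement):
    """Temporarily set obj.<name> to replacement, restoring it on exit."""
    original = getattr(obj, name)
    setattr(obj, name, replacement)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, so the patch never leaks.
        setattr(obj, name, original)


class FakeModule(object):
    """Hypothetical stand-in for a module such as socket."""
    fileobject = "buffered"


with patched_attr(FakeModule, "fileobject", "unbuffered"):
    inside = FakeModule.fileobject  # value seen while the patch is active
after = FakeModule.fileobject       # value seen once the block has exited

print("inside: %s, after: %s" % (inside, after))
```

Note that patching a module-level attribute like this is still visible to every thread while the block is active, so it is a workaround rather than a clean fix.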
Then, in tweetstream's urllib2.build_opener() call, an instance of _NonBufferingHTTPHandler can be added as a parameter, and it will replace the built-in HTTPHandler.
5 Comments
And you can set the socket to be line buffered with socket._fileobject.default_bufsize = 1. I'm not sure how significant the performance difference really is though.