[Rabbit-dev] FilterHandler is unable to correctly parse some buffer streams

Sat Dec 12 21:52:58 CET 2009

On Sat, 12 Dec 2009 20:27:41 +0000
Mindaugas Žakšauskas <mindas at gmail.com> wrote:

> Now if I place a breakpoint in rabbit.filter.BackgroundFilter::handleTag
> (the line which does tag.removeAttribute ("background");) it sometimes stops
> (and works correctly) but sometimes doesn't. E.g. it stops if I go to
> 
> http://www.ukstudentlife.com/Life/Money.htm (1)
> but doesn't if I go to
> http://en.wikipedia.org/wiki/South_African_Republic_pond (2)

If you add the HttpSnoop filter to the httpoutfilters, you can easily see
the headers that Rabbit reads from the server. For (2) you will find:

GET http://en.wikipedia.org/wiki/South_African_Republic_pond HTTP/1.1
HTTP/1.0 200 OK
...
Content-Encoding: gzip
Content-Length: 5878
Content-Type: text/html; charset=utf-8
...

So the binary junk you see is expected: the data is gzipped.

Rabbit should not normally try to filter this data; if it does, it is
doing the wrong thing.
If you set "repack=true", Rabbit ought to unpack and filter pages
that are gzipped.
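
To make the intended behaviour concrete, here is a rough sketch of the
decision described above. It is not Rabbit's actual code: the class
GzipRepackSketch, its methods, and the regex standing in for
BackgroundFilter are all hypothetical, and the whole body is assumed to
be in memory as UTF-8. Only java.util.zip from the standard library is
used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration; none of these names come from Rabbit's source.
class GzipRepackSketch {

    // True when the Content-Encoding header value says the body is gzipped.
    static boolean isGzipped(String contentEncoding) {
        return contentEncoding != null
            && contentEncoding.trim().equalsIgnoreCase("gzip");
    }

    // Decompress a complete gzip body held in memory.
    static byte[] gunzip(byte[] data) throws IOException {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(data));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    // Compress a body again after it has been filtered.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    // Stand-in for an HTML filter: drops any background="..." attribute.
    static String stripBackground(String html) {
        return html.replaceAll("\\s+background=\"[^\"]*\"", "");
    }

    // Gzipped bodies pass through untouched unless repack is set;
    // with repack they are unpacked, filtered and packed again.
    static byte[] filterBody(byte[] body, String contentEncoding,
                             boolean repack) throws IOException {
        boolean gzipped = isGzipped(contentEncoding);
        if (gzipped && !repack) {
            return body; // never run an HTML filter over compressed bytes
        }
        byte[] plain = gzipped ? gunzip(body) : body;
        String html = new String(plain, StandardCharsets.UTF_8);
        byte[] filtered =
            stripBackground(html).getBytes(StandardCharsets.UTF_8);
        return gzipped ? gzip(filtered) : filtered;
    }
}
```

Note that after repacking the body length changes, so a real proxy
would also have to update or drop the Content-Length header.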

Rabbit does not normally try to unzip compressed pages, since
compressed content means that someone has already thought about
minimizing the content, so unpacking it in Rabbit would only add latency.

/robo