[Rabbit-dev] FilterHandler is unable to correctly parse some buffer streams

Sat Dec 12 21:27:41 CET 2009

Hi,

I was playing with various HTML filters (e.g. BackgroundFilter), but
strangely I am getting inconsistent results.

Basically I only want to add some extra stuff to the output HTML and do not
want to bother with gzipping at all (is there any reason why FilterHandler
extends GZipHandler?). So my config is like this:

8<------------------------------------------------------------------------------------
#
text/html=rabbit.handler.FilterHandler
text/html;charset\=iso-?8859-1=rabbit.handler.FilterHandler
text/html;charset\=iso-?8859_1=rabbit.handler.FilterHandler
text/html;charset\=utf-?8=rabbit.handler.FilterHandler
text/html;charset\=utf_?8=rabbit.handler.FilterHandler

# few lines below....

[rabbit.handler.GZipHandler]
# If I set this to false, no filtering happens - at all! WTF?
compress=true

[rabbit.handler.FilterHandler]
filters=rabbit.filter.BodyFilter,rabbit.filter.BackgroundFilter
8<------------------------------------------------------------------------------------

Now if I place a breakpoint in rabbit.filter.BackgroundFilter::handleTag
(the line which does tag.removeAttribute ("background");) it sometimes stops
(and works correctly) but sometimes doesn't. E.g. it stops if I go to

http://www.ukstudentlife.com/Life/Money.htm (1)
but doesn't if I go to
http://en.wikipedia.org/wiki/South_African_Republic_pond (2)

The results are intermittent because the byte array (named "arr") that is
built in FilterHandler::modifyBuffer sometimes contains meaningful text and
sometimes unparseable garbage. If I create a new String from this array, I
get a readable string for page (1) but total garbage for page (2). Later,
when the parser processes this garbage, it emits a single HtmlBlock that is
just another variant of the same garbage.
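
One way to narrow this down is to look at the raw bytes instead of turning
them into a String. The sketch below is only a debugging aid and rests on an
assumption I have not verified: that the "garbage" might be a body the server
sent compressed. It checks whether a buffer starts with the gzip magic number
(0x1f 0x8b) and prints the first few bytes as hex. BufferSniffer, looksGzipped
and hexPrefix are hypothetical names, not part of Rabbit, and only the first
buffer of a response would carry the magic number.

```java
// Hypothetical helper, not part of Rabbit: inspects raw buffer bytes.
final class BufferSniffer {

    /** True if the region starts with the gzip magic number 0x1f 0x8b. */
    static boolean looksGzipped(byte[] arr, int off, int len) {
        return len >= 2
            && (arr[off] & 0xff) == 0x1f
            && (arr[off + 1] & 0xff) == 0x8b;
    }

    /** Hex dump of at most max bytes of the region, space separated. */
    static String hexPrefix(byte[] arr, int off, int len, int max) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(len, max);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", arr[off + i] & 0xff));
        }
        return sb.toString();
    }
}
```

Calling something like BufferSniffer.looksGzipped(arr, 0, arr.length) on the
first buffer seen for pages (1) and (2) would show whether the two responses
differ in this respect; if neither starts with 1f 8b, this assumption can be
ruled out.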

The garbled contents of "arr" presumably come from implementation details
of rabbit.io.BufferHandle (rabbit.io.CacheBufferHandle in my case). Digging
deeper to understand what is wrong would be rather hard and time consuming
for me, but I hope I have provided enough pointers for somebody who knows
the internals better.

Has anybody else seen this problem before? Does it depend on differing HTTP
server behaviour (to me it looks that way)?
BTW, I am using Rabbit 4.2.

m.