rabbit.html
Class HtmlParser

java.lang.Object
  extended by rabbit.html.HtmlParser

public class HtmlParser
extends Object

Parses a block of HTML into separate tokens, using a recursive descent approach.
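
To make the token kinds listed below concrete, here is a minimal, self-contained sketch of how a tag could be split into LT, MT, EQUALS, STRING, SQSTRING and DQSTRING tokens. It is an illustration only and is not taken from the HtmlParser source: the class TokenSketch, its Kind enum, and the tokenize method are hypothetical names, and the sketch is a simple scanning loop, so it does not show the recursive descent itself or the COMMENT and SCRIPT handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; NOT the rabbit.html.HtmlParser implementation.
class TokenSketch {
    // Kinds named after the HtmlParser constants (the real ones are ints).
    enum Kind { LT, MT, EQUALS, STRING, SQSTRING, DQSTRING, END }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // Split the inside of an HTML fragment into tokens; whitespace is skipped.
    static List<Token> tokenize(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '<') {
                out.add(new Token(Kind.LT, "<"));
                i++;
            } else if (c == '>') {
                out.add(new Token(Kind.MT, ">"));
                i++;
            } else if (c == '=') {
                out.add(new Token(Kind.EQUALS, "="));
                i++;
            } else if (c == '"' || c == '\'') {
                // Quoted string: read up to the matching quote,
                // or to the end of the input if the quote is never closed.
                int close = s.indexOf(c, i + 1);
                int end = (close < 0) ? s.length() : close + 1;
                Kind kind = (c == '"') ? Kind.DQSTRING : Kind.SQSTRING;
                out.add(new Token(kind, s.substring(i, end)));
                i = end;
            } else {
                // Bare word: read until whitespace or a special character.
                int j = i;
                while (j < s.length()
                        && "<>=\"'".indexOf(s.charAt(j)) < 0
                        && !Character.isWhitespace(s.charAt(j))) {
                    j++;
                }
                out.add(new Token(Kind.STRING, s.substring(i, j)));
                i = j;
            }
        }
        out.add(new Token(Kind.END, ""));
        return out;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<a href=\"x.html\" class='b'>")) {
            System.out.println(t);
        }
    }
}
```

In this sketch a quoted token keeps its surrounding quote characters; whether HtmlParser does the same is not stated on this page.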

Author:
Robert Olofsson

Field Summary
static int COMMENT
          An HTML comment, for example "<!-- some text -->".
static int DOUBLEQUOTE
          The double quote character '"'.
static int DQSTRING
          A double-quoted string, for example "string".
static int END
          This indicates the end of a block.
static int EQUALS
          Equals '='
static int LT
          Less Than '<'
static int MT
          More Than '>'
static int SCRIPT
          An HTML script.
static int SINGELQUOTE
          The single quote character (').
static int SQSTRING
          A single-quoted string, for example 'string'.
static int START
          This indicates the start of a block.
static int STRING
          Indicates that a string value was found.
static int UNKNOWN
          Unknown token.
 
Constructor Summary
HtmlParser(Charset cs)
          Create a new HtmlParser.
 
Method Summary
 HtmlBlock parse()
          Get an HtmlBlock from the page part previously given to setText.
 void setText(byte[] page)
          Set the data block to parse.
 void setText(byte[] page, int startIndex, int length)
          Set the data block to parse.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

START

public static final int START
This indicates the start of a block.

See Also:
Constant Field Values

STRING

public static final int STRING
Indicates that a string value was found.

See Also:
Constant Field Values

SQSTRING

public static final int SQSTRING
A single-quoted string, for example 'string'.

See Also:
Constant Field Values

DQSTRING

public static final int DQSTRING
A double-quoted string, for example "string".

See Also:
Constant Field Values

SINGELQUOTE

public static final int SINGELQUOTE
The single quote character (').

See Also:
Constant Field Values

DOUBLEQUOTE

public static final int DOUBLEQUOTE
The double quote character '"'.

See Also:
Constant Field Values

LT

public static final int LT
Less Than '<'

See Also:
Constant Field Values

MT

public static final int MT
More Than '>'

See Also:
Constant Field Values

EQUALS

public static final int EQUALS
Equals '='

See Also:
Constant Field Values

COMMENT

public static final int COMMENT
An HTML comment, for example "<!-- some text -->".

See Also:
Constant Field Values

SCRIPT

public static final int SCRIPT
An HTML script.

See Also:
Constant Field Values

END

public static final int END
This indicates the end of a block.

See Also:
Constant Field Values

UNKNOWN

public static final int UNKNOWN
Unknown token.

See Also:
Constant Field Values

Constructor Detail

HtmlParser

public HtmlParser(Charset cs)
Create a new HtmlParser.

Method Detail

setText

public void setText(byte[] page)
Set the data block to parse.

Parameters:
page - the block to parse.

setText

public void setText(byte[] page,
                    int startIndex,
                    int length)
Set the data block to parse.

Parameters:
page - the block to parse.
startIndex - the index in page at which the data to parse starts.
length - the length of the data.
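
The sketch below shows one plausible reading of the startIndex and length parameters: together they select a window of bytes inside page. This page does not describe startIndex, so the window semantics are an assumption, as is the idea that the bytes are decoded with the Charset passed to the constructor. The class SliceSketch and its decodeWindow method are hypothetical helpers built only on the standard library; they do not call HtmlParser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of an offset/length window over a byte array.
class SliceSketch {
    // Decode only the bytes in [startIndex, startIndex + length) of page.
    static String decodeWindow(byte[] page, int startIndex, int length, Charset cs) {
        return new String(page, startIndex, length, cs);
    }

    public static void main(String[] args) {
        String prefix = "junk:";
        String html = "<p>hi</p>";
        // ASCII text, so one character is one byte and String lengths are byte counts.
        byte[] page = (prefix + html + ":junk").getBytes(StandardCharsets.US_ASCII);

        int startIndex = prefix.length(); // first byte of the HTML
        int length = html.length();       // number of bytes to consider

        // prints <p>hi</p>
        System.out.println(decodeWindow(page, startIndex, length, StandardCharsets.US_ASCII));
    }
}
```

Passing a window rather than copying the bytes lets a caller parse part of a larger buffer in place, which is presumably why this overload exists alongside setText(byte[] page).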

parse

public HtmlBlock parse()
                throws HtmlParseException
Get an HtmlBlock from the page part previously given to setText.

Throws:
HtmlParseException