com.sun.slamd.http
Class HTMLDocument

java.lang.Object
  extended by com.sun.slamd.http.HTMLDocument

public class HTMLDocument
extends java.lang.Object

This class defines an HTML document that may be included as part of a response sent by a Web server. It provides methods for operating on the document, such as extracting the links or images it contains and retrieving its text.


Constructor Summary
HTMLDocument(java.lang.String documentURL, java.lang.String htmlData)
          Creates a new HTML document using the provided data.
 
Method Summary
 java.lang.String[] getAssociatedFiles()
          Retrieves an array containing a set of URLs parsed from the HTML document that reference files that would normally be downloaded as part of retrieving a page in a browser.
 java.lang.String[] getDocumentFrames()
          Retrieves an array containing a set of URLs parsed from the HTML document that reference frames used in the document.
 java.lang.String[] getDocumentImages()
          Retrieves an array containing a set of URLs parsed from the HTML document that reference images used in the document.
 java.lang.String[] getDocumentLinks()
          Retrieves an array containing a set of URLs parsed from the HTML document that are in the form of links to other content.
 java.lang.String getDocumentURL()
          Retrieves the URL of this HTML document.
 java.lang.String getHTMLData()
          Retrieves the original HTML data used to create this document.
 java.lang.String getTextData()
          Retrieves the contents of the HTML document with all tags removed.
 boolean parse()
          Parses the HTML document and extracts useful elements from it.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HTMLDocument

public HTMLDocument(java.lang.String documentURL,
                    java.lang.String htmlData)
             throws java.net.MalformedURLException
Creates a new HTML document using the provided data.

Parameters:
documentURL - The URL for this document.
htmlData - The actual data contained in the HTML document.
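
The constructor is declared to throw java.net.MalformedURLException, which suggests that documentURL must be a well-formed URL. The sketch below shows one way a caller might check a URL string before constructing a document. The class name DocumentUrlCheck and the method isWellFormed are hypothetical and not part of this package, and the check assumes the same rules as java.net.URL, which may differ from what HTMLDocument does internally.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper, not part of com.sun.slamd.http.
final class DocumentUrlCheck {
    /** Returns true if java.net.URL accepts the string, false otherwise. */
    static boolean isWellFormed(String documentURL) {
        try {
            // The String constructor is deprecated in recent JDKs but still
            // available; it throws the exception type named in this Javadoc.
            new URL(documentURL);
            return true;
        } catch (MalformedURLException e) {
            // Raised for a missing or unknown protocol, among other cases.
            return false;
        }
    }
}
```

A caller could use such a check to report a bad URL early instead of handling the exception at construction time.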

Method Detail

parse

public boolean parse()
Parses the HTML document and extracts useful elements from it.

Returns:
true if the page could be parsed successfully, or false if not.

getDocumentURL

public java.lang.String getDocumentURL()
Retrieves the URL of this HTML document.

Returns:
The URL of this HTML document.

getHTMLData

public java.lang.String getHTMLData()
Retrieves the original HTML data used to create this document.

Returns:
The original HTML data used to create this document.

getTextData

public java.lang.String getTextData()
Retrieves the contents of the HTML document with all tags removed.

Returns:
The contents of the HTML document with all tags removed, or null if a problem occurs while trying to parse the HTML.
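
To illustrate what "all tags removed" can mean in practice, here is a minimal regex-based sketch. It is not the implementation used by this class: TagStripSketch and stripTags are hypothetical names, and the choices to drop script and style bodies and to collapse whitespace are assumptions. Character entities such as &amp; are left undecoded.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of tag stripping, not the SLAMD implementation.
final class TagStripSketch {
    // Whole <script>...</script> and <style>...</style> elements.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Any remaining tag, including ones that span several lines.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns the text with tags removed, or null if the input is null. */
    static String stripTags(String html) {
        if (html == null) {
            return null;
        }
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        // Collapse the gaps left by removed tags into single spaces.
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

A regular expression cannot handle every HTML edge case (for example a literal ">" inside an attribute value), so a real parser is the safer choice for untrusted input.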

getAssociatedFiles

public java.lang.String[] getAssociatedFiles()
Retrieves an array containing a set of URLs parsed from the HTML document that reference files that would normally be downloaded as part of retrieving a page in a browser. This includes images and external style sheets.

Returns:
An array containing a set of URLs to files associated with the HTML document, or null if a problem occurs while trying to parse the HTML.
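
As a rough illustration of what counts as an associated file, the sketch below collects the src of img tags and the href of link tags whose rel is "stylesheet". It is an assumption about how such extraction could work, not the code behind this method, and AssociatedFilesSketch, attribute, and associatedFiles are hypothetical names. URLs are returned exactly as written in the markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not the SLAMD implementation.
final class AssociatedFilesSketch {
    // Captures the tag name and the raw attribute text of <img> and <link>.
    private static final Pattern TAG =
        Pattern.compile("(?is)<(img|link)\\b([^>]*)>");

    /** Returns the value of the named attribute, or null if it is absent. */
    static String attribute(String attrs, String name) {
        // The lookbehind keeps "src" from matching inside "data-src".
        Pattern p = Pattern.compile("(?is)(?<![\\w-])" + Pattern.quote(name)
            + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");
        Matcher m = p.matcher(attrs);
        if (!m.find()) {
            return null;
        }
        if (m.group(1) != null) {
            return m.group(1); // double-quoted value
        }
        if (m.group(2) != null) {
            return m.group(2); // single-quoted value
        }
        return m.group(3);     // unquoted value
    }

    /** Collects image sources and stylesheet links in document order. */
    static String[] associatedFiles(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String tag = m.group(1).toLowerCase(Locale.ROOT);
            String attrs = m.group(2);
            if (tag.equals("img")) {
                String src = attribute(attrs, "src");
                if (src != null) {
                    urls.add(src);
                }
            } else {
                String href = attribute(attrs, "href");
                if (href != null
                        && "stylesheet".equalsIgnoreCase(attribute(attrs, "rel"))) {
                    urls.add(href);
                }
            }
        }
        return urls.toArray(new String[0]);
    }
}
```

Ordinary hyperlinks (a tags) and link tags with other rel values, such as icons, are deliberately skipped in this sketch.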

getDocumentLinks

public java.lang.String[] getDocumentLinks()
Retrieves an array containing a set of URLs parsed from the HTML document that are in the form of links to other content.

Returns:
An array containing a set of URLs parsed from the HTML document that are in the form of links to other content, or null if a problem occurs while trying to parse the HTML.
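
This documentation does not say whether the returned links are absolute or are left as written in the markup. If a caller needs absolute URLs, one possible approach is to resolve each href against the document URL with java.net.URI, as sketched below. LinkResolveSketch and absoluteLinks are hypothetical names; only quoted href values on a tags are considered, and hrefs that are not valid URIs are skipped.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration, not the SLAMD implementation.
final class LinkResolveSketch {
    // Matches the quoted href attribute of an <a> tag, case-insensitively.
    private static final Pattern HREF = Pattern.compile(
        "(?is)<a\\b[^>]*?(?<![\\w-])href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Returns every anchor href resolved against the document URL. */
    static String[] absoluteLinks(String documentURL, String html) {
        URI base = URI.create(documentURL);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1) != null ? m.group(1) : m.group(2);
            try {
                // Relative references are resolved against the base;
                // absolute ones are returned unchanged.
                links.add(base.resolve(href.trim()).toString());
            } catch (IllegalArgumentException e) {
                // href is not a syntactically valid URI; skip it.
            }
        }
        return links.toArray(new String[0]);
    }
}
```

The same resolution step would apply to image or frame URLs if those also turn out to be relative.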

getDocumentImages

public java.lang.String[] getDocumentImages()
Retrieves an array containing a set of URLs parsed from the HTML document that reference images used in the document.

Returns:
An array containing a set of URLs parsed from the HTML document that reference images used in the document.

getDocumentFrames

public java.lang.String[] getDocumentFrames()
Retrieves an array containing a set of URLs parsed from the HTML document that reference frames used in the document.

Returns:
An array containing a set of URLs parsed from the HTML document that reference frames used in the document.
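
getTextData, getAssociatedFiles, and getDocumentLinks are documented to return null when the HTML cannot be parsed, while the image and frame accessors do not say either way. A defensive caller might therefore normalize every returned array before iterating over it. The helper below is a minimal sketch of that idea; NullSafeArrays and orEmpty are hypothetical names and not part of this package.

```java
// Hypothetical helper, not part of com.sun.slamd.http.
final class NullSafeArrays {
    /** Returns the given array, or an empty array if it is null. */
    static String[] orEmpty(String[] urls) {
        return urls == null ? new String[0] : urls;
    }
}
```

With such a helper, a loop over the result of any of the URL accessors can run without a separate null check.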