PHP XML Classes
A collection of classes and resources to process XML using PHP

Description: This is a set of classes defining an abstract interface for SAX parsers, as well as interfaces for implementing SAX filters. An Expat-based SAX parser is provided as an example of a SAX parser, and some SAX filters are provided as examples. Filters can transform, update or query XML documents. Simple filters can be chained for advanced processing.

Sax Filters (class_sax_filters.php)

Description: This is a set of classes implementing SAX filters. The classes include a SAX class that parses XML documents using Expat, and they define a way to create SAX filters that perform SAX-based queries, updates and transformations of documents. Simple filters can be chained to build more complex XML processors. Interfaces for filters are very easy to define. The classes include a mechanism to stream the output of "final" filters in order to increase efficiency.

NEWS:
  • 07-04-2002 Documentation updated and package rebuilt.
  • 05-17-2002 First version of this set of classes released.
The class code and its documentation are hosted at SourceForge. Please visit our SourceForge page for releases, documentation, bug tracking, support forums and mailing lists.

Resources Requirements
  • PHP 4.0.5+

Features To-dos
  • Arbitrary filters can be defined
  • Filter chains can link any number of filters
  • Easy to implement
  • Support for processing instructions

Contact: Luis Argerich (lrargerich@yahoo.com)

Detailed description and usage:

This is an overview of the classes that this package defines and how to use each one.

class AbstractSAXParser: This class defines an abstract SAX parser. If you want to build your own SAX parser or adapt an existing parser, you should implement this class's methods.

What does a SAX parser do? It must parse the XML document and generate "events" that are passed to a listener object. Note that the parser doesn't process the XML document at all: it just parses the document and generates events that will be processed by a listener object (an AbstractFilter object).

Class methods to be implemented are:

Method
          Description
AbstractSAXParser()
          The constructor should build the parser. It can receive an argument, for example an XML file name, indicating the XML document to be parsed. How the parser locates the document to parse is left to the parser implementation.
Other optional methods
          Other optional methods that set XML documents from different sources, parser options, etc. can be added as needed.
parse()
          This is the principal method of the class. Parsers must parse the XML source and generate the proper events by calling the following methods defined in this class: startElementHandler($parser,$name,$attribs), endElementHandler($parser,$name) and characterDataHandler($data). Those methods call the same methods on the listener object, thus propagating SAX events to the listener as the parser produces them.

TIP: You can implement an AbstractSAXParser for non-XML data. The parser converts the non-XML data to XML simply by producing events, and those events are then processed by filters written for XML processing.
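
The tip above can be illustrated with a small sketch. It is not code from the package: KeyValueParser and RecordingListener are hypothetical names, the code is written for current PHP rather than PHP 4, and for brevity the parser calls the listener directly instead of going through the handler methods of AbstractSAXParser. The listener method names follow the ones used in this overview.

```php
<?php
// Hypothetical listener that records events as text; it stands in for a real filter.
class RecordingListener
{
    public $events = [];

    public function startElementHandler($name, $attribs)
    {
        $this->events[] = "start:$name";
    }

    public function endElementHandler($name)
    {
        $this->events[] = "end:$name";
    }

    public function characterDataHandler($data)
    {
        $this->events[] = "text:$data";
    }
}

// Hypothetical parser for "key=value" lines. It never sees XML, but it emits
// the events an XML parser would emit for <config><key>value</key>...</config>.
class KeyValueParser
{
    private $lines;
    private $listener = null;

    public function __construct(array $lines)
    {
        $this->lines = $lines;
    }

    public function setListener($obj)
    {
        $this->listener = $obj;
    }

    public function parse()
    {
        $this->listener->startElementHandler('config', []);
        foreach ($this->lines as $line) {
            // Split on the first "=" only, so values may contain "=".
            list($key, $value) = explode('=', $line, 2);
            $this->listener->startElementHandler(trim($key), []);
            $this->listener->characterDataHandler(trim($value));
            $this->listener->endElementHandler(trim($key));
        }
        $this->listener->endElementHandler('config');
    }
}

$listener = new RecordingListener();
$parser = new KeyValueParser(['host=example.org', 'port=8080']);
$parser->setListener($listener);
$parser->parse();

echo implode("\n", $listener->events), "\n";
```

Any filter written for XML events could be set as the listener here in place of RecordingListener.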

class ExpatParser: This class is an implementation of the AbstractSAXParser class using PHP's built-in Expat parser, which is, precisely, a SAX parser. This version reads the XML from a file whose name is passed as an argument to the constructor. The class can be used as follows:
$parser = new ExpatParser("foo1.xml");
$filter=new SomeFilterHere();
$parser->setListener($filter);
$parser->parse();
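
To make the role of ExpatParser more concrete, here is a rough sketch of how such a class might wrap PHP's xml_* functions. It is an illustration under assumptions, not the package's source: SketchExpatParser, EventLog and the on* callback names are invented for this example, and the code targets current PHP rather than PHP 4.

```php
<?php
// Hypothetical stand-in for the package's ExpatParser. Only the xml_* calls
// are real PHP functions; every class and method name here is an assumption.
class SketchExpatParser
{
    private $file;
    private $parser;
    private $listener = null;

    public function __construct($xmlfile)
    {
        $this->file = $xmlfile;
        $this->parser = xml_parser_create();
        // Route Expat callbacks to methods of this object.
        xml_set_element_handler($this->parser, [$this, 'onStart'], [$this, 'onEnd']);
        xml_set_character_data_handler($this->parser, [$this, 'onText']);
    }

    public function setListener($obj)
    {
        $this->listener = $obj;
    }

    public function parserSetOption($option, $value)
    {
        xml_parser_set_option($this->parser, $option, $value);
    }

    public function parse()
    {
        $fp = fopen($this->file, 'r');
        if ($fp === false) {
            throw new RuntimeException("Cannot open {$this->file}");
        }
        while (!feof($fp)) {
            $chunk = fread($fp, 4096);
            // The third argument marks the final chunk of the document.
            if (!xml_parse($this->parser, $chunk, feof($fp))) {
                $code = xml_get_error_code($this->parser);
                fclose($fp);
                throw new RuntimeException(xml_error_string($code));
            }
        }
        fclose($fp);
    }

    // Expat callbacks: forward each event to the listener.
    public function onStart($parser, $name, $attribs)
    {
        $this->listener->startElementHandler($name, $attribs);
    }

    public function onEnd($parser, $name)
    {
        $this->listener->endElementHandler($name);
    }

    public function onText($parser, $data)
    {
        $this->listener->characterDataHandler($data);
    }
}

// Listener that simply logs the events it receives.
class EventLog
{
    public $log = [];
    public function startElementHandler($name, $attribs) { $this->log[] = "start:$name"; }
    public function endElementHandler($name) { $this->log[] = "end:$name"; }
    public function characterDataHandler($data) { $this->log[] = "text:$data"; }
}

// Write a small document to a temporary file and parse it.
$file = tempnam(sys_get_temp_dir(), 'sax');
file_put_contents($file, '<app><name>editor</name></app>');

$log = new EventLog();
$parser = new SketchExpatParser($file);
$parser->parserSetOption(XML_OPTION_CASE_FOLDING, 0);
$parser->setListener($log);
$parser->parse();
unlink($file);

echo implode("\n", $log->log), "\n";
```

The file is read in 4096-byte chunks, so the whole document never has to be held in memory at once.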

Note that all the processing is done in the filter, so it is time to see what a filter is (keep reading, it's easy!).

class AbstractFilter: Filters are objects that receive SAX events (from a parser or another filter), process them in some useful way, and then either pass the events to another filter or output the result in some way. Filters that don't propagate events are called "finalizer" filters; they are typically filters that output the document to the browser or to a file.

Filters must extend the AbstractFilter class implementing the following methods:

startElementHandler($name,$attribs)
          This method is called when an element starts in the XML document. It receives the element name and an associative array with the element's attributes. A filter can also call this method artificially to "create" elements in the result.
endElementHandler($name)
          This method is called when an element ends. It receives the name of the element.
characterDataHandler($data)
          This method is called when text data occurs in the XML document. Note that context is not provided, so the filter object must keep track of the context if needed, using member variables, a stack or similar methods.
Other methods
          Other methods specific to what the filter should do may be added as well.

Besides those methods, all filters have a predefined "setListener" method that sets a listener object for the filter's events. This is what is needed to propagate events from one filter to another.
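
As a minimal sketch of the handler methods described above, the following filter keeps a stack of open element names so that characterDataHandler() knows where each piece of text occurred. TextPathFilter is a hypothetical name. In the package such a class would extend AbstractFilter; here it stands alone so the example runs without the package, and it is driven by hand instead of by a parser.

```php
<?php
// Hypothetical finalizer filter: records "path = text" for every non-blank
// text chunk. The stack supplies the context that SAX events do not carry.
class TextPathFilter
{
    public $found = [];
    private $stack = [];

    public function startElementHandler($name, $attribs)
    {
        $this->stack[] = $name;          // element opened: push its name
    }

    public function endElementHandler($name)
    {
        array_pop($this->stack);         // element closed: pop it again
    }

    public function characterDataHandler($data)
    {
        if (trim($data) !== '') {
            $this->found[] = '/' . implode('/', $this->stack) . ' = ' . $data;
        }
    }
}

// Events for <app><name>editor</name><version>1.2</version></app>
$filter = new TextPathFilter();
$filter->startElementHandler('app', []);
$filter->startElementHandler('name', []);
$filter->characterDataHandler('editor');
$filter->endElementHandler('name');
$filter->startElementHandler('version', []);
$filter->characterDataHandler('1.2');
$filter->endElementHandler('version');
$filter->endElementHandler('app');

print_r($filter->found);
```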

As an example, two filters are provided in the package: FilterName and FilterNameBold. FilterName uppercases the content of every name element, converting, for example, <name>something</name> into <name>SOMETHING</name>.

FilterNameBold adds a bold (<b>) element inside every name element, thus converting <name>something</name> into <name><b>something</b></name>.

The FilterOutput class is a "finalizer" filter that doesn't propagate events; it just outputs the XML content to the browser. It is therefore useful as the last filter in filter chains for testing.
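
The following sketch shows how filters of this kind might be written. It does not reproduce the package's FilterName or FilterOutput: PassThroughFilter, UppercaseNameFilter and StringOutputFilter are hypothetical stand-ins written for current PHP, and the finalizer collects the XML in a string instead of printing it to the browser.

```php
<?php
// Stand-in for AbstractFilter (an assumption): stores the listener and
// forwards every event unchanged.
class PassThroughFilter
{
    protected $listener = null;

    public function setListener($obj) { $this->listener = $obj; }

    public function startElementHandler($name, $attribs)
    {
        $this->listener->startElementHandler($name, $attribs);
    }

    public function endElementHandler($name)
    {
        $this->listener->endElementHandler($name);
    }

    public function characterDataHandler($data)
    {
        $this->listener->characterDataHandler($data);
    }
}

// Uppercases text inside <name> elements, as FilterName is described to do.
class UppercaseNameFilter extends PassThroughFilter
{
    private $inName = false;

    public function startElementHandler($name, $attribs)
    {
        if ($name === 'name') { $this->inName = true; }
        parent::startElementHandler($name, $attribs);
    }

    public function endElementHandler($name)
    {
        if ($name === 'name') { $this->inName = false; }
        parent::endElementHandler($name);
    }

    public function characterDataHandler($data)
    {
        parent::characterDataHandler($this->inName ? strtoupper($data) : $data);
    }
}

// Finalizer: serializes events into a string and propagates nothing.
class StringOutputFilter
{
    public $xml = '';

    public function startElementHandler($name, $attribs)
    {
        $this->xml .= "<$name";
        foreach ($attribs as $attrName => $value) {
            $this->xml .= ' ' . $attrName . '="' . htmlspecialchars($value) . '"';
        }
        $this->xml .= '>';
    }

    public function endElementHandler($name) { $this->xml .= "</$name>"; }

    public function characterDataHandler($data) { $this->xml .= htmlspecialchars($data); }
}

$upper = new UppercaseNameFilter();
$out = new StringOutputFilter();
$upper->setListener($out);

// Events for <app id="7"><name>something</name></app>
$upper->startElementHandler('app', ['id' => '7']);
$upper->startElementHandler('name', []);
$upper->characterDataHandler('something');
$upper->endElementHandler('name');
$upper->endElementHandler('app');

echo $out->xml, "\n";
```

The single boolean flag is enough for this sketch; a filter that has to cope with nested name elements would need a counter or a stack instead.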

If you want to convert all name elements to uppercase, use the classes as follows:
include_once("class_sax_filters.php");
$f1=new ExpatParser("applications.xml");
$f1->parserSetOption(XML_OPTION_CASE_FOLDING,0);
$f2=new FilterName();
$f3=new FilterOutput();
$f2->setListener($f3);
$f1->setListener($f2);
$f1->parse();

We create an Expat parser, a FilterName object and a FilterOutput object.

First we set the FilterOutput object as the FilterName listener, which means that events created by FilterName will be passed to FilterOutput.

Then we set the FilterName object as the parser's listener, which means that events generated at the parser level will be propagated to FilterName. Since FilterName passes its events to FilterOutput, FilterOutput will be the last link in the filter chain.

The order in which listeners are set is very important: when we set the parser's listener, that object must already have been given its own listener in order to do something.

Then we just call the parse method. The parser parses the XML document and generates events; the events are passed to FilterName, where name elements are uppercased, and are then propagated to FilterOutput, where the content is simply printed.

Filter chains can be as complex as you want, linking several filters to perform a complex task. Filters can add elements, remove elements (by absorbing events) and change elements, thus allowing any kind of XML processing from queries to transformations.
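
A longer chain might look like the sketch below, where one filter removes elements by absorbing their events and another adds elements. All class names are hypothetical and none of this is the package's own code; FilterNameBold is only imitated, and the finalizer ignores attributes to keep the example short.

```php
<?php
// Stand-in base filter (hypothetical): forwards every event to its listener.
class ForwardingFilter
{
    protected $listener = null;
    public function setListener($obj) { $this->listener = $obj; }
    public function startElementHandler($name, $attribs) { $this->listener->startElementHandler($name, $attribs); }
    public function endElementHandler($name) { $this->listener->endElementHandler($name); }
    public function characterDataHandler($data) { $this->listener->characterDataHandler($data); }
}

// Removes the target element and everything inside it by absorbing events.
class DropElementFilter extends ForwardingFilter
{
    private $target;
    private $depth = 0;   // greater than 0 while inside a dropped element

    public function __construct($target) { $this->target = $target; }

    public function startElementHandler($name, $attribs)
    {
        if ($this->depth > 0 || $name === $this->target) {
            $this->depth++;
            return;
        }
        parent::startElementHandler($name, $attribs);
    }

    public function endElementHandler($name)
    {
        if ($this->depth > 0) {
            $this->depth--;
            return;
        }
        parent::endElementHandler($name);
    }

    public function characterDataHandler($data)
    {
        if ($this->depth === 0) {
            parent::characterDataHandler($data);
        }
    }
}

// Wraps the content of every <name> element in a <b> element.
class BoldNameFilter extends ForwardingFilter
{
    public function startElementHandler($name, $attribs)
    {
        parent::startElementHandler($name, $attribs);
        if ($name === 'name') {
            parent::startElementHandler('b', []);
        }
    }

    public function endElementHandler($name)
    {
        if ($name === 'name') {
            parent::endElementHandler('b');
        }
        parent::endElementHandler($name);
    }
}

// Finalizer that records the markup as a string (attributes are ignored).
class CollectingFilter
{
    public $xml = '';
    public function startElementHandler($name, $attribs) { $this->xml .= "<$name>"; }
    public function endElementHandler($name) { $this->xml .= "</$name>"; }
    public function characterDataHandler($data) { $this->xml .= $data; }
}

// Build the chain from the end: drop -> bold -> collect.
$drop = new DropElementFilter('secret');
$bold = new BoldNameFilter();
$out = new CollectingFilter();
$bold->setListener($out);
$drop->setListener($bold);

// Events for <app><name>x</name><secret><name>y</name></secret></app>
$drop->startElementHandler('app', []);
$drop->startElementHandler('name', []);
$drop->characterDataHandler('x');
$drop->endElementHandler('name');
$drop->startElementHandler('secret', []);
$drop->startElementHandler('name', []);
$drop->characterDataHandler('y');
$drop->endElementHandler('name');
$drop->endElementHandler('secret');
$drop->endElementHandler('app');

echo $out->xml, "\n";
```

Because each filter only sees events, the two transformations stay independent and can be reordered or reused in other chains.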

SAX filters are a sound way to modularize SAX processing of XML documents. When documents are very large, only SAX-based processing is efficient, since SAX never reads the whole document into memory: it just processes the document chunk by chunk.

Documentation

Classes

AbstractSAXParser

Extends: None
Description: This is an abstract class defining the methods that SAX parsers must implement in order to be able to work with SAX filters.

Method Summary
 void AbstractSAXParser()
          Constructor
 void parse()
          Parses the XML source
 void setListener(Object $obj)
          Sets the parser's listener
 

Method Detail

AbstractSAXParser

void AbstractSAXParser()
The constructor may or may not receive arguments pointing to the XML source to be parsed. This depends heavily on the parser itself: we may have parsers for XML files, parsers for XML strings, etc.
 
Parameters:
Returns:
Throws:
None

parse

void parse()
This method parses the XML source that was specified to the parser in some way. While parsing, this method must call the proper methods in order to propagate events to this class's listener. The methods to be called are startElementHandler($parser,$name,$attribs), endElementHandler($parser,$name) and characterDataHandler($data). Note that the parser must not implement these methods itself; they are already defined in this class.
 
Parameters:
Returns:
Throws:
None

setListener

void setListener(Object $obj)
This method sets the listener for a parser object. The listener is a Filter object extending the AbstractFilter class that will receive the events generated by the parser and do something with them.
 
Parameters:
$obj - An object of a Filter class extending the AbstractFilter class
Returns:
Throws:
None

AbstractFilter

Extends: None
Description: This class defines the methods that must be implemented by a Filter.

Method Summary
 void setListener(object $obj)
          Sets the Filter's listener object
 void startElement(string $name, array $attribs)
          Method that is called when an XML element starts
 void endElement(string $name)
          Method that is called when an element ends
 void characterDataHandler(string $data)
          Method that will be called when text data is found
 

Method Detail

setListener

void setListener(object $obj)
This method defines an object that will be used as a listener for events propagated from a Filter. This method is already implemented in the abstract class so Filters don't have to implement it.
 
Parameters:
$obj - An object from a class extending the AbstractFilter class
Returns:
Throws:
None

startElement

void startElement(string $name, array $attribs)
This method should be implemented by Filters. It receives the name of the element and its attributes. What the method does depends on the filter.
 
Parameters:
$name - Name of the element
$attribs - An associative array containing the attributes of the element, keyed by attribute name. You can process it using a construct like: foreach($attribs as $attrName=>$value) { }
Returns:
Throws:
None

endElement

void endElement(string $name)
This method should be implemented by filters. It is called when an XML element ends.
 
Parameters:
$name - Name of the element that ends
Returns:
Throws:
None

characterDataHandler

void characterDataHandler(string $data)
This method should be implemented by filters. It is called when text is found in an XML document. It can be called several times for the same text node (in chunks), and no context information is provided: the filter should track context itself if it needs to know, for example, the name of the element where the text was found.
 
Parameters:
$data - The text chunk found
Returns:
Throws:
None
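
Because the same text node may arrive in several chunks, a filter that needs the complete text has to buffer it. The sketch below shows one way to do that; TextCollector is a hypothetical class, it uses the handler names from the usage overview, and it only handles elements that contain text and no child elements.

```php
<?php
// Hypothetical finalizer that joins text chunks before using them.
class TextCollector
{
    public $values = [];    // element name => complete text
    private $buffer = '';

    public function startElementHandler($name, $attribs)
    {
        $this->buffer = '';        // a new element starts with an empty buffer
    }

    public function characterDataHandler($data)
    {
        $this->buffer .= $data;    // may run several times for one text node
    }

    public function endElementHandler($name)
    {
        // Only now is the text of the element known to be complete.
        if (trim($this->buffer) !== '') {
            $this->values[$name] = $this->buffer;
        }
        $this->buffer = '';
    }
}

// A parser may deliver "Tom & Jerry" as three separate chunks.
$collector = new TextCollector();
$collector->startElementHandler('title', []);
$collector->characterDataHandler('Tom ');
$collector->characterDataHandler('&');
$collector->characterDataHandler(' Jerry');
$collector->endElementHandler('title');

print_r($collector->values);
```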

ExpatParser

Extends: AbstractSAXParser
Description: This is an implementation of the AbstractSAXParser class using the PHP internal Expat parser

Method Summary
 void ExpatParser(string $xmlfile)
          Constructor
 void parse()
          Parses the XML document
 void setListener(object $obj)
          Sets the listener object of the ExpatParser
 void parserSetOption(constant $option, some $value)
          Sets options for the Expat parser
 

Method Detail

ExpatParser

void ExpatParser(string $xmlfile)
The constructor receives the name of the XML file to be parsed.
 
Parameters:
$xmlfile - Name of the file containing the XML document to be parsed
Returns:
Throws:
None

parse

void parse()
This method parses the XML file whose name was given when the parser was constructed. The method parses the document and propagates events to the listener object. The setListener method must have been called before parsing.
 
Parameters:
Returns:
Throws:
None

setListener

void setListener(object $obj)
This method is used to set the listener object for the parser: the first Filter in the chain. The object must be an instance of a class extending the AbstractFilter class.
 
Parameters:
$obj - An object from a class implementing the AbstractFilter class
Returns:
Throws:
None

parserSetOption

void parserSetOption(constant $option, some $value)
Sets options for the Expat parser
 
Parameters:
$option - For example, XML_OPTION_CASE_FOLDING sets whether case folding is applied to the document. (See the PHP documentation for the options that can be set for an Expat parser.)
$value - Value for the option being set
Returns:
Throws:
None
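
The effect of XML_OPTION_CASE_FOLDING can be seen with PHP's xml_* functions directly, without the package. In this sketch, elementNames() is a helper invented for the example; it returns the element names that the parser reports with case folding switched on or off.

```php
<?php
// Returns the element names reported for $xml with case folding on or off.
function elementNames($xml, $caseFolding)
{
    $names = [];
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding ? 1 : 0);
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attribs) use (&$names) {
            $names[] = $name;
        },
        function ($parser, $name) {
            // End tags are not needed for this example.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<apps><name>editor</name></apps>';
$folded = elementNames($xml, true);     // names are reported in upper case
$asWritten = elementNames($xml, false); // names are reported as written

print_r($folded);
print_r($asWritten);
```

Case folding is enabled by default in PHP, so a filter that compares element names against lowercase strings such as "name" needs it switched off, as in the usage example earlier in this page.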

FilterOutput

Extends: AbstractFilter
Description: This is a finalizer filter that must always be used at the end of a filter chain. This filter absorbs SAX events, outputting the XML document to the browser.

Method Summary

This class doesn't have any methods

 

Method Detail

This class doesn't have any methods


Contribute!: If you want to contribute a class to this project or help with new versions of existing classes, please let me know by email.
Hosted at: SourceForge.net
Contact & credits
Luis Argerich
Rogerio

OSI Certified Open Source Software