XML

What is XML?
XML is a specification for creating your own markup languages. A document created according to the rules of the XML specification looks similar to HTML in that it contains elements and attributes coded as tags. Nevertheless, browsers do not attempt to directly format and display XML documents unless formatting information is provided in some way.
XML documents are human readable, and yet there are many applications designed to parse XML documents and work efficiently with their content. PHP5 has new XML-related functions that can easily be used to work with XML documents, or transform non-XML data into XML documents.
You can make your own XML document as easily as this:
<?xml version="1.0" ?>
<php_programs>
   <program name="cart">
      <price>100</price>
   </program>
   <program name="survey">
      <price>500</price>
   </program>
</php_programs>
The first line of this document specifies the version of XML that's being used (notice the delimiters <?xml and ?>, which are awfully close to PHP's delimiters, so make sure to use the full <?php to start your PHP code). The second line defines the root element of the document (named php_programs). There can be only one root element in an XML document. The third line defines a child element of the root element, named program, and it contains an attribute named name that is set equal to cart.
From these lines, it should be obvious that this XML document is about PHP programs, and that there are two programs available (cart and survey), and that the price of cart is $100, whereas the price of survey is $500. The root element may contain multiple elements (in the example there are two program elements in the root element), and like HTML, XML documents are composed primarily of elements and attributes.
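To make that reading concrete, here is a minimal sketch of how a PHP5 script might pull the program names and prices out of the sample document. It assumes the SimpleXML extension is available, and the variable names are purely illustrative.

```php
<?php
// The sample document from above, held in a string for convenience.
$xml = <<<XML
<?xml version="1.0" ?>
<php_programs>
   <program name="cart">
      <price>100</price>
   </program>
   <program name="survey">
      <price>500</price>
   </program>
</php_programs>
XML;

// simplexml_load_string() returns an object for the root element, or false on failure.
$programs = simplexml_load_string($xml);

$prices = array();
foreach ($programs->program as $program) {
    // Attributes are read with array syntax, child elements with object syntax.
    $prices[(string) $program['name']] = (int) $program->price;
}

foreach ($prices as $name => $price) {
    echo "$name costs \$$price\n";
}
```

The casts matter: SimpleXML hands back objects, so converting them to strings or integers gives you plain PHP values to work with.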
Anyone can write XML documents, and many folks also program their applications and programming languages to handle XML documents, both reading existing documents and composing their own new ones. The XML specification is free for anyone to use; the World Wide Web Consortium (at www.w3.org, the same place as the HTML and XHTML specs are maintained) authored and maintains the latest versions of the spec.
Although you can write XML documents just by laying out a few elements and attributes, often you want predefined elements and attributes, so that when you exchange data with another person or application, both parties to the transaction know exactly what the element and attribute names mean. Predefined elements and attributes are specified in a document type definition (DTD) or by an XML Schema, both of which are discussed a little later in this post. Frequently you'll find that before you write an XML document, you'll need to either find a DTD or XML Schema or write your own. Once you write a DTD, you can publish it on the Web, and anyone who needs to write an XML document compatible with yours (or receive one from you and validate it) can check the published DTD while reading your document.
XML Document Structure
There are two terms you hear over and over when discussing XML: well-formed and valid. A well-formed XML document follows the basic syntax rules (to be discussed in a minute), and a valid document also follows the rules imposed by a DTD or an XML Schema.
Being well-formed is the most basic requirement for XML documents; one that is not well-formed is not really an XML document. It's kind of like a script that someone tried to write in PHP but which contains fatal syntax errors; yes, it looks like PHP, but it really isn't until all the syntax errors are removed. A well-formed XML document may contain any elements, attributes, or other constructs allowed by the XML specification, but there are no rules about what the names of those elements and attributes can be (other than the basic naming rules, which are really not much of a restriction) or about what their content can be. It is in this extensibility that XML really derives a lot of its power and usefulness; so long as you follow the basic rules of the XML spec, there's no limit to what you can add or change.
A well-formed document does not need to be valid, but a valid document must be well formed, otherwise it couldn't be read in the first place. If a document is well formed, and it contains a reference to a DTD or XML Schema, your XML parser has the opportunity to reference the DTD or Schema and determine whether the document is valid. An XML document is valid if the elements, attributes, and so on that it contains follow the rules in the DTD or Schema. By definition then, the DTD or Schema contains rules about what elements or attributes may be contained in the document, what data those elements and attributes are allowed to have, and so on. In fact, the whole purpose of having a DTD or Schema is to define exactly what elements and attributes are allowed, and exactly what data they can contain.
Although referencing a DTD or Schema limits the name/value pairs (elements, attributes, and many of the other XML constructs) you may have in your XML document, the big advantage is that applications that know nothing about each other can still communicate effectively when they share the capability to parse XML because they can both read a well-formed document, and understand its contents if it is valid. Being readable by either humans or machines, and, by virtue of a DTD or XML Schema, knowing specifically what the elements and attributes mean, is another feature that makes XML so powerful.
Major Parts of an XML Document
An XML document may contain an optional prolog, and then the mandatory root element (including any content and other elements, attributes, and so on), with an optional section at the end for other data. The following list identifies the requirements within these major sections.
1. XML documents should contain an xml version line, possibly including a character encoding declaration.
2. Valid XML documents contain a DTD or an XML Schema, or a reference to one of these if they are stored externally.
3. XML documents usually contain one or more elements, each of which may have one or more attributes. Elements can contain other elements or data between their beginning and ending tags, or they may be empty.
4. XML documents may contain additional components such as processing instructions (PIs) that provide machine instructions for particular applications; CDATA sections, which may contain special characters, such as those found in scripts that are not allowed in ordinary XML data; notations, comments, entity references (aliases for entities such as special characters), text, and entities.
Here's an example of an XML document that is both well formed and valid:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE clients SYSTEM "http://www.example.com/dtds/client.dtd">
<clients>
   <client>Joe</client>
   <client>Jim</client>
</clients>
Notice the reference to an external DTD located at the URL specified. This means that the document can be validated by reading the DTD and then checking the document to make sure that it conforms to the DTD. Of course, you could manually read through the document and compare it with the elements, attributes, and other document components specified in the DTD, but there are many applications available that can automatically validate an XML document against a DTD or an XML schema. And because the DTD or Schema is available either directly in the document or online, it's easy for these applications to perform the validation function for you automatically as they parse the document.
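As a rough illustration of automatic validation, the following sketch uses PHP5's DOM extension to check a document first for well-formedness and then for validity. Because the contents of client.dtd are not shown here, the sketch assumes a small, hypothetical DTD and places it inside the document itself, which also keeps the example from needing network access.

```php
<?php
// A hypothetical inline DTD stands in for the external client.dtd, which is not shown in the text.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE clients [
  <!ELEMENT clients (client+)>
  <!ELEMENT client (#PCDATA)>
]>
<clients>
   <client>Joe</client>
   <client>Jim</client>
</clients>
XML;

// Collect parser messages instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$wellFormed = $doc->loadXML($xml);            // false if the document is not well formed
$valid = $wellFormed && $doc->validate();     // checks the document against its DTD

echo $wellFormed ? "well formed\n" : "not well formed\n";
echo $valid ? "valid\n" : "not valid\n";
```

Note the order of the two checks: validation is only attempted once the document has been parsed successfully, which mirrors the rule that a valid document must first be well formed.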
Well-Formed XML Documents
A well-formed XML document follows the XML specification syntax. The syntax, of course, follows some basic rules, the most common of which are listed here:
1. There is only one parent element containing all the rest of the elements in the document.
2. XML documents should (but are not strictly required to) begin with an XML declaration that gives the XML version number being used. For example:
<?xml version="1.0"?>
3. Character encoding declarations may be included with the XML version line, and must be included for encodings other than UTF-8 or UTF-16. The code might look like this:
<?xml version="1.0" encoding="UTF-8"?>
4. If an XML document contains a DTD, or a reference to an external DTD, the DOCTYPE declaration must appear before the first element in the document.
5. XML elements can be made from start and end tags (written much like HTML tags) or can be a single tag with a terminator (like the <br/> tag in XHTML). Unlike HTML, there is no allowance for elements that have only a starting tag and are not self-terminated. Elements with start and end tags are considered non-empty (meaning they can contain content) whereas empty tags do not contain content (empty elements sometimes signify something on their own, like the <br/> tag in XHTML), like this:
<client>John Doe</client>
<break />
6. XML attributes can be written inside the start tag of an element (or inside an empty-element tag), and must have a name and a value; the value must be enclosed in delimiters, such as double quotes. No attribute name may appear more than once inside a single element.
7. XML elements must be properly nested, meaning any given element's start and end tags must be outside the start and end tags of elements inside it, and inside the start and end tags of elements outside it. Here's an example:
<!-- not good -->
<parent><child></parent></child>
<!-- good -->
<parent><child></child></parent>
8. CDATA sections (sections of data that make up scripts, for example) must be delimited by <![CDATA[ and ]]>.
9. XML element names may not begin with "xml," "XML," or any upper- or lower-case combination of these characters in this sequence, because such names are reserved. Names must start with a letter, an underscore, or the colon, but in practice, you should never use colons. Names are case-sensitive. Numbers, the hyphen, and the period are valid characters to use after the first character.
10. Comments are delimited like HTML comments (<!-- and -->).
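You don't have to check these rules by eye. The sketch below shows one way to ask PHP5 whether a string is well formed, using the DOM extension together with the libxml error functions. The function name xml_problems is hypothetical, chosen only for this illustration.

```php
<?php
// Returns an array of parser messages; an empty array means the string is well formed.
function xml_problems($xml)
{
    // Capture libxml messages instead of letting them surface as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = 'line ' . $error->line . ': ' . trim($error->message);
    }

    // Restore the previous error-handling mode before returning.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

$good = '<parent><child></child></parent>';
$bad  = '<parent><child></parent></child>';

echo count(xml_problems($good)), " problem(s) in the first string\n";
echo count(xml_problems($bad)) > 0
    ? "second string is not well formed\n"
    : "second string is well formed\n";
```

The two test strings are the properly and improperly nested examples from rule 7, so the second one should be reported as not well formed.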
Using XML Elements and Attributes
XML elements and their attributes form the hierarchical structure of an XML document, and contain its content (the content of an XML document is its data). Although there can be only one root element, the root element may contain multiple elements of the same name (often referred to as child elements), and child elements can also contain multiple elements of the same name. So you might have a document like this:
<clients>
  <client ID="1">Joe</client>
  <orders>
    <order ID="1">ProductA</order>
    <order ID="2">ProductB</order>
  </orders>
  <client ID="2">Jim</client>
  <orders>
    <order ID="1">ProductA</order>
    <order ID="2">ProductB</order>
  </orders>
</clients>
As you can see, part of the content of this document is put into elements (the name of each client is between the beginning and ending client elements) and part of the content is the value of attributes (the ID numbers of the clients and their orders are specified in the ID attributes of the client and order elements).
There is some controversy about when to use an attribute and when to use an element for containing data. Although there is no hard and fast rule, a good rule of thumb is to use an element when there is the possibility that you might need to specify the same thing more than once (for example, although you may only have one order at present, you can expect there will be more orders for a single client), and when you're sure the data will only occur once (for example, each client may have one, and only one, ID number), use an attribute.
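The difference between the two shows up clearly when you read the data back. This sketch, which assumes SimpleXML and uses a shortened version of the clients document, reads the once-only ID from an attribute and loops over the repeating order elements.

```php
<?php
$xml = <<<XML
<clients>
  <client ID="1">Joe</client>
  <orders>
    <order ID="1">ProductA</order>
    <order ID="2">ProductB</order>
  </orders>
</clients>
XML;

$clients = simplexml_load_string($xml);

// Element content: cast the element to a string.
$clientName = (string) $clients->client;
// Attribute value: use array syntax on the element.
$clientId = (int) $clients->client['ID'];

// Repeating data lives in elements, so it can be looped over.
$orders = array();
foreach ($clients->orders->order as $order) {
    $orders[(int) $order['ID']] = (string) $order;
}

echo "Client $clientId is $clientName with " . count($orders) . " orders\n";
```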
Valid XML Documents: DTDs and XML Schemas
DTDs are special documents written in a syntax resembling Extended Backus-Naur Form (EBNF), which is not an XML language and therefore isn't so easy to parse. DTDs specify constraints on XML elements, attributes, content, and more. XML Schemas serve the same purpose, but are written in the XML Schema language, and can easily be parsed and processed using the same application that was used to read the XML document. XML Schemas are also much more capable than DTDs for defining detail in your elements and attributes (such as data type, range of values, and so forth) and are therefore preferred over DTDs by many XML authors. Both can be referenced in the XML document before the first element, and both have other means of being included within an XML document (you'll see how in just a bit).
If a DTD or schema is present or is referenced in an XML document, some or all of the elements and content of the document may be validated against the DTD or schema. The primary added value of a validated XML document is that the processing application "knows" something about the content of the document, such as how many times a given element may appear within another element, what values an attribute may assume, and so on.
As mentioned previously, anyone can author an XML document, and anyone can define a DTD or XML Schema against which to validate an XML document. This being the case, the World Wide Web Consortium has made the next version of HTML into XHTML, using the existing DTD for HTML (yes, HTML has always been based on a formal DTD), with very small modifications, as the definition of all the elements, attributes, and other components allowed in an XHTML document. The main difference between HTML and XHTML is the fact that an XHTML document must conform to the XML specification, whereas HTML documents are not required to do so.
Complicating things further, browsers will display HTML documents even if they are not well-formed HTML, let alone well-formed XHTML. But browsers will display XHTML documents as XML if the file extension is .xml, and as regular Web pages if the file extension is .htm or .html. Of course, to display an XHTML document as a regular Web page, the reference to the XHTML DTD must be valid, and the document must be well formed. In the next few sections you'll examine a portion of the DTD for XHTML, see how the DTD can be referenced in an XHTML document, and see how the document displays in the browser when the file extension is .xml and when it is .htm.
The DTD for XHTML
There are three DTDs for XHTML. They're located at:
1. www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
2. www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
3. www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd
These three DTDs complement their HTML counterparts, and are, in fact, quite similar. If you enter these links in your browser, you'll actually see the DTD in plain text.
Here is some code showing how a DTD (the strict version) is written for the XHTML language, but just for the image (IMG) element. The DTD for HTML is shared with XHTML (with very small differences to ensure that XHTML documents conform to the XML spec), although only XHTML actually conforms to the XML specification. What this means is that you'll find all the HTML elements and attributes present in XHTML, but if you use them in an XHTML document you must conform strictly to the rules imposed by XML (such as proper nesting and termination of elements).
<!--
    To avoid accessibility problems for people who aren't
    able to see the image, you should provide a text
    description using the alt and longdesc attributes.
    In addition, avoid the use of server-side image maps.
    Note that in this DTD there is no name attribute. That
    is only available in the transitional and frameset DTD.
-->
<!ELEMENT img EMPTY>
<!ATTLIST img
  %attrs;
  src         %URI;          #REQUIRED
  alt         %Text;         #REQUIRED
  longdesc    %URI;          #IMPLIED
  height      %Length;       #IMPLIED
  width       %Length;       #IMPLIED
  usemap      %URI;          #IMPLIED
  ismap       (ismap)        #IMPLIED
  >
<!-- usemap points to a map element which may be in this document
  or an external document, although the latter is not widely supported -->
In this example (keeping in mind that it is written in DTD syntax, not XML) you can see that on the first line following the comment, there is a callout for ELEMENT, and the name of the element is img, and it is EMPTY (contains no content between the non-existent beginning and ending tags). However, even though it is formally empty, its src attribute does contain data in the form of a URI (for our purposes the same as a URL) that specifies where the image file can be found.
Following the ELEMENT callout is a list of attributes that may be included with the img tag in an XHTML document. Those of you familiar with HTML and XHTML no doubt recognize the src attribute as the URL (or URI) that specifies the location of the image file and is REQUIRED.
So this portion of the DTD for XHTML documents specifies that it is permissible to include the img element in such documents. If this DTD is referenced in an XHTML document (the entire DTD, not just this portion), and the document includes the img element with appropriate src and alt attributes (both are marked REQUIRED), then the document could be said to be valid (at least as far as the img element is concerned). However, if you tried to include an element named imge or image or images, a validating XML parser would produce an error, because according to the DTD such elements are not defined, and therefore the document is not valid. And note that although the img element does not need to be terminated in an HTML document, it must be properly terminated in an XHTML document.
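To see a validating parser accept and reject elements in this way, consider the following sketch. It does not use the real XHTML DTD; instead it assumes a deliberately tiny, hypothetical DTD that defines a page element containing img elements with required src and alt attributes. The function name validates is also just an illustration.

```php
<?php
// A deliberately tiny, hypothetical DTD: a page element holding img elements.
// It mimics only the src and alt rules from the XHTML excerpt above.
$dtd = <<<DTD
<!DOCTYPE page [
  <!ELEMENT page (img*)>
  <!ELEMENT img EMPTY>
  <!ATTLIST img
    src CDATA #REQUIRED
    alt CDATA #REQUIRED>
]>
DTD;

// Returns true only if the string is well formed and valid against its own DTD.
function validates($xml)
{
    libxml_use_internal_errors(true);   // keep validation messages out of the output
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml) && $doc->validate();
    libxml_clear_errors();
    return $ok;
}

$results = array(
    'img with src and alt' => validates($dtd . '<page><img src="a.gif" alt="A" /></page>'),
    'img without alt'      => validates($dtd . '<page><img src="a.gif" /></page>'),
    'undefined image tag'  => validates($dtd . '<page><image src="a.gif" alt="A" /></page>'),
);

foreach ($results as $case => $ok) {
    echo $case . ': ' . ($ok ? 'valid' : 'not valid') . "\n";
}
```

All three documents are well formed; only the first one also satisfies the DTD, because the second omits a required attribute and the third uses an element the DTD never declares.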
Referencing DTDs and XML Schemas
To validate an XML document, there needs to be either a reference to an external file containing the DTD or XML Schema, or the DTD or schema must be included with the XML document. Referencing XML Schemas is slightly more complex, so first take a look at how DTDs are referenced.
To reference an external DTD, a DOCTYPE declaration is used. The DOCTYPE declaration provides some information regarding how to locate the DTD and what its name is. For example, this line shows how a DTD is referenced using a URL:
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
The html after the <!DOCTYPE in the first line signifies that the root element is named html, and is required. If the DTD is an external document, it can be located anywhere, and identified by any URI (Uniform Resource Identifier) that the application reading it understands and has access to, not just a URL over the Internet.
A big limitation of DTDs is that only one external DTD can be referenced in a document, and there is no DTD support for adding prefixes to element or attribute names anyway, although you can call out a namespace for a document that references a DTD.
A namespace is a definition of the source of names for elements and attributes (so far as XML is concerned). Designating the source of an element or attribute name means that you can use the same name to represent different things within a single document. A namespace can be identified within an XML document by referencing it via a special reserved XML keyword, the xmlns (XML Namespace) attribute, like this:
xmlns = "http://www.w3.org/1999/xhtml"
This URL is the official namespace of XHTML. The element and attribute names for this namespace are defined within the XHTML DTD, and the xmlns attribute serves only to define the namespace for the root element of the XHTML document (the root element is html). Defining the namespace for the root element in this manner also serves to define the namespace for all the rest of the elements in the document.
External XML Schemas
You can reference an XML Schema by referencing the location of the XML Schema document with a URI. In the approach shown here, the xmlns attribute is written into the root element and set to the location of the schema, so that the namespace is defined and the application also knows where to look for the XML Schema. (Strictly speaking, xmlns only declares a namespace name; validating parsers usually find the schema through an xsi:schemaLocation hint, or are told its location by the application.)
To reference an XML Schema, an xmlns attribute may be added to the root element of the document, as shown here:
<?xml version="1.0" encoding="UTF-8"?>
<customer xmlns="http://www.example.com/customer.xsd" cust_id="1">
   <cust_name>John Doe</cust_name>
</customer>
Of course, this implies that you have already written the XML Schema document that defines the customer (and its cust_id attribute) and cust_name elements, named this document customer.xsd, and placed the document in the root folder of the http://www.example.com Web site. Although this book won't get into the details of writing an XML Schema, suffice it to say that XML Schema is a much richer language for specifying elements, attributes, and other components of an XML conforming language, and because it is written according to the guidelines of the XML specification, it is easier to process as well.
For documents that can be validated against an XML Schema, any number of namespaces can be declared using the xmlns attribute, each associated with an external XML Schema. For example, you might have one XML Schema for which the element farm means an area used for agricultural purposes, and another for which the element farm means a number of server computers all performing the same task. If you want to create an XML document that uses both elements (for example, describing how the farm manages its IT) you need some way to distinguish between the two.
Because both XML Schemas can be referenced in a single document, you can use the xmlns attribute to identify them by URL, and you can create prefixes that can precede any element names from either one. For example, you might use code such as the following to do this:
xmlns:agri = "http://www.example.com/agricultural.xml"
xmlns:serv = "http://www.example.com/server.xml"
Thereafter, any element preceded by agri: would be defined by the agricultural schema, and any element preceded by serv: would be defined by the server schema. This prevents confusion about the meaning of these elements.
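Here is a small sketch of how a PHP5 script might tell the two farm elements apart. It assumes the DOM extension; the report root element and the element contents are invented for the example. Notice that the lookup is done by namespace URI rather than by prefix, because the prefix is only a local shorthand.

```php
<?php
$xml = <<<XML
<report xmlns:agri="http://www.example.com/agricultural.xml"
        xmlns:serv="http://www.example.com/server.xml">
  <agri:farm>Green Acres</agri:farm>
  <serv:farm>Rack 12</serv:farm>
</report>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// Look elements up by namespace URI plus local name, not by prefix.
$agriFarm = $doc->getElementsByTagNameNS('http://www.example.com/agricultural.xml', 'farm')->item(0);
$servFarm = $doc->getElementsByTagNameNS('http://www.example.com/server.xml', 'farm')->item(0);

echo 'Agricultural farm: ' . $agriFarm->textContent . "\n";
echo 'Server farm: ' . $servFarm->textContent . "\n";
```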
Writing an XML Document with XHTML
For an XHTML document, there is also a requirement to specify a namespace. Although DTDs don't lend themselves to multiple references, you can still specify one namespace, and the XHTML spec makes this a requirement.
To write an XHTML document, start by indicating the version of XML you're using, provide a DOCTYPE declaration referencing the XHTML DTD, and then insert the xmlns attribute indicating the namespace of the document (inserting the xmlns attribute into the root element makes the root element defined by the DTD, and by default all of its child elements as well). Here's an example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>An xhtml example</title>
  </head>
  <body>
    <p>This document is an example of an xhtml document.
       It can contain images <img src="http://www.example.com/images/image.gif"
       alt="An example image" /> as well as links
       <a href="http://example.com/">example.com</a> and any other html elements.
    </p>
  </body>
</html>
Of course, this document looks very much like an ordinary HTML document, and will be displayed just like any Web page written in HTML in most browsers if you save it with the extension .htm or .html, but it conforms to the XML specification, and is not only well formed but also valid. (If you save it with the extension .xml, it will be displayed in XML format by Internet Explorer.)
Web Services
Another example of the power of XML is found in the design of Web services. Web Service is the name given to a unit of programmed logic that is available across the Internet, and the name XML Web Service is applied when the Web Service is called and answered using XML-based languages.
So how do Web services work, and why are they valuable? Consider how you define and then call a function in a PHP program. First you write the function, giving it a name and parameter list, and adding all the processing logic required for it to do its job. Then, you can call it and expect it to perform just by naming its name and passing the appropriate parameters.
That's great, but suppose you could do the same thing across the Internet, accessing predefined functions (and thereby other data stores, including databases) by simply identifying them by their URL and name, and passing the appropriate parameters. This would mean you could build an application that theoretically is distributed (meaning it doesn't matter where the programming logic is coded or the data is stored) anywhere across the Internet.
And that is exactly what you can do with Web services. Calling a function that someone else coded, using someone else's database, or even multiple functions and multiple databases, anywhere across the Internet is what Web services are for. But you need a little bit of specialized help to access Web services, because they may run from any platform, using any language and any database, and there are some translation issues. That's where SOAP and WSDL come in:
Simple Object Access Protocol (SOAP) is an XML language that provides for defining an envelope, body, and other parts to send and receive Web Service calls. You insert your Web services calls in a SOAP envelope.
Web Services Description Language (WSDL) is another XML language that is used to define the name, type, and arguments associated with a call to a Web Service.
Although Web services are one of the most important uses for XML, and there are quite a few Web services-related applications available that make it easy to develop both PHP Web services and the client code that calls them, the subject is beyond the scope of this book. Please see Wrox's Professional PHP Development for a great deal of interesting information on this topic.
PHP and XML
There have been PHP functions available for connecting to, retrieving data from, and manipulating data in databases almost since PHP was first written. More recently, as the XML specification gained prominence as a means of exchanging and storing data, PHP has added functions that make it easier to work with XML documents.
Because of the nature and format of XML documents, much of the work on adding XML functions to PHP has centered on properly parsing XML documents, and manipulating them while remaining in conformance with the XML specification. To effectively parse and manipulate XML documents, these functions need to be able to get at and work with the names and the values of elements and attributes, as well as the many other types of XML document components.
Next, you'll explore XML functions that have been built into PHP over the years, including the most recent additions such as the SimpleXML extension. XML functions that were available in PHP4 are discussed first, followed by the Document Object Model (DOM) extension, and finally the PHP5 functions, including the SimpleXML extension.
PHP4 XML Functions
PHP5 maintains backward compatibility with many PHP4 features, so we'll start off this section discussing some of the PHP4 XML functions before moving on to the new XML functions available in PHP5. The XML parser functions found in PHP4 implement James Clark's expat (a parser for XML 1.0 written in the C language). Expat parsers can tell you if an XML document is well-formed, but do not validate XML documents, so parsers created using these functions must receive well-formed XML documents or an error will be generated (but fortunately you can find out where the error occurred).
Here are a few of the most common functions:
1. xml_parser_create: This is the basic function to create an xml parser, which can then be used with the other XML functions for reading and writing data, getting errors, and a variety of other useful tasks. Use xml_parser_free to free up the resource when done.
2. xml_parse_into_struct: Parse XML data into an array structure. You can use this function to take the contents of a well-formed XML file, turn it into a PHP array, and then work with the contents of the array.
3. xml_get_error_code: Gets XML parser error code (defined as constants, such as XML_ERROR_NONE and XML_ERROR_SYNTAX). Use xml_error_string to get the textual description of the error based on the error code.
4. xml_parser_set_option: Options that can be set for an XML parser include XML_OPTION_CASE_FOLDING and XML_OPTION_TARGET_ENCODING. The case folding option is enabled by default, and means that element names will be made uppercase unless it is disabled. Target encoding enables you to specify which encoding is used for the target; the default is the encoding used by xml_parser_create, which in turn is ISO-8859-1. Use xml_parser_get_option to find out what options are currently set for an XML parser.
There are also a number of other xml parser-related functions for setting up handlers of various types (for common xml components, such as processing instructions, character data, and so on).
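The functions in the list above fit together roughly as in the following sketch, which parses a string with case folding turned on or off and reports a readable error if the string is not well formed. The function name element_names_or_error is hypothetical, and the sketch assumes the XML parser extension is enabled.

```php
<?php
// Parses a string and returns either the element names found or an error description.
function element_names_or_error($xml, $caseFolding)
{
    $parser = xml_parser_create();
    // 1 folds element names to uppercase (the default); 0 leaves them as written.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);

    $values = array();
    $index = array();
    $ok = xml_parse_into_struct($parser, $xml, $values, $index);

    if (!$ok) {
        // Translate the numeric error code into text and report where parsing stopped.
        $code = xml_get_error_code($parser);
        return 'Error: ' . xml_error_string($code)
            . ' at line ' . xml_get_current_line_number($parser);
    }
    // PHP 4 and 5 code would call xml_parser_free($parser) here.
    return array_keys($index);
}

print_r(element_names_or_error('<parent><child>x</child></parent>', 1));
print_r(element_names_or_error('<parent><child>x</child></parent>', 0));
echo element_names_or_error('<parent><child></parent>', 1), "\n";
```

The keys of the index array are the element names the parser encountered, which makes them a convenient way to see the effect of the case folding option.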
XML Parsers
Now you've seen how to create an XML document using nothing more than ordinary PHP string functions, but it should be clear that these functions provide no easy way to manipulate your XML document. You could write some regular expressions and special functions of your own to make the job easier, but of course, the authors of PHP realized that and came up with some for you.
The next example demonstrates the use of the xml_parser_create() and xml_parse_into_struct() functions. It also uses the file_get_contents() function to retrieve the contents of a file and turn it into a string.
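The original listing is not reproduced here, so what follows is only a minimal sketch of what such an example might look like. It assumes a small clients file written to the system's temporary folder (the file name and contents are made up for the illustration) and relies on the default case folding, which is why the element name appears in uppercase.

```php
<?php
// Write a small sample file so the sketch is self-contained; the file name is an assumption.
$file = sys_get_temp_dir() . '/first_xml.xml';
file_put_contents($file, '<clients><client ID="1">Joe</client><client ID="2">Jim</client></clients>');

// Read the whole file into a string, then hand the string to the parser.
$contents = file_get_contents($file);

$parser = xml_parser_create();
$values = array();
$index = array();
xml_parse_into_struct($parser, $contents, $values, $index);

// Each entry in $values describes one tag: its name, type, nesting level, attributes and text.
// $index maps each element name to the positions of its entries in $values.
$names = array();
foreach ($index['CLIENT'] as $position) {
    $entry = $values[$position];
    $names[$entry['attributes']['ID']] = $entry['value'];
}

print_r($names);
unlink($file);   // remove the sample file again
```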
The Document Object Model
The Document Object Model (DOM) is simply a hierarchical model for interacting with documents. It enables access to the parts of a document by addressing them directly via their lineage in the document.
You can model XML documents with the DOM because there is a specific relationship among the parts of any XML document. As you've seen, there can be only one root element, and it is the parent of all the rest of the elements in an XML document. This means that the root element is at the bottom (hence the name) of the hierarchy or tree from which the rest of the elements spring. Therefore, the relationship between the components of an XML document can be inferred programmatically (and that's just what the PHP DOM extension functions do; we'll talk about this more in just a bit).
For any given element, elements inside it are its children (or child elements), whereas the element it is inside of is its parent (or parent element). So you can think of elements as parents and children, or you can think of elements as being like a tree, with root, branches, and leaves. Within the DOM, both ways of thinking about XML document components are valid. Elements and other components of an XML document are considered to be nodes within the DOM.
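The parent and child relationships can be walked programmatically. The sketch below assumes PHP5's DOM extension (the DOMDocument class) and a shortened clients document; the outline function is a hypothetical helper that lists each element indented by its depth in the tree.

```php
<?php
// Recursively lists element nodes, indenting each one by its depth in the tree.
function outline(DOMNode $node, $depth = 0)
{
    if ($node->nodeType !== XML_ELEMENT_NODE) {
        return array();   // skip text, comments and other non-element nodes
    }
    $lines = array(str_repeat('  ', $depth) . $node->nodeName);
    foreach ($node->childNodes as $child) {
        $lines = array_merge($lines, outline($child, $depth + 1));
    }
    return $lines;
}

$doc = new DOMDocument();
$doc->loadXML('<clients><client>Joe</client><orders><order>ProductA</order></orders></clients>');

// Start at the root element and work down through its children.
$root = $doc->documentElement;
echo implode("\n", outline($root)), "\n";

// Every node also knows its parent, so you can walk back up the tree.
$orderParent = $doc->getElementsByTagName('order')->item(0)->parentNode->nodeName;
echo 'Parent of order: ' . $orderParent . "\n";
```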
The DOM Extension
The DOM extension follows the World Wide Web Consortium's DOM Level 2 recommendation closely. This recommendation states, "the DOM is an application programming interface (API) for valid HTML and well-formed XML documents." With the DOM, any well-formed XML document can be programmatically built and navigated, and its nodes can be added, edited, and deleted.
Configuring PHP with --with-dom=dom_dir makes this extension available. For PHP on Windows, copy libxml2.dll or iconv.dll (depending on your PHP version) to the System32 folder, and please see the documentation for further instructions.
Support for any DOM extension functions is still experimental (which is why there are no examples in this book), but you can refer to Professional PHP Programming (Wrox) for a more in-depth explanation of the PHP DOM extension.
Using the PHP DOM Extension Functions
PHP's DOM functions include domxml_new_doc() to create new XML documents, domxml_open_file() to open an XML document file as a DOM object, and domxml_open_mem() to create a DOM object from an XML document already in memory. These functions return a DOM object, not a string or other common data type. When you use the PHP DOM extension functions, you typically first create a DOMDocument object and then manipulate it using functions that are part of the class of that object. There are a number of object classes available, and by starting with a DOMDocument object, you can then examine it and retrieve new objects reflecting XML document components such as elements, attributes, and so on.
For example, to open the XML file created earlier, you might use code like this to create a DOM object named $my_dom_obj:
$my_dom_obj = domxml_open_file("first_xml.xml");
Then, to create a variable representing the root element found within the document, you might use code like this:
$the_root_element = $my_dom_obj->document_element();
You can also manipulate the DOM object you've created with quite a few other PHP DOM functions, including create_element(), which creates a new element, and append_child(), which appends an element as a child of an existing element.
If you want to find particular elements in an XML document that has been loaded into the DOM, you can use the get_elements_by_tagname() function of the DOMElement object. It returns the matching elements as nodes, each of which sits somewhere in the DOM's hierarchy of root, branches, and leaves.
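As a rough sketch of creating, appending, and finding elements, the following uses the DOM classes bundled with PHP 5 and later. Note the assumption: these are the newer method names (createElement(), appendChild(), getElementsByTagName()), not the domxml functions named above, and the program names and prices are simply copied from the sample document.

```php
<?php
// Assumption: PHP 5+ DOM classes; method names differ from the domxml_* API.
$doc = new DOMDocument('1.0');
$doc->formatOutput = true;

// Create the root element and attach it to the document.
$root = $doc->createElement('php_programs');
$doc->appendChild($root);

// Illustrative data, mirroring the sample document in this post.
$programs = array('cart' => '100', 'survey' => '500');
foreach ($programs as $name => $price) {
    $program = $doc->createElement('program');
    $program->setAttribute('name', $name);
    $program->appendChild($doc->createElement('price', $price));
    $root->appendChild($program);
}

// Find every <price> element anywhere in the document.
$prices = $doc->getElementsByTagName('price');
foreach ($prices as $node) {
    echo $node->parentNode->getAttribute('name'), ': ', $node->nodeValue, "\n";
}
```

Calling $doc->saveXML() afterward would return the whole document as a string.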
There are many other classes of objects available in the PHP DOM extension, as well as quite a few functions for using them to create and manipulate XML documents. Their object-oriented nature makes them more suitable for a book such as Professional PHP Programming, to which you can refer for further information about the DOM and PHP.
PHP5 XML Functions
PHP4 included some basic XML parsing features, but PHP5 contains many new features and functions based on libxml2. Support for simpleXML in PHP5 is automatically turned on; there is no need to include any additional extensions. PHP5 also supports XML document validation when referencing a DTD or an XML Schema.
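To give a feel for validation against a DTD, here is a minimal sketch using DOMDocument::validate() from the PHP 5 DOM extension. The DTD shown is an assumption written for this example (it is not taken from the earlier sections), and it is embedded in the document so the script needs no external files.

```php
<?php
// The DTD below is a made-up example describing the sample document.
$xml = <<<XML
<?xml version="1.0"?>
<!DOCTYPE php_programs [
  <!ELEMENT php_programs (program+)>
  <!ELEMENT program (price)>
  <!ATTLIST program name CDATA #REQUIRED>
  <!ELEMENT price (#PCDATA)>
]>
<php_programs>
  <program name="cart"><price>100</price></program>
</php_programs>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// validate() checks the loaded document against its DTD.
$is_valid = $doc->validate();
echo $is_valid ? "valid\n" : "not valid\n";
```

For XML Schema, the same class offers schemaValidate(), which takes the path to a schema file.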
The SimpleXML Extension
The simpleXML extension is also experimental, but has been completely overhauled in PHP5. This extension is enabled by default; the corresponding configure option is --enable-simplexml. It includes functions for working with XML documents that make common operations fairly easy, such as taking a string, converting it to an XML-formatted document, and displaying it. The primary advantage is that the XML document becomes an object that can be processed like other objects in PHP, with elements, attributes, and their data accessible using normal object operations.
The functions include:
1. simplexml_load_file(): takes a file path as an argument and, if the contents of the file are well-formed XML, loads the contents as an object.
2. simplexml_load_string(): takes a string of well-formed XML as an argument and converts the string to an object.
3. simplexml_import_dom(): takes a node from a DOM document and turns it into a simpleXML node.
4. SimpleXMLElement->asXML(): returns a well-formed XML string from a simpleXML object.
5. SimpleXMLElement->attributes(): provides the attributes and values defined on an element.
6. SimpleXMLElement->children(): provides the child elements of an element.
7. SimpleXMLElement->xpath(): runs an XPath query on a simpleXML node.
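The functions listed above can be tried together in one short script. This is a sketch only: the XML is a compact, inline copy of the sample document, and the variable names are made up for the example.

```php
<?php
$xml = '<php_programs><program name="cart"><price>100</price></program>'
     . '<program name="survey"><price>500</price></program></php_programs>';

$programs = simplexml_load_string($xml);

// attributes(): name/value pairs on the first <program>
foreach ($programs->program[0]->attributes() as $att => $val) {
    echo "$att = $val\n";                      // name = cart
}

// children(): child elements of the first <program>
foreach ($programs->program[0]->children() as $tag => $child) {
    echo "$tag = $child\n";                    // price = 100
}

// xpath(): returns an array of matching SimpleXMLElement objects
$matches = $programs->xpath('/php_programs/program[@name="survey"]/price');
echo (string) $matches[0], "\n";               // 500

// simplexml_import_dom(): wrap a DOM node as a simpleXML object
$dom = new DOMDocument();
$dom->loadXML($xml);
$imported = simplexml_import_dom($dom);
echo $imported->getName(), "\n";               // php_programs
```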
Using simplexml_load_string()
You can write out (or get as a result from an expression or function) a string that is formatted as well-formed XML within a PHP program. Once you have the string, you can turn it into a simpleXML object using the simplexml_load_string() function. To turn a string named $string into a simpleXML object based on a string of XML inside your PHP program, just do something like this:
<?php
$string = <<<XML
<?xml version='1.0'?>
<root_element>
  <child01>My first element</child01>
  <child02>My second element</child02>
</root_element>
XML;
$xml_string = simplexml_load_string($string);
?>
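Once the string is loaded, its child elements can be read as properties of the object. The sketch below repeats the same XML string and adds a check for a failed parse; the variable names $first and $second are just illustrative.

```php
<?php
$string = <<<XML
<?xml version='1.0'?>
<root_element>
  <child01>My first element</child01>
  <child02>My second element</child02>
</root_element>
XML;

$xml = simplexml_load_string($string);

// simplexml_load_string() returns false when the string is not well-formed.
if ($xml === false) {
    die("Could not parse the XML string\n");
}

// Child elements are exposed as properties; cast to string to get the text.
$first  = (string) $xml->child01;
$second = (string) $xml->child02;
echo $first, "\n";   // My first element
echo $second, "\n";  // My second element
```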
Try reading an XML string with the simplexml_load_string function first. All you need to do is supply a string to the function. And once you have your simpleXML object, you can use the object's asXML() function to display it as properly formatted XML.
Here's an example that uses simplexml_load_string() together with the asXML() method of the resulting object. Open your text editor and create a .php document containing the following code:
<?php
//The asXML method formats the parent object's data in XML version 1.0.
//create an XML formatted string
$my_xml_string = <<<XML
<a>
  <b>
    <c>text content</c>
    <c>more text content</c>
  </b>
  <d>
    <c>even more text content</c>
  </d>
</a>
XML;
//load the string into an object
$xml_object = simplexml_load_string ($my_xml_string);
//display the contents of the xml object
echo $xml_object->asXML();
?>
Save this document as create_xml_doc.php, and then open it in your browser.
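One thing to keep in mind: a browser treats the echoed output as HTML, so it will usually show only the text content and hide the tags themselves. If you want to see the markup in the page, one option (a suggestion, not part of the original example) is to escape it with htmlspecialchars() first:

```php
<?php
$my_xml_string = '<a><b><c>text content</c></b></a>';
$xml_object = simplexml_load_string($my_xml_string);

// Escape the markup so the browser shows the tags instead of interpreting them.
$escaped = htmlspecialchars($xml_object->asXML());
echo '<pre>', $escaped, '</pre>';
```

Alternatively, viewing the page source in the browser shows the unescaped XML exactly as asXML() produced it.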
Using simplexml_load_file()
Alternatively, if your XML happens to be in a file, you can turn the contents of the file into a simpleXML object using the simplexml_load_file() function. Whichever way you come up with your simpleXML object, it can be manipulated in the same way with the other simpleXML functions.
Keeping the XML in an external file also keeps it out of your PHP code. For example, to read an XML file such as the one you created at the very beginning of the chapter, you can use the simplexml_load_file() function. Here's the XML file again (save it as php_programs.xml):
<?xml version="1.0" ?>
<php_programs>
   <program name="cart">
      <price>100</price>
   </program>
   <program name="survey">
      <price>500</price>
   </program>
</php_programs>
To read the names and values of the elements and attributes in this XML document, use the simpleXML function for loading a file: simplexml_load_file(). Just begin a PHP document (name this one simplexml_01.php), and then set a variable to the result of the simplexml_load_file() function, like this:
<?php
$php_programs = simplexml_load_file('php_programs.xml');
The variable contains an object that can be used much like an ordinary array. To get the names and values from the variable, use the foreach statement to return keys and values, just like with an ordinary array:
foreach ($php_programs->program as $program_key => $program_val) {
echo "The root element <B>php_programs</B> contains an element named
<B>$program_key</B><BR>";
But we can't directly echo out the contents of the $program_val variable; it contains an array-like object as well, and we must therefore use a foreach statement on it as well, both for its child elements and its attributes, like this:
foreach ($program_val->children() as $child_of_program_key =>
$child_of_program_val)
{
   if ($child_of_program_key == "price") {
     foreach ($program_val->attributes() as $att => $val) {
        if ($att == "name") {
          foreach ($program_val->price as $the_price) {
             echo "This <B>$program_key</B>
             element has an attribute named <B>$att</B> and is named
             <B>$val</B>.<BR>";
             echo "This <B>$program_key</B>
             element has a child element named <B>$child_of_program_key</B>
             and the value of <B>$child_of_program_key</B> is
             <B>$the_price</B>.<BR>";
             echo "Therefore, we can say that the <B>$child_of_program_key</B>
             of the <B>$val</B> <B>$program_key</B> is
             <B>\$$the_price</B>.<BR><BR>";
          }
        }
      }
    }
  }
}
?>
Save the file and open it in your browser to see the result.
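The nested loops above show how keys and values can be walked generically. When you already know the structure of the document, a shorter form is possible. The following sketch inlines the XML so it runs without php_programs.xml, and reads the attribute with array syntax and the child element with property syntax; the $lines array is just a convenience for this example.

```php
<?php
// Assumption: the XML is inlined here so the sketch runs without php_programs.xml.
$xml = <<<XML
<?xml version="1.0" ?>
<php_programs>
   <program name="cart">
      <price>100</price>
   </program>
   <program name="survey">
      <price>500</price>
   </program>
</php_programs>
XML;

$php_programs = simplexml_load_string($xml);

$lines = array();
foreach ($php_programs->program as $program) {
    $name  = (string) $program['name'];  // attribute via array syntax
    $price = (string) $program->price;   // child element via property syntax
    $lines[] = "The price of the $name program is \$$price.";
}
echo implode("\n", $lines), "\n";
```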
Summary
In this post you explored the basics of XML, including the rules by which an XML document is determined to be well formed and valid. You examined Document Type Definitions (DTDs) written in Extended Backus-Naur Form (EBNF) and how they can be referenced to validate an XML document. Namespace support in XML, and how multiple XML Schema documents can be referenced within an XML document, were also covered.
You checked out the Document Object Model, the hierarchical model of XML document structure, and reviewed some PHP functions related to the DOM. You also looked at some of the older XML Parser functions in PHP4, and the newer simpleXML extension as it appears in PHP5.
You created examples to make and read XML documents, work with XML documents using the simpleXML functions, and go from simpleXML to DOM functions. Although you haven't explored all of the object-oriented functions found in simpleXML and the DOM extension, you've found ways to effectively create and manipulate XML documents and many of their components.














