Reading & Writing XML using the PHP DOM library

Reading XML using the DOM library

The easiest way to read a well-formed XML file is to use the Document Object Model (DOM) library compiled into some installations of PHP. The DOM library reads the entire XML document into memory and represents it as a tree of nodes, as illustrated in Figure 1.
Figure 1. XML DOM tree for the books XML

The books node at the top of the tree has two child book nodes. Within each book, there are author, publisher, and title nodes. The author, publisher, and title nodes each have a child text node that contains the text.

The code to read the books XML file and display the contents using the DOM is shown in Listing 2.
Listing 2. Reading books XML with the DOM

<?php
  $doc = new DOMDocument();
  $doc->load( 'books.xml' );

  $books = $doc->getElementsByTagName( "book" );
  foreach( $books as $book )
  {
    $authors = $book->getElementsByTagName( "author" );
    $author = $authors->item(0)->nodeValue;

    $publishers = $book->getElementsByTagName( "publisher" );
    $publisher = $publishers->item(0)->nodeValue;

    $titles = $book->getElementsByTagName( "title" );
    $title = $titles->item(0)->nodeValue;

    echo "$title - $author - $publisher\n";
  }
?>

The script starts by creating a new DOMDocument object and loading the books XML into that object using the load method. After that, the script uses the getElementsByTagName method to get a list of all of the elements with the given name.

Within the loop over the book nodes, the script uses the getElementsByTagName method to find the author, publisher, and title elements, then reads the nodeValue of the first match for each. The nodeValue is the text within the node. The script then displays those values.

You can run the PHP script on the command line like this:

% php e1.php
PHP Hacks - Jack Herrington - O'Reilly
Podcasting Hacks - Jack Herrington - O'Reilly
%

As you can see, a line is printed for each book block. That’s a good start.

Writing XML with the DOM

Reading XML is only one part of the equation. What about writing it? The best way to write XML is to use the DOM. Listing 5 shows how the DOM builds the books XML file.
Listing 5. Writing books XML with the DOM

<?php
  $books = array();
  $books [] = array(
    'title' => 'PHP Hacks',
    'author' => 'Jack Herrington',
    'publisher' => "O'Reilly"
  );
  $books [] = array(
    'title' => 'Podcasting Hacks',
    'author' => 'Jack Herrington',
    'publisher' => "O'Reilly"
  );

  $doc = new DOMDocument();
  $doc->formatOutput = true;

  $r = $doc->createElement( "books" );
  $doc->appendChild( $r );

  foreach( $books as $book )
  {
    $b = $doc->createElement( "book" );

    $author = $doc->createElement( "author" );
    $author->appendChild(
      $doc->createTextNode( $book['author'] )
    );
    $b->appendChild( $author );

    $title = $doc->createElement( "title" );
    $title->appendChild(
      $doc->createTextNode( $book['title'] )
    );
    $b->appendChild( $title );

    $publisher = $doc->createElement( "publisher" );
    $publisher->appendChild(
      $doc->createTextNode( $book['publisher'] )
    );
    $b->appendChild( $publisher );

    $r->appendChild( $b );
  }

  echo $doc->saveXML();
?>

At the top of the script, the books array is loaded with some example books. That data could come from the user or from a database.

After the example books are loaded, the script creates a new DOMDocument and adds the root books node to it. Then, for each book, the script creates a book element, creates author, title, and publisher elements, adds a text node to each of them, and appends them to the book element. The final step for each book node is to attach it to the root books node.

The end of the script dumps the XML to the console using the saveXML method. (You can also use the save method to create a file from the XML.) The output of the script is shown in Listing 6.
Listing 6. Output from the DOM build script

  % php e4.php
  <?xml version="1.0"?>
  <books>
    <book>
      <author>Jack Herrington</author>
      <title>PHP Hacks</title>
      <publisher>O'Reilly</publisher>
    </book>
    <book>
      <author>Jack Herrington</author>
      <title>Podcasting Hacks</title>
      <publisher>O'Reilly</publisher>
    </book>
  </books>
  %

The real value of using the DOM is that the XML it creates is always well formed.

[Ref: http://www.ibm.com/developerworks/opensource/library/os-xmldomphp/index.html]

REST Web Services calls with C#

It’s really easy to call REST-based web services from C# and .NET. Let’s see how to do it, using Yahoo! Web Services as the example.

Make REST Calls With C#

The .NET Framework provides classes for performing HTTP requests. This HOWTO describes how to perform both GET and POST requests.

Overview

The System.Net namespace contains the HttpWebRequest and HttpWebResponse classes, which fetch data from web servers and HTTP-based web services. Often you will also want to add a reference to System.Web, which gives you access to the HttpUtility class and its methods for HTML- and URL-encoding and decoding text strings.

Yahoo! Web Services return XML data. While some web services can also return the data in other formats, such as JSON and Serialized PHP, it is easiest to utilize XML since the .NET Framework has extensive support for reading and manipulating data in this format.

Simple GET Requests

The following example retrieves a web page and prints out the source.

C# GET SAMPLE 1


using System;
using System.IO;
using System.Net;
using System.Text;

// Create the web request
HttpWebRequest request = WebRequest.Create("http://developer.yahoo.com/") as HttpWebRequest;
// Get response
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
    // Get the response stream
    StreamReader reader = new StreamReader(response.GetResponseStream());
    // Console application output
    Console.WriteLine(reader.ReadToEnd());
}

Simple POST Requests

Some APIs require you to make POST requests. To accomplish this we change the request method and content type and then write the data into a stream that is sent with the request.

C# POST SAMPLE 1

using System;
using System.IO;
using System.Net;
using System.Text;
// We use the HttpUtility class from the System.Web namespace
using System.Web;

Uri address = new Uri("http://api.search.yahoo.com/ContentAnalysisService/V1/termExtraction");
// Create the web request
HttpWebRequest request = WebRequest.Create(address) as HttpWebRequest;
// Set type to POST
request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
// Create the data we want to send
string appId = "YahooDemo";
// Note the trailing space, so the two halves do not run together
string context = "Italian sculptors and painters of the renaissance "
    + "favored the Virgin Mary for inspiration";
string query = "madonna";
StringBuilder data = new StringBuilder();
data.Append("appid=" + HttpUtility.UrlEncode(appId));
data.Append("&context=" + HttpUtility.UrlEncode(context));
data.Append("&query=" + HttpUtility.UrlEncode(query));
// Create a byte array of the data we want to send
byte[] byteData = Encoding.UTF8.GetBytes(data.ToString());
// Set the content length in the request headers
request.ContentLength = byteData.Length;
// Write data
using (Stream postStream = request.GetRequestStream())
{
    postStream.Write(byteData, 0, byteData.Length);
}
// Get response
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
    // Get the response stream
    StreamReader reader = new StreamReader(response.GetResponseStream());
    // Console application output
    Console.WriteLine(reader.ReadToEnd());
}
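
If you build form bodies like this in more than one place, the append-and-encode steps above can be pulled into a small helper. The following is only a sketch under a couple of assumptions: FormEncoder and Encode are made-up names for illustration, and it uses Uri.EscapeDataString instead of HttpUtility.UrlEncode so that it does not need a reference to System.Web. One visible difference is that EscapeDataString writes a space as %20 rather than +.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class FormEncoder
{
    // Builds an application/x-www-form-urlencoded body from name/value pairs,
    // keeping the pairs in the order they were supplied.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder body = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            // Separate pairs with '&' (but not before the first pair)
            if (body.Length > 0) { body.Append('&'); }
            body.Append(Uri.EscapeDataString(field.Key));
            body.Append('=');
            body.Append(Uri.EscapeDataString(field.Value));
        }
        return body.ToString();
    }
}
```

The returned string can then be converted to bytes and written to the request stream exactly as in the sample above.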

HTTP Authenticated requests

The del.icio.us API requires you to make authenticated requests, passing your del.icio.us username and password using HTTP authentication. This is easily accomplished by adding an instance of NetworkCredential to the request.

C# HTTP AUTHENTICATION

// Create the web request
HttpWebRequest request
= WebRequest.Create("https://api.del.icio.us/v1/posts/recent") as HttpWebRequest;
// Add authentication to request
request.Credentials = new NetworkCredential("username", "password");
// Get response
using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
{
    // Get the response stream
    StreamReader reader = new StreamReader(response.GetResponseStream());
    // Console application output
    Console.WriteLine(reader.ReadToEnd());
}
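
When Credentials is set as above, the framework typically waits for the server to challenge the request before sending the username and password. If a service expects Basic authentication on the very first request, one option is to build the Authorization header yourself. This is a minimal sketch, assuming the service uses the Basic scheme; BasicAuth and HeaderValue are hypothetical names used only here.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class BasicAuth
{
    // Returns the value of an HTTP "Authorization" header for the Basic scheme:
    // the word "Basic", a space, then base64("userName:password").
    public static string HeaderValue(string userName, string password)
    {
        byte[] raw = Encoding.UTF8.GetBytes(userName + ":" + password);
        return "Basic " + Convert.ToBase64String(raw);
    }
}
```

You would then set request.Headers["Authorization"] = BasicAuth.HeaderValue("username", "password"); before calling GetResponse(). Only do this over HTTPS, since base64 is an encoding and not encryption.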

Error Handling

Yahoo! offers many REST-based web services but they don’t all use the same error handling. Some web services return status code 200 (OK) and a detailed error message in the returned XML data while others return a standard HTTP status code to indicate an error. Please read the documentation for the web services you are using to see what type of error response you should expect. Remember that HTTP Authentication is different from the Yahoo! Browser-Based Authentication.

Calling HttpWebRequest.GetResponse() will throw a WebException if the server returns an error status code (such as 404 or 500), the request times out, or there is a network error. Redirects are, however, handled automatically.

Here is a more full-featured sample method that prints the contents of a web page and has basic error handling for HTTP error codes.

C# GET SAMPLE 2

public static void PrintSource(Uri address)
{
    HttpWebRequest request;
    HttpWebResponse response = null;
    StreamReader reader;
    StringBuilder sbSource;
    if (address == null) { throw new ArgumentNullException("address"); }
    try
    {
        // Create and initialize the web request
        request = WebRequest.Create(address) as HttpWebRequest;
        request.UserAgent = ".NET Sample";
        request.KeepAlive = false;
        // Set timeout to 15 seconds
        request.Timeout = 15 * 1000;
        // Get response
        response = request.GetResponse() as HttpWebResponse;
        if (request.HaveResponse == true && response != null)
        {
            // Get the response stream
            reader = new StreamReader(response.GetResponseStream());
            // Read it into a StringBuilder
            sbSource = new StringBuilder(reader.ReadToEnd());
            // Console application output
            Console.WriteLine(sbSource.ToString());
        }
    }
    catch (WebException wex)
    {
        // This exception is raised for HTTP error status codes, timeouts
        // and network errors. Try to retrieve more information about it
        if (wex.Response != null)
        {
            using (HttpWebResponse errorResponse = (HttpWebResponse)wex.Response)
            {
                Console.WriteLine(
                "The server returned '{0}' with the status code {1} ({2:d}).",
                errorResponse.StatusDescription, errorResponse.StatusCode,
                errorResponse.StatusCode);
            }
        }
    }
    finally
    {
        if (response != null) { response.Close(); }
    }
}
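
The catch block above prints nothing when wex.Response is null, which is the case for timeouts and connection failures, because there is no HTTP response to inspect. A possible way to cover those cases too is to fall back on the exception's Status property. The sketch below is only an illustration; WebErrors and Describe are hypothetical names, and the message wording is my own.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class WebErrors
{
    // Turns a WebException into a short, human-readable message.
    public static string Describe(WebException wex)
    {
        // An HTTP error status (e.g. 404) comes with a response we can inspect
        HttpWebResponse http = wex.Response as HttpWebResponse;
        if (http != null)
        {
            return string.Format("HTTP {0:d} ({1})", http.StatusCode, http.StatusDescription);
        }
        // No response at all: the request timed out ...
        if (wex.Status == WebExceptionStatus.Timeout)
        {
            return "The request timed out.";
        }
        // ... or failed at the network level (DNS, connection refused, etc.)
        return "Network error: " + wex.Status;
    }
}
```

Inside a catch (WebException wex) block you could then simply call Console.WriteLine(WebErrors.Describe(wex));.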

XML with C#

There is excellent support for XML in the Microsoft .NET Framework, and there is a very good, easy-to-follow article on it at the Yahoo! Developer Network.

I am pasting its content here for convenience. Enjoy 🙂

Using Returned XML with C#

Once you have retrieved data from a web service you will need to do something with it. This HOWTO describes the various built-in methods .NET provides to use XML returned by a web service.

Overview

The .NET Framework provides excellent support for XML. Combined with the data binding support of WinForms and ASP.NET applications you have an easy and powerful set of tools. ASP.NET 2.0 takes data binding another step further by providing the DataSource control which lets you declaratively provide data access to data-bound UI controls.

Returned Data to a String

The simplest way to view the returned data is to get the response stream and put it into a string. This is especially handy for debugging. The following code gets a web page and returns the contents as a string.

C# STRING SAMPLE

public class StringGet
{
    public static string GetPageAsString(Uri address)
    {
        string result = "";
        // Create the web request
        HttpWebRequest request = WebRequest.Create(address) as HttpWebRequest;
        // Get response
        using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
        {
            // Get the response stream
            StreamReader reader = new StreamReader(response.GetResponseStream());
            // Read the whole contents and return as a string
            result = reader.ReadToEnd();
        }
        return result;
    }
}

Using XmlReader

XmlReader provides fast forward-only access to XML data. It also allows you to read data as simple-typed values rather than strings. XmlReader can load an XML document directly from a URL without having to use HttpWebRequest, though you won’t have the same amount of control over the request. If you use HttpWebRequest, you can just pass the stream returned by the GetResponseStream() method to XmlReader. Fast, forward-only writing is provided by XmlTextWriter.

With .NET 2.0 you should create XmlReader instances using the System.Xml.XmlReader.Create method. For the sake of compatibility and clarity the next sample uses the .NET 1.1 creation method.

C# XMLREADER SAMPLE

using System.Xml;
// Retrieve XML document
XmlTextReader reader = new XmlTextReader("http://xml.weather.yahoo.com/forecastrss?p=94704");
// Skip non-significant whitespace
reader.WhitespaceHandling = WhitespaceHandling.Significant;
// Read nodes one at a time
while (reader.Read())
{
	// Print out info on node
	Console.WriteLine("{0}: {1}", reader.NodeType.ToString(), reader.Name);
}
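
For comparison, here is a rough sketch of the .NET 2.0 style mentioned above, using XmlReader.Create with an XmlReaderSettings object. To keep it runnable without a network connection it reads from a string; XmlReader.Create also has an overload that takes a URL. ReaderSketch and ElementNames are hypothetical names for this example only.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class ReaderSketch
{
    // Returns the names of all elements in the XML, in document order.
    public static List<string> ElementNames(string xml)
    {
        // Settings replace the properties set on XmlTextReader in the 1.1 style
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;

        List<string> names = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Read nodes one at a time, keeping only element start tags
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }
}
```

Calling ReaderSketch.ElementNames("<books><book><title>PHP Hacks</title></book></books>") gives the list books, book, title.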

Using XmlDocument

XmlDocument gives more flexibility and is a good choice if you need to navigate or modify the data via the DOM. It also works as a source for the XslTransform class allowing you to perform XSL transformations.

C# XMLDOCUMENT SAMPLE

// Create a new XmlDocument
XmlDocument doc = new XmlDocument();
// Load data
doc.Load("http://xml.weather.yahoo.com/forecastrss?p=94704");
// Set up namespace manager for XPath
XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
ns.AddNamespace("yweather", "http://xml.weather.yahoo.com/ns/rss/1.0");
// Get forecast with XPath
XmlNodeList nodes = doc.SelectNodes("/rss/channel/item/yweather:forecast", ns);
// You can also get elements based on their tag name and namespace,
// though this isn't recommended
//XmlNodeList nodes = doc.GetElementsByTagName("forecast",
//                          "http://xml.weather.yahoo.com/ns/rss/1.0");
foreach(XmlNode node in nodes)
{
	Console.WriteLine("{0}: {1}, {2}F - {3}F",
	node.Attributes["day"].InnerText,
	node.Attributes["text"].InnerText,
	node.Attributes["low"].InnerText,
	node.Attributes["high"].InnerText);
}
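
The sample above only reads from the document. Since the point of XmlDocument is that you can also modify the data, here is a small sketch of that side. It is not from the original article: DocumentSketch and TagForecastUnits are hypothetical names, and the namespace URI urn:example:weather is a made-up stand-in so the example can run on an in-memory string instead of the live feed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class DocumentSketch
{
    // Made-up namespace URI used only by this sketch
    public const string SampleNamespace = "urn:example:weather";

    // Loads the XML, adds a unit="..." attribute to every forecast element
    // in the sample namespace, and returns the modified document.
    public static XmlDocument TagForecastUnits(string xml, string unit)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // Set up namespace manager for XPath
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("w", SampleNamespace);

        // Collect the matches first so we are not editing while iterating
        List<XmlElement> forecasts = new List<XmlElement>();
        foreach (XmlNode node in doc.SelectNodes("//w:forecast", ns))
        {
            forecasts.Add((XmlElement)node);
        }
        foreach (XmlElement forecast in forecasts)
        {
            forecast.SetAttribute("unit", unit);
        }
        return doc;
    }
}
```

After the call, doc.OuterXml (or doc.Save) gives you the XML with the new attributes in place.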

Using XPathNavigator/XPathDocument

XPathDocument provides fast, read-only access to the contents of an XML document using XPath. Its usage is similar to using XPath with XmlDocument.

C# XPATHDOCUMENT SAMPLE

using System.Xml.XPath;
// Create a new XPathDocument
XPathDocument doc = new XPathDocument("http://xml.weather.yahoo.com/forecastrss?p=94704");
// Create navigator
XPathNavigator navigator = doc.CreateNavigator();
// Set up namespace manager for XPath
XmlNamespaceManager ns = new XmlNamespaceManager(navigator.NameTable);
ns.AddNamespace("yweather", "http://xml.weather.yahoo.com/ns/rss/1.0");
// Get forecast with XPath
XPathNodeIterator nodes = navigator.Select("/rss/channel/item/yweather:forecast", ns);
while(nodes.MoveNext())
{
	XPathNavigator node = nodes.Current;
	Console.WriteLine("{0}: {1}, {2}F - {3}F",
	node.GetAttribute("day", ns.DefaultNamespace),
	node.GetAttribute("text", ns.DefaultNamespace),
	node.GetAttribute("low", ns.DefaultNamespace),
	node.GetAttribute("high", ns.DefaultNamespace));
}
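
XPathNavigator can also hand back typed values, which saves parsing strings yourself. The following sketch averages the high attribute over all forecasts. As before it is just an illustration under assumptions: NavigatorSketch and AverageHigh are hypothetical names, urn:example:weather is a made-up namespace URI, and the XML comes from a string rather than the live feed.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class NavigatorSketch
{
    // Returns the average of the "high" attribute over all forecast elements
    // in the made-up sample namespace, or 0 if there are none.
    public static double AverageHigh(string xml)
    {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator navigator = doc.CreateNavigator();

        // Set up namespace manager for XPath
        XmlNamespaceManager ns = new XmlNamespaceManager(navigator.NameTable);
        ns.AddNamespace("w", "urn:example:weather");

        // Select the attributes directly and read each one as a number
        XPathNodeIterator highs = navigator.Select("//w:forecast/@high", ns);
        double total = 0;
        int count = 0;
        while (highs.MoveNext())
        {
            total += highs.Current.ValueAsDouble;
            count++;
        }
        return count == 0 ? 0 : total / count;
    }
}
```

For a document with two forecasts whose highs are 70 and 72, the method returns 71.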

Using a DataSet

Using a DataSet from the System.Data namespace lets you bind the returned data to controls and also access hierarchical data easily. A dataset can infer the structure automatically from XML, create corresponding tables and relationships between them and populate the tables just by calling ReadXml().

C# DATASET SAMPLE

using System.Data;
public void RunSample()
{
	// Create the web request
	HttpWebRequest request
	= WebRequest.Create("http://xml.weather.yahoo.com/forecastrss?p=94704") as HttpWebRequest;
	// Get response
	using (HttpWebResponse response = request.GetResponse() as HttpWebResponse)
	{
		// Load data into a dataset
		DataSet dsWeather = new DataSet();
		dsWeather.ReadXml(response.GetResponseStream());
		// Print dataset information
		PrintDataSet(dsWeather);
	}
}
public static void PrintDataSet(DataSet ds)
{
	// Print out all tables and their columns
	foreach (DataTable table in ds.Tables)
	{
		Console.WriteLine("TABLE '{0}'", table.TableName);
		Console.WriteLine("Total # of rows: {0}", table.Rows.Count);
		Console.WriteLine("---------------------------------------------------------------");
		foreach (DataColumn column in table.Columns)
		{
			Console.WriteLine("- {0} ({1})", column.ColumnName, column.DataType.ToString());
		}  // foreach column
		Console.WriteLine(System.Environment.NewLine);
	}  // foreach table
	// Print out table relations
	foreach (DataRelation relation in ds.Relations)
	{
		Console.WriteLine("RELATION: {0}", relation.RelationName);
		Console.WriteLine("---------------------------------------------------------------");
		Console.WriteLine("Parent: {0}", relation.ParentTable.TableName);
		Console.WriteLine("Child: {0}", relation.ChildTable.TableName);
		Console.WriteLine(System.Environment.NewLine);
	}  // foreach relation
}
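
To see the structure inference without calling a web service, you can feed ReadXml a string. This sketch reuses the books XML from the PHP section earlier on this page; DataSetSketch and Load are hypothetical names for illustration.

```csharp
using System;
using System.Data;
using System.IO;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class DataSetSketch
{
    // Reads XML from a string and lets the DataSet infer tables and columns.
    public static DataSet Load(string xml)
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(xml));
        return ds;
    }
}
```

With a books root element containing two book elements, each with author, title and publisher children, the DataSet ends up with a table named book that has two rows, and you can read a value with ds.Tables["book"].Rows[0]["title"].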
