
XML Processing in Python



Hello, dear readers! Welcome back to another section of my Python tutorial. In this guide, we are going to discuss XML processing in Python. Read through it carefully, and feel free to ask questions.

XML is a portable, open standard for structuring data. It lets developers produce data that other applications can read, regardless of the operating system or the programming language used.

What is XML?

The Extensible Markup Language (XML) is a markup language much like HTML, XHTML or SGML. It is recommended by the World Wide Web Consortium (W3C) and is available as an open standard.

XML is extremely useful for keeping track of small to medium amounts of data without needing a SQL-based backbone.

XML Parser Architectures and APIs

The Python standard library provides a small but useful set of interfaces for working with XML.

The two most basic and widely used APIs to XML data are known as the SAX and DOM interfaces.

  • Simple API for XML (SAX) - Here, you register callbacks for the events of interest and let the parser proceed through the document. This is useful when your documents are very large or when memory is limited, because the parser processes the file as it reads it from disk and the full file is never stored in memory.
  • Document Object Model (DOM) API - This is a World Wide Web Consortium (W3C) recommendation in which the full file is read into memory and stored in a hierarchical (tree-based) form that represents all the features of an XML document.

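Because SAX makes a single forward pass over the document, it cannot revisit earlier elements the way DOM can, which makes tasks that need random access harder. On the other hand, DOM holds the whole document in memory, so relying on it exclusively can exhaust your resources, especially with large files or with many files loaded at once.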

SAX is read-only, while DOM allows changes to the XML document. Since these two APIs complement each other, there is no reason why you cannot use them together in large projects.
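To make the contrast concrete, here is a minimal sketch that reads the same small document both ways. The sample string, the TagCounter class and the variable names are made up for illustration; they are not part of movies.xml or of the standard library.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, used only for this sketch.
SAMPLE = '<collection><movie title="A"/><movie title="B"/></collection>'


class TagCounter(xml.sax.ContentHandler):
    """SAX: counts <movie> elements as the parser streams past them."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, tag, attributes):
        if tag == "movie":
            self.count += 1


# SAX: one forward pass; nothing is kept unless the handler stores it.
counter = TagCounter()
xml.sax.parseString(SAMPLE.encode("utf-8"), counter)
print("SAX saw", counter.count, "movies")

# DOM: the whole tree is in memory, so it can be queried and changed.
dom = minidom.parseString(SAMPLE)
movies = dom.getElementsByTagName("movie")
movies[0].setAttribute("title", "Changed")
print("DOM now holds:", dom.documentElement.toxml())
```

The SAX half only ever sees one event at a time, while the DOM half can go back and edit the first movie after the whole document has been loaded.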

Example

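For all our XML code examples, we will be using a simple XML file, movies.xml, as input -

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>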


Parsing XML with SAX APIs

SAX is a standard interface for event-driven XML parsing. Parsing XML with SAX generally requires you to create your own ContentHandler by subclassing xml.sax.ContentHandler.

Your ContentHandler handles the particular tags and attributes of your flavor(s) of XML. A ContentHandler object provides methods to handle various parsing events. Its owning parser calls the ContentHandler methods as it parses the XML file.

The methods startDocument and endDocument are called at the start and end of the XML file. The method characters(text) is passed the character data of the XML file via the parameter text.

The ContentHandler is called at the start and end of each element. If the parser is not in namespace mode, the methods startElement(tag, attributes) and endElement(tag) are called; otherwise, the corresponding methods startElementNS and endElementNS are called. Here, tag is the element's tag and attributes is an Attributes object.
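To see the order in which these callbacks fire, here is a small sketch of a handler that simply records every event. The EventLogger class and the one-line sample document are assumptions made for this illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each parsing event in the order the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, tag, attributes):
        # attributes behaves like a read-only mapping of name -> value.
        self.events.append(("start", tag, dict(attributes.items())))

    def endElement(self, tag):
        self.events.append(("end", tag))

    def characters(self, text):
        if text.strip():  # skip whitespace-only chunks
            self.events.append(("text", text))


logger = EventLogger()
xml.sax.parseString(b'<movie title="Trigun"><format>DVD</format></movie>', logger)
for event in logger.events:
    print(event)
```

Running it should show startDocument first and endDocument last, with the start, text and end events of each element nested in between.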

Here are some other important functions to understand before proceeding -


The make_parser Method

This function creates and returns a new parser object. The parser object created will be of the first parser type that the system finds.

Syntax

Following is the syntax for using the make_parser method -

xml.sax.make_parser( [parser_list] )

Parameter Details

The details of the parameter are given below -

  • parser_list - An optional list of strings naming modules to try as parsers; each named module must provide a create_parser() function.
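As a rough sketch of how make_parser is typically used, the snippet below creates a parser with the default parser list, attaches a handler, and parses from an in-memory stream. The TitleCollector class and the sample bytes are made up for this example.

```python
import io
import xml.sax
from xml.sax.handler import feature_namespaces


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects the title attribute of each <movie>."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, tag, attributes):
        if tag == "movie":
            self.titles.append(attributes["title"])


# No parser_list given, so the default list of parser modules is searched.
parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, False)

collector = TitleCollector()
parser.setContentHandler(collector)

# parse() accepts a file name or a file-like object;
# a BytesIO stands in for a real file here.
source = io.BytesIO(b'<collection><movie title="Ishtar"/></collection>')
parser.parse(source)
print(collector.titles)
```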

The parse Method

This function creates a SAX parser and uses it to parse a document.

Syntax

Following is the syntax for using the parse method -

xml.sax.parse( xmlfile, contenthandler[, errorhandler])

Parameter Details

The details of the parameters are given below -

  • xmlfile - The name of the XML file to read from, or an open file-like object.
  • contenthandler - This must be a ContentHandler object.
  • errorhandler - If specified, it must be a SAX ErrorHandler object.
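The optional error handler is easiest to understand with a malformed document. The sketch below is only an illustration: RecordingErrorHandler is a hypothetical helper, and the broken XML is invented for the purpose.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler


class RecordingErrorHandler(ErrorHandler):
    """Hypothetical helper: remembers fatal errors, then re-raises them."""

    def __init__(self):
        self.messages = []

    def fatalError(self, exception):
        self.messages.append(exception.getMessage())
        raise exception


errors = RecordingErrorHandler()
# <movie> is never closed, so the document is not well-formed.
broken = io.BytesIO(b"<collection><movie></collection>")

try:
    xml.sax.parse(broken, ContentHandler(), errors)
except xml.sax.SAXParseException as exc:
    print("Parse failed at line", exc.getLineNumber(), ":", exc.getMessage())
```

Re-raising inside fatalError keeps the default behaviour of stopping at the first fatal error, while still giving you a place to log it.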

The parseString Method

There is one more function, which creates a SAX parser and parses the specified XML string.

Syntax

Following is the syntax for using the parseString method -

xml.sax.parseString( xmlstring, contenthandler[, errorhandler])

Parameter Details

The details of the parameters are given below -

  • xmlstring - The XML string to read from.
  • contenthandler - This must be a ContentHandler object.
  • errorhandler - If specified, it must be a SAX ErrorHandler object.
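When the XML is already in memory, parseString saves you from writing it to a file first. Here is a minimal sketch; the DescriptionGrabber class and the sample text are assumptions for this example.

```python
import xml.sax


class DescriptionGrabber(xml.sax.ContentHandler):
    """Hypothetical handler: keeps the text found inside <description>."""

    def __init__(self):
        super().__init__()
        self.in_description = False
        self.chunks = []

    def startElement(self, tag, attributes):
        self.in_description = (tag == "description")

    def endElement(self, tag):
        self.in_description = False

    def characters(self, content):
        if self.in_description:
            self.chunks.append(content)


xml_text = "<movie><description>Vash the Stampede!</description></movie>"
grabber = DescriptionGrabber()
# Encoding to bytes is the conservative choice for parseString.
xml.sax.parseString(xml_text.encode("utf-8"), grabber)
description = "".join(grabber.chunks)
print(description)
```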

Example

Following is an example -

#!/usr/bin/env python3

import xml.sax

class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print("*****Movie*****")
            title = attributes["title"]
            print("Title:", title)

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        self.CurrentData = ""

    # Called when character data is read
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content

if __name__ == "__main__":
    # create an XMLReader
    parser = xml.sax.make_parser()
    # turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)

    # override the default ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)

    parser.parse("movies.xml")

Output

When the above code is executed, it will produce the following result -

*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom
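One caveat about the handler above: a SAX parser is allowed to deliver the text of a single element in several characters() calls, so assigning content directly can lose part of the text. A common workaround, sketched here under that assumption, is to buffer the chunks and join them when the element ends. MovieCollector, FIELDS and the short sample are hypothetical names used only in this sketch.

```python
import xml.sax

FIELDS = {"type", "format", "year", "rating", "stars", "description"}


class MovieCollector(xml.sax.ContentHandler):
    """Builds one dict per <movie>, joining the text chunks of each element."""

    def __init__(self):
        super().__init__()
        self.movies = []
        self._current = None   # dict for the movie being read
        self._buffer = []      # text chunks of the current element

    def startElement(self, tag, attributes):
        if tag == "movie":
            self._current = {"title": attributes.get("title", "")}
        self._buffer = []

    def characters(self, content):
        self._buffer.append(content)

    def endElement(self, tag):
        if tag in FIELDS and self._current is not None:
            self._current[tag] = "".join(self._buffer).strip()
        elif tag == "movie":
            self.movies.append(self._current)
            self._current = None
        self._buffer = []


SAMPLE = b"""<collection shelf="New Arrivals">
  <movie title="Ishtar">
    <type>Comedy</type>
    <format>VHS</format>
    <stars>2</stars>
  </movie>
</collection>"""

collector = MovieCollector()
xml.sax.parseString(SAMPLE, collector)
print(collector.movies)
```

Collecting the values into dictionaries instead of printing them right away also makes the result easier to reuse elsewhere in a program.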

Parsing XML with DOM APIs
The Document Object Model (DOM) is a cross-language API from the W3C for accessing and modifying XML documents.

DOM is very useful for random-access applications. SAX only lets you view one piece of the document at a time: while you are handling one SAX element, you have no access to any other.

Below is the simplest way to quickly load an XML document and create a minidom object using the xml.dom.minidom module. The minidom object provides a simple parser method that quickly creates a DOM tree from the XML file.

The sample calls the parse(file [,parser]) function of the minidom object to parse the XML file designated by file into a DOM tree object.
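Here is a short sketch of what random access means in practice. It uses minidom.parseString on an in-memory sample rather than a file; the sample text and variable names are assumptions for this illustration.

```python
from xml.dom import minidom

# Hypothetical two-movie sample, used only for this sketch.
SAMPLE = """<collection shelf="New Arrivals">
  <movie title="Enemy Behind"><rating>PG</rating></movie>
  <movie title="Transformers"><rating>R</rating></movie>
</collection>"""

dom = minidom.parseString(SAMPLE)
movies = dom.getElementsByTagName("movie")

# Random access: jump to the last movie first, then back to the first.
last_title = movies[-1].getAttribute("title")
first_rating = movies[0].getElementsByTagName("rating")[0].firstChild.data
print(last_title, first_rating)
```

Because the whole tree stays in memory, the elements can be visited in any order and as many times as needed.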

Example
#!/usr/bin/env python3

import xml.dom.minidom

# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all the movies in the collection
movies = collection.getElementsByTagName("movie")

# Print detail of each movie.
for movie in movies:
    print("*****Movie*****")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))

    # movie_type and movie_format avoid shadowing the builtins type and format
    movie_type = movie.getElementsByTagName('type')[0]
    print("Type: %s" % movie_type.childNodes[0].data)
    movie_format = movie.getElementsByTagName('format')[0]
    print("Format: %s" % movie_format.childNodes[0].data)
    rating = movie.getElementsByTagName('rating')[0]
    print("Rating: %s" % rating.childNodes[0].data)
    description = movie.getElementsByTagName('description')[0]
    print("Description: %s" % description.childNodes[0].data)

Output
When the above code is executed, it will produce the following result -

Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Description: Viewable boredom
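Note that the example above indexes [0] on every lookup, which raises an IndexError if a movie lacks that element. One possible way around this is a small helper that falls back to a default value. The child_text function below is a hypothetical helper written for this sketch, not something minidom provides.

```python
from xml.dom import minidom


def child_text(element, tag, default=""):
    """Hypothetical helper: text of the first <tag> descendant, or default."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data


dom = minidom.parseString(
    '<collection><movie title="Ishtar"><format>VHS</format></movie></collection>'
)
movie = dom.getElementsByTagName("movie")[0]
print(child_text(movie, "format"))       # element present
print(child_text(movie, "year", "n/a"))  # element missing, default used
```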

Alright guys! This is where we round up this tutorial post. In my next tutorial, we are going to discuss GUI programming in Python.

Feel free to ask your questions where necessary and I will attend to them as soon as possible. If this tutorial was helpful to you, you can use the share button to share it.

Follow us on our various social media platforms to stay updated with our latest tutorials. You can also subscribe to our newsletter in order to get our tutorials delivered directly to your emails.

Thanks for reading and bye for now.


This post first appeared on Web Design Tutorialz.
