Part 3: Creating a Feed Class and Finishing the Python Program


XML, Minidom and Parsing the String of Data

With the feed data in memory, we still need to do something with it. RSS feeds are XML documents, so we must parse the XML to get at the RSS data. Consider the next two lines of code.

 file_xml = minidom.parseString(file_feed) 
 item_node = file_xml.getElementsByTagName("item") 

The xml.dom.minidom module gives us a simple XML parser that builds a DOM tree. It makes two parsing functions available to the programmer: parse and parseString. The former takes a filename (or an open file object) and parses that file's contents into memory. The latter, as we have used here, is for strings that are already in memory.
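To make the difference between the two functions concrete, here is a minimal sketch. The SAMPLE_FEED string and the temporary file are made up for illustration; they stand in for the real feed data, which in the program arrives in file_feed.

```python
import os
import tempfile
from xml.dom import minidom

# A tiny, made-up RSS document standing in for the downloaded feed.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title></item>
    <item><title>Second post</title></item>
  </channel>
</rss>"""

# parseString: the XML is already in memory as a string.
doc_from_string = minidom.parseString(SAMPLE_FEED)

# parse: the XML lives in a file, so write the sample to disk first.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write(SAMPLE_FEED)
    feed_path = handle.name

try:
    doc_from_file = minidom.parse(feed_path)
finally:
    os.remove(feed_path)  # clean up the temporary file

# Both calls produce a Document object with the same root element.
root_from_string = doc_from_string.documentElement.tagName
root_from_file = doc_from_file.documentElement.tagName
print("%s %s" % (root_from_string, root_from_file))
```

Either way the result is the same kind of Document object, so the rest of the program does not need to know where the XML came from.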

Next, we need to get the items from the feed. Every RSS feed item is enclosed by a matching pair of <item> and </item> tags. Using minidom's method getElementsByTagName, we can access just those nodes of type item, leaving the rest of the document behind. getElementsByTagName returns a list of the nodes; these are here assigned to item_node. The list is actually returned as an object of type NodeList, thus allowing us to use NodeList methods to access its contents.
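As an illustration of working with the returned NodeList, the sketch below counts the items and pulls the title out of each one. The sample feed and the child_text helper are hypothetical, introduced only for this example; they are not part of the tutorial's program.

```python
from xml.dom import minidom

# Made-up feed with two items, standing in for the real file_feed string.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

file_xml = minidom.parseString(SAMPLE_FEED)
item_node = file_xml.getElementsByTagName("item")

# NodeList offers the DOM-style length attribute and item() method,
# and in minidom it can also be indexed and iterated like a list.
item_count = item_node.length
first_item = item_node.item(0)


def child_text(node, tag_name):
    """Return the text of the first <tag_name> element under node, or ''."""
    matches = node.getElementsByTagName(tag_name)
    if not matches or matches[0].firstChild is None:
        return ""
    return matches[0].firstChild.data


# Collect the title of every item; the channel's own title is skipped
# because we only search inside each <item> node.
titles = [child_text(item, "title") for item in item_node]
print(titles)
```

Searching inside each item node, rather than the whole document, is what keeps the channel-level title out of the results.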

If you want to see what the feed looks like, put this line after file_xml is assigned the parsed string:
 print(file_xml.toxml())
But be sure to remove it after you have seen enough; otherwise, the entire feed will be printed on the web page.
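If the single-line output of toxml is hard to read, minidom also provides toprettyxml, which indents the tree. The short sketch below compares the two on a made-up one-item feed; SAMPLE_FEED is an assumption for illustration only.

```python
from xml.dom import minidom

# Made-up, minimal feed used only to compare the two output methods.
SAMPLE_FEED = "<rss><channel><item><title>First post</title></item></channel></rss>"

file_xml = minidom.parseString(SAMPLE_FEED)

compact = file_xml.toxml()                   # everything on one line
pretty = file_xml.toprettyxml(indent="  ")   # one element per line, indented

print(pretty)
```

The same caution applies here: this is debugging output and should be removed before the page is served.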


©2014 About.com. All rights reserved.