
How Python Processes Non-English RSS Feeds



Moving Beyond ASCII with Python

Now that our RSS Reader has error handling, we can move beyond the basics and make the application more robust. One thing every web application must be able to do is handle non-English text. A quick trip outside an English-speaking country shows how much of the world reads and writes in other languages. Almost every language, English included, also has loanwords (words borrowed from other languages) as well as spellings that involve accents.

Currently, the RSS Reader in this series can read only ASCII. The ASCII encoding has 95 printable characters, which correspond approximately to the characters on a United States keyboard, including the US dollar sign ('$'). Any character outside this set requires special handling.
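To make the ASCII limit concrete, here is a minimal sketch, assuming Python 3 (where strings are Unicode by default; the series' own code may target a different version). The helper name is_ascii and the sample strings are illustrative only and are not part of the RSS Reader.

```python
# Printable ASCII runs from code point 32 (space) through 126 (tilde).
printable_ascii = [chr(code) for code in range(32, 127)]
print(len(printable_ascii))  # 95


def is_ascii(text):
    """Return True if every character in text can be encoded as ASCII.

    Hypothetical helper for illustration; not part of the RSS Reader.
    """
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(is_ascii("Price: $5"))     # True: the dollar sign is in ASCII
print(is_ascii("Café del Mar"))  # False: 'é' is outside ASCII
```

A check like this shows how little it takes to leave ASCII: a single accented letter in a feed title is enough.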

On output or on assignment, one must tell Python which encoding to use to express a given string value. To do this, we use the encode method. It is available on any string object and converts the string into the bytes of a named encoding, which determines how the value is expressed within the program's environment and thus to the user. Before talking about encodings, though, let's see what is wrong with leaving the feed handling in ASCII.
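As a rough illustration of the encode method described above, the sketch below again assumes Python 3. The variable names and the sample word are made up for the example; only str.encode and bytes.decode are standard library calls.

```python
word = "café"  # sample value; a real feed title would come from the parser

# The same four characters become different bytes under different encodings.
utf8_bytes = word.encode("utf-8")      # 'é' takes two bytes in UTF-8
latin1_bytes = word.encode("latin-1")  # 'é' takes one byte in Latin-1
print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'

# ASCII has no 'é'. With errors="replace", Python substitutes '?'
# instead of raising UnicodeEncodeError.
ascii_bytes = word.encode("ascii", errors="replace")
print(ascii_bytes)   # b'caf?'

# Decoding with the same encoding recovers the original string.
print(utf8_bytes.decode("utf-8") == word)  # True
```

The lossy ASCII result hints at the problem with keeping the feed handling in ASCII: characters outside the set either raise an error or are silently replaced.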


©2014 About.com. All rights reserved.