It may seem obvious sometimes, but one of the keys to software re-use is abstraction. For this reason, the operating system, and Python's os module along with it, identifies an open file not by its name but by a number called a file descriptor. Rather than process the file as "This is the file named 'simple.txt'", it works with it as "This is file three, the one after file two and before file four." Consequently, after you tell Python to open a file, the lower-level calls deal with it by its number, not by its name.
To illustrate this, we'll save a file to your computer's hard disk and use it to explore Python's file descriptor handling. So press 'Control-s', or whatever it takes to save a file from your web browser, and save this file as 'python-fd.html'. Note the directory into which you are saving it; you will need that information in a moment.
Now, start a Python shell session. Import the os module:
>>> import os
And open the saved file with a file handle of your choice. Here we will use the innovative handle input:
>>> input = open('/absolute/path/to/the/saved/file/python-fd.html', 'r')
Note that we are using the absolute path and are explicitly telling Python to open the file as 'read only'. You need do neither, but both are good practice. If you opened the Python shell from the directory in which the file is located, you can simply use dot-slash shorthand: './python-fd.html'. The initial dot tells Python to look in the present directory for the ensuing file path or name. The default for the open() function is read-only (type 'help(open)' in a Python shell to read more about this).
To find out which descriptor Python assigned to the file, call the file object's fileno() method:
>>> input.fileno()
In my shell session, Python returned '3'. (Descriptors 0, 1, and 2 are normally already taken by standard input, standard output, and standard error.) If you open another file and follow the same procedure without doing anything else in the terminal, you will get '4'. Knowing the file descriptor, we can resolve the first of the aforementioned problems by calling fstat():
>>> os.fstat(3)
(33188, 5225L, 11L, 1, 1000, 1000, 5834L, 1209359646, 1209359726, 1209359726)
This returns, among other things, the file's group ID, user ID, size, inode, and sundry other bits about its status on the system. If you are on a Unix variant, you can also get information about the file system that holds the file by running os.fstatvfs(3).
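The descriptor lookup and the fstat() call can be sketched as a small script. This is a minimal illustration written for Python 3, not the session from the article: the scratch file, its name 'example.txt', and its contents are made up so that the example runs anywhere, and the descriptor number you see will vary.

```python
import os
import tempfile

# Create a small throwaway file (hypothetical name and contents) so the
# example does not depend on the saved article.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'example.txt')
with open(path, 'w') as handle:
    handle.write('hello, descriptors')

f = open(path, 'r')
fd = f.fileno()          # the integer the operating system uses for this file
info = os.fstat(fd)      # status information, with named fields

print(fd)                # a small integer; the exact value varies
print(info.st_size)      # size of the file in bytes
print(info.st_uid, info.st_gid, info.st_ino)   # user ID, group ID, inode

# Clean up the scratch file and directory.
f.close()
os.remove(path)
os.rmdir(tmpdir)
```

Reading the fields by name (st_size, st_uid, and so on) is usually easier than remembering their positions in the tuple.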
Now, let's say you want to read from a certain range of characters within the file. For purposes of an example, let's say you want characters 10-15. You could read in the whole file and read out a certain amount, whittling it down. But this is a hack compared to what os can do for you. Using os.lseek() and os.read(), you can get the information you need with comparatively little overhead - saving resources and increasing speed. The basic syntax for each is as follows:
os.lseek(fd, pos, how)
os.read(fd, n)
In the first call, fd is the file descriptor, pos is the position, and how is an integer of 0, 1, or 2 that indicates how to interpret pos: 0 is relative to the beginning of the file, 1 is relative to the current position within the file, and 2 is relative to the end. In the second call, n is the maximum number of bytes you want to read from the current position within the file.
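To make the three values of how concrete, here is a minimal sketch written for Python 3. The scratch file and its alphabet contents are assumptions made for illustration; os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END are the named equivalents of 0, 1, and 2.

```python
import os
import tempfile

# Hypothetical scratch file holding 26 bytes: the lowercase alphabet.
fd, path = tempfile.mkstemp()
os.write(fd, b'abcdefghijklmnopqrstuvwxyz')

# os.lseek() returns the new offset, measured from the start of the file.
from_start = os.lseek(fd, 5, os.SEEK_SET)     # 5 bytes from the beginning
from_current = os.lseek(fd, 3, os.SEEK_CUR)   # 3 bytes past the current position
from_end = os.lseek(fd, -4, os.SEEK_END)      # 4 bytes before the end

tail = os.read(fd, 2)                         # 2 bytes starting at that offset

print(from_start, from_current, from_end)
print(tail)

os.close(fd)
os.remove(path)
```

Using the named constants rather than bare integers tends to make the intent of each call easier to read.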
To get characters 10-15, we first move to the tenth point from the head of the file:
>>> os.lseek(3, 10, 0)
Then we can read characters 10-15, six in all:
>>> os.read(3, 6)
Depending on the numbers you feed os, you may get any manner of character combinations. As Python sees HTML as simple text, you will get HTML tags and the like. If, however, you copy this article as text and save it, you will get something different.
Keep in mind that we are telling Python to move within the file according to characters (strictly speaking, bytes) - not words. Therefore, it treats spaces in the same way as it treats letters or numbers.
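The same seek-then-read sequence can be tried on a file whose contents are known in advance. This is a sketch written for Python 3 under stated assumptions: the scratch file and its sentence are stand-ins for the saved article, so the bytes returned here are not what the article's own file would produce.

```python
import os
import tempfile

# Hypothetical stand-in for the saved article: a short plain-text file.
fd, path = tempfile.mkstemp()
os.write(fd, b'The quick brown fox jumps over the lazy dog')

os.lseek(fd, 10, os.SEEK_SET)   # skip the first ten bytes (offsets 0-9)
chunk = os.read(fd, 6)          # read six bytes: offsets 10 through 15

print(chunk)                    # a bytes object in Python 3

# Offsets count bytes, so a non-ASCII character may occupy more than one.
e_acute = u'\u00e9'.encode('utf-8')
print(len(e_acute))

os.close(fd)
os.remove(path)
```

Note that the six bytes include the trailing space after the word, which counts just like a letter.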
Armed with this knowledge, you can then use these other file handling calls from the os module:
- close(fd): Closes the file descriptor
- dup(fd): Returns a new descriptor that refers to the same open file as fd. The duplicate consequently shares state, file pointer, and file locks with the original. The lowest available descriptor number is used for the resulting duplicate.
- dup2(oldfd, newfd): Duplicates oldfd onto newfd. If newfd already refers to an open file, that file is closed first - thus freeing up newfd for use.
- fdopen(fd, mode, bufsize): Opens a file object on a given file descriptor in a set mode. The bufsize argument controls how much buffering the file object uses, not how much of the file is visible. Both mode and bufsize are optional and work as they do in the more common open() function.
- fpathconf(fd, name): Returns system configuration information relevant to the open file fd. name is the name of the variable to be retrieved. Which variables are available depends on the operating system; the names Python knows about are listed in the os.pathconf_names dictionary. They usually follow the system header information. POSIX-compliant systems have a bevy of variables which I plan to detail elsewhere on this site. However, for most C-based Python installations, you can look at what is available in the header files <limits.h> or <unistd.h> of the C libraries used by Python. Jython users can look similarly at the Java libraries.
- ftruncate(fd, length): Truncates the file referred to by fd so that it is at most length bytes long. (Unix only)
- fsync(fd): Forces unwritten data on fd to be written to disk. If fd belongs to a buffered object (e.g., a Python file object), call that object's flush() method before making this call.
- tcgetpgrp(fd): Returns the process group associated with the control terminal associated with fd. (Unix only)
- tcsetpgrp(fd, pg): Sets the process group associated with the control terminal associated with fd. (Unix only)
- ttyname(fd): Returns a string specifying the terminal associated with fd. If no such terminal exists, an OSError exception is raised. (Unix only)
- write(fd, str): Writes the string str to the file whose file descriptor is fd. Returns the number of bytes actually written, which may be fewer than requested. Checking this value is particularly useful for detecting a write that was cut short, for example by a network problem or a full disk.
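Several of the calls listed above can be seen working together in one short script. This is a minimal sketch written for Python 3, assuming a scratch file created with tempfile.mkstemp(); the data and the variable names are made up for illustration.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()        # hypothetical scratch file, opened read/write

# os.write() may write fewer bytes than requested, so loop until done.
data = b'0123456789'
written = 0
while written < len(data):
    written += os.write(fd, data[written:])
os.fsync(fd)                         # ask the OS to push the data to disk

# A duplicate descriptor shares the file offset with the original.
copy = os.dup(fd)
os.lseek(fd, 4, os.SEEK_SET)                     # move the original...
shared_offset = os.lseek(copy, 0, os.SEEK_CUR)   # ...and the copy sits there too

os.ftruncate(fd, 6)                  # keep only the first six bytes
size_after = os.fstat(fd).st_size

# Wrap the duplicate in a regular file object; closing it closes `copy`.
with os.fdopen(copy, 'rb') as f:
    f.seek(0)
    contents = f.read()

print(written, shared_offset, size_after, contents)

os.close(fd)
os.remove(path)
```

The loop around os.write() is the usual way to act on its return value: whatever was not written on one call is retried on the next.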