Description

Updating a Random-Access File

Credit: Luther Blissett

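Problem

You need to read a specific binary record at an arbitrary position in a file made up of fixed-length records, change the record's values, and write the updated record back to the file.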

Solution

Read the record (see recipe 4.14), unpack it, perform whatever computations you need for the update, pack the fields back into the record, seek to the start of the record again, and write it back. Phew. Faster to code than to say:

    import struct

    # format_string (e.g. "8l") and record_number are assumed to be defined already
    thefile = open('somebinfile', 'r+b')                 # open for reading and writing, in binary mode
    record_size = struct.calcsize(format_string)         # size in bytes of one record

    thefile.seek(record_size * record_number)            # move to the start of the target record
    buffer = thefile.read(record_size)                   # read the raw record
    fields = list(struct.unpack(format_string, buffer))  # unpack; turn the returned tuple into a list

    # Perform computations, suitably modifying fields, then:

    buffer = struct.pack(format_string, *fields)         # pack the fields back into a record
    thefile.seek(record_size * record_number)            # return to the start of the record
    thefile.write(buffer)                                # write the record back

    thefile.close()
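
The recipe leaves format_string, record_number, and the file itself to the reader. The following is a minimal end-to-end sketch under stated assumptions: the format '<8l', the record number 2, the temporary file, and the sample data are all illustrative choices, not part of the original recipe.

```python
import os
import struct
import tempfile

format_string = '<8l'      # assumed layout: eight little-endian 4-byte signed ints
record_size = struct.calcsize(format_string)
record_number = 2          # hypothetical choice: update the third record

# Build a small sample file holding five records (illustrative data only).
fd, path = tempfile.mkstemp(suffix='.bin')
with os.fdopen(fd, 'wb') as out:
    for i in range(5):
        out.write(struct.pack(format_string, *([i] * 8)))

# The recipe itself: read, unpack, modify, pack, seek back, write.
with open(path, 'r+b') as thefile:
    thefile.seek(record_size * record_number)
    fields = list(struct.unpack(format_string, thefile.read(record_size)))
    fields[0] += 100
    thefile.seek(record_size * record_number)
    thefile.write(struct.pack(format_string, *fields))

# Read everything back to confirm that only the chosen record changed.
with open(path, 'rb') as check:
    data = check.read()
records = [struct.unpack_from(format_string, data, n * record_size)
           for n in range(5)]
os.remove(path)
```

The file keeps its original length, because the record is overwritten in place rather than inserted.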

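Discussion

This approach works only on files (generally binary ones) defined in terms of records that are all the same, fixed size; it doesn't work on normal text files. Furthermore, the size of each record must be that defined by a struct's format string, as shown in the recipe's code.

A typical format string, for example, might be "8l" (that is the lowercase letter "l", not the digit "1"), to specify that each record is made up of eight four-byte integers, each to be interpreted as a signed value and unpacked into a Python int. Note that without a byte-order prefix, "l" takes the size of the platform's C long, which is eight bytes on many 64-bit systems; a prefix such as "<" or "=" selects the standard four-byte size.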

In this case, the fields variable in the recipe would be bound to a list of eight ints.
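
As a quick check of what a format string implies, struct.calcsize reports the record size. The sketch below assumes the "8l" format from the text and adds the '<' prefix as an illustrative way to pin the item size to four bytes; the sample values are made up.

```python
import struct

native_size = struct.calcsize('8l')      # depends on the platform's C long
standard_size = struct.calcsize('<8l')   # standard size: 4 bytes per 'l'

# Pack eight sample integers and unpack them into a list of eight ints.
record = struct.pack('<8l', *range(8))
fields = list(struct.unpack('<8l', record))
```

Here standard_size is 32 on every platform, while native_size may be 32 or 64 depending on the system.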

Note that struct.unpack returns a tuple. Because tuples are immutable, the computation would have to rebind the entire fields variable. A list is not immutable, so each field can be rebound as needed. Thus, for convenience, we explicitly ask for a list when we bind fields.
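
The difference between the tuple and the list can be sketched as follows; the format '<8l' and the values are assumptions chosen for illustration.

```python
import struct

record = struct.pack('<8l', 1, 2, 3, 4, 5, 6, 7, 8)

as_tuple = struct.unpack('<8l', record)
try:
    as_tuple[0] = 99                 # tuples do not support item assignment
    tuple_assignment_failed = False
except TypeError:
    tuple_assignment_failed = True

# Without a list, the whole variable has to be rebound to a new tuple.
rebuilt = (99,) + as_tuple[1:]

# With a list, a single field can be changed in place.
fields = list(as_tuple)
fields[0] = 99
```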

Make sure, however, not to alter the length of the list. In this case, it needs to remain composed of exactly eight integers, or the struct.pack call will raise an exception when we call it with a format_string that is still "8l". Also note that this recipe is not suitable for working with records that are not all of the same, unchanging length.
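
A short sketch of the failure mode, assuming the eight-integer format from the text (written here as '<8l'): once the list has nine items, struct.pack raises struct.error.

```python
import struct

format_string = '<8l'
fields = list(range(8))
fields.append(8)                     # the list now has nine items

try:
    struct.pack(format_string, *fields)
    error = None
except struct.error as exc:          # wrong number of items for the format
    error = exc

fields.pop()                         # back to eight items
packed = struct.pack(format_string, *fields)
```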

To seek back to the start of the record, instead of using the record_size*record_number offset again, you may choose to do a relative seek:

thefile.seek(-record_size, 1)

The second argument to the seek method (1) tells the file object to seek relative to the current position (here, so many bytes back, because we used a negative number as the first argument).

seek's default is to seek to an absolute offset within the file (i.e., from the start of the file). You can also explicitly request this default behavior by calling seek with a second argument of 0.
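
The two kinds of seek can be compared by watching the file position with tell. This is only a sketch: the temporary file, the '<8l' format, and the record number are assumptions, and os.SEEK_CUR and os.SEEK_SET are the named equivalents of the values 1 and 0.

```python
import os
import struct
import tempfile

format_string = '<8l'
record_size = struct.calcsize(format_string)
record_number = 1

with tempfile.TemporaryFile() as thefile:        # opened in 'w+b' mode by default
    for i in range(3):
        thefile.write(struct.pack(format_string, *([i] * 8)))

    thefile.seek(record_size * record_number)    # absolute seek (the default)
    start = thefile.tell()
    thefile.read(record_size)
    after_read = thefile.tell()

    thefile.seek(-record_size, os.SEEK_CUR)      # relative seek, same as seek(-record_size, 1)
    after_relative_seek = thefile.tell()

    thefile.seek(start, os.SEEK_SET)             # explicit form of the default, same as seek(start, 0)
    after_absolute_seek = thefile.tell()
```

Both seeks land on the start of the record that was just read.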

Of course, you don't need to open the file just before you do the first seek or close it right after the write. Once you have a file object that is correctly opened (i.e., for update, and as a binary rather than a text file), you can perform as many updates on the file as you want before closing the file again. These calls are shown here to emphasize the proper technique for opening a file for random-access updates and the importance of closing a file when you are done with it.
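
One way to perform several updates on a single open file is to wrap the steps in a helper. The names update_record and bump_first are hypothetical, as are the '<8l' format and the temporary file; this is a sketch, not the recipe's own code.

```python
import struct
import tempfile

format_string = '<8l'
record_size = struct.calcsize(format_string)

def update_record(thefile, record_number, change):
    """Read one record, apply change(fields) in place, and write it back."""
    thefile.seek(record_size * record_number)
    fields = list(struct.unpack(format_string, thefile.read(record_size)))
    change(fields)
    thefile.seek(-record_size, 1)    # relative seek back to the record start
    thefile.write(struct.pack(format_string, *fields))

def bump_first(fields):
    fields[0] += 1

with tempfile.TemporaryFile() as thefile:
    for i in range(4):
        thefile.write(struct.pack(format_string, *([0] * 8)))
    # Three updates on the same open file object, in arbitrary order.
    for n in (3, 0, 3):
        update_record(thefile, n, bump_first)
    thefile.seek(0)
    first_fields = [struct.unpack(format_string, thefile.read(record_size))[0]
                    for _ in range(4)]
```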

The file needs to be opened for updating (i.e., to allow both reading and writing). That's what the 'r+b' argument to open means: open for reading and writing, but do not implicitly perform any transformations on the file's contents, because the file is a binary one. (The 'b' part is unnecessary but still recommended for clarity on Unix and Unix-like systems; however, it's absolutely crucial on other platforms, such as Macintosh and Windows. In Python 3, the 'b' is required on every platform, because text-mode files read and write str objects rather than the bytes that struct works with.)

If you're creating the binary file from scratch but you still want to be able to reread and update some records without closing and reopening the file, you can use a second argument of 'w+b' instead. However, I have never witnessed this strange combination of requirements; binary files are normally first created (by opening them with 'wb', writing data, and closing the file) and later opened for update with 'r+b'.
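
The open modes mentioned here can be compared side by side. The sketch assumes a scratch file in a temporary directory and the '<8l' format; the file sizes are checked with os.path.getsize.

```python
import os
import struct
import tempfile

format_string = '<8l'
record_size = struct.calcsize(format_string)
path = os.path.join(tempfile.mkdtemp(), 'somebinfile')

# Usual life cycle, step 1: create the file with 'wb', write, and close.
with open(path, 'wb') as out:
    for i in range(3):
        out.write(struct.pack(format_string, *([i] * 8)))

# Step 2: reopen with 'r+b'; existing contents are kept and can be rewritten.
with open(path, 'r+b') as thefile:
    thefile.seek(record_size)
    thefile.write(struct.pack(format_string, *([7] * 8)))
size_after_update = os.path.getsize(path)

# 'w+b' also allows reading and writing, but truncates the file on open.
with open(path, 'w+b') as thefile:
    size_after_w_plus = os.path.getsize(path)
    thefile.write(struct.pack(format_string, *([5] * 8)))
    thefile.seek(0)
    reread = struct.unpack(format_string, thefile.read(record_size))

os.remove(path)
os.rmdir(os.path.dirname(path))
```

Updating through 'r+b' leaves the file size unchanged, whereas opening with 'w+b' discards whatever the file held before.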

See Also

The sections of the Library Reference on file objects and the struct module; Perl Cookbook Recipe 8.13.

Revision 2 as of 2004-09-25 00:18:35 (revision 1 as of 2004-09-24 23:03:18)