'''Python Standard Library''' ''翻译: Python 江湖群'' 2008-03-28 13:11:52 <> ---- [index.html 返回首页] ---- = 1. 数据表示 = "PALO ALTO, Calif. - Intel says its Pentium Pro and new Pentium II chips have a flaw that can cause computers to sometimes make mistakes but said the problems could be fixed easily with rewritten software." - Reuters telegram ---- == 1.1. 概览 == 本章描述了一些用于在 Python 对象和其他数据表示类型间相互转换的模块. 这些模块通常用于读写特定的文件格式或是储存/取出 Python 变量. === 1.1.1. 二进制数据 === Python 提供了一些用于二进制数据解码/编码的模块. {{{struct}}} 模块用于在二进制数据结构(例如 C 中的 struct )和 Python 元组间转换. {{{array}}} 模块将二进制数据阵列 ( C arrays )封装为 Python 序列对象. === 1.1.2. 自描述格式 === {{{marshal}}} 和 {{{pickle}}} 模块用于在不同的 Python 程序间共享/传递数据. {{{marshal}}} 模块使用了简单的自描述格式( Self-Describing Formats ), 它支持大多的内建数据类型, 包括 code 对象. Python 自身也使用了这个格式来储存编译后代码( .pyc 文件). {{{pickle}}} 模块提供了更复杂的格式, 它支持用户定义的类, 自引用数据结构等等. {{{pickle}}} 是用 Python 写的, 相对来说速度较慢, 不过还有一个 {{{cPickle}}} 模块, 使用 C 实现了相同的功能, 速度和 {{{marshal}}} 不相上下. === 1.1.3. 输出格式 === 一些模块提供了增强的格式化输出, 用来补充内建的 {{{repr}}} 函数和 {{{%}}} 字符串格式化操作符. {{{pprint}}} 模块几乎可以将任何 Python 数据结构很好地打印出来(提高可读性). {{{repr}}} 模块可以用来替换内建同名函数. 该模块与内建函数不同的是它限制了很多输出形式: 他只会输出字符串的前 30 个字符, 它只打印嵌套数据结构的几个等级, 等等. === 1.1.4. 编码二进制数据 === Python 支持大部分常见二进制编码, 例如 {{{base64}}} , {{{binhex}}} (一种 Macintosh 格式) , {{{quoted printable}}} , 以及 {{{uu}}} 编码. ---- == 1.2. array 模块 == {{{array}}} 模块实现了一个有效的阵列储存类型. 阵列和列表类似, 但其中所有的项目必须为相同的类型. 该类型在阵列创建时指定. Examples 4-1 到 4-5 都是很简单的范例. [[#eg-4-1|Example 4-1]] 创建了一个 ''array'' 对象, 然后使用 {{{tostring}}} 方法将内部缓冲区( internal buffer )复制到字符串. ==== 1.2.0.1. Example 4-1. 使用 array 模块将数列转换为字符串 ==== {{{ File: array-example-1.py import array a = array.array("B", range(16)) # unsigned char b = array.array("h", range(16)) # signed short print a print repr(a.tostring()) print b print repr(b.tostring()) *B*array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]) '\000\001\002\003\004\005\006\007\010\011\012\013\014\015\016\017' array('h', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]) '\000\000\001\000\002\000\003\000\004\000\005\000\006\000\007\000 \010\000\011\000\012\000\013\000\014\000\015\000\016\000\017\000'*b* }}} ''array'' 对象可以作为一个普通列表对待, 如 [[#eg-4-2|Example 4-2]] 所示. 不过, 你不能连接两个不同类型的阵列. ==== 1.2.0.2. Example 4-2. 作为普通序列操作阵列 ==== {{{ File: array-example-2.py import array a = array.array("B", [1, 2, 3]) a.append(4) a = a + a a = a[2:-2] print a print repr(a.tostring()) for i in a: print i, *B*array('B', [3, 4, 1, 2]) '\003\004\001\002' 3 4 1 2*b* }}} 该模块还提供了用于转换原始二进制数据到整数序列(或浮点数数列, 具体情况决定)的方法, 如 [[#eg-4-3|Example 4-3]] 所示. ==== 1.2.0.3. Example 4-3. 使用阵列将字符串转换为整数列表 ==== {{{ File: array-example-3.py import array a = array.array("i", "fish license") # signed integer print a print repr(a.tostring()) print a.tolist() *B*array('i', [1752394086, 1667853344, 1702063717]) 'fish license' [1752394086, 1667853344, 1702063717]*b* }}} 最后, [[#eg-4-4|Example 4-4]] 展示了如何使用该模块判断当前平台的字节序( endianess ) . ==== 1.2.0.4. Example 4-4. 使用 array 模块判断平台字节序 ==== {{{ File: array-example-4.py import array def little_endian(): return ord(array.array("i",[1]).tostring()[0]) if little_endian(): print "little-endian platform (intel, alpha)" else: print "big-endian platform (motorola, sparc)" *B*big-endian platform (motorola, sparc)*b* }}} Python 2.0 以及以后版本提供了 {{{sys.byteorder}}} 属性, 可以更简单地判断字节序 (属性值为 "{{{little}}}" 或 "{{{big}}}" ), 如 Example 4-5 所示. ==== 1.2.0.5. Example 4-5. 使用 sys.byteorder 属性判断平台字节序( Python 2.0 及以后) ==== {{{ File: sys-byteorder-example-1.py import sys # 2.0 and later if sys.byteorder == "little": print "little-endian platform (intel, alpha)" else: print "big-endian platform (motorola, sparc)" *B*big-endian platform (motorola, sparc)*b* }}} ---- == 1.3. struct 模块 == {{{struct}}} 模块用于转换二进制字符串和 Python 元组. {{{pack}}} 函数接受格式字符串以及额外参数, 根据指定格式将额外参数转换为二进制字符串. {{{upack}}} 函数接受一个字符串作为参数, 返回一个元组. 如 [[#eg-4-6|Example 4-6]] 所示. ==== 1.3.0.1. Example 4-6. 使用 struct 模块 ==== {{{ File: struct-example-1.py import struct # native byteorder buffer = struct.pack("ihb", 1, 2, 3) print repr(buffer) print struct.unpack("ihb", buffer) # data from a sequence, network byteorder data = [1, 2, 3] buffer = apply(struct.pack, ("!ihb",) + tuple(data)) print repr(buffer) print struct.unpack("!ihb", buffer) # in 2.0, the apply statement can also be written as: # buffer = struct.pack("!ihb", *data) *B*'\001\000\000\000\002\000\003' (1, 2, 3) '\000\000\000\001\000\002\003' (1, 2, 3)*b* }}} ---- == 1.4. xdrlib 模块 == {{{xdrlib}}} 模块用于在 Python 数据类型和 Sun 的 external data representation (XDR) 间相互转化, 如 [[#eg-4-7|Example 4-7]] 所示. ==== 1.4.0.1. Example 4-7. 使用 xdrlib 模块 ==== {{{ File: xdrlib-example-1.py import xdrlib # # create a packer and add some data to it p = xdrlib.Packer() p.pack_uint(1) p.pack_string("spam") data = p.get_buffer() print "packed:", repr(data) # # create an unpacker and use it to decode the data u = xdrlib.Unpacker(data) print "unpacked:", u.unpack_uint(), repr(u.unpack_string()) u.done() *B*packed: '\000\000\000\001\000\000\000\004spam' unpacked: 1 'spam'*b* }}} Sun 在 remote procedure call (RPC) 协议中使用了 XDR 格式. [[#eg-4-8|Example 4-8]] 虽然不完整, 但它展示了如何建立一个 RPC 请求包. ==== 1.4.0.2. Example 4-8. 使用 xdrlib 模块发送 RPC 调用包 ==== {{{ File: xdrlib-example-2.py import xdrlib # some constants (see the RPC specs for details) RPC_CALL = 1 RPC_VERSION = 2 MY_PROGRAM_ID = 1234 # assigned by Sun MY_VERSION_ID = 1000 MY_TIME_PROCEDURE_ID = 9999 AUTH_NULL = 0 transaction = 1 p = xdrlib.Packer() # send a Sun RPC call package p.pack_uint(transaction) p.pack_enum(RPC_CALL) p.pack_uint(RPC_VERSION) p.pack_uint(MY_PROGRAM_ID) p.pack_uint(MY_VERSION_ID) p.pack_uint(MY_TIME_PROCEDURE_ID) p.pack_enum(AUTH_NULL) p.pack_uint(0) p.pack_enum(AUTH_NULL) p.pack_uint(0) print repr(p.get_buffer()) *B*'\000\000\000\001\000\000\000\001\000\000\000\002\000\000\004\322 \000\000\003\350\000\000\'\017\000\000\000\000\000\000\000\000\000 \000\000\000\000\000\000\000'*b* }}} ---- == 1.5. marshal 模块 == {{{marshal}}} 模块可以把不连续的数据组合起来 - 与字符串相互转化, 这样它们就可以写入文件或是在网络中传输. 如 [[#eg-4-9|Example 4-9]] 所示. {{{marshal}}} 模块使用了简单的自描述格式. 对于每个数据项目, 格式化后的字符串都包含一个类型代码, 然后是一个或多个类型标识区域. 整数使用小字节序( little-endian order )储存, 字符串储存时和它自身内容长度相同(可能包含空字节), 元组由组成它的对象组合表示. ==== 1.5.0.1. Example 4-9. 使用 marshal 模块组合不连续数据 ==== {{{ File: marshal-example-1.py import marshal value = ( "this is a string", [1, 2, 3, 4], ("more tuples", 1.0, 2.3, 4.5), "this is yet another string" ) data = marshal.dumps(value) # intermediate format print type(data), len(data) print "-"*50 print repr(data) print "-"*50 print marshal.loads(data) *B* 118 -------------------------------------------------- '(\004\000\000\000s\020\000\000\000this is a string [\004\000\000\000i\001\000\000\000i\002\000\000\000 i\003\000\000\000i\004\000\000\000(\004\000\000\000 s\013\000\000\000more tuplesf\0031.0f\0032.3f\0034. 5s\032\000\000\000this is yet another string' -------------------------------------------------- ('this is a string', [1, 2, 3, 4], ('more tuples', 1.0, 2.3, 4.5), 'this is yet another string')*b* }}} {{{marshal}}} 模块还可以处理 code 对象(它用于储存预编译的 Python 模块). 如 [[#eg-4-10|Example 4-10]] 所示. ==== 1.5.0.2. Example 4-10. 使用 marshal 模块处理代码 ==== {{{ File: marshal-example-2.py import marshal script = """ print 'hello' """ code = compile(script, "