11. Brief Tour of the Standard Library – Part II

This second tour covers more advanced modules that support professional programming needs. These modules rarely occur in small scripts.

11.1. Output Formatting

The reprlib module provides a version of repr() customized for abbreviated displays of large or deeply nested containers.

>>> import reprlib
>>> reprlib.repr(set('supercalifragilisticexpialidocious'))
"{'a', 'c', 'd', 'e', 'f', 'g', ...}"

The pprint module offers more sophisticated control over printing both built-in and user-defined objects in a way that is readable by the interpreter. When the result is longer than one line, the "pretty printer" adds line breaks and indentation to reveal the data structure more clearly.

>>> import pprint
>>> t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta',
...     'yellow'], 'blue']]]
...
>>> pprint.pprint(t, width=30)
[[[['black', 'cyan'],
   'white',
   ['green', 'red']],
  [['magenta', 'yellow'],
   'blue']]]

The textwrap module formats paragraphs of text to fit a given screen width.

>>> import textwrap
>>> doc = """The wrap() method is just like fill() except that it returns
... a list of strings instead of one big string with newlines to separate
... the wrapped lines."""
...
>>> print(textwrap.fill(doc, width=40))
The wrap() method is just like fill()
except that it returns a list of strings
instead of one big string with newlines
to separate the wrapped lines.
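
The relationship between fill() and wrap() described in that sample text can be checked directly. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
import textwrap

doc = ("The wrap() method is just like fill() except that it returns "
       "a list of strings instead of one big string with newlines to separate "
       "the wrapped lines.")

lines = textwrap.wrap(doc, width=40)     # a list of lines, without newlines
joined = textwrap.fill(doc, width=40)    # one string with embedded newlines

print(lines[0])
print(joined == "\n".join(lines))
```

Use wrap() when the individual lines need further processing, such as adding a prefix to each one, and fill() when the text is printed as is.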

The locale module accesses a database of culture-specific data formats. The grouping argument of locale's format_string() function provides a direct way of formatting numbers with group separators. (The older locale.format() function is no longer available in recent Python versions.)

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'English_United States.1252')
'English_United States.1252'
>>> conv = locale.localeconv()          # get a mapping of conventions
>>> x = 1234567.8
>>> locale.format_string("%d", x, grouping=True)
'1,234,567'
>>> locale.format_string("%s%.*f", (conv['currency_symbol'],
...                      conv['frac_digits'], x), grouping=True)
'$1,234,567.80'

11.2. Templating

The string module includes a versatile Template class with a simplified syntax suitable for editing by end-users. This allows users to customize their applications without having to alter the application.

The format uses placeholder names formed by $ with valid Python identifiers (alphanumeric characters and underscores). Surrounding the placeholder with braces allows it to be followed by more alphanumeric letters with no intervening spaces. Writing $$ creates a single escaped $.

>>> from string import Template
>>> t = Template('${village}folk send $$10 to $cause.')
>>> t.substitute(village='Nottingham', cause='the ditch fund')
'Nottinghamfolk send $10 to the ditch fund.'

The substitute() method raises a KeyError when a placeholder is not supplied in a dictionary or a keyword argument. For mail-merge style applications, user-supplied data may be incomplete and the safe_substitute() method may be more appropriate; it leaves placeholders unchanged if data is missing.

>>> t = Template('Return the $item to $owner.')
>>> d = dict(item='unladen swallow')
>>> t.substitute(d)
Traceback (most recent call last):
  . . .
KeyError: 'owner'
>>> t.safe_substitute(d)
'Return the unladen swallow to $owner.'

Template subclasses can specify a custom delimiter. For example, a batch renaming utility for a photo browser may elect to use percent signs for placeholders such as the current date, image sequence number, or file format.

>>> import time, os.path
>>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
>>> class BatchRename(Template):
...     delimiter = '%'
...
>>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format):  ')
Enter rename style (%d-date %n-seqnum %f-format):  Ashley_%n%f

>>> t = BatchRename(fmt)
>>> date = time.strftime('%d%b%y')
>>> for i, filename in enumerate(photofiles):
...     base, ext = os.path.splitext(filename)
...     newname = t.substitute(d=date, n=i, f=ext)
...     print('{0} --> {1}'.format(filename, newname))
...
img_1074.jpg --> Ashley_0.jpg
img_1076.jpg --> Ashley_1.jpg
img_1077.jpg --> Ashley_2.jpg

Another application for templating is separating program logic from the details of multiple output formats. This makes it possible to substitute custom templates for XML files, plain text reports, and HTML web reports.
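
As a rough sketch of that separation, the same data can be rendered through interchangeable templates. The template strings, the TEMPLATES mapping, and the render() helper below are hypothetical; in practice the templates might be loaded from user-editable files.

```python
from string import Template

# Hypothetical output templates, one per format.
TEMPLATES = {
    'text': Template('$name: $total items'),
    'html': Template('<p><b>$name</b>: $total items</p>'),
    'xml': Template('<report name="$name" total="$total"/>'),
}

def render(fmt, **data):
    """Fill in the template selected by fmt; the caller never sees the markup."""
    return TEMPLATES[fmt].substitute(data)

for fmt in TEMPLATES:
    print(render(fmt, name='inventory', total=42))
```

The program logic only produces the name and total values; adding another output format means adding a template, not changing code.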

11.3. Working with Binary Data Record Layouts

The struct module provides pack() and unpack() functions for working with variable length binary record formats. The following example shows how to loop through header information in a ZIP file without using the zipfile module. Pack codes "H" and "I" represent two and four byte unsigned numbers respectively. The "<" indicates that they are standard size and in little-endian byte order.

import struct

with open('myfile.zip', 'rb') as f:     # close the file once it has been read
    data = f.read()
start = 0
for i in range(3):                      # show the first 3 file headers
    start += 14
    fields = struct.unpack('<IIIHH', data[start:start+16])
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

    start += 16
    filename = data[start:start+filenamesize]
    start += filenamesize
    extra = data[start:start+extra_size]
    print(filename, hex(crc32), comp_size, uncomp_size)

    start += extra_size + comp_size     # skip to the next header
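
To see what the '<IIIHH' format string does without needing a ZIP file, a record can be packed and unpacked in memory. This is a small sketch; the field values are made up for illustration.

```python
import struct

# '<' = little-endian with standard sizes;
# 'I' = 4-byte unsigned integer, 'H' = 2-byte unsigned integer
fmt = '<IIIHH'
record = struct.pack(fmt, 0xDEADBEEF, 1024, 4096, 8, 0)   # made-up values

print(struct.calcsize(fmt))          # 3*4 + 2*2 bytes
print(record[:4])                    # least significant byte comes first
print(struct.unpack(fmt, record))
```

The 16 bytes reported by calcsize() match the 16-byte slice passed to unpack() in the ZIP example.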

11.4. Multi-threading

Threading is a technique for decoupling tasks which are not sequentially dependent. Threads can be used to improve the responsiveness of applications that accept user input while other tasks run in the background. A related use case is running I/O in parallel with computations in another thread.

The following code shows how the high-level threading module can run tasks in the background while the main program continues to run.

import threading, zipfile

class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile
    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print('Finished background zip of:', self.infile)

background = AsyncZip('mydata.txt', 'myarchive.zip')
background.start()
print('The main program continues to run in foreground.')

background.join()    # Wait for the background task to finish
print('Main program waited until background was done.')

The principal challenge of multi-threaded applications is coordinating threads that share data or other resources. To that end, the threading module provides a number of synchronization primitives including locks, events, condition variables, and semaphores.

While those tools are powerful, minor design errors can result in problems that are difficult to reproduce. So, the preferred approach to task coordination is to concentrate all access to a resource in a single thread and then use the queue module to feed that thread with requests from other threads. Applications using Queue objects for inter-thread communication and coordination are easier to design, more readable, and more reliable.
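
The queue-based approach might look like the following minimal sketch. The worker() function, the None sentinel, and the squaring "work" are assumptions made for illustration, not part of the queue module.

```python
import queue
import threading

requests = queue.Queue()
results = []                  # touched only by the worker thread while it runs

def worker():
    """Handle requests one at a time until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        results.append(item * item)

t = threading.Thread(target=worker)
t.start()

for n in range(5):            # other threads would call requests.put() the same way
    requests.put(n)
requests.put(None)            # ask the worker to stop
t.join()

print(results)
```

Because only the worker touches results while it is running, no explicit lock is needed; the Queue object handles the synchronization.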

11.5. Logging

The logging module offers a full-featured and flexible logging system. At its simplest, log messages are sent to a file or to sys.stderr.

import logging
logging.debug('Debugging information')
logging.info('Informational message')
logging.warning('Warning:config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')

This produces the following output:

WARNING:root:Warning:config file server.conf not found
ERROR:root:Error occurred
CRITICAL:root:Critical error -- shutting down

By default, informational and debugging messages are suppressed and the output is sent to standard error. Other output options include routing messages through email, datagrams, sockets, or to an HTTP Server. New filters can select different routing based on message priority: DEBUG, INFO, WARNING, ERROR, and CRITICAL.

The logging system can be configured directly from Python or can be loaded from a user-editable configuration file for customized logging without altering the application.
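
Configuring logging directly from Python could look like the sketch below. The logger name 'demo' and the in-memory stream are assumptions chosen so the example is self-contained; a real application would more likely use a file or sys.stderr.

```python
import io
import logging

stream = io.StringIO()                    # stands in for a file or sys.stderr
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

log = logging.getLogger('demo')           # hypothetical logger name
log.setLevel(logging.INFO)                # let INFO and above through
log.addHandler(handler)
log.propagate = False                     # keep this example's output in one place

log.debug('not shown')                    # below the configured level
log.info('config file %s loaded', 'server.conf')
log.warning('disk space low')

print(stream.getvalue())
```

Lowering the level to logging.DEBUG would let the first message through as well.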

11.6. Weak References

Python does automatic memory management (reference counting for most objects and garbage collection to eliminate cycles). The memory is freed shortly after the last reference to it has been eliminated.

This approach works fine for most applications but occasionally there is a need to track objects only as long as they are being used by something else. Unfortunately, just tracking them creates a reference that makes them permanent. The weakref module provides tools for tracking objects without creating a reference. When the object is no longer needed, it is automatically removed from a weakref table and a callback is triggered for weakref objects. Typical applications include caching objects that are expensive to create.

>>> import weakref, gc
>>> class A:
...     def __init__(self, value):
...         self.value = value
...     def __repr__(self):
...         return str(self.value)
...
>>> a = A(10)                   # create a reference
>>> d = weakref.WeakValueDictionary()
>>> d['primary'] = a            # does not create a reference
>>> d['primary']                # fetch the object if it is still alive
10
>>> del a                       # remove the one reference
>>> gc.collect()                # run garbage collection right away
0
>>> d['primary']                # entry was automatically removed
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    d['primary']                # entry was automatically removed
  File "C:/python31/lib/weakref.py", line 46, in __getitem__
    o = self.data[key]()
KeyError: 'primary'
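
The callback mentioned above can be observed with weakref.ref(), which accepts an optional callable that is invoked when the referent is about to be finalized. This is a small sketch; the Resource class and the events list are placeholders for illustration.

```python
import gc
import weakref

class Resource:
    """Placeholder for an object that is expensive to create."""

events = []

obj = Resource()
ref = weakref.ref(obj, lambda r: events.append('collected'))

print(ref() is obj)      # calling the weak reference returns the live object
del obj                  # drop the only strong reference
gc.collect()             # not required on CPython, but harmless elsewhere
print(ref(), events)     # the reference is now dead and the callback has run
```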

11.7. Tools for Working with Lists

Many data structure needs can be met with the built-in list type. However, sometimes there is a need for alternative implementations with different performance trade-offs.

The array module provides an array() object that is like a list that stores only homogeneous data and stores it more compactly. The following example shows an array of numbers stored as two-byte unsigned binary numbers (typecode "H") rather than the usual 16 bytes per entry for regular lists of Python int objects.

>>> from array import array
>>> a = array('H', [4000, 10, 700, 22222])
>>> sum(a)
26932
>>> a[1:3]
array('H', [10, 700])

The collections module provides a deque() object that is like a list with faster appends and pops from the left side but slower lookups in the middle. These objects are well suited for implementing queues and breadth-first tree searches.

>>> from collections import deque
>>> d = deque(["task1", "task2", "task3"])
>>> d.append("task4")
>>> print("Handling", d.popleft())
Handling task1

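def breadth_first_search(starting_node):
    # gen_moves() and is_goal() are supplied by the application
    unsearched = deque([starting_node])
    while unsearched:                   # keep going until no nodes are left
        node = unsearched.popleft()
        for m in gen_moves(node):
            if is_goal(m):
                return m
            unsearched.append(m)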

In addition to alternative list implementations, the library also offers other tools such as the bisect module with functions for manipulating sorted lists.

>>> import bisect
>>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
>>> bisect.insort(scores, (300, 'ruby'))
>>> scores
[(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]

The heapq module provides functions for implementing heaps based on regular lists. The lowest-valued entry is always kept at position zero. This is useful for applications which repeatedly access the smallest element but do not want to run a full list sort.

>>> from heapq import heapify, heappop, heappush
>>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
>>> heapify(data)                      # rearrange the list into heap order
>>> heappush(data, -5)                 # add a new entry
>>> [heappop(data) for i in range(3)]  # fetch the three smallest entries
[-5, 0, 1]
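
One common use of this property is a simple priority queue. The sketch below assumes tasks are stored as (priority, name) tuples, with lower numbers meaning more urgent; the task names are made up.

```python
import heapq

tasks = []                                   # heap of (priority, name) tuples
heapq.heappush(tasks, (3, 'write report'))
heapq.heappush(tasks, (1, 'fix outage'))
heapq.heappush(tasks, (2, 'review patch'))

first = tasks[0]                             # smallest entry sits at index 0
print(first)

order = []
while tasks:
    priority, name = heapq.heappop(tasks)    # always removes the smallest entry
    order.append(name)
print(order)
```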

11.8. Decimal Floating Point Arithmetic

The decimal module offers a Decimal datatype for decimal floating point arithmetic. Compared to the built-in float implementation of binary floating point, the new class is especially helpful for financial applications and other uses which require exact decimal representation, control over precision, control over rounding to meet legal or regulatory requirements, tracking of significant decimal places, or for applications where the user expects the results to match calculations done by hand.

For example, calculating a 5% tax on a 70 cent phone charge gives different results in decimal floating point and binary floating point. The difference becomes significant if the results are rounded to the nearest cent.

>>> from decimal import *
>>> Decimal('0.70') * Decimal('1.05')
Decimal('0.7350')
>>> round(Decimal('0.70') * Decimal('1.05'), 2)
Decimal('0.74')
>>> round(.70 * 1.05, 2)
0.73

The Decimal result keeps a trailing zero, automatically inferring four place significance from multiplicands with two place significance. Decimal reproduces mathematics as done by hand and avoids issues that can arise when binary floating point cannot exactly represent decimal quantities.
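
Where a specific rounding rule is required, the quantize() method accepts an explicit rounding mode. The following sketch reuses the tax calculation; the choice of modes and the 0.725 value are only examples.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

cent = Decimal('0.01')
amount = Decimal('0.70') * Decimal('1.05')            # Decimal('0.7350')

half_up = amount.quantize(cent, rounding=ROUND_HALF_UP)

# The two modes differ when the digit before the tie is even.
tie = Decimal('0.725')
tie_up = tie.quantize(cent, rounding=ROUND_HALF_UP)
tie_even = tie.quantize(cent, rounding=ROUND_HALF_EVEN)

print(half_up, tie_up, tie_even)
```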

Exact representation enables the Decimal class to perform modulo calculations and equality tests that are unsuitable for binary floating point.

>>> Decimal('1.00') % Decimal('.10')
Decimal('0.00')
>>> 1.00 % 0.10
0.09999999999999995

>>> sum([Decimal('0.1')]*10) == Decimal('1.0')
True
>>> sum([0.1]*10) == 1.0
False

The decimal module provides arithmetic with as much precision as needed.

>>> getcontext().prec = 36
>>> Decimal(1) / Decimal(7)
Decimal('0.142857142857142857142857142857142857')