Differences between revisions 1 and 4 (spanning 3 versions)
Revision 1 as of 2007-10-14 13:15:38
Size: 8464
Editor: lwl
Comment:
Revision 4 as of 2008-01-27 10:47:02
Size: 129
Editor: lwl
Comment:
[[TableOfContents]]

== CHAPTER III -- REGULAR EXPRESSIONS ==

Regular expressions enable extremely valuable text processing
techniques, but ones that warrant careful explanation. Python's
[re] module, in particular, adds numerous enhancements to basic
regular expressions (such as named backreferences, lookahead
assertions, backreference skipping, and non-greedy quantifiers).
A solid introduction to the subtleties of regular expressions is
valuable to programmers engaged in text processing tasks.

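As a rough illustration of three of the enhancements named above,
the following minimal sketch uses only the standard [re] module.
The sample strings and variable names are invented for this
example.

```python
import re

# Named group plus a named backreference: find a word that is
# immediately repeated.
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")
match = doubled.search("Paris in the the spring")
repeated = match.group("word")

# Lookahead assertion: match 'Python' only where ' 3' follows,
# without consuming the ' 3' itself.
lookahead = re.findall(r"Python(?= 3)", "Python 2, Python 3")

# Greedy versus non-greedy quantifiers on the same text.
text = "<b>bold</b> and <i>italic</i>"
greedy = re.findall(r"<.*>", text)       # one match spanning the whole string
non_greedy = re.findall(r"<.*?>", text)  # one match per tag

print(repeated, lookahead, greedy, non_greedy)
```

The greedy pattern swallows everything from the first '<' to the
last '>', while the non-greedy form stops at the earliest possible
'>'.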

The opening section of this chapter (Section 0) contains a
tutorial on regular expressions that lets a reader unfamiliar
with them move quickly from simple to complex elements of regular
expression syntax. The tutorial is aimed primarily at beginners,
but programmers familiar with regular expressions in other
programming tools can also benefit from a quick read, since it
explains the particular regular expression dialect used in
Python.


It is important to note up front that regular expressions, while
very powerful, also have limitations. In brief, regular
expressions cannot match patterns that nest to arbitrary depths.
If that statement does not make sense, read Chapter 4, which
discusses parsers; to a large extent, parsing exists to address
the limitations of regular expressions. In general, if you have
doubts about whether a regular expression is sufficient for your
task, try to understand the examples in Chapter 4, particularly
the discussion of how you might spell a floating point number.

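The nesting limitation can be made concrete with a small sketch.
The pattern below is a hypothetical one written for this example:
it accepts parentheses nested at most two levels deep, and each
additional level would need the pattern to be extended by hand. A
plain depth counter, by contrast, handles any depth. Both function
names are illustrative only.

```python
import re

# Hypothetical pattern: balanced parentheses, but only up to two levels.
TWO_LEVELS = re.compile(r"(?:[^()]|\((?:[^()]|\([^()]*\))*\))*")

def regex_balanced(text):
    """Return True if the fixed-depth pattern matches the whole string."""
    return TWO_LEVELS.fullmatch(text) is not None

def counter_balanced(text):
    """Return True if parentheses balance at any nesting depth."""
    depth = 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:      # a close with no matching open
                return False
    return depth == 0

for sample in ["(a(b)c)", "((()))", "(()"]:
    print(sample, regex_balanced(sample), counter_balanced(sample))
```

The string "((()))" is balanced, yet the fixed-depth pattern
rejects it because it nests three levels deep; the counter accepts
it.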

Section 3.1 examines a number of text processing problems that
are solved most naturally using regular expressions. As in other
chapters, the solutions presented can generally be adopted
directly as small utilities for performing tasks. However, as
elsewhere, the larger goal in presenting problems and solutions
is to convey a style of thinking about a wider class of problems
than those whose solutions are presented directly in this book.
Readers who are interested in a range of ready-made utilities and
modules will probably want to check additional resources on the
Web, such as the Vaults of Parnassus
<http://www.vex.net/parnassus/> and the Python Cookbook
<http://aspn.activestate.com/ASPN/Python/Cookbook/>.



Section 3.2 is a "reference with commentary" on the Python
standard library modules for doing regular expression tasks.
Several utility modules and backward-compatibility regular
expression engines are available, but for most readers, the only
important module will be [re] itself. The discussions
interspersed with each module try to give some guidance on why
you would want to use a given module or function, and the
reference documentation tries to contain more examples of actual
typical usage than a plain reference does. In many cases, the
examples and discussion of individual functions address common
and productive design patterns in Python. The cross-references
are intended to contextualize a given function (or other thing)
in terms of related ones (and to help a reader decide which is
right for her). The actual listings of functions, constants,
classes, and the like are in alphabetical order within each
category.
  
  
=== SECTION 0 -- A Regular Expression Tutorial ===
{{{
    Some people, when confronted with a problem, think "I know,
    I'll use regular expressions." Now they have two problems.
     -- Jamie Zawinski, '<alt.religion.emacs>' (08/12/1997)
}}}
  
==== TOPIC -- Just What is a Regular Expression, Anyway? ====
  
Many readers will have some background with regular expressions,
but some will not have any. Those with experience using regular
expressions in other languages (or in Python) can probably skip
this tutorial section. But readers new to regular expressions
(affectionately called 'regexes' by users) should read this
section; even some with experience can benefit from a refresher.

A regular expression is a compact way of describing complex
patterns in texts. You can use regular expressions to search for
patterns and, once a pattern is found, to modify the matched text
in complex ways. They can also be used to launch programmatic
actions that depend on patterns.

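The three uses just described (search, modify, and act on a
pattern) might look like the following minimal sketch. The log
line, the 90 percent threshold, and the classify() helper are all
made up for illustration.

```python
import re

log_line = "2007-10-14 13:15:38 ERROR disk usage at 91%"

# Search: locate an ISO-style date and pull out its parts.
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
found = date_pattern.search(log_line)
year, month, day = found.groups()

# Modify: rewrite the matched date as DD/MM/YYYY using backreferences.
rewritten = date_pattern.sub(r"\3/\2/\1", log_line)

# Act: choose a program action based on what the pattern captured.
def classify(line):
    usage = re.search(r"(\d+)%", line)
    if usage and int(usage.group(1)) > 90:
        return "alert"
    return "ok"

print(year, month, day)
print(rewritten)
print(classify(log_line))
```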
Jamie Zawinski's tongue-in-cheek comment in the epigraph is worth
thinking about. Regular expressions are amazingly powerful and
deeply expressive. That is the very reason that writing them is
just as error-prone as writing any other complex programming
code. It is always better to solve a genuinely simple problem in
a simple way; when you go beyond simple, think about regular
expressions.
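
One way to read that advice, as a sketch only: a fixed prefix test
needs nothing more than a string method, while a test with several
alternatives and flexible spacing is where a regular expression
starts to pay off. The message formats and the severity_of()
helper are hypothetical.

```python
import re

line = "ERROR: disk full"

# Genuinely simple: a fixed prefix needs no regular expression.
simple_hit = line.startswith("ERROR")

# Beyond simple: several possible keywords, any letter case,
# optional whitespace before the colon.
SEVERITY = re.compile(r"(error|warn(?:ing)?|fatal)\s*:", re.IGNORECASE)

def severity_of(text):
    """Return the lowercased severity keyword, or None if absent."""
    found = SEVERITY.match(text)
    return found.group(1).lower() if found else None

print(simple_hit, severity_of(line), severity_of("Warning : low memory"))
```
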
 * The translation is complete; please check it out from Google Code.
 * It will later be converted to MoinMoin format in bulk with a script.
 * attachment:re.png

TPiP/Chap3 (last edited 2009-12-25 07:13:53 by localhost)