Differences between revisions 12 and 14 (spanning 2 versions)
Revision 12 as of 2006-03-27 15:02:13 (size 21573, editor wolfg)
Revision 14 as of 2006-03-27 16:24:09 (size 27375, editor wolfg)
----

2006-03-27 proofreading notes


File proofread: roman.xml

Chapter: [http://www.woodpecker.org.cn/obp/diveintopython-zh-5.4/zh-cn/dist/htmlflat/diveintopython.html#roman2 Chapter 15]

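JasonXie: No disputes on matters of principle, and I have no objections. The only problem is the change to “无情重构” (merciless refactoring) in the first paragraph of Chapter 15, Section 3: readers could easily mistake it for a technical term.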

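["wolfg"] “无情重构” (merciless refactoring) is not a technical term, but it has nearly become a set phrase. How about wording it like this: 最大的好处是单元测试给了你 <!> 很酷的重构 ^自由去无情地重构^。 Besides, "mercilessly" itself already carries the sense of “残忍地” (ruthlessly). “In XP, the refactoring practice carries a modifier: merciless.” See the links below:

  • [http://www.xprogramming.com/SD2000Tutorial/sld067.htm]
  • [http://www.extremeprogramming.org/rules/refactor.html]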

Chapter 15, Section 2

[http://www.woodpecker.org.cn/obp/diveintopython-zh-5.4/zh-cn/dist/htmlflat/diveintopython.html#roman.change]

Title

  • Handling changing requirements
  • 应对 <!> 要求 ^需求^变化


Paragraph 1

  • Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible nasty things involving scissors and hot wax, requirements will change. Most customers don't know what they want until they see it, and even if they do, they aren't that good at articulating what they want precisely enough to be useful. And even if they do, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change.
  • 尽管你竭尽努力地分析你的客户需求,并点灯熬油地提炼出 <!> 相应精确<!>求,但 <!> 要求是 ^需求还是^会变化。 大部分客户在看到产品前不知道他们 <!>想要什么。即便知道,也不 <!> 能很好地 ^善于^精确表述出他们的有效需求。即便能表述出来,他们 <!> 下一次 ^在下一个版本一定^ <!>会要求更多的功能。 因此你需要 <!> 让你的独立测试做好更新准备 ^做好更新测试用例的准备^以应对 <!> 要求 ^需求^的改变。


Paragraph 2

  • Suppose, for instance, that you wanted to expand the range of the Roman numeral conversion functions. Remember the rule that said that no character could be repeated more than three times? Well, the Romans were willing to make an exception to that rule by having 4 M characters in a row to represent 4000. If you make this change, you'll be able to expand the range of convertible numbers from 1..3999 to 1..4999. But first, you need to make some changes to the test cases.
  • 假设你想要扩展罗马数字转换 <!> 函数的范围。 <!> 记住 这条原则 :没有哪个字符可以重复三遍以上 ^还记得“没有哪个字符可以重复三遍以上”这条规则 吗^<!> 啊哈!罗马数字可以连续出现 4 次M 字符来表示 4000 便是一个例外 ^呃, 现在罗马人希望给这条规则来个例外,用连续出现 4 个M 字符来表示 4000。^ 如果 <!> 你做出这个改变 ^这样改了^,你 <!>可以把转换范围从 1..3999 扩展到 1..4999。 但首先,你先要对 <!> 独立测试 ^测试用例^进行修改。


Example 15.6

  • Example 15.6. Modifying test cases for new requirements (romantest71.py)
  • 例 15.6. <!> 修改独立测试以适应新要求 ^为新需求修改测试用例^ (romantest71.py)

  • This file is available in py/roman/stage7/ in the examples directory.
  • 这个文件可以 <!> 从 ^在例子目录下的^ py/roman/stage7/ --(的 examples)-- 目录中 <!> 获得 ^找到^ 。


Notes on Example 15.6

  • 1 The existing known values don't change (they're all still reasonable values to test), but you need to add a few more in the 4000 range. Here I've included 4000 (the shortest), 4500 (the second shortest), 4888 (the longest), and 4999 (the largest).
  • 1 原来的已知值没有改变(它们仍然是合理的测试值)但你需要添加几个大于 4000 的值。 这里我添加了 4000 (罗马数字表示最短的一个), 4500 (次短的一个), 4888 (最长的一个)和 4999 (值最大的一个)。

  • 2 The definition of “large input” has changed. This test used to call toRoman with 4000 and expect an error; now that 4000-4999 are good values, you need to bump this up to 5000.
  • 2 “最大输入”的 <!> 概念 ^定义^改变了。 以前是以 4000 调用 toRoman 并期待一个错误;而现在 4000-4999 成为了有效输入,需要将这个 <!> 测试的对象 ^“最大输入”^提升至 5000。

  • 3 The definition of “too many repeated numerals” has also changed. This test used to call fromRoman with 'MMMM' and expect an error; now that MMMM is considered a valid Roman numeral, you need to bump this up to 'MMMMM'.
  • 3 “过多字符重复” 的 <!> 概念 ^定义^也改变了。 这个测试以前是以 'MMMM' 调用 <!> fromRoman 并期待一个错误;而现在 MMMM 被认为是一个有效的罗马数字表示,需要将这个 <!> 测试的对象提升至 ^“过多字符重复”改为^ 'MMMMM'。

  • 4 The sanity check and case checks loop through every number in the range, from 1 to 3999. Since the range has now expanded, these for loops need to be updated as well to go up to 4999.
  • 4 <!> 回旋 ^完备^测试和大小写测试 <!> 原来在 1 到 3999 范围内循环。由于现在范围扩展了,这个 for 循环 <!> 需要将范围提升至 4999。


  • Now your test cases are up to date with the new requirements, but your code is not, so you expect several of the test cases to fail.
  • 现在你的 <!> 独立测试已经根据要求完成了更新 ^测试用例和新需求保持一致了^, 但是你的程序代码还没有,因此几个 <!> 独立测试 ^测试用例^的失败是意料之中的事。

  • Example 15.7. Output of romantest71.py against roman71.py
  • 例 15.7. <!> romantest71.py 测试 roman71.py 的 <!> 输出结果


Notes on the output

  • 1 Our case checks now fail because they loop from 1 to 4999, but toRoman only accepts numbers from 1 to 3999, so it will fail as soon as the test case hits 4000.
  • 1 我们的 <!> 独立测试 ^大小写检查^失败是 <!>因为循环范围是 1 到 4999 而导致<!> 但是 toRoman 只接受 1 到 3999的 <!> 范围之间的数,因此 <!> 测试到达 ^循环到^ 4000 就会失败。

  • 2 The fromRoman known values test will fail as soon as it hits 'MMMM', because fromRoman still thinks this is an invalid Roman numeral.
  • 2 fromRoman 的已知值测试在遇到 'MMMM' 就会失败,因为 fromRoman 还认为这是一个无效的罗马数字表示。
  • 3 The toRoman known values test will fail as soon as it hits 4000, because toRoman still thinks this is out of range.
  • 3 toRoman 的已知值测试在遇到 4000 就会失败,因为 toRoman 仍旧认为这 <!>出了有效值范围。

  • 4 The sanity check will also fail as soon as it hits 4000, because toRoman still thinks this is out of range.
  • 4 <!> 回旋 ^完备^测试在遇到 4000 <!> 便也会失败,因为 toRoman 还认为这 <!>出了有效值范围。


  • Now that you have test cases that fail due to the new requirements, you can think about fixing the code to bring it in line with the test cases. (One thing that takes some getting used to when you first start coding unit tests is that the code being tested is never “ahead” of the test cases. While it's behind, you still have some work to do, and as soon as it catches up to the test cases, you stop coding.)
  • <!> 现在你遭遇了要求改变带来的独立测试失败 ^既然新的需求导致了测试用例的失败^,你 <!> 可以考虑修改代码 <!> 使它再能 ^以便它能再次^通过 <!> 独立测试 ^测试用例^。( <!> 在引入单元测试之后的一个必然现象是 ^在你开始编写单元测试时要习惯一件事^:被测试代码永远不会 <!> 走在独立测试“之前” ^在编写测试用例“之前”编写^。正因为如此,你还有一些工作要做, <!> 直到它赶上独立测试才停下来 ^一旦可以通过所有的测试用例,停止编码^。)

  • Example 15.8. Coding the new requirements (roman72.py)
  • 例 15.8. <!> 根据新要求修改代码 ^为新的需求编写代码^ (roman72.py)

  • This file is available in py/roman/stage7/ in the examples directory.
  • 这个文件可以 <!> 从 ^在例子目录下的^ py/roman/stage7/ --(的 examples)-- 目录中 <!> 获得 ^找到^ 。


Notes on Example 15.8

  • 1 toRoman only needs one small change, in the range check. Where you used to check 0 < n < 4000, you now check 0 < n < 5000. And you change the error message that you raise to reflect the new acceptable range (1..4999 instead of 1..3999). You don't need to make any changes to the rest of the function; it handles the new cases already. (It merrily adds 'M' for each thousand that it finds; given 4000, it will spit out 'MMMM'. The only reason it didn't do this before is that you explicitly stopped it with the range check.)

  • 1 toRoman 只需要在取值范围检查一处做 <!> 出微小修改 ^个小改动^。将原来的 0 < n < 4000,更改为现在的检查 0 < n < 5000。 你还要更改你 raise 的错误信息以反映接受新取值范围(1..4999 而不再是 1..3999)。 你不需要改变函数的其他部分,它们已经适用于新的 <!> 变化 ^情况^。(它们会欣然地为新的 <!> 一千 ^1000^添加'M',以 4000为例,它们会返回 'MMMM'。之前没能这样做是因为 <!> 范围检查时就被停了下来。)

  • 2 You don't need to make any changes to fromRoman at all. The only change is to romanNumeralPattern; if you look closely, you'll notice that you added another optional M in the first section of the regular expression. This will allow up to 4 M characters instead of 3, meaning you will allow the Roman numeral equivalents of 4999 instead of 3999. The actual fromRoman function is completely general; it just looks for repeated Roman numeral characters and adds them up, without caring how many times they repeat. The only reason it didn't handle 'MMMM' before is that you explicitly stopped it with the regular expression pattern matching.
  • 2 你对 fromRoman 也不需要做过多的修改。 唯一的修改就在 romanNumeralPattern:如果你注意的话,你会发现只需在正则表达式的第一部分增加一个可选的 M 。 <!> 可以 ^这就^允许最多 4 个 M 字符而不是 3 个,意味着你允许的 <!> 罗马数字可以到相当于 4999 而不再是 3999 ^代表 4999 而不是 3999的罗马数字^。 fromRoman 函数本身是普遍适用的,它并不在意字符被多少次的重复,只是根据重复的罗马字符对应的数值进行累加。 以前没能处理 'MMMM' 是因为你通过正则表达式的检查强行停止了。
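The following is a minimal Python 3 sketch of the two changes described above. It assumes a simplified module: only out-of-range and invalid-numeral errors are handled, and the real roman72.py's integer check is omitted. Both conversion loops are unchanged from the 1..3999 version. Only the upper bound and the extra `M?` in the pattern differ.

```python
import re

class RomanError(Exception):
    pass

class OutOfRangeError(RomanError):
    pass

class InvalidRomanNumeralError(RomanError):
    pass

ROMAN_NUMERAL_MAP = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
                     ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
                     ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))

def toRoman(n):
    """Convert an integer in 1..4999 to a Roman numeral."""
    # The only change here: the upper bound (and its message) moves to 4999.
    if not 0 < n < 5000:
        raise OutOfRangeError("number out of range (must be 1..4999)")
    result = ""
    for numeral, value in ROMAN_NUMERAL_MAP:
        while n >= value:  # for 4000 this appends 'M' four times
            result += numeral
            n -= value
    return result

# One extra optional M in the thousands section allows up to MMMM (4000).
romanNumeralPattern = "^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$"

def fromRoman(s):
    """Convert a Roman numeral in the 1..4999 range to an integer."""
    if not s or not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError("Invalid Roman numeral: %r" % s)
    result = 0
    index = 0
    # Unchanged general loop: adds up numerals however often they repeat.
    for numeral, value in ROMAN_NUMERAL_MAP:
        while s[index:index + len(numeral)] == numeral:
            result += value
            index += len(numeral)
    return result
```

With these definitions, `toRoman(4888)` returns `'MMMMDCCCLXXXVIII'` and `fromRoman('MMMM')` returns `4000`. `fromRoman('MMMMM')` is still rejected by the pattern.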


  • You may be skeptical that these two small changes are all that you need. Hey, don't take my word for it; see for yourself:
  • 你可能会怀疑 <!> 两条的微小改变竟是你需要做的一切 ^你所需的就是这两处小改动^<!> 不必相信我 ^嘿,不相信我的话^,你自己看看吧:

  • Example 15.9. Output of romantest72.py against roman72.py
  • <!> 例 15.9. Output of romantest72.py against roman72.py ^例 15.9. 用 romantest72.py 测试 roman72.py 的结果^


  • All the test cases pass. Stop coding.
  • <!> 通过了所有的独立测试,停止代码编写 ^所有的测试用例都通过了,停止编写代码^


Last sentence

  • Comprehensive unit testing means never having to rely on a programmer who says “Trust me.”
  • 全面 <!>单元测试意味着不必依赖于程序员的一面之词: “相信我!”

Chapter 15, Section 3

[http://www.woodpecker.org.cn/obp/diveintopython-zh-5.4/zh-cn/dist/htmlflat/diveintopython.html#roman.refactoring]

Title

  • 15.3. Refactoring
  • 15.3. <!> 重组 ^重构^


Paragraph 1

  • The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.
  • <!> 完备 ^全面^的单元测试带来的最大好处 <!> 并不是在 ^不是^你的全部 <!> 独立测试 ^测试用例^ <!>最终通过之时 <!> 的成就感;也不是 <!> 受到责怪时,证明你没有扰乱别人的代码 ^被责怪破坏了别人的代码时能够证明自己的自信^。最大的好处是单元测试给了你 <!> 很酷的重构 ^无情重构的自由^。


Paragraph 2

  • Refactoring is the process of taking working code and making it work better. Usually, “better” means “faster”, although it can also mean “using less memory”, or “using less disk space”, or simply “more elegantly”. Whatever it means to you, to your project, in your environment, refactoring is important to the long-term health of any program.
  • 重构是在可运行代码的基础上使之更良好工作的 <!> 工作 ^过程^。 通常,“更好”意味着“更快”,也可能意味着 “使用更少的内存”,或者 “使用更少的磁盘空间”,或者仅仅是“更 <!> 有格调 ^优雅的代码^”。 <!> 无论是 ^不管^对你,对你的项目 <!> 意味什么<!> 对你的处境来讲 ^在你的环境中^,重构对任何程序的长期良性运转都是重要的。


Paragraph 3

  • Here, “better” means “faster”. Specifically, the fromRoman function is slower than it needs to be, because of that big nasty regular expression that you use to validate Roman numerals. It's probably not worth trying to do away with the regular expression altogether (it would be difficult, and it might not end up any faster), but you can speed up the function by precompiling the regular expression.
  • 这里, “更好” 意味着 “更快”。更具体地说, fromRoman 函数可以更快,关键在于 <!> 面目可憎 ^那个丑陋的^、用于验证罗马数字有效性的正则表达式。也许 <!> 在正则表达式本身上的努力怎样都并不值得尝试 ^不用正则表达式去解决是不值得的^,(这样做很难,而且可能也快不了多少),但可以通过预编译正则表达式 <!>使函数提速。


Notes on Example 15.10

  • 1 This is the syntax you've seen before: re.search takes a regular expression as a string (pattern) and a string to match against it ('M'). If the pattern matches, the function returns a match object which can be queried to find out exactly what matched and how.
  • 1 这是你曾在 re.search 中看到的语法。 把一个正则表达式作为字符串(pattern)并用 <!> 这个字符串来匹配('M')。 如果能够匹配,如果匹配成功便有函数返回 <!> 所匹配的对象用以确定匹配的形式和内容 ^一个match对象,可以用来确定匹配的部分和如何匹配的^

  • 2 This is the new syntax: re.compile takes a regular expression as a string and returns a pattern object. Note there is no string to match here. Compiling a regular expression has nothing to do with matching it against any specific strings (like 'M'); it only involves the regular expression itself.
  • 2 这里是一个新的语法: re.compile 把一个正则表达式作为字符串 <!> 参数接受并返回一个 <!> 模版(pattern) ^pattern对象^。注意这里没有 <!> 字符产的匹配 ^去匹配字符串^。编译正则表达式和以特定字符串('M')进行匹配不是一回事,所牵扯的只是正则表达式本身。

  • 3 The compiled pattern object returned from re.compile has several useful-looking functions, including several (like search and sub) that are available directly in the re module.
  • 3 re.compile 返回的 <!> 被编译模版 ^已编译的pattern对象^ 有几个值得关注的功能:包括了几个 re 模块直接提供的功能(比如: search 和 sub)。

  • 4 Calling the compiled pattern object's search function with the string 'M' accomplishes the same thing as calling re.search with both the regular expression and the string 'M'. Only much, much faster. (In fact, the re.search function simply compiles the regular expression and calls the resulting pattern object's search method for you.)
  • 4 <!> 以 'M' ^用 'M' 做参数^来调用 <!> 被编译模版对象 ^已编译的pattern对象^的 search 函数与 <!> 凭借正则表达式和字符串 'M' 调用 re.search 可以 <!> 达到 ^得到^相同的结果,只是快了很多。(事实上,re.search 函数 <!> 仅仅只是将正则表达式编译,然后为你调用 <!> 的所编译后得到的 <!> 模版对象 ^pattern对象^的 search 方法。)
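The steps in Example 15.10 can be reproduced roughly as follows in Python 3. The short `'^M?M?M?$'` pattern is only an illustration.

```python
import re

pattern = "^M?M?M?$"

# Module-level function: pass both the pattern string and the string to test.
match = re.search(pattern, "M")

# Compiling involves only the regular expression; there is no string to match yet.
compiled = re.compile(pattern)

# The pattern object offers search (and sub, match, ...) as its own methods.
compiled_match = compiled.search("M")

if __name__ == "__main__":
    print(match.group(0), compiled_match.group(0))  # M M
    print(compiled.pattern)                         # ^M?M?M?$
```

Both calls return a match object for `'M'`. The compiled object also remembers its source pattern in `compiled.pattern`.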


  • Whenever you are going to use a regular expression more than once, you should compile it to get a pattern object, then call the methods on the pattern object directly.
  • 在需要 <!> 不止一次使用 ^多次使用同一个^正则表达式的情况下,应该将 <!> 正则表达式进行编译以获得一个 <!> 模版 ^pattern^对象,然后直接调用 <!> 模版 ^这个pattern^对象的方法即可


Example 15.11.

  • Example 15.11. Compiled regular expressions in roman81.py
  • <!> 编译 roman81.py 中的正则表达式 ^roman81.py 中已编译的正则表达式^

  • This file is available in py/roman/stage8/ in the examples directory.
  • 这个文件可以 <!> 从 ^在例子目录下的^ py/roman/stage8/ --(的 examples)-- 目录中 <!> 获得 ^找到^ 。


Notes on Example 15.11

  • 1 This looks very similar, but in fact a lot has changed. romanNumeralPattern is no longer a string; it is a pattern object which was returned from re.compile.
  • 1 看起来很相似,但实质却有很大改变。 romanNumeralPattern 不再是一个字符串了,而是一个由 re.compile 返回的 <!> 模版 ^pattern^对象。

  • 2 That means that you can call methods on romanNumeralPattern directly. This will be much, much faster than calling re.search every time. The regular expression is compiled once and stored in romanNumeralPattern when the module is first imported; then, every time you call fromRoman, you can immediately match the input string against the regular expression, without any intermediate steps occurring under the covers.
  • 2 这意味着你可以直接调用 romanNumeralPattern 的方法。这比每次调用 re.search <!> 容易很多 ^要快很多^。 模块被首次 <!>入(import)之时,正则表达式被一次编译并存储于 romanNumeralPattern。 之后每次调用 fromRoman 时,你可以立刻以正则表达式匹配输入的字符串,而不需要在重复背后的这些 <!> (编译的)工作。
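The following sketch illustrates the roman81.py idea in Python 3. The pattern is compiled once at import time, and `fromRoman` calls the pattern object's `search` method directly. `InvalidRomanNumeralError` is a simplified, hypothetical error class. This is not the book's file verbatim.

```python
import re

class InvalidRomanNumeralError(ValueError):
    """Hypothetical error type for input that is not a valid numeral."""

ROMAN_NUMERAL_MAP = (("M", 1000), ("CM", 900), ("D", 500), ("CD", 400),
                     ("C", 100), ("XC", 90), ("L", 50), ("XL", 40),
                     ("X", 10), ("IX", 9), ("V", 5), ("IV", 4), ("I", 1))

# Compiled once, when the module is first imported.
romanNumeralPattern = re.compile(
    "^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$")

def fromRoman(s):
    """Convert a Roman numeral (1..4999) to an integer."""
    # Call the pattern object's method; no pattern string is passed here.
    if not s or not romanNumeralPattern.search(s):
        raise InvalidRomanNumeralError("Invalid Roman numeral: %r" % s)
    result = 0
    index = 0
    for numeral, value in ROMAN_NUMERAL_MAP:
        while s[index:index + len(numeral)] == numeral:
            result += value
            index += len(numeral)
    return result
```

Because `romanNumeralPattern` now holds a pattern object rather than a string, passing it to `re.search` is unnecessary. Its `search` method is called directly.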


  • Example 15.12. Output of romantest81.py against roman81.py
  • 例 15.12. <!> romantest81.py 测试 roman81.py 的 <!> 输出结果


Notes on the output of Example 15.12

  • Just a note in passing here: this time, I ran the unit test without the -v option, so instead of the full doc string for each test, you only get a dot for each test that passes. (If a test failed, you'd get an F, and if it had an error, you'd get an E. You'd still get complete tracebacks for each failure and error, so you could track down any problems.)
  • 有一点说明一下:这里,我在 <!> ^运^行单元测试时没有使用 -v 选项,因此输出的也不再是每个测试完整的 doc string,而是 <!> 每个测试的通过以一个点来标示 ^用一个圆点来表示每个通过的测试^。(失败的测试 <!> 标以 F ^用F表示^, 发生错误则 <!> 标以 E ^E表示^, 你仍旧可以获得失败和错误的完整追踪 <!> ^信息^以便查找问题所在 <!> ^。)^
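The `-v` flag maps to `unittest`'s verbosity level. By default, `unittest.main()` runs with verbosity 1, which prints one `.` per passing test. With `-v` it runs with verbosity 2, which prints the test name, the first line of its docstring, and `ok`. The sketch below uses a hypothetical `SampleTests` class and captures both outputs with `TextTestRunner` so they can be compared.

```python
import io
import unittest

class SampleTests(unittest.TestCase):
    """Hypothetical tests, used only to show the two output styles."""

    def test_addition(self):
        """addition works"""
        self.assertEqual(1 + 1, 2)

    def test_upper(self):
        """upper-casing works"""
        self.assertEqual("m".upper(), "M")

def run(verbosity):
    """Run SampleTests and return (result, captured output)."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SampleTests)
    stream = io.StringIO()
    runner = unittest.TextTestRunner(stream=stream, verbosity=verbosity)
    return runner.run(suite), stream.getvalue()

quiet_result, quiet_output = run(verbosity=1)      # like running without -v
verbose_result, verbose_output = run(verbosity=2)  # like running with -v

if __name__ == "__main__":
    print(quiet_output)
    print(verbose_output)
```

The quiet run starts with a line of two dots. The verbose run lists each test with its docstring. Failures and errors produce full tracebacks in both modes.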

  • You ran 13 tests in 3.385 seconds, compared to 3.685 seconds without precompiling the regular expressions. That's an 8% improvement overall, and remember that most of the time spent during the unit test is spent doing other things. (Separately, I time-tested the regular expressions by themselves, apart from the rest of the unit tests, and found that compiling this regular expression speeds up the search by an average of 54%.) Not bad for such a simple fix.
  • <!> 不预先编译正则表达式时,你可以在 3.385 秒之内完成 13 个测试。^运行13个测试耗时3.385秒,与之相比是没有预编译正则表达式时的3.685秒。^这是一个 8% 的整体提速,记住单元测试的大量时间实际上花在做其他工作上。(我 <!> 另外 ^单独^测试了正则表达式部分的耗时,不考虑单元测试的其他环节,正则表达式编译可以 <!>另 search ^让匹配平均^提速 54% 。)小小修改还真是值得。
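The 8% figure is the time saved relative to the uncompiled run: (3.685 - 3.385) / 3.685 ≈ 0.081. One rough way to time the regular expression on its own is `timeit`, as in the sketch below. The sample numeral and iteration count are arbitrary assumptions. Current CPython's module-level `re` functions keep an internal cache of compiled patterns, so the gap measured today may be much smaller than the 54% reported above.

```python
import re
import timeit

PATTERN = "^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$"
COMPILED = re.compile(PATTERN)   # compiled once, up front
SAMPLE = "MMMMDCCCLXXXVIII"      # arbitrary valid numeral (4888)

def search_uncompiled():
    # Passes the pattern string every time; re looks it up or compiles it.
    return re.search(PATTERN, SAMPLE)

def search_compiled():
    # Calls the pattern object's search method directly.
    return COMPILED.search(SAMPLE)

def compare(number=100000):
    """Return (seconds for re.search, seconds for compiled search)."""
    plain = timeit.timeit(search_uncompiled, number=number)
    compiled = timeit.timeit(search_compiled, number=number)
    return plain, compiled

if __name__ == "__main__":
    plain, compiled = compare()
    print("re.search:       %.3f s" % plain)
    print("compiled search: %.3f s" % compiled)
    print("time saved:      %.0f%%" % (100 * (plain - compiled) / plain))
```

Timings vary by machine and Python version, so treat any single run as indicative only.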

  • Oh, and in case you were wondering, precompiling the regular expression didn't break anything, and you just proved it.
  • 对了, <!> 不必顾虑什么,预先编译正则表达式并没有 <!> 对其他部分产生负面影响 ^破坏什么^<!> 前面的证实完全可以打消你的这个顾虑 ^你刚刚证实这一点^


  • There is one other performance optimization that I want to try. Given the complexity of regular expression syntax, it should come as no surprise that there is frequently more than one way to write the same expression. After some discussion about this module on comp.lang.python, someone suggested that I try using the {m,n} syntax for the optional repeated characters.
  • 我还想做另外一个性能优化工作。就 <!> 复杂的正则表达式语法 ^正则表达式语法的复杂性^而言,通常有不止一种方法来构造相同的 <!> 表示并不令人惊讶 ^表达式是不会令人惊讶的^。 在 comp.lang.python 上 <!> 关于 ^对^该模块的 <!> 讨论中 ^进行一些讨论后^,有人建议我使用 {m,n} 语法来 <!> 指代 ^查找^可选重复字符。


Example 15.13. roman82.py

  • This file is available in py/roman/stage8/ in the examples directory.
  • 这个文件可以 <!> 从 ^在例子目录下的^ py/roman/stage8/ --(的 examples)-- 目录中 <!> 获得 ^找到^ 。


Notes on Example 15.13

  • You have replaced M?M?M?M? with M{0,4}. Both mean the same thing: “match 0 to 4 M characters”. Similarly, C?C?C? became C{0,3} (“match 0 to 3 C characters”) and so forth for X and I.
  • 你 <!> 可以 ^已经^将 M?M?M?M? 替换为 M{0,4}。 <!> 他们 ^它们^的含义相同: “匹配 0 到 4 个 M 字符”。 类似地, C?C?C? <!> 可以改成 ^改成了^ C{0,3} (“匹配 0 到 3 个 C 字符”) 接下来的 <!> 修改 X 和 I 的匹配 ^X 和 I 也一样^。
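The sketch below checks the claim that both forms mean the same thing. It compares the roman72/roman81-style pattern with the roman82-style `{m,n}` pattern. The inputs are every string of up to five characters drawn from `MDCLXVI`, plus a few longer numerals. The helper names `candidate_strings` and `disagreements` are illustrative only.

```python
import itertools
import re

# roman72/roman81 style: each optional repetition spelled out.
OLD_PATTERN = re.compile(
    "^M?M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$")

# roman82 style: the same repetitions written with {m,n}.
NEW_PATTERN = re.compile(
    "^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

def candidate_strings(max_len=5):
    """Every string of up to max_len Roman characters, plus a few long numerals."""
    strings = ["".join(chars)
               for length in range(max_len + 1)
               for chars in itertools.product("MDCLXVI", repeat=length)]
    strings += ["MMMMDCCCLXXXVIII", "MMMCMXCIX", "MMMMCMXCIX"]
    return strings

def disagreements(strings):
    """Return the strings on which the two patterns give different answers."""
    return [s for s in strings
            if bool(OLD_PATTERN.search(s)) != bool(NEW_PATTERN.search(s))]

if __name__ == "__main__":
    print(disagreements(candidate_strings()))  # []
```

`M{0,4}` and `M?M?M?M?` accept exactly the same runs of zero to four `M`s, so the list of disagreements comes back empty. The same holds for the `C`, `X`, and `I` groups.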


  • This form of the regular expression is a little shorter (though not any more readable). The big question is, is it any faster?
  • 这样的正则表达简短一些 (虽然 <!> 不那么直观 ^可读性不太好^)。 核心问题是,是否能加快速度?


Notes on the output of Example 15.14

  • 1 Overall, the unit tests run 2% faster with this form of regular expression. That doesn't sound exciting, but remember that the search function is a small part of the overall unit test; most of the time is spent doing other things. (Separately, I time-tested just the regular expressions, and found that the search function is 11% faster with this syntax.) By precompiling the regular expression and rewriting part of it to use this new syntax, you've improved the regular expression performance by over 60%, and improved the overall performance of the entire unit test by over 10%.
  • 1 总体而言, 这种正则表达使单元测试提速 2%。 这不太令人振奋,但记住 search 函数只是整体单元测试的一个小部分,很多时间花在了其他方面。 (我另外的测试表明这个 <!> 应用新语法的正则表达另 ^应用了新语法的正则表达式使^ search 函数提速 11% 。) 通过预先编译--(正则表达)--和 <!> 以新语法写正则表达另 ^使用新语法重写可以使^正则表达式 <!> 提速 ^的性能提升^超过 60%,令单元测试的整体性能提升超过 10%。

  • 2 More important than any performance boost is the fact that the module still works perfectly. This is the freedom I was talking about earlier: the freedom to tweak, change, or rewrite any piece of it and verify that you haven't messed anything up in the process. This is not a license to endlessly tweak your code just for the sake of tweaking it; you had a very specific objective (“make fromRoman faster”), and you were able to accomplish that objective without any lingering doubts about whether you introduced new bugs in the process.
  • 2 比任何的性能提升更重要的是模块仍然运转完好。 这便是我早先提到的自由:自由地 <!> 挪动 ^调整^、修改或者重写任何部分 <!> 而不至于弄出乱子 ^并且保证在此过程中没有把事情搞得一团糟^。 这并不是 <!> 为无端的更改代码给出借口 ^给只是为了调整代码而无休止地调整以许可^,--(而是指)--你有很切实的目标(“ <!> 另 ^让^ fromRoman 更快”), <!> 并且不至于因为怕引入新的错误而迟疑 ^而且你可以实现这个目标,不会因为考虑在改动过程中是否会引入新的bug而有所迟疑^。


  • One other tweak I would like to make, and then I promise I'll stop refactoring and put this module to bed. As you've seen repeatedly, regular expressions can get pretty hairy and unreadable pretty quickly. I wouldn't like to come back to this module in six months and try to maintain it. Sure, the test cases pass, so I know that it works, but if I can't figure out how it works, it's still going to be difficult to add new features, fix new bugs, or otherwise maintain it. As you saw in Section 7.5, “Verbose Regular Expressions”, Python provides a way to document your logic line-by-line.
  • 还有另外一个我想做的 <!> 改动 ^调整^,我保证这是最后一个,之后我会停下来,让这个模块歇歇。就像你多次看到的,正则表达式越晦涩难懂越快 <!> ,^。^我可不想在六个月内再回头试图维护它。是呀! <!> 独立测试 ^测试用例^通过了,我便知道它工作正常,但如果我搞不懂它是如何工作的,添加新功能, <!> 修改 ^修正^新Bug,或者维护它 <!> 将 ^都将^变得很困难。 正如你在 第 7.5 节 “松散正则表达式”--(,)-- 看到的, Python 提供了 <!> 单行进行逻辑注释 ^逐行注释你的逻辑^的方法。

  • {i} “正则表达式越晦涩难懂越快”: brilliantly translated :)

  • {i} In Section 7.5, rendering “Verbose Regular Expressions” as “松散正则表达式” seems inappropriate.
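The following sketch shows how Section 7.5's verbose syntax might apply to this pattern. The comment wording is illustrative and not copied from the book's next example. With `re.VERBOSE`, whitespace and `#` comments inside the pattern are ignored. It therefore matches exactly the same strings as the compact `{m,n}` form.

```python
import re

# Same pattern as the compact {m,n} form, documented line by line.
romanNumeralPattern = re.compile(r"""
    ^                   # beginning of string
    M{0,4}              # thousands: 0 to 4 M's
    (CM|CD|D?C{0,3})    # hundreds: 900 (CM), 400 (CD), 0-300 (0 to 3 C's),
                        #   or 500-800 (D, followed by 0 to 3 C's)
    (XC|XL|L?X{0,3})    # tens: 90 (XC), 40 (XL), 0-30 (0 to 3 X's),
                        #   or 50-80 (L, followed by 0 to 3 X's)
    (IX|IV|V?I{0,3})    # ones: 9 (IX), 4 (IV), 0-3 (0 to 3 I's),
                        #   or 5-8 (V, followed by 0 to 3 I's)
    $                   # end of string
    """, re.VERBOSE)

# The one-line equivalent, for comparison.
compactPattern = re.compile(
    "^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")

if __name__ == "__main__":
    for s in ("MMMMDCCCLXXXVIII", "MCMXCIX", "MMMMM", "IIII"):
        print(s, bool(romanNumeralPattern.search(s)))
```

The verbose version costs nothing at match time once compiled. It just moves the explanation into the pattern, where a maintainer six months later will find it.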


DiveIntoPythonZh/2006-03-27 (last edited 2009-12-25 07:13:55 by localhost)