This article is from the Python Cookbook. The translation is for personal study only and has nothing to do with any commercial copyright disputes.
-- 61.182.251.99, 2004-09-21T22:29:37Z
Description
Processing Every Word in a File
Credit: Luther Blissett
Problem
You need to do something to every word in a file, similar to the foreach function of csh.
Solution
This is best handled by two nested loops, one over the lines of the file and one over the words in each line:
    for line in open(thefilepath).xreadlines():    # method 1
        for word in line.split():
            dosomethingwith(word)

This implicitly defines words as sequences of nonspaces separated by sequences of spaces (just as the Unix program wc does).
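To make that definition concrete, here is a minimal sketch written for current Python 3 (an assumption, since the recipe itself targets Python 2.2). The sample string is made up for illustration.

```python
# Hypothetical sample line mixing spaces, a tab, and a trailing newline.
line = "  the quick\tbrown   fox, jumps-over\n"

# With no argument, split() separates on runs of whitespace and ignores
# leading and trailing whitespace, so the result holds no empty strings.
words = line.split()

print(words)
```

Note that punctuation stays attached to the neighboring word ("fox," keeps its comma), because only whitespace acts as a separator here.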
For other definitions of words, you can use regular expressions. For example:
    import re
    re_word = re.compile(r'[\w-]+')
    for line in open(thefilepath).xreadlines():
        for word in re_word.findall(line):
            dosomethingwith(word)

In this case, a word is defined as a maximal (that is, greedily matched) sequence of alphanumerics and hyphens. Note that \w also matches the underscore.
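The difference between the two definitions of a word shows up as soon as the text contains punctuation. The following is a small sketch in Python 3 syntax (an assumption; the recipe predates it) that applies both definitions to the same made-up sample line.

```python
import re

re_word = re.compile(r'[\w-]+')   # same pattern as in the recipe

# Hypothetical sample text chosen to contain hyphens and punctuation.
line = "well-known words, e.g. state-of-the-art!"

by_split = line.split()            # whitespace-separated definition
by_regex = re_word.findall(line)   # alphanumerics, underscores, and hyphens

print(by_split)
print(by_regex)
```

With split, punctuation remains part of the words. With the regular expression, punctuation is dropped, and "e.g." falls apart into "e" and "g" because the period is not in the character class.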
Discussion
For other definitions of words, you will need different regular expressions. The outer loop, over all lines in the file, can be written in many ways. The xreadlines method is good, but you can also use the list obtained by the readlines method, the standard library module fileinput, or, in Python 2.2, even just:

    for line in open(thefilepath):

which is simplest and fastest.

In Python 2.2, it's often a good idea to wrap iterations as iterator objects, most commonly by simple generators:

    from __future__ import generators

    def words_of_file(thefilepath):
        for line in open(thefilepath):
            for word in line.split():
                yield word

    for word in words_of_file(thefilepath):
        dosomethingwith(word)

This approach lets you separate, cleanly and effectively, two different concerns: how to iterate over all items (in this case, words in a file) and what to do with each item in the iteration. Once you have cleanly encapsulated iteration concerns in an iterator object (often, as here, a generator), most of your uses of iteration become simple for statements.

You can often reuse the iterator in many spots in your program, and if maintenance is ever needed, you can perform it in just one place (the definition of the iterator) rather than having to hunt for all uses. The advantages are thus very similar to those you obtain, in any programming language, by appropriately defining and using functions rather than copying and pasting pieces of code all over the place. With Python 2.2's iterators, you can get these advantages for looping control structures, too.
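As an illustration of reusing one iterator for several purposes, here is a minimal sketch in Python 3 syntax (an assumption; generators no longer need the __future__ import there). The temporary file and its contents are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile

def words_of_file(thefilepath):
    """Yield each whitespace-separated word of the file, one at a time."""
    with open(thefilepath) as f:       # a file object iterates line by line
        for line in f:
            for word in line.split():
                yield word

# Hypothetical sample file, created only so the sketch can run anywhere.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("one two  three\nfour\n\nfive six\n")
    path = tmp.name

try:
    # Two unrelated uses of the same iteration logic.
    word_count = sum(1 for _ in words_of_file(path))
    longest = max(words_of_file(path), key=len)
finally:
    os.remove(path)

print(word_count, longest)
```

Both uses are plain iterations over words_of_file. If the definition of a word changes later, only the generator body needs to be edited.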
See Also
Documentation for the fileinput module in the Library Reference; PEP 255 on simple generators (http://www.python.org/peps/pep-0255.html); Perl Cookbook Recipe 8.3.
