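This article is from the "Python Cookbook". The translation is solely for personal study and has nothing to do with any commercial copyright dispute.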
-- 61.182.251.99 [2004-09-22T19:44:16Z]
Description
Reading a Text File by Paragraphs
Credit: Alex Martelli, Magnus Lie Hetland
Problem
You need to read a file paragraph by paragraph, where a paragraph is defined as a sequence of nonempty lines (in other words, paragraphs are separated by empty lines).
Solution
A wrapper class is, as usual, the right Pythonic architecture for this (in Python 2.1 and earlier):
class Paragraphs:

    def __init__(self, fileobj, separator='\n'):
        # Ensure that we get a line-reading sequence in the best way possible
        # (translator's note: puzzling, since xreadlines is already
        # deprecated in 2.3?)
        import xreadlines
        try:
            # Check if the file-like object has an xreadlines method
            self.seq = fileobj.xreadlines()
        except AttributeError:
            # No, so fall back to the xreadlines module's implementation
            self.seq = xreadlines.xreadlines(fileobj)

        self.line_num = 0    # current index into self.seq (line number)
        self.para_num = 0    # current index into self (paragraph number)

        # Ensure that separator string includes a line-end character at the end
        if separator[-1:] != '\n':
            separator += '\n'
        self.separator = separator    # the line that separates paragraphs

    def __getitem__(self, index):
        if index != self.para_num:
            raise TypeError, "Only sequential access supported"
        self.para_num += 1
        # Start where we left off and skip 0+ separator lines
        while 1:
            # Propagate IndexError, if any, since we're finished if it occurs
            line = self.seq[self.line_num]
            self.line_num += 1
            if line != self.separator:
                break
        # Accumulate 1+ nonempty lines into result
        result = [line]
        while 1:
            # Intercept IndexError, since we have one last paragraph to return
            try:
                # Let's check if there's at least one more line in self.seq
                line = self.seq[self.line_num]
            except IndexError:
                # self.seq is finished, so we exit the loop
                break
            # Increment index into self.seq for next time
            self.line_num += 1
            if line == self.separator:
                break
            result.append(line)
        return ''.join(result)

# Here's an example function, showing how to use class Paragraphs:
def show_paragraphs(filename, numpars=5):
    pp = Paragraphs(open(filename))
    for p in pp:
        print "Par#%d, line# %d: %s" % (
            pp.para_num, pp.line_num, repr(p))
        if pp.para_num > numpars:
            break
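The class above targets Python 2.1 and earlier. On a current Python 3 interpreter the xreadlines module no longer exists, and the Python 2 raise and print statements are syntax errors. As a rough illustration only, and not part of the original recipe, the same paragraph-splitting idea might be sketched as a generator. The function name paragraphs and the sample text are hypothetical, chosen here for the example.

```python
import io

def paragraphs(fileobj, separator='\n'):
    """Yield each paragraph of fileobj as one string (hypothetical helper)."""
    # Mirror the class above: the separator must end with a newline.
    if not separator.endswith('\n'):
        separator += '\n'
    lines = []
    for line in fileobj:
        if line == separator:
            # A separator line closes the current paragraph, if one is open;
            # runs of consecutive separator lines are simply skipped.
            if lines:
                yield ''.join(lines)
                lines = []
        else:
            lines.append(line)
    # The last paragraph may not be followed by a separator line.
    if lines:
        yield ''.join(lines)

# Usage with an in-memory file; any iterable of lines works the same way.
sample = io.StringIO("first line\nsecond line\n\n\nthird line\n")
for number, paragraph in enumerate(paragraphs(sample), 1):
    print("Par#%d: %r" % (number, paragraph))
```

Like the class, this sketch reads the input one line at a time and holds only the current paragraph in memory. Unlike the class, it does not track line numbers.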
Discussion
...