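This article comes from the Python Cookbook. The translation was made purely for personal study and has no connection with any commercial copyright dispute.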
-- 61.182.251.99 [2004-09-22T19:44:16Z]
Description
Reading a Text File by Paragraphs
Credit: Alex Martelli, Magnus Lie Hetland
Problem
You need to read a file paragraph by paragraph, where a paragraph is defined as a sequence of nonempty lines (in other words, paragraphs are separated by empty lines).
Solution
A wrapper class is, as usual, the right Pythonic architecture for this (in Python 2.1 and earlier):
class Paragraphs:

    def __init__(self, fileobj, separator='\n'):
        # Ensure that we get a line-reading sequence in the best way possible.
        # (Translator's note: puzzling, since xreadlines was already
        # deprecated in 2.3?)
        import xreadlines
        try:
            # Check if the file-like object has an xreadlines method
            self.seq = fileobj.xreadlines()
        except AttributeError:
            # No, so fall back to the xreadlines module's implementation
            self.seq = xreadlines.xreadlines(fileobj)

        self.line_num = 0    # current index into self.seq (line number)
        self.para_num = 0    # current index into self (paragraph number)

        # Ensure that separator string includes a line-end character at the end
        if separator[-1:] != '\n': separator += '\n'
        self.separator = separator    # the line that separates paragraphs

    def __getitem__(self, index):
        if index != self.para_num:
            # Only sequential access is supported; any other index is an error
            raise TypeError, "Only sequential access supported"
        self.para_num += 1
        # Start where we left off and skip 0+ separator lines
        while 1:
            # Propagate IndexError, if any, since we're finished if it occurs
            line = self.seq[self.line_num]
            self.line_num += 1
            if line != self.separator: break
        # Accumulate 1+ nonempty lines into result
        result = [line]
        while 1:
            # Intercept IndexError, since we have one last paragraph to return
            try:
                # Let's check if there's at least one more line in self.seq
                line = self.seq[self.line_num]
            except IndexError:
                # self.seq is finished, so we exit the loop
                break
            # Increment index into self.seq for next time
            self.line_num += 1
            if line == self.separator: break
            result.append(line)
        return ''.join(result)    # join the paragraph's lines into one string

# Here's an example function, showing how to use class Paragraphs:
def show_paragraphs(filename, numpars=5):
    pp = Paragraphs(open(filename))
    for p in pp:
        print "Par#%d, line# %d: %s" % (
            pp.para_num, pp.line_num, repr(p))
        if pp.para_num>numpars: break
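For readers on a current Python 3 interpreter, where the xreadlines module no longer exists, the same grouping idea can be sketched as a generator. This is not part of the recipe: the function name `paragraphs` and the `io.StringIO` sample input are assumptions made for illustration, and the sketch drops the `line_num` and `para_num` bookkeeping that the wrapper class keeps.

```python
import io


def paragraphs(lines, separator="\n"):
    """Yield each paragraph as one string of consecutive non-separator lines.

    `lines` is any iterable of lines that keep their trailing newline,
    such as an open text file. (Hypothetical helper, not from the recipe.)
    """
    # As in the recipe, make sure the separator ends with a line-end character.
    if not separator.endswith("\n"):
        separator += "\n"
    current = []
    for line in lines:
        if line == separator:
            # A separator line closes the paragraph collected so far, if any;
            # runs of consecutive separator lines are skipped.
            if current:
                yield "".join(current)
                current = []
        else:
            current.append(line)
    # Return the last paragraph when the input does not end with a separator.
    if current:
        yield "".join(current)


sample = io.StringIO("first line\nsecond line\n\n\nthird line\n")
for number, paragraph in enumerate(paragraphs(sample), start=1):
    print("Par#%d: %r" % (number, paragraph))
# prints:
# Par#1: 'first line\nsecond line\n'
# Par#2: 'third line\n'
```

Because a generator is consumed strictly in order, it gives the same sequential-only access that the class enforces by raising TypeError on an unexpected index.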
Discussion
...