This article is from the Python Cookbook. The translation is for personal study only and has nothing to do with any commercial copyright disputes.
-- 61.182.251.99 (2004-09-22T19:44:16Z)
Description
Reading a Text File by Paragraphs
Credit: Alex Martelli, Magnus Lie Hetland
Problem
You need to read a file paragraph by paragraph, where a paragraph is defined as a sequence of nonempty lines (in other words, paragraphs are separated by empty lines).
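To make the definition concrete, here is a minimal illustration in current Python 3. It is not part of the recipe: the sample `lines` list is made up, and `itertools.groupby` is used only to show which runs of lines count as paragraphs.

```python
from itertools import groupby

# Hypothetical sample input: two paragraphs separated by two empty lines.
lines = ["first line\n", "second line\n", "\n", "\n", "third line\n"]

# Group consecutive lines by whether they are empty; keep the nonempty runs.
paragraphs = [
    "".join(group)
    for is_empty, group in groupby(lines, key=lambda line: line == "\n")
    if not is_empty
]

print(paragraphs)
```

Any number of consecutive empty lines acts as a single paragraph break, which is the behavior the recipe's class also implements.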
Solution
A wrapper class is, as usual, the right Pythonic architecture for this (in Python 2.1 and earlier):
class Paragraphs:
    def __init__(self, fileobj, separator='\n'):
        # Ensure that we get a line-reading sequence in the best way possible
        # (translator's note: puzzling, since xreadlines is already deprecated in 2.3?)
        import xreadlines
        try:
            # Check if the file-like object has an xreadlines method, which
            # yields the sequence of the object's lines
            self.seq = fileobj.xreadlines()
        except AttributeError:
            # No, so fall back to the xreadlines module's implementation
            self.seq = xreadlines.xreadlines(fileobj)
        self.line_num = 0  # current index into self.seq (line number)
        self.para_num = 0  # current index into self (paragraph number)
        # Ensure that separator string includes a line-end character at the end
        if separator[-1:] != '\n': separator += '\n'
        self.separator = separator  # the line that separates paragraphs

    def __getitem__(self, index):
        if index != self.para_num:
            # Only sequential access is supported; any other index raises TypeError
            raise TypeError, "Only sequential access supported"
        self.para_num += 1
        # Start where we left off and skip 0+ separator lines
        while 1:
            # Propagate IndexError, if any, since we're finished if it occurs
            line = self.seq[self.line_num]
            self.line_num += 1
            if line != self.separator: break  # first nonempty line found
        # Accumulate 1+ nonempty lines into result
        result = [line]
        while 1:
            # Intercept IndexError, since we have one last paragraph to return
            try:
                # Let's check if there's at least one more line in self.seq
                line = self.seq[self.line_num]
            except IndexError:
                # self.seq is finished, so we exit the loop
                break
            # Increment index into self.seq for next time
            self.line_num += 1
            if line == self.separator: break
            result.append(line)
        # Join the collected lines into a single paragraph string
        return ''.join(result)

# Here's an example function, showing how to use class Paragraphs:
def show_paragraphs(filename, numpars=5):
    pp = Paragraphs(open(filename))
    for p in pp:
        print "Par#%d, line# %d: %s" % (
            pp.para_num, pp.line_num, repr(p))
        if pp.para_num>numpars: break
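The recipe targets Python 2.1 and depends on `xreadlines`, which current Python versions no longer provide. As a rough sketch only, the same idea can be written as a generator in Python 3. The function name `paragraphs` and the sample text are assumptions made for illustration and do not come from the recipe.

```python
import io

def paragraphs(fileobj, separator="\n"):
    """Yield each paragraph of fileobj as one string."""
    # Like the recipe, make sure the separator ends with a newline.
    if not separator.endswith("\n"):
        separator += "\n"
    current = []
    for line in fileobj:
        if line == separator:
            # A separator line closes the paragraph collected so far, if any.
            if current:
                yield "".join(current)
                current = []
        else:
            current.append(line)
    # The last paragraph may not be followed by a separator line.
    if current:
        yield "".join(current)

# Usage with an in-memory file; the sample text is made up for illustration.
sample = io.StringIO("one\ntwo\n\n\nthree\n")
for number, paragraph in enumerate(paragraphs(sample), start=1):
    print("Par#%d: %r" % (number, paragraph))
```

Unlike the class above, this sketch does not track line numbers and does not enforce sequential indexing, because a generator can only be consumed in order anyway.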
Discussion
...
