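This article is from the Python Cookbook. The translation was made for personal study only and has nothing to do with any commercial copyright dispute.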
-- 61.182.251.99, 2004-09-22T19:44:16Z
Description
Reading a Text File by Paragraphs
Credit: Alex Martelli, Magnus Lie Hetland
Problem
You need to read a file paragraph by paragraph, where a paragraph is defined as a sequence of nonempty lines (in other words, paragraphs are separated by empty lines).
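To make the definition concrete, here is a minimal sketch that applies it to a string already held in memory. It is only an illustration of what counts as a paragraph, not the recipe's method; the helper name `split_paragraphs` and the sample text are made up for this example.

```python
from itertools import groupby

def split_paragraphs(text):
    """Return the runs of nonempty lines found in text (hypothetical helper)."""
    lines = text.splitlines()
    # groupby clusters consecutive lines by truthiness: '' is an empty line,
    # so every group whose key is True is one paragraph.
    return ["\n".join(group)
            for nonempty, group in groupby(lines, key=bool)
            if nonempty]

sample_text = "first line\nsecond line\n\n\nthird line\n"
sample_paragraphs = split_paragraphs(sample_text)
print(sample_paragraphs)
```

Two consecutive empty lines still count as a single break, so the sample yields two paragraphs.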
Solution
A wrapper class is, as usual, the right Pythonic architecture for this (in Python 2.1 and earlier):
class Paragraphs:
    def __init__(self, fileobj, separator='\n'):
        # Ensure that we get a line-reading sequence in the best way possible
        # (translator's note: puzzling, since xreadlines was already
        # deprecated by Python 2.3?)
        import xreadlines
        try:
            # Check if the file-like object has an xreadlines method
            self.seq = fileobj.xreadlines()
        except AttributeError:
            # No, so fall back to the xreadlines module's implementation
            self.seq = xreadlines.xreadlines(fileobj)
        self.line_num = 0    # current index into self.seq (line number)
        self.para_num = 0    # current index into self (paragraph number)
        # Ensure that separator string includes a line-end character at the end
        if separator[-1:] != '\n': separator += '\n'
        self.separator = separator    # line that separates paragraphs
    def __getitem__(self, index):
        if index != self.para_num:
            raise TypeError, "Only sequential access supported"
        self.para_num += 1
        # Start where we left off and skip 0+ separator lines
        while 1:
            # Propagate IndexError, if any, since we're finished if it occurs
            line = self.seq[self.line_num]
            self.line_num += 1
            if line != self.separator: break
        # Accumulate 1+ nonempty lines into result
        result = [line]
        while 1:
            # Intercept IndexError, since we have one last paragraph to return
            try:
                # Let's check if there's at least one more line in self.seq
                line = self.seq[self.line_num]
            except IndexError:
                # self.seq is finished, so we exit the loop
                break
            # Increment index into self.seq for next time
            self.line_num += 1
            if line == self.separator: break
            result.append(line)
        return ''.join(result)
# Here's an example function, showing how to use class Paragraphs:
def show_paragraphs(filename, numpars=5):
    pp = Paragraphs(open(filename))
    for p in pp:
        print "Par#%d, line# %d: %s" % (
            pp.para_num, pp.line_num, repr(p))
        if pp.para_num>numpars: break
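The class above targets Python 2.1 and earlier, and neither the xreadlines module nor the xreadlines method exists in Python 3. As a rough sketch of how the same idea might look in current Python, the following generator keeps the recipe's separator handling but iterates over the file object directly. The function name `paragraphs` and the `io.StringIO` sample are assumptions made for illustration and are not part of the original recipe.

```python
import io

def paragraphs(fileobj, separator="\n"):
    """Yield each paragraph of fileobj as one string (hypothetical sketch).

    A paragraph is a run of consecutive lines that differ from separator.
    """
    # As in the recipe, make sure the separator ends with a line-end character.
    if not separator.endswith("\n"):
        separator += "\n"
    current = []
    for line in fileobj:  # file objects iterate lazily, one line at a time
        if line == separator:
            if current:  # a separator closes the paragraph collected so far
                yield "".join(current)
                current = []
        else:
            current.append(line)
    if current:  # the last paragraph may not be followed by a separator
        yield "".join(current)

# io.StringIO stands in for open(filename) so the example is self-contained.
sample = io.StringIO("a\nb\n\n\nc\n")
result = list(paragraphs(sample))
print(result)
```

Like the class, this sketch returns each paragraph with its line ends intact and supports sequential access only. It does not track line_num or para_num; wrapping the call in enumerate would supply a paragraph counter if one is needed.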
Discussion
...
