
This article is from the Python Cookbook.

This translation is for personal study only; it has nothing to do with any commercial copyright disputes.

-- 61.182.251.99, 2004-09-22T19:44:16Z

Description

Reading a Text File by Paragraphs

Credit: Alex Martelli, Magnus Lie Hetland

Problem

You need to read a file paragraph by paragraph, where a paragraph is defined as a sequence of nonempty lines (in other words, paragraphs are separated by empty lines).

Solution

A wrapper class is, as usual, the right Pythonic architecture for this (in Python 2.1 and earlier):

class Paragraphs:

    def __init__(self, fileobj, separator='\n'):
        # Ensure that we get a line-reading sequence in the best way possible.
        # (Translator's note: this is confusing, since xreadlines was already
        # deprecated as of Python 2.3?)
        import xreadlines
        try:
            # Check if the file-like object has an xreadlines method
            self.seq = fileobj.xreadlines(  )
        except AttributeError:
            # No, so fall back to the xreadlines module's implementation
            self.seq = xreadlines.xreadlines(fileobj)

        self.line_num = 0    # current index into self.seq (line number)
        self.para_num = 0    # current index into self (paragraph number)
        # Ensure that separator string includes a line-end character at the end
        if separator[-1:] != '\n': separator += '\n'
        self.separator = separator


    def __getitem__(self, index):
        if index != self.para_num:
            raise TypeError, "Only sequential access supported"
        self.para_num += 1
        # Start where we left off and skip 0+ separator lines
        while 1:
            # Propagate IndexError, if any, since we're finished if it occurs
            line = self.seq[self.line_num]
            self.line_num += 1
            if line != self.separator: break
        # Accumulate 1+ nonempty lines into result
        result = [line]
        while 1:
            # Intercept IndexError, since we have one last paragraph to return
            try:
                # Let's check if there's at least one more line in self.seq
                line = self.seq[self.line_num]
            except IndexError:
                # self.seq is finished, so we exit the loop
                break
            # Increment index into self.seq for next time
            self.line_num += 1
            if line == self.separator: break
            result.append(line)
        return ''.join(result)
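
The class defines only __getitem__, yet it can drive a for loop. This relies on Python's legacy sequence-iteration protocol: the loop calls __getitem__ with 0, 1, 2, and so on until IndexError is raised. A minimal sketch of that protocol follows, written in Python 3 syntax rather than the recipe's Python 2.1. The class name Countdown is hypothetical and is not part of the recipe.

```python
class Countdown:
    """Hypothetical sequential-only sequence, iterable via __getitem__ alone."""

    def __init__(self, start):
        self.start = start
        self.next_index = 0  # the only index we are willing to serve

    def __getitem__(self, index):
        if index != self.next_index:
            raise TypeError("Only sequential access supported")
        if index >= self.start:
            # IndexError tells the for loop that the sequence is exhausted
            raise IndexError(index)
        self.next_index += 1
        return self.start - index


# The loop asks for index 0, 1, 2, ... and stops at the first IndexError
values = [v for v in Countdown(3)]
print(values)
```

As in Paragraphs, asking for any index other than the next one raises TypeError, so the object supports iteration but not random access.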

# Here's an example function, showing how to use class Paragraphs:
def show_paragraphs(filename, numpars=5):
    pp = Paragraphs(open(filename))
    for p in pp:
        print "Par#%d, line# %d: %s" % (
            pp.para_num, pp.line_num, repr(p))
        if pp.para_num >= numpars: break    # stop after numpars paragraphs
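
The recipe targets Python 2.1 and earlier, and the xreadlines module it relies on no longer exists in current Python. For readers on Python 3, the same paragraph-splitting idea can be sketched as a generator. This is an illustrative rewrite under that assumption, not the authors' code; the names paragraphs and sample are made up for the example.

```python
import io


def paragraphs(lines, separator="\n"):
    """Yield each paragraph in `lines` as one string.

    `lines` is any iterable of newline-terminated strings, such as an
    open text file. A line equal to `separator` ends a paragraph.
    """
    # Mirror the recipe: make sure the separator ends with a newline
    if not separator.endswith("\n"):
        separator += "\n"
    current = []
    for line in lines:
        if line == separator:
            if current:  # runs of separator lines yield nothing
                yield "".join(current)
                current = []
        else:
            current.append(line)
    if current:  # last paragraph, when the input lacks a trailing separator
        yield "".join(current)


# Usage: an in-memory file stands in for open(filename)
sample = io.StringIO("first line\nsecond line\n\n\nthird line\n")
for number, text in enumerate(paragraphs(sample), start=1):
    print("Par#%d: %r" % (number, text))
```

Unlike the wrapper class, the generator does not track line numbers; wrapping the input in enumerate would be one way to recover them if they are needed.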

Discussion

...

See Also

PyCkBk-4-9 (last edited 2009-12-25 07:16:21 by localhost)