
This article is taken from the Python Cookbook.

This translation is for personal study only; it has nothing to do with any commercial copyright disputes!

-- 61.182.251.99, 2004-09-23 16:58:36 (UTC)

Description

Reading Lines with Continuation Characters

Problem

You have a file that includes long logical lines split over two or more physical lines, with backslashes to indicate that a continuation line follows. You want to process a sequence of logical lines, rejoining those split lines.


Solution

As usual, a class is the right way to wrap this functionality in Python 2.1:

class LogicalLines:

    def __init__(self, fileobj):
        # Ensure that we get a line-reading sequence in the best way possible:
        import xreadlines
        try:
            # Check if the file-like object has an xreadlines method
            self.seq = fileobj.xreadlines()
        except AttributeError:
            # No, so fall back to the xreadlines module's implementation
            self.seq = xreadlines.xreadlines(fileobj)
        self.phys_num = 0  # current index into self.seq (physical line number)
        self.logi_num = 0  # current index into self (logical line number)

    def __getitem__(self, index):
        if index != self.logi_num:
            raise TypeError, "Only sequential access supported"
        self.logi_num += 1
        result = []
        while 1:
            # Intercept IndexError, since we may have a last line to return
            try:
                # Let's see if there's at least one more line in self.seq
                line = self.seq[self.phys_num]
            except IndexError:
                # self.seq is finished, so break the loop if we have any
                # more data to return; else, reraise the exception, because
                # if we have no further data to return, we're finished too
                if result:
                    break
                else:
                    raise
            self.phys_num += 1
            if line.endswith('\\\n'):
                # Continuation: drop the backslash and newline, keep reading
                result.append(line[:-2])
            else:
                result.append(line)
                break
        return ''.join(result)

# Here's an example function, showing off usage:
def show_logicals(fileob, numlines=5):
    ll = LogicalLines(fileob)
    for l in ll:
        print "Log#%d, phys# %d: %s" % (
            ll.logi_num, ll.phys_num, repr(l))
        if ll.logi_num > numlines:
            break

if __name__ == '__main__':
    from cStringIO import StringIO
    ff = StringIO(
r"""prima \
seconda \
terza
quarta \
quinta
sesta
settima \
ottava
""")
    show_logicals(ff)
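Python 3 has neither the xreadlines module nor a file xreadlines() method, and print is a function there, so the class above runs only on Python 2. The following is a rough sketch, assuming Python 3, of the same sequential-only class: it pulls lines from the file's own iterator but still exposes the old __getitem__ protocol, which a Python 3 for loop honours as a fallback when no __iter__ is defined. The sample data is an illustrative subset of the recipe's test string.

```python
import io


class LogicalLines:
    """Python 3 sketch: logical lines via sequential-only __getitem__."""

    def __init__(self, fileobj):
        # Files are already line iterators in Python 3; no xreadlines needed.
        self.lines = iter(fileobj)
        self.phys_num = 0  # physical lines consumed so far
        self.logi_num = 0  # logical lines returned so far

    def __getitem__(self, index):
        if index != self.logi_num:
            raise TypeError("Only sequential access supported")
        result = []
        for line in self.lines:
            self.phys_num += 1
            if line.endswith('\\\n'):
                result.append(line[:-2])  # drop backslash + newline, continue
            else:
                result.append(line)
                break
        if not result:
            # No data left: IndexError ends the for loop that drives us.
            raise IndexError(index)
        self.logi_num += 1
        return ''.join(result)


sample = io.StringIO("prima \\\nseconda \\\nterza\nquarta \\\nquinta\n")
ll = LogicalLines(sample)
for logical in ll:
    print("Log#%d, phys# %d: %r" % (ll.logi_num, ll.phys_num, logical))
```

Run as is, this prints two lines: logical line 1 ends at physical line 3 ('prima seconda terza\n'), and logical line 2 ends at physical line 5 ('quarta quinta\n'). As in the original, a file that ends right after a continuation line still yields its final partial logical line.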

Discussion

This is another sequence-bunching problem, like Recipe 4.9. In Python 2.1, a class wrapper is the most natural way to get reusable code for sequence-bunching tasks. We must implement the sequence protocol ourselves and also consume it correctly from the sequence we wrap. In Python 2.1 and earlier, the sequence protocol is as follows: a sequence must be indexable by successively larger integers (0, 1, 2, ...), and it must raise an IndexError as soon as an integer that is too large is used as its index. So, to work with Python 2.1 and earlier, our class must behave this way itself and be prepared for exactly this behavior from the sequence it wraps.
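To see this protocol in isolation, here is a small sketch with a hypothetical Squares class (not part of the recipe), written in Python 3, which still falls back to it. The class defines only __getitem__; the loop machinery asks for items 0, 1, 2, ... and stops quietly at the first IndexError.

```python
class Squares:
    """Hypothetical example: indexable by 0, 1, 2, ... until IndexError."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        if index >= self.count:
            # Signals "no more items" under the old sequence protocol.
            raise IndexError(index)
        return index * index


# list() (like a for loop) calls __getitem__(0), __getitem__(1), ...
# and stops at the first IndexError.
print(list(Squares(4)))  # [0, 1, 4, 9]
```

Nothing here defines __iter__ or __len__; the IndexError alone tells the caller where the sequence ends, which is exactly the contract LogicalLines both honours and relies on.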

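In Python 2.2, thanks to iterators, iteration is much simpler. A call to an iterator's next method returns its next item, and the iterator raises StopIteration when it is exhausted. Combined with a generator function (any function containing yield, which returns an iterator when called), this makes sequence bunching and similar tasks far easier. In Python 2.2, generators must be enabled with a from __future__ import generators statement; from Python 2.3 on they are always available: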

from __future__ import generators

def logical_lines(fileobj):
    logical_line = []
    for physical_line in fileobj:
        if physical_line.endswith('\\\n'):
            # Continuation: drop the backslash and newline, keep collecting
            logical_line.append(physical_line[:-2])
        else:
            yield ''.join(logical_line) + physical_line
            logical_line = []
    # The file may end right after a continuation line
    if logical_line:
        yield ''.join(logical_line)
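The class version tracks physical line numbers (phys_num), which is handy for error messages; the generator version loses them. As a sketch, assuming Python 3, a hypothetical variant named numbered_logical_lines (not from the recipe) can recover that information by yielding the 1-based physical line number on which each logical line starts, using enumerate.

```python
import io


def numbered_logical_lines(fileobj):
    """Yield (first_physical_line_number, logical_line) pairs.

    Hypothetical variant of logical_lines(); numbers are 1-based.
    """
    pieces = []   # fragments of the logical line being assembled
    start = None  # physical line number where that logical line began
    for phys_num, physical_line in enumerate(fileobj, 1):
        if start is None:
            start = phys_num
        if physical_line.endswith('\\\n'):
            pieces.append(physical_line[:-2])  # strip backslash + newline
        else:
            pieces.append(physical_line)
            yield start, ''.join(pieces)
            pieces, start = [], None
    if pieces:
        # Input ended right after a continuation line.
        yield start, ''.join(pieces)


sample = io.StringIO("prima \\\nseconda \\\nterza\nquarta \\\nquinta\nsesta\n")
for start, line in numbered_logical_lines(sample):
    print(start, repr(line))
```

With this sample, the logical lines start on physical lines 1, 4, and 6. Reporting the starting line rather than the ending one is a design choice; a parser that flags a syntax error usually wants to point at where the statement began.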

See Also

PyCkBk-4-10 (last edited 2009-12-25 07:09:08 by localhost)