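This article is taken from the Python Cookbook.

The translation was made purely for personal study and has no connection to any commercial copyright dispute.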

-- 61.182.251.99 [DateTime(2004-09-21T05:42:12Z)]

Description

Retrieving a Line at Random from a File of Unknown Size

Problem

You have a file of unknown (but possibly very large) size, and you need to read one line from it at random without having to hold the whole file at once.

Solution

We do need to read the whole file, but we don't have to read it all at once:

import random

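def randomLine(file_object):
    "Read the file sequentially and return one of its lines, chosen at random."
    lineNum = 0
    selected_line = ''

    while True:
        aLine = file_object.readline()
        if not aLine:
            break
        lineNum += 1
        # Keep this line with probability 1/lineNum, which is the chance
        # that it would be the pick if the file ended here.
        if random.uniform(0, lineNum) < 1:
            selected_line = aLine
    # Note: this closes the caller's file object once the scan is done.
    file_object.close()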
    return selected_line

# Translator's note: see the Discussion for an explanation of the algorithm.
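
A short argument for why the recipe above gives every line the same chance: line k replaces the current pick with probability 1/k, and it then survives each later line j with probability (j-1)/j. For a file of n lines, its overall chance is therefore (1/k) * (k/(k+1)) * ... * ((n-1)/n) = 1/n. The sketch below is one way to check this empirically and is not part of the original recipe. The names pick_random_line and count_picks, the four-line sample text, and the trial count are all assumptions made for illustration; the helper restates the recipe's loop but does not close the file.

```python
import io
import random
from collections import Counter


def pick_random_line(file_object):
    """Single pass: line k replaces the current pick with probability 1/k."""
    selected = ''
    for line_num, line in enumerate(file_object, start=1):
        # random.uniform(0, line_num) < 1 holds with probability 1/line_num.
        if random.uniform(0, line_num) < 1:
            selected = line
    return selected


def count_picks(text, trials):
    """Run the selection `trials` times over an in-memory file and tally picks."""
    counts = Counter()
    for _ in range(trials):
        counts[pick_random_line(io.StringIO(text))] += 1
    return counts


if __name__ == "__main__":
    sample = "a\nb\nc\nd\n"  # hypothetical four-line file
    for line, n in sorted(count_picks(sample, 20000).items()):
        print(repr(line), n)
```

With enough trials, each tally should land close to the number of trials divided by the number of lines.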

Discussion

Of course, a more obvious approach is this:

random.choice(file_object.readlines())

However, this requires reading the entire file into memory, which can be a problem for truly large files.
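
To make the trade-off concrete, here is a minimal sketch that puts the two approaches side by side on a small temporary file. The function names simple_random_line, streaming_random_line and demo, and the five-line sample file, are hypothetical and chosen only for illustration. Unlike the recipe, both helpers take a path and open the file themselves.

```python
import os
import random
import tempfile


def simple_random_line(path):
    """Load every line into a list, then choose one (memory grows with the file)."""
    with open(path) as f:
        return random.choice(f.readlines())


def streaming_random_line(path):
    """Scan the file once, keeping only the current pick in memory."""
    selected = ''
    with open(path) as f:
        for line_num, line in enumerate(f, start=1):
            if random.uniform(0, line_num) < 1:
                selected = line
    return selected


def demo():
    """Write a small temporary file and pick a line from it both ways."""
    lines = ["line %d\n" % i for i in range(1, 6)]
    fd, path = tempfile.mkstemp(text=True)
    try:
        with os.fdopen(fd, "w") as f:
            f.writelines(lines)
        return lines, simple_random_line(path), streaming_random_line(path)
    finally:
        os.remove(path)  # clean up the temporary file


if __name__ == "__main__":
    _, simple_pick, streaming_pick = demo()
    print(repr(simple_pick), repr(streaming_pick))
```

One difference shows up on an empty file: random.choice raises IndexError on the empty list that readlines returns, while the single-pass loop simply returns an empty string.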

...

See Also

PyCkBk-4-6 (last edited 2009-12-25 07:16:21 by localhost)