|
Size: 1775
Comment:
|
Size: 2240
Comment: Removed the reference to the PageComment2 component
|
| Deletions are marked like this. | Additions are marked like this. |
| Line 1: | Line 1: |
| ## page was renamed from MicroProj/2008-01-02 | |
| Line 17: | Line 18: |
| subject [CPyUG:37589] Question about building a dictionary from a large amount of data }}} |
subject [CPyUG:37589]}}}[http://groups.google.com/group/python-cn/t/cb06efbb633fa35d Question about building a dictionary from a large amount of data] |
| Line 49: | Line 50: |
| a = struct.unpack('llll', x) | a = struct.unpack('IIII', x) |
| Line 55: | Line 56: |
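== shelve ==
shelve is basically workable; the adapted code is as follows:
{{{#!python
import shelve
import struct  # needed for struct.unpack below

my_file = open('test.dat', 'rb')
content = my_file.read()
my_file.close()
record_number = len(content) // 16  # each record is four 4-byte unsigned ints
db = shelve.open('test.dat.db')
for i in range(0, record_number):
    a = struct.unpack("IIII", content[i*16:i*16+16])
    # shelve keys must be strings, so (intA, intB) is encoded as "intA_intB"
    db[str(a[0]) + '_' + str(a[1])] = (a[2], a[3])
db.sync()
}}} |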
|
| Line 56: | Line 75: |
| [[PageComment2]] |
::-- ZoomQuiet [DateTime(2008-01-02T05:40:31Z)]
1. Building a Large Dictionary
1.1. Problem
{{{wanzathe <[email protected]> reply-to [email protected], to "python-cn:CPyUG" <[email protected]>, date Jan 2, 2008 1:20 PM subject [CPyUG:37589]}}}[http://groups.google.com/group/python-cn/t/cb06efbb633fa35d Question about building a dictionary from a large amount of data]
I have a data file, test.dat, stored in binary format (each record is intA+intB+intC+intD), and I want to build a dictionary {(intA,intB):(intC,intD)} from it:
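{{{#!python
import struct  # needed for struct.unpack below

myDict = {}
input_file = open('./test.dat', 'rb')
content = input_file.read()  # reads the whole file into memory at once
input_file.close()
record_number = len(content) // 16  # each record is four 4-byte unsigned ints
for i in range(0, record_number):
    a = struct.unpack("IIII", content[i*16:i*16+16])
    myDict[(a[0], a[1])] = (a[2], a[3])
}}}
Problem: test.dat is very large (150 MB, roughly 10 million records), so this approach is basically unworkable: it is slow and uses far too much memory :( I am at a loss right now, and any pointers would be greatly appreciated!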
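To experiment with the record layout without the real 150 MB file, a small sample file can be generated first. The following is only a sketch: the helper name `write_sample`, the temporary path, and the two sample records are assumptions made for illustration and do not come from the original thread.

```python
import os
import struct
import tempfile

# Four unsigned ints per record, the same "IIII" layout used in the post.
RECORD = struct.Struct("IIII")


def write_sample(path, records):
    """Write (intA, intB, intC, intD) tuples to path, one packed record each."""
    with open(path, "wb") as f:
        for rec in records:
            f.write(RECORD.pack(*rec))


# Hypothetical sample data in a throwaway directory.
sample_path = os.path.join(tempfile.mkdtemp(), "test.dat")
write_sample(sample_path, [(1, 2, 3, 4), (5, 6, 7, 8)])
sample_size = os.path.getsize(sample_path)

# Read the first record back to confirm the round trip.
with open(sample_path, "rb") as f:
    first_record = RECORD.unpack(f.read(RECORD.size))
```

The file size should equal the number of records times `RECORD.size`, which is the same relationship the post relies on when it divides the content length by 16.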
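1.2. Solution 1
Qiangning Hong <[email protected]> reply-to [email protected], to [email protected], date Jan 2, 2008 1:30 PM subject [CPyUG:37592] Re: Question about building a dictionary from a large amount of data
If you really do need a dict object, the following code should use somewhat less memory: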
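The code from the reply is not preserved in this copy of the page. As a stand-in, here is a minimal sketch of one common way to reduce memory use, assuming Python 3: read the file one 16-byte record at a time instead of loading all of it and slicing. The function name `build_dict` and the demo data are illustrative assumptions, not the reply's actual code.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("IIII")  # four unsigned ints per record


def build_dict(path):
    """Build {(intA, intB): (intC, intD)} by streaming fixed-size records."""
    result = {}
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file (or a truncated trailing record)
            a, b, c, d = RECORD.unpack(chunk)
            result[(a, b)] = (c, d)
    return result


# Demo on a tiny hypothetical file with two records.
demo_path = os.path.join(tempfile.mkdtemp(), "test.dat")
with open(demo_path, "wb") as f:
    f.write(RECORD.pack(1, 2, 3, 4))
    f.write(RECORD.pack(5, 6, 7, 8))
demo = build_dict(demo_path)
```

This avoids holding the whole file content and its slices in memory alongside the dictionary. The dictionary itself still grows with the number of records, so for about 10 million entries it will remain large.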
If you only want to access the data in a dict-like way, I suggest taking a look at the shelve module.
1.3. shelve
shelve is basically workable; the adapted code is as follows:
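A self-contained sketch of the shelve approach, assuming Python 3.4 or later (where `shelve.open` works as a context manager). It keeps the `intA_intB` string-key scheme from the adapted code on this page but streams records instead of reading the whole file. The function name `build_shelf` and the demo paths are assumptions for illustration.

```python
import os
import shelve
import struct
import tempfile

RECORD = struct.Struct("IIII")  # four unsigned ints per record


def build_shelf(data_path, shelf_path):
    """Store each record on disk under the string key 'intA_intB'."""
    with open(data_path, "rb") as f, shelve.open(shelf_path) as db:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file
            a, b, c, d = RECORD.unpack(chunk)
            # shelve keys must be strings, so the tuple key is flattened
            db["%d_%d" % (a, b)] = (c, d)


# Demo with one hypothetical record in a throwaway directory.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "test.dat")
with open(data_path, "wb") as f:
    f.write(RECORD.pack(1, 2, 3, 4))

shelf_path = os.path.join(workdir, "test.dat.db")
build_shelf(data_path, shelf_path)

with shelve.open(shelf_path) as db:
    looked_up = db["1_2"]
```

Because the entries live in an on-disk database, memory use stays small, at the cost of slower lookups than an in-memory dict.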
