Counting Duplicate Items in a List
Question
2008/12/11 卢熙 <[email protected]>
- I want to achieve the following:

    alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
    adict = fn(alist)
    print(adict)  # {'aaa': 3, 'bbb': 1, 'ccc': 2}

- In practice, len(alist) may well exceed 100,000. How should fn be written so that it does this very efficiently?
Solution 1: a for loop
萧萧 <[email protected]> reply-to [email protected] to [email protected] date Thu, Dec 11, 2008 at 22:51 subject [CPyUG:73576] Re: How to efficiently count the duplicate items in a list
>>> alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
>>> adict = {}
>>> for i in alist:
...     try:
...         adict[i] += 1
...     except KeyError:
...         adict[i] = 1
...
>>> adict
{'aaa': 3, 'bbb': 1, 'ccc': 2}
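A minimal Python 3 sketch of Solution 1, packaged as the fn the question asks for (the name count_try is hypothetical). It catches only KeyError. Python 3.7+ dicts keep insertion order, so the result prints in first-seen order rather than the order shown in the 2008 session above.

```python
def count_try(items):
    """Count occurrences with try/except, as in Solution 1."""
    counts = {}
    for item in items:
        try:
            counts[item] += 1  # fast path: the key already exists
        except KeyError:
            counts[item] = 1  # first occurrence of this key
    return counts


alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
print(count_try(alist))  # {'aaa': 3, 'ccc': 2, 'bbb': 1}
```

The try branch is cheap when most items repeat; every first occurrence pays for a raised exception, which is what the Comparison section measures.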
Solution 2: count()
萧萧 <[email protected]> reply-to [email protected] to [email protected] date Fri, Dec 12, 2008 at 11:18
alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
adict = dict([(i, alist.count(i)) for i in set(alist)])
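Solution 2 is short, but list.count() scans the whole list on every call, so it makes one full pass per distinct value: O(n·k) time for n items and k distinct values. Below is a sketch of the same idea as a function (the name count_with_count is hypothetical), written with a dict comprehension, which needs Python 2.7 or later.

```python
def count_with_count(items):
    """Solution 2 style: one list.count() pass per distinct value.

    Each items.count(value) call scans the entire list, so the total
    work is O(n * k) for n items and k distinct values.
    """
    return {value: items.count(value) for value in set(items)}


alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
print(sorted(count_with_count(alist).items()))  # [('aaa', 3), ('bbb', 1), ('ccc', 2)]
```

With only a handful of distinct values, as here, this is fine. With 100,000 mostly distinct items it approaches n² comparisons, so a single-pass approach fits the question better.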
Solution 3: fromkeys()
don li <[email protected]> reply-to [email protected] to [email protected] date Fri, Dec 12, 2008 at 11:53
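The body of this message is missing from this copy of the page, so the original code is unknown. The sketch below is an assumption about what a fromkeys()-based counter typically looks like, not don li's code; the name count_fromkeys is hypothetical. It pre-creates every key with a count of 0, so the counting loop needs neither a membership test nor an exception handler.

```python
def count_fromkeys(items):
    """Hypothetical fromkeys() variant: seed every key with 0, then increment."""
    counts = dict.fromkeys(items, 0)  # one entry per distinct item, value 0
    for item in items:
        counts[item] += 1  # every key already exists, so no KeyError
    return counts


alist = ['aaa', 'ccc', 'bbb', 'aaa', 'aaa', 'ccc']
print(count_fromkeys(alist))  # {'aaa': 3, 'ccc': 2, 'bbb': 1}
```

This makes two passes over the data. Passing 0 as the shared fromkeys() value is safe because ints are immutable; a mutable value such as [] would be the same object under every key.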
Comparison
[email protected]> reply-to [email protected] to python-cn`CPyUG`华蟒用户组 <[email protected]> date Sat, Dec 13, 2008 at 01:26 subject [CPyUG:73653] Re: How to efficiently count the duplicate items in a list
time python test2.py
865149
real    0m4.840s
user    0m4.610s
sys     0m0.210s

time python test3.py
865113
real    0m5.724s
user    0m5.490s
sys     0m0.220s
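- test2.py (membership test with has_key(); Python 2):

    #!/usr/bin/env python
    import random

    li = []
    d = {}
    # 2,000,000 random integers in the range [0, 10**6)
    for i in range(10 ** 6 * 2):
        li.append(int(random.random() * 10 ** 6))
    # Check whether the key exists before updating it
    for e in li:
        if d.has_key(e):  # Python 2 only; write `e in d` on Python 3
            d[e] = d[e] + 1
        else:
            d[e] = 1
    print len(d)  # number of distinct values

- test3.py (try/except; Python 2):

    #!/usr/bin/env python
    import random

    li = []
    d = {}
    # 2,000,000 random integers in the range [0, 10**6)
    for i in range(10 ** 6 * 2):
        li.append(int(random.random() * 10 ** 6))
    # Assume the key exists; the first occurrence is handled in except
    for e in li:
        try:
            d[e] = d[e] + 1
        except KeyError:
            d[e] = 1
    print len(d)  # number of distinct values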
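Why test3.py is slower: with 2 × 10**6 draws from 10**6 possible values, the expected number of distinct keys is about 10**6 × (1 − e^−2) ≈ 864,665, consistent with the 865,149 and 865,113 printed above. So in test3.py roughly 43% of iterations raise and catch KeyError, which costs considerably more than a membership test. The Python 3 sketch below reruns the comparison with timeit on a smaller input; count_lbyl and count_eafp are hypothetical names, and `e in d` stands in for has_key(), which Python 3 removed.

```python
import random
import timeit


def count_lbyl(items):
    """test2.py style: check membership first ("look before you leap")."""
    d = {}
    for e in items:
        if e in d:  # Python 3 replacement for d.has_key(e)
            d[e] = d[e] + 1
        else:
            d[e] = 1
    return d


def count_eafp(items):
    """test3.py style: assume the key exists and handle KeyError."""
    d = {}
    for e in items:
        try:
            d[e] = d[e] + 1
        except KeyError:
            d[e] = 1
    return d


if __name__ == "__main__":
    random.seed(2008)  # fixed seed so both functions see the same data
    n = 10 ** 5  # 10x smaller than the scripts above, to keep runs short
    data = [random.randrange(n) for _ in range(2 * n)]
    assert count_lbyl(data) == count_eafp(data)
    print("distinct keys:", len(count_lbyl(data)))
    for func in (count_lbyl, count_eafp):
        seconds = timeit.timeit(lambda: func(data), number=5)
        print(f"{func.__name__}: {seconds:.3f} s for 5 runs")
```

Absolute timings depend on the interpreter and machine, so only the relative ordering is worth comparing with the 2008 numbers.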
reply-to [email protected] to [email protected] date Sat, Dec 13, 2008 at 03:05 subject [CPyUG:73655] Re: How to efficiently count the duplicate items in a list
$ time python test_dict_speed.py
(5.0090830326080322, 9.3741579055786133)
real    0m33.376s
user    0m32.002s
sys     0m0.872s
$ cat test_dict_speed.py
import random
import time

MAX = 10 ** 6

# 2,000,000 random integers in [1, MAX] (xrange: Python 2)
ls = [random.randint(1, MAX) for x in xrange(2 * MAX)]

# Approach A: dict.get() with a default of 0
t0 = time.time()
d = {}
for x in ls:
    d[x] = d.get(x, 0) + 1
t1 = time.time()

# Approach B: try/except KeyError
d = {}
for e in ls:
    try:
        d[e] = d[e] + 1
    except KeyError:
        d[e] = 1
t2 = time.time()

# Elapsed seconds for A and B
print (t1 - t0, t2 - t1)
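As a hedged extension of test_dict_speed.py (not part of the original thread), the same data can also be counted with collections.defaultdict(int), available since Python 2.5, and collections.Counter, which was added in Python 2.7/3.1, after this discussion. The function names below are hypothetical, and no timings are claimed here; run it on your own interpreter to compare.

```python
import random
import time
from collections import Counter, defaultdict


def count_get(items):
    """Approach A of test_dict_speed.py: dict.get() with a default of 0."""
    d = {}
    for x in items:
        d[x] = d.get(x, 0) + 1
    return d


def count_defaultdict(items):
    """defaultdict(int) creates each missing key with int() == 0."""
    d = defaultdict(int)
    for x in items:
        d[x] += 1
    return dict(d)  # convert back to a plain dict


def count_counter(items):
    """Counter runs the counting loop itself."""
    return dict(Counter(items))


if __name__ == "__main__":
    MAX = 10 ** 6  # same sizes as test_dict_speed.py
    ls = [random.randint(1, MAX) for _ in range(2 * MAX)]
    for func in (count_get, count_defaultdict, count_counter):
        t0 = time.perf_counter()
        func(ls)
        print(func.__name__, round(time.perf_counter() - t0, 3), "s")
```

Counter is itself a dict subclass, so the dict() conversion is only there to make the three return values directly comparable.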
Feedback
Created by ZoomQuiet, 2008-12-12T01:33:16Z