ZODB

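Overview

Background

At the study session on 19 February 2005, limodou floated the idea of using ZODB for knowledge storage. He dug the hole, so I jumped in first.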

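Study notes

The "ZODB/ZEO Programming Guide" is only 25 pages long and took me three hours to read. For now, here are some notes that are neither a translation nor a review.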

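Installing ZODB

The Windows version can be downloaded from [http://zope.org/Products/ZODB3.2].
On BSD, install it directly from ports/databases/zodb3.
ZODB consists mainly of a few important packages such as ZODB, ZEO and BTrees, all of which can run independently of Zope. In fact ZODB is Zope's bottom layer: the whole of Zope is built on top of it.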

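Basic concepts

Although ZODB is an OODB, it still shares some concepts with relational databases.
ZODB offers a choice of storage back ends: an ordinary file (FileStorage), DB4, or a ZEO connection.
A Python class becomes persistable in ZODB by inheriting from Persistent.
ZODB is transaction-based.
ZODB's logical structure is a network (graph) of objects; the most basic ZODB is a tree whose root is the root object.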

Example

Let's start with a runnable example taken from the "ZODB/ZEO Programming Guide".

    from ZODB import FileStorage, DB
    import ZODB
    from Persistence import Persistent
    from BTrees.OOBTree import OOBTree

    class User(Persistent):
        pass

    def test1():
        storage = FileStorage.FileStorage("test-filestorage.fs")
        db = DB(storage)
        conn = db.open()
        dbroot = conn.root()
        # Ensure that a 'userdb' key is present
        # in the root
        if not dbroot.has_key('userdb'):
            dbroot['userdb'] = OOBTree()
        userdb = dbroot['userdb']
        # Create new User instance
        newuser = User()
        # Add whatever attributes you want to track
        newuser.id = 'amk'
        newuser.first_name = 'Andrew'
        newuser.last_name = 'Kuchling'
        # Add object to the BTree, keyed on the ID
        userdb[newuser.id] = newuser
        # Commit the change
        get_transaction().commit()
        conn.close()
        storage.close()

    def test2():
        storage = FileStorage.FileStorage("test-filestorage.fs")
        db = DB(storage)
        conn = db.open()
        dbroot = conn.root()
        it = [dbroot]
        for t in it:
            for k, v in t.items():
                if isinstance(v, OOBTree):
                    print k, ':'
                    it.append(v)
                elif isinstance(v, User):
                    print 'Key:', k
                    print 'ID:', v.id
                    print 'first_name:', v.first_name
                    print 'last_name:', v.last_name

    if __name__ == "__main__":
        test1()
        test2()

test1 writes data to the database, and test2 reads it back.

Step-by-step breakdown

Connect to the database; this example uses an ordinary file:

    from ZODB import FileStorage, DB
    storage = FileStorage.FileStorage('/tmp/test-filestorage.fs')
    db = DB(storage)
    conn = db.open()

Create a persistent class, User:

    import ZODB
    from Persistence import Persistent

    class User(Persistent):
        pass

Get the database root; if it has no 'userdb' entry yet, add an OOBTree under that key:

    dbroot = conn.root()
    # Ensure that a 'userdb' key is present
    # in the root
    if not dbroot.has_key('userdb'):
        from BTrees.OOBTree import OOBTree
        dbroot['userdb'] = OOBTree()
    userdb = dbroot['userdb']

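userdb is an OOBTree instance, while dbroot is the persistent mapping that the connection returns as the root. BTrees are explained later; for now, you can think of both as persistent versions of dict.

Insert a User record into userdb: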

    # Create new User instance
    newuser = User()
    # Add whatever attributes you want to track
    newuser.id = 'amk'
    newuser.first_name = 'Andrew'
    newuser.last_name = 'Kuchling'
    # ...
    # Add object to the BTree, keyed on the ID
    userdb[newuser.id] = newuser
    # Commit the change
    get_transaction().commit()

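You may wonder where get_transaction() comes from and what it does. get_transaction is added to the builtins when ZODB is imported; it returns the current transaction.

A transaction has two methods, 'commit' and 'abort', which respectively commit and discard the changes.
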
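To make the commit/abort distinction concrete, here is a minimal sketch in plain Python 3. `ToyStore` is a hypothetical class invented for this illustration and is not part of ZODB; it only mimics the idea that changes stay pending until they are committed and are thrown away on abort.

```python
import copy


class ToyStore:
    """Hypothetical stand-in for a transactional store (not ZODB's API)."""

    def __init__(self):
        self._committed = {}   # state as of the last commit
        self.working = {}      # state the program reads and modifies

    def commit(self):
        # Make the pending changes permanent.
        self._committed = copy.deepcopy(self.working)

    def abort(self):
        # Throw away pending changes and return to the last commit.
        self.working = copy.deepcopy(self._committed)


store = ToyStore()
store.working['amk'] = 'Andrew Kuchling'
store.commit()                      # 'amk' is now permanent

store.working['tmp'] = 'scratch'
store.abort()                       # 'tmp' is discarded, 'amk' survives

print(sorted(store.working))
```

Only the entry that was committed is left after the abort, which is the behaviour the two methods are named for.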
Close the database connection:

    conn.close()
    storage.close()

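If the storage is not closed, test2 cannot run, because FileStorage does not allow the same file to be opened a second time while it is still open.

Reading the data: test2()

First connect to the database, just as in test1:

    storage = FileStorage.FileStorage("test-filestorage.fs")
    db = DB(storage)
    conn = db.open()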

Then get dbroot:

    dbroot = conn.root()

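Because ZODB is tree-structured, I traverse the tree. Since newly found subtrees are appended to the end of the list being iterated, the traversal is breadth-first rather than depth-first: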

    it = [dbroot]
    for t in it:
        print t
        for k, v in t.items():
            if isinstance(v, OOBTree):
                print k, ':'
                it.append(v)
            elif isinstance(v, User):
                print 'Key:', k
                print 'ID:', v.id
                print 'first_name:', v.first_name
                print 'last_name:', v.last_name

This is only an experiment, to prove that test1 really did store data in the database.
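The traversal trick used in test2, appending to a list while a `for` loop is iterating over it, makes the list behave like a queue, so containers are visited level by level. A minimal Python 3 sketch with plain nested dicts standing in for the database (the sample data is made up for illustration):

```python
# Plain dicts stand in for the persistent containers; the data is invented.
root = {
    'userdb': {'groups': {}},
    'settings': {},
}

visited = []                     # container names in the order visited
queue = [('root', root)]
for name, container in queue:    # the list grows while we iterate over it
    visited.append(name)
    for key, value in container.items():
        if isinstance(value, dict):
            queue.append((key, value))   # nested containers are visited later

print(visited)
```

Here 'settings' is visited before 'groups', even though 'groups' is nested inside the first child; a depth-first walk would have reached 'groups' first.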

That completes the walk through the example.

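ZODB's relational model

Besides the Persistent class, which makes Python classes persistent, ZODB also provides PersistentMapping and PersistentList. As the names suggest, they emulate mapping and list structures (dict and list in Python).

Why provide these two structures? Because ZODB cannot detect changes made in place to ordinary Python mutable objects (dict and list). When a ZODB object is changed, it should be marked dirty so that commit knows exactly which data needs to be written. ZODB objects use the '_p_changed' attribute to mark dirty data.

However, modifying a mutable attribute in place does not update _p_changed, so the user has to set it by hand, for example: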

    userobj.friends.append(otherUser)
    userobj._p_changed = 1
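Why the manual flag is needed can be seen in a toy model. The `Tracked` class below is a hypothetical simplification written in plain Python 3, not ZODB's `Persistent`; it assumes that change detection works by intercepting attribute assignment, which is enough to show why an in-place `append` goes unnoticed.

```python
class Tracked:
    """Toy model of change tracking (hypothetical, not ZODB's Persistent)."""

    def __init__(self):
        # Bypass our own __setattr__ so that construction is not "dirty".
        object.__setattr__(self, 'changed', False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name != 'changed':
            # Any ordinary attribute assignment marks the object dirty.
            object.__setattr__(self, 'changed', True)


user = Tracked()
user.friends = []                 # attribute assignment: detected
dirty_after_assignment = user.changed
user.changed = False              # pretend the object was just committed

user.friends.append('bob')        # in-place mutation: __setattr__ never runs
dirty_after_append = user.changed
print(dirty_after_assignment, dirty_after_append)

user.changed = True               # the manual step, like _p_changed = 1
```

The append changes the list but never touches the owning object's attributes, so the flag stays clean until it is set by hand.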

PersistentMapping and PersistentList only solve the correctness problem. BTrees are the truly ZODB-native solution.
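The correctness part can be pictured as a container that tracks its own modifications. The sketch below is a toy `ToyPersistentList` (a hypothetical name, covering only `append`) in plain Python 3 and not the real PersistentList; it merely shows the idea of a list that marks itself dirty when it is mutated.

```python
from collections import UserList


class ToyPersistentList(UserList):
    """Toy list that records in-place changes (not the real PersistentList)."""

    def __init__(self, initial=None):
        super().__init__(initial)
        self.changed = False

    def append(self, item):
        super().append(item)
        self.changed = True      # the container marks itself dirty


friends = ToyPersistentList()
friends.append('bob')
print(friends.changed, list(friends))
```

With such a container the caller no longer has to remember the manual flag, because the mutation and the dirty mark happen in the same method.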

BTree

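Anyone who has studied data structures will find BTree familiar: it is a balanced tree. Strictly speaking it is a multiway balanced tree of the B-tree family rather than a balanced binary tree. To handle very large amounts of data, ZODB introduces BTree as a mapping implementation; it is used much like a dict.
A BTree is accessed on demand: data is read into memory only when it is used, so very large mapping structures can be handled.
Because a BTree is balanced, lookups by key are very fast, with time complexity on the order of O(log n).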
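The lookup behaviour can be pictured with a toy two-level structure: sorted keys are split into small buckets, and a lookup first picks the one bucket that can contain the key, then searches inside it. `ToyBucketTree` and its `loaded` set are hypothetical names for this Python 3 illustration; the real BTrees package is implemented differently, so treat this only as a picture of on-demand access.

```python
from bisect import bisect_left, bisect_right


class ToyBucketTree:
    """Toy sorted mapping split into buckets (not the real BTrees code)."""

    def __init__(self, items, bucket_size=4):
        pairs = sorted(items)
        # Each bucket is a sorted list of (key, value) pairs.
        self.buckets = [pairs[i:i + bucket_size]
                        for i in range(0, len(pairs), bucket_size)]
        # Smallest key of each bucket, used to choose a bucket.
        self.first_keys = [bucket[0][0] for bucket in self.buckets]
        self.loaded = set()   # indexes of buckets touched so far

    def get(self, key, default=None):
        if not self.buckets:
            return default
        # Binary search over the bucket boundaries.
        index = max(bisect_right(self.first_keys, key) - 1, 0)
        self.loaded.add(index)          # only this bucket is "read"
        bucket = self.buckets[index]
        keys = [k for k, _ in bucket]
        pos = bisect_left(keys, key)    # binary search inside the bucket
        if pos < len(keys) and keys[pos] == key:
            return bucket[pos][1]
        return default


tree = ToyBucketTree((n, 'user%d' % n) for n in range(100))
found = tree.get(42)
print(found, len(tree.loaded), len(tree.buckets))
```

A single lookup touches one bucket out of twenty-five, which is the property that lets a large mapping stay mostly out of memory.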
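The BTrees package offers several classes to choose from. It provides four data structures: BTree, Bucket, Set and TreeSet. The key and value types are marked 'I' for integer (Int) and 'O' for object (Object); prefixing the data structures with these letters gives the classes available in BTrees: OOBTree, OOBucket, OOSet, OOTreeSet, IOBTree, IOBucket, IOSet, IOTreeSet, OIBTree, OIBucket, OISet, OITreeSet, IIBTree, IIBucket, IISet, IITreeSet.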
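The naming scheme is mechanical, so the sixteen names can be generated from the two type letters and the four structures. This Python 3 snippet only builds the names as strings and does not import the BTrees package; the variable names are chosen for this illustration.

```python
type_letters = 'OI'          # O = object, I = integer
structures = ['BTree', 'Bucket', 'Set', 'TreeSet']

# First letter is the key type, second letter is the value type.
names = [key_type + value_type + structure
         for value_type in type_letters
         for key_type in type_letters
         for structure in structures]

print(len(names))
print(names[:4])
```

The loop order is chosen so that the result comes out in the same order as the list in the text.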
