Introduction
- This handcrafted package contains Python wrappers for Berkeley DB, the Open Source embedded database system. Berkeley DB is a programmatic toolkit that provides high-performance built-in database support for desktop and server applications. The Berkeley DB access methods include B+tree, Extended Linear Hashing, Fixed and Variable-length records, and Queues. Berkeley DB provides full transactional support, database recovery, online backups, multi-threaded and multi-process access, etc. The Python wrappers allow you to store Python string objects of any length, keyed either by strings or integers depending on the database access method. With the use of another module in the package, standard shelve-like functionality is provided, allowing you to store any picklable Python object!
Berkeley DB 4.x Python Extension Package
Introduction
- This is a simple bit of documentation for the bsddb3.db Python extension module which wraps the Berkeley DB 4.x C library. The extension module is located in a Python package along with a few pure Python modules. It is expected that this module will be used in the following general ways by different programmers in different situations. The goals of this module are to allow all of these methods without making things too complex for the simple cases, and without leaving out functionality needed by the complex cases.
Backwards compatibility: It is desirable for this package to be a near drop-in replacement for the bsddb module shipped with Python which is designed to wrap either DB 1.85, or the 1.85 compatibility interface. This means that there will need to be equivalent object creation functions available (btopen(), hashopen(), and rnopen()), and the objects returned will need to have the same or at least similar methods available (specifically, first(), last(), next(), and prev() will need to be available without the user needing to explicitly use a cursor). All of these have been implemented in Python code in the bsddb3.__init__.py module.
Simple persistent dictionary: One small step beyond the above. The programmer may be aware of and use the new DB object type directly, but only needs it from a single process and thread. The programmer should not have to be bothered with using a DBEnv, and the DB object should behave as much like a dictionary as possible.
Concurrent access dictionaries: This refers to the ability to simultaneously have one writer and multiple readers of a DB (either in multiple threads or processes) and is implemented simply by creating a DBEnv with certain flags. No extra work is required to allow this access mode in bsddb3.
Advanced transactional data store: This mode of use is where the full capabilities of the Berkeley DB library are called into action. The programmer will probably not use the dictionary access methods as much as the regular methods of the DB object, so that transaction objects can be passed to the methods. Again, most of this advanced functionality is activated simply by opening a DBEnv with the proper flags, and also by using transactions and being aware of and reacting to deadlock exceptions, etc.
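The transactional mode above mentions reacting to deadlock exceptions. The following is a minimal sketch of that pattern, assuming only the documented DBEnv.txn_begin() method and the commit() and abort() methods of DBTxn. The helper name run_in_txn, its retries parameter, and passing the deadlock exception class as an argument are illustrative assumptions, not part of bsddb3; in real use the class would presumably be bsddb3.db.DBLockDeadlockError.

```python
def run_in_txn(env, work, deadlock_error, retries=3):
    """Run work(txn) inside a transaction, retrying when a deadlock is reported.

    env            -- object with a txn_begin() method (a DBEnv in real use)
    work           -- hypothetical user callable that receives the transaction
    deadlock_error -- exception class that signals a deadlock
    retries        -- illustrative attempt limit, not a bsddb3 setting
    """
    for _ in range(retries):
        txn = env.txn_begin()
        try:
            result = work(txn)
        except deadlock_error:
            # This transaction lost the deadlock: undo its work and try again.
            txn.abort()
            continue
        except Exception:
            # Any other failure: undo the work and let the caller see the error.
            txn.abort()
            raise
        txn.commit()
        return result
    raise RuntimeError("transaction failed after %d deadlock retries" % retries)
```

The work callable would pass the transaction on to the DB methods it uses, for example through their txn keyword argument.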
Types Provided
- The bsddb3.db extension module provides the following object types:
DB: The basic database object, capable of Hash, BTree, Recno, and Queue access methods.
DBEnv: Provides a Database Environment for more advanced database use. Apps using transactions, logging, concurrent access, etc. will need to have an environment object.
DBCursor: A pointer-like object used to traverse a database.
DBTxn: A database transaction. Allows for multi-file commit, abort and checkpoint of database modifications.
DBLock: An opaque handle for a lock. See DBEnv.lock_get() and DBEnv.lock_put(). Locks are not necessarily associated with anything in the database, but can be used for any synchronization task across all threads and processes that have the DBEnv open.
DBSequence: Sequences provide an arbitrary number of persistent objects that return an increasing or decreasing sequence of integers. Opening a sequence handle associates it with a record in a database.
Exceptions Provided
- The Berkeley DB C API uses function return codes to signal various errors. The bsddb3.db module checks for these error codes and turns them into Python exceptions, allowing you to use familiar try:... except:... constructs and not have to bother with checking every method’s return value. Each of the error codes is turned into an exception specific to that error code, as outlined in the table below. If you are using the C API documentation then it is very easy to map the error return codes specified there to the name of the Python exception that will be raised. Simply refer to the table below.
Each exception derives from the DBError exception class so if you just want to catch generic errors you can use DBError to do it. Since DBNotFoundError is raised when a given key is not found in the database, DBNotFoundError also derives from the standard KeyError exception to help make a DB look and act like a dictionary. When any of these exceptions is raised, the associated value is a tuple containing an integer representing the error code and a string for the error message itself.
DBError | Base class, all others derive from this
DBIncompleteError | DB_INCOMPLETE
DBKeyEmptyError | DB_KEYEMPTY
DBKeyExistError | DB_KEYEXIST
DBLockDeadlockError | DB_LOCK_DEADLOCK
DBLockNotGrantedError | DB_LOCK_NOTGRANTED
DBNotFoundError | DB_NOTFOUND (also derives from KeyError)
DBOldVersionError | DB_OLD_VERSION
DBRunRecoveryError | DB_RUNRECOVERY
DBVerifyBadError | DB_VERIFY_BAD
DBNoServerError | DB_NOSERVER
DBNoServerHomeError | DB_NOSERVER_HOME
DBNoServerIDError | DB_NOSERVER_ID
DBInvalidArgError | EINVAL
DBAccessError | EACCES
DBNoSpaceError | ENOSPC
DBNoMemoryError | ENOMEM
DBAgainError | EAGAIN
DBBusyError | EBUSY
DBFileExistsError | EEXIST
DBNoSuchFileError | ENOENT
DBPermissionsError | EPERM
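The class relationships described in this section can be pictured with a small stand-in hierarchy. The classes below are defined locally for illustration and are not the real bsddb3.db exceptions; the error code is a made-up placeholder, and the lookup and describe helpers are hypothetical.

```python
# Stand-in classes that mirror the relationships described above.
# They are NOT the real bsddb3.db exception classes.
class DBError(Exception):
    """Base class; every other database exception derives from it."""


class DBNotFoundError(DBError, KeyError):
    """Stand-in for the DB_NOTFOUND exception; also a KeyError."""


FAKE_NOTFOUND_CODE = -1  # placeholder, not the real DB_NOTFOUND value


def lookup(key):
    # Hypothetical lookup that always fails, to show how the error surfaces.
    raise DBNotFoundError(FAKE_NOTFOUND_CODE, "key %r not found" % (key,))


def describe(key):
    try:
        lookup(key)
    except KeyError as exc:
        # Dictionary-style handling works because of the KeyError base class.
        # The associated value is a tuple: (error code, error message).
        code, message = exc.args
        return code, message
```

Catching DBError instead of KeyError in describe would handle the same exception, since the stand-in derives from both.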
Other Package Modules
dbshelve.py: This is an implementation of the standard Python shelve concept for storing objects that uses bsddb3 specifically, and also exposes some of the more advanced methods and capabilities of the underlying DB.
dbtables.py: This is a module by Gregory Smith that implements a simplistic table structure on top of a DB.
dbutils.py: A catch-all for Python code that is generally useful when working with DBs.
dbobj.py: Contains subclassable versions of DB and DBEnv.
dbrecio.py: Contains the DBRecIO class that can be used to do partial reads and writes from a DB record using a file-like interface. Contributed by Itamar Shtull-Trauring.
Testing
A full unit test suite is being developed to exercise the various object types, their methods and the various usage modes described in the introduction. PyUnit is used and the tests are structured such that they can be run unattended and automated. There are currently almost 300 test cases! (March 2008)
Reference
See the C language API online documentation on Oracle’s website for more details of the functionality of each of these methods. The names of all the Python methods should be the same or similar to the names in the C API. NOTE: All the methods shown below having more than one keyword argument are actually implemented using keyword argument parsing, so you can use keywords to provide optional parameters as desired. Those that have only a single optional argument are implemented without keyword parsing to help keep the implementation simple. If this is too confusing let me know and I’ll think about using keywords for everything.
DBEnv
DBEnv Attributes
- db_home
Database home directory (read-only).
DBEnv Methods
- DBEnv(flags=0)
Constructor. More info...
- set_rpc_server(host, cl_timeout=0, sv_timeout=0)
Establishes a connection for this dbenv to a RPC server. More info...
- close(flags=0)
Close the database environment, freeing resources. More info...
- open(homedir, flags=0, mode=0660)
Prepare the database environment for use. More info...
- remove(homedir, flags=0)
Remove a database environment. More info...
- dbremove(file, database=None, txn=None, flags=0)
Removes the database specified by the file and database parameters. If no database is specified, the underlying file represented by file is removed, incidentally removing all of the databases it contained. More info...
- dbrename(file, database=None, newname, txn=None, flags=0)
Renames the database specified by the file and database parameters to newname. If no database is specified, the underlying file represented by file is renamed, incidentally renaming all of the databases it contained. More info...
- set_encrypt(passwd, flags=0)
Set the password used by the Berkeley DB library to perform encryption and decryption. More info...
- set_timeout(timeout, flags)
Sets timeout values for locks or transactions in the database environment. More info...
- set_shm_key(key)
Specify a base segment ID for Berkeley DB environment shared memory regions created in system memory on VxWorks or systems supporting X/Open-style shared memory interfaces; for example, UNIX systems supporting shmget(2) and related System V IPC interfaces. More info...
- set_cachesize(gbytes, bytes, ncache=0)
Set the size of the shared memory buffer pool. More info...
- set_data_dir(dir)
Set the environment data directory. More info...
- set_flags(flags, onoff)
Set additional flags for the DBEnv. The onoff parameter specifies if the flag is set or cleared. More info...
- set_tmp_dir(dir)
Set the directory to be used for temporary files. More info...
- set_get_returns_none(flag)
- By default, when DB.get or DBCursor.get, get_both, first, last, next or prev encounter a DB_NOTFOUND error, they return None instead of raising DBNotFoundError. This behaviour emulates Python dictionaries and is convenient for looping. The flag parameter selects one of the following behaviours:
0 | All DB and DBCursor get and set methods raise a DBNotFoundError rather than returning None.
1 | Default in module version <4.2.4. The DB.get and DBCursor.get, get_both, first, last, next and prev methods return None.
2 | Default in module version >=4.2.4. Extends the behaviour of 1 to the DBCursor set, set_both, set_range and set_recno methods.
The default of returning None makes it easy to test the result of a lookup, or to loop over a cursor until it runs out of records, without having to catch DBNotFoundError (KeyError).
- Making the cursor set methods return None allows the same kind of loop to start from a cursor positioned with one of the set methods.
- The downside to this is that it is inconsistent with the rest of the package and noticeably diverges from the Oracle Berkeley DB API. If you prefer to have the get and set methods raise an exception when a key is not found, use this method to tell them to do so. Calling this method on a DBEnv object will set the default for all DBs later created within that environment. Calling it on a DB object sets the behaviour for that DB only. The previous setting is returned.
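The None-returning behaviour described for set_get_returns_none lends itself to short lookup and scan loops. The sketch below assumes objects that expose the documented get, first, next and set_range methods with the None-returning defaults; the helper names fetch, scan and scan_from are illustrative and not part of bsddb3.

```python
def fetch(db, key, on_found):
    """Call on_found(data) only if key exists; relies on get() returning None."""
    data = db.get(key)
    if data is not None:
        on_found(data)


def scan(cursor):
    """Collect every record; relies on first() and next() returning None."""
    records = []
    rec = cursor.first()
    while rec is not None:
        records.append(rec)
        rec = cursor.next()
    return records


def scan_from(cursor, start_key):
    """Collect records from start_key onwards.

    Relies on set_range() returning None when nothing matches, which is
    the behaviour described for flag value 2.
    """
    records = []
    rec = cursor.set_range(start_key)
    while rec is not None:
        records.append(rec)
        rec = cursor.next()
    return records
```

With flag value 0 each of these loops would instead need a try/except around the call that can fail.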
- set_private(object)
- Link an arbitrary object to the DBEnv.
- get_private()
- Return the object linked to the DBEnv.
- set_lg_bsize(size)
Set the size of the in-memory log buffer, in bytes. More info...
- set_lg_dir(dir)
The path of a directory to be used as the location of logging files. Log files created by the Log Manager subsystem will be created in this directory. More info...
- set_lg_max(size)
Set the maximum size of a single file in the log, in bytes. More info...
- get_lg_max()
Returns the maximum log file size. More info...
- set_lg_regionmax(size)
Set the maximum size of a single region in the log, in bytes. More info...
- set_lk_detect(mode)
Set the automatic deadlock detection mode. More info...
- set_lk_max(max)
Set the maximum number of locks. (This method is deprecated.) More info...
- set_lk_max_locks(max)
Set the maximum number of locks supported by the Berkeley DB lock subsystem. More info...
- set_lk_max_lockers(max)
Set the maximum number of simultaneous locking entities supported by the Berkeley DB lock subsystem. More info...
- set_lk_max_objects(max)
Set the maximum number of simultaneously locked objects supported by the Berkeley DB lock subsystem. More info...
- set_mp_mmapsize(size)
- Files that are opened read-only in the memory pool (and that satisfy a few other criteria) are, by default, mapped into the process address space instead of being copied into the local cache. This can result in better-than-usual performance, as available virtual memory is normally much larger than the local cache, and page faults are faster than page copying on many systems. However, in the presence of limited virtual memory it can cause resource starvation, and in the presence of large databases, it can result in immense process sizes.
This method sets the maximum file size, in bytes, for a file to be mapped into the process address space. If no value is specified, it defaults to 10MB. More info...
- log_archive(flags=0)
Returns a list of log or database file names. By default, log_archive returns the names of all of the log files that are no longer in use (e.g., no longer involved in active transactions), and that may safely be archived for catastrophic recovery and then removed from the system. More info...
- log_flush()
Force log records to disk. Useful if the environment, database or transactions are used as ACI, instead of ACID. For example, if the environment is opened as DB_TXN_NOSYNC. More info...
- log_set_config(flags, onoff)
Configures the Berkeley DB logging subsystem. More info...
- lock_detect(atype, flags=0)
Runs one iteration of the deadlock detector and returns the number of transactions aborted. More info...
- lock_get(locker, obj, lock_mode, flags=0)
Acquires a lock and returns a handle to it as a DBLock object. The locker parameter is an integer representing the entity doing the locking, and obj is an object representing the item to be locked. More info...
- lock_id()
Acquires a locker id, guaranteed to be unique across all threads and processes that have the DBEnv open. More info...
- lock_id_free(id)
Frees a locker ID allocated by the “dbenv.lock_id()” method. More info...
- lock_put(lock)
Release the lock. More info...
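The lock_id, lock_get, lock_put and lock_id_free methods above are naturally used together. The following is a sketch of one way to pair them, assuming an environment object with those four documented methods. The helper name held_lock is hypothetical, and the lock_mode value is whatever lock mode constant the application chooses (for example a write lock constant from bsddb3.db).

```python
from contextlib import contextmanager


@contextmanager
def held_lock(env, obj, lock_mode):
    """Hold a lock on obj for the duration of a with block.

    env is assumed to be a DBEnv-like object; obj identifies the item to lock.
    """
    locker = env.lock_id()  # locker ID for this locking entity
    try:
        lock = env.lock_get(locker, obj, lock_mode)
        try:
            yield lock
        finally:
            env.lock_put(lock)  # always release the lock
    finally:
        env.lock_id_free(locker)  # always give the locker ID back
```

The nested try/finally blocks make sure the locker ID is freed even if acquiring the lock itself fails.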
- lock_stat(flags=0)
- Returns a dictionary of locking subsystem statistics with the following keys:
id | Last allocated lock ID.
cur_maxid | The current maximum unused locker ID.
nmodes | Number of lock modes.
maxlocks | Maximum number of locks possible.
maxlockers | Maximum number of lockers possible.
maxobjects | Maximum number of objects possible.
nlocks | Number of current locks.
maxnlocks | Maximum number of locks at once.
nlockers | Number of current lockers.
nobjects | Number of current lock objects.
maxnobjects | Maximum number of lock objects at once.
maxnlockers | Maximum number of lockers at once.
nrequests | Total number of locks requested.
nreleases | Total number of locks released.
nupgrade | Total number of locks upgraded.
ndowngrade | Total number of locks downgraded.
lock_wait | The number of lock requests not immediately available due to conflicts, for which the thread of control waited.
lock_nowait | The number of lock requests not immediately available due to conflicts, for which the thread of control did not wait.
ndeadlocks | Number of deadlocks.
locktimeout | Lock timeout value.
nlocktimeouts | The number of lock requests that have timed out.
txntimeout | Transaction timeout value.
ntxntimeouts | The number of transactions that have timed out. This value is also a component of ndeadlocks, the total number of deadlocks detected.
objs_wait | The number of requests to allocate or deallocate an object for which the thread of control waited.
objs_nowait | The number of requests to allocate or deallocate an object for which the thread of control did not wait.
lockers_wait | The number of requests to allocate or deallocate a locker for which the thread of control waited.
lockers_nowait | The number of requests to allocate or deallocate a locker for which the thread of control did not wait.
locks_wait | The number of requests to allocate or deallocate a lock structure for which the thread of control waited.
locks_nowait | The number of requests to allocate or deallocate a lock structure for which the thread of control did not wait.
hash_len | Maximum length of a lock hash bucket.
regsize | Size of the region.
region_wait | Number of times a thread of control was forced to wait before obtaining the region lock.
region_nowait | Number of times a thread of control was able to obtain the region lock without waiting.
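As an illustration of how the lock_stat dictionary might be read, the sketch below derives a rough contention figure from three of the keys listed above. The function name lock_contention is hypothetical, and treating (lock_wait + lock_nowait) / nrequests as a contention measure is an interpretation of the key descriptions, not something the library defines.

```python
def lock_contention(stats):
    """Return the fraction of lock requests that ran into a conflict.

    stats is a dictionary shaped like the one returned by lock_stat().
    """
    conflicts = stats["lock_wait"] + stats["lock_nowait"]
    requests = stats["nrequests"]
    if not requests:
        return 0.0  # no requests yet, so no contention to report
    return float(conflicts) / requests
```

A value close to 0 suggests that most lock requests were granted immediately.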
- set_tx_max(max)
Set the maximum number of active transactions. More info...
- set_tx_timestamp(timestamp)
Recover to the time specified by timestamp rather than to the most current possible date. More info...
- txn_begin(parent=None, flags=0)
Creates and begins a new transaction. A DBTxn object is returned. More info...
- txn_checkpoint(kbyte=0, min=0, flag=0)
Flushes the underlying memory pool, writes a checkpoint record to the log and then flushes the log. More info...
- txn_stat()
- Returns a dictionary of transaction statistics.
- lsn_reset(file=None, flags=0)
This method allows database files to be moved from one transactional database environment to another. More info...
- log_stat(flags=0)
- Returns a dictionary of logging subsystem statistics with the following keys:
magic | The magic number that identifies a file as a log file.
version | The version of the log file type.
mode | The mode of any created log files.
lg_bsize | The in-memory log record cache size.
lg_size | The log file size.
record | The number of records written to this log.
w_mbytes | The number of megabytes written to this log.
w_bytes | The number of bytes over and above w_mbytes written to this log.
wc_mbytes | The number of megabytes written to this log since the last checkpoint.
wc_bytes | The number of bytes over and above wc_mbytes written to this log since the last checkpoint.
wcount | The number of times the log has been written to disk.
wcount_fill | The number of times the log has been written to disk because the in-memory log record cache filled up.
rcount | The number of times the log has been read from disk.
scount | The number of times the log has been flushed to disk.
cur_file | The current log file number.
cur_offset | The byte offset in the current log file.
disk_file | The log file number of the last record known to be on disk.
disk_offset | The byte offset of the last record known to be on disk.
maxcommitperflush | The maximum number of commits contained in a single log flush.
mincommitperflush | The minimum number of commits contained in a single log flush that contained a commit.
regsize | The size of the log region, in bytes.
region_wait | The number of times that a thread of control was forced to wait before obtaining the log region mutex.
region_nowait | The number of times that a thread of control was able to obtain the log region mutex without waiting.
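The log statistics report written volume as a megabyte count plus a remainder in bytes. The sketch below shows how the two parts might be combined into a single byte count. The function name bytes_written is hypothetical, and the code assumes that one megabyte here means 2**20 bytes.

```python
MEGABYTE = 1024 * 1024  # assumption: the *_mbytes keys count units of 2**20 bytes


def bytes_written(stats):
    """Return (total bytes written, bytes written since the last checkpoint).

    stats is a dictionary shaped like the one returned by log_stat().
    """
    total = stats["w_mbytes"] * MEGABYTE + stats["w_bytes"]
    since_checkpoint = stats["wc_mbytes"] * MEGABYTE + stats["wc_bytes"]
    return total, since_checkpoint
```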
- txn_recover()
- Returns a list of tuples (GID, TXN) of transactions prepared but still unresolved. This is used while doing environment recovery in an application using distributed transactions.
This method must be called only from a single thread at a time. It should be called after DBEnv recovery. More info...
- set_verbose(which, onoff)
Turns specific additional informational and debugging messages in the Berkeley DB message output on and off. To see the additional messages, verbose messages must also be configured for the application. More info...
- get_verbose(which)
Returns whether the specified which parameter is currently set or not. More info...
- set_event_notify(eventFunc)
Configures a callback function which is called to notify the process of specific Berkeley DB events. More info...
DBEnv Replication Manager Methods
This module automates many of the tasks needed to provide replication abilities in a Berkeley DB system. The module is fairly limited, but sufficient in many cases. More demanding users must use the full Base Replication API. This module requires POSIX support, so you must compile Berkeley DB with it if you want to be able to use the Replication Manager.
- repmgr_start(nthreads, flags)
Starts the replication manager. More info...
- repmgr_set_local_site(host, port, flags=0)
Specifies the host identification string and port number for the local system. More info...
- repmgr_add_remote_site(host, port, flags=0)
- Adds a new replication site to the replication manager’s list of known sites. It is not necessary for all sites in a replication group to know about all other sites in the group.
Returns the environment ID assigned to the remote site. More info...
- repmgr_set_ack_policy(ack_policy)
Specifies how master and client sites will handle acknowledgment of replication messages which are necessary for “permanent” records. More info...
- repmgr_get_ack_policy()
Returns the replication manager’s client acknowledgment policy. More info...
- repmgr_site_list()
- Returns a dictionary with the status of the sites currently known by the replication manager.
- repmgr_stat(flags=0)
- Returns a dictionary with the replication manager statistics. Keys are:
perm_failed | The number of times a message critical for maintaining database integrity (for example, a transaction commit), originating at this site, did not receive sufficient acknowledgement from clients, according to the configured acknowledgement policy and acknowledgement timeout.
msgs_queued | The number of outgoing messages which could not be transmitted immediately, due to a full network buffer, and had to be queued for later delivery.
msgs_dropped | The number of outgoing messages that were completely dropped, because the outgoing message queue was full. (Berkeley DB replication is tolerant of dropped messages, and will automatically request retransmission of any missing messages as needed.)
connection_drop | The number of times an existing TCP/IP connection failed.
connect_fail | The number of times an attempt to open a new TCP/IP connection failed.
- repmgr_stat_print(flags=0)
Displays the replication manager statistical information. More info...
DBEnv Replication Methods
- rep_elect(nsites, nvotes)
Holds an election for the master of a replication group. More info...
- rep_set_transport(envid, transportFunc)
Initializes the communication infrastructure for a database environment participating in a replicated application. More info...
- rep_process_message(control, rec, envid)
- Processes an incoming replication message sent by a member of the replication group to the local database environment.
- rep_start(flags, cdata=None)
- Configures the database environment as a client or master in a group of replicated database environments.
The DB_ENV->rep_start method is not called by most replication applications. It should only be called by applications implementing their own network transport layer, explicitly holding replication group elections and handling replication messages outside of the replication manager framework. More info...
- rep_sync()
Forces master synchronization to begin for this client. This method is the other half of setting the DB_REP_CONF_DELAYCLIENT flag via the DB_ENV->rep_set_config method. More info...
- rep_set_config(which, onoff)
Configures the Berkeley DB replication subsystem. More info...
- rep_get_config(which)
Returns whether the specified which parameter is currently set or not. More info...
- rep_set_limit(bytes)
Sets a byte-count limit on the amount of data that will be transmitted from a site in response to a single message processed by the DB_ENV->rep_process_message method. The limit is not a hard limit, and the record that exceeds the limit is the last record to be sent. More info...
- rep_get_limit()
Gets a byte-count limit on the amount of data that will be transmitted from a site in response to a single message processed by the DB_ENV->rep_process_message method. The limit is not a hard limit, and the record that exceeds the limit is the last record to be sent. More info...
- rep_set_request(minimum, maximum)
Sets a threshold for the minimum and maximum time that a client waits before requesting retransmission of a missing message. Specifically, if the client detects a gap in the sequence of incoming log records or database pages, Berkeley DB will wait for at least minimum microseconds before requesting retransmission of the missing record. Berkeley DB will double that amount before requesting the same missing record again, and so on, up to a threshold of maximum microseconds. More info...
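The doubling behaviour described for rep_set_request can be sketched as a small calculation. This is an illustration of the schedule as described above, not the library's own code; the function name retransmit_waits and its attempts parameter are hypothetical.

```python
def retransmit_waits(minimum, maximum, attempts):
    """Return the wait, in microseconds, before each retransmission request.

    The first wait is minimum; each later wait doubles the previous one
    until it is capped at maximum.
    """
    waits = []
    wait = minimum
    for _ in range(attempts):
        waits.append(wait)
        wait = min(wait * 2, maximum)
    return waits
```

For example, with a minimum of 40000 and a maximum of 1280000 microseconds, the sixth and all later waits stay at the maximum.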
- rep_get_request()
Returns a tuple with the minimum and maximum number of microseconds a client waits before requesting retransmission. More info...
- rep_set_nsites(nsites)
Specifies the total number of sites in a replication group. More info...
- rep_get_nsites()
Returns the total number of sites in the replication group. More info...
- rep_set_priority(priority)
Specifies the database environment’s priority in replication group elections. The priority must be a positive integer, or 0 if this environment cannot be a replication group master. More info...
- rep_get_priority()
Returns the database environment priority. More info...
- rep_set_timeout(which, timeout)
Specifies a variety of replication timeout values. More info...
- rep_get_timeout(which)
Returns the timeout value for the specified which parameter. More info...
DB
DB Methods
- DB(dbEnv=None, flags=0)
Constructor. More info...
- append(data, txn=None)
A convenient version of put() that can be used for Recno or Queue databases. The DB_APPEND flag is automatically used, and the record number is returned. More info...
- associate(secondaryDB, callback, txn=None, flags=0)
Used to associate secondaryDB to act as a secondary index for this (primary) database. The callback parameter should be a reference to a Python callable object that will construct and return the secondary key or DB_DONOTINDEX if the item should not be indexed. The parameters the callback will receive are the primaryKey and primaryData values. More info...
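A callback of the kind described for associate might look like the sketch below. The record layout (data stored as bytes in the form first,last) and the function name last_name_key are assumptions made for illustration. DO_NOT_INDEX is a local stand-in for the DB_DONOTINDEX constant so that the sketch does not depend on the extension module.

```python
DO_NOT_INDEX = object()  # local stand-in for the DB_DONOTINDEX constant


def last_name_key(primary_key, primary_data):
    """Build a secondary key from the second comma-separated field.

    Assumes, hypothetically, that records are stored as b"first,last".
    Records without a second field are left out of the secondary index.
    """
    fields = primary_data.split(b",")
    if len(fields) < 2 or not fields[1]:
        return DO_NOT_INDEX
    return fields[1]
```

In real use the function would be passed as the callback argument of associate on the primary database, and would return the real DB_DONOTINDEX value instead of the stand-in.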
- close(flags=0)
Flushes cached data and closes the database. More info...
- consume(txn=None, flags=0)
For a database with the Queue access method, returns the record number and data from the first available record and deletes it from the queue. More info...
- consume_wait(txn=None, flags=0)
For a database with the Queue access method, returns the record number and data from the first available record and deletes it from the queue. If the Queue database is empty, the thread of control will wait until there is data in the queue before returning. More info...
- cursor(txn=None, flags=0)
Creates a cursor on the DB and returns a DBCursor object. If a transaction is passed then the cursor can only be used within that transaction and you must be sure to close the cursor before committing the transaction. More info...
- delete(key, txn=None, flags=0)
Removes a key/data pair from the database. More info...
- fd()
Returns a file descriptor for the database. More info...
- get(key, default=None, txn=None, flags=0, dlen=-1, doff=-1)
Returns the data object associated with key. If key is an integer then the DB_SET_RECNO flag is automatically set for BTree databases and the actual key and the data value are returned as a tuple. If default is given then it is returned if the key is not found in the database. Partial records can be read using dlen and doff; however, be sure not to read beyond the end of the actual data or you may get garbage. More info...
- pget(key, default=None, txn=None, flags=0, dlen=-1, doff=-1)
This method is available only on secondary databases. It will return the primary key, given the secondary one, and associated data. More info...
- set_private(object)
- Link an arbitrary object to the DB.
- get_private()
- Return the object linked to the DB.
- get_both(key, data, txn=None, flags=0)
A convenient version of get() that automatically sets the DB_GET_BOTH flag, and which will be successful only if both the key and data value are found in the database. (Can be used to verify the presence of a record in the database when duplicate keys are allowed.) More info...
- get_byteswapped()
May be used to determine if the database was created on a machine with the same endianness as the current machine. More info...
- get_size(key, txn=None)
- Return the size of the data object associated with key.
- get_type()
Return the database’s access method type. More info...
- join(cursorList, flags=0)
Create and return a specialized cursor for use in performing joins on secondary indices. More info...
- key_range(key, txn=None, flags=0)
Returns an estimate of the proportion of keys that are less than, equal to and greater than the specified key. More info...
- open(filename, dbname=None, dbtype=DB_UNKNOWN, flags=0, mode=0660, txn=None)
Opens the database named dbname in the file named filename. The dbname argument is optional and allows applications to have multiple logical databases in a single physical file. It is an error to attempt to open a second database in a file that was not initially created using a database name. In-memory databases never intended to be shared or preserved on disk may be created by setting both the filename and dbname arguments to None. More info...
- put(key, data, txn=None, flags=0, dlen=-1, doff=-1)
Stores the key/data pair in the database. If the DB_APPEND flag is used and the database is using the Recno or Queue access method then the record number allocated to the data is returned. Partial data objects can be written using dlen and doff. More info...
- remove(filename, dbname=None, flags=0)
Remove a database. More info...
- rename(filename, dbname, newname, flags=0)
Rename a database. More info...
- set_encrypt(passwd, flags=0)
Set the password used by the Berkeley DB library to perform encryption and decryption. Because databases opened within Berkeley DB environments use the password specified to the environment, it is an error to attempt to set a password in a database created within an environment. More info...
- set_bt_compare(compareFunc)
Set the B-Tree database comparison function. This can only be called once, before the database has been opened. compareFunc takes two arguments (left key string, right key string) and must return an integer -1, 0 or 1, similar to cmp. You can shoot your database in the foot, beware! Read the Berkeley DB docs for the full details of how the comparison function MUST behave. More info...
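A comparison function with the shape described for set_bt_compare could look like the following sketch, which orders keys without regard to case. The function name compare_ignore_case is hypothetical; the code works on both str and bytes keys because both provide lower().

```python
def compare_ignore_case(left, right):
    """Compare two keys case-insensitively; return -1, 0 or 1 like cmp."""
    l, r = left.lower(), right.lower()
    # (l > r) - (l < r) evaluates to 1, 0 or -1.
    return (l > r) - (l < r)
```

Note that a function like this reports keys differing only in case as equal, so the database would treat them as the same key. This is the kind of consequence the warning above refers to.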
- set_bt_minkey(minKeys)
Set the minimum number of keys that will be stored on any single BTree page. More info...
- set_cachesize(gbytes, bytes, ncache=0)
Set the size of the database’s shared memory buffer pool. More info...
- set_get_returns_none(flag)
- Controls what get and related methods do when a key is not found.
- See the DBEnv set_get_returns_none documentation. The previous setting is returned.
- set_flags(flags)
Set additional flags on the database before opening. More info...
- set_h_ffactor(ffactor)
Set the desired density within the hash table. More info...
- set_h_nelem(nelem)
Set an estimate of the final size of the hash table. More info...
- set_lorder(lorder)
Set the byte order for integers in the stored database metadata. More info...
- set_pagesize(pagesize)
Set the size of the pages used to hold items in the database, in bytes. More info...
- set_re_delim(delim)
Set the delimiting byte used to mark the end of a record in the backing source file for the Recno access method. More info...
- set_re_len(length)
For the Queue access method, specify that the records are of length length. For the Recno access method, specify that the records are fixed-length, not byte delimited, and are of length length. More info...
- set_re_pad(pad)
Set the padding character for short, fixed-length records for the Queue and Recno access methods. More info...
- set_re_source(source)
Set the underlying source file for the Recno access method. More info...
- set_q_extentsize(extentsize)
Set the size of the extents used to hold pages in a Queue database, specified as a number of pages. Each extent is created as a separate physical file. If no extent size is set, the default behavior is to create only a single underlying database file. More info...
- stat(flags=0, txn=None)
- Return a dictionary containing database statistics. The keys depend on the access method.
- For Hash databases:
magic | Magic number that identifies the file as a Hash database.
version | Version of the Hash database.
nkeys | Number of unique keys in the database.
ndata | Number of key/data pairs in the database.
pagecnt | The number of pages in the database.
pagesize | Underlying Hash database page (& bucket) size.
nelem | Estimated size of the hash table specified at database creation time.
ffactor | Desired fill factor (number of items per bucket) specified at database creation time.
buckets | Number of hash buckets.
free | Number of pages on the free list.
bfree | Number of bytes free on bucket pages.
bigpages | Number of big key/data pages.
big_bfree | Number of bytes free on big item pages.
overflows | Number of overflow pages (overflow pages are pages that contain items that did not fit in the main bucket page).
ovfl_free | Number of bytes free on overflow pages.
dup | Number of duplicate pages.
dup_free | Number of bytes free on duplicate pages.
- For BTree and Recno databases:
magic | Magic number that identifies the file as a Btree database.
version | Version of the Btree database.
nkeys | For the Btree Access Method, the number of unique keys in the database. For the Recno Access Method, the number of records in the database. If the database has been configured to not re-number records during deletion, the number of records may include records that have been deleted.
ndata | For the Btree Access Method, the number of key/data pairs in the database. For the Recno Access Method, the number of records in the database. If the database has been configured to not re-number records during deletion, the number of records may include records that have been deleted.
pagecnt | The number of pages in the database.
pagesize | Underlying database page size.
minkey | Minimum keys per page.
re_len | Length of fixed-length records.
re_pad | Padding byte value for fixed-length records.
levels | Number of levels in the database.
int_pg | Number of database internal pages.
leaf_pg | Number of database leaf pages.
dup_pg | Number of database duplicate pages.
over_pg | Number of database overflow pages.
empty_pg | Number of empty database pages.
free | Number of pages on the free list.
int_pgfree | Number of bytes free in database internal pages.
leaf_pgfree | Number of bytes free in database leaf pages.
dup_pgfree | Number of bytes free in database duplicate pages.
over_pgfree | Number of bytes free in database overflow pages.
- For Queue databases:
magic | Magic number that identifies the file as a Queue database.
version | Version of the Queue file type.
nkeys | Number of records in the database.
ndata | Number of records in the database.
pagesize | Underlying database page size.
extentsize | Underlying database extent size, in pages.
pages | Number of pages in the database.
re_len | Length of the records.
re_pad | Padding byte value for the records.
pgfree | Number of bytes free in database pages.
first_recno | First undeleted record in the database.
cur_recno | Last allocated record number in the database.
- sync(flags=0)
Flushes any cached information to disk. More info...
- truncate(txn=None, flags=0)
Empties the database, discarding all records it contains. The number of records discarded from the database is returned. More info...
- upgrade(filename, flags=0)
Upgrades all of the databases included in the file filename, if necessary. More info...
- verify(filename, dbname=None, outfile=None, flags=0)
Verifies the integrity of all databases in the file specified by the filename argument, and optionally outputs the databases’ key/data pairs to a file. More info...
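Two of the entries above describe contracts that are easy to get wrong, so here is a minimal sketch of each. Both helpers are hypothetical names introduced for illustration and are plain Python, so they can be tried without a database. The first is a comparison function of the shape set_bt_compare expects; the second estimates leaf-page usage from the BTree keys returned by stat(), assuming leaf_pgfree is the total of free bytes across all leaf pages.

```python
def compare_ignoring_case(left, right):
    """Order two keys case-insensitively, returning -1, 0 or 1 like cmp."""
    a, b = left.lower(), right.lower()
    return (a > b) - (a < b)


def leaf_fill_ratio(stats):
    """Approximate fraction of leaf-page bytes in use.

    `stats` is a dictionary with the BTree keys leaf_pg, pagesize and
    leaf_pgfree. Page headers are ignored, so treat the result as a rough
    indicator only.
    """
    capacity = stats["leaf_pg"] * stats["pagesize"]
    if capacity == 0:
        return 0.0
    return 1.0 - stats["leaf_pgfree"] / capacity


# With a real handle (not executed here, `db` is assumed to be an unopened DB):
#   db.set_bt_compare(compare_ignoring_case)   # must happen before db.open(...)
#   ratio = leaf_fill_ratio(db.stat())

sample_stats = {"leaf_pg": 4, "pagesize": 4096, "leaf_pgfree": 4096}
sample_ratio = leaf_fill_ratio(sample_stats)
```

Note that a comparison function like this treats keys differing only in case as the same key, which is exactly the kind of consequence the Berkeley DB documentation asks you to think through before installing one.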
DB Mapping and Compatibility Methods
- These methods of the DB type are for implementing the Mapping Interface, as well as others for making a DB behave as much like a dictionary as possible. The main downside to using a DB as a dictionary is you are not able to specify a transaction object.
- DB_length() [ usage: len(db) ]
- Return the number of key/data pairs in the database.
- DB_subscript(key) [ usage: db[key] ]
- Return the data associated with key.
- DB_ass_sub(key, data) [ usage: db[key] = data ]
- Assign or update a key/data pair, or delete a key/data pair if data is NULL.
- keys(txn=None)
- Return a list of all keys in the database. Warning: this method traverses the entire database so it can possibly take a long time to complete.
- items(txn=None)
- Return a list of tuples of all key/data pairs in the database. Warning: this method traverses the entire database so it can possibly take a long time to complete.
- values(txn=None)
- Return a list of all data values in the database. Warning: this method traverses the entire database so it can possibly take a long time to complete.
- has_key(key, txn=None)
- Returns true if key is present in the database.
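As a rough illustration of the dictionary-style interface, the sketch below keeps a counter under a key using only has_key(), db[key] and db[key] = data. The helper name bump_counter and the FakeDB stand-in are assumptions made for this example; a real DB handle would be passed in their place, and the values are assumed to be byte strings.

```python
class FakeDB(dict):
    """Stand-in for an open DB handle, for illustration only."""

    def has_key(self, key):
        return key in self


def bump_counter(db, key):
    """Increment the integer stored under `key`, starting at 1 if absent."""
    if db.has_key(key):
        count = int(db[key]) + 1
    else:
        count = 1
    # Store the value as bytes, since the database holds strings, not ints.
    db[key] = str(count).encode("ascii")
    return count


demo_db = FakeDB()
first_count = bump_counter(demo_db, b"hits")
second_count = bump_counter(demo_db, b"hits")
record_total = len(demo_db)
```

Because the mapping interface cannot take a transaction object, this read-modify-write sequence is not atomic; code that needs that guarantee would use the explicit get and put methods with a txn argument instead.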
DBCursor
DBCursor Methods
- close()
Discards the cursor. If the cursor is created within a transaction then you must be sure to close the cursor before committing the transaction. More info...
- count(flags=0)
Returns a count of the number of duplicate data items for the key referenced by the cursor. More info...
- delete(flags=0)
Deletes the key/data pair currently referenced by the cursor. More info...
- dup(flags=0)
Create a new cursor. More info...
- put(key, data, flags=0, dlen=-1, doff=-1)
Stores the key/data pair into the database. Partial data records can be written using dlen and doff. More info...
- get(flags, dlen=-1, doff=-1)
- See get(key, data, flags, dlen=-1, doff=-1) below.
- get(key, flags, dlen=-1, doff=-1)
- See get(key, data, flags, dlen=-1, doff=-1) below.
- get(key, data, flags, dlen=-1, doff=-1)
Retrieves key/data pairs from the database using the cursor. All the specific functionalities of the get method are actually provided by the various methods below, which are the preferred way to fetch data using the cursor. These generic interfaces are only provided as an inconvenience. Partial data records are returned if dlen and doff are used in this method and in many of the specific methods below. More info...
- pget(flags, dlen=-1, doff=-1)
- See pget(key, data, flags, dlen=-1, doff=-1) below.
- pget(key, flags, dlen=-1, doff=-1)
- See pget(key, data, flags, dlen=-1, doff=-1) below.
- pget(key, data, flags, dlen=-1, doff=-1)
Similar to the already described get(). This method is available only on secondary databases. It returns the primary key, given the secondary key, along with the associated data. More info...
DBCursor Get Methods
- These DBCursor methods are all wrappers around the get() function in the C API.
- current(flags=0, dlen=-1, doff=-1)
Returns the key/data pair currently referenced by the cursor. More info...
- get_current_size()
- Returns length of the data for the current entry referenced by the cursor.
- first(flags=0, dlen=-1, doff=-1)
Position the cursor to the first key/data pair and return it. More info...
- last(flags=0, dlen=-1, doff=-1)
Position the cursor to the last key/data pair and return it. More info...
- next(flags=0, dlen=-1, doff=-1)
Position the cursor to the next key/data pair and return it. More info...
- prev(flags=0, dlen=-1, doff=-1)
Position the cursor to the previous key/data pair and return it. More info...
- consume(flags=0, dlen=-1, doff=-1)
- For a database with the Queue access method, returns the record number and data from the first available record and deletes it from the queue.
NOTE: This method is deprecated in Berkeley DB version 3.2 in favor of the new consume method in the DB class.
- get_both(key, data, flags=0)
Like set() but positions the cursor to the record matching both key and data. (An alias for this is set_both, which makes more sense to me...) More info...
- get_recno()
Return the record number associated with the cursor. The database must use the BTree access method and have been created with the DB_RECNUM flag. More info...
- join_item(flags=0)
For cursors returned from the DB.join method, returns the combined key value from the joined cursors. More info...
- next_dup(flags=0, dlen=-1, doff=-1)
If the next key/data pair of the database is a duplicate record for the current key/data pair, the cursor is moved to the next key/data pair of the database, and that pair is returned. More info...
- next_nodup(flags=0, dlen=-1, doff=-1)
The cursor is moved to the next non-duplicate key/data pair of the database, and that pair is returned. More info...
- prev_nodup(flags=0, dlen=-1, doff=-1)
The cursor is moved to the previous non-duplicate key/data pair of the database, and that pair is returned. More info...
- set(key, flags=0, dlen=-1, doff=-1)
Move the cursor to the specified key in the database and return the key/data pair found there. More info...
- set_range(key, flags=0, dlen=-1, doff=-1)
Identical to set() except that in the case of the BTree access method, the returned key/data pair is the smallest key greater than or equal to the specified key (as determined by the comparison function), permitting partial key matches and range searches. More info...
- set_recno(recno, flags=0, dlen=-1, doff=-1)
Move the cursor to the specific numbered record of the database, and return the associated key/data pair. The underlying database must be of type Btree and it must have been created with the DB_RECNUM flag. More info...
- set_both(key, data, flags=0)
See get_both(). The only difference in behaviour can be disabled using set_get_returns_none(2). More info...
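The positioning methods above combine naturally into scan loops. The sketch below shows a full traversal with first() and next(), and a prefix scan that starts from set_range(). It assumes the handle is configured so that running off the end, or a set_range() miss, returns None rather than raising; otherwise the not-found exception would need to be caught. ListCursor is a stand-in written for this example so the loops can be run without a database, and the helper names are hypothetical.

```python
import bisect


class ListCursor:
    """Stand-in for a DBCursor over sorted (key, data) pairs; illustration only."""

    def __init__(self, pairs):
        self._pairs = sorted(pairs)
        self._pos = -1

    def _at(self, pos):
        self._pos = pos
        if 0 <= pos < len(self._pairs):
            return self._pairs[pos]
        return None

    def first(self):
        return self._at(0)

    def next(self):
        return self._at(self._pos + 1)

    def set_range(self, key):
        keys = [k for k, _ in self._pairs]
        # Smallest key greater than or equal to the requested key.
        return self._at(bisect.bisect_left(keys, key))


def iter_records(cursor):
    """Yield every (key, data) pair in key order."""
    record = cursor.first()
    while record is not None:
        yield record
        record = cursor.next()


def iter_prefix(cursor, prefix):
    """Yield the (key, data) pairs whose key starts with `prefix`."""
    record = cursor.set_range(prefix)
    while record is not None and record[0].startswith(prefix):
        yield record
        record = cursor.next()


pairs = [(b"banana", b"3"), (b"apple", b"1"), (b"apricot", b"2")]
all_records = list(iter_records(ListCursor(pairs)))
ap_records = list(iter_prefix(ListCursor(pairs), b"ap"))
```

The prefix scan relies on the default byte-wise key ordering of a BTree; with a custom comparison function the stopping condition would have to match that ordering.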
DBTxn
DBTxn Methods
- abort()
Aborts the transaction. More info...
- commit(flags=0)
Ends the transaction, committing any changes to the databases. More info...
- id()
Returns the unique transaction id associated with this transaction. More info...
- prepare(gid)
Initiates the beginning of a two-phase commit. Beginning with Berkeley DB 3.3 a global identifier parameter is required, which is a value unique across all processes involved in the commit. It must be a string of DB_XIDDATASIZE bytes. More info...
- discard()
- This method frees up all the per-process resources associated with the specified transaction, neither committing nor aborting the transaction. The transaction will be kept in an “unresolved” state. This call may be used only after calls to “dbenv.txn_recover()”. An “unresolved” transaction will be returned again through new calls to “dbenv.txn_recover()”.
- For example, when there are multiple global transaction managers recovering transactions in a single Berkeley DB environment, any transactions returned by “dbenv.txn_recover()” that are not handled by the current global transaction manager should be discarded using “txn.discard()”.
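The commit, abort and discard methods above are usually wrapped in small helpers. The sketch below shows two, under stated assumptions: run_in_txn starts a transaction with the DBEnv txn_begin() method, commits on success and aborts on any exception; discard_foreign walks the result of txn_recover(), assumed here to be a list of (gid, txn) tuples, and discards the transactions that another transaction manager owns. The helper names and the FakeTxn and FakeEnv stand-ins are hypothetical and exist only so the logic can be run without an environment.

```python
class FakeTxn:
    """Stand-in for a DBTxn that only records how it ended."""

    def __init__(self):
        self.state = "active"

    def commit(self, flags=0):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

    def discard(self):
        self.state = "discarded"


class FakeEnv:
    """Stand-in for a DBEnv, for illustration only."""

    def __init__(self, recovered=()):
        self.started = []
        self._recovered = list(recovered)

    def txn_begin(self):
        txn = FakeTxn()
        self.started.append(txn)
        return txn

    def txn_recover(self):
        return list(self._recovered)


def run_in_txn(env, work):
    """Call work(txn); commit if it returns, abort and re-raise if it fails."""
    txn = env.txn_begin()
    try:
        result = work(txn)
    except BaseException:
        txn.abort()
        raise
    txn.commit()
    return result


def discard_foreign(env, is_ours):
    """Keep recovered transactions this manager owns, discard the rest."""
    ours = []
    for gid, txn in env.txn_recover():
        if is_ours(gid):
            ours.append((gid, txn))
        else:
            txn.discard()
    return ours


env = FakeEnv()
answer = run_in_txn(env, lambda txn: 42)
```

Any cursor opened inside work(txn) should be closed before the function returns, since cursors must be closed before the transaction is committed.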
DBLock
- The DBLock objects have no methods or attributes. They are just opaque handles to the lock in question. They are managed via DBEnv methods.
DBSequence
- Sequences provide an arbitrary number of persistent objects that return an increasing or decreasing sequence of integers. Opening a sequence handle associates it with a record in a database. The handle can maintain a cache of values from the database so that a database update is not needed as the application allocates a value.
DBSequence Methods
- DBSequence(db, flags=0)
Constructor. More info...
- open(key, txn=None, flags=0)
Opens the sequence represented by the key. More info...
- close(flags=0)
Close a DBSequence handle. More info...
- initial_value(value)
Set the initial value for a sequence. This call is only effective when the sequence is being created. More info...
- get(delta=1, txn=None, flags=0)
Returns the next available element in the sequence and changes the sequence value by delta. More info...
- get_dbp()
Returns the DB object associated with the DBSequence. More info...
- get_key()
Returns the key for the sequence. More info...
- remove(txn=None, flags=0)
Removes the sequence from the database. This method should not be called if there are other open handles on this sequence. More info...
- get_cachesize()
Returns the current cache size. More info...
- set_cachesize(size)
Configure the number of elements cached by a sequence handle. More info...
- get_flags()
Returns the current flags. More info...
- set_flags(flags)
Configure a sequence. More info...
- stat(flags=0)
- Returns a dictionary of sequence statistics with the following keys:
wait | The number of times a thread of control was forced to wait on the handle mutex.
nowait | The number of times that a thread of control was able to obtain the handle mutex without waiting.
current | The current value of the sequence in the database.
value | The current cached value of the sequence.
last_value | The last cached value of the sequence.
min | The minimum permitted value of the sequence.
max | The maximum permitted value of the sequence.
cache_size | The number of values that will be cached in this handle.
flags | The flags value for the sequence.
- get_range()
Returns a tuple representing the range of values in the sequence. More info...
- set_range((min, max))
Configure a sequence range. More info...
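A typical sequence lifecycle is configure, open, then allocate. The sketch below follows that order using only the methods listed above: initial_value() and set_cachesize() are called before open(), and get() then hands out values. The helper names and the FakeSequence stand-in are assumptions for this example; with the real module the sequence would be built with DBSequence(db) and the creation flag would be passed as open_flags.

```python
class FakeSequence:
    """Stand-in for a DBSequence that counts in memory; illustration only."""

    def __init__(self):
        self.value = 0
        self.cache = 0
        self.opened_key = None

    def initial_value(self, value):
        self.value = value

    def set_cachesize(self, size):
        self.cache = size

    def open(self, key, txn=None, flags=0):
        self.opened_key = key

    def get(self, delta=1, txn=None, flags=0):
        current = self.value
        self.value += delta
        return current


def open_sequence(seq, key, open_flags=0, initial=1, cache=0, txn=None):
    """Configure `seq` and open it on the record stored under `key`."""
    # The initial value only takes effect when the sequence is being created.
    seq.initial_value(initial)
    if cache:
        seq.set_cachesize(cache)
    seq.open(key, txn=txn, flags=open_flags)
    return seq


def take_ids(seq, count, txn=None):
    """Allocate `count` consecutive values from the sequence."""
    return [seq.get(1, txn) for _ in range(count)]


order_ids = open_sequence(FakeSequence(), b"order_id", initial=100, cache=20)
allocated = take_ids(order_ids, 3)
```

A larger cache means fewer database updates per allocated value, at the cost of possibly skipping unused cached values if the handle goes away before they are handed out.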
History
- This module was started by Andrew Kuchling (amk) to remove the dependency on SWIG in a package by Gregory P. Smith who based his work on a similar package by Robin Dunn which wrapped Berkeley DB 2.7.x.
Development then returned full circle back to Robin Dunn, working on behalf of Digital Creations, to complete the SWIG-less wrapping of the DB 3.x API and to build a solid unit test suite. Having completed that, Robin is now busy with another project (wxPython) and Greg has returned as maintainer. Jesus Cea Avion has been the maintainer of this code since February 2008.
This module is included in the standard Python >= 2.3 distribution as the bsddb module. The only reason you should look here is for documentation or to get a more up to date version. The bsddb.db module aims to mirror much of the Berkeley DB C/C++ API.