2.3. Step 1: Representing Records
If we're going to store records in a database, the first step is probably deciding what those records will look like. There are a variety of ways to represent information about people in the Python language. Built-in object types such as lists and dictionaries are often sufficient, especially if we don't care about processing the data we store.
2.3.1. Using Lists
Lists, for example, can collect attributes about people in a positionally ordered way. Start up your Python interactive interpreter and type the following two statements (this works in the IDLE GUI, after typing python at a shell prompt, and so on; the >>> characters are Python's prompt. If you've never run Python code this way before, see an introductory resource such as O'Reilly's Learning Python for help with getting started):
>>> bob = ['Bob Smith', 42, 30000, 'software']
>>> sue = ['Sue Jones', 45, 40000, 'music']
We've just made two records, albeit simple ones, to represent two people, Bob and Sue (my apologies if you really are Bob or Sue, generically or otherwise). Each record is a list of four properties: name, age, pay, and job field. To access these fields, we simply index by position (the result is in parentheses here because it is a tuple of two results):
>>> bob[0], sue[2]                # fetch name, pay
('Bob Smith', 40000)
(Footnote: No, I'm serious. For an example I present in Python classes I teach, I had for many years regularly used the name "Bob Smith," age 40.5, and jobs "developer" and "manager" as a supposedly fictitious database record, until a recent class in Chicago, where I met a student named Bob Smith who was 40.5 and was a developer and manager. The world is stranger than it seems.)
Processing records is easy with this representation; we just use list operations. For example, we can extract a last name by splitting the name field on blanks and grabbing the last part, and we may give someone a raise by changing their list in-place:
>>> bob[0].split( )[-1]           # what's bob's last name?
'Smith'
>>> sue[2] *= 1.25                # give sue a 25% raise
>>> sue
['Sue Jones', 45, 50000.0, 'music']
The last-name expression here proceeds from left to right: we fetch Bob's name, split it into a list of substrings around spaces, and index his last name (run it one step at a time to see how).
2.3.1.1. A database list
Of course, what we really have at this point is just two variables, not a database; to collect Bob and Sue into a unit, we might simply stuff them into another list:
>>> people = [bob, sue]
>>> for person in people:
...     print person
...
['Bob Smith', 42, 30000, 'software']
['Sue Jones', 45, 50000.0, 'music']
Now, the people list represents our database. We can fetch specific records by their relative positions and process them one at a time, in loops:
>>> people[1][0]
'Sue Jones'
>>> for person in people:
...     print person[0].split( )[-1]      # print last names
...     person[2] *= 1.20                 # give each a 20% raise
...
Smith
Jones
>>> for person in people: print person[2]     # check new pay
...
36000.0
60000.0
Now that we have a list, we can also collect values from records using some of Python's more powerful iteration tools, such as list comprehensions, maps, and generator expressions:
>>> pays = [person[2] for person in people]       # collect all pay
>>> pays
[36000.0, 60000.0]
>>> pays = map((lambda x: x[2]), people)          # ditto
>>> pays
[36000.0, 60000.0]
>>> sum(person[2] for person in people)           # generator expression sum (2.4)
96000.0
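For readers running a current interpreter, the same three tools can be sketched in Python 3 syntax. This is only an illustrative rendering, not part of the original session: it assumes Python 3, where `map` returns a lazy iterator rather than a list, and it hard-codes the records with the post-raise pay values shown above.

```python
# Python 3 sketch; records hard-coded to the post-raise values from the session.
people = [['Bob Smith', 42, 36000.0, 'software'],
          ['Sue Jones', 45, 60000.0, 'music']]

pays = [person[2] for person in people]          # list comprehension
pays_map = list(map(lambda x: x[2], people))     # map() is lazy in Python 3; list() forces it
total = sum(person[2] for person in people)      # generator expression fed to sum()

print(pays, pays_map, total)
```

All three forms visit each record once and pull out the value at position 2, so they still depend on remembering that position 2 means pay.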
To add a record to the database, the usual list operations, such as append and extend, will suffice:
>>> people.append(['Tom', 50, 0, None])
>>> len(people)
3
>>> people[-1][0]
'Tom'
Lists work for our people database, and they might be sufficient for some programs, but they suffer from a few major flaws. For one thing, Bob and Sue, at this point, are just fleeting objects in memory that will disappear once we exit Python. For another, every time we want to extract a last name or give a raise, we'll have to repeat the kinds of code we just typed; that could become a problem if we ever change the way those operations work, because we may have to update many places in our code. We'll address these issues in a few moments.
2.3.1.2. Field labels
Perhaps more fundamentally, accessing fields by position in a list requires us to memorize what each position means: if you see a bit of code indexing a record on magic position 2, how can you tell it is extracting a pay? In terms of understanding the code, it might be better to associate a field name with a field value.
We might try to associate names with relative positions by using the Python range built-in function, which builds a list of successive integers:
>>> NAME, AGE, PAY = range(3)             # [0, 1, 2]
>>> bob = ['Bob Smith', 42, 10000]
>>> bob[NAME]
'Bob Smith'
>>> PAY, bob[PAY]
(2, 10000)
This addresses readability: the three variables essentially become field names. This makes our code dependent on the field position assignments, though: we have to remember to update the range assignments whenever we change record structure. Because they are not directly associated, the names and records may become out of sync over time and require a maintenance step.
Moreover, because the field names are independent variables, there is no direct mapping from a record list back to its field's names. A raw record, for instance, provides no way to label its values with field names in a formatted display. In the preceding record, without additional code, there is no path from value 42 to label AGE.
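To make the missing mapping concrete, one workaround is to keep the labels in a parallel sequence and pair them with a record's values on demand. The sketch below assumes Python 3 syntax (the transcripts in this section use Python 2 `print` statements), and `FIELDS` and `labeled` are hypothetical names introduced only for illustration.

```python
# Hypothetical helper names; Python 3 syntax assumed.
FIELDS = ('name', 'age', 'pay')          # field labels, in record order
NAME, AGE, PAY = range(len(FIELDS))      # positional constants, as in the text

bob = ['Bob Smith', 42, 10000]

def labeled(record):
    """Pair each value in a positional record with its field label."""
    return list(zip(FIELDS, record))

for label, value in labeled(bob):
    print(label, '=>', value)
```

The labels still live apart from the records here, so the two can drift out of sync if the record layout changes, which is the maintenance problem described above.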
We might also try this by using lists of tuples, where the tuples record both a field name and a value; better yet, a list of lists would allow for updates (tuples are immutable). Here's what that idea translates to, with slightly simpler records:
>>> bob = [['name', 'Bob Smith'], ['age', 42], ['pay', 10000]]
>>> sue = [['name', 'Sue Jones'], ['age', 45], ['pay', 20000]]
>>> people = [bob, sue]
This really doesn't fix the problem, though, because we still have to index by position in order to fetch fields:
>>> for person in people:
...     print person[0][1], person[2][1]      # name, pay
...
Bob Smith 10000
Sue Jones 20000
>>> [person[0][1] for person in people]       # collect names
['Bob Smith', 'Sue Jones']
>>> for person in people:
...     print person[0][1].split( )[-1]       # get last names
...     person[2][1] *= 1.10                  # give a 10% raise
...
Smith
Jones
>>> for person in people: print person[2]
...
['pay', 11000.0]
['pay', 22000.0]
All we've really done here is add an extra level of positional indexing. To do better, we might inspect field names in loops to find the one we want (the loop uses tuple assignment here to unpack the name/value pairs):
>>> for person in people:
...     for (name, value) in person:
...         if name == 'name': print value    # find a specific field
...
Bob Smith
Sue Jones
Better yet, we can code a fetcher function to do the job for us:
>>> def field(record, label):
...     for (fname, fvalue) in record:        # find any field by name
...         if fname == label:
...             return fvalue
...
>>> field(bob, 'name')
'Bob Smith'
>>> field(sue, 'pay')
22000.0
>>> for rec in people:
...     print field(rec, 'age')               # print all ages
...
42
45
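A fetcher like this is usually paired with an updater once records need to change. The following is a minimal sketch of such a pair, assuming Python 3 syntax; `set_field` is a hypothetical companion that does not appear in the original text, and `field` is restated here only so the block runs on its own.

```python
def field(record, label):
    """Return the value stored under label, or None if it is absent."""
    for fname, fvalue in record:
        if fname == label:
            return fvalue
    return None

def set_field(record, label, value):
    """Hypothetical updater: change the matching pair, or append a new one."""
    for pair in record:
        if pair[0] == label:
            pair[1] = value          # pairs are lists, so they can be changed in place
            return
    record.append([label, value])

bob = [['name', 'Bob Smith'], ['age', 42], ['pay', 10000]]
set_field(bob, 'pay', field(bob, 'pay') * 2)   # update an existing field
set_field(bob, 'job', 'dev')                   # add a field that was missing
print(field(bob, 'pay'), field(bob, 'job'))
```

Both functions scan the record linearly, which is the search cost that a built-in mapping type avoids.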
If we proceed down this path, we'll eventually wind up with a set of record interface functions that generically map field names to field data. If you've done any Python coding in the past, you probably already know that there is an easier way to code this sort of association, and you can probably guess where we're headed in the next section.
2.3.2. Using Dictionaries
The list-based record representations in the prior section work, though not without some cost in terms of performance required to search for field names (assuming you need to care about milliseconds and such). But if you already know some Python, you also know that there are more convenient ways to associate property names and values. The built-in dictionary object is a natural:
>>> bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
>>> sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'mus'}
Now, Bob and Sue are objects that map field names to values automatically, and they make our code more understandable and meaningful. We don't have to remember what a numeric offset means, and we let Python search for the value associated with a field's name with its efficient dictionary indexing:
>>> bob['name'], sue['pay']               # not bob[0], sue[2]
('Bob Smith', 40000)
>>> bob['name'].split( )[-1]
'Smith'
>>> sue['pay'] *= 1.10
>>> sue['pay']
44000.0
Because fields are accessed mnemonically now, they are more meaningful to those who read your code (including you).
2.3.2.1. Other ways to make dictionaries
Dictionaries turn out to be so useful in Python programming that there are even more convenient ways to code them than the traditional literal syntax shown earlier, e.g., with keyword arguments and the type constructor:
>>> bob = dict(name='Bob Smith', age=42, pay=30000, job='dev')
>>> bob
{'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
Lists are convenient any time we need an ordered container of other objects that may need to change over time. A simple way to represent matrixes in Python, for instance, is as a list of nested lists: the top list is the matrix, and the nested lists are the rows:
>>> M = [[1, 2, 3],
...      [4, 5, 6],
...      [7, 8, 9]]
>>> N = [[2, 2, 2],
...      [3, 3, 3],
...      [4, 4, 4]]
To combine one matrix's components with another's, step over their indexes with nested loops, as in a simple pairwise multiplication. To build up a new matrix with the results, we just need to create the nested list structure along the way. Nested list comprehensions will do the same job, albeit at some cost in complexity (if you have to think hard about expressions like these, so will the next person who has to read your code!).
List comprehensions are powerful tools, provided you restrict them to simple tasks, for example, listing selected module functions, or stripping end-of-lines.
If you are interested in matrix processing, also see the mathematical and scientific extensions available for Python in the public domain, such as those available through NumPy and SciPy. The code here works, but extensions provide optimized tools. NumPy, for instance, is seen by some as an open source Matlab equivalent.
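The pairwise multiplication described in the matrix aside can be sketched as follows. This is an illustrative reconstruction rather than the original listing: it assumes Python 3 syntax, reuses the `M` and `N` values that appear on this page, and `result` and `result2` are names chosen here for the example.

```python
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
N = [[2, 2, 2],
     [3, 3, 3],
     [4, 4, 4]]

# Nested loops, creating the result's nested-list structure along the way.
result = []
for i in range(len(M)):
    row = []
    for j in range(len(M[i])):
        row.append(M[i][j] * N[i][j])    # multiply matching components
    result.append(row)

# The same pairwise product as a nested list comprehension.
result2 = [[M[i][j] * N[i][j] for j in range(len(M[i]))]
           for i in range(len(M))]

print(result)
```

The loop version is longer but easier to step through; the comprehension packs the same two levels of iteration into one expression.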
by filling out a dictionary one field at a time:
>>> sue = {}
>>> sue['name'] = 'Sue Jones'
>>> sue['age'] = 45
>>> sue['pay'] = 40000
>>> sue['job'] = 'mus'
>>> sue
{'job': 'mus', 'pay': 40000, 'age': 45, 'name': 'Sue Jones'}
and by zipping together name/value lists:
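>>> names = ['name', 'age', 'pay', 'job']
>>> values = ['Sue Jones', 45, 40000, 'mus']
>>> zip(names, values)
[('name', 'Sue Jones'), ('age', 45), ('pay', 40000), ('job', 'mus')]
>>> sue = dict(zip(names, values))
>>> sue
{'job': 'mus', 'pay': 40000, 'age': 45, 'name': 'Sue Jones'}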
We can even make dictionaries today from a sequence of key values and an optional starting value for all the keys (handy to initialize an empty dictionary):
>>> fields = ('name', 'age', 'job', 'pay')
>>> record = dict.fromkeys(fields, '?')
>>> record
{'job': '?', 'pay': '?', 'age': '?', 'name': '?'}
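2.3.2.2. Lists of dictionaries
Regardless of how we code them, we still need to collect our records into a database; a list does the trick again, as long as we don't require access by key:
>>> people = [bob, sue]
>>> for person in people:
...     print person['name'], person['pay']       # all name, pay
...
Bob Smith 30000
Sue Jones 44000.0
>>> for person in people:
...     if person['name'] == 'Sue Jones':         # fetch sue's pay
...         print person['pay']
...
44000.0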
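The second loop above is the price of not having access by key: finding one record means scanning the list. A minimal sketch of that scan wrapped in a function, assuming Python 3 syntax; `find_by_name` is a hypothetical helper introduced here for illustration.

```python
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 44000.0, 'job': 'mus'}
people = [bob, sue]

def find_by_name(records, name):
    """Return the first record whose 'name' field matches, else None."""
    for record in records:
        if record['name'] == name:
            return record
    return None

match = find_by_name(people, 'Sue Jones')
print(match['pay'])
```

The search time grows with the number of records, which is one reason to consider keying the whole database by name instead.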
Iteration tools work just as well here, but we use keys rather than obscure positions (in database terms, the list comprehension and map in the following code project the database on the "name" field column):
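>>> names = [person['name'] for person in people]     # collect names
>>> names
['Bob Smith', 'Sue Jones']
>>> map((lambda x: x['name']), people)                # ditto
['Bob Smith', 'Sue Jones']
>>> sum(person['pay'] for person in people)           # sum all pay
74000.0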
And because dictionaries are normal Python objects, these records can also be accessed and updated with normal Python syntax:
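>>> for person in people:
...     print person['name'].split( )[-1]     # last name
...     person['pay'] *= 1.10                 # a 10% raise
...
Smith
Jones
>>> for person in people: print person['pay']
...
33000.0
48400.0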
2.3.2.3. Nested structures
Incidentally, we could avoid the last-name extraction code in the prior examples by further structuring our records. Because all of Python's compound datatypes can be nested inside each other and as deeply as we like, we can build up fairly complex information structures easily: simply type the object's syntax, and Python does all the work of building the components, linking memory structures, and later reclaiming their space. This is one of the great advantages of a scripting language such as Python.
The following, for instance, represents a more structured record by nesting a dictionary, list, and tuple inside another dictionary:
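>>> bob2 = {'name': {'first': 'Bob', 'last': 'Smith'},
...         'age':  42,
...         'job':  ['software', 'writing'],
...         'pay':  (40000, 50000)}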
Because this record contains nested structures, we simply index twice to go two levels deep:
>>> bob2['name']                      # bob's full name
{'last': 'Smith', 'first': 'Bob'}
>>> bob2['name']['last']              # bob's last name
'Smith'
>>> bob2['pay'][1]                    # bob's upper pay
50000
The name field is another dictionary here, so instead of splitting up a string, we simply index to fetch the last name. Moreover, people can have many jobs, as well as minimum and maximum pay limits. In fact, Python becomes a sort of query language in such cases: we can fetch or change nested data with the usual object operations:
>>> for job in bob2['job']: print job         # all of bob's jobs
...
software
writing
>>> bob2['job'][-1]                           # bob's last job
'writing'
>>> bob2['job'].append('janitor')             # bob gets a new job
>>> bob2
{'job': ['software', 'writing', 'janitor'], 'pay': (40000, 50000), 'age': 42, 'name': {'last': 'Smith', 'first': 'Bob'}}
It's OK to grow the nested list with append, because it is really an independent object. Such nesting can come in handy for more sophisticated applications; to keep ours simple, we'll stick to the original flat record structure.
2.3.2.4. Dictionaries of dictionaries
One last twist on our people database: we can get a little more mileage out of dictionaries here by using one to represent the database itself. That is, we can use a dictionary of dictionaries: the outer dictionary is the database, and the nested dictionaries are the records within it. Rather than a simple list of records, a dictionary-based database allows us to store and retrieve records by symbolic key:
>>> db = {}
>>> db['bob'] = bob
>>> db['sue'] = sue
>>>
>>> db['bob']['name']                 # fetch bob's name
'Bob Smith'
>>> db['sue']['pay'] = 50000          # change sue's pay
>>> db['sue']['pay']                  # fetch sue's pay
50000
Notice how this structure allows us to access a record directly instead of searching for it in a loop (we get to Bob's name immediately by indexing on key bob). This really is a dictionary of dictionaries, though you won't see all the gory details unless you display the database all at once:
>>> db
{'bob': {'pay': 33000.0, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'},
 'sue': {'job': 'mus', 'pay': 50000, 'age': 45, 'name': 'Sue Jones'}}
If we still need to step through the database one record at a time, we can now rely on dictionary iterators. In recent Python releases, a dictionary iterator produces one key in a for loop each time through (in earlier releases, call the keys method explicitly in the for loop: say db.keys( ) rather than just db):
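>>> for key in db:
...     print key, '=>', db[key]['name']
...
bob => Bob Smith
sue => Sue Jones
>>> for key in db:
...     print key, '=>', db[key]['pay']
...
bob => 33000.0
sue => 50000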
To visit all records, either index by key as you go:
>>> for key in db:
...     print db[key]['name'].split( )[-1]
...     db[key]['pay'] *= 1.10
...
Smith
Jones
or step through the dictionary's values to access records directly:
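>>> for record in db.values( ): print record['pay']
...
36300.0
55000.0
>>> x = [db[key]['name'] for key in db]
>>> x
['Bob Smith', 'Sue Jones']
>>> x = [rec['name'] for rec in db.values( )]
>>> x
['Bob Smith', 'Sue Jones']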
And to add a new record, simply assign it to a new key; this is just a dictionary, after all:
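>>> db['tom'] = dict(name='Tom', age=50, job=None, pay=0)
>>>
>>> db['tom']
{'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
>>> db['tom']['name']
'Tom'
>>> db.keys( )
['bob', 'sue', 'tom']
>>> len(db)
3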
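For comparison, the dictionary-of-dictionaries operations in this section can be gathered into one short script. This is a sketch assuming Python 3 (3.7 or later, where dictionaries keep insertion order), not the original session; the starting pay values are taken from the dictionary literals earlier in the section.

```python
bob = dict(name='Bob Smith', age=42, pay=30000, job='dev')
sue = dict(name='Sue Jones', age=45, pay=40000, job='mus')

db = {'bob': bob, 'sue': sue}                 # outer dictionary is the database
db['tom'] = dict(name='Tom', age=50, pay=0, job=None)   # add a record by key

for key in db:                                # iterating a dict yields its keys
    print(key, '=>', db[key]['name'])

for record in db.values():                    # or visit the records directly
    record['pay'] *= 1.10                     # give everyone a 10% raise

names = [record['name'] for record in db.values()]
keys = sorted(db)                             # keys() is a view in Python 3; sorted() returns a list
print(names, keys, len(db))
```

Note that `db['bob']` and the variable `bob` refer to the same dictionary object, so the raise applied through the database is visible through either name.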
Although our database is still a transient object in memory, it turns out that this dictionary-of-dictionaries format corresponds exactly to a system that saves objects permanently: the shelve (yes, this should be shelf grammatically speaking, but the Python module name and term is shelve). To learn how, let's move on to the next section.