8-16<Xr9-> Review: Chapter 9-

Contents

8-16<Xr9-> Review: Chapter 9-
9. XML 处理
1. 9.4 Unicode

9. XML 处理

9.4 Unicode

例9.19, (2)：

>>> from xml.dom import minidom
>>> xmldoc = minidom.parse('russiansample.xml') (1)
>>> title = xmldoc.getElementsByTagName('title')[0].firstChild.data
>>> title                                       (2)
u'\u041f\u0440\u0435\u0434\u0438\u0441\u043b\u043e\u0432\u0438\u0435

(2)

Note that the text data of the title tag (now in the title variable, thanks to that long concatenation of Python functions which I hastily skipped over and, annoyingly, won't explain until the next section) -- the text data inside the XML document's title element is stored in unicode.
注意 title 标记的文本数据（现在在 title 变量中，幸亏有 Python 函数的长串联，我快速地将它跳过去，并且在下一节之前不会进行解释）——在 XML 文档的 title 元素中的文本数据是以 unicode 保存的。
原文就有些晦涩，我的理解是：上面的一行 (title = xmldoc.getElementsByTagName('title')[0].firstChild.data) 和我们目前的话题没什么关系，所以我把它全部写在一行中，目前暂且跳过。翻译成中文似乎不好表达，所以我姑且采取了折衷的办法，只保留了大概的意思。
注意 title 标记（现在在 title 变量中，上面那一长串 Python 函数我们暂且跳过，下一节再解释）——在 XML 文档的 title 元素中的文本数据是以 unicode 保存的。

DiveIntoPythonZh/2007-08-16

8-16<Xr9-> Review: Chapter 9-

9. XML 处理

9.4 Unicode