===========================================
 Examples of Using minidom to Process XML
===========================================

:Author: limodou
:Contact: chatme@263.net
:Version: $Id: xml.txt 42 2005-09-28 05:19:21Z limodou $
:Homepage: http://wiki.woodpecker.org.cn/moin/NewEdit
:BLOG: http://www.donews.net/limodou
:Copyright: GPL

.. contents::

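1. Reading XML
--------------

NewEdit has a code snippet feature, which stores snippet categories and snippet contents separately. By default both are saved as XML.
Below I describe how to use minidom to read and save XML files.

Here is a sample snippet category file, catalog.xml::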

	<?xml version="1.0" encoding="utf-8"?>
	<catalog>
		<maxid>4</maxid>
		<item id="1">
			<caption>Python</caption>
			<item id="4">
				<caption>测试</caption>
			</item>
		</item>
		<item id="2">
			<caption>Zope</caption>
		</item>
	</catalog>

The categories form a tree, which might be displayed as::

	Python
		测试
	Zope

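First, a brief introduction to XML; skip it if you already know this.

1. Encoding of the XML document

   This XML document is encoded as utf-8, so the "测试" you see is actually stored as UTF-8. XML processing generally works in UTF-8,
   so if you do not specify an encoding, the file is assumed to be UTF-8. Python seems to support only a few encodings when parsing XML;
   the commonly used GB2312, for example, is not supported, so I recommend using UTF-8 when working with XML.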

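2. Structure of an XML document

   An XML document consists of an XML header and an XML body. The header looks like::

	<?xml version="1.0" encoding="utf-8"?>

   It states the XML version and the encoding of the document. More complex documents also carry a document type definition (DOCTYPE), which names
   the DTD or Schema used by the document and defines some entities. None of that is used here, and I am no expert on it, so I will not go into detail.

   The XML body is a tree of elements. Every XML document has one document element, the root of the tree; all other elements and content are contained in it.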

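3. DOM

   DOM is short for Document Object Model. It represents an XML document as a tree of objects, which has the advantage that you can traverse
   the objects very flexibly.

4. Elements and nodes

   An element is a tag, and tags come in pairs. An XML document is made up of elements, but there can be text between elements, and the content of an element is text as well.
   minidom has many kinds of nodes. An element is one kind of node; it is not a leaf node, that is, it can have child nodes. There are also leaf nodes, such as
   text nodes, which have no child nodes.

   In catalog.xml, the document element is catalog, and it contains two kinds of elements: maxid and item. maxid holds the largest id currently used by any item.
   Every item has an id attribute. The id is unique: NewEdit uses it to build the name of the XML file that holds the snippets of that category, so it must not repeat, and it only ever increases. An item element has a caption child element, which gives the name of the category, and it can also contain further item elements. This defines a tree-shaped XML structure. Now let us see how to read it.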


1.1 Getting the dom object

::

	>>> import xml.dom.minidom
	>>> dom = xml.dom.minidom.parse('d:/catalog.xml')

This gives us a dom object whose first element should be catalog.

1.2 Getting the document element object

::

	>>> root = dom.documentElement

This gives us the root element (catalog).
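
To try these two steps without a file on disk, here is a minimal sketch in Python 3 (the session in this article is Python 2). It assumes an in-memory copy of the sample file; the name ``CATALOG_XML`` is made up for illustration, and ``parseString()`` stands in for ``parse()``.

```python
import xml.dom.minidom

# Hypothetical in-memory copy of catalog.xml, so no file on disk is needed.
CATALOG_XML = """<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <item id="1">
        <caption>Python</caption>
        <item id="4">
            <caption>测试</caption>
        </item>
    </item>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
"""

# Pass bytes so that the encoding named in the XML declaration is honoured.
dom = xml.dom.minidom.parseString(CATALOG_XML.encode("utf-8"))
root = dom.documentElement
print(root.nodeName)  # catalog
```

With a real file, ``xml.dom.minidom.parse(path)`` gives the same kind of object.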

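1.3 Node attributes

Every node has the attributes nodeName, nodeValue and nodeType. nodeName is the name of the node.

::

	>>> root.nodeName
	u'catalog'

nodeValue is the value of the node and is only meaningful for text nodes. nodeType is the type of the node; the following types exist::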

	'ATTRIBUTE_NODE'
	'CDATA_SECTION_NODE'
	'COMMENT_NODE'
	'DOCUMENT_FRAGMENT_NODE'
	'DOCUMENT_NODE'
	'DOCUMENT_TYPE_NODE'
	'ELEMENT_NODE'
	'ENTITY_NODE'
	'ENTITY_REFERENCE_NODE'
	'NOTATION_NODE'
	'PROCESSING_INSTRUCTION_NODE'
	'TEXT_NODE'

These node types are easy to understand from their names. catalog is of type ELEMENT_NODE.

::

	>>> root.nodeType
	1
	>>> root.ELEMENT_NODE
	1
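
Since nodeType is only a number, a reverse lookup can make it easier to read. The following Python 3 sketch builds one from the constants on ``xml.dom.Node``; the dictionary name ``TYPE_NAMES`` and the tiny sample document are assumptions made for the example.

```python
import xml.dom.minidom
from xml.dom import Node

dom = xml.dom.minidom.parseString("<catalog><maxid>4</maxid><!-- note --></catalog>")
root = dom.documentElement

# Map each numeric node type code back to the name of its constant.
TYPE_NAMES = {
    getattr(Node, name): name
    for name in dir(Node)
    if name.endswith("_NODE")
}

# Show name, type and value for the document, the root and the root's children.
for node in [dom, root] + list(root.childNodes):
    print(node.nodeName, TYPE_NAMES[node.nodeType], repr(node.nodeValue))
```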

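1.4 Accessing child elements and child nodes

There are many ways to access child elements and child nodes. For a child element whose name you know, you can use the getElementsByTagName method, for example to read the maxid child element::

	>>> root.getElementsByTagName('maxid')
	[<DOM Element: maxid at 0xb6d0a8>]

This returns a list. Since our example has only one maxid, the list has only one entry.

To get all child nodes (including elements) of an element, use the childNodes attribute::

	>>> root.childNodes
	[<DOM Text node "\n    ">, <DOM Element: maxid at 0xb6d0a8>,
	<DOM Text node "\n    ">, <DOM Element: item at 0xb6d918>,
	<DOM Text node "\n    ">, <DOM Element: item at 0xb6de40>,
	<DOM Text node "\n">]

As you can see, everything between two tags is treated as a text node; even the line break at the end of each line becomes a text node. From the result above we can tell each node's
type (this example has text nodes and element nodes), the node's name (for element nodes) and the node's value (for text nodes). Every node is an object,
and different node objects have different attributes and methods; see the documentation for details. This example is simple and involves only text nodes and element nodes.

getElementsByTagName searches all descendant elements of the current element, at every level. childNodes holds only the first-level child nodes of the current element.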
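
The difference between the two is easy to check. This is a small Python 3 sketch on a compact copy of the sample catalog (the caption ``t`` is a placeholder); the variable names are mine.

```python
import xml.dom.minidom

dom = xml.dom.minidom.parseString(
    '<catalog><maxid>4</maxid>'
    '<item id="1"><caption>Python</caption>'
    '<item id="4"><caption>t</caption></item></item>'
    '<item id="2"><caption>Zope</caption></item></catalog>'
)
root = dom.documentElement

# getElementsByTagName() descends through every level, in document order.
all_items = root.getElementsByTagName("item")

# childNodes only holds the first level, so filter it by type and name.
direct_items = [
    n for n in root.childNodes
    if n.nodeType == n.ELEMENT_NODE and n.tagName == "item"
]

print(len(all_items), len(direct_items))  # 3 2
```

When only the direct children matter, as with the nested item elements here, filtering childNodes is the safer choice.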

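So we can walk through childNodes to visit every node and check its nodeType to get at the different kinds of content. For example, to print the names of all child elements::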

	>>> for node in root.childNodes:
		if node.nodeType == node.ELEMENT_NODE:
			print node.nodeName
			
	maxid
	item
	item

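For a text node, the text content is available through its .data attribute.

For a simple element such as <caption>Python</caption>, we can write a function like the following to get its content (here, Python).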

::

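	def getTagText(root, tag):
		# Take the first descendant element named `tag`
		node = root.getElementsByTagName(tag)[0]
		rc = ""
		# Join the data of all its text and CDATA children
		for child in node.childNodes:
			if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
				rc = rc + child.data
		return rc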

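This function handles only the first matching element. It joins all text nodes inside that element. For a node whose nodeType is a text type,
the data attribute holds the text content. If we inspect the childNodes of the caption element, we may see::

	[<DOM Text node "Python">]

which shows that the caption element has only one text node.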

If an element has attributes, use the getAttribute method, for example::

	>>> itemlist = root.getElementsByTagName('item')
	>>> item = itemlist[0]
	>>> item.getAttribute('id')
	u'1'

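This gives the value of the id attribute of the first item element.

Let us briefly summarize how to read information from XML with minidom:

1. Import the xml.dom.minidom module and create the dom object
2. Get the document element (the root object)
3. Find the elements to process with the getElementsByTagName() method and the childNodes attribute (plus some other methods and attributes)
4. Read the content of the text nodes under those elements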
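
The four steps above can be sketched end to end as follows. This is Python 3 and only one possible arrangement: the helper names ``get_text``, ``child_elements`` and ``read_items``, the tuple layout and the in-memory ``CATALOG_XML`` are all assumptions of this example, not part of NewEdit.

```python
import xml.dom.minidom

# Hypothetical in-memory copy of catalog.xml.
CATALOG_XML = """<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <item id="1">
        <caption>Python</caption>
        <item id="4">
            <caption>测试</caption>
        </item>
    </item>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
"""


def get_text(element):
    """Concatenate the text and CDATA children of one element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


def child_elements(element, tag):
    """Return the direct child elements named `tag` (no deeper levels)."""
    return [
        n for n in element.childNodes
        if n.nodeType == n.ELEMENT_NODE and n.tagName == tag
    ]


def read_items(element):
    """Recursively turn <item> children into (id, caption, children) tuples."""
    result = []
    for item in child_elements(element, "item"):
        caption = get_text(child_elements(item, "caption")[0])
        result.append((item.getAttribute("id"), caption, read_items(item)))
    return result


dom = xml.dom.minidom.parseString(CATALOG_XML.encode("utf-8"))  # step 1
root = dom.documentElement                                      # step 2
maxid = int(get_text(child_elements(root, "maxid")[0]))         # steps 3 and 4
tree = read_items(root)
print(maxid, tree)
```

Direct children are used instead of getElementsByTagName() so that a nested item is not mistaken for a top-level one.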

2. Writing XML
--------------

Next I show how to create an XML file like catalog.xml from scratch.

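2.1 Creating the dom object

::

	>>> import xml.dom.minidom
	>>> impl = xml.dom.minidom.getDOMImplementation()
	>>> dom = impl.createDocument(None, 'catalog', None)

This creates an empty dom object, where catalog is the name of the document element, that is, the root element.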

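2.2 Displaying the generated XML

Every dom node object (including the dom object itself) has methods that output its XML content, such as toxml() and toprettyxml().

toxml() outputs compact XML text, for example::

	<catalog><item>test</item><item>test</item></catalog>

toprettyxml() outputs pretty-printed XML text, for example::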

	<catalog>
		<item>
			test
		</item>
		<item>
			test
		</item>
	</catalog>

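As you can see, it adds a line break after every node and handles indentation automatically. But for an element that contains only text, I would like
the tag and the text to stay together, like::

	<item>test</item>

rather than split across lines, and minidom itself does not support this. How to produce XML like::

	<catalog>
		<item>test</item>
		<item>test</item>
	</catalog>

is covered later.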
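
For comparison, here is how the two methods can be called in Python 3. As far as I can tell, recent Python 3 releases already keep an element that has a single text child on one line in toprettyxml(), so the output may differ from the older behaviour shown in this article; treat this as a quick check, not a guarantee for every version.

```python
import xml.dom.minidom

dom = xml.dom.minidom.parseString(
    "<catalog><item>test</item><item>test</item></catalog>"
)
root = dom.documentElement

compact = root.toxml()                              # everything on one line
pretty = root.toprettyxml(indent="\t", newl="\n")   # one tag per line

print(compact)
print(pretty)
```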

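2.3 Creating the various node objects

The dom object has methods for creating all kinds of nodes. Below are the steps for creating text nodes, CDATA nodes and element nodes.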

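  1. Text node

     ::

		>>> text=dom.createTextNode('test')
		>>> text.toxml()
		'test'

     Note that minidom does not check the text when the node is created. Characters such as '<' and '&' are stored unchanged; they are converted
     to the entities '&lt;' and '&amp;' only when the node is output, for example by toxml() or writexml(), so the text should be passed in unescaped.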

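  2. CDATA node

     ::

		>>> data = dom.createCDATASection('aaaaaa\nbbbbbb')
		>>> data.toxml()
		'<![CDATA[aaaaaa\nbbbbbb]]>'

     CDATA is markup for enclosing a large block of text without having to convert characters such as '<' and '&'. The text is wrapped as <![CDATA[text]]>, but it
     must not contain the string "]]>". minidom does not check this when the node is created; the error can only show up when you output the node.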

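  3. Element node

     ::

		>>> item = dom.createElement('caption')
		>>> item.toxml()
		'<caption/>'

     A newly created element node is actually an empty element, containing no text at all. To give it text or other elements,
     use methods such as appendChild() or insertBefore() to add child nodes to it. For example, adding the text node created above to the caption element node::

		>>> item.appendChild(text)
		<DOM Text node "test">
		>>> item.toxml()
		'<caption>test</caption>'

     The setAttribute() method of an element object adds an attribute to the element, for example::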

		>>> item.setAttribute('id', 'idvalue')
		>>> item.toxml()
		'<caption id="idvalue">test</caption>'
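
To see when minidom looks at the characters in a node, the following Python 3 sketch creates a text node containing '<' and '&' and a CDATA node containing "]]>". The sample strings and the variable ``cdata_error`` are made up for the example.

```python
import xml.dom.minidom

impl = xml.dom.minidom.getDOMImplementation()
dom = impl.createDocument(None, "catalog", None)

# A text node stores the raw string; entities appear only on output.
text = dom.createTextNode("a < b & c")
print(text.data)     # a < b & c
print(text.toxml())  # a &lt; b &amp; c

# A CDATA node is accepted at creation time even with a forbidden "]]>" ...
bad = dom.createCDATASection("x ]]> y")
try:
    bad.toxml()
    cdata_error = None
except ValueError as exc:
    # ... and the problem is reported only when the node is serialized.
    cdata_error = str(exc)
print(cdata_error)
```

So text can be handed to createTextNode() unescaped, while CDATA content needs a check for "]]>" before it is used.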

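2.4 Building the dom object tree

We have a dom object and know how to create the various nodes, both leaf nodes (nodes that contain no other nodes, such as text nodes) and non-leaf nodes
(nodes that contain other nodes, such as element nodes). Next, use the nodes' own appendChild() or insertBefore() methods to link the nodes
according to their position in the tree. Finally the tree must be attached to the document, under the root element. A complete example::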

	>>> import xml.dom.minidom
	>>> impl = xml.dom.minidom.getDOMImplementation()
	>>> dom = impl.createDocument(None, 'catalog', None)
	>>> root = dom.documentElement
	>>> item = dom.createElement('item')
	>>> text = dom.createTextNode('test')
	>>> item.appendChild(text)
	<DOM Text node "test">
	>>> root.appendChild(item)
	<DOM Element: item at 0xb9cf80>
	>>> print root.toxml()
	<catalog><item>test</item></catalog>
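
The same approach extends to the nested sample catalog. Below is a minimal Python 3 sketch; the data layout ``ITEMS`` and the helpers ``add_text_element`` and ``add_items`` are assumptions for illustration only.

```python
import xml.dom.minidom

# Hypothetical in-memory form of the catalog: (id, caption, children).
ITEMS = [("1", "Python", [("4", "测试", [])]), ("2", "Zope", [])]


def add_text_element(dom, parent, tag, text):
    """Create <tag>text</tag> and append it to parent."""
    element = dom.createElement(tag)
    element.appendChild(dom.createTextNode(text))
    parent.appendChild(element)
    return element


def add_items(dom, parent, items):
    """Recursively append <item> elements with their captions."""
    for item_id, caption, children in items:
        item = dom.createElement("item")
        item.setAttribute("id", item_id)
        add_text_element(dom, item, "caption", caption)
        add_items(dom, item, children)
        parent.appendChild(item)


impl = xml.dom.minidom.getDOMImplementation()
dom = impl.createDocument(None, "catalog", None)
root = dom.documentElement
add_text_element(dom, root, "maxid", "4")
add_items(dom, root, ITEMS)
print(root.toxml())
```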

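2.5 A function for creating simple element nodes

Here is a small function I wrote for easily creating element nodes like::

	<caption>test</caption>

or like::

	<item><![CDATA[test]]></item>

The function is as follows.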

::

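	def makeEasyTag(dom, tagname, value, type='text'):
	    tag = dom.createElement(tagname)
	    # ']]>' is not allowed inside CDATA, so fall back to a plain text node
	    if value.find(']]>') > -1:
	        type = 'text'
	    if type == 'text':
	        # No manual escaping: minidom converts '<' and '&' to entities on output
	        text = dom.createTextNode(value)
	    elif type == 'cdata':
	        text = dom.createCDATASection(value)
	    else:
	        raise ValueError("type must be 'text' or 'cdata'")
	    tag.appendChild(text)
	    return tag

Parameters:

	dom
	  the dom object

	tagname
	  the name of the element to create, e.g. 'item'

	value
	  the text content; it may span multiple lines

	type
	  the kind of text node: 'text' for an ordinary Text node, 'cdata' for a CDATA node

How the function works:

	* First create the element node
	* Check whether the text contains ']]>'; if it does, the text can only go into a Text node
	* If the type is 'text', create a text node; minidom converts '<' to '&lt;' and '&' to '&amp;' when the node is output, so escaping by hand here would escape the text twice
	* If the type is 'cdata', create a CDATA node
	* Append the resulting text node to the element node

	So this small function keeps the string ']]>' out of CDATA nodes, while character conversion is left to minidom.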

The statement that created the 'item' node above can then be written as::

	>>> item = makeEasyTag(dom, 'item', 'test')
	>>> item.toxml()
	'<item>test</item>'

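2.6 Writing to an XML file

Once the dom object tree has been built, we can call the dom's writexml() method to write the content to a file. The syntax of writexml() is::

	writexml(writer, indent, addindent, newl, encoding)

* writer is a file object
* indent is the string written before each tag, e.g. '  ' puts two spaces before each tag
* addindent is the additional indentation string for each level of child nodes
* newl is the string written after each tag, e.g. '\n' puts a line break after each tag
* encoding is the value of the encoding attribute in the generated XML header. minidom does not actually encode anything on output; if the text you save contains Chinese characters, you have to do the encoding conversion yourself.

Only the writer argument of writexml() is required; the others can be omitted.

Here is an example in which the text content contains Chinese characters::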

	1       >>> import xml.dom.minidom
	2       >>> impl = xml.dom.minidom.getDOMImplementation()
	3       >>> dom = impl.createDocument(None, 'catalog', None)
	4       >>> root = dom.documentElement
	5       >>> text = unicode('汉字示例', 'cp936')
	6       >>> item = makeEasyTag(dom, 'item', text)
	7       >>> root.appendChild(item)
	8       <DOM Element: item at 0xb9ceb8>
	9       >>> root.toxml()
	10      u'<catalog><item>\u6c49\u5b57\u793a\u4f8b</item></catalog>'
	11      >>> f=file('d:/test.xml', 'w')
	12      >>> import codecs
	13      >>> writer = codecs.lookup('utf-8')[3](f)
	14      >>> dom.writexml(writer, encoding='utf-8')
	15      >>> writer.close()

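Line 5: XML processing uses Unicode internally, so text such as Chinese characters must first be converted to Unicode. If you skip this step, minidom does not check it,
and saving may succeed without error, but reading the file back may fail.

Lines 12-13: create a writer stream object that encodes to UTF-8, so that Unicode is converted to UTF-8 automatically when saving.

That completes writing the XML file.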
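
In Python 3, strings are already Unicode and a file opened in text mode can do the UTF-8 conversion, so the codecs writer is not needed. The sketch below is one way to write and re-read the file under that assumption; the temporary directory and the name ``path`` are chosen only to keep the example self-contained.

```python
import os
import tempfile
import xml.dom.minidom

impl = xml.dom.minidom.getDOMImplementation()
dom = impl.createDocument(None, "catalog", None)
item = dom.createElement("item")
item.appendChild(dom.createTextNode("汉字示例"))
dom.documentElement.appendChild(item)

# Hypothetical output location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.xml")

# The file object converts Unicode to UTF-8; the encoding argument of
# writexml() only fills in the encoding attribute of the XML header.
with open(path, "w", encoding="utf-8") as f:
    dom.writexml(f, encoding="utf-8")

# Read the file back to confirm that the text survived the round trip.
reloaded = xml.dom.minidom.parse(path)
print(reloaded.documentElement.toxml())
```

The encoding given to open() and the one given to writexml() should match; otherwise the header would describe the file incorrectly.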


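3. Pretty-printing XML
----------------------

The writexml() method of the dom object can control some aspects of the output format, but the result is not satisfying. For example, I want to get::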

	<catalog>
		<item>test</item>
		<item>test</item>
	</catalog>

rather than::

	<catalog>
		<item>
			test
		</item>
		<item>
			test
		</item>
	</catalog>

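With output like the second form, I cannot tell whether the original text contained whitespace; the first form does not have this problem. Fortunately, I found pretty-printing code in
the XML resource editor that ships with wxPython (XRCed). The code is::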

	def Indent(dom, node, indent = 0):
	    # Copy child list because it will change soon
	    children = node.childNodes[:]
	    # Main node doesn't need to be indented
	    if indent:
	        text = dom.createTextNode('\n' + '\t' * indent)
	        node.parentNode.insertBefore(text, node)
	    if children:
	        # Append newline after last child, except for text nodes
	        if children[-1].nodeType == node.ELEMENT_NODE:
	            text = dom.createTextNode('\n' + '\t' * indent)
	            node.appendChild(text)
	        # Indent children which are elements
	        for n in children:
	            if n.nodeType == node.ELEMENT_NODE:
	                Indent(dom, n, indent + 1)
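Parameters:

	dom
	  the dom object

	node
	  the element node to process

	indent
	  the indentation level

About the function:

Indent is a recursive function: when a node has child elements, it processes them recursively. Its main job is to handle line breaks and indentation of child elements. The indentation is hard-coded
here: each level uses one tab character. You can change it to whatever you like by replacing the '\t' in the function, or make it a
global variable or a parameter so that it is easier to change later. For NewEdit this was good enough, so I did not bother.

The idea of Indent is to walk the child nodes recursively and insert a text node wherever a line break and indentation are needed. The output of
writexml() is then already indented. I will not explain the code in more detail; just use it.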
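
As suggested above, the indentation string can be turned into a parameter. Here is a Python 3 sketch of that variation; the function name ``indent_tree`` and the parameters ``level`` and ``unit`` are my own, and the logic otherwise follows Indent.

```python
import xml.dom.minidom


def indent_tree(dom, node, level=0, unit="\t"):
    """Insert whitespace text nodes so child elements sit on their own lines."""
    children = list(node.childNodes)  # copy: the live list changes below
    if level:
        # Put a line break plus indentation in front of every non-root element.
        node.parentNode.insertBefore(dom.createTextNode("\n" + unit * level), node)
    if children:
        # Close the element on its own line, unless it ends with text.
        if children[-1].nodeType == node.ELEMENT_NODE:
            node.appendChild(dom.createTextNode("\n" + unit * level))
        for child in children:
            if child.nodeType == node.ELEMENT_NODE:
                indent_tree(dom, child, level + 1, unit)


dom = xml.dom.minidom.parseString(
    "<catalog><item>test</item><item>test</item></catalog>"
)
indent_tree(dom, dom.documentElement, unit="  ")
result = dom.documentElement.toxml()
print(result)
```

With ``unit="  "`` each level is indented by two spaces, and elements that contain only text stay on one line.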

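One thing to note:

Indent() modifies the dom object it is given, so it is best to copy the dom into a temporary object before calling it, and to discard that temporary object when you are done.
The full calling sequence is::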

	1       domcopy = dom.cloneNode(True)
	2       Indent(domcopy, domcopy.documentElement)
	3       f = file(xmlfile, 'wb')
	4       writer = codecs.lookup('utf-8')[3](f)
	5       domcopy.writexml(writer, encoding = 'utf-8')
	6       domcopy.unlink()

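1
  clone the dom object
2
  apply the indentation
3-4
  set up UTF-8 encoding for the output
5
  write the XML file
6
  release the cloned dom object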
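
That working on a clone leaves the original untouched can be checked with a short Python 3 sketch. Appending one text node stands in for the changes Indent() would make; the variable names are only illustrative.

```python
import xml.dom.minidom

dom = xml.dom.minidom.parseString("<catalog><item>test</item></catalog>")
before = dom.toxml()

# Deep copy of the whole document; the original is not touched afterwards.
domcopy = dom.cloneNode(True)

# Stand-in for Indent(): modify the copy by adding a whitespace text node.
domcopy.documentElement.appendChild(domcopy.createTextNode("\n"))
changed = domcopy.toxml()

# Release the temporary copy once its output has been produced.
domcopy.unlink()

print(dom.toxml() == before)  # True
print(changed != before)      # True
```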

After this treatment, your XML documents should look much nicer.


`[Back]`_

.. _`[Back]`: technical.htm