## Describe MiscItems/2010-11-05 here.
= Parsing non-standard JSON data =
{{{
From: Leo Jay
Reply-To: python-cn@googlegroups.com
To: python-cn@googlegroups.com
Date: 2010-11-05 17:14 (GMT+08:00)
Subject: Re: [CPyUG] Parsing JSON data with Python
}}}

== A hand-written parser with PLY ==
{{{
2010/11/3 月色狼影 :
> Python can only parse standard JSON data, e.g. {"foo" : "bar"}
> But the site I am scraping serves only non-standard data: none of the keys are quoted, and the values use single quotes.
> How should I handle this?
> For example:
> {type: 7, typeId: 495, name: 'Howling Fjord'}
> How can I add quotes around each key and replace the single quotes around the values with double quotes?
>
}}}

Nothing off the shelf handles this, so write your own lexer and parser:
{{{#!python
# Written for Python 2.6 (print statements, json.decoder internals).
from ply import lex, yacc
import json

# --- Lexer ---
tokens = ('WORD', 'STRING')
literals = '{},:'
t_WORD = r'\w+'
t_ignore = ' \t\n\r'

# Reuse json's escape table and additionally allow \' inside strings.
BACKSLASH = dict(json.decoder.BACKSLASH)
BACKSLASH["'"] = u"'"

def t_STRING(t):
    r"'(\\'|[^'])*?'"
    # Drop the single quotes and append a double quote so that json's
    # string scanner finds its terminator and decodes the escapes.
    t.value = json.decoder.py_scanstring('%s"' % t.value[1:-1], 0, None, True, BACKSLASH)[0]
    return t

def t_error(t):
    print 'Illegal character "%s"' % t.value[0]
    t.lexer.skip(1)  # skip the bad character so lexing can continue

# --- Grammar ---
def p_dic(p):
    '''dic : '{' key_value_pairs '}' '''
    p[0] = p[2]

def p_key_value_pairs(p):
    '''key_value_pairs : key ':' value
                       | key ':' value ',' key_value_pairs
    '''
    if p[0] is None:
        p[0] = {}
    if len(p) == 6:
        # Merge the pairs parsed from the rest of the list.
        p[0].update(p[5])
    p[0][p[1]] = p[3]

def p_key_and_value(p):
    '''value : dic
             | WORD
             | STRING
       key : WORD
    '''
    p[0] = p[1]

def p_error(p):
    # p is None when the error occurs at the end of the input.
    if p is None:
        print 'Syntax error at end of input'
    else:
        print 'Syntax error at "%s"' % p.value

lexer = lex.lex()
parser = yacc.yacc()

result = parser.parse(r"{roles:0,joined:'2010/06/25 06:55:27',posts:34,avatar:1,avatarmore:'inv_drink_32_disgustingrotgut',sig:'Wise man said: just walk this way, to the dawn of the light. \nWind will blow into your face, as the years pass you by.\nHear this voice, from deep inside. It\'s the call of your heart.'}")
print json.dumps(result)
}}}

Under Python 2.6, the program above prints:
{{{
{"roles": "0", "posts": "34", "joined": "2010/06/25 06:55:27", "avatarmore": "inv_drink_32_disgustingrotgut", "sig": "Wise man said: just walk this way, to the dawn of the light. \nWind will blow into your face, as the years pass you by.\nHear this voice, from deep inside. It's the call of your heart.", "avatar": "1"}
}}}

You need to install PLY first:
 * http://www.dabeaz.com/ply/
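
For readers who cannot install PLY, the same grammar (a dict of bare-word keys whose values are bare words, single-quoted strings, or nested dicts) can be sketched with the standard library alone. This is an illustration, not part of the original reply. It assumes Python 3. The names `tokenize`, `LooseParser`, `parse_loose` and `ESCAPES` are hypothetical, and the escape table is a simplified stand-in for `json.decoder.BACKSLASH` that does not handle `\uXXXX`.

```python
import json
import re

# One token per match: single-quoted string, bare word, or a literal.
TOKEN_RE = re.compile(r"""
      (?P<STRING>'(?:\\.|[^'\\])*')
    | (?P<WORD>\w+)
    | (?P<LIT>[{},:])
""", re.VERBOSE)

# Simplified escape table (assumption: no \uXXXX support).
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "'": "'", '"': '"', "\\": "\\", "/": "/"}

def unescape(body):
    """Decode backslash escapes inside a string body."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(1)), body)

def tokenize(text):
    """Yield (kind, value) pairs, skipping whitespace between tokens."""
    pos = 0
    while True:
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("Illegal character %r at offset %d" % (text[pos], pos))
        pos = m.end()
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "STRING":
            value = unescape(value[1:-1])  # drop the quotes, decode escapes
        yield kind, value

class LooseParser:
    """Recursive-descent parser for: dic : '{' key ':' value (',' key ':' value)* '}'"""

    def __init__(self, text):
        self.tokens = list(tokenize(text))
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)  # end of input

    def take(self, kind, value=None):
        tok_kind, tok_value = self.peek()
        if tok_kind != kind or (value is not None and tok_value != value):
            raise ValueError("Syntax error at %r" % (tok_value,))
        self.pos += 1
        return tok_value

    def parse_dict(self):
        self.take("LIT", "{")
        result = {}
        while True:
            key = self.take("WORD")
            self.take("LIT", ":")
            result[key] = self.parse_value()
            if self.peek() != ("LIT", ","):
                break
            self.take("LIT", ",")
        self.take("LIT", "}")
        return result

    def parse_value(self):
        kind, value = self.peek()
        if (kind, value) == ("LIT", "{"):
            return self.parse_dict()
        if kind in ("WORD", "STRING"):
            self.pos += 1
            return value
        raise ValueError("Syntax error at %r" % (value,))

def parse_loose(text):
    parser = LooseParser(text)
    result = parser.parse_dict()
    if parser.peek()[0] is not None:
        raise ValueError("Unexpected trailing input")
    return result

print(json.dumps(parse_loose("{type: 7, typeId: 495, name: 'Howling Fjord'}")))
```

As in the PLY version, bare values such as `7` come back as strings, because both lexers treat any run of word characters as a `WORD`. Converting them to numbers would be a separate step.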