Differences between revisions 2 and 3

txt 2 html

The question

free won <[email protected]>
reply-to        [email protected]
to      [email protected]
date    Fri, Jul 11, 2008 at 19:12
subject [CPyUG:58809] Text-to-HTML conversion problem

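I've recently been building a forum-like site with Django.

One problem I've run into: when a user replies to a post, the reply text has to be converted to HTML before it is published.

I've tried a few regex-based approaches so far, but none of them works well.

I'd appreciate pointers from anyone with experience.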

limodou

    # coding=utf-8
    import html
    import re

    # One alternative per token type: HTML special characters, leading
    # indentation, line endings, and http/ftp URLs.
    re_string = re.compile(
        r'(?P<htmlchars>[<&>])'
        r'|(?P<space>^[ \t]+)'
        r'|(?P<lineend>\r\n|\r|\n)'
        r'|(?P<prefix>[ \t]*)(?P<url>(?:http|ftp)://\S+)',
        re.M | re.I)

    def text2html(text, tabstop=4):
        def do_sub(m):
            c = m.groupdict()
            if c['htmlchars']:
                # html.escape is the Python 3 replacement for cgi.escape.
                return html.escape(c['htmlchars'])
            if c['lineend']:
                return '<br>'
            if c['space']:
                # Keep indentation visible: a tab becomes tabstop non-breaking spaces.
                t = c['space'].replace('\t', '&nbsp;' * tabstop)
                return t.replace(' ', '&nbsp;')
            # Otherwise a URL matched; escape it so '&' and '"' are safe in href.
            url = html.escape(c['url'])
            return '%s<a href="%s">%s</a>' % (c['prefix'], url, url)
        return re_string.sub(do_sub, text)

    if __name__ == '__main__':
        text = """
     http://groups.google.com/group/python-cn/pending
    """
        print(text2html(text))

yrh

<[email protected]>

I wrote one a while back. It isn't for Django, but take a look:

    import re

    def htmlEncode(strings):
        # Replace characters that some databases have trouble storing.
        strings = strings.replace("'", "&#39;")
        strings = strings.replace("\\", "&#92;")
        strings = strings.replace(".", "&#46;")
        strings = strings.replace("|", "&#124;")

        #strings = strings.replace('<br>', '')
        # Turn each pair of ASCII spaces into one full-width space (U+3000).
        strings = strings.replace("  ", "　")
        strings = strings.replace("<", "&#60;")
        # Every '<' is now escaped; restore it only for the whitelisted lowercase
        # tags (<strong>, <center>, <font>, <img>, <pre>). Uppercase tags stay escaped.
        strings = re.sub(r'&#60;(?P<xg>/?)(?P<bq>strong|center|font|img |pre)',
                         r'<\g<xg>\g<bq>', strings)
        strings = strings.replace('\n', '\n<br>')
        return strings

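This function comes from a forum I wrote myself. It allows the lowercase <strong>, <center>, <font>, and <img> tags and converts every other HTML tag to text. The backslash, single quote, period, and pipe are replaced because some databases have trouble storing those characters; check your database's manual to see exactly which characters need replacing.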
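
The whitelist idea described above can be sketched with the standard library alone. This is a minimal illustration rather than yrh's code: `escape_with_whitelist` and `ALLOWED_TAGS` are hypothetical names, and the sketch restores only bare tags without attributes, an assumption made here so that attributes such as event handlers stay escaped.

```python
import html
import re

# Hypothetical whitelist for this sketch; adjust as needed.
ALLOWED_TAGS = ("strong", "center", "pre")

# Matches an escaped bare tag such as "&lt;strong&gt;" or "&lt;/pre&gt;".
_ALLOWED_RE = re.compile(r"&lt;(/?)(%s)&gt;" % "|".join(ALLOWED_TAGS))

def escape_with_whitelist(text):
    """Escape all HTML, then restore the whitelisted lowercase tags."""
    escaped = html.escape(text)  # '<' -> '&lt;', '&' -> '&amp;', and so on
    restored = _ALLOWED_RE.sub(r"<\1\2>", escaped)
    return restored.replace("\n", "<br>\n")

print(escape_with_whitelist("<strong>hi</strong> <script>x</script>"))
# prints: <strong>hi</strong> &lt;script&gt;x&lt;/script&gt;
```

Because the match is case-sensitive, an uppercase `<STRONG>` stays escaped, which mirrors the behaviour described for the original function.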

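If you need more features, I suggest looking at how forum software handles UBB tags. A Google search will turn up examples (动网论坛 (Dvbbs), for instance, though most of them are written in ASP and very few in Python).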
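
As a rough illustration of the UBB approach mentioned above, the sketch below escapes the text first and then translates a few bracket tags into HTML. The tag set (`[b]`, `[i]`, `[url]`) and the names `UBB_RULES` and `ubb2html` are assumptions chosen for this example, not taken from any particular forum's implementation.

```python
import html
import re

# (pattern, replacement) pairs for a tiny, hypothetical subset of UBB tags.
UBB_RULES = [
    (re.compile(r"\[b\](.*?)\[/b\]", re.S | re.I), r"<strong>\1</strong>"),
    (re.compile(r"\[i\](.*?)\[/i\]", re.S | re.I), r"<em>\1</em>"),
    # Only http/ftp links are accepted, so "javascript:" URLs stay plain text.
    (re.compile(r"\[url\]((?:http|ftp)://[^\s\[\]]+)\[/url\]", re.I),
     r'<a href="\1">\1</a>'),
]

def ubb2html(text):
    """Escape raw HTML, then expand the supported UBB tags."""
    out = html.escape(text)  # user-supplied '<', '&', '"' become entities
    for pattern, repl in UBB_RULES:
        out = pattern.sub(repl, out)
    return out.replace("\n", "<br>\n")

print(ubb2html("[b]bold[/b] <i>raw</i>"))
# prints: <strong>bold</strong> &lt;i&gt;raw&lt;/i&gt;
```

Escaping before expanding the tags means users can never inject markup directly; only the tags the rules generate reach the page.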

Feedback

Created by -- ZoomQuiet [2008-07-11T14:11:59Z]


Revision 2 as of 2008-07-11 03:32:02 (Size: 0, Editor: ZoomQuiet, Comment: error idx)
Revision 3 as of 2008-07-11 14:12:00 (Size: 3305, Editor: ZoomQuiet)

Diff for "MiscItems/2008-07-11"
