1. 国际化

1.1. locale 模块

locale 模块提供了 C 本地化( localization )函数的接口, 如 [#eg-8-1 Example 8-1] 所示. 同时提供相关函数, 实现基于当前 locale 设置的数字, 字符串转换. (而 int , float , 以及 string 模块中的相关转换函数不受 locale 设置的影响.)

====Example 8-1. 使用 locale 模块格式化数据=====[eg-8-1]

File: locale-example-1.py

import locale

print "locale", "=>", locale.setlocale(locale.LC_ALL, "")

# integer formatting
value = 4711
print locale.format("%d", value, 1), "==",
print locale.atoi(locale.format("%d", value, 1))

# floating point
value = 47.11
print locale.format("%f", value, 1), "==",
print locale.atof(locale.format("%f", value, 1))

info = locale.localeconv()
print info["int_curr_symbol"]

*B*locale => Swedish_Sweden.1252
4,711 == 4711
47,110000 == 47.11
SEK*b*

[#eg-8-2 Example 8-2] 展示了如何使用 locale 模块获得当前平台 locale 设置.

1.1.0.1. Example 8-2. 使用 locale 模块获得当前平台 locale 设置

File: locale-example-2.py

import locale

language, encoding = locale.getdefaultlocale()

print "language", language
print "encoding", encoding

*B*language sv_SE
encoding cp1252*b*

1.2. unicodedata 模块

( 2.0 中新增) unicodedata 模块包含了 Unicode 字符的属性, 例如字符类别, 分解数据, 以及数值. 如 [#eg-8-3 Example 8-3] 所示.

1.2.0.1. Example 8-3. 使用 unicodedata 模块

File: unicodedata-example-1.py

import unicodedata

for char in [u"A", u"-", u"1", u"\N{LATIN CAPITAL LETTER O WITH DIAERESIS}"]:
    print repr(char),
    print unicodedata.category(char),
    print repr(unicodedata.decomposition(char)),

    print unicodedata.decimal(char, None),
    print unicodedata.numeric(char, None)

*B*u'A' Lu '' None None
u'-' Pd '' None None
u'1' Nd '' 1 1.0
u'\303\226' Lu '004F 0308' None None*b*

在 Python 2.0 中缺少 CJK 象形文字和韩语音节的属性. 这影响到了 0x3400-0x4DB5 , 0x4E00-0x9FA5 , 以及 0xAC00-D7A3 中的字符, 不过每个区间内的第一个字符属性是正确的, 我们可以把字符映射到起始实现正常操作:

def remap(char):
    # fix for broken unicode property database in Python 2.0
    c = ord(char)
    if 0x3400 <= c <= 0x4DB5:
        return unichr(0x3400)
    if 0x4E00 <= c <= 0x9FA5:
        return unichr(0x4E00)
    if 0xAC00 <= c <= 0xD7A3:
        return unichr(0xAC00)
    return char

Python 2.1 修复了这个 bug .

1.3. ucnhash 模块

(仅适用于 2.0 ) ucnhash 模块为一些 Unicode 字符代码提供了特定的命名. 你可以直接使用 \N{} 转义符将 Unicode 字符名称映射到字符代码上. 如 [#eg-8-4 Example 8-4] 所示.

1.3.0.1. Example 8-4. 使用 ucnhash 模块

File: ucnhash-example-1.py

# Python imports this module automatically, when it sees
# the first \N{} escape
# import ucnhash

print repr(u"\N{FROWN}")
print repr(u"\N{SMILE}")
print repr(u"\N{SKULL AND CROSSBONES}")

*B*u'\u2322'
u'\u2323'
u'\u2620'*b*