##language:zh
'''
文章来自《Python cookbook》. 

翻译仅仅是为了个人学习,其它商业版权纠纷与此无关!
'''

-- 0.706 [<<DateTime(2004-09-26T17:12:42Z)>>]
<<TableOfContents>>
= Converting Between Different Naming Conventions 在不同的命名约定之间转换 =
Credit: Sami Hangaslammi

== 问题 Problem ==
You have a body of code whose identifiers use one of the common naming conventions to represent multiple words in a single identifier (CapitalizedWords, mixedCase, or under_scores), and you need to convert the code to another naming convention in order to merge it smoothly with other code.

你有一段代码,其中使用某种常见的命名约定表示一个多词标识符(首符大写形式CapitalizedWords，大小写混合形式 mixedCase 或下划线连接形式 under_scores) ，为了能平滑地与其他的代码合并 , 你需要将代码转换成另外的命名约定。

== 解决 Solution ==
re.sub covers the two hard cases, converting underscore to and from the others:

re.sub 可以处理两种较难的情形: 将'下划线连接形式'(underscore)转换成其它形式, 以及从其它形式转换成'下划线连接形式'(underscore):

{{{
#!python
import re

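def cw2us(x): # capwords to underscore notation
    # Put '_' before a capital that follows a lowercase letter, or before a
    # non-initial capital that starts a new word (is followed by lowercase).
    return re.sub(r'(?<=[a-z])[A-Z]|(?<!^)[A-Z](?=[a-z])',
        r"_\g<0>", x).lower()

def us2mc(x): # underscore to mixed-case notation
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), x)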
}}}


Mixed-case to underscore is just like capwords to underscore (the case-lowering of the first character becomes redundant, but it does no harm):

'大小写混合形式'到'下划线连接形式'的转换,正类似于'首符大写形式'到'下划线连接形式':(变第一个字符为小写成为多余，但是它没有害处)

 
{{{
#!python
def mc2us(x): # mixed-case to underscore notation
    return cw2us(x)

}}}

Underscore to capwords can similarly exploit the underscore to mixed-case conversion, but it needs an extra twist to uppercase the start:

'下划线连接形式'到'首符大写形式' 能同样地使用对'下划线连接形式'到'大小写混合形式'的转换,但是它需要额外的把开头变为大写字母:
{{{
#!python
def us2cw(x): # underscore to capwords notation
    s = us2mc(x)
    return s[0].upper()+s[1:]

}}}


Conversion between mixed-case and capwords is, of course, just an issue of lowercasing or uppercasing the first character, as appropriate:

在'大小写混合形式'和'首符大写形式'之间转换, 当然只是适当的用小写字母或大写字母写第一个字符的问题:

{{{
#!python
def mc2cw(x): # mixed-case to capwords
    return x[0].upper()+x[1:]

def cw2mc(x): # capwords to mixed-case
    return x[0].lower()+x[1:]
}}}
== 讨论 Discussion ==

Here are some usage examples:

一些用法例子在这里:

{{{

>>> cw2us("PrintHTML")
'print_html'
>>> cw2us("IOError")
'io_error'
>>> cw2us("SetXYPosition")
'set_xy_position'
>>> cw2us("GetX")
'get_x'

}}}
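The other converters can be exercised in the same way. The following is a minimal sketch that repeats the Solution functions so it runs on its own; the sample identifiers are arbitrary. Note that a round trip through the underscore form does not restore the capitalization of an acronym, since the underscore form no longer records it.

```python
import re

def cw2us(x):  # capwords (or mixed-case) to underscore notation
    return re.sub(r'(?<=[a-z])[A-Z]|(?<!^)[A-Z](?=[a-z])',
                  r'_\g<0>', x).lower()

def us2mc(x):  # underscore to mixed-case notation
    return re.sub(r'_([a-z])', lambda m: m.group(1).upper(), x)

def us2cw(x):  # underscore to capwords notation
    s = us2mc(x)
    return s[0].upper() + s[1:]

print(us2mc("set_xy_position"))   # setXyPosition
print(us2cw("set_xy_position"))   # SetXyPosition
print(cw2us(us2cw("get_x")))      # get_x
print(us2cw(cw2us("PrintHTML")))  # PrintHtml (the acronym's case is lost)
```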

The set of functions in this recipe is useful, and very practical, if you need to homogenize naming styles in a bunch of code, but the approach may be a bit obscure. In the interest of clarity, you might want to adopt a conceptual stance that is general and fruitful. In other words, to convert a bunch of formats into each other, find a neutral format and write conversions from each of the N formats into the neutral one and back again. This means having 2N conversion functions rather than N x (N-1) (a big win for large N), but the point here (in which N is only three) is really one of clarity.

如果你需要统一一批代码的命名风格,这份配方中的函数是有用且非常实用的,但这种方式可能有一点晦涩。为了清晰起见,你可能想要采用一种在概念上更一般、更有成效的方式。换句话说,为了在一组格式间彼此转换,找一个中立的格式,并且写出从N个格式中的每一个到中立格式以及反向的转换。这意味着需要2N个转换函数,而不是 N x (N-1) 个,在N很大时这是很大的优势;不过在这里(N只有三),真正的要点在于清晰。
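The arithmetic behind that comparison can be checked with a few lines. This is only an illustrative sketch; the helper names direct_converters and hub_converters are made up for this example. For N = 3 the two counts are equal, which is consistent with the remark that the gain here is clarity rather than fewer functions.

```python
def direct_converters(n):
    # One function for every ordered pair of distinct formats.
    return n * (n - 1)

def hub_converters(n):
    # Per format: one function into the neutral form and one back out.
    return 2 * n

for n in (3, 4, 10):
    print(n, direct_converters(n), hub_converters(n))
# 3 6 6
# 4 12 8
# 10 90 20
```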



Clearly, the underlying neutral format that each identifier style is encoding is a list of words. Let's say, for definiteness and without loss of generality, that they are lowercase words:

显然，隐藏在每种标识符风格下面的中立格式就是一个单词的列表。让我们说, 为了明确且不失一般性,他们都是小写字母组成的词:

{{{
#!python
 
import string, re
def anytolw(x):  # any format of identifier to list of lowercased words

    # First, see if there are underscores:
    lw = x.split('_')
    if len(lw)>1: return [w.lower() for w in lw]

    # No. Then uppercase letters are the splitters:
    pieces = re.split('([A-Z])', x)

    # Ensure first word follows the same rules as the others:
    if pieces[0]: pieces = [''] + pieces
    else: pieces = pieces[1:]

    # Join two by two, lowercasing the splitters as you go
    return [pieces[i].lower()+pieces[i+1] for i in range(0,len(pieces),2)]
}}}
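A quick check of the splitter on each of the three styles may help. The sketch below restates the function using string methods only, so that it also runs under Python 3 (an assumption about the reader's interpreter; the recipe itself predates it). The last call shows a difference from cw2us in the Solution: because every capital letter acts as a splitter, each letter of an acronym becomes a word of its own.

```python
import re

def anytolw(x):
    """Split an identifier in any of the three styles into lowercase words."""
    lw = x.split('_')
    if len(lw) > 1:
        return [w.lower() for w in lw]
    # No underscores: capital letters mark the word boundaries.
    pieces = re.split('([A-Z])', x)
    if pieces[0]:
        pieces = [''] + pieces   # mixed-case: first word has no capital
    else:
        pieces = pieces[1:]      # capwords: drop the leading empty string
    return [pieces[i].lower() + pieces[i + 1]
            for i in range(0, len(pieces), 2)]

print(anytolw("under_scores"))      # ['under', 'scores']
print(anytolw("mixedCase"))         # ['mixed', 'case']
print(anytolw("CapitalizedWords"))  # ['capitalized', 'words']
print(anytolw("PrintHTML"))         # ['print', 'h', 't', 'm', 'l']
```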

There's no need to specify the format, since it's self-describing. Conversely, when translating from our internal form to an output format, we do need to specify the format we want, but on the other hand, the functions are very simple:

不需要指明(参数的)格式, 因为它是自我描述的。相反地，当从我们的内部形式转换到一种输出格式的时候，我们确实需要指明我们想要的格式，但是另一方面，这些函数却非常简单:
 
{{{
#!python
def lwtous(x): return '_'.join(x)
def lwtocw(x): return ''.join(map(str.capitalize,x))
def lwtomc(x): return x[0]+''.join(map(str.capitalize,x[1:]))
}}}

Any other combination is a simple issue of functional composition:

任何其他的组合是简单的把(上面的)函数合成起来的结果:

{{{
#!python
def anytous(x): return lwtous(anytolw(x))
cwtous = mctous = anytous
def anytocw(x): return lwtocw(anytolw(x))
ustocw = mctocw = anytocw
def anytomc(x): return lwtomc(anytolw(x))
cwtomc = ustomc = anytomc
}}}
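The same composition can also be written as a single dispatcher keyed by the target style. This is a sketch of one possible arrangement, not part of the recipe: the names FORMATTERS and convert, and the style keys 'us', 'cw' and 'mc', are invented for illustration, and the splitter is restated so the block runs on its own.

```python
import re

def anytolw(x):
    """Any of the three identifier styles to a list of lowercase words."""
    lw = x.split('_')
    if len(lw) > 1:
        return [w.lower() for w in lw]
    pieces = re.split('([A-Z])', x)
    pieces = [''] + pieces if pieces[0] else pieces[1:]
    return [pieces[i].lower() + pieces[i + 1]
            for i in range(0, len(pieces), 2)]

# One formatter per output style (hypothetical keys).
FORMATTERS = {
    'us': lambda words: '_'.join(words),
    'cw': lambda words: ''.join(w.capitalize() for w in words),
    'mc': lambda words: words[0] + ''.join(w.capitalize() for w in words[1:]),
}

def convert(name, style):
    # Parse into the neutral word list, then format in the requested style.
    return FORMATTERS[style](anytolw(name))

print(convert("set_position", "cw"))  # SetPosition
print(convert("SetPosition", "mc"))   # setPosition
print(convert("setPosition", "us"))   # set_position
```

With this layout, supporting a further style would mean adding one entry to the table and, if the style is not already recognized, one more branch in the splitter.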

The specialized approach is slimmer and faster, but this generalized stance may ease understanding as well as offering wider application.

这种专门化的方式更加精简且快速，但这种一般化的方式可能更易于理解，并且有更广的应用。

== 参考 See Also ==
The Library Reference sections on the re and string modules.