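The Stem class
Let's start with something simple: the Stem class has a single, narrow job and is easy to understand.
It reduces words to their stems, for example names becomes name, quickly becomes quick, and waiting becomes wait. This lets different forms of the same word be treated as one term.
The Stem constructor takes one argument, which can be any of the following values: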
>>> xapian.Stem.get_available_languages()
'none danish dutch english finnish french german italian norwegian portuguese russian spanish swedish english_lovins english_porter'
>>>
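'none' means no stemming is performed; each of the other values names a language. In the omega code I saw that the usual default is to stem English words only, like this:
>>> stem=xapian.Stem('english')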
>>>
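The language list above is a single space-separated string, so checking whether a language is supported comes down to splitting it. The following is a minimal sketch, not part of xapian: the helper name pick_stem_language is hypothetical, and the AVAILABLE string is copied from the output shown above rather than queried from an installed library. With xapian installed, the result of xapian.Stem.get_available_languages() could presumably be passed in instead.

```python
def pick_stem_language(preferred, available):
    """Return `preferred` if it is listed in `available`, else 'none'.

    `available` is a space-separated string of language names, in the
    format shown in the get_available_languages() output above.
    This helper is hypothetical and not part of xapian.
    """
    names = available.split()
    return preferred if preferred in names else "none"


# Copied from the REPL output above (an assumption: other xapian
# versions may report a different list).
AVAILABLE = (
    "none danish dutch english finnish french german italian norwegian "
    "portuguese russian spanish swedish english_lovins english_porter"
)

# A listed language is returned unchanged; an unlisted one falls back
# to 'none', which means no stemming.
print(pick_stem_language("english", AVAILABLE))
print(pick_stem_language("klingon", AVAILABLE))
```

The returned name can then be handed to xapian.Stem(), so an unsupported language degrades to no stemming instead of raising an error.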
---
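Stem has only two instance methods: stem_word() and __call__(). The two are used in exactly the same way and do exactly the same thing; stem_word() is kept only for compatibility with earlier versions. __call__() takes a single string argument and returns a string.
>>> import xapian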
>>> stem=xapian.Stem('english')
>>> stem('webs')
'web'
>>> stem('quickly')
'quick'
>>> stem('setting')
'set'
>>> stem('hello')
'hello'
>>>
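Because a Stem instance is just a callable, it can be applied to a whole list of words so that different forms end up as the same term. The sketch below is illustrative only: stem_terms is a hypothetical helper, and the fallback stemmer is a crude stand-in that exists only so the example runs where xapian is not installed. It is not a real stemming algorithm. Note also that newer Python 3 bindings may return bytes rather than str from the stemmer.

```python
def stem_terms(words, stem):
    """Lowercase each word and pass it through the `stem` callable.

    `stem` can be any function taking one string, for example an
    xapian.Stem instance. This helper is hypothetical, not a xapian API.
    """
    return [stem(word.lower()) for word in words]


try:
    import xapian

    stem = xapian.Stem("english")
except ImportError:
    # Stand-in used only when xapian is unavailable: strips a trailing
    # "s" from words longer than three letters. NOT a real stemmer.
    def stem(word):
        if len(word) > 3 and word.endswith("s"):
            return word[:-1]
        return word


# "Webs" and "web" should map to the same term after stemming.
print(stem_terms(["Webs", "web"], stem))
```

Passing the stemmer in as a parameter keeps the helper independent of xapian, so the same function works with Stem('english'), Stem('none'), or any other callable.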