Revisions 4 (2007-01-26 15:37:54) to 7 (2007-01-26 15:41:19), editor: wangzhen

First steps with Xapian: hello xapian

Directory structure:

~/helloxapian

~/helloxapian/indexfiles.py

~/helloxapian/search.py

~/helloxapian/test

~/helloxapian/test/hello.txt

~/helloxapian/test/world.txt

~/helloxapian/test/abc.txt

Contents of hello.txt:

  • Welcome to the Xapian project website.Xapian is an Open Source Search Engine Library, released under the GPL. It's written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C#, and Ruby (so far!) Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators. If you're after a packaged search engine for your website, you should take a look at Omega: an application we supply built upon Xapian. Unlike most other website search solutions, Xapian's versatility allows you to extend Omega to meet your needs as they grow.

Contents of world.txt:

  • The 0.9 branch features a few API changes, the most notable being a rewritten QueryParser which is reentrant, has encapsulated internals, and parses better than the old one. Note that the examples are now a subdirectory of xapian-core, so there is no longer a separate xapian-examples download (most of the size of the xapian-examples download was due to configure and other generated files!)

Contents of abc.txt:

  • Documentation A number of pieces of documentation are available.We suggest you start by reading the Installation Guide, which covers downloading the code, and unpacking, configuring, building and installing. It then shows how to build the example programs.For a quick introduction to our software, including a walk-through example of an application for searching through some data, read the Quickstart document.The Overview explains the API which Xapian provides to programmers.Much useful documentation is automatically extracted from the source code. Full documentation of the API is available for users. For those wishing to do development work on the Xapian library itself, documentation of the internals is available, and there's a short document outlining the directory structure which is automatically generated from the source code.
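Before stemming, indexfiles.py turns each file into terms with the regular expression [a-zA-Z0-9]+, truncates each match to 64 characters, lower-cases it and numbers it by position. The sketch below reproduces only that step in plain Python, with no Xapian needed, so you can check what will be indexed from these sample texts. The helper name extract_terms is hypothetical, and stemming is deliberately left out.

```python
import re

# Same pattern as indexfiles.py: runs of ASCII letters and digits.
TERM_RE = re.compile('[a-zA-Z0-9]+')
MAX_TERM_LENGTH = 64


def extract_terms(text, max_len=MAX_TERM_LENGTH):
    """Return (position, term) pairs the way indexfiles.py builds them, minus stemming.

    Hypothetical helper for illustration only.
    """
    return [(pos, word[:max_len].lower())
            for pos, word in enumerate(TERM_RE.findall(text))]


if __name__ == '__main__':
    sample = "Welcome to the Xapian project website.Xapian is"
    for pos, term in extract_terms(sample):
        print(pos, term)
```

For the sample sentence this yields welcome, to, the, xapian, project, website, xapian, is at positions 0 to 7. The missing space in "website.Xapian" does not matter, because the period is not matched by the pattern and acts as a separator. Non-ASCII letters are also treated as separators, so this tokenizer only suits English text like these samples.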

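indexfiles.py:

#!/usr/bin/env python
# coding=utf-8
# Index every *.txt file in the directory given on the command line
# into the Xapian database at DBPATH.
import os
import re
import sys

import xapian

# Terms are runs of ASCII letters and digits; everything else separates them.
rex = re.compile('[a-zA-Z0-9]+')
MAX_TERM_LENGTH = 64
DBPATH = 'indexdb'

if len(sys.argv) < 2:
    sys.stderr.write("Missing argument: please specify the directory to index\n")
    sys.exit(1)

try:
    database = xapian.WritableDatabase(DBPATH, xapian.DB_CREATE_OR_OPEN)
    stemmer = xapian.Stem("english")
    for name in os.listdir(sys.argv[1]):
        if not name.endswith('.txt'):
            continue
        filename = os.path.join(sys.argv[1], name)
        try:
            with open(filename, 'r') as fr:
                content = fr.read().strip()
            doc = xapian.Document()
            # Keep the whole text as document data and the path in value slot 0.
            doc.set_data(content)
            doc.add_value(0, filename)
            # Extra term: the file name without ".txt" (e.g. "hello").
            doc.add_term(name[:-4])
            # Each word is truncated, lower-cased and stemmed, then added with its position.
            for pos, term in enumerate(rex.findall(content)):
                doc.add_posting(stemmer(term[:MAX_TERM_LENGTH].lower()), pos)
            database.add_document(doc)
        except Exception as e:
            # Report files that fail instead of silently skipping them.
            sys.stderr.write("Skipping %s: %s\n" % (filename, e))
    # Write pending changes to disk (Xapian 1.1 and later; older releases used flush()).
    database.commit()
except Exception as e:
    sys.stderr.write("Exception: %s\n" % e)
    sys.exit(1)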
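indexfiles.py only picks up names ending in .txt, and it gives each document an extra term taken from the file name without that suffix (hello, world, abc). None of these words occurs in the sample texts, so presumably this term exists to let a query for the exact term find a file by its name. The sketch below mirrors just that file-selection loop in plain Python, without touching Xapian. The helper name list_txt_documents is hypothetical.

```python
import os


def list_txt_documents(directory):
    """Return sorted (path, extra_term) pairs for the *.txt entries in directory.

    Hypothetical helper mirroring the loop in indexfiles.py: only names
    ending in ".txt" are kept, and the name minus ".txt" is the term
    that indexfiles.py passes to doc.add_term().
    """
    result = []
    for name in os.listdir(directory):
        if name.endswith('.txt'):
            result.append((os.path.join(directory, name), name[:-4]))
    # os.listdir() returns names in arbitrary order, so sort for stable output.
    return sorted(result)
```

For the layout above, calling list_txt_documents(os.path.expanduser('~/helloxapian/test')) should list abc.txt, hello.txt and world.txt with the extra terms abc, hello and world. Files with any other extension are skipped.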

xapian004 (last edited 2009-12-25 07:15:29 by localhost)