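Description

Searching for and replacing a string in a file

Problem

You need to change one string into another throughout a file.

Solution

String substitution is most simply performed by the replace method of string objects. The work is to read from a specified file (or standard input), process the content, and write it to a specified file (or standard output):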

    import os, sys

    nargs = len(sys.argv)

    if not 3 <= nargs <= 5:
        print "usage: %s search_text replace_text [infile [outfile]]" % \
            os.path.basename(sys.argv[0])
    else:
        stext = sys.argv[1]
        rtext = sys.argv[2]
        input = sys.stdin                          # standard input
        output = sys.stdout                        # standard output
        if nargs > 3:
            input = open(sys.argv[3])              # source file
        if nargs > 4:
            output = open(sys.argv[4], 'w')        # destination file

        for s in input.xreadlines():               # read the file line by line
            output.write(s.replace(stext, rtext))  # replace, then write to the destination
        output.close()
        input.close()

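Discussion

This recipe is simple, and that is its main virtue: there is no need for a sledgehammer to crack a nut. The code above is a complete, runnable script. It inspects its arguments to determine the search text, the replacement text, the input file (standard input by default), and the output file (standard output by default). It then loops over each line read from the input, replaces the text, and writes the result to the output. That is all there is to it. As a final touch, the script closes both files when it is done.

If there is enough memory to hold both the original string and the string produced by the replacement (strings are immutable, so the two exist side by side), a faster approach is available. Instead of looping, read the whole file into a temporary string, call replace on that string, and write the result to the output file. When this recipe was written, a typical PC had 256 MB of memory, which the text considers enough for a file of about 100 MB; as a rule of thumb, allow for roughly twice the file size. In that case the following single statement can replace the loop: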

    output.write(input.read().replace(stext, rtext))

This single statement is even simpler than the loop above.
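The two approaches can be compared side by side with a minimal sketch. It is written for Python 3 (an assumption; the recipe itself targets Python 2), and the helper names `replace_by_line`, `replace_whole`, and `run` are hypothetical. One behavioral difference is worth knowing: the whole-file version can match search text that contains a newline, while the line-by-line version cannot, because it only ever sees one line at a time.

```python
import io


def replace_by_line(src, dst, stext, rtext):
    """Copy src to dst one line at a time, replacing stext with rtext."""
    for line in src:
        dst.write(line.replace(stext, rtext))


def replace_whole(src, dst, stext, rtext):
    """Read all of src into memory, replace, and write the result to dst."""
    dst.write(src.read().replace(stext, rtext))


def run(func, text, stext, rtext):
    """Apply func to an in-memory copy of text and return what it wrote."""
    dst = io.StringIO()
    func(io.StringIO(text), dst, stext, rtext)
    return dst.getvalue()


# Both helpers agree when the search text fits on one line.
same = run(replace_by_line, "a b\na b\n", "a", "x") == run(replace_whole, "a b\na b\n", "a", "x")

# Only the whole-file helper can match text that spans a line break.
spanning = run(replace_whole, "foo\nbar\n", "foo\nbar", "baz")
```

If the search text may span lines, the whole-file form is the one to use, memory permitting.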

If you are still using an early version of Python, such as 1.5.2, proceed as follows.

Change the import statement to:

    import os, sys, string

Change the for loop to:

    for s in input.readlines():
        output.write(string.replace(s, stext, rtext))

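The xreadlines method of file objects was introduced in Python 2.1. It avoids reading the entire file into memory at once. The readlines method, by contrast, must read the whole file into memory, which can cause trouble with large files.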
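The difference between the two styles of reading can be illustrated with a small sketch. It assumes Python 3, where xreadlines no longer exists and iterating over the file object plays the same role; `io.StringIO` stands in for a real file so the example is self-contained.

```python
import io

# An in-memory stand-in for a text file.
src = io.StringIO("one\ntwo\nthree\n")

# readlines() builds a list containing every line before it returns.
all_lines = src.readlines()

# Iterating over the file object hands back one line per step instead.
src.seek(0)
line_iter = iter(src)
first = next(line_iter)
second = next(line_iter)
```

With `readlines`, memory use grows with the size of the file; with iteration, only the current line needs to be held by the loop.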

In Python 2.2, the loop can be written more directly:

    for s in input:
        output.write(s.replace(stext, rtext))

This is the fastest and simplest approach.

Translator's note: file.xreadlines() has been deprecated since Python 2.3; use the following instead:

    for line in file:
        process(line)

End of translator's note.
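For readers on Python 3, the whole script might be sketched as follows. This is an adaptation under stated assumptions, not part of the original recipe: the `main` function and its return codes are hypothetical, the usage message goes to standard error, and `contextlib.ExitStack` is used so that only files opened by the script are closed.

```python
import os
import sys
from contextlib import ExitStack


def main(argv):
    """Replace argv[1] with argv[2], reading argv[3] (or stdin) and
    writing argv[4] (or stdout). Returns a process exit code."""
    if not 3 <= len(argv) <= 5:
        print("usage: %s search_text replace_text [infile [outfile]]"
              % os.path.basename(argv[0]), file=sys.stderr)
        return 2

    stext, rtext = argv[1], argv[2]
    with ExitStack() as stack:
        # Files opened here are registered with the stack and closed on
        # exit; stdin and stdout are left open.
        src = stack.enter_context(open(argv[3])) if len(argv) > 3 else sys.stdin
        dst = stack.enter_context(open(argv[4], "w")) if len(argv) > 4 else sys.stdout
        for line in src:  # the file object yields one line per step
            dst.write(line.replace(stext, rtext))
    return 0
```

To run it as a script, add `sys.exit(main(sys.argv))` at the bottom. As in the original, the input and output paths should differ: opening the output file for writing truncates it before any line is read.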

See Also

The sections of the Python documentation on the built-in function open and on file objects.
