Description

Reading from a File   Credit: Luther Blissett

Problem

You want to read text or data from a file.

Solution

The most convenient way to read all of a file's contents at once into a single long string is:

all_the_text = open('thefile.txt').read()      # all the text from a text file
all_the_data = open('abinfile', 'rb').read()   # all the data from a binary file
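A minimal, self-contained sketch of the two one-liners, assuming Python 3. It first writes two small sample files into a temporary directory (the file names and contents are illustrative only), then reads each back whole. In Python 3, text mode returns a `str`, while `'rb'` returns a `bytes` object.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical sample files, created only so the example can run.
    text_path = os.path.join(tmpdir, 'thefile.txt')
    data_path = os.path.join(tmpdir, 'abinfile')
    with open(text_path, 'w', encoding='utf-8') as f:
        f.write('first line\nsecond line\n')
    with open(data_path, 'wb') as f:
        f.write(bytes([0, 1, 2, 255]))

    all_the_text = open(text_path, encoding='utf-8').read()  # whole text file as one str
    all_the_data = open(data_path, 'rb').read()              # whole binary file as one bytes object

print(repr(all_the_text))   # 'first line\nsecond line\n'
print(repr(all_the_data))   # b'\x00\x01\x02\xff'
```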

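It is better, however, to bind the file object to a name, so that you can close the file as soon as you are done with it. For example, to read the contents of a text file: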

file_object = open('thefile.txt')   # open the file
all_the_text = file_object.read()   # all the text from a text file
file_object.close()                 # done with the file, so close it
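One caveat with the three-line version: if `read()` raises an exception, `close()` is never reached. A minimal sketch, assuming Python 3 and a throwaway sample file (the name is illustrative), that keeps the explicit `close()` but moves it into a `finally` clause so it runs either way:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'thefile.txt')   # illustrative file name
    with open(path, 'w', encoding='utf-8') as f:
        f.write('hello\nworld\n')

    file_object = open(path, encoding='utf-8')   # open the file
    try:
        all_the_text = file_object.read()        # read the whole text
    finally:
        file_object.close()                      # always runs, even if read() fails

print(file_object.closed)   # True
```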

There are four ways to read the entire contents of a text file as a list of lines:

# Alternatives: each one assumes a freshly opened file_object.
list_of_all_the_lines = file_object.readlines()              # method 1
list_of_all_the_lines = file_object.read().splitlines(True)  # method 2
list_of_all_the_lines = file_object.read().splitlines()      # method 3
list_of_all_the_lines = file_object.read().split('\n')       # method 4

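Methods 1 and 2 return lists whose string elements still end with '\n'; methods 3 and 4 return lists whose elements have the '\n' removed. Two edge cases: if the file ends with a newline, method 4 also appends an empty string '' to the list, and if the file does not end with a newline, the last element returned by methods 1 and 2 has no '\n'.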
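To make the differences concrete, a small sketch assuming Python 3 and an illustrative three-line file that ends with a newline. The helper `read_with` is hypothetical, introduced only so that each method gets a freshly opened file; the output comments show what each method returns for the same input.

```python
import os
import tempfile

SAMPLE = 'alpha\nbeta\ngamma\n'   # illustrative content; note the final newline

def read_with(path, method):
    # Reopen the file for every method: once read() or readlines() has
    # consumed a file object, calling them again returns '' or [].
    with open(path, encoding='utf-8') as file_object:
        return method(file_object)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'thefile.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(SAMPLE)

    m1 = read_with(path, lambda fo: fo.readlines())
    m2 = read_with(path, lambda fo: fo.read().splitlines(True))
    m3 = read_with(path, lambda fo: fo.read().splitlines())
    m4 = read_with(path, lambda fo: fo.read().split('\n'))

print(m1)  # ['alpha\n', 'beta\n', 'gamma\n']
print(m2)  # ['alpha\n', 'beta\n', 'gamma\n']
print(m3)  # ['alpha', 'beta', 'gamma']
print(m4)  # ['alpha', 'beta', 'gamma', '']
```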

Method 1 is generally the most efficient of the four, and the most Pythonic.

In Python 2.2 and later there is also a method 5, which is equivalent to method 1:

list_of_all_the_lines = list(file_object)   # method 5
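A quick check of that equivalence, sketched under the assumption of Python 3 and an illustrative file whose last line deliberately lacks a trailing newline. `list(file_object)` works because a file object is an iterator over its lines.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'thefile.txt')   # illustrative file
    with open(path, 'w', encoding='utf-8') as f:
        f.write('one\ntwo\nthree')                # last line has no trailing newline

    with open(path, encoding='utf-8') as file_object:
        via_readlines = file_object.readlines()   # method 1
    with open(path, encoding='utf-8') as file_object:
        via_list = list(file_object)              # method 5: iterate the file's lines

print(via_readlines == via_list)   # True
print(via_list)                    # ['one\n', 'two\n', 'three']
```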

Discussion

Unless the file you're reading is truly huge, slurping it all into memory in one gulp is fastest and generally most convenient for any further processing. The built-in function open creates a Python file object. With that object, you call the read method to get all of the contents (whether text or binary) as a single large string. If the contents are text, you may choose to immediately split that string into a list of lines, with the split method or with the specialized splitlines method. Since such splitting is a frequent need, you may also call readlines directly on the file object, for slightly faster and more convenient operation. In Python 2.2 and later, you can also pass the file object directly as the only argument to the built-in type list.
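For the "truly huge" case, a common alternative (not part of this recipe's solution) is to iterate over the file object directly, which reads one line at a time instead of building the whole list in memory. A minimal sketch, assuming Python 3; the function `count_nonblank_lines`, the file, and the counting task are all hypothetical, chosen only to illustrate the loop.

```python
import os
import tempfile

def count_nonblank_lines(path):
    # Iterating over the file object yields one line at a time, so memory
    # use depends on the longest line rather than on the file's total size.
    count = 0
    with open(path, encoding='utf-8') as file_object:
        for line in file_object:
            if line.strip():
                count += 1
    return count

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'big.txt')   # stands in for a large file
    with open(path, 'w', encoding='utf-8') as f:
        f.write('a\n\nb\n   \nc\n')
    result = count_nonblank_lines(path)

print(result)   # 3
```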

On Unix and Unix-like systems, such as Linux and BSD variants, there is no real distinction between text files and binary data files. On Windows and classic Mac OS systems, however, line terminators in text files are encoded not with the standard '\n' separator but with '\r\n' and '\r', respectively. Python translates these line-termination characters into '\n' on your behalf, which means you need to tell Python when you open a binary file, so that it won't perform the translation. To do that, use 'rb' as the second argument to open. This is harmless even on Unix-like platforms, and it's a good habit to distinguish binary files from text files even there, although it's not mandatory in that case. Such a habit makes your programs more directly understandable and lets you move them between platforms more easily.
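A small sketch of the translation, assuming Python 3, where text mode's default universal-newlines behaviour turns '\r\n' and '\r' into '\n' on every platform (in the older Python versions this recipe targets, the behaviour was platform-dependent). The file name and contents are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'windows.txt')   # illustrative file
    with open(path, 'wb') as f:
        f.write(b'line one\r\nline two\r\n')     # Windows-style line terminators

    with open(path, encoding='ascii') as f:      # text mode: terminators become '\n'
        as_text = f.read()
    with open(path, 'rb') as f:                  # binary mode: bytes returned untouched
        as_bytes = f.read()

print(repr(as_text))    # 'line one\nline two\n'
print(repr(as_bytes))   # b'line one\r\nline two\r\n'
```

Reading a binary file in text mode would silently alter any '\r\n' byte pairs it happens to contain, which is why 'rb' matters for binary data.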

You can call methods such as read directly on the file object produced by the open function, as shown in the first snippet of the solution. When you do this, as soon as the reading operation finishes, you no longer have a reference to the file object. In practice, CPython, which uses reference counting, notices the lack of a reference at once and immediately closes the file. However, it is better to bind a name to the result of open, so that you can call close yourself explicitly when you are done with the file. This ensures that the file stays open for as short a time as possible, even on platforms such as Jython and hypothetical future versions of Python on which more advanced garbage-collection mechanisms might delay the automatic closing that Python performs. ...
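Later Python versions added the with statement (Python 2.5, where it required `from __future__ import with_statement`, and standard from 2.6), which closes the file as soon as the block exits, even if an exception is raised, independently of how the implementation collects garbage. A minimal sketch, assuming Python 3 and an illustrative file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'thefile.txt')   # illustrative file
    with open(path, 'w', encoding='utf-8') as f:
        f.write('some text\n')

    with open(path, encoding='utf-8') as file_object:
        all_the_text = file_object.read()
        closed_inside = file_object.closed       # still open inside the block
    closed_after = file_object.closed            # closed as soon as the block exits

print(closed_inside, closed_after)   # False True
```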
