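In Python, file stream operations go through the built-in open() function, which supports reading, writing, and controlling the position of the stream. Common file modes include:
- r: read-only mode (the default).
- w: write mode (truncates any existing content).
- a: append mode.
- r+: read and write mode.
The following shows how to use file streams for basic file operations and how to control the way a stream is read (for example, line by line or in chunks).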
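As a minimal sketch of the two read patterns just mentioned, the generators below read a text file line by line and in fixed-size chunks. The names `read_lines` and `read_chunks`, the UTF-8 encoding, and the default chunk size are illustrative assumptions, not part of any library.

```python
def read_lines(path):
    """Yield the lines of a text file with the trailing newline removed."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # the file object yields one line at a time
            yield line.rstrip("\n")


def read_chunks(path, size=4096):
    """Yield successive chunks of at most `size` characters."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(size)
            if chunk == "":  # an empty string signals end of file
                break
            yield chunk
```

Both functions keep only one line or one chunk in memory at a time, which is why they suit large files.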
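1. Problem Background
While writing a compiler, the source file has to be read one character at a time with file.read(1). If a "/" is followed by another "/", the rest of the line is treated as a comment. But if the character after the "/" turns out not to be a "/", is there a way to move the file stream back one character so that this character is not lost?
The relevant code is as follows: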
    def tokenType(self):
        # PAGE 108
        if (self.current == '{' or self.current == '}' or self.current == '(' or self.current == ')' or self.current == '[' or self.current == ']' or self.current == '.' or self.current == ',' or self.current == ';' or self.current == '-' or self.current == '*' or self.current == '/' or self.current == '&' or self.current == '|' or self.current == '<' or self.current == '>' or self.current == '=' or self.current == '~'):
            if (self.current == '/'):
                next = self.file.read(1)
                if (next == '/'):
                    while (next != "\n"):
                        next = self.file.read(1)
                    return "IGNORE"
                if (next == '*'):
                    while (True):
                        next = self.file.read(1)
                        if (next == '*'):
                            next = self.file.read(1)
                            if (next == '/'):
                                break
                    return "IGNORE"
                else:
                    return "SYMBOL"
            return "SYMBOL"
        elif (self.current == " " or self.current == "\n"):
            return "IGNORE"
        elif (self.current == "'"):
            while(next != "'"):
                self.current = self.current + next
            return "STRING_CONST"
        elif (type(self.current) == int):
            next = self.file.read(1)
            while(next != " "):
                self.current = self.current + next
            return "INT_CONST"
        else:
            next = self.file.read(1)
            while(next != " " and next != ""):
                self.current = self.current + next
                next = self.file.read(1)
            if (self.current == 'class' or self.current == 'constructor' or self.current == 'function' or self.current == 'method' or self.current == 'field' or self.current == 'static' or self.current == 'var' or self.current == 'int' or self.current == 'char' or self.current == 'boolean' or self.current == 'void' or self.current == 'true' or self.current == 'false' or self.current == 'null' or self.current == 'this' or self.current == 'let' or self.current == 'do' or self.current == 'if' or self.current == 'else' or self.current == 'while' or self.current == 'return'):
                return "KEYWORD"
            else:
                return "IDENTIFIER"
My problem seems to be when I have something like 10/5 and my program checks to see if the next character is a "/". Then on the next pass through my character interpreting function, the 5 has already been removed when it was checking for a comment.
So, is there any way I can get a character from a file stream without it being "removed" from the stream or is there a way I can move it back a character when I hit a case like this?
2. Solution
- Method 1: Adjust the stream position with file.seek()
file.seek() moves the stream pointer to a given position in the file. After looking ahead one character, the pointer can be moved back so that the same character is returned by the next read. Note that a file opened in text mode only accepts seek offsets that came from tell() (or zero), so a relative call such as seek(-1, 1) raises io.UnsupportedOperation. The version below therefore saves the position with tell() before the lookahead read and seeks back to it. It also makes the loops stop at end of file and read a new character on every iteration.

    SYMBOLS = set("{}()[].,;-*/&|<>=~")
    KEYWORDS = {
        'class', 'constructor', 'function', 'method', 'field', 'static', 'var',
        'int', 'char', 'boolean', 'void', 'true', 'false', 'null', 'this',
        'let', 'do', 'if', 'else', 'while', 'return',
    }

    def tokenType(self):
        # PAGE 108
        if self.current in SYMBOLS:
            if self.current == '/':
                pos = self.file.tell()  # position of the lookahead character
                nxt = self.file.read(1)
                if nxt == '/':
                    # Line comment: skip to the end of the line (or of the file).
                    while nxt not in ("\n", ""):
                        nxt = self.file.read(1)
                    return "IGNORE"
                if nxt == '*':
                    # Block comment: skip to the closing "*/" (or the end of the file).
                    prev = ''
                    while True:
                        nxt = self.file.read(1)
                        if nxt == '' or (prev == '*' and nxt == '/'):
                            break
                        prev = nxt
                    return "IGNORE"
                # Not a comment: move the stream back so the character is read again.
                self.file.seek(pos)
            return "SYMBOL"
        elif self.current == " " or self.current == "\n":
            return "IGNORE"
        elif self.current == "'":
            nxt = self.file.read(1)
            while nxt != "'" and nxt != "":
                self.current = self.current + nxt
                nxt = self.file.read(1)
            return "STRING_CONST"
        elif self.current.isdigit():
            # self.current is a string, so test its content rather than its type.
            pos = self.file.tell()
            nxt = self.file.read(1)
            while nxt.isdigit():
                self.current = self.current + nxt
                pos = self.file.tell()
                nxt = self.file.read(1)
            self.file.seek(pos)  # put back the character that ended the number
            return "INT_CONST"
        else:
            nxt = self.file.read(1)
            while nxt != " " and nxt != "":
                self.current = self.current + nxt
                nxt = self.file.read(1)
            if self.current in KEYWORDS:
                return "KEYWORD"
            else:
                return "IDENTIFIER"
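To make the lookahead step concrete on its own, here is a small self-contained sketch. Text streams only accept seek offsets obtained from tell() (or zero), so the helper saves the position, reads one character, and seeks back. `peek_char` and `read_tokens` are hypothetical names introduced for this illustration, and the tokenizer is deliberately reduced to single characters and `//` comments.

```python
def peek_char(f):
    """Return the next character of text stream `f` without consuming it."""
    pos = f.tell()   # remember where the lookahead starts
    ch = f.read(1)   # "" at end of file
    f.seek(pos)      # restore the position saved by tell()
    return ch


def read_tokens(f):
    """Return the non-space characters of `f`, dropping // line comments."""
    tokens = []
    while True:
        ch = f.read(1)
        if ch == "":
            break
        if ch == "/" and peek_char(f) == "/":
            f.readline()  # discard the rest of the comment line
            continue
        if not ch.isspace():
            tokens.append(ch)
    return tokens
```

With an input such as `10/5`, the character after the `/` is inspected and then read again on the next pass, so the `5` is no longer lost.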
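- Method 2: Use Python's io.StringIO class
io.StringIO creates an in-memory file object from a string, so the file's contents can be handled as a stream. When the stream pointer needs to move back, call the object's seek() method. Like other text streams, StringIO rejects nonzero offsets relative to the current position, so the code below saves the position with tell() before the lookahead and seeks back to it instead of calling seek(-1, 1). The StringIO object is also created only once, because building it inside every call would read the already exhausted file again.

    import io

    SYMBOLS = set("{}()[].,;-*/&|<>=~")

    def tokenType(self):
        # Read the file into memory once and reuse the stream on later calls
        # (self.string_io is a name introduced here for illustration).
        if not hasattr(self, "string_io"):
            self.string_io = io.StringIO(self.file.read())
        string_io = self.string_io

        char = string_io.read(1)
        if char in SYMBOLS:
            if char == '/':
                pos = string_io.tell()  # position of the lookahead character
                nxt = string_io.read(1)
                if nxt == '/':
                    while nxt not in ("\n", ""):
                        nxt = string_io.read(1)
                    return "IGNORE"
                if nxt == '*':
                    prev = ''
                    while True:
                        nxt = string_io.read(1)
                        if nxt == '' or (prev == '*' and nxt == '/'):
                            break
                        prev = nxt
                    return "IGNORE"
                # Not a comment: move the stream back one character.
                string_io.seek(pos)
            return "SYMBOL"
        elif char == " " or char == "\n":
            return "IGNORE"
        # The remaining branches (string, integer, keyword/identifier)
        # would read from string_io in the same way.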
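The in-memory approach can be tried in isolation with the sketch below. `CharStream` and `relative_seek_fails` are hypothetical names made up for this example. The class wraps an io.StringIO and offers a one-character `peek()`, and the helper checks whether a stream rejects a relative seek.

```python
import io


class CharStream:
    """Character-at-a-time reader over an in-memory copy of the source text."""

    def __init__(self, text):
        self._buf = io.StringIO(text)

    def read(self):
        """Consume and return the next character ("" at end of input)."""
        return self._buf.read(1)

    def peek(self):
        """Return the next character without consuming it."""
        pos = self._buf.tell()
        ch = self._buf.read(1)
        self._buf.seek(pos)  # absolute seek to a position saved by tell()
        return ch


def relative_seek_fails(stream):
    """Return True if `stream` rejects seek(-1, 1)."""
    try:
        stream.seek(-1, 1)
    except OSError:  # io.UnsupportedOperation is a subclass of OSError
        return True
    return False
```

Holding the whole source in memory is usually acceptable for compiler input, and it keeps the lookahead logic independent of how the file was opened.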
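Summary
- Line-by-line reading: suited to processing large files one line at a time.
- Chunked reading: suited to memory-sensitive work, especially on very large files.
- File pointer control: seek() and tell() provide random access and control over the stream position.
- Safe file handling: the with statement and exception handling ensure that files are opened and closed safely and correctly.
These techniques help you control and process file streams efficiently and, for large files in particular, keep memory use low.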
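The last two points of the summary can be combined in one short sketch. `read_tail` is a hypothetical helper written for this illustration. It opens the file in binary mode, where seeking relative to the end is allowed, and returns None when the file cannot be opened.

```python
def read_tail(path, n):
    """Return the last `n` bytes of a file, or None if it cannot be opened."""
    try:
        with open(path, "rb") as f:    # `with` closes the file even on errors
            f.seek(0, 2)               # jump to the end of the file
            size = f.tell()            # byte offset of the end, i.e. the file size
            f.seek(max(size - n, 0))   # step back at most n bytes from the end
            return f.read()
    except OSError:                    # e.g. FileNotFoundError, PermissionError
        return None
```

Only the requested tail is read into memory, regardless of the file's total size.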