Python file handling: how do I add or delete columns?
Say the file originally looks like this:
a,b,c
d,d,e
e,e,e
...
1. After the insertion it becomes:
a,b,k,c
d,d,k,e
e,e,k,e
...
Python inserts a column before the third column, and that column is a constant (the same value on every row). There are tens of millions of records,
and it also needs to be fast. What is a good way to do this?
2. The second question is deleting columns.
After deleting columns 2 and 3 it becomes:
a,c
d,e
e,e
...
Again there are tens of millions of records,
and it needs to be fast. What is a good method?
[Solution]
- Python code
#!/usr/bin/env python
aText = [
    ['a', 'b', 'c'],
    ['d', 'd', 'e'],
    ['e', 'e', 'e'],
]

# Insert a constant column before position 2 (the third column).
aCol = 2
for aList in aText:
    print(aList[0:aCol] + ['k'] + aList[aCol:])

# Delete the column at position 1 (the second column).
aCol = 1
for aList in aText:
    print(aList[0:aCol] + aList[aCol + 1:])
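The slicing idea above can be applied to a large file by streaming it through the standard-library csv module, so memory use stays flat no matter how many records there are. A minimal sketch; the file names `src.txt`/`target.txt` and the helper names are placeholders, not from the original post:

```python
import csv

def insert_const_column(src_path, dst_path, col=2, value='k'):
    # Stream row by row: one row in memory at a time, so tens of
    # millions of records are fine.
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst, lineterminator='\n')
        for row in csv.reader(src):
            writer.writerow(row[:col] + [value] + row[col:])

def delete_columns(src_path, dst_path, cols=(1, 2)):
    # Drop the zero-based column indices listed in `cols`.
    drop = set(cols)
    with open(src_path, newline='') as src, \
         open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst, lineterminator='\n')
        for row in csv.reader(src):
            writer.writerow([v for i, v in enumerate(row) if i not in drop])
```

Unlike a bare `str.split`, csv.reader also handles quoted fields correctly, at some cost in raw speed.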
[Solution]
- Python code
class BufferedWriter:
    MAXBUFFSIZE = 16384
    LINEFEED = '\n'
    SPLITER = ','

    def __init__(self, filename):
        self.filename = filename

    def __enter__(self):
        self.handle = open(self.filename, 'w')
        self.buffer = []
        return self

    def __exit__(self, *args):
        if self.buffer:
            self.flush()
        self.handle.close()

    def append(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.MAXBUFFSIZE:
            self.flush()

    def flush(self):
        # Swap the buffer out, then write it to disk as one chunk.
        self.buffer, data = [], self.buffer
        self.handle.write(''.join(
            self.SPLITER.join(row) + self.LINEFEED for row in data
        ))

def eachRow(filename, spliter=','):
    with open(filename, 'r') as handle:
        for ln in handle:
            yield ln.strip().split(spliter)

# Add the constant column 'k' at position k:
with BufferedWriter('target.txt') as bw:
    for row in eachRow('src.txt'):
        bw.append(row[:k] + ['k'] + row[k:])

# Delete the column at position k:
with BufferedWriter('target.txt') as bw:
    for row in eachRow('src.txt'):
        bw.append(row[:k] + row[k+1:])
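If the data is guaranteed to be plain comma-separated text with no quoting, the per-line work can be cut further by passing `maxsplit` to `str.split`: only the columns actually being touched are parsed, and the rest of the line stays one untouched string. A sketch under that assumption; the function names and file names are placeholders:

```python
def insert_before_third(src_path, dst_path, value='k'):
    # Assumes every line has at least three columns and no quoted commas.
    # Split at most twice, so everything from the third column onward
    # (including the trailing newline) stays in one piece.
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            c1, c2, rest = line.split(',', 2)
            dst.write(c1 + ',' + c2 + ',' + value + ',' + rest)

def delete_second_and_third(src_path, dst_path):
    # Assumes every line has at least four columns.
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            c1, _c2, _c3, rest = line.split(',', 3)
            dst.write(c1 + ',' + rest)
```

For tens of millions of rows this avoids building a full list of fields per line, which is usually the dominant cost in pure-Python CSV munging.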