I have a data txt file formatted (somewhat simplified here) to be loaded into a database (MySQL), in the following shape:
data.txt
name age profession datestamp
John 23 engineer 2020-03-01
Amy 17 doctor 2020-02-27
Gordon 19 artist 2020-02-27
Kevin 25 chef 2020-03-01
The data above is loaded with the following statement, executed via Python:
LOAD DATA LOCAL INFILE '/home/sample_data/data.txt' REPLACE INTO TABLE person_professions
FIELDS TERMINATED BY 0x01 OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n'
(name, age, profession, datestamp)
However, data.txt is very large and cannot be inserted all at once (there is an insert limit of about 200 MB), so I want to cut the data into several chunks (data_1.txt, data_2.txt, data_3.txt, etc.) and insert them one by one to stay under the insert size limit. I know you can check a condition line by line to cut the data out, for example:
with open('data.txt', 'r') as f:  # 'r' to read, not 'w'
    data = f.read().split('\n')
    if some_condition:
        with open('data_1.txt', 'w') as f2:
            # write data out
But I am not sure how to come up with the condition, or breakpoint, at which to start writing to a new txt file, unless there is a better way to do this.
I wrote a function that gets the job done by splitting the file into pieces of a fixed number of lines. Explanations are in the code comments.
def split_file(file_name, lines_per_file=100000):
    # Open the large file for reading in UTF-8
    with open(file_name, 'r', encoding='utf-8') as rf:
        # Read all lines in the file
        lines = rf.readlines()
    print(str(len(lines)) + ' LINES READ.')
    # Set variables to count file number and count of lines written
    file_no = 0
    wlines_count = 0
    # Step from 0 to the number of lines read, by the number of lines per output file
    for x in range(0, len(lines), lines_per_file):
        # Open a new "split" file for writing in UTF-8
        with open('data' + '-' + str(file_no) + '.txt', 'w', encoding='utf-8') as wf:
            # Write this block of lines
            wf.writelines(lines[x:x + lines_per_file])
        # Update the written-lines count
        wlines_count += len(lines[x:x + lines_per_file])
        # Update the split-file count, mainly for naming
        file_no += 1
    print(str(wlines_count) + " LINES WRITTEN IN " + str(file_no) + " FILES.")

# Split data.txt into files containing 100000 lines each
split_file('data.txt', 100000)
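One caveat: readlines() loads the entire file into memory, which can be a problem for a file large enough to hit a 200 MB insert limit. A minimal sketch of a streaming variant (the name split_file_streaming and the out_prefix parameter are my own) that reads at most lines_per_file lines at a time using itertools.islice:

```python
from itertools import islice

def split_file_streaming(file_name, lines_per_file=100000, out_prefix='data'):
    """Split file_name into out_prefix-0.txt, out_prefix-1.txt, ...
    without loading the whole file into memory."""
    file_no = 0
    wlines_count = 0
    with open(file_name, 'r', encoding='utf-8') as rf:
        while True:
            # Pull at most lines_per_file lines from the open handle
            chunk = list(islice(rf, lines_per_file))
            if not chunk:
                break
            with open(out_prefix + '-' + str(file_no) + '.txt', 'w',
                      encoding='utf-8') as wf:
                wf.writelines(chunk)
            wlines_count += len(chunk)
            file_no += 1
    return file_no, wlines_count
```

Memory use is then bounded by one chunk rather than the whole file, and the split files can be fed to LOAD DATA LOCAL INFILE one by one as before.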