您如何使用附加/更新parquet
文件pyarrow
?
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
table2 = pd.DataFrame({'one': [-1, np.nan, 2.5], 'two': ['foo', 'bar', 'baz'], 'three': [True, False, True]})
table3 = pd.DataFrame({'six': [-1, np.nan, 2.5], 'nine': ['foo', 'bar', 'baz'], 'ten': [True, False, True]})
pq.write_table(table2, './dataNew/pqTest2.parquet')
#append pqTest2 here?
我在文档中找不到有关添加镶木地板文件的任何内容。并且,可以pyarrow
与多处理一起使用来插入/更新数据吗?
我遇到了同样的问题,我认为可以使用以下方法解决它:
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
chunksize=10000 # this is the number of lines
pqwriter = None
for i, df in enumerate(pd.read_csv('sample.csv', chunksize=chunksize)):
table = pa.Table.from_pandas(df)
# for the first chunk of records
if i == 0:
# create a parquet write object giving it an output file
pqwriter = pq.ParquetWriter('sample.parquet', table.schema)
pqwriter.write_table(table)
# close the parquet writer
if pqwriter:
pqwriter.close()
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句