Python: converting a CSV file to a Parquet file
Packages used: pandas, pyarrow
import pandas as pd
df = pd.read_csv(...)
df.to_parquet(...)
The CSV file must have a header row.
pyarrow must be installed:
pip install pyarrow
Reading a Parquet file
import pyarrow.parquet as pq
table = pq.read_table(...)
df = table.to_pandas()
bb = df.head(10)
print(bb)
print(bb.to_json())
head(10) returns the first 10 rows.
The result is then converted to JSON format.
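Custom data
import pandas as pd
from fastparquet import ParquetFile

# Build a small DataFrame and write it out as CSV
df = pd.DataFrame({"col1": [1, 2, 3, 4], "col2": ["a", "b", "c", "d"]})
df.to_csv("test_csv", index=False)

# Read the CSV back and save it as gzip-compressed Parquet
df_csv = pd.read_csv("test_csv")
df_csv.to_parquet("test_parquet", compression="gzip")

# Read the Parquet file with fastparquet (same path it was written to)
df_parquet = ParquetFile("test_parquet").to_pandas()
df_parquet.head()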
https://blog.csdn.net/weixin_34390996/article/details/92760588
Problems encountered
Solution
https://www.jianshu.com/p/be233bdb4dbf
https://blog.csdn.net/littlehaes/article/details/107157812