Apache Drill querying a Parquet file: Error in parquet record reader

Ray

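I created a Parquet file with PyArrow and can query it with PySpark. However, I can't query it with a recently installed Apache Drill (1.14), even though Drill handles my other data formats fine, including CSV, JSON, and RDBMS sources. Can someone help me figure out what the problem is and how to fix it? Thanks!

(I can run a count(*) query, but not the query below.)

Here are my query and the error message: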

select * from dfs.`C:/Apache_Spark/sample_Sends_2017.parquet` limit 20;

Query execution failed

Reason:

SQL Error: INTERNAL_ERROR ERROR: Error in parquet record reader.
Message: Failure in setting up reader
Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
optional int64 SendsID;
optional int64 SendJobsID;
optional int64 SendID;
optional binary EncryptIndivID (UTF8);
optional int64 SendDate (TIMESTAMP_MICROS);
optional int64 __index_level_0__;
}

, metadata: {pandas={"index_columns": ["__index_level_0__"], "column_indexes": [{"name": null, "field_name": null, "pandas_type": "unicode", "numpy_type": "object", "metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "SendsID", "field_name": "SendsID", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "SendJobsID", "field_name": "SendJobsID", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "SendID", "field_name": "SendID", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}, {"name": "EncryptIndivID", "field_name": "EncryptIndivID", "pandas_type": "unicode", "numpy_type": "object", "metadata": null}, {"name": "SendDate", "field_name": "SendDate", "pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null}, {"name": null, "field_name": "__index_level_0__", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}], "pandas_version": "0.23.0"}}}, blocks: [BlockMetaData{1000, 46321 [ColumnMetaData{SNAPPY [SendsID] optional int64 SendsID  [PLAIN_DICTIONARY, RLE, PLAIN], 4917}, ColumnMetaData{SNAPPY [SendJobsID] optional int64 SendJobsID  [PLAIN_DICTIONARY, RLE, PLAIN], 6342}, ColumnMetaData{SNAPPY [SendID] optional int64 SendID  [PLAIN_DICTIONARY, RLE, PLAIN], 6568}, ColumnMetaData{SNAPPY [EncryptIndivID] optional binary EncryptIndivID (UTF8)  [PLAIN_DICTIONARY, RLE, PLAIN], 39530}, ColumnMetaData{SNAPPY [SendDate] optional int64 SendDate (TIMESTAMP_MICROS)  [PLAIN_DICTIONARY, RLE, PLAIN], 41195}, ColumnMetaData{SNAPPY [__index_level_0__] optional int64 __index_level_0__  [PLAIN_DICTIONARY, RLE, PLAIN], 45450}]}]}
Fragment 0:0
Vitalii Diravka

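This looks like a known issue, DRILL-6670, which is resolved in the current Apache Drill master branch. You can build Drill from that branch or wait for the upcoming Drill 1.15.0 release.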

The problem is the optional int64 SendDate (TIMESTAMP_MICROS) column. You can try excluding it from the query or casting it to BIGINT; see this comment for more details.
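The two workarounds above can be sketched as follows. This is a minimal Python sketch, not a verified fix. The names `EXCLUDE_QUERY`, `CAST_QUERY`, and `micros_to_datetime` are hypothetical. The queries have not been tested against Drill 1.14, and whether the CAST actually avoids the reader error may depend on the Drill build. If SendDate does come back as a BIGINT, each value is the raw TIMESTAMP_MICROS count of microseconds since the Unix epoch, so the helper turns it back into a datetime. It assumes the stored values carry no timezone, which is likely because the pandas metadata shows a plain datetime64[ns] column with no timezone metadata.

```python
from datetime import datetime, timedelta

# Hypothetical query strings for the two workarounds suggested in the answer.
# Neither has been tested against Drill 1.14 here.
PARQUET_PATH = "dfs.`C:/Apache_Spark/sample_Sends_2017.parquet`"

# Option 1: list the columns explicitly and leave SendDate out.
EXCLUDE_QUERY = (
    "SELECT SendsID, SendJobsID, SendID, EncryptIndivID "
    f"FROM {PARQUET_PATH} LIMIT 20"
)

# Option 2: read SendDate as a plain BIGINT instead of as a timestamp.
CAST_QUERY = (
    "SELECT SendsID, SendJobsID, SendID, EncryptIndivID, "
    "CAST(SendDate AS BIGINT) AS SendDate_us "
    f"FROM {PARQUET_PATH} LIMIT 20"
)

_UNIX_EPOCH = datetime(1970, 1, 1)


def micros_to_datetime(micros: int) -> datetime:
    """Turn a raw TIMESTAMP_MICROS value (microseconds since the Unix epoch)
    into a naive datetime, assuming the stored values carry no timezone."""
    # Integer timedelta arithmetic keeps full microsecond precision,
    # unlike dividing by 1e6 and going through float seconds.
    return _UNIX_EPOCH + timedelta(microseconds=micros)
```

For example, `micros_to_datetime(1_500_000_000_000_000)` returns `datetime(2017, 7, 14, 2, 40)`.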
