Open
Labels: hive (Hive connector)
Description
We tested the following SQL statements on Trino 451, 470, and 476; all versions behave the same.
With Hive metastore 3.1.3 everything works fine; with 4.0.0 and 4.0.1 the problem below occurs.
Minimal run for easy reproduction:
CREATE TABLE IF NOT EXISTS hive.demo.ny_taxi_data_raw5 (
VendorID BIGINT,
tpep_pickup_datetime TIMESTAMP,
tpep_dropoff_datetime TIMESTAMP,
passenger_count DOUBLE,
trip_distance DOUBLE,
payment_type BIGINT,
Fare_amount DOUBLE,
Tip_amount DOUBLE,
Total_amount DOUBLE
) WITH (
external_location = 's3a://demo/ny-taxi-data/raw/',
format = 'parquet'
);
-- First run: works
ANALYZE hive.demo.ny_taxi_data_raw5;
-- Second run fails: Invalid column statistics data: ColumnStatisticsObj(colName:tpep_dropoff_datetime, colType:timestamp, statsData:<ColumnStatisticsData >)
ANALYZE hive.demo.ny_taxi_data_raw5;
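The two ANALYZE calls above can also be scripted so the failing run is easy to spot. This is only a sketch: it assumes the Trino CLI is installed as `trino`, that it exits non-zero when a statement fails, and that a `TRINO_SERVER` variable (a name made up for this example) points at the coordinator. The helper names `run_sql` and `analyze_twice` are likewise hypothetical.

```shell
# Sketch only: run ANALYZE twice on one table and report which run fails.
# Assumptions (not from the report): `trino` is the Trino CLI on PATH,
# TRINO_SERVER is a made-up variable holding the coordinator URL.

run_sql() {
  # --server and --execute are Trino CLI options; $1 is a single SQL statement.
  trino --server "${TRINO_SERVER:-http://localhost:8080}" --execute "$1"
}

analyze_twice() {
  local table="$1" attempt
  for attempt in 1 2; do
    if run_sql "ANALYZE ${table}" >/dev/null; then
      echo "run ${attempt}: ok"
    else
      echo "run ${attempt}: failed"
      return 1
    fi
  done
}

# Usage (needs a reachable cluster):
#   analyze_twice hive.demo.ny_taxi_data_raw5
```

If the behaviour matches this report, the first run should print `ok` and the second `failed` on Hive metastore 4.0.x.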
This is where we get the test data from. It's the well-known New York taxi dataset.
# Register the MinIO alias once, before the loop.
mc alias set minio "$MINIO_ENDPOINT" "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"
for month in \
2020-01 2020-02 2020-03 2020-04 2020-05 2020-06 2020-07 2020-08 2020-09 2020-10 \
2020-11 2020-12 2021-01 2021-02 2021-03 2021-04 2021-05 2021-06 2021-07 2021-08 \
2021-09 2021-10 2021-11 2021-12 2022-01 2022-02 2022-03 2022-04; do
# Download one month of trip data, then upload it to the table's external location.
curl -O "https://repo.stackable.tech/repository/misc/ny-taxi-data/yellow_tripdata_$month.parquet"
mc cp "yellow_tripdata_$month.parquet" minio/demo/ny-taxi-data/raw/
done
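The month list in the loop above does not have to be typed out by hand. As a small sketch, the hypothetical helper `months_between` below (not part of the original script) prints every month in a range using plain shell arithmetic.

```shell
# Print every month from START to END (inclusive) as YYYY-MM, one per line.
# months_between is a hypothetical helper name used only for this sketch.
months_between() {
  local y="${1%-*}" m="${1#*-}" end_y="${2%-*}" end_m="${2#*-}"
  # Strip a leading zero so that 08 and 09 are not parsed as invalid octal.
  m="${m#0}"
  end_m="${end_m#0}"
  while [ "$y" -lt "$end_y" ] || { [ "$y" -eq "$end_y" ] && [ "$m" -le "$end_m" ]; }; do
    printf '%04d-%02d\n' "$y" "$m"
    m=$((m + 1))
    if [ "$m" -gt 12 ]; then
      m=1
      y=$((y + 1))
    fi
  done
}

# Usage: same range as the download loop in this report.
#   for month in $(months_between 2020-01 2022-04); do ...; done
```

For the range used in this report, the helper yields the same 28 months as the hand-written list.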
A second run with a bit more debug output:
CREATE TABLE hive.demo.ny_taxi_data_raw6 [..] -- same statement as for ny_taxi_data_raw5
-- All NULL - expected
SHOW STATS FOR hive.demo.ny_taxi_data_raw6;
-- First run: works
ANALYZE hive.demo.ny_taxi_data_raw6;
-- *Still* all NULL, even though ANALYZE succeeded. Why?
SHOW STATS FOR hive.demo.ny_taxi_data_raw6;
-- Second run fails: Invalid column statistics data: ColumnStatisticsObj(colName:tpep_dropoff_datetime, colType:timestamp, statsData:<ColumnStatisticsData >)
ANALYZE hive.demo.ny_taxi_data_raw6;
-- *Still* all NULL
SHOW STATS FOR hive.demo.ny_taxi_data_raw6;
-- Works: 68224564
SELECT COUNT(*) FROM hive.demo.ny_taxi_data_raw6;
Stack trace (from Trino 476):
io.trino.spi.TrinoException: Invalid column statistics data: ColumnStatisticsObj(colName:tpep_dropoff_datetime, colType:timestamp, statsData:<ColumnStatisticsData >)
at io.trino.plugin.hive.metastore.thrift.ThriftMetastoreUtil.fromMetastoreApiColumnStatistics(ThriftMetastoreUtil.java:588)
at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.groupStatisticsByColumn(ThriftHiveMetastore.java:409)
at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.lambda$getTableColumnStatistics$0(ThriftHiveMetastore.java:327)
at io.trino.plugin.hive.metastore.thrift.ThriftMetastoreApiStats.lambda$wrap$0(ThriftMetastoreApiStats.java:41)
at io.trino.plugin.hive.metastore.thrift.RetryDriver.run(RetryDriver.java:117)
at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.getTableColumnStatistics(ThriftHiveMetastore.java:325)
at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.getCurrentTableStatistics(ThriftHiveMetastore.java:473)
at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.updateTableStatistics(ThriftHiveMetastore.java:429)
at io.trino.plugin.hive.metastore.thrift.BridgingHiveMetastore.updateTableStatistics(BridgingHiveMetastore.java:135)
at io.trino.metastore.tracing.TracingHiveMetastore.lambda$updateTableStatistics$1(TracingHiveMetastore.java:141)
at io.trino.metastore.tracing.Tracing.lambda$withTracing$0(Tracing.java:35)
at io.trino.metastore.tracing.Tracing.withTracing(Tracing.java:43)
at io.trino.metastore.tracing.Tracing.withTracing(Tracing.java:34)
at io.trino.metastore.tracing.TracingHiveMetastore.updateTableStatistics(TracingHiveMetastore.java:141)
at io.trino.metastore.cache.CachingHiveMetastore.updateTableStatistics(CachingHiveMetastore.java:531)
at io.trino.metastore.cache.CachingHiveMetastore.updateTableStatistics(CachingHiveMetastore.java:531)
at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.lambda$setTableStatistics$0(SemiTransactionalHiveMetastore.java:525)
at io.trino.plugin.hive.metastore.SemiTransactionalHiveMetastore.commit(SemiTransactionalHiveMetastore.java:1257)
at io.trino.plugin.hive.HiveMetadata.commit(HiveMetadata.java:3899)
at io.trino.plugin.hive.HiveTransactionManager.lambda$commit$0(HiveTransactionManager.java:58)
at java.base/java.util.Optional.ifPresent(Unknown Source)
at io.trino.plugin.hive.HiveTransactionManager.commit(HiveTransactionManager.java:56)
at io.trino.plugin.hive.HiveConnector.commit(HiveConnector.java:210)
at io.trino.metadata.CatalogTransaction.commit(CatalogTransaction.java:86)
at io.trino.metadata.CatalogMetadata.commit(CatalogMetadata.java:153)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:128)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:74)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:80)
at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79)
at io.trino.$gen.Trino_476_stackable0_0_0_dev____20250716_125820_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
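The exception message names the affected column and its type, and shows the `statsData` field as empty. When checking coordinator logs for further occurrences, a small filter can pull out just those two fields. This is a sketch: `stats_error_column` is a hypothetical helper name, and it assumes the message keeps the exact shape shown in this trace.

```shell
# Read log lines on stdin and print "<column> <type>" for each
# "ColumnStatisticsObj(colName:..., colType:..., ...)" occurrence.
# stats_error_column is a hypothetical helper name for this sketch.
stats_error_column() {
  sed -n 's/.*ColumnStatisticsObj(colName:\([^,]*\), colType:\([^,]*\),.*/\1 \2/p'
}

# Usage (the log file name is a placeholder):
#   stats_error_column < server.log | sort | uniq -c
```

Lines that do not contain a `ColumnStatisticsObj(...)` fragment produce no output, so the filter can be run over a whole log file.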