SuccessChanges

Summary

  1. [SPARK-26709][SQL][BRANCH-2.3] OptimizeMetadataOnlyQuery does not handle (commit: f98aee4d63ca4a51fb7d98e1d36f8a82d62cf378) (details)
Commit f98aee4d63ca4a51fb7d98e1d36f8a82d62cf378 by yamamuro
[SPARK-26709][SQL][BRANCH-2.3] OptimizeMetadataOnlyQuery does not handle
empty records correctly
## What changes were proposed in this pull request?
When reading from empty tables, the optimization
`OptimizeMetadataOnlyQuery` may return wrong results:
```
sql("CREATE TABLE t (col1 INT, p1 INT) USING PARQUET PARTITIONED BY (p1)")
sql("INSERT INTO TABLE t PARTITION (p1 = 5) SELECT ID FROM range(1, 1)")
sql("SELECT MAX(p1) FROM t")
```
The result is supposed to be `null`. However, with the optimization the
result is `5`.
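The expected semantics can be checked outside Spark. The sketch below uses Python's built-in sqlite3 as a stand-in engine (not the Spark code path) to show that an aggregate over an empty table returns NULL, matching the `null` result the commit message expects:

```python
# Standard SQL semantics: MAX over zero rows yields NULL.
# sqlite3 is used here only as a convenient stand-in engine;
# the table and column names mirror the Spark example above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, p1 INTEGER)")
# No rows are inserted: the INSERT ... SELECT in the example reads from
# range(1, 1), which is empty, so the partition exists but holds no data.
(result,) = conn.execute("SELECT MAX(p1) FROM t").fetchone()
print(result)  # NULL surfaces as Python None
```

The bug arises because the metadata-only optimization answers `MAX(p1)` from the partition listing (which contains `p1 = 5`) rather than from the rows themselves.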
The rule was originally ported from
https://issues.apache.org/jira/browse/HIVE-1003 in #13494. In Hive, the
rule was later disabled by default
(https://issues.apache.org/jira/browse/HIVE-15397) due to the same
problem.
It is hard to avoid the correctness issue completely, because data
sources such as Parquet can be metadata-only: Spark cannot tell whether
a table is empty without actually reading it. This PR disables the
optimization by default.
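Assuming the optimization is gated by the `spark.sql.optimizer.metadataOnly` key in SQLConf (the conf file is among those modified below; treat the exact key name as an assumption), a user who accepts the correctness risk could re-enable it per session:

```
spark.conf.set("spark.sql.optimizer.metadataOnly", "true")
```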
## How was this patch tested?
Unit test
Closes #23648 from gengliangwang/SPARK-26709.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: f98aee4d63ca4a51fb7d98e1d36f8a82d62cf378)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala (diff)
The file was modified docs/sql-programming-guide.md (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)