Failed

Changes

Summary

  1. [SPARK-34306][SQL][PYTHON][R] Use Snake naming rule across the function (commit: 30468a9)
  2. [SPARK-34300][PYSPARK][DOCS][MINOR] Fix some typos and syntax issues in (commit: d99d0d2)
  3. [SPARK-34316][K8S] Support spark.kubernetes.executor.disableConfigMap (commit: f66e38c)
  4. [SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null (commit: 521397f)
  5. [SPARK-34319][SQL] Resolve duplicate attributes for (commit: e9362c2)
  6. [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI (commit: bb9bf66)
  7. [SPARK-34317][SQL] Introduce relationTypeMismatchHint to UnresolvedTable (commit: f024d30)
  8. [SPARK-34312][SQL] Support partition(s) truncation by (commit: 6d3674b)
  9. [SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3 (commit: 5acc5b8)
  10. [SPARK-34318][SQL] Dataset.colRegex should work with column names and (commit: 66f3480)
  11. [SPARK-33599][SQL] Restore the assert-like in catalyst/analysis (commit: 5b2ad59)
  12. [SPARK-33591][SQL][FOLLOW-UP] Revise the version and doc of (commit: ff1b6ec)
  13. [SPARK-34282][SQL][TESTS] Unify v1 and v2 TRUNCATE TABLE tests (commit: 79515b8)
  14. [SPARK-34263][SQL] Simplify the code for treating unicode/octal/escaped (commit: d308794)
  15. [SPARK-34324][SQL] FileTable should not list TRUNCATE in capabilities by (commit: cadca8d)
  16. [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the (commit: 6386602)
  17. [SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal (commit: 00120ea)
  18. [SPARK-34325][CORE] Remove unused shuffleBlockResolver variable (commit: 60c71c6)
  19. [SPARK-34308][SQL] Escape meta-characters in printSchema (commit: 603a7fd)
  20. Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending (commit: e927bf9)
  21. [SPARK-34307][SQL] TakeOrderedAndProjectExec avoid shuffle if input rdd (commit: fc80a5b)
  22. [SPARK-34313][SQL] Migrate ALTER TABLE SET/UNSET TBLPROPERTIES commands (commit: a1d4bb3)
  23. [SPARK-34327][BUILD] Strip passwords from inlining into build (commit: 89bf2af)
  24. [SPARK-34182][AVRO] Improve error messages when matching (commit: 791de00)
  25. [SPARK-34335][SQL] Support referencing subquery with column aliases by (commit: 76a7fca)
  26. [SPARK-34304][SQL] Remove view checks in v1 alter table commands (commit: 7bfb4a4)
  27. [SPARK-34317][SQL][FOLLOW-UP] Use relationTypeMismatchHint when (commit: 3d7e139)
  28. [SPARK-34340][CORE] Support ZSTD JNI BufferPool (commit: 8e28218)
  29. [SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the (commit: 44dcf00)
  30. [SPARK-34341][BUILD] Skip zinc related operations on aarch64 (commit: 42c32e8)
  31. [SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop (commit: 6a194f1)
  32. [SPARK-34357][SQL] Map JDBC SQL TIME type to TimestampType with time (commit: 7675582)
  33. [SPARK-34339][CORE][SQL] Expose the number of total paths in (commit: fbe726f)
  34. [SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size (commit: 961c851)
  35. [MINOR][ML] Param Validation should throw IllegalArgumentException (commit: 5f1af69)
  36. [SPARK-34359][SQL] Add a legacy config to restore the output schema of (commit: 361d702)
  37. [SPARK-34343][SQL][TESTS] Add missing test for some non-array types in (commit: 55399eb)
  38. [SPARK-34371][SQL][TESTS] Run the datetime rebasing tests for Parquet (commit: ee11a8f)
  39. [SPARK-34331][SQL] Speed up DS v2 metadata col resolution (commit: 989eb68)
  40. [SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for (commit: 76baaf7)
  41. [SPARK-34350][SQL][TESTS] replace withTimeZone defined in (commit: 1f4135c)
  42. [SPARK-26836][SQL] Supporting Avro schema evolution for partitioned Hive (commit: e614f34)
Commit 30468a901577e82c855fbc4cb78e1b869facb44c by gurwls223
[SPARK-34306][SQL][PYTHON][R] Use Snake naming rule across the function APIs

### What changes were proposed in this pull request?

This PR completes the snake_case naming rule for function APIs across the languages; see also SPARK-10621.

In more detail, this PR:
- Adds `count_distinct` in Scala, Python, and R, and documents that `count_distinct` is encouraged. `countDistinct` was not deprecated because it is very commonly used. It could be deprecated in a future release.
- (Scala-specific) Adds `typedlit` but doesn't deprecate `typedLit`, which is arguably commonly used. Likewise, it could be deprecated in a future release.
- Deprecates and renames:
  - `sumDistinct` -> `sum_distinct`
  - `bitwiseNOT` -> `bitwise_not`
  - `shiftLeft` -> `shiftleft` (matched with SQL name in `FunctionRegistry`)
  - `shiftRight` -> `shiftright` (matched with SQL name in `FunctionRegistry`)
  - `shiftRightUnsigned` -> `shiftrightunsigned` (matched with SQL name in `FunctionRegistry`)
  - (Scala-specific) `callUDF` -> `call_udf`
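
The deprecate-and-rename pattern in the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's implementation: `sum_distinct` here is a stand-in that works on a plain list, and `deprecated_alias` is a hypothetical helper invented for this example.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Build a wrapper that warns, then forwards to the renamed function.

    Hypothetical helper for illustration; it is not part of PySpark.
    """
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead.",
            FutureWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)

    # Expose the wrapper under the old camelCase name.
    wrapper.__name__ = old_name
    return wrapper


def sum_distinct(values):
    """Stand-in for the new snake_case API: sum of the distinct values."""
    return sum(set(values))


# The old name stays callable but emits a deprecation warning on each call.
sumDistinct = deprecated_alias(sum_distinct, "sumDistinct")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = sumDistinct([1, 1, 2, 3])

print(result)
print(caught[0].message)
```

Keeping the old name as a thin forwarding wrapper means existing callers keep working while being steered toward the new name.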

### Why are the changes needed?

To keep naming consistent across the APIs.

### Does this PR introduce _any_ user-facing change?

Yes, it deprecates some APIs and adds new renamed APIs as described above.

### How was this patch tested?

Unit tests were added.

Closes #31408 from HyukjinKwon/SPARK-34306.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 30468a9)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SameResultSuite.scala
The file was modified python/pyspark/sql/tests/test_column.py
The file was modified R/pkg/vignettes/sparkr-vignettes.Rmd
The file was modified docs/sql-getting-started.md
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
The file was modified examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java
The file was modified sql/hive/src/test/java/org/apache/spark/sql/hive/JavaDataFrameSuite.java
The file was modified python/docs/source/reference/pyspark.sql.rst
The file was modified sql/core/src/test/scala/org/apache/spark/sql/MathFunctionsSuite.scala
The file was modified R/pkg/tests/fulltests/test_sparkSQL.R
The file was modified python/pyspark/sql/tests/test_functions.py
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala
The file was modified R/pkg/R/generics.R
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java
The file was modified python/pyspark/sql/tests/test_group.py
The file was modified R/pkg/NAMESPACE
The file was modified R/pkg/R/functions.R
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
The file was modified python/pyspark/sql/functions.pyi
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ObjectHashAggregateSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/functions.scala
The file was modified python/pyspark/sql/functions.py
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala
Commit d99d0d27be875bba692bcfe376f90c930e170380 by gurwls223
[SPARK-34300][PYSPARK][DOCS][MINOR] Fix some typos and syntax issues in docstrings and output of `dev/lint-python`

This changeset is published into the public domain.

### What changes were proposed in this pull request?

Some typos and syntax issues in docstrings and the output of `dev/lint-python` have been fixed.

### Why are the changes needed?
In some places, the documentation did not refer to parameters or classes by the full and correct name, potentially causing uncertainty in the reader or rendering issues in Sphinx. Also, a typo in the standard output of `dev/lint-python` was fixed.

### Does this PR introduce _any_ user-facing change?

Slight improvements in documentation, and in standard output of `dev/lint-python`.

### How was this patch tested?

Manual testing and `dev/lint-python` run. No new Sphinx warnings arise due to this change.

Closes #31401 from DavidToneian/SPARK-34300.

Authored-by: David Toneian <david@toneian.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: d99d0d2)
The file was modified python/pyspark/sql/functions.py
The file was modified python/pyspark/sql/avro/functions.py
The file was modified dev/lint-python
Commit f66e38c963243e6c690d7b7a65c00d80ff80b0b5 by dhyun
[SPARK-34316][K8S] Support spark.kubernetes.executor.disableConfigMap

### What changes were proposed in this pull request?

This PR aims to add a new configuration `spark.kubernetes.executor.disableConfigMap`.

### Why are the changes needed?

This can be used to disable the config map creation for executor pods that was introduced by https://github.com/apache/spark/pull/27735.

### Does this PR introduce _any_ user-facing change?

No. By default, the existing behavior is unchanged.
This is a new feature that adds the ability to disable SPARK-30985.

### How was this patch tested?

Pass the newly added UT.

Closes #31428 from dongjoon-hyun/SPARK-34316.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: f66e38c)
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala
Commit 521397f2f9b58a2e827e9d6abe049576cf6e9b2d by gurwls223
[SPARK-33591][SQL][FOLLOWUP] Add legacy config for recognizing null partition spec values

### What changes were proposed in this pull request?

This is a follow up for https://github.com/apache/spark/pull/30538.
It adds a legacy conf `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` in case users want the legacy behavior.
It also documents the behavior change.

### Why are the changes needed?

Users who want the legacy behavior can set `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` to `true`.

### Does this PR introduce _any_ user-facing change?

Yes, adding a legacy configuration to restore the old behavior.

### How was this patch tested?

Unit test.

Closes #31421 from gengliangwang/legacyNullStringConstant.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 521397f)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
The file was modified docs/sql-migration-guide.md
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala
Commit e9362c2571f4a329218ff466fce79eef45e8f992 by gurwls223
[SPARK-34319][SQL] Resolve duplicate attributes for FlatMapCoGroupsInPandas/MapInPandas

### What changes were proposed in this pull request?

Resolve duplicate attributes for `FlatMapCoGroupsInPandas`.

### Why are the changes needed?

When performing a self-join on top of `FlatMapCoGroupsInPandas`, analysis can fail because of conflicting attributes. For example,

```python
df = spark.createDataFrame([(1, 1)], ("column", "value"))
row = df.groupby("ColUmn").cogroup(
    df.groupby("COLUMN")
).applyInPandas(lambda r, l: r + l, "column long, value long")
row.join(row).show()
```
Error:

```
...
Conflicting attributes: column#163321L,value#163322L
;;
'Join Inner
:- FlatMapCoGroupsInPandas [ColUmn#163312L], [COLUMN#163312L], <lambda>(column#163312L, value#163313L, column#163312L, value#163313L), [column#163321L, value#163322L]
:  :- Project [ColUmn#163312L, column#163312L, value#163313L]
:  :  +- LogicalRDD [column#163312L, value#163313L], false
:  +- Project [COLUMN#163312L, column#163312L, value#163313L]
:     +- LogicalRDD [column#163312L, value#163313L], false
+- FlatMapCoGroupsInPandas [ColUmn#163312L], [COLUMN#163312L], <lambda>(column#163312L, value#163313L, column#163312L, value#163313L), [column#163321L, value#163322L]
   :- Project [ColUmn#163312L, column#163312L, value#163313L]
   :  +- LogicalRDD [column#163312L, value#163313L], false
   +- Project [COLUMN#163312L, column#163312L, value#163313L]
      +- LogicalRDD [column#163312L, value#163313L], false
...
```

### Does this PR introduce _any_ user-facing change?

Yes, queries like the example above no longer fail.

### How was this patch tested?

Added unit tests.

Closes #31429 from Ngone51/fix-conflcting-attrs-of-FlatMapCoGroupsInPandas.

Lead-authored-by: yi.wu <yi.wu@databricks.com>
Co-authored-by: wuyi <yi.wu@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: e9362c2)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
The file was modified python/pyspark/sql/tests/test_pandas_cogrouped_map.py
The file was modified python/pyspark/sql/tests/test_pandas_map.py
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala
Commit bb9bf66bb6235ed6b9cc03ece8ba8c25c4560d88 by wenchen
[SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines

### What changes were proposed in this pull request?
In Spark, `count(table.*)` may produce surprising results, for example:
```
select count(*) from (select 1 as a, null as b) t;
output: 1
select count(t.*) from (select 1 as a, null as b) t;
output: 0
```
This is because Spark expands `t.*` into its columns, whereas it converts `*` to `count(1)`, which confuses
users. According to the ANSI standard, `count(*)` should always be `count(1)`, while `count(t.*)`
is not allowed. What's more, common databases such as MySQL and Oracle do not allow it either.

So, this PR proposes to block the ambiguous behavior and print a clear error message for users.
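
The difference between the two queries above can be reproduced with a small pure-Python model of the counting rules. This is a sketch, not Spark code: it assumes that `count` with several arguments counts only the rows where every argument is non-NULL, and the helper names `count_star` and `count_columns` are made up for this example.

```python
def count_star(rows):
    """count(*): every row counts, regardless of NULLs."""
    return len(rows)


def count_columns(rows, columns):
    """count(c1, c2, ...): a row counts only if all listed columns are non-NULL."""
    return sum(1 for row in rows if all(row[c] is not None for c in columns))


# Equivalent of the subquery: (select 1 as a, null as b) t
t = [{"a": 1, "b": None}]

star = count_star(t)                     # models count(*)
expanded = count_columns(t, ["a", "b"])  # models count(t.*) expanded to count(a, b)

print(star, expanded)
```

Because `b` is NULL in the only row, the expanded form skips that row, which is the surprising result that the PR blocks.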

### Why are the changes needed?
To avoid ambiguous behavior and to follow the ANSI standard and other SQL engines.

### Does this PR introduce _any_ user-facing change?
Yes, `count(table.*)` is now blocked and produces an error message.

### How was this patch tested?
Newly added and existing tests.

Closes #31286 from linhongliu-db/fix-table-star.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: bb9bf66)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
The file was modified sql/core/src/test/resources/sql-tests/inputs/udf/postgreSQL/udf-join.sql
The file was modified sql/core/src/test/resources/sql-tests/results/count.sql.out
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
The file was modified sql/core/src/test/resources/sql-tests/results/udf/udf-count.sql.out
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
The file was modified sql/core/src/test/resources/sql-tests/inputs/udf/udf-count.sql
The file was modified sql/core/src/test/resources/sql-tests/results/udf/postgreSQL/udf-join.sql.out
The file was modified sql/core/src/test/resources/sql-tests/inputs/count.sql
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
The file was modified docs/sql-migration-guide.md
The file was modified sql/core/src/test/resources/sql-tests/inputs/postgreSQL/join.sql
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/join.sql.out
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Commit f024d3051c94b7a6db443345524c5c068c4e82c7 by wenchen
[SPARK-34317][SQL] Introduce relationTypeMismatchHint to UnresolvedTable for a better error message

### What changes were proposed in this pull request?

This PR proposes to add `relationTypeMismatchHint` to `UnresolvedTable` so that if a relation is resolved to a view when a table is expected, a hint message can be included as part of the analysis exception message. Note that the same feature was already introduced to `UnresolvedView` in #30636.

This mostly affects `ALTER TABLE` commands, where the analysis exception message will now contain `Please use ALTER VIEW instead`.

### Why are the changes needed?

To give a better error message. (The hint used to exist but got removed for commands that migrated to the new resolution framework)

### Does this PR introduce _any_ user-facing change?

Yes, now `ALTER TABLE` commands include a hint to use `ALTER VIEW` instead.
```
sql("ALTER TABLE v SET SERDE 'whatever'")
```
Before:
```
"v is a view. 'ALTER TABLE ... SET [SERDE|SERDEPROPERTIES]' expects a table.
```
After this PR:
```
"v is a view. 'ALTER TABLE ... SET [SERDE|SERDEPROPERTIES]' expects a table. Please use ALTER VIEW instead.
```
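
The before and after messages above differ only in the appended hint. A minimal Python sketch of that idea follows; the function `expect_table_error` and its parameters are hypothetical and do not mirror the Scala code in `UnresolvedTable` or `QueryCompilationErrors`.

```python
from typing import Optional


def expect_table_error(name: str, command: str, hint: Optional[str] = None) -> str:
    """Build the 'expects a table' message, appending the hint when one is given."""
    message = f"{name} is a view. '{command}' expects a table."
    if hint:
        message += f" {hint}"
    return message


# Without a hint: the message as it looked before the PR.
before = expect_table_error("v", "ALTER TABLE ... SET [SERDE|SERDEPROPERTIES]")

# With a hint: the message as it looks after the PR.
after = expect_table_error(
    "v",
    "ALTER TABLE ... SET [SERDE|SERDEPROPERTIES]",
    hint="Please use ALTER VIEW instead.",
)

print(before)
print(after)
```

Making the hint optional lets each command decide whether a more suitable statement exists to point the user to.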

### How was this patch tested?

Updated existing test cases to include the hint.

Closes #31424 from imback82/better_error.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f024d30)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionParserSuite.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableRenamePartitionParserSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableAddPartitionParserSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/v2ResolutionPlans.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableRecoverPartitionsParserSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsParserSuite.scala
Commit 6d3674bb6212ab911663c2450ff308c0c615b262 by wenchen
[SPARK-34312][SQL] Support partition(s) truncation by `Supports(Atomic)PartitionManagement`

### What changes were proposed in this pull request?
1. Add new method `truncatePartition()` to the `SupportsPartitionManagement` interface.
2. Add new method `truncatePartitions()` to the `SupportsAtomicPartitionManagement` interface.
3. Default implementation of new methods in `InMemoryPartitionTable`/`InMemoryAtomicPartitionTable`.
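
The shape of the two new methods can be illustrated with a toy in-memory table in Python. This is a sketch under assumptions: the real interfaces are Java, the snake_case names are hypothetical, and the semantics shown (truncation empties a partition but keeps it, and the atomic variant is all-or-nothing) are assumed rather than taken from the patch.

```python
class InMemoryPartitionTable:
    """Toy partitioned table: maps a partition identifier to its list of rows."""

    def __init__(self):
        self.partitions = {}

    def truncate_partition(self, ident):
        """Remove all rows from one partition but keep the partition itself."""
        if ident not in self.partitions:
            raise KeyError(f"no such partition: {ident}")
        self.partitions[ident] = []
        return True

    def truncate_partitions(self, idents):
        """Atomic variant: validate every identifier first, then truncate all."""
        missing = [i for i in idents if i not in self.partitions]
        if missing:
            raise KeyError(f"no such partitions: {missing}")
        for ident in idents:
            self.partitions[ident] = []
        return True


table = InMemoryPartitionTable()
table.partitions = {("2021", "01"): [1, 2], ("2021", "02"): [3]}

# Truncating one partition leaves the other untouched.
table.truncate_partition(("2021", "01"))
print(table.partitions)
```

Validating all identifiers before mutating anything is one simple way to get all-or-nothing behavior in a single-threaded model.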

### Why are the changes needed?
This is the first step in supporting v2 `TRUNCATE TABLE .. PARTITION`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running new tests:
```
$ build/sbt "test:testOnly *SupportsPartitionManagementSuite"
$ build/sbt "test:testOnly *SupportsAtomicPartitionManagementSuite"
```

Closes #31420 from MaxGekk/dsv2-truncate-table-partitions.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 6d3674b)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsPartitionManagement.java
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagementSuite.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryAtomicPartitionTable.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryPartitionTable.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/SupportsPartitionManagementSuite.scala
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
Commit 5acc5b8f1eb4306ba09aeb6f1470466cf0e244e9 by dhyun
[SPARK-34323][BUILD] Upgrade zstd-jni to 1.4.8-3

### What changes were proposed in this pull request?

This PR aims to upgrade zstd-jni to 1.4.8-3.

### Why are the changes needed?

This will bring the latest improvements and bug fixes.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the CIs with the existing tests.

Closes #31430 from williamhyun/zstd-148.

Authored-by: William Hyun <williamhyun3@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 5acc5b8)
The file was modified pom.xml
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3
Commit 66f3480f2b6d0cb4c834423541bf8ffcc96ffe49 by yamamuro
[SPARK-34318][SQL] Dataset.colRegex should work with column names and qualifiers which contain newlines

### What changes were proposed in this pull request?

This PR fixes an issue where `Dataset.colRegex` doesn't work with column names or qualifiers that contain newlines.
In the current master, if column names or qualifiers passed to `colRegex` contain newlines, an exception is thrown.
```
val df = Seq(1, 2, 3).toDF("test\n_column").as("test\n_table")
val col1 = df.colRegex("`tes.*\n.*mn`")
org.apache.spark.sql.AnalysisException: Cannot resolve column name "`tes.*
.*mn`" among (test
_column)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$resolveException(Dataset.scala:272)
  at org.apache.spark.sql.Dataset.$anonfun$resolve$1(Dataset.scala:263)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.Dataset.resolve(Dataset.scala:263)
  at org.apache.spark.sql.Dataset.colRegex(Dataset.scala:1407)
  ... 47 elided

val col2 = df.colRegex("test\n_table.`tes.*\n.*mn`")
org.apache.spark.sql.AnalysisException: Cannot resolve column name "test
_table.`tes.*
.*mn`" among (test
_column)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$resolveException(Dataset.scala:272)
  at org.apache.spark.sql.Dataset.$anonfun$resolve$1(Dataset.scala:263)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.Dataset.resolve(Dataset.scala:263)
  at org.apache.spark.sql.Dataset.colRegex(Dataset.scala:1407)
  ... 47 elided
```

### Why are the changes needed?

Column names and qualifiers can contain newlines but `colRegex` can't work with them, so it's a bug.
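
A likely mechanism behind this class of bug is that `.` in a regular expression does not match a newline unless dot-all mode is enabled. The Python sketch below demonstrates only that general regex behavior on a backquoted identifier; the pattern is an assumption made for illustration and is not copied from `ParserUtils.scala`.

```python
import re

# Backquoted identifier containing a newline, as in the report above.
identifier = "`tes.*\n.*mn`"

# Without DOTALL, "." does not match "\n", so the whole-string match fails.
plain = re.fullmatch(r"`(.+)`", identifier)

# With DOTALL (the inline flag "(?s)" in Java/Scala regexes), "." also matches "\n".
dotall = re.fullmatch(r"`(.+)`", identifier, flags=re.DOTALL)

print(plain)
print(repr(dotall.group(1)))
```

The same distinction applies to Java and Scala regular expressions, where `(?s)` or `Pattern.DOTALL` enables dot-all mode.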

### Does this PR introduce _any_ user-facing change?

Yes. Users can pass column names and qualifiers even if they contain newlines.

### How was this patch tested?

New test.

Closes #31426 from sarutak/fix-colRegex.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 66f3480)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
Commit 5b2ad59f64a9bb065b49acb2e73a6b246a3d8c64 by wenchen
[SPARK-33599][SQL] Restore the assert-like in catalyst/analysis

### What changes were proposed in this pull request?
Some exceptions actually serve as assertions, such as:
``throw new IllegalStateException("[BUG] unexpected plan returned by `lookupV2Relation`: " + other)``

It seems this kind of `Exception` should not be put into the dedicated error files.

### Why are the changes needed?
Reduce the workload of auditing.

### Does this PR introduce _any_ user-facing change?
'No'.

### How was this patch tested?
Jenkins test.

Closes #31395 from beliefer/SPARK-33599-restore-assert.

Authored-by: gengjiaan <gengjiaan@360.cn>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 5b2ad59)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
Commit ff1b6ecc379e657123d93ebbed26f3ca6a2b0528 by wenchen
[SPARK-33591][SQL][FOLLOW-UP] Revise the version and doc of `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral`

### What changes were proposed in this pull request?

Correct the version of SQL configuration `spark.sql.legacy.parseNullPartitionSpecAsStringLiteral` from 3.2.0 to 3.0.2.
Also, revise the documentation and test case.

### Why are the changes needed?

The release version in https://github.com/apache/spark/pull/31421 was wrong.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

Closes #31434 from gengliangwang/reviseVersion.

Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: ff1b6ec)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/ShowPartitionsSuiteBase.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
The file was modified docs/sql-migration-guide.md
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Commit 79515b82f1e8a9c04d7ac9f49095f0d206df5812 by wenchen
[SPARK-34282][SQL][TESTS] Unify v1 and v2 TRUNCATE TABLE tests

### What changes were proposed in this pull request?
1. Move parser tests from `DDLParserSuite` to `TruncateTableParserSuite`.
2. Port DS v1 tests from `DDLSuite` and other test suites to `v1.TruncateTableSuiteBase` and to `v1.TruncateTableSuite`.
3. Add a test for DSv2 `TRUNCATE TABLE` to `v2.TruncateTableSuite`.

### Why are the changes needed?
To improve test coverage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running new test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *TruncateTableSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *CatalogedDDLSuite"
```

Closes #31387 from MaxGekk/unify-truncate-table-tests.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 79515b8)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/TruncateTableSuiteBase.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/TruncateTableSuite.scala
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/TruncateTableSuite.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/TruncateTableParserSuite.scala
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/CachedTableSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/TruncateTableSuite.scala
Commit d308794adb821d301847772de3ee1ef3166aaf5b by sarutak
[SPARK-34263][SQL] Simplify the code for treating unicode/octal/escaped characters in string literals

### What changes were proposed in this pull request?

In the current master, the code for treating unicode/octal/escaped characters in string literals is a little bit complex so let's simplify it.

### Why are the changes needed?

To keep it easy to maintain.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`ParserUtilsSuite` passes.

Closes #31362 from sarutak/refactor-unicode-escapes.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
(commit: d308794)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ParserUtilsSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
Commit cadca8d3527d6fe268325db2b83b132dcbb035f5 by viirya
[SPARK-34324][SQL] FileTable should not list TRUNCATE in capabilities by default

### What changes were proposed in this pull request?

This patch proposes to remove `TRUNCATE` from the default `capabilities` list of `FileTable`.

### Why are the changes needed?

The abstract class `FileTable` currently lists `TRUNCATE` in its `capabilities`, but `FileTable` does not know whether an implementation really supports truncation. Specifically, we can check the existing `FileTable` implementations, including `AvroTable`, `CSVTable`, `JsonTable`, etc. None of them actually implements `SupportsTruncate` in its writer builder.

### Does this PR introduce _any_ user-facing change?

No, because `FileTable` does not appear to be a user-facing public API.

### How was this patch tested?

Existing unit tests.

Closes #31432 from viirya/SPARK-34324.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: cadca8d)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
Commit 63866025d2e4bb89251ba7e29160fb30dd48ddf7 by kabhwan.opensource
[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

### What changes were proposed in this pull request?

This PR proposes to fix the UTs added in SPARK-31793, so that everything contributing to the length limit is properly accounted for.

### Why are the changes needed?

The test `DataSourceScanExecRedactionSuite.SPARK-31793: FileSourceScanExec metadata should contain limited file paths` is failing conditionally, depending on the length of the temp directory.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The modified UTs explain the missing points and also test them.

Closes #31435 from HeartSaVioR/SPARK-34326.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(commit: 6386602)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/util/UtilsSuite.scala (diff)
Commit 00120ea53748d84976e549969f43cf2a50778c1c by gurwls223
[SPARK-34212][SQL][FOLLOWUP] Parquet vectorized reader can read decimal fields with a larger precision

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/31357

#31357 added a very strong restriction to the vectorized Parquet reader: when reading decimal fields, the Spark data type must exactly match the physical Parquet type. This restriction is not actually necessary, as we can safely read Parquet decimals with a larger precision. This PR relaxes the restriction a little.

### Why are the changes needed?

To not fail queries unnecessarily.

### Does this PR introduce _any_ user-facing change?

Yes, now users can read parquet decimals with mismatched `DecimalType` as long as the scale is the same and precision is larger.
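The relaxed rule can be sketched as a small predicate. This is a hypothetical model, not the code in `VectorizedColumnReader`: `Dec` and `canRead` are illustrative names, and treating an equal precision as acceptable is an assumption.

```scala
// Illustrative only: a hypothetical model of the relaxed check,
// not the actual VectorizedColumnReader code.
object DecimalReadCheck {
  // A precision/scale pair standing in for Spark's DecimalType.
  final case class Dec(precision: Int, scale: Int)

  // A Parquet decimal is readable as the requested type when the scales
  // match and the requested precision is at least the file's precision.
  def canRead(parquet: Dec, requested: Dec): Boolean =
    requested.scale == parquet.scale && requested.precision >= parquet.precision
}
```

Under this model, a file column stored as decimal(10, 2) could be read as decimal(18, 2), but not as decimal(18, 3) or decimal(8, 2).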

### How was this patch tested?

updated test.

Closes #31443 from cloud-fan/improve.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 00120ea)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedColumnReader.java (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
Commit 60c71c6d2d38163468c0f428fd1f33015b58c32c by gurwls223
[SPARK-34325][CORE] Remove unused shuffleBlockResolver variable in SortShuffleWriter

### What changes were proposed in this pull request?
Remove unused shuffleBlockResolver variable in SortShuffleWriter.

### Why are the changes needed?
For better code understanding.

### Does this PR introduce _any_ user-facing change?
NO.

### How was this patch tested?
End to End.

Closes #31433 from offthewall123/remove_shuffleBlockResolver_in_SortShuffleWriter.

Authored-by: offthewall123 <dingyu.xu@intel.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 60c71c6)
The file was modified core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/shuffle/sort/SortShuffleWriterSuite.scala (diff)
Commit 603a7fd7b6cbaa91251b3cb528c04b22b228a1c4 by gurwls223
[SPARK-34308][SQL] Escape meta-characters in printSchema

### What changes were proposed in this pull request?

Similar to SPARK-33690, this PR improves the output layout of `printSchema` for the case where column names contain meta-characters.
Here is an example.

Before:
```
scala> val df1 = spark.sql("SELECT 'aaa\nbbb\tccc\rddd\feee\bfff\u000Bggg\u0007hhh'")
scala> df1.printSchema
root
|-- aaa
ddd ccc
   eefff
        ggghhh: string (nullable = false)
```

After:
```
scala> df1.printSchema
root
|-- aaa\nbbb\tccc\rddd\feee\bfff\vggg\ahhh: string (nullable = false)
```
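A minimal sketch of this kind of escaping, in plain Scala. The object and method names are hypothetical, and the character set is simply the one visible in the example above; the actual helper in `SchemaUtils` may differ.

```scala
// Illustrative only: replaces control characters with visible escapes.
object EscapeMeta {
  def escape(s: String): String = s.flatMap {
    case '\n' => "\\n"
    case '\r' => "\\r"
    case '\t' => "\\t"
    case '\f' => "\\f"
    case '\b' => "\\b"
    case '\u000B' => "\\v" // vertical tab
    case '\u0007' => "\\a" // bell
    case c => c.toString   // everything else is kept as-is
  }
}
```

Because each control character becomes a two-character printable sequence, the field name stays on a single line of the schema tree.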

### Why are the changes needed?

To avoid breaking the layout of `Dataset#printSchema`

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New test.

Closes #31412 from sarutak/escape-meta-printSchema.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 603a7fd)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/util/SchemaUtils.scala (diff)
Commit e927bf90e0e035a5103e029f2524239ee11c2961 by gurwls223
Revert "[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path"

This reverts commit 63866025d2e4bb89251ba7e29160fb30dd48ddf7.
(commit: e927bf9)
The file was modified core/src/test/scala/org/apache/spark/util/UtilsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala (diff)
Commit fc80a5b8775ef69a2ec981027b334c49beecd77d by wenchen
[SPARK-34307][SQL] TakeOrderedAndProjectExec avoid shuffle if input rdd has single partition

### What changes were proposed in this pull request?
when the child rdd has only one partition, skip the shuffle

### Why are the changes needed?
avoid unnecessary shuffle

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
existing testsuites

Closes #31409 from zhengruifeng/limit_with_single_partition.

Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: fc80a5b)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala (diff)
Commit a1d4bb3300ff99fbb9f09d276b1d39e0a43813a0 by wenchen
[SPARK-34313][SQL] Migrate ALTER TABLE SET/UNSET TBLPROPERTIES commands to use UnresolvedTable to resolve the identifier

### What changes were proposed in this pull request?

This PR proposes to migrate `ALTER TABLE ... SET/UNSET TBLPROPERTIES` to use `UnresolvedTable` to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).

### Why are the changes needed?

This is a part of effort to make the relation lookup behavior consistent: [SPARK-29900](https://issues.apache.org/jira/browse/SPARK-29900).

### Does this PR introduce _any_ user-facing change?

After this PR, `ALTER TABLE SET/UNSET TBLPROPERTIES` will have a consistent resolution behavior.

### How was this patch tested?

Updated existing tests / added new tests.

Closes #31422 from imback82/v2_alter_table_set_unset_properties.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: a1d4bb3)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)
Commit 89bf2afb3337a44f34009a36cae16dd0ff86b353 by gurwls223
[SPARK-34327][BUILD] Strip passwords from inlining into build information while releasing

### What changes were proposed in this pull request?

Strip passwords so that they are not inadvertently inlined into build information.

`https://user:pass@domain/foo -> https://domain/foo`

### Why are the changes needed?
This can be a serious security issue, esp. during a release.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Tested by executing the following command on both Mac OSX and Ubuntu.

```
echo url=$(git config --get remote.origin.url |  sed 's|https://\(.*\)@\(.*\)|https://\2|')
```
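For readers more familiar with Scala than `sed`, the same transformation can be sketched with a regular expression. This assumes the credentials appear as `user:pass@` directly before the host; `StripCredentials` is a hypothetical name, and the pattern is anchored at the start of the string, which the shell command is not.

```scala
// Illustrative only: mirrors the greedy pattern of the sed command above.
object StripCredentials {
  // Drops everything between "https://" and the last '@' in the URL.
  def strip(url: String): String =
    url.replaceFirst("^https://.*@", "https://")
}
```

URLs without embedded credentials do not match the pattern and are returned unchanged.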

Closes #31436 from ScrapCodes/strip_pass.

Authored-by: Prashant Sharma <prashsh1@in.ibm.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 89bf2af)
The file was modified build/spark-build-info (diff)
Commit 791de00ddfac17df6418e31d642094e7b7c9431b by wenchen
[SPARK-34182][AVRO] Improve error messages when matching Catalyst-to-Avro schemas

### What changes were proposed in this pull request?
Improve the error messages for incompatibilities between Avro and Catalyst schemas. First, make `AvroSerializer` more similar to `AvroDeserializer` in printing out contextual information such as hierarchical field names. Standardize exception messages in both serializer and deserializer to always include such contextual information, and include a top-level exception which shows the full schemas which were being parsed when the incompatibility was found. Both now print out the hierarchical name for both the Avro and Catalyst fields, since they may be different due to case sensitivity and Avro union handling.

### Why are the changes needed?
The error messages in this type of failure scenario are very lacking in information on the write path (`AvroSerializer`). Below are two examples of messages that provide insufficient information to determine what went wrong (lacking in field names, context about the overall schema structure, etc.).
```
org.apache.spark.sql.avro.IncompatibleSchemaException: Cannot convert Catalyst type IntegerType to Avro type "float".

org.apache.spark.sql.avro.IncompatibleSchemaException: Cannot convert Catalyst type StructType(StructField(bar,IntegerType,true)) to Avro type {"type":"record","name":"test","fields":[{"name":"NOTbar","type":["null","int"],"default":null}]}.
```
The error messages currently existing in `AvroDeserializer` are much better, but still not very internally consistent, and it would be better if they were consistent with the newly added exception messages in `AvroSerializer`.

### Does this PR introduce _any_ user-facing change?
Error messages for incompatibilities between Avro and Catalyst schemas will be greatly improved when writing Avro data using the `avroSchema` option, slightly improved when reading Avro data, and much more consistent between the two.

Below is an example of a new message. See `AvroSerdeSuite` for more examples.
```
org.apache.spark.sql.avro.IncompatibleSchemaException: Cannot convert Catalyst type STRUCT<`foo`: STRUCT<`bar`: INT>> to Avro type {"type":"record","name":"top","fields":[{"name":"foo","type":"int"}]}
at org.apache.spark.sql.avro.AvroSerializer.liftedTree1$1(AvroSerializer.scala:83)
...
Caused by: org.apache.spark.sql.avro.IncompatibleSchemaException: Cannot convert Catalyst field 'foo' to Avro field 'foo' because schema is incompatible (sqlType = STRUCT<`bar`: INT>, avroType = "int")
at org.apache.spark.sql.avro.AvroSerializer.newConverter(AvroSerializer.scala:230)
...
```

### How was this patch tested?
New unit test suite, `AvroSerdeSuite`, was added to test corner cases on `AvroSerializer` and `AvroDeserializer` and verify that the exception messages are as expected. Existing tests in `AvroSuite` also continue to pass, with modifications in places where assertions were made about the exceptions that would be thrown.

Closes #31333 from xkrogen/xkrogen-SPARK-34182-avro-serde-errormessages.

Authored-by: Erik Krogen <xkrogen@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 791de00)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroUtils.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/TestUtils.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala (diff)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroCatalystDataConversionSuite.scala (diff)
The file was added external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSerdeSuite.scala
Commit 76a7fca4e123c62097c8a35ef827fba7143d77d9 by wenchen
[SPARK-34335][SQL] Support referencing subquery with column aliases by table alias

### What changes were proposed in this pull request?
This PR adds support for referencing subquery with column aliases by its table alias.

Before:
```sql
-- AnalysisException: cannot resolve '`t.c1`' given input columns: [c1, c2];
SELECT t.c1, t.c2 FROM (SELECT 1 AS a, 1 AS b) t(c1, c2)
```

After:
```sql
-- [(1, 1)]
SELECT t.c1, t.c2 FROM (SELECT 1 AS a, 1 AS b) t(c1, c2)
```

### Why are the changes needed?
To allow users to reference subquery with column aliases by its table alias.

### Does this PR introduce _any_ user-facing change?
Yes

### How was this patch tested?
Added parser tests and SQL query tests.

Closes #31444 from allisonwang-db/spark-34335.

Authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 76a7fca)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/table-aliases.sql (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/grouping_set.sql.out (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/table-aliases.sql.out (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala (diff)
Commit 7bfb4a4642bf03fa69e8e23f2ad15cf713d30473 by wenchen
[SPARK-34304][SQL] Remove view checks in v1 alter table commands

### What changes were proposed in this pull request?
Remove the check `verifyAlterTableType()` from the following v1 commands:
- AlterTableAddPartitionCommand
- AlterTableDropPartitionCommand
- AlterTableRenamePartitionCommand
- AlterTableRecoverPartitionsCommand
- AlterTableSerDePropertiesCommand
- AlterTableSetLocationCommand

The check is no longer needed after the migration to the new resolution framework; see SPARK-29900.

Also new tests were added to:
- AlterTableAddPartitionSuiteBase
- AlterTableDropPartitionSuiteBase
- AlterTableRenamePartitionSuiteBase
- v1/AlterTableRecoverPartitionsSuite

and removed duplicate tests from `SQLViewSuite` and `HiveDDLSuite`.

The tests for `AlterTableSerDePropertiesCommand`/`AlterTableSetLocationCommand` exist in `SQLViewSuite` and `HiveDDLSuite`, and they can be ported to unified tests after SPARK-34305 and SPARK-34332.

The `ALTER TABLE .. CHANGE COLUMN` command accepts only tables too, so its check can be removed after the migration to the new resolution framework, SPARK-34302.

### Why are the changes needed?
To improve code maintenance by removing dead code.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
1. Added new tests to unified test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableAddPartitionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableDropPartitionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRenamePartitionSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *AlterTableRecoverPartitionsSuite"
```
2. Run the modified test suites:
```
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *SQLViewSuite"
$ build/sbt -Phive-2.3 -Phive-thriftserver "test:testOnly *HiveDDLSuite"
```

Closes #31405 from MaxGekk/remove-view-check-in-alter-table.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 7bfb4a4)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/AlterTableAddPartitionSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableRenamePartitionSuiteBase.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterTableDropPartitionSuiteBase.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
Commit 3d7e1397d62d28a31cc5e9602b03e26962efdaac by wenchen
[SPARK-34317][SQL][FOLLOW-UP] Use relationTypeMismatchHint when UnresolvedTable is resolved to a temp view

### What changes were proposed in this pull request?

This is a follow up to #31424, and proposes to use `UnresolvedTable.relationTypeMismatchHint` when `UnresolvedTable` is resolved to a temp view.

### Why are the changes needed?

This change uses the type mismatch hint when a relation is resolved to a temp view but a table is expected.

For example, `ALTER TABLE tmpView SET TBLPROPERTIES ('p' = 'an')` will now include `Please use ALTER VIEW instead.` in the exception message: `tmpView is a temp view. 'ALTER TABLE ... SET TBLPROPERTIES' expects a table. Please use ALTER VIEW instead.`

### Does this PR introduce _any_ user-facing change?

Yes, adds the hint in the exception message.

### How was this patch tested?

Update existing tests to include the hint.

Closes #31452 from imback82/followup_SPARK-34317.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 3d7e139)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
Commit 8e28218106a1520548434c18528754feb00e8e93 by dhyun
[SPARK-34340][CORE] Support ZSTD JNI BufferPool

### What changes were proposed in this pull request?

This PR has two goals.
1. Support ZSTD JNI BufferPool feature by adding a new configuration, `spark.io.compression.zstd.bufferPool.enabled`, for Apache Spark 3.2.0.
2. Make Spark independent from ZSTD JNI library's default buffer pool policy change.

### Why are the changes needed?

ZSTD JNI library has different behaviors across its versions.

| Version | Description | Commit |
| ---------- | --------------- | ----------- |
| v1.4.5-7 | `BufferPool` was added and used by default | https://github.com/luben/zstd-jni/commit/4f55c8917216518d7390eb0624bee3bf0e2c491a |
| v1.4.5-8 | `RecyclingBufferPool` was added and `BufferPool` became an interface to allow custom BufferPool implementation | https://github.com/luben/zstd-jni/commit/dd2588edd302823fa534de1516e4ae6d6dc6417e |
| v1.4.7+ | `NoPool` is used by default and user should specify buffer pool explicitly | https://github.com/luben/zstd-jni/commit/f7c8279bc162c8c8b1964948d0f3b309ad715311 |

### Does this PR introduce _any_ user-facing change?

No, the default value (`false`) is consistent with the AS-IS ZSTD-JNI library's default buffer pool.

### How was this patch tested?

Pass the CIs with the updated UT.

Closes #31453 from dongjoon-hyun/SPARK-34340.

Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 8e28218)
The file was modified core/src/test/scala/org/apache/spark/io/CompressionCodecSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/io/CompressionCodec.scala (diff)
Commit 44dcf0062c41ff4230096bee800d9b4f70c424ce by kabhwan.opensource
[SPARK-34326][CORE][SQL] Fix UTs added in SPARK-31793 depending on the length of temp path

### What changes were proposed in this pull request?

This PR proposes to fix the UTs added in SPARK-31793, so that everything contributing to the length limit is properly accounted for.

### Why are the changes needed?

The test `DataSourceScanExecRedactionSuite.SPARK-31793: FileSourceScanExec metadata should contain limited file paths` is failing conditionally, depending on the length of the temp directory.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The modified UTs explain the missing points and also test them.

Closes #31449 from HeartSaVioR/SPARK-34326-v2.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
(commit: 44dcf00)
The file was modified core/src/test/scala/org/apache/spark/util/UtilsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala (diff)
Commit 42c32e8a3ad126bd11bdb57c7e06b170a6840383 by gurwls223
[SPARK-34341][BUILD] Skip zinc related operations on aarch64

### What changes were proposed in this pull request?

Skip the zinc-related installation operations on the aarch64 platform.

### Why are the changes needed?

The standalone zinc is not well supported on aarch64, so the error output `cannot execute binary file: Exec format error` is dumped after build/mvn is called.

This patch tries to skip the zinc installation and related operations on aarch64, so that the error output is no longer printed there.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

Simple command: `build/mvn -v`. There is no more error output on aarch64, and nothing changed on x86.

- on AArch64 Ubuntu
```
root@yikun-arm:~/dev/spark# uname -a
Linux yikun-arm 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12 10:36:10 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux
root@yikun-arm:~/dev/spark# uname -m
aarch64
root@yikun-arm:~/dev/spark# build/mvn -v
Using `mvn` from path: /root/dev/spark/build/apache-maven-3.6.3/bin/mvn
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /root/dev/spark/build/apache-maven-3.6.3
Java version: 1.8.0_222, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-arm64/jre
Default locale: en, platform encoding: UTF-8
OS name: "linux", version: "4.15.0-70-generic", arch: "aarch64", family: "unix"
```
- on x86 Mac OS
```
# uname -a
Darwin MacBook.local 19.6.0 Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64 x86_64
# uname -m
x86_64
# build/mvn -v
Using `mvn` from path: /Users/jiangyikun/huawei/apache-maven-3.6.3/bin/mvn
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /Users/jiangyikun/huawei/apache-maven-3.6.3
Java version: 1.8.0_221, vendor: Oracle Corporation, runtime: /Library/Java/JavaVirtualMachines/jdk1.8.0_221.jdk/Contents/Home/jre
Default locale: zh_CN, platform encoding: UTF-8
OS name: "mac os x", version: "10.15.7", arch: "x86_64", family: "mac"
```
- on x86 Ubuntu
```
root@yikun-x86:~/spark# uname -a
Linux yikun-x86 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
root@yikun-x86:~/spark# uname -m
x86_64
root@yikun-x86:~/spark# ./build//mvn  -v
Using `mvn` from path: /root/spark/build/apache-maven-3.6.3/bin/mvn
Apache Maven 3.6.3 (cecedd343002696d0abb50b32b541b8a6ba2883f)
Maven home: /root/spark/build/apache-maven-3.6.3
Java version: 1.8.0_275, vendor: Private Build, runtime: /usr/lib/jvm/java-8-openjdk-amd64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.4.0-58-generic", arch: "amd64", family: "unix"
```

Closes #31454 from Yikun/zinc_skip_aarch64.

Authored-by: Yikun Jiang <yikunkero@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 42c32e8)
The file was modified build/mvn (diff)
Commit 6a194f197819e2970f619ed92321e0a0b716bf37 by gurwls223
[SPARK-33212][FOLLOW-UP][BUILD] Uses provided properties for Hadoop client dependencies in root pom

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/30701. It uses properties of `hadoop-client-api.artifact`, `hadoop-client-runtime.artifact` and `hadoop-client-minicluster.artifact` explicitly to set the dependencies and versions.

Otherwise, it is logically incorrect. For example, if you build with Hadoop 2, this dependency becomes `hadoop-client-api:2.7.4` internally, which does not exist in Hadoop 2 (https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client-api).

### Why are the changes needed?

- To fix the logical incorrectness.
- It fixes a potential issue: this actually caused an issue when `generate-sources` plugin is used together with Hadoop 2 by default, which attempts to pull 2.7.4 of `hadoop-client-api`, `hadoop-client-runtime` and `hadoop-client-minicluster` for whatever reason.

### Does this PR introduce _any_ user-facing change?

No for users and dev. It's more a cleanup.

### How was this patch tested?

Manually checked the dependencies are correctly placed.

Closes #31467 from HyukjinKwon/SPARK-33212.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 6a194f1)
The file was modified pom.xml (diff)
Commit 7675582dabf040935c598df2ec6d3d89fcda6331 by wenchen
[SPARK-34357][SQL] Map JDBC SQL TIME type to TimestampType with time portion fixed regardless of timezone

### What changes were proposed in this pull request?

For user-experience reasons (`java.sql.Time` uses milliseconds while Spark uses microseconds, which confuses Spark users, and users lose useful functions such as hour() and minute() on the column), we have decided to revert to using TimestampType. This time, however, we enforce that the hour stays consistent across system time zones (via offset manipulation) and that the date part is fixed to the zero epoch.

The full discussion with Wenchen Fan regarding this ticket is here: https://github.com/apache/spark/pull/30902#discussion_r569186823
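The idea of pinning the date part to the epoch can be sketched with `java.time` alone. This is only an illustration of the intent described above, not the code in `JdbcUtils`; `TimeToTimestamp` and `atEpochDate` are hypothetical names, and the offset manipulation mentioned in the description is not modeled.

```scala
import java.time.{LocalDate, LocalDateTime, LocalTime}

// Illustrative only: attaches a time of day to the epoch date.
object TimeToTimestamp {
  // The date is always 1970-01-01, so only the time portion carries
  // information and the hour does not depend on a time zone.
  def atEpochDate(time: LocalTime): LocalDateTime =
    LocalDateTime.of(LocalDate.ofEpochDay(0), time)
}
```

Since the result is a local date-time, functions that extract the hour or minute see the same value that was stored in the TIME column.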

### Why are the changes needed?

Revert and improvement to sql.Time handling

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests and integration tests

Closes #31473 from saikocat/SPARK-34357.

Authored-by: Hoa <hoameomu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 7675582)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala (diff)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala (diff)
Commit fbe726f5b17f1deaa96cbbaccbbad16ef2450acb by gurwls223
[SPARK-34339][CORE][SQL] Expose the number of total paths in Utils.buildLocationMetadata()

### What changes were proposed in this pull request?

This PR proposes to expose the total number of paths in Utils.buildLocationMetadata(), while relaxing the space usage a bit (around 10+ chars).

Suppose only the first 2 of 5 paths fit within the threshold; the outputs before and after the change are as follows:

* before the change: `[path1, path2]`
* after the change: `(5 paths)[path1, path2, ...]`
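The output format after the change can be sketched as follows. This is a hypothetical simplification, not `Utils.buildLocationMetadata()` itself: the `LocationMetadata` object is an illustrative name, and the length accounting here counts only the path strings, ignoring separators and brackets.

```scala
// Illustrative only: keeps paths while their combined length fits the
// threshold and reports the total count when some were dropped.
object LocationMetadata {
  def build(paths: Seq[String], threshold: Int): String = {
    // Running total of path lengths, one entry per path.
    val cumulative = paths.scanLeft(0)(_ + _.length).tail
    val kept = paths.zip(cumulative).takeWhile(_._2 <= threshold).map(_._1)
    if (kept.length < paths.length)
      s"(${paths.length} paths)" + (kept :+ "...").mkString("[", ", ", "]")
    else
      kept.mkString("[", ", ", "]")
  }
}
```

With five paths named `path1` to `path5` and a threshold of 10, this yields `(5 paths)[path1, path2, ...]`. With a threshold large enough for all of them, no count prefix is added.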

### Why are the changes needed?

SPARK-31793 silently truncates the paths, so end users can't tell how many paths were truncated, or even whether any were truncated at all.

### Does this PR introduce _any_ user-facing change?

Yes, the location metadata will also show how many paths are truncated (not shown), instead of silently truncated.

### How was this patch tested?

Modified UTs

Closes #31464 from HeartSaVioR/SPARK-34339.

Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: fbe726f)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/util/Utils.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/util/UtilsSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala (diff)
Commit 961c85166a259cd4e5343e54f27228767a841a88 by gurwls223
[SPARK-34346][CORE][SQL] io.file.buffer.size set by spark.buffer.size will override by loading hive-site.xml accidentally may cause perf regression

### What changes were proposed in this pull request?

In many real-world cases, when interacting with hive catalog through Spark SQL, users may just share the `hive-site.xml` for their hive jobs and make a copy to `SPARK_HOME`/conf w/o modification. In Spark, when we generate Hadoop configurations, we will use `spark.buffer.size(65536)` to reset `io.file.buffer.size(4096)`. But when we load the hive-site.xml, we may ignore this behavior and reset `io.file.buffer.size` again according to `hive-site.xml`.

1. The configuration priority for setting Hadoop and Hive config here is not right, while literally, the order should be `spark > spark.hive > spark.hadoop > hive > hadoop`

2. This breaks the `spark.buffer.size` config's behavior for tuning the IO performance w/ HDFS if there is an existing `io.file.buffer.size` in hive-site.xml
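The intended precedence can be illustrated by merging plain maps from lowest to highest priority. This is a sketch under the assumption that each configuration source can be modeled as a `Map[String, String]`; `ConfigPriority` is a hypothetical name and does not correspond to any code in `SparkHadoopUtil`.

```scala
// Illustrative only: later maps override earlier ones on key collisions.
object ConfigPriority {
  def merge(lowestToHighest: Seq[Map[String, String]]): Map[String, String] =
    lowestToHighest.foldLeft(Map.empty[String, String])(_ ++ _)
}
```

If a value for `io.file.buffer.size` from hive-site.xml is applied before the value derived from `spark.buffer.size`, the Spark-derived value wins, which matches the order `spark > spark.hive > spark.hadoop > hive > hadoop`.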

### Why are the changes needed?

bugfix for configuration behavior and fix performance regression by that behavior change

### Does this PR introduce _any_ user-facing change?

This PR restores a silent user-facing change.

### How was this patch tested?

new tests

Closes #31460 from yaooqinn/SPARK-34346.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 961c851)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/internal/SharedStateSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/SparkContextSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala (diff)
The file was added core/src/test/resources/core-site.xml
The file was added core/src/test/resources/hive-site.xml
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala (diff)
Commit 5f1af69cbf3e40045ed93b2ab2e98b07992f05e0 by ruifengz
[MINOR][ML] Param Validation should throw IllegalArgumentException

### What changes were proposed in this pull request?
Param validation now throws `IllegalArgumentException`.

### Why are the changes needed?
Param Validation should throw `IllegalArgumentException` instead of `IllegalStateException`
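As a reminder of the convention, Scala's built-in `require` already throws `IllegalArgumentException` when its condition fails. The validator below is a hypothetical example and is not taken from `ChiSqSelector` or `Selector`.

```scala
// Illustrative only: argument validation via require.
object ParamCheck {
  // require throws IllegalArgumentException when the condition is false.
  def validatePercentile(p: Double): Double = {
    require(p >= 0.0 && p <= 1.0, s"Percentile must be in [0, 1], got $p")
    p
  }
}
```

Callers that pass an out-of-range value therefore see an exception type that signals a bad argument rather than a bad object state.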

### Does this PR introduce _any_ user-facing change?
Yes, the type of exception changed

### How was this patch tested?
existing testsuites

Closes #31469 from zhengruifeng/mllib_exceptions.

Authored-by: Ruifeng Zheng <ruifengz@foxmail.com>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(commit: 5f1af69)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/Selector.scala (diff)
Commit 361d702f8d0c25bccf1231498a593e332342894f by wenchen
[SPARK-34359][SQL] Add a legacy config to restore the output schema of SHOW DATABASES

### What changes were proposed in this pull request?

This is a followup of https://github.com/apache/spark/pull/26006

In #26006, we merged the v1 and v2 SHOW DATABASES/NAMESPACES commands, but we missed a behavior change: the output schema of SHOW DATABASES became different.

This PR adds a legacy config to restore the old schema, with a migration guide item to mention this behavior change.

### Why are the changes needed?

Improve backward compatibility

### Does this PR introduce _any_ user-facing change?

No (the legacy config is false by default)

### How was this patch tested?

a new test

Closes #31474 from cloud-fan/command-schema.

Lead-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 361d702)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowNamespacesSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
Commit 55399eb3b297ff4a286d785586168c3852f9dac4 by yamamuro
[SPARK-34343][SQL][TESTS] Add missing test for some non-array types in PostgreSQL

### What changes were proposed in this pull request?

This PR added tests for some non-array types in PostgreSQL.
PostgreSQL supports a wide range of types (https://www.postgresql.org/docs/13/datatype.html), and `PostgresIntegrationSuite` contains tests for some of them, but tests for the following types are missing.

* bit varying
* point
* line
* lseg
* box
* path
* polygon
* circle
* pg_lsn
* macaddr
* macaddr8
* numeric
* pg_snapshot
* real
* time
* timestamp
* tsquery
* tsvector
* txid_snapshot
* xml

NOTE: Handling money types can be buggy so this PR doesn't add tests for those types.

### Why are the changes needed?

To ensure those types work well with Spark.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Extended `PostgresIntegrationSuite`.

Closes #31456 from sarutak/test-for-some-types-postgresql.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 55399eb)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/PostgresIntegrationSuite.scala (diff)
Commit ee11a8f4079680d71efbc69fb552e67d91d1eb24 by wenchen
[SPARK-34371][SQL][TESTS] Run the datetime rebasing tests for Parquet datasource v1 and v2

### What changes were proposed in this pull request?
Extract the date/timestamps rebasing tests from `ParquetIOSuite` to `ParquetRebaseDatetimeSuite` to run them for both DSv1 and DSv2 implementations of Parquet datasource.

### Why are the changes needed?
To improve test coverage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
By running new test suites:
```
$ build/sbt "sql/test:testOnly *ParquetRebaseDatetimeV2Suite"
$ build/sbt "sql/test:testOnly *ParquetRebaseDatetimeV1Suite"
$ build/sbt "sql/test:testOnly *ParquetIOSuite"
```

Closes #31478 from MaxGekk/rebase-tests-dsv1-and-dsv2.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: ee11a8f)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala (diff)
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRebaseDatetimeSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala (diff)
Commit 989eb6884d77226ab4f494a4237e09aea54a032d by wenchen
[SPARK-34331][SQL] Speed up DS v2 metadata col resolution

### What changes were proposed in this pull request?

This is a follow-up of https://github.com/apache/spark/pull/28027

https://github.com/apache/spark/pull/28027 added a DS v2 API that allows data sources to produce metadata/hidden columns that can only be seen when they are explicitly selected. The way we integrate this API into Spark is:
1. The v2 relation gets normal output and metadata output from the data source, and the metadata output is excluded from the plan output by default.
2. Column resolution can resolve `UnresolvedAttribute` with metadata columns, even if the child plan doesn't output metadata columns.
3. An analyzer rule searches the query plan, trying to find a node that has missing inputs. If such a node is found, it transforms the sub-plan of this node and updates the v2 relation to include the metadata output.

The analyzer rule in step 3 brings a perf regression, even for queries that do not read v2 tables at all. This rule calculates `QueryPlan.inputSet` (which builds an `AttributeSet` from the outputs of all children) and `QueryPlan.missingInput` (which does a set exclusion and creates a new `AttributeSet`) for every plan node in the query plan. In our benchmark, the TPCDS query compilation time increased by more than 10%.

This PR proposes a simple way to improve it: we add a special metadata entry to the metadata attribute, which allows us to quickly check whether a plan needs to add metadata columns. We just check all the references of the plan to see if any attribute contains the special metadata entry, instead of calculating `QueryPlan.missingInput`.
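
The difference between the two checks can be sketched with a toy plan model. Everything below is hypothetical and only illustrates the idea: the classes `Attribute` and `PlanNode`, the marker key `METADATA_COL_KEY`, and the method names are not Spark's API.

```python
from dataclasses import dataclass, field

METADATA_COL_KEY = "__metadata_col"  # hypothetical marker key


@dataclass(frozen=True)
class Attribute:
    name: str
    metadata: tuple = ()  # (key, value) pairs, kept as a tuple so attributes are hashable

    def is_metadata_col(self):
        return dict(self.metadata).get(METADATA_COL_KEY, False)


@dataclass
class PlanNode:
    references: list
    children: list = field(default_factory=list)
    output: list = field(default_factory=list)

    def missing_input(self):
        # Slow path: build the set of all child outputs, then subtract it.
        input_set = {attr for child in self.children for attr in child.output}
        return set(self.references) - input_set

    def needs_metadata_columns(self):
        # Fast path: only look at this node's references for the marker.
        return any(attr.is_metadata_col() for attr in self.references)


id_col = Attribute("id")
file_col = Attribute("_file", ((METADATA_COL_KEY, True),))
relation = PlanNode(references=[], output=[id_col])
plain_filter = PlanNode(references=[id_col], children=[relation], output=[id_col])
meta_filter = PlanNode(references=[file_col], children=[relation], output=[id_col])

print(plain_filter.needs_metadata_columns())  # False
print(meta_filter.needs_metadata_columns())   # True
```

The fast path never allocates a set for nodes that reference no marked attribute, which is the common case for queries that do not touch v2 tables.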

This PR also fixes one bug: we should not change the final output schema of the plan, if we only use metadata columns in operators like filter, sort, etc.

### Why are the changes needed?

Fix perf regression in SQL query compilation, and fix a bug.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Run `org.apache.spark.sql.TPCDSQuerySuite`. Before this PR, `AddMetadataColumns` is the fourth-ranked rule by running time:
```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 407641
Total time: 47.257239779 seconds

Rule                                  Effective Time / Total Time                     Effective Runs / Total Runs

OptimizeSubqueries                      4157690003 / 8485444626                         49 / 2778
Analyzer$ResolveAggregateFunctions      1238968711 / 3369351761                         49 / 2141
ColumnPruning                           660038236 / 2924755292                          338 / 6391
Analyzer$AddMetadataColumns             0 / 2918352992                                  0 / 2151
```
after this PR:
```
Analyzer$AddMetadataColumns             0 / 122885629                                   0 / 2151
```
The rule is now more than 20 times faster, and its cost is negligible relative to the total compilation time.

This PR also adds new tests to verify the bug fix.
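
The speedup can be checked directly from the metrics above. This short calculation assumes the per-rule times are in nanoseconds, which is consistent with the reported total of 47.257239779 seconds; the "after" total is not shown above, so only the "before" share is computed.

```python
# Values copied from the rule metrics above; unit assumed to be nanoseconds.
before_ns = 2_918_352_992        # Analyzer$AddMetadataColumns before this PR
after_ns = 122_885_629           # Analyzer$AddMetadataColumns after this PR
total_before_ns = 47_257_239_779  # "Total time: 47.257239779 seconds"

speedup = before_ns / after_ns
share_before = before_ns / total_before_ns

print(f"speedup: {speedup:.1f}x")
print(f"share of total time before: {share_before:.1%}")
```

Under that assumption the rule ran roughly 24 times faster and accounted for about 6% of the measured rule time before the change.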

Closes #31440 from cloud-fan/metadata-col.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 989eb68)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Implicits.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
Commit 76baaf746502b50f6f466d53bdce2b5a9a35b2df by wenchen
[SPARK-32985][SQL] Decouple bucket scan and bucket filter pruning for data source v1

### What changes were proposed in this pull request?

This is a follow-up to the discussion in https://github.com/apache/spark/pull/29804#discussion_r493100510 . Currently, in the data source v1 file scan `FileSourceScanExec`, [bucket filter pruning only takes effect with a bucketed table scan](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L542). However, this coupling is unnecessary, because bucket filter pruning can also happen when bucketed table scan is disabled. Reading files with bucket hash partitioning and bucket filter pruning are two orthogonal features and do not need to be coupled.
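
A toy model shows why the two features are independent: pruning only needs the bucket id of the filtered value, whatever the scan mode. The sketch rests on several assumptions: `zlib.crc32` stands in for Spark's own bucketing hash, the file names are made up, and "one task per remaining file" is a simplification of how a non-bucketed scan splits work.

```python
import zlib

NUM_BUCKETS = 8


def bucket_id(value, num_buckets=NUM_BUCKETS):
    # Stand-in hash for illustration; Spark uses its own hash function for bucketing.
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def prune_files(files_by_bucket, filter_value):
    """Keep only the files of the single bucket that can contain filter_value."""
    return files_by_bucket.get(bucket_id(filter_value), [])


def task_count(files, bucketed_scan, num_buckets=NUM_BUCKETS):
    # Bucketed scan: one task per bucket, even if most buckets were pruned away.
    # Non-bucketed scan (simplified here): one task per remaining file.
    return num_buckets if bucketed_scan else len(files)


# Hypothetical layout: one file per bucket.
files_by_bucket = {b: [f"part-{b:05d}.parquet"] for b in range(NUM_BUCKETS)}
kept = prune_files(files_by_bucket, filter_value=42)

print(kept)
print(task_count(kept, bucketed_scan=True), task_count(kept, bucketed_scan=False))
```

In this model an equality filter on the bucket column leaves a single file to read in both modes, and only the bucketed scan still launches one task per bucket.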

### Why are the changes needed?

This helps queries benefit from bucket filter pruning, saving CPU and IO by not reading unnecessary bucket files, without being bound to a bucketed table scan when task parallelism is a concern.

In addition, this resolves SPARK-33207 by reducing the number of tasks launched for a simple query with a bucket column filter: with a bucketed scan, the number of tasks launched equals the number of buckets, which is unnecessary.

### Does this PR introduce _any_ user-facing change?

Users will notice that queries start pruning irrelevant files when reading a bucketed table with bucketing disabled. If the input data does not follow the Spark data source bucketing convention, an exception is thrown by default and the query fails. The exception can be bypassed by setting the config `spark.sql.files.ignoreCorruptFiles` to true.

### How was this patch tested?

Added unit test in `BucketedReadSuite.scala` to make all existing unit tests for bucket filter work with this PR.

Closes #31413 from c21/bucket-pruning.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 76baaf7)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/DisableUnnecessaryBucketedScanSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/DisableUnnecessaryBucketedScan.scala (diff)
Commit 1f4135c4bd82001abae85d27b7822283d1a5b6b0 by yamamuro
[SPARK-34350][SQL][TESTS] replace withTimeZone defined in OracleIntegrationSuite with DateTimeTestUtils.withDefaultTimeZone

### What changes were proposed in this pull request?

This PR replaces `withTimeZone` defined and used in `OracleIntegrationSuite` with `DateTimeTestUtils.withDefaultTimeZone` which is defined as a utility method.

### Why are the changes needed?

Both methods are semantically the same so it might be better to use the utility one.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

`OracleIntegrationSuite` passes.

Closes #31465 from sarutak/oracle-timezone-util.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: 1f4135c)
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala (diff)
Commit e614f34c7a538b1f2c59616689eaea95af85fd54 by dhyun
[SPARK-26836][SQL] Supporting Avro schema evolution for partitioned Hive tables with "avro.schema.literal"

### What changes were proposed in this pull request?

Before this PR, for a partitioned Avro Hive table, when the SerDe was configured to read the partition data, the table-level properties were overwritten by the partition-level properties.

This PR changes this ordering by giving table-level properties higher precedence. Thus, when a new evolved schema is set for the table, the new schema is used to read the partition data, not the original schema that was used to write the data.
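
The change in precedence amounts to swapping the merge order of two property maps. The sketch below is a minimal illustration and not the code in `TableReader.scala`; the function names and the placeholder property values are hypothetical.

```python
def serde_properties_before(table_props, partition_props):
    # Old behavior: partition-level values overwrite table-level ones.
    return {**table_props, **partition_props}


def serde_properties_after(table_props, partition_props):
    # New behavior: table-level values take precedence.
    return {**partition_props, **table_props}


# Placeholder values; a real "avro.schema.literal" holds an Avro schema in JSON.
table_props = {"avro.schema.literal": "evolved-schema"}
partition_props = {"avro.schema.literal": "original-schema", "partition.only": "kept"}

print(serde_properties_before(table_props, partition_props)["avro.schema.literal"])
print(serde_properties_after(table_props, partition_props)["avro.schema.literal"])
```

Properties that exist only at the partition level are still passed through in both orderings.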

This new behavior is consistent with Apache Hive.
See the example used in the unit test `SPARK-26836: support Avro schema evolution`, in Hive this results in:

```
0: jdbc:hive2://<IP>:10000> select * from t;
INFO  : Compiling command(queryId=hive_20210111141102_7a6349d0-f9ed-4aad-ac07-b94b44de2394): select * from t
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:t.col1, type:string, comment:null), FieldSchema(name:t.col2, type:string, comment:null), FieldSchema(name:t.ds, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20210111141102_7a6349d0-f9ed-4aad-ac07-b94b44de2394); Time taken: 0.098 seconds
INFO  : Executing command(queryId=hive_20210111141102_7a6349d0-f9ed-4aad-ac07-b94b44de2394): select * from t
INFO  : Completed executing command(queryId=hive_20210111141102_7a6349d0-f9ed-4aad-ac07-b94b44de2394); Time taken: 0.013 seconds
INFO  : OK
+---------------+-------------+-------------+
|    t.col1     |   t.col2    |    t.ds     |
+---------------+-------------+-------------+
| col1_default  | col2_value  | 1981-01-07  |
| col1_value    | col2_value  | 1983-04-27  |
+---------------+-------------+-------------+
2 rows selected (0.159 seconds)
```

### Why are the changes needed?

Without this change the old schema would be used. This can cause a correctness issue when the new schema introduces a new field with a default value (following the rules of schema evolution) before an existing field.
In this case the rows coming from the partition where the old schema was used will **contain values in wrong column positions**.

For example check the attached unit test `SPARK-26836: support Avro schema evolution`

Without this fix the result of the select on the table would be:

```
+----------+----------+----------+
|      col1|      col2|        ds|
+----------+----------+----------+
|col2_value|      null|1981-01-07|
|col1_value|col2_value|1983-04-27|
+----------+----------+----------+

```

With this fix:

```
+------------+----------+----------+
|        col1|      col2|        ds|
+------------+----------+----------+
|col1_default|col2_value|1981-01-07|
|  col1_value|col2_value|1983-04-27|
+------------+----------+----------+
```
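
The two results above can be reproduced with a toy model of the read path. It assumes the old partition was written with a schema containing only `col2`, which matches the output shown but is not stated explicitly; the function names and the list-based schema representation are hypothetical and much simpler than Avro's real schema resolution.

```python
# Schemas as lists of (field name, default value) pairs.
OLD_SCHEMA = [("col2", None)]                             # assumed writer schema
NEW_SCHEMA = [("col1", "col1_default"), ("col2", None)]  # evolved table schema


def read_positionally(values, reader_schema):
    """Without the fix: stored values land in reader columns by position."""
    padded = list(values) + [None] * (len(reader_schema) - len(values))
    return {name: padded[i] for i, (name, _) in enumerate(reader_schema)}


def read_with_resolution(values, writer_schema, reader_schema):
    """With the fix: match fields by name, fill missing ones from defaults."""
    written = {name: value for (name, _), value in zip(writer_schema, values)}
    return {name: written.get(name, default) for name, default in reader_schema}


old_row = ["col2_value"]  # row stored in the 1981-01-07 partition
print(read_positionally(old_row, NEW_SCHEMA))
print(read_with_resolution(old_row, OLD_SCHEMA, NEW_SCHEMA))
```

The positional read puts `col2_value` into `col1` and leaves `col2` empty, while the name-based read yields `col1_default` and `col2_value`, as in the two tables.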

### Does this PR introduce _any_ user-facing change?

Just fixes the value errors.
When a new column is introduced, even in the last position, the given default is used instead of 'null'.

### How was this patch tested?

This was tested with the unit test included in the PR,
and manually on Apache Spark / Hive.

Closes #31133 from attilapiros/SPARK-26836.

Authored-by: “attilapiros” <piros.attila.zsolt@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: e614f34)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)