Changes

Summary

  1. [SPARK-35963][SQL] Rename TimestampWithoutTZType to TimestampNTZType (commit: f249277)
  2. [SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans (commit: f281736)
  3. [SPARK-35685][SQL] Prompt recreating the view when there is an incompatible schema issue (commit: 0c34b96)
  4. [SPARK-35965][DOCS] Add doc for ORC nested column vectorized reader (commit: 3c3193c)
  5. [SPARK-35966][SQL] Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings (commit: 6699f76)
  6. [SPARK-35686][SQL] Not allow using auto-generated alias when creating view (commit: 3c68343)
  7. [SPARK-35971][SQL] Rename the type name of TimestampNTZType as "timestamp_ntz" (commit: 3acc4b9)
  8. [SPARK-35969][K8S] Make the pod prefix more readable and tallied with K8S DNS Label Names (commit: 94c1e3c)
  9. [SPARK-35756][SQL] unionByName supports struct having same col names but different sequence (commit: ca12176)
  10. [SPARK-35975][SQL] New configuration `spark.sql.timestampType` for the default timestamp type (commit: a643076)
  11. [SPARK-35779][SQL] Dynamic filtering for Data Source V2 (commit: fceabe2)
  12. [SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations (commit: 95d9494)
  13. [SPARK-35897][SS] Support user defined initial state with flatMapGroupsWithState in Structured Streaming (commit: 47485a3)
  14. [SPARK-35955][SQL] Check for overflow in Average in ANSI mode (commit: 1fda011)
  15. [SPARK-35825][INFRA][FOLLOWUP] Increase it in build/mvn script (commit: 79a6e00)
  16. [SPARK-35785][SS] Cleanup support for RocksDB instance (commit: ca6acf0)
  17. [SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing (commit: 0c9c8ff)
  18. [SPARK-35981][PYTHON][TEST] Use check_exact=False to loosen the check precision (commit: 7769644)
  19. [SPARK-35992][BUILD] Upgrade ORC to 1.6.9 (commit: c55b9fd)
  20. [SPARK-35994][INFRA] Publish snapshot from branch-3.2 (commit: dcc4057)
  21. [SPARK-35785][SS][FOLLOWUP] Ignore concurrent update and cleanup test (commit: a6e00ee)
  22. [SPARK-35990][BUILD] Remove avro-sbt plugin dependency (commit: 6c4616b)
  23. [SPARK-35996][BUILD] Setting version to 3.3.0-SNAPSHOT (commit: f9f9568)
Commit f2492772baf1d00d802e704f84c22a9c410929e9 by wenchen
[SPARK-35963][SQL] Rename TimestampWithoutTZType to TimestampNTZType

### What changes were proposed in this pull request?

Rename TimestampWithoutTZType to TimestampNTZType

### Why are the changes needed?

The type name `TimestampWithoutTZType` is verbose. Rename it as `TimestampNTZType` (sketched below) so that
1. it is easier to read and type.
2. As we have the function to_timestamp_ntz, this makes the names consistent.
3. We will introduce a new SQL configuration `spark.sql.timestampType` for the default timestamp type. The configuration values can be "TIMESTAMP_NTZ" or "TIMESTAMP_LTZ" for simplicity.
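
A minimal sketch of the rename at a call site (the schema itself is a hypothetical example, not taken from this PR):

```scala
// This import path was org.apache.spark.sql.types.TimestampWithoutTZType before the rename.
import org.apache.spark.sql.types.{StructType, TimestampNTZType}

// A schema field using the renamed type.
val schema = new StructType().add("ts", TimestampNTZType)
```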

### Does this PR introduce _any_ user-facing change?

No, the new timestamp type is not released yet.

### How was this patch tested?

Ran `git grep -i WithoutTZ` and confirmed there are no results.
Also verified by CI tests.

Closes #33167 from gengliangwang/rename.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f249277)
The file was removed sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampWithoutTZType.scala
The file was modified sql/core/src/test/resources/sql-tests/inputs/datetime.sql
The file was modified sql/core/src/test/scala/org/apache/spark/sql/UDFSuite.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/RowEncoderSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/TimestampFormatter.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/SerializerBuildHelper.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SpecificInternalRow.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/DateExpressionsSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/SpecializedGettersReader.java
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/InterpretedUnsafeProjection.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala
The file was modified sql/core/src/test/resources/sql-functions/sql-expression-schema.md
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/CatalystTypeConvertersSuite.scala
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/types/DataTypes.java
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeSuite.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/types/DataTypeTestUtils.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AnsiCastSuiteBase.scala
The file was added sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala
Commit f281736fbd84ea18cb6a02d6fd9a31b9541a3a0b by wenchen
[SPARK-35618][SQL] Resolve star expressions in subqueries using outer query plans

### What changes were proposed in this pull request?
This PR supports resolving star expressions in subqueries using outer query plans.

### Why are the changes needed?
Currently, Spark can only resolve star expressions using the inner query plan when resolving subqueries. Instead, it should also be able to resolve star expressions using the outer query plans.
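
As a purely illustrative example (not taken from this PR), a star expression inside a lateral subquery that refers to the outer relation, and so must be resolved against the outer plan:

```scala
// Hypothetical repro: t1.* inside the lateral subquery names columns of the
// outer relation t1, so it has to be resolved using the outer query plan.
spark.sql("CREATE OR REPLACE TEMP VIEW t1 AS SELECT 1 AS c1, 2 AS c2")
spark.sql("SELECT * FROM t1, LATERAL (SELECT t1.*)").show()
```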

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Unit tests

Closes #32787 from allisonwang-db/spark-35618-resolve-star-in-subquery.

Lead-authored-by: allisonwang-db <allison.wang@databricks.com>
Co-authored-by: allisonwang-db <66282705+allisonwang-db@users.noreply.github.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f281736)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/ResolveSubquerySuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
The file was modified sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out
The file was modified sql/core/src/test/resources/sql-tests/results/columnresolution-negative.sql.out
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
The file was modified sql/core/src/test/resources/sql-tests/inputs/join-lateral.sql
Commit 0c34b9654187bae0638ed964efd974fc1888f46f by wenchen
[SPARK-35685][SQL] Prompt recreating the view when there is an incompatible schema issue

### What changes were proposed in this pull request?
If the user creates a view in 2.4 and reads it in 3.1/3.2, there will be an incompatible schema issue.
So this PR adds the view DDL to the error message, prompting the user to recreate the view to fix
the incompatibility.
For example:
```sql
-- create view in 2.4
CREATE TABLE IF NOT EXISTS t USING parquet AS SELECT '1' as a, '20210420' as b
CREATE OR REPLACE VIEW v AS SELECT CAST(t.a AS INT), to_date(t.b, 'yyyyMMdd') FROM t
-- select view in master
SELECT * FROM v
```
Then we will get below error:
```
cannot resolve '`to_date(spark_catalog.default.t.b, 'yyyyMMdd')`' given input columns: [a, to_date(b, yyyyMMdd)];
```

### Why are the changes needed?
Improve the error message

### Does this PR introduce _any_ user-facing change?
Yes, the error message will change

### How was this patch tested?
newly added test case

Closes #32831 from linhongliu-db/SPARK-35685-view-compatible.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 0c34b96)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala
Commit 3c3193c0fcee532ca13e33e84abf2bb9abe4f7a2 by gurwls223
[SPARK-35965][DOCS] Add doc for ORC nested column vectorized reader

### What changes were proposed in this pull request?

In https://issues.apache.org/jira/browse/SPARK-34862, we added support for ORC nested column vectorized reader, and it is disabled by default for now. So we would like to add the user-facing documentation for it, and user can opt-in to use it if they want.

### Why are the changes needed?

To make user be aware of the feature, and let them know the instruction to use the feature.

### Does this PR introduce _any_ user-facing change?

Yes, the documentation itself.

### How was this patch tested?

Manually check generated documentation as below.

<img width="1153" alt="Screen Shot 2021-07-01 at 12 19 40 AM" src="https://user-images.githubusercontent.com/4629931/124083422-b0724280-da02-11eb-93aa-a25d118ba56e.png">

<img width="1147" alt="Screen Shot 2021-07-01 at 12 19 52 AM" src="https://user-images.githubusercontent.com/4629931/124083442-b5cf8d00-da02-11eb-899f-827d55b8558d.png">

Closes #33168 from c21/orc-doc.

Authored-by: Cheng Su <chengsu@fb.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 3c3193c)
The file was modified docs/sql-data-sources-orc.md
Commit 6699f76fe2afa7f154b4ba424f3fe048fcee46df by yao
[SPARK-35966][SQL] Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings

### What changes were proposed in this pull request?

Port HIVE-17952: Fix license headers to avoid dangling javadoc warnings
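
The ported fix is mechanical. Roughly (an illustration, not a quote from the diff), license headers written as doc comments become plain block comments, so the compiler no longer warns about documentation that is not attached to any declaration:

```
/** Licensed to the Apache Software Foundation (ASF) ... */   // before: doc comment, "dangling"
/*  Licensed to the Apache Software Foundation (ASF) ... */   // after: plain block comment
```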

### Why are the changes needed?
Fix license headers

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass the RAT check.

Closes #33169 from yaooqinn/SPARK-35966.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
(commit: 6699f76)
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/ColumnValue.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/PlainSaslHelper.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/PasswdAuthenticationProvider.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/HandleIdentifier.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/HiveSQLException.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/TypeQualifiers.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/TableTypeMappingFactory.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/ThreadFactoryWithGarbageCleanup.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/SaslQOP.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/GetInfoType.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/OperationStatus.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/PlainSaslServer.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/HiveTableTypeMapping.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/SessionManager.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/ServiceException.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/AuthenticationProviderFactory.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImplwithUGI.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/OperationState.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpServlet.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/RowSet.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/Service.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/FetchOrientation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/OperationType.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionHookContext.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/TableTypeMapping.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/MetadataOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/FetchType.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/CustomAuthenticationProviderImpl.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/ColumnDescriptor.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/Handle.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/CLIService.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/CLIServiceUtils.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/Operation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/SQLOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSetIpAddressProcessor.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/ThreadWithGarbageCleanup.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftCLIServiceClient.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/ServiceUtils.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/ClassicTableTypeMapping.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/RowBasedSet.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/ICLIService.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/server/HiveServer2.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/OperationHandle.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetCrossReferenceOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/ServiceStateChangeListener.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/KerberosSaslHelper.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/ServiceOperations.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/BreakableService.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftBinaryCLIService.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/PamAuthenticationProviderImpl.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/AnonymousAuthenticationProviderImpl.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/TypeDescriptor.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/SessionHandle.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/thrift/ThriftHttpCLIService.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/LdapAuthenticationProviderImpl.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/GetInfoValue.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/FilterService.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/AbstractService.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSession.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HiveAuthFactory.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/RowSetFactory.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionProxy.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionHookContextImpl.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetPrimaryKeysOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/ExecuteStatementOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/CLIServiceClient.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionBase.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/CookieSigner.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/CompositeService.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthenticationException.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/ColumnBasedSet.java
The file was modified sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/TableSchema.java
Commit 3c683434fa3f041000af363fdc6bdaddf4e1fb2a by wenchen
[SPARK-35686][SQL] Not allow using auto-generated alias when creating view

### What changes were proposed in this pull request?
As described in #32831, Spark has compatibility issues when querying a view created by an
older version. The root cause is that Spark changed the auto-generated alias name. To avoid
this in the future, we could ask the user to specify explicit column names when creating
a view.

### Why are the changes needed?
Avoid compatibility issues when querying a view

### Does this PR introduce _any_ user-facing change?
Yes. Users will get an error when running the query below after this change:
```
CREATE OR REPLACE VIEW v AS SELECT CAST(t.a AS INT), to_date(t.b, 'yyyyMMdd') FROM t
```
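
For comparison, a sketch of the form that remains allowed (the alias names `a_int` and `b_date` are made up for illustration):

```scala
// Hypothetical fix: explicit column aliases, so no auto-generated alias is needed.
spark.sql("""
  CREATE OR REPLACE VIEW v AS
  SELECT CAST(t.a AS INT) AS a_int, to_date(t.b, 'yyyyMMdd') AS b_date FROM t
""")
```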

### How was this patch tested?
Not yet.

Closes #32832 from linhongliu-db/SPARK-35686-no-auto-alias.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 3c68343)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala
The file was modified docs/sql-migration-guide.md
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala
Commit 3acc4b973b57f88fbe681c7db89cd55699750178 by gengliang
[SPARK-35971][SQL] Rename the type name of TimestampNTZType as "timestamp_ntz"

### What changes were proposed in this pull request?

Rename the type name string of TimestampNTZType from "timestamp without time zone" to "timestamp_ntz".

### Why are the changes needed?

This is to make the column header shorter and simpler.
Snowflake and Flink use a similar approach:
https://docs.snowflake.com/en/sql-reference/data-types-datetime.html
https://ci.apache.org/projects/flink/flink-docs-master/docs/dev/table/concepts/timezone/
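
For example (an illustrative check, assuming the `to_timestamp_ntz` function mentioned in SPARK-35963), the printed schema now shows the shorter name:

```scala
spark.sql("SELECT to_timestamp_ntz('2021-07-01 00:00:00') AS ts").printSchema()
// root
//  |-- ts: timestamp_ntz (nullable = true)
```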

### Does this PR introduce _any_ user-facing change?

No, the new timestamp type is not released yet.

### How was this patch tested?

Unit tests

Closes #33173 from gengliangwang/reviseTypeName.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: 3acc4b9)
The file was modified sql/core/src/test/resources/sql-functions/sql-expression-schema.md
The file was modified sql/core/src/test/resources/sql-tests/results/datetime.sql.out
The file was modified sql/core/src/test/resources/sql-tests/results/ansi/datetime.sql.out
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuiteBase.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/TimestampNTZType.scala
The file was modified sql/core/src/test/resources/sql-tests/results/datetime-legacy.sql.out
Commit 94c1e3c38cfcc7444aad6efa38a7d2c3ed9f9f4a by dongjoon
[SPARK-35969][K8S] Make the pod prefix more readable and tallied with K8S DNS Label Names

### What changes were proposed in this pull request?

By default, the executor pod prefix is generated from the app name. Characters that match [^a-z0-9\\-] are handled inconsistently: '.' and all whitespace are converted to '-', while all other special characters become the empty string. This is unfortunate, since characters like '_' and '|' are commonly used as word separators in many languages.

According to the K8S DNS Label Names, see https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#dns-label-names, we can convert all special characters to `-`.

For example,

```
scala> "xyz_abc_i_am_a_app_name_w/_some_abbrs".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-")
res11: String = xyz-abc-i-am-a-app-name-w-some-abbrs

scala> "xyz_abc_i_am_a_app_name_w/_some_abbrs".replaceAll("\\s+", "-").replaceAll("\\.", "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-")
res12: String = xyzabciamaappnamewsomeabbrs
```

```scala
scala> "time.is%the¥most$valuable_——————thing,it's about time.".replaceAll("[^a-z0-9\\-]", "-").replaceAll("-+", "-")
res9: String = time-is-the-most-valuable-thing-it-s-about-time-

scala> "time.is%the¥most$valuable_——————thing,it's about time.".replaceAll("\\s+", "-").replaceAll("\\.", "-").replaceAll("[^a-z0-9\\-]", "").replaceAll("-+", "-")
res10: String = time-isthemostvaluablethingits-about-time-

```

### Why are the changes needed?

For better UX

### Does this PR introduce _any_ user-facing change?

Yes, the executor pod name might look better.

### How was this patch tested?

Added new test cases.

Closes #33171 from yaooqinn/SPARK-35969.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 94c1e3c)
The file was modified resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
The file was modified resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStepSuite.scala
Commit ca1217667c2bae5c37cbc8ab3899e653a9c4ed0d by wenchen
[SPARK-35756][SQL] unionByName supports struct having same col names but different sequence

### What changes were proposed in this pull request?

unionByName does not support structs that have the same column names but in a different order:
```
// Hypothetical definitions: same field names, different order.
case class Struct1(c1: Int, c2: Int)
case class Struct2(c2: Int, c1: Int)

val df1 = Seq((1, Struct1(1, 2))).toDF("a", "b")
val df2 = Seq((1, Struct2(1, 2))).toDF("a", "b")
val unionDF = df1.unionByName(df2)
```
It gives the following exception:

`org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types. struct<c2:int,c1:int> <> struct<c1:int,c2:int> at the second column of the second table; 'Union false, false :- LocalRelation [_1#38, _2#39] +- LocalRelation _1#45, _2#46`

In this case the column names are the same, so unionByName should look inside the struct: if the column names match, it should not throw this exception and should perform the union.

After the fix, we get this result:

```
val unionDF = df1.unionByName(df2)
scala>  unionDF.show
+---+------+
|  a|     b|
+---+------+
|  1|{1, 2}|
|  1|{2, 1}|
+---+------+

```

### Why are the changes needed?
unionByName performs the union based on column names. For structs, the scenario where all the column names are the same but their order differs was not handled, so this adds that functionality.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added a unit test and also tested through the spark shell.

Closes #32972 from SaurabhChawla100/SPARK-35756.

Authored-by: SaurabhChawla <s.saurabhtim@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: ca12176)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveUnion.scala
Commit a643076d4ef622eac505ebf22c9aa2cc909320ac by max.gekk
[SPARK-35975][SQL] New configuration `spark.sql.timestampType` for the default timestamp type

### What changes were proposed in this pull request?

Add a new configuration `spark.sql.timestampType`, which configures the default timestamp type of Spark SQL, including SQL DDL and the Cast clause. Setting the configuration to `TIMESTAMP_NTZ` will use `TIMESTAMP WITHOUT TIME ZONE` as the default type, while setting it to `TIMESTAMP_LTZ` will use `TIMESTAMP WITH LOCAL TIME ZONE`.

The default value of the new configuration is TIMESTAMP_LTZ, which is consistent with previous Spark releases.
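
A minimal sketch of using the configuration as described above (illustrative, not taken from the PR's tests):

```scala
// With the default timestamp type set to TIMESTAMP_NTZ, a plain TIMESTAMP in
// DDL or CAST resolves to TIMESTAMP WITHOUT TIME ZONE.
spark.conf.set("spark.sql.timestampType", "TIMESTAMP_NTZ")
spark.sql("SELECT CAST('2021-07-01 00:00:00' AS TIMESTAMP) AS ts").printSchema()
```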

### Why are the changes needed?

A new configuration for switching the default timestamp type to timestamp without time zone.

### Does this PR introduce _any_ user-facing change?

No, it's a new feature.

### How was this patch tested?

Unit test

Closes #33176 from gengliangwang/newTsTypeConf.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(commit: a643076)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala
Commit fceabe2372ab2a53401059e6019f441d0580aeab by viirya
[SPARK-35779][SQL] Dynamic filtering for Data Source V2

### What changes were proposed in this pull request?

This PR implemented the proposal per [design doc](https://docs.google.com/document/d/1RfFn2e9o_1uHJ8jFGsSakp-BZMizX1uRrJSybMe2a6M) for SPARK-35779.

### Why are the changes needed?

Spark supports dynamic partition filtering that enables reusing parts of the query to skip unnecessary partitions in the larger table during joins. This optimization has proven to be beneficial for star-schema queries which are common in the industry. Unfortunately, dynamic pruning is currently limited to partition pruning during joins and is only supported for built-in v1 sources. As more and more Spark users migrate to Data Source V2, it is important to generalize dynamic filtering and expose it to all v2 connectors.

Please, see the design doc for more information on this effort.

### Does this PR introduce _any_ user-facing change?

Yes, this PR adds a new optional mix-in interface for `Scan` in Data Source V2.
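
A rough sketch of what implementing the mix-in could look like (the method names follow the added `SupportsRuntimeFiltering.java`; the connector class itself is hypothetical):

```scala
import org.apache.spark.sql.connector.expressions.{Expressions, NamedReference}
import org.apache.spark.sql.connector.read.{Scan, SupportsRuntimeFiltering}
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType

class MyScan extends Scan with SupportsRuntimeFiltering {
  override def readSchema(): StructType = new StructType()

  // Attributes Spark may filter dynamically at runtime, e.g. a partition column.
  override def filterAttributes(): Array[NamedReference] =
    Array(Expressions.column("part_col"))

  // Called with the runtime filters; a real connector would prune its
  // planned partitions/files here before reporting input partitions.
  override def filter(filters: Array[Filter]): Unit = {}
}
```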

### How was this patch tested?

This PR comes with tests.

Closes #32921 from aokolnychyi/dynamic-filtering-wip.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: fceabe2)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/CleanupDynamicPruningFilters.scala
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsRuntimeFiltering.java
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/dynamicpruning/PartitionPruning.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcV2SchemaPruningSuite.scala
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/Scan.java
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/BatchScanExec.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/InMemoryTable.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroRowReaderSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
Commit 95d94948c58593c72bf4add58624b0e8b305953a by ueshin
[SPARK-35339][PYTHON] Improve unit tests for data-type-based basic operations

### What changes were proposed in this pull request?

Improve unit tests for data-type-based basic operations by:
- removing redundant test cases
- adding `astype` test for ExtensionDtypes

### Why are the changes needed?

Some test cases for basic operations became redundant after introducing data-type-based basic operations, so this PR removes them.
`astype` is not tested for ExtensionDtypes, which is addressed in this PR as well.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit tests.

Closes #33095 from xinrong-databricks/datatypeops_test.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Takuya UESHIN <ueshin@databricks.com>
(commit: 95d9494)
The file was modified python/pyspark/pandas/tests/data_type_ops/test_boolean_ops.py
The file was modified python/pyspark/pandas/tests/test_series.py
The file was modified python/pyspark/pandas/tests/data_type_ops/test_string_ops.py
The file was modified python/pyspark/pandas/tests/data_type_ops/testing_utils.py
The file was modified python/pyspark/pandas/tests/data_type_ops/test_num_ops.py
Commit 47485a3c2df3201c838b939e82d5b26332e2d858 by gengliang
[SPARK-35897][SS] Support user defined initial state with flatMapGroupsWithState in Structured Streaming

### What changes were proposed in this pull request?
This PR aims to add support for specifying a user-defined initial state for arbitrary stateful processing in Structured Streaming using the [flat]MapGroupsWithState operator.

### Why are the changes needed?
Users can load the previous state of their stateful processing as an initial state instead of redoing the entire processing from scratch.

### Does this PR introduce _any_ user-facing change?

Yes, this PR introduces new APIs:
```
  def mapGroupsWithState[S: Encoder, U: Encoder](
      timeoutConf: GroupStateTimeout,
      initialState: KeyValueGroupedDataset[K, S])(
      func: (K, Iterator[V], GroupState[S]) => U): Dataset[U]

  def flatMapGroupsWithState[S: Encoder, U: Encoder](
      outputMode: OutputMode,
      timeoutConf: GroupStateTimeout,
      initialState: KeyValueGroupedDataset[K, S])(
      func: (K, Iterator[V], GroupState[S]) => Iterator[U]): Dataset[U]

```
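
A minimal usage sketch of the new overload (everything here, including the `Event` class and the rate-source input, is a hypothetical stand-in):

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}
import spark.implicits._

case class Event(userId: String)

// Previously saved per-user counts, loaded as the user-defined initial state.
val initialCounts = Seq(("user1", 5L), ("user2", 3L)).toDS()
  .groupByKey(_._1).mapValues(_._2)

val events = spark.readStream.format("rate").load()
  .select($"value".cast("string").as("userId")).as[Event]

// Each group starts counting from its initial state instead of zero.
val counts = events.groupByKey(_.userId)
  .mapGroupsWithState(GroupStateTimeout.NoTimeout, initialCounts) {
    (user: String, rows: Iterator[Event], state: GroupState[Long]) =>
      val total = state.getOption.getOrElse(0L) + rows.size
      state.update(total)
      (user, total)
  }
```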

### How was this patch tested?

Through unit tests in FlatMapGroupsWithStateSuite

Closes #33093 from rahulsmahadev/flatMapGroupsWithState.

Authored-by: Rahul Mahadev <rahul.mahadev@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: 47485a3)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/IncrementalExecution.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/streaming/GroupState.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala
The file was modified sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java
Commit 1fda011d71d970996680fdef4a109805f9d3d385 by gengliang
[SPARK-35955][SQL] Check for overflow in Average in ANSI mode

### What changes were proposed in this pull request?

Fixes decimal overflow issues for decimal average in ANSI mode, so that overflows throw an exception rather than returning null.

### Why are the changes needed?

Query:

```
scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._

scala> spark.conf.set("spark.sql.ansi.enabled", true)

scala> val df = Seq(
     |  (BigDecimal("10000000000000000000"), 1),
     |  (BigDecimal("10000000000000000000"), 1),
     |  (BigDecimal("10000000000000000000"), 2),
     |  (BigDecimal("10000000000000000000"), 2),
     |  (BigDecimal("10000000000000000000"), 2),
     |  (BigDecimal("10000000000000000000"), 2),
     |  (BigDecimal("10000000000000000000"), 2),
     |  (BigDecimal("10000000000000000000"), 2),
     |  (BigDecimal("10000000000000000000"), 2),
     |  (BigDecimal("10000000000000000000"), 2),
     |  (BigDecimal("10000000000000000000"), 2),
     |  (BigDecimal("10000000000000000000"), 2)).toDF("decNum", "intNum")
df: org.apache.spark.sql.DataFrame = [decNum: decimal(38,18), intNum: int]

scala> val df2 = df.withColumnRenamed("decNum", "decNum2").join(df, "intNum").agg(mean("decNum"))
df2: org.apache.spark.sql.DataFrame = [avg(decNum): decimal(38,22)]

scala> df2.show(40,false)
```

Before:
```
+-----------+
|avg(decNum)|
+-----------+
|null       |
+-----------+
```

After:
```
21/07/01 19:48:31 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 24)
java.lang.ArithmeticException: Overflow in sum of decimals.
at org.apache.spark.sql.errors.QueryExecutionErrors$.overflowInSumOfDecimalError(QueryExecutionErrors.scala:162)
at org.apache.spark.sql.errors.QueryExecutionErrors.overflowInSumOfDecimalError(QueryExecutionErrors.scala)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:759)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:898)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:898)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:499)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:502)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
```

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test

Closes #33177 from karenfeng/SPARK-35955.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
(commit: 1fda011)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
Commit 79a6e00b7621bb3ebd99ff692452d292e11fb377 by dongjoon
[SPARK-35825][INFRA][FOLLOWUP] Increase it in build/mvn script

### What changes were proposed in this pull request?

This is a follow up of https://github.com/apache/spark/pull/32961.

This PR additionally sets the stack size in `build/mvn`.

### Why are the changes needed?

We are still hitting `StackOverflowError` in Jenkins.

- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2/3064/console
```
[INFO] compiling 166 Scala sources and 19 Java sources to /home/jenkins/workspace/spark-master-test-maven-hadoop-3.2/sql/catalyst/target/scala-2.12/classes ...
[ERROR] ## Exception when compiling 480 sources to /home/jenkins/workspace/spark-master-test-maven-hadoop-3.2/sql/catalyst/target/scala-2.12/classes
java.lang.StackOverflowError
```

This PR increases the stack size of the `mvn` JVM itself instead of the plugin's.

```
$ MAVEN_OPTS="-XX:+PrintFlagsFinal" build/mvn clean | grep 'intx ThreadStackSize'
     intx ThreadStackSize                           = 2048                                {pd product}

$ MAVEN_OPTS="-Xss128m -XX:+PrintFlagsFinal" build/mvn clean | grep 'intx ThreadStackSize'
     intx ThreadStackSize                          := 131072                              {pd product}
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #33180 from dongjoon-hyun/SPARK-35825.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 79a6e00)
The file was modified build/mvn
Commit ca6acf0839baaa40a1417e7dca0cc1a22de06bb2 by viirya
[SPARK-35785][SS] Cleanup support for RocksDB instance

### What changes were proposed in this pull request?
Add the functionality of cleaning up old-version files for the RocksDB instance and RocksDBFileManager.

### Why are the changes needed?
Part of the implementation of RocksDB state store.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
New UT added.

Closes #32933 from xuanyuanking/SPARK-35785.

Authored-by: Yuanjian Li <yuanjian.li@databricks.com>
Signed-off-by: Liang-Chi Hsieh <viirya@gmail.com>
(commit: ca6acf0)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBFileManager.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
Commit 0c9c8ff56933e6ae13454845e831746360af84e3 by wenchen
[SPARK-35968][SQL] Make sure partitions are not too small in AQE partition coalescing

### What changes were proposed in this pull request?

By default, AQE will set `COALESCE_PARTITIONS_MIN_PARTITION_NUM` to the spark default parallelism, which is usually quite big. This is to keep the parallelism on par with non-AQE, to avoid perf regressions.

However, this usually leads to many small/empty partitions, and hurts performance (although not worse than non-AQE). Users usually blindly set `COALESCE_PARTITIONS_MIN_PARTITION_NUM` to 1, which makes this config quite useless.

This PR adds a new config to set the min partition size, to avoid too small partitions after coalescing. By default, Spark will not respect the target size, and only respect this min partition size, to maximize the parallelism and avoid perf regression in AQE. This PR also adds a bool config to respect the target size when coalescing partitions, and it's recommended to set it to get better overall performance. This PR also deprecates the `COALESCE_PARTITIONS_MIN_PARTITION_NUM` config.
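
A sketch of the recommended setup (the config names below are assumed to be the ones this PR adds; verify them against `SQLConf`):

```scala
// Respect the target size when coalescing, as the PR recommends for better
// overall performance...
spark.conf.set("spark.sql.adaptive.coalescePartitions.parallelismFirst", "false")
// ...and avoid producing partitions smaller than this after coalescing.
spark.conf.set("spark.sql.adaptive.coalescePartitions.minPartitionSize", "1MB")
```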

### Why are the changes needed?

AQE is default on now, we should make the perf better in the default case.

### Does this PR introduce _any_ user-facing change?

Yes, a new config.

### How was this patch tested?

new tests

Closes #33172 from cloud-fan/aqe2.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 0c9c8ff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/groupingsets.sql.out
The file was modified python/pyspark/sql/dataframe.py
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/ShufflePartitionsUtilSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
The file was modified R/pkg/tests/fulltests/test_sparkSQL.R
The file was modified sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-order-by.sql
The file was modified python/pyspark/sql/tests/test_dataframe.py
The file was modified sql/core/src/test/resources/sql-tests/inputs/postgreSQL/groupingsets.sql
The file was modified sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-order-by.sql.out
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ShufflePartitionsUtil.scala
Commit 77696448db5975343cf609b51f42702b49d7b08b by gurwls223
[SPARK-35981][PYTHON][TEST] Use check_exact=False to loosen the check precision

### What changes were proposed in this pull request?

We should use `check_exact=False` because the value check in `StatsTest.test_cov_corr_meta` is too strict.

### Why are the changes needed?

In some environments, the precision of pandas' `DataFrame.corr` can differ, and the test `StatsTest.test_cov_corr_meta` fails.

```
AssertionError: DataFrame.iloc[:, 0] (column name="a") are different
DataFrame.iloc[:, 0] (column name="a") values are different (14.28571 %)
[index]: [a, b, c, d, e, f, g]
[left]:  [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0]
[right]: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 4.807406715958909e-17]
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Modified tests should still pass.

Closes #33179 from ueshin/issuse/SPARK-35981/corr.

Authored-by: Takuya UESHIN <ueshin@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: 7769644)
The file was modified python/pyspark/pandas/tests/test_stats.py
Commit c55b9fd1e014fac979b1e42f5a880e7b63286a54 by dongjoon
[SPARK-35992][BUILD] Upgrade ORC to 1.6.9

### What changes were proposed in this pull request?

This PR aims to upgrade Apache ORC to 1.6.9.

### Why are the changes needed?

This is required to bring in ORC-804 in order to fix an ORC encryption masking bug.

### Does this PR introduce _any_ user-facing change?

No. This is not released yet.

### How was this patch tested?

Pass the newly added test case.

Closes #33189 from dongjoon-hyun/SPARK-35992.

Lead-authored-by: Dongjoon Hyun <dongjoon@apache.org>
Co-authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: c55b9fd)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcEncryptionSuite.scala
The file was modified pom.xml
Commit dcc405743ec6898470d77a4b7b989948e65c01cd by dongjoon
[SPARK-35994][INFRA] Publish snapshot from branch-3.2

### What changes were proposed in this pull request?

This PR aims to publish snapshot artifacts from branch-3.2 additionally.

### Why are the changes needed?

`GitHub Action`'s cronjob feature is only supported in the default branch. So, to have a daily job for branch-3.2, we should add it here.

Currently, it's publishing master and 3.1.
- https://github.com/apache/spark/actions/workflows/publish_snapshot.yml

<img width="273" alt="Screen Shot 2021-07-02 at 10 22 41 AM" src="https://user-images.githubusercontent.com/9700541/124309380-7c407400-db1f-11eb-9aa4-30db61a72b80.png">

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #33192 from dongjoon-hyun/SPARK-35994.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: dcc4057)
The file was modified .github/workflows/publish_snapshot.yml
Commit a6e00ee9d7e4bd213d91cc48b6ebe73a57ded24f by dongjoon
[SPARK-35785][SS][FOLLOWUP] Ignore concurrent update and cleanup test

### What changes were proposed in this pull request?

This patch ignores the test "ensure that concurrent update and cleanup consistent versions" in #32933. The test is currently flaky and we will address it later.

### Why are the changes needed?

Unblock other developments.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests.

Closes #33195 from viirya/ignore-rocksdb-test.

Authored-by: Liang-Chi Hsieh <viirya@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: a6e00ee)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
Commit 6c4616b2ac75f30db63c49ab91a784285b147b11 by dongjoon
[SPARK-35990][BUILD] Remove avro-sbt plugin dependency

### What changes were proposed in this pull request?

This PR removes the sbt-avro plugin dependency.
In the current master, the SBT build depends on the plugin, but it seems to be never used.
Originally, the plugin was introduced for `flume-sink` in SPARK-1729 (#807), but `flume-sink` is no longer in the Spark repository.

After SBT was upgraded to 1.x in SPARK-21708 (#29286), the `avroGenerate` part was introduced in `object SQL` in `SparkBuild.scala`.
It's confusing, but I understand `Test / avroGenerate := (Compile / avroGenerate).value` is there to suppress sbt-avro for the `sql` sub-module.
In fact, Test/compile will fail if `Test / avroGenerate := (Compile / avroGenerate).value` is commented out.

`sql` sub-module contains `parquet-compat.avpr` and `parquet-compat.avdl` but according to `sql/core/src/test/README.md`, they are intended to be handled by `gen-avro.sh`.

Also, in terms of Maven build, there seems to be no definition to handle `*.avpr` or `*.avdl`.

Based on the above, I think we can remove `sbt-avro`.

### Why are the changes needed?

If `sbt-avro` is really no longer used, it's confusing that `sbt-avro`-related configurations remain in `SparkBuild.scala`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

GA.

Closes #33190 from sarutak/remove-avro-from-sbt.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 6c4616b)
The file was modified project/SparkBuild.scala
The file was modified project/plugins.sbt
Commit f9f95686cb397271f55aaff29ec4352b4ef9aade by dongjoon
[SPARK-35996][BUILD] Setting version to 3.3.0-SNAPSHOT

### What changes were proposed in this pull request?

This PR aims to update `master` branch version to 3.3.0-SNAPSHOT.

### Why are the changes needed?

Start to prepare Apache Spark 3.3.0 and the published snapshot version should not conflict with `branch-3.2`.

### Does this PR introduce _any_ user-facing change?

N/A.

### How was this patch tested?

Pass the CIs.

Closes #33196 from dongjoon-hyun/SPARK-35996.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: f9f9568)
The file was modified common/sketch/pom.xml
The file was modified tools/pom.xml
The file was modified R/pkg/DESCRIPTION
The file was modified assembly/pom.xml
The file was modified external/kafka-0-10-sql/pom.xml
The file was modified common/kvstore/pom.xml
The file was modified common/network-yarn/pom.xml
The file was modified pom.xml
The file was modified common/network-shuffle/pom.xml
The file was modified streaming/pom.xml
The file was modified graphx/pom.xml
The file was modified external/kafka-0-10/pom.xml
The file was modified hadoop-cloud/pom.xml
The file was modified resource-managers/kubernetes/integration-tests/pom.xml
The file was modified resource-managers/kubernetes/core/pom.xml
The file was modified sql/hive/pom.xml
The file was modified external/avro/pom.xml
The file was modified repl/pom.xml
The file was modified core/pom.xml
The file was modified project/MimaExcludes.scala
The file was modified resource-managers/yarn/pom.xml
The file was modified external/kafka-0-10-assembly/pom.xml
The file was modified python/pyspark/version.py
The file was modified launcher/pom.xml
The file was modified common/network-common/pom.xml
The file was modified external/kinesis-asl-assembly/pom.xml
The file was modified mllib-local/pom.xml
The file was modified sql/hive-thriftserver/pom.xml
The file was modified docs/_config.yml
The file was modified external/kafka-0-10-token-provider/pom.xml
The file was modified external/kinesis-asl/pom.xml
The file was modified sql/core/pom.xml
The file was modified common/tags/pom.xml
The file was modified common/unsafe/pom.xml
The file was modified examples/pom.xml
The file was modified sql/catalyst/pom.xml
The file was modified mllib/pom.xml
The file was modified external/docker-integration-tests/pom.xml
The file was modified external/spark-ganglia-lgpl/pom.xml
The file was modified resource-managers/mesos/pom.xml