SuccessChanges

Summary

  1. [SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE syntax (commit: 58be82ad4b98fc17e821e916e69e77a6aa36209d) (details)
  2. [SPARK-30152][INFRA] Enable Hadoop-2.7/JDK11 build at GitHub Action (commit: 81996f9e4d8a17c3475a33af0c9c3d32cd70865f) (details)
  3. [SPARK-30155][SQL] Rename parse() to parseString() to avoid conflict in Scala 2.13 (commit: a30ec19a7358f18849944ecfab1d2b14e733614c) (details)
  4. [SPARK-30148][SQL] Optimize writing plans if there is an analysis exception (commit: 51aa7a920ec097ed2a797687de8382e21691f18c) (details)
  5. [SPARK-30157][BUILD][TEST-HADOOP3.2][TEST-JAVA11] Upgrade Apache HttpCore from 4.4.10 to 4.4.12 (commit: 1e0037b5e9ff077bdb59ad4536b7e5081a963089) (details)
  6. [SPARK-30156][BUILD] Upgrade Jersey from 2.29 to 2.29.1 (commit: afc4fa02bd2b7eb835e5c5dcbe0cbd1303910b42) (details)
  7. [SPARK-30147][SQL] Trim the string when cast string type to booleans (commit: e88d74052bf40eabab9e3388fa09e52097ffa3aa) (details)
  8. [SPARK-30163][INFRA] Use Google Maven mirror in GitHub Action (commit: 1068b8b24910eec8122bf7fa4748a101becf0d2b) (details)
  9. [SPARK-30163][INFRA][FOLLOWUP] Make `.m2` directory for cold start without cache (commit: 16f1b23d75c0b44aac61111bfb2ae9bb0f3fab68) (details)
Commit 58be82ad4b98fc17e821e916e69e77a6aa36209d by wenchen
[SPARK-30098][SQL] Use default datasource as provider for CREATE TABLE
syntax
### What changes were proposed in this pull request?
In this PR, we propose to use the value of `spark.sql.sources.default` as
the provider for the `CREATE TABLE` syntax, instead of `hive`, in Spark 3.0.
To help with migration, we introduce a legacy conf
`spark.sql.legacy.respectHiveDefaultProvider.enabled` and set its
default to `false`.
### Why are the changes needed?
1. Currently, the `CREATE TABLE` syntax uses the hive provider to create
tables, while the `DataFrameWriter.saveAsTable` API uses the value of
`spark.sql.sources.default` as the provider. It would be better to make
them consistent.
2. Users may get confused in some cases. For example:
```
CREATE TABLE t1 (c1 INT) USING PARQUET;
CREATE TABLE t2 (c1 INT);
```
In these two DDLs, users may think that `t2` should also use parquet as
the default provider, since Spark always advertises parquet as the default
format. However, in this case it is hive.
On the other hand, if we omit the `USING` clause in a CTAS statement, we
do pick parquet by default if `spark.sql.hive.convertCTAS=true`:
```
CREATE TABLE t3 USING PARQUET AS SELECT 1 AS VALUE;
CREATE TABLE t4 AS SELECT 1 AS VALUE;
```
These two cases together can be really confusing.
3. Spark SQL is now independent and popular in its own right; we do not
need to be fully consistent with Hive's behavior.
### Does this PR introduce any user-facing change?
Yes. Before this PR, the `CREATE TABLE` syntax used the hive provider;
now it uses the value of `spark.sql.sources.default` as its provider.
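Below is a hedged sketch (not from the PR itself) of observing the new behavior; the local session setup and the table name `t_demo` are assumptions for illustration:
```scala
// Sketch only: assumes a local SparkSession with Hive support; the table
// name `t_demo` is a placeholder, not from the PR.
import org.apache.spark.sql.SparkSession

object CreateTableDefaultProviderDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .enableHiveSupport()
      .getOrCreate()

    // No USING clause: in Spark 3.0 the provider comes from
    // spark.sql.sources.default (parquet by default) instead of hive.
    spark.sql("CREATE TABLE t_demo (c1 INT)")
    spark.sql("DESCRIBE TABLE EXTENDED t_demo")
      .filter("col_name = 'Provider'")
      .show(truncate = false) // expected: parquet

    spark.stop()
  }
}
```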
### How was this patch tested?
Added tests in `DDLParserSuite` and `HiveDDLSuite`.
Closes #26736 from Ngone51/dev-create-table-using-parquet-by-default.
Lead-authored-by: wuyi <yi.wu@databricks.com>
Co-authored-by: yi.wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 58be82ad4b98fc17e821e916e69e77a6aa36209d)
The file was modified sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveCommandSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLParserSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveShowCreateTableSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/sources/CreateTableAsSelectSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeSuite.scala (diff)
The file was modified sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/WindowQuerySuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/SparkSqlParserSuite.scala (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/postgreSQL/create_view.sql.out (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHive.scala (diff)
Commit 81996f9e4d8a17c3475a33af0c9c3d32cd70865f by dhyun
[SPARK-30152][INFRA] Enable Hadoop-2.7/JDK11 build at GitHub Action
### What changes were proposed in this pull request?
This PR enables the JDK11 build with the `hadoop-2.7` profile at `GitHub
Action`.
**BEFORE (6 jobs including one JDK11 job)**
![before](https://user-images.githubusercontent.com/9700541/70342731-7763f300-180a-11ea-859f-69038b88451f.png)
**AFTER (7 jobs including two JDK11 jobs)**
![after](https://user-images.githubusercontent.com/9700541/70342658-54d1da00-180a-11ea-9fba-507fc087dc62.png)
### Why are the changes needed?
SPARK-29957 made the JDK11 test work with the `hadoop-2.7` profile. We
need to protect it from regressions.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This is a `GitHub Action`-only PR. See the result of the `GitHub Action`
run on this PR.
Closes #26782 from dongjoon-hyun/SPARK-GHA-HADOOP-2.7.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 81996f9e4d8a17c3475a33af0c9c3d32cd70865f)
The file was modified .github/workflows/master.yml (diff)
Commit a30ec19a7358f18849944ecfab1d2b14e733614c by dhyun
[SPARK-30155][SQL] Rename parse() to parseString() to avoid conflict in
Scala 2.13
### What changes were proposed in this pull request?
Renames the internal method `LegacyTypeStringParser.parse()` to `parseString()`.
### Why are the changes needed?
In Scala 2.13, the `parse()` definition clashes with declarations
inherited from its supertypes.
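As a minimal sketch of this kind of clash (assuming the `scala-parser-combinators` library; the parser and method body below are illustrative, not the PR's code):
```scala
// Sketch only: a subclass method named `parse` would collide with the
// `parse(...)` overloads inherited from Parsers under Scala 2.13.
import scala.util.parsing.combinator.JavaTokenParsers

object LegacyParserSketch extends JavaTokenParsers {
  private val number: Parser[Double] = floatingPointNumber ^^ (_.toDouble)

  // Renamed from `parse` to `parseString` to avoid the conflict.
  def parseString(raw: String): Double = parseAll(number, raw) match {
    case Success(v, _) => v
    case failure       => sys.error(failure.toString)
  }
}
```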
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Existing tests.
Closes #26784 from srowen/SPARK-30155.
Authored-by: Sean Owen <sean.owen@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: a30ec19a7358f18849944ecfab1d2b14e733614c)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/LegacyTypeStringParser.scala (diff)
Commit 51aa7a920ec097ed2a797687de8382e21691f18c by gurwls223
[SPARK-30148][SQL] Optimize writing plans if there is an analysis
exception
### What changes were proposed in this pull request?
Optimized `QueryExecution.scala#writePlans()`.
### Why are the changes needed?
If a query fails in the analysis phase with an `AnalysisException`, there
is no need to execute the later phases, since each of them would just
return the same result, i.e. the `AnalysisException`.
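The short-circuit idea can be sketched as follows (an illustrative stand-in, not Spark's actual `QueryExecution` code; the phase fields and messages are assumptions):
```scala
// Sketch only: if analysis already failed, the later phases would just
// rethrow the same AnalysisException, so render the error once and stop.
class AnalysisException(msg: String) extends Exception(msg)

object WritePlansSketch {
  // Hypothetical lazy phases standing in for QueryExecution's fields.
  lazy val analyzed: String  = throw new AnalysisException("Table or view not found: t")
  lazy val optimized: String = s"$analyzed (optimized)"
  lazy val physical: String  = s"$optimized (physical)"

  def writePlans(append: String => Unit): Unit =
    try {
      append(s"== Analyzed Logical Plan ==\n$analyzed\n")
      append(s"== Optimized Logical Plan ==\n$optimized\n")
      append(s"== Physical Plan ==\n$physical\n")
    } catch {
      // One error section replaces the remaining phases.
      case e: AnalysisException => append(s"== Analysis Error ==\n${e.getMessage}\n")
    }

  def main(args: Array[String]): Unit = writePlans(print)
}
```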
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Manually
Closes #26778 from amanomer/optExplain.
Authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 51aa7a920ec097ed2a797687de8382e21691f18c)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala (diff)
Commit 1e0037b5e9ff077bdb59ad4536b7e5081a963089 by gurwls223
[SPARK-30157][BUILD][TEST-HADOOP3.2][TEST-JAVA11] Upgrade Apache
HttpCore from 4.4.10 to 4.4.12
### What changes were proposed in this pull request?
This PR aims to upgrade `Apache HttpCore` from 4.4.10 to 4.4.12.
### Why are the changes needed?
`Apache HttpCore v4.4.11` is the first official release for JDK11.
> This is a maintenance release that corrects a number of defects in
non-blocking SSL session code that caused compatibility issues with
TLSv1.3 protocol implementation shipped with Java 11.
For the full release notes, please see the following.
- https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the Jenkins.
Closes #26786 from dongjoon-hyun/SPARK-30157.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 1e0037b5e9ff077bdb59ad4536b7e5081a963089)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-1.2 (diff)
The file was modified pom.xml (diff)
Commit afc4fa02bd2b7eb835e5c5dcbe0cbd1303910b42 by dhyun
[SPARK-30156][BUILD] Upgrade Jersey from 2.29 to 2.29.1
### What changes were proposed in this pull request?
This PR aims to upgrade `Jersey` from 2.29 to 2.29.1.
### Why are the changes needed?
This will bring several bug fixes and important dependency upgrades.
- https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.29.1.html
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the Jenkins.
Closes #26785 from dongjoon-hyun/SPARK-30156.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: afc4fa02bd2b7eb835e5c5dcbe0cbd1303910b42)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-1.2 (diff)
The file was modified dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
The file was modified dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
Commit e88d74052bf40eabab9e3388fa09e52097ffa3aa by yamamuro
[SPARK-30147][SQL] Trim the string when cast string type to booleans
### What changes were proposed in this pull request?
Now, we trim the string when casting a string value to the other `canCast`
types, e.g. int, double, decimal, interval, date, and timestamp, but not
to boolean. This behavior makes type casting and coercion inconsistent in
Spark, and it does not fit the ANSI SQL standard either:
```
If TD is boolean, then
Case:
a) If SD is character string, then SV is replaced by
   TRIM ( BOTH ' ' FROM VE )
   Case:
   i) If the rules for literal in Subclause 5.3, "literal", can be
      applied to SV to determine a valid value of the data type TD,
      then let TV be that value.
   ii) Otherwise, an exception condition is raised: data exception -
       invalid character value for cast.
b) If SD is boolean, then TV is SV
```
In this pull request, we trim all whitespace from both ends of the string
before converting it to a boolean value. This behavior is the same as for
the other types, but differs slightly from the SQL standard, which trims
only spaces.
### Why are the changes needed?
Type cast/coercion consistency
### Does this PR introduce any user-facing change?
Yes. Strings with whitespace at both ends will be trimmed before being
converted to booleans.
For example, `select cast('\t true' as boolean)` now returns `true`;
before this PR it returned `null`.
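A hedged way to observe the change (assumes an existing `SparkSession` named `spark`; not from the PR's test suite):
```scala
// Sketch only: the SQL literal '\t true' contains a leading tab escape;
// after this change the cast trims whitespace from both ends first.
spark.sql("SELECT CAST('\\t true' AS BOOLEAN) AS v").show()
// expected now: v = true (previously: null)
```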
### How was this patch tested?
Added unit tests.
Closes #26776 from yaooqinn/SPARK-30147.
Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: e88d74052bf40eabab9e3388fa09e52097ffa3aa)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/cast.sql (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/cast.sql.out (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala (diff)
Commit 1068b8b24910eec8122bf7fa4748a101becf0d2b by dhyun
[SPARK-30163][INFRA] Use Google Maven mirror in GitHub Action
### What changes were proposed in this pull request?
This PR aims to use [Google Maven
mirror](https://cloudplatform.googleblog.com/2015/11/faster-builds-for-Java-developers-with-Maven-Central-mirror.html)
in `GitHub Action` jobs to improve the stability.
```xml
<settings>
  <mirrors>
    <mirror>
      <id>google-maven-central</id>
      <name>GCS Maven Central mirror</name>
      <url>https://maven-central.storage-download.googleapis.com/repos/central/data/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
```
### Why are the changes needed?
Although we added a Maven cache inside `GitHub Action`, timeouts happen
too frequently while accessing artifact descriptors.
```
[ERROR] Failed to execute goal on project spark-mllib_2.12:
... Failed to read artifact descriptor for ...
... Connection timed out (Read failed) -> [Help 1]
```
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This PR is irrelevant to Jenkins.
It was tested on a personal repository first; the `GitHub Action` run on
this PR should pass.
- https://github.com/dongjoon-hyun/spark/pull/11
Closes #26793 from dongjoon-hyun/SPARK-30163.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 1068b8b24910eec8122bf7fa4748a101becf0d2b)
The file was modified .github/workflows/master.yml (diff)
Commit 16f1b23d75c0b44aac61111bfb2ae9bb0f3fab68 by dhyun
[SPARK-30163][INFRA][FOLLOWUP] Make `.m2` directory for cold start
without cache
### What changes were proposed in this pull request?
This PR is a follow-up of https://github.com/apache/spark/pull/26793 and
aims to initialize the `~/.m2` directory.
### Why are the changes needed?
When the cache is reset, the `~/.m2` directory does not exist, which
causes a failure.
- The `master` branch has a cache as of now, so we missed this.
- `branch-2.4` has no cache as of now, and we hit this failure.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
This PR was tested against a personal `branch-2.4`.
- https://github.com/dongjoon-hyun/spark/pull/12
Closes #26794 from dongjoon-hyun/SPARK-30163-2.
Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 16f1b23d75c0b44aac61111bfb2ae9bb0f3fab68)
The file was modified .github/workflows/master.yml (diff)