Changes (build status: Success)

Summary

  1. [SPARK-23436][SQL][BACKPORT-2.3] Infer partition as Date only if it can (commit: 8ff8e16e2da4aa5457e4b1e5d575bd1a3a1f0358)
  2. [SPARK-23630][YARN] Allow user's hadoop conf customizations to take (commit: bc5ce047658272eb48d744e8d069c6cccdf37682)
  3. [SPARK-23628][SQL][BACKPORT-2.3] calculateParamLength should not return (commit: 3ec25d5a803888e5e24a47a511e9d88c423c5310)
  4. [SPARK-23173][SQL] Avoid creating corrupt parquet files when loading (commit: b083bd107d25bd3f7a4cdcf3aafa07b9895878b6)
  5. [SPARK-23624][SQL] Revise doc of method pushFilters in Datasource V2 (commit: 5bd306c3896f7967243f7f3be30e4f095b51e0fe)

Commit 8ff8e16e2da4aa5457e4b1e5d575bd1a3a1f0358 by hyukjinkwon
[SPARK-23436][SQL][BACKPORT-2.3] Infer partition as Date only if it can
be casted to Date
This PR backports https://github.com/apache/spark/pull/20621 to
branch 2.3.
---
## What changes were proposed in this pull request?
Before the patch, Spark could infer as Date a partition value that
cannot be cast to Date (this can happen when there are extra
characters after a valid date, like `2018-02-15AAA`).
When this happens and the input format has metadata that defines the
schema of the table, `null` is returned as the value of the
partition column, because the `cast` operator used in
`PartitioningAwareFileIndex.inferPartitioning` is unable to convert
the value.
During partition inference, the PR checks that values can be cast to
Date or Timestamp before inferring those data types for them.
## How was this patch tested?
Added unit tests (UT).
Author: Marco Gaido <marcogaido91@gmail.com>
Closes #20764 from gatorsmile/backport23436.
(commit: 8ff8e16e2da4aa5457e4b1e5d575bd1a3a1f0358)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala
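
The inference rule described in SPARK-23436 can be illustrated with a small sketch. This is not Spark's implementation (which lives in `PartitioningUtils.scala`); it is a plain-Python analogy, and the function name `infer_partition_type` and the two format strings are assumptions made for illustration. The point it shows is that a value is typed as a date or timestamp only when the whole string parses, so trailing characters fall back to string.

```python
from datetime import datetime

# Hypothetical formats for this sketch; Spark's accepted formats may differ.
CANDIDATES = (
    ("date", "%Y-%m-%d"),
    ("timestamp", "%Y-%m-%d %H:%M:%S"),
)


def infer_partition_type(raw: str) -> str:
    """Return 'date', 'timestamp' or 'string' for a raw partition value."""
    for type_name, fmt in CANDIDATES:
        try:
            # strptime raises ValueError if any input is left unparsed,
            # so '2018-02-15AAA' is rejected rather than truncated.
            datetime.strptime(raw, fmt)
            return type_name
        except ValueError:
            continue
    # Nothing parsed cleanly: keep the value as a plain string.
    return "string"
```

With this rule, `2018-02-15AAA` stays a string partition value instead of being typed as Date and later read back as `null`.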
Commit bc5ce047658272eb48d744e8d069c6cccdf37682 by vanzin
[SPARK-23630][YARN] Allow user's hadoop conf customizations to take
effect.
This change restores functionality that was inadvertently removed as
part of the fix for SPARK-22372.
Also modified an existing unit test to make sure the feature works as
intended.
Author: Marcelo Vanzin <vanzin@cloudera.com>
Closes #20776 from vanzin/SPARK-23630.
(cherry picked from commit 2c3673680e16f88f1d1cd73a3f7445ded5b3daa8)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(commit: bc5ce047658272eb48d744e8d069c6cccdf37682)
The file was modified: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala
The file was modified: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
The file was modified: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala
Commit 3ec25d5a803888e5e24a47a511e9d88c423c5310 by wenchen
[SPARK-23628][SQL][BACKPORT-2.3] calculateParamLength should not return
1 + num of expressions
## What changes were proposed in this pull request?
Backport of ea480990e726aed59750f1cea8d40adba56d991a to branch 2.3.
## How was this patch tested?
added UT
cc cloud-fan hvanhovell
Author: Marco Gaido <marcogaido91@gmail.com>
Closes #20783 from mgaido91/SPARK-23628_2.3.
(commit: 3ec25d5a803888e5e24a47a511e9d88c423c5310)
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
Commit b083bd107d25bd3f7a4cdcf3aafa07b9895878b6 by gatorsmile
[SPARK-23173][SQL] Avoid creating corrupt parquet files when loading
data from JSON
## What changes were proposed in this pull request?
The from_json() function accepts an additional parameter in which the
user can specify the schema. The issue is that the specified schema might
not be compatible with the data. In particular, the JSON data might be
missing values for fields declared as non-nullable in the schema. The
from_json() function does not verify the data against such errors. When
data with missing fields is sent to the parquet encoder, there is no
verification either. The end result is a corrupt parquet file.
To avoid corruption, make sure that all fields in the user-specified
schema are set to be nullable. Since this changes the behavior of a
public function, we need to include it in release notes. The behavior
can be reverted by setting `spark.sql.fromJsonForceNullableSchema=false`.
## How was this patch tested?
Added two new tests.
Author: Michał Świtakowski <michal.switakowski@databricks.com>
Closes #20694 from mswit-databricks/SPARK-23173.
(cherry picked from commit 2ca9bb083c515511d2bfee271fc3e0269aceb9d5)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: b083bd107d25bd3f7a4cdcf3aafa07b9895878b6)
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala
The file was modified: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
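
The "set every field to nullable" step from SPARK-23173 can be sketched as a recursive walk over a schema. This is not the code in `jsonExpressions.scala`. It is a minimal Python illustration that assumes the schema is a nested dict loosely modeled on Spark's JSON schema layout (`fields`, `nullable`, `containsNull`, `valueContainsNull`); the helper names `force_nullable` and `_relax` are hypothetical.

```python
import copy


def force_nullable(schema: dict) -> dict:
    """Return a copy of the schema in which every field accepts nulls."""
    relaxed = copy.deepcopy(schema)  # leave the caller's schema untouched
    _relax(relaxed)
    return relaxed


def _relax(data_type) -> None:
    # Primitive types are plain strings such as "long"; nothing to relax.
    if not isinstance(data_type, dict):
        return
    kind = data_type.get("type")
    if kind == "struct":
        for field in data_type["fields"]:
            field["nullable"] = True
            _relax(field["type"])
    elif kind == "array":
        data_type["containsNull"] = True
        _relax(data_type["elementType"])
    elif kind == "map":
        data_type["valueContainsNull"] = True
        _relax(data_type["valueType"])
```

A JSON record that omits a field declared non-nullable then yields a null for that field rather than an inconsistent row reaching the parquet writer.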
Commit 5bd306c3896f7967243f7f3be30e4f095b51e0fe by gatorsmile
[SPARK-23624][SQL] Revise doc of method pushFilters in Datasource V2
## What changes were proposed in this pull request?
Revise the doc of method pushFilters in
SupportsPushDownFilters/SupportsPushDownCatalystFilters.
In `FileSourceStrategy`, all filters except `partitionKeyFilters` (whose
references are a subset of the partition keys) need to be evaluated
after scanning. Otherwise, Spark will get wrong results from data sources
like Orc/Parquet.
This PR improves the doc.
Author: Wang Gengliang <gengliang.wang@databricks.com>
Closes #20769 from gengliangwang/revise_pushdown_doc.
(cherry picked from commit 10b0657b035641ce735055bba2c8459e71bc2400)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: 5bd306c3896f7967243f7f3be30e4f095b51e0fe)
The file was modified: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsPushDownFilters.java
The file was modified: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsPushDownCatalystFilters.java
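
The contract described in SPARK-23624 (filters that are not on partition keys must still be evaluated after the scan) can be illustrated with a toy model. This is not the Data Source V2 Java API. `ToySource`, `push_filters`, `scan` and `read` are hypothetical names, and a filter is modeled as a `(column, predicate)` pair purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Filter = Tuple[str, Callable[[Row], bool]]  # (column name, predicate)


class ToySource:
    """A source that can apply filters exactly only on its partition columns."""

    def __init__(self, rows: List[Row], partition_columns: List[str]) -> None:
        self.rows = rows
        self.partition_columns = set(partition_columns)
        self.pushed: List[Filter] = []

    def push_filters(self, filters: List[Filter]) -> List[Filter]:
        # Keep partition-key filters; hand everything else back to the caller.
        self.pushed = [f for f in filters if f[0] in self.partition_columns]
        return [f for f in filters if f[0] not in self.partition_columns]

    def scan(self) -> List[Row]:
        return [r for r in self.rows if all(pred(r) for _, pred in self.pushed)]


def read(source: ToySource, filters: List[Filter]) -> List[Row]:
    # The caller re-applies whatever the source returned as "post-scan" filters.
    post_scan = source.push_filters(filters)
    return [r for r in source.scan() if all(pred(r) for _, pred in post_scan)]
```

If `read` skipped the post-scan step, rows that fail the non-partition filter would leak into the result, which is the kind of wrong result the revised doc warns about.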