Failed Changes

Summary

  1. [SPARK-28962][SQL][FOLLOW-UP] Add the parameter description for the Scala function API filter (commit: ddf83159a8c61fa12237a60124f7a7aa4e3a53c1) (details)
  2. [MINOR][DOCS] Fix src/dest type documentation for `to_timestamp` (commit: 53fd83a8c5f2311a52ed3645c13eab72a3d1cb94) (details)
  3. [SPARK-30627][SQL] Disable all the V2 file sources by default (commit: ed44926117d81aa5fa8bd823d401efd235260872) (details)
  4. [SPARK-29924][DOCS] Document Apache Arrow JDK11 requirement (commit: d1a673a1bb6f0a3d2816ce7a2c4e6737b52b4c81) (details)
  5. [SPARK-30626][K8S] Add SPARK_APPLICATION_ID into driver pod env (commit: f86a1b9590c2fc661232e2f7fd5859daf118729e) (details)
  6. [SPARK-30630][ML] Remove numTrees in GBT in 3.0.0 (commit: 2f8e4d0d6e56188fa24528741a514ce1f04d2bf2) (details)
  7. [SPARK-29721][SQL] Prune unnecessary nested fields from Generate without Project (commit: a0e63b61e7c5d55ae2a9213b95ab1e87ac7c203c) (details)
  8. [SPARK-30639][BUILD] Upgrade Jersey to 2.30 (commit: 862959747ec3eb1f90d23ec91c1c6419468c9ea9) (details)
  9. [SPARK-30579][DOC] Document ORDER BY Clause of SELECT statement in SQL Reference (commit: d5b92b24c41b047c64a4d89cc4061ebf534f0995) (details)
Commit ddf83159a8c61fa12237a60124f7a7aa4e3a53c1 by dhyun
[SPARK-28962][SQL][FOLLOW-UP] Add the parameter description for the
Scala function API filter
### What changes were proposed in this pull request? This PR is a
follow-up to https://github.com/apache/spark/pull/25666, adding the
description and an example for the Scala function API `filter`.
### Why are the changes needed? It is hard to tell which parameter is
the index column.
### Does this PR introduce any user-facing change? No
### How was this patch tested? N/A
Closes #27336 from gatorsmile/spark28962.
Authored-by: Xiao Li <gatorsmile@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: ddf83159a8c61fa12237a60124f7a7aa4e3a53c1)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/functions.scala (diff)
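A minimal sketch of the two-argument `filter` variant whose index parameter this follow-up documents (data and column names are made up):

```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// In the (element, index) lambda form, the second parameter is the 0-based
// index of the element -- the parameter this follow-up describes.
val df = Seq(Seq("a", "b", "c", "d")).toDF("xs")
df.select(filter($"xs", (x, i) => i % 2 === 0).as("even_idx")).show()
```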
Commit 53fd83a8c5f2311a52ed3645c13eab72a3d1cb94 by gurwls223
[MINOR][DOCS] Fix src/dest type documentation for `to_timestamp`
### What changes were proposed in this pull request?
Minor documentation fix
### Why are the changes needed?
### Does this PR introduce any user-facing change?
### How was this patch tested?
Manually; consider adding tests?
Closes #27295 from deepyaman/patch-2.
Authored-by: Deepyaman Datta <deepyaman.datta@utexas.edu> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 53fd83a8c5f2311a52ed3645c13eab72a3d1cb94)
The file was modified: python/pyspark/sql/functions.py (diff)
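For reference, a sketch of the typing the fix clarifies, shown via the Scala API (column name and format are illustrative): the source is a string column and the destination a timestamp column.

```
import org.apache.spark.sql.functions.{col, to_timestamp}

// StringType source column -> TimestampType result column.
val ts = to_timestamp(col("time_str"), "yyyy-MM-dd HH:mm:ss")
```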
Commit ed44926117d81aa5fa8bd823d401efd235260872 by dhyun
[SPARK-30627][SQL] Disable all the V2 file sources by default
### What changes were proposed in this pull request?
Disable all the V2 file sources in Spark 3.0 by default.
### Why are the changes needed?
There are still some missing parts in the file source V2 framework:

1. It doesn't support reporting file scan metrics such as
"numOutputRows"/"numFiles"/"fileSize" the way `FileSourceScanExec` does.
This requires another patch in the data source V2 framework, tracked by
[SPARK-30362](https://issues.apache.org/jira/browse/SPARK-30362).
2. It doesn't support partition pruning with subqueries (including dynamic
partition pruning) for now. Tracked by
[SPARK-30628](https://issues.apache.org/jira/browse/SPARK-30628).

As we are approaching the code freeze on Jan 31st, this PR proposes to
disable all the V2 file sources in Spark 3.0 by default.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Existing tests.
Closes #27348 from gengliangwang/disableFileSourceV2.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: ed44926117d81aa5fa8bd823d401efd235260872)
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcPartitionDiscoverySuite.scala (diff)
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/connector/FileDataSourceV2FallBackSuite.scala (diff)
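A hedged sketch of the resulting behavior, assuming the V1-fallback list in SQLConf is the switch this commit extends: file sources named in the list use the V1 code path, so removing one re-enables its V2 implementation.

```
// `spark` is an active SparkSession; the conf name is per SQLConf in Spark 3.0.
spark.conf.set(
  "spark.sql.sources.useV1SourceList",
  "avro,csv,json,kafka,orc,parquet,text") // all file sources fall back to V1
```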
Commit d1a673a1bb6f0a3d2816ce7a2c4e6737b52b4c81 by dhyun
[SPARK-29924][DOCS] Document Apache Arrow JDK11 requirement
### What changes were proposed in this pull request?
This adds a note for additional setting for Apache Arrow library for
Java 11.
### Why are the changes needed?
Since Apache Arrow 0.14.0, an additional setting is required for Java
9+.
- https://issues.apache.org/jira/browse/ARROW-3191
It's explicitly documented as of Apache Arrow 0.15.0.
- https://issues.apache.org/jira/browse/ARROW-6206
However, there is no plan to handle that on the Apache Arrow side.
- https://issues.apache.org/jira/browse/ARROW-7223
In short, we need to document this for users who use Arrow-related
features on JDK11. For the dev environment, we handle this via
[SPARK-29923](https://github.com/apache/spark/pull/26552).
### Does this PR introduce any user-facing change?
Yes.
### How was this patch tested?
Generated the documentation and checked the pages.
![doc](https://user-images.githubusercontent.com/9700541/73096611-0f409d80-3e9a-11ea-804b-c6b5ec7bd78d.png)
Closes #27356 from dongjoon-hyun/SPARK-JDK11-ARROW-DOC.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: d1a673a1bb6f0a3d2816ce7a2c4e6737b52b4c81)
The file was modified: docs/index.md (diff)
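The documented requirement amounts to one extra JVM flag on Java 9+; a sketch of `spark-defaults.conf` entries (the Netty flag comes from ARROW-3191, the exact placement is an assumption):

```
spark.driver.extraJavaOptions   -Dio.netty.tryReflectionSetAccessible=true
spark.executor.extraJavaOptions -Dio.netty.tryReflectionSetAccessible=true
```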
Commit f86a1b9590c2fc661232e2f7fd5859daf118729e by dhyun
[SPARK-30626][K8S] Add SPARK_APPLICATION_ID into driver pod env
### What changes were proposed in this pull request? Add a
SPARK_APPLICATION_ID environment variable when Spark configures the driver pod.
### Why are the changes needed? Currently, the driver doesn't have this
in its environment, so it's not convenient to retrieve the Spark
application id. The use case: we want to look up the application id,
create an application folder, and redirect driver logs to that folder.
### Does this PR introduce any user-facing change? no
### How was this patch tested? Unit tested. I also built a new
distribution and container image, kicked off a job in Kubernetes, and
verified that SPARK_APPLICATION_ID was added there.
Closes #27347 from Jeffwan/SPARK-30626.
Authored-by: Jiaxin Shan <seedjeffwan@gmail.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: f86a1b9590c2fc661232e2f7fd5859daf118729e)
The file was modified: resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStepSuite.scala (diff)
The file was modified: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala (diff)
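A hedged usage sketch of the new variable from inside the driver container (the log-directory layout is hypothetical):

```
// The driver process on Kubernetes can now read its application id directly.
val appId  = sys.env.getOrElse("SPARK_APPLICATION_ID", "unknown-app")
val logDir = s"/var/log/spark/$appId" // hypothetical per-application folder
```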
Commit 2f8e4d0d6e56188fa24528741a514ce1f04d2bf2 by dhyun
[SPARK-30630][ML] Remove numTrees in GBT in 3.0.0
### What changes were proposed in this pull request? Remove
```numTrees``` in GBT in 3.0.0.
### Why are the changes needed? Currently, GBT has

```
/**
 * Number of trees in ensemble
 */
@Since("2.0.0")
val getNumTrees: Int = trees.length
```

and

```
/** Number of trees in ensemble */
val numTrees: Int = trees.length
```

I think we should remove one of them. We deprecated `numTrees` in 2.4.5
via https://github.com/apache/spark/pull/27352.
### Does this PR introduce any user-facing change? Yes, removes
`numTrees` in GBT in 3.0.0.
### How was this patch tested? Existing tests.
Closes #27330 from huaxingao/spark-numTrees.
Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 2f8e4d0d6e56188fa24528741a514ce1f04d2bf2)
The file was modified: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala (diff)
The file was modified: mllib/src/test/scala/org/apache/spark/ml/regression/GBTRegressorSuite.scala (diff)
The file was modified: mllib/src/main/scala/org/apache/spark/ml/classification/GBTClassifier.scala (diff)
The file was modified: mllib/src/main/scala/org/apache/spark/ml/regression/GBTRegressor.scala (diff)
The file was modified: project/MimaExcludes.scala (diff)
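A minimal migration sketch for callers of the removed accessor, assuming they move to the `getNumTrees` member that remains:

```
import org.apache.spark.ml.classification.GBTClassificationModel

// Before 3.0.0: model.numTrees (deprecated in 2.4.5, removed by this commit).
// From 3.0.0 on: model.getNumTrees.
def ensembleSize(model: GBTClassificationModel): Int = model.getNumTrees
```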
Commit a0e63b61e7c5d55ae2a9213b95ab1e87ac7c203c by dhyun
[SPARK-29721][SQL] Prune unnecessary nested fields from Generate without
Project
### What changes were proposed in this pull request?
This patch proposes to prune unnecessary nested fields from Generate
which has no Project on top of it.
### Why are the changes needed?
In the Optimizer, we can prune nested columns from Project(projectList,
Generate). However, unnecessary nested fields could still be read by
Generate if there is no Project on top of it. We should prune them too.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Unit test.
Closes #26978 from viirya/SPARK-29721.
Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com> Co-authored-by:
Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: a0e63b61e7c5d55ae2a9213b95ab1e87ac7c203c)
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala (diff)
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/SchemaPruningSuite.scala (diff)
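An illustrative query of the shape this optimization benefits (the schema is made up): the generator consumes only one nested field, so the scan can drop the rest of the struct even when no pruning Project sits above the Generate.

```
import org.apache.spark.sql.functions.{col, explode}

// items: array<struct<name: string, price: double>> -- hypothetical schema.
// Only items.price is referenced, so items.name can be pruned at the scan.
val df     = spark.read.parquet("/tmp/events")
val prices = df.select(explode(col("items.price")).as("price"))
```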
Commit 862959747ec3eb1f90d23ec91c1c6419468c9ea9 by dhyun
[SPARK-30639][BUILD] Upgrade Jersey to 2.30
### What changes were proposed in this pull request?
For better JDK11 support, this PR aims to upgrade **Jersey** and
**javassist** to `2.30` and `3.25.0-GA` respectively.
### Why are the changes needed?
**Jersey**: This brings the following `Jersey` updates:
- https://eclipse-ee4j.github.io/jersey.github.io/release-notes/2.30.html
- https://github.com/eclipse-ee4j/jersey/issues/4245 (Java 11
`java.desktop` module dependency)

**javassist**: This bumps the transitive dependency from 3.20.0-CR2 to
3.25.0-GA.
- `javassist` officially supports JDK11 as of 3.24.0-GA, per its [release
note](https://github.com/jboss-javassist/javassist/blob/master/Readme.html#L308).
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Pass the Jenkins with both JDK8 and JDK11.
Closes #27357 from dongjoon-hyun/SPARK-30639.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 862959747ec3eb1f90d23ec91c1c6419468c9ea9)
The file was modified: dev/deps/spark-deps-hadoop-2.7-hive-1.2 (diff)
The file was modified: pom.xml (diff)
The file was modified: dev/deps/spark-deps-hadoop-3.2-hive-2.3 (diff)
The file was modified: dev/deps/spark-deps-hadoop-2.7-hive-2.3 (diff)
Commit d5b92b24c41b047c64a4d89cc4061ebf534f0995 by yamamuro
[SPARK-30579][DOC] Document ORDER BY Clause of SELECT statement in SQL
Reference
### What changes were proposed in this pull request? Document ORDER BY
clause of SELECT statement in SQL Reference Guide.
### Why are the changes needed? Currently, Spark lacks documentation on
the supported SQL constructs, causing confusion among users who
sometimes have to read the code to understand the usage. This PR is
aimed at addressing that issue.
### Does this PR introduce any user-facing change? Yes.
**Before:** There was no documentation for this.
**After:**
<img width="972" alt="Screen Shot 2020-01-19 at 11 50 57 PM"
src="https://user-images.githubusercontent.com/14225158/72708034-ac0bdf80-3b16-11ea-81f3-48d8087e4e98.png">
<img width="972" alt="Screen Shot 2020-01-19 at 11 51 14 PM"
src="https://user-images.githubusercontent.com/14225158/72708042-b0d09380-3b16-11ea-939e-905b8c031608.png">
<img width="972" alt="Screen Shot 2020-01-19 at 11 51 33 PM"
src="https://user-images.githubusercontent.com/14225158/72708050-b4fcb100-3b16-11ea-95d2-e4e302cace1b.png">
### How was this patch tested? Tested using `jekyll build --serve`.
Closes #27288 from dilipbiswal/sql-ref-select-orderby.
Authored-by: Dilip Biswal <dkbiswal@gmail.com> Signed-off-by: Takeshi
Yamamuro <yamamuro@apache.org>
(commit: d5b92b24c41b047c64a4d89cc4061ebf534f0995)
The file was modified: docs/sql-ref-syntax-qry-select-orderby.md (diff)
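For flavor, the kind of clause usage the new page documents (table and columns are made up):

```
// Spark SQL ORDER BY with an explicit sort direction and null ordering.
spark.sql("SELECT name, age FROM person ORDER BY age DESC NULLS LAST").show()
```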