Changes

Summary

  1. [SPARK-30041][SQL][WEBUI] Add Codegen Stage Id to Stage DAG (commit: fd308ade52672840ca4d2afdb655e9b97cb12b28) (details)
  2. [SPARK-27868][CORE][FOLLOWUP] Recover the default value to -1 again (commit: 830e635e67551be6f1cca019457c8fc79cc179da) (details)
  3. [SPARK-29876][SS] Delete/archive file source completed files in separate (commit: abf759a91e01497586b8bb6b7a314dd28fd6cff1) (details)
  4. [SPARK-30312][DOCS][FOLLOWUP] Add a migration guide (commit: fdbded3f71b54baee187392089705f1b619019cc) (details)
  5. [SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with (commit: 96a344511e58d6b4d3a67f800ac1ed0f8ab0c85f) (details)
  6. [SPARK-28152][DOCS][FOLLOWUP] Add a migration guide for MsSQLServer JDBC (commit: 505693c282d94ebb0f763477309f0bba90b5acbc) (details)
  7. [SPARK-30533][ML][PYSPARK] Add classes to represent Java Regressors and (commit: 3228732fd58461e36ef4d1a8e2b48887b99ebbb5) (details)
  8. [SPARK-30544][BUILD] Upgrade the version of Genjavadoc to 0.15 (commit: a3357dfccacadfb40ab103ba1ff3c0927e806dd2) (details)
  9. [SPARK-30539][PYTHON][SQL] Add DataFrame.tail in PySpark (commit: a6bdea3ad4a5dde8a68aaf1db0d870ec36040c67) (details)
  10. [MINOR][DOCS] Remove note about -T for parallel build (commit: ef1af43c9f82ad0ff2ff4e196b1017285bfa7da4) (details)
  11. [MINOR][HIVE] Pick up HIVE-22708 HTTP transport fix (commit: 789a4abfa9bd88302b23c38f399dc538dc3fb740) (details)
Commit fd308ade52672840ca4d2afdb655e9b97cb12b28 by wenchen
[SPARK-30041][SQL][WEBUI] Add Codegen Stage Id to Stage DAG
visualization in Web UI
### What changes were proposed in this pull request?
SPARK-29894 provides information on the Codegen Stage Id in the Web UI for SQL plan graphs. Similarly, this proposes to add the Codegen Stage Id to the DAG visualization for stage execution. DAGs for stage execution are available in the Web UI under the Jobs and Stages tabs.
### Why are the changes needed?
This is proposed as an aid for drill-down analysis of complex SQL statement execution, as it is not always easy to match parts of the SQL plan graph with the corresponding stage DAG execution graph. Adding the Codegen Stage Id to WholeStageCodegen operations makes this task easier.
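For reference, the codegen stage id discussed here is the same integer that already prefixes WholeStageCodegen operators in the textual physical plan (the `*(N)` marker). A minimal, hypothetical PySpark snippet, not part of this patch, to see those ids:

```python
# Illustrative only: show codegen stage ids in the textual physical plan.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("codegen-stage-id-demo").getOrCreate()
spark.range(10).selectExpr("id * 2 AS x").explain()
# Example output (abridged; exact plan text varies by version):
# == Physical Plan ==
# *(1) Project [(id#0L * 2) AS x#2L]
# +- *(1) Range (0, 10, step=1, splits=8)
# The "(1)" is codegen stage id 1; this PR surfaces the same id in the
# Jobs/Stages DAG visualization.
```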
### Does this PR introduce any user-facing change?
Stage DAG visualization in the Web UI will show the codegen stage id for WholeStageCodegen operations, as in the example snippet from the Web UI Jobs tab (the query used in the example is TPCDS 2.4 q14a):
![](https://issues.apache.org/jira/secure/attachment/12987461/Snippet_StagesDags_with_CodegenId%20_annotated.png)
### How was this patch tested?
Manually tested; see also the example snippet above.
Closes #26675 from LucaCanali/addCodegenStageIdtoStageGraph.
Authored-by: Luca Canali <luca.canali@cern.ch> Signed-off-by: Wenchen
Fan <wenchen@databricks.com>
(commit: fd308ade52672840ca4d2afdb655e9b97cb12b28)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala (diff)
The file was modified docs/web-ui.md (diff)
The file was modified docs/img/AllStagesPageDetail4.png (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/debug/DebuggingSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlanInfo.scala (diff)
The file was modified docs/img/JobPageDetail2.png (diff)
The file was modified docs/img/AllStagesPageDetail5.png (diff)
Commit 830e635e67551be6f1cca019457c8fc79cc179da by vanzin
[SPARK-27868][CORE][FOLLOWUP] Recover the default value to -1 again
The default value for backLog is set back to -1, as any other value may break existing configurations by overriding Netty's default io.netty.util.NetUtil#SOMAXCONN. The documentation is adjusted accordingly. See the discussion thread: https://github.com/apache/spark/pull/24732
### What changes were proposed in this pull request?
Partial rollback of https://github.com/apache/spark/pull/24732 (the default for backLog is set back to -1).
### Why are the changes needed?
The previous change introduced a backward incompatibility by overriding the default of Netty's `io.netty.util.NetUtil#SOMAXCONN`.
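For context, a hedged sketch of what overriding the backlog explicitly looks like; the shuffle module's `spark.shuffle.io.backLog` is used here only as an example, and leaving it at the restored default of -1 defers to Netty's `io.netty.util.NetUtil#SOMAXCONN`:

```python
# Hypothetical example, not part of this change: override the accept backlog for the
# shuffle transport. With the default of -1, Netty's SOMAXCONN value is used instead.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("backlog-demo")
    .config("spark.shuffle.io.backLog", "4096")  # set only if the OS/Netty default is insufficient
    .getOrCreate()
)
```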
Closes #27230 from xCASx/master.
Authored-by: Maxim Kolesnikov <swe.kolesnikov@gmail.com> Signed-off-by:
Marcelo Vanzin <vanzin@cloudera.com>
(commit: 830e635e67551be6f1cca019457c8fc79cc179da)
The file was modified docs/configuration.md (diff)
The file was modified common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java (diff)
Commit abf759a91e01497586b8bb6b7a314dd28fd6cff1 by vanzin
[SPARK-29876][SS] Delete/archive file source completed files in separate
thread
### What changes were proposed in this pull request?
[SPARK-20568](https://issues.apache.org/jira/browse/SPARK-20568) added
the possibility to clean up completed files in a streaming query. Deleting/archiving uses the main thread, which can slow down processing. In this PR I've created a thread pool to handle file deletion/archival. The number of threads can be configured with `spark.sql.streaming.fileSource.cleaner.numThreads`.
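A hedged sketch of how the new config combines with the SPARK-20568 cleanup options; the paths, schema, and values below are placeholders, not from this PR:

```python
# Sketch only: configure the cleaner thread pool (this PR) together with the
# cleanSource/sourceArchiveDir options introduced by SPARK-20568.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("file-source-cleaner-demo")
    .config("spark.sql.streaming.fileSource.cleaner.numThreads", "4")  # threads for delete/archive
    .getOrCreate()
)

stream = (
    spark.readStream
    .format("json")
    .schema("id LONG, value STRING")
    .option("cleanSource", "archive")                # or "delete" / "off"
    .option("sourceArchiveDir", "/data/in/archive")  # required when cleanSource=archive
    .load("/data/in")
)
```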
### Why are the changes needed?
To perform file deletion/archival in a separate thread.
### Does this PR introduce any user-facing change? No.
### How was this patch tested? Existing unit tests.
Closes #26502 from gaborgsomogyi/SPARK-29876.
Authored-by: Gabor Somogyi <gabor.g.somogyi@gmail.com> Signed-off-by:
Marcelo Vanzin <vanzin@cloudera.com>
(commit: abf759a91e01497586b8bb6b7a314dd28fd6cff1)
The file was modified docs/structured-streaming-programming-guide.md (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala (diff)
Commit fdbded3f71b54baee187392089705f1b619019cc by dhyun
[SPARK-30312][DOCS][FOLLOWUP] Add a migration guide
### What changes were proposed in this pull request?
This is a followup of https://github.com/apache/spark/pull/26956 to add
a migration document for 2.4.5.
### Why are the changes needed?
The new legacy configuration will restore the previous behavior safely.
### Does this PR introduce any user-facing change?
This PR updates the doc.
<img width="763" alt="screenshot"
src="https://user-images.githubusercontent.com/9700541/72639939-9da5a400-391b-11ea-87b1-14bca15db5a6.png">
### How was this patch tested?
Build the document and see the change manually.
Closes #27269 from dongjoon-hyun/SPARK-30312.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: fdbded3f71b54baee187392089705f1b619019cc)
The file was modified docs/sql-migration-guide.md (diff)
Commit 96a344511e58d6b4d3a67f800ac1ed0f8ab0c85f by dhyun
[SPARK-25993][SQL][TESTS] Add test cases for CREATE EXTERNAL TABLE with
subdirectories
### What changes were proposed in this pull request?
This PR aims to add test cases for the resolution of ORC table locations, as reported in [SPARK-25993](https://issues.apache.org/jira/browse/SPARK-25993), and also adds corresponding test cases for Parquet tables.
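The tested scenario can be sketched roughly as follows (hypothetical paths and table name; it requires Hive support, and the exact cases exercised live in the modified suites):

```python
# Rough, hypothetical illustration: an external ORC table whose LOCATION is a parent
# directory while the data files sit in a subdirectory.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Write ORC files into a subdirectory of the intended table location.
spark.range(10).write.orc("/tmp/spark_25993_demo/sub1")

spark.sql("""
  CREATE EXTERNAL TABLE ext_orc_tbl (id BIGINT)
  STORED AS ORC
  LOCATION '/tmp/spark_25993_demo'
""")

# Whether rows in subdirectories are visible depends on the reader path and related
# settings; the added tests pin down this behavior for both ORC and Parquet.
spark.table("ext_orc_tbl").count()
```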
### Why are the changes needed?
The current behavior is complex; these test suites are designed to prevent accidental behavior changes. This PR is rebased on master; the original PR is [23108](https://github.com/apache/spark/pull/23108).
### Does this PR introduce any user-facing change?
No. This adds test cases only.
### How was this patch tested?
This is a new test case.
Closes #27130 from kevinyu98/spark-25993-2.
Authored-by: Kevin Yu <qyu@us.ibm.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 96a344511e58d6b4d3a67f800ac1ed0f8ab0c85f)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala (diff)
Commit 505693c282d94ebb0f763477309f0bba90b5acbc by dhyun
[SPARK-28152][DOCS][FOLLOWUP] Add a migration guide for MsSQLServer JDBC
dialect
### What changes were proposed in this pull request?
This PR adds a migration guide for MsSQLServer JDBC dialect for Apache
Spark 2.4.4 and 2.4.5.
### Why are the changes needed?
Apache Spark 2.4.4 updated the type mapping correctly according to MS SQL Server, but this was not mentioned in the migration guide. In addition, 2.4.4 adds a configuration for the legacy behavior.
### Does this PR introduce any user-facing change?
Yes. This is a documentation change.
![screenshot](https://user-images.githubusercontent.com/9700541/72649944-d6517780-3933-11ea-92be-9d4bf38e2eda.png)
### How was this patch tested?
Manually generated and checked the doc.
Closes #27270 from dongjoon-hyun/SPARK-28152-DOC.
Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: 505693c282d94ebb0f763477309f0bba90b5acbc)
The file was modified docs/sql-migration-guide.md (diff)
Commit 3228732fd58461e36ef4d1a8e2b48887b99ebbb5 by srowen
[SPARK-30533][ML][PYSPARK] Add classes to represent Java Regressors and
RegressionModels
### What changes were proposed in this pull request?
This PR adds the following classes:
- `pyspark.ml.regression.JavaRegressor`
- `pyspark.ml.regression.JavaRegressionModel`

and replaces `JavaPredictor` and `JavaPredictionModel` in:
- `LinearRegression` / `LinearRegressionModel`
- `DecisionTreeRegressor` / `DecisionTreeRegressionModel` (addition only, as `JavaPredictionModel` hasn't been used)
- `RandomForestRegressor` / `RandomForestRegressionModel` (addition only, as `JavaPredictionModel` hasn't been used)
- `GBTRegressor` / `GBTRegressionModel` (addition only, as `JavaPredictionModel` hasn't been used)
- `AFTSurvivalRegression` / `AFTSurvivalRegressionModel`
- `GeneralizedLinearRegression` / `GeneralizedLinearRegressionModel`
- `FMRegressor` / `FMRegressionModel`
### Why are the changes needed?
- Internal PySpark consistency.
- Feature parity with Scala.
- Intermediate step towards implementing
[SPARK-29212](https://issues.apache.org/jira/browse/SPARK-29212)
### Does this PR introduce any user-facing change?
It adds new base classes, so it will affect `mro`. Otherwise interfaces
should stay intact.
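Since the change is visible mainly through the class hierarchy, a hypothetical way (not from the PR) to inspect the effect on the `mro`:

```python
# Hypothetical check: list the method resolution order of a regression model class to
# see the new base classes introduced by this change.
from pyspark.ml.regression import LinearRegressionModel

print([cls.__name__ for cls in LinearRegressionModel.__mro__])
# On a build including this change, JavaRegressionModel (and JavaRegressor on the
# estimator side) should appear among the bases; the public interface is unchanged.
```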
### How was this patch tested?
Existing tests.
Closes #27241 from zero323/SPARK-30533.
Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: Sean Owen
<srowen@gmail.com>
(commit: 3228732fd58461e36ef4d1a8e2b48887b99ebbb5)
The file was modified python/pyspark/ml/regression.py (diff)
Commit a3357dfccacadfb40ab103ba1ff3c0927e806dd2 by dhyun
[SPARK-30544][BUILD] Upgrade the version of Genjavadoc to 0.15
### What changes were proposed in this pull request?
Upgrade the version of Genjavadoc from 0.14 to 0.15.
### Why are the changes needed?
To enable building for Scala 2.13.1.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
I confirmed via a manual build that there is no dependency error related to genjavadoc. I also generated javadoc with `LANG=C build/sbt -Pkinesis-asl -Pyarn -Pkubernetes -Phive-thriftserver unidoc` for the code both with and without this change and ran `diff -r` on target/javadoc.
Closes #27255 from sarutak/upgrade-genjavadoc.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by:
Dongjoon Hyun <dhyun@apple.com>
(commit: a3357dfccacadfb40ab103ba1ff3c0927e806dd2)
The file was modified project/SparkBuild.scala (diff)
Commit a6bdea3ad4a5dde8a68aaf1db0d870ec36040c67 by dhyun
[SPARK-30539][PYTHON][SQL] Add DataFrame.tail in PySpark
### What changes were proposed in this pull request?
https://github.com/apache/spark/pull/26809 added the `Dataset.tail` API. It would be good to have it in the PySpark API as well.
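A minimal usage sketch of the added API (output shown only for illustration):

```python
# Minimal sketch: DataFrame.tail(n) returns the last n rows as a list of Row objects,
# collected to the driver, so keep n small.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tail-demo").getOrCreate()
df = spark.range(10)
print(df.tail(2))
# Expected for this example: [Row(id=8), Row(id=9)]
```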
### Why are the changes needed?
To support consistent APIs.
### Does this PR introduce any user-facing change?
No. It adds a new API.
### How was this patch tested?
Manually tested and doctest was added.
Closes #27251 from HyukjinKwon/SPARK-30539.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
(commit: a6bdea3ad4a5dde8a68aaf1db0d870ec36040c67)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (diff)
The file was modified python/pyspark/sql/dataframe.py (diff)
Commit ef1af43c9f82ad0ff2ff4e196b1017285bfa7da4 by dhyun
[MINOR][DOCS] Remove note about -T for parallel build
### What changes were proposed in this pull request?
Removes suggestion to use -T for parallel Maven build.
### Why are the changes needed?
Parallel builds don't necessarily work in the build right now.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
N/A
Closes #27274 from srowen/ParallelBuild.
Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: ef1af43c9f82ad0ff2ff4e196b1017285bfa7da4)
The file was modified README.md (diff)
Commit 789a4abfa9bd88302b23c38f399dc538dc3fb740 by dhyun
[MINOR][HIVE] Pick up HIVE-22708 HTTP transport fix
### What changes were proposed in this pull request?
Pick up the HTTP fix from
https://issues.apache.org/jira/browse/HIVE-22708
### Why are the changes needed?
This is a small but important fix to digest handling we should pick up
from Hive.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests
Closes #27273 from srowen/Hive22708.
Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
(commit: 789a4abfa9bd88302b23c38f399dc538dc3fb740)
The file was modified sql/hive-thriftserver/v1.2/src/main/java/org/apache/hive/service/CookieSigner.java (diff)
The file was modified sql/hive-thriftserver/v2.3/src/main/java/org/apache/hive/service/CookieSigner.java (diff)