SuccessChanges

Summary

  1. [SPARK-23614][SQL] Fix incorrect reuse exchange when caching is used (commit: 1d0d0a5fc7ee009443797feb48823eb215d1940a) (details)
  2. [MINOR][R] Fix R lint failure (commit: 45761ceb241cdf48b858ff28034a049a0ac62ea3) (details)
  3. [SPARK-23769][CORE] Remove comments that unnecessarily disable Scalastyle check (commit: ce0fbec6857a09b095c492bb93d9ebaaa1e5c074) (details)
  4. [SPARK-23759][UI] Unable to bind Spark UI to specific host name / IP (commit: ea44783ad479ea7c66abc2c280f2a3abf2a4d3af) (details)
Commit 1d0d0a5fc7ee009443797feb48823eb215d1940a by wenchen
[SPARK-23614][SQL] Fix incorrect reuse exchange when caching is used
## What changes were proposed in this pull request?
We should provide a customized canonicalized plan for `InMemoryRelation`
and `InMemoryTableScanExec`. Otherwise we can wrongly treat two
different cached plans as producing the same result, which in turn
causes an exchange to be reused incorrectly (see the sketch after the
plan examples below).
For a test query like this:
```scala
val cached = spark.createDataset(Seq(TestDataUnion(1, 2, 3), TestDataUnion(4, 5, 6))).cache()
val group1 = cached.groupBy("x").agg(min(col("y")) as "value")
val group2 = cached.groupBy("x").agg(min(col("z")) as "value")
group1.union(group2)
```
Canonicalized plans before:
First exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(1) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], output=[none#0, none#4])
  +- *(1) InMemoryTableScan [none#0, none#1]
        +- InMemoryRelation [x#4253, y#4254, z#4255], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
              +- LocalTableScan [x#4253, y#4254, z#4255]
```
Second exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(3) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], output=[none#0, none#4])
  +- *(3) InMemoryTableScan [none#0, none#1]
        +- InMemoryRelation [x#4253, y#4254, z#4255], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
              +- LocalTableScan [x#4253, y#4254, z#4255]
```
You can see that the canonicalized plans are identical, even though the
two `InMemoryTableScan`s read different columns.
Canonicalized plans after:
First exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(1) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], output=[none#0, none#4])
  +- *(1) InMemoryTableScan [none#0, none#1]
        +- InMemoryRelation [none#0, none#1, none#2], true, 10000, StorageLevel(memory, 1 replicas)
              +- LocalTableScan [none#0, none#1, none#2]
```
Second exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(3) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], output=[none#0, none#4])
  +- *(3) InMemoryTableScan [none#0, none#2]
        +- InMemoryRelation [none#0, none#1, none#2], true, 10000, StorageLevel(memory, 1 replicas)
              +- LocalTableScan [none#0, none#1, none#2]
```
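To make the idea concrete, here is a self-contained toy sketch
(hypothetical types, not Spark's actual `QueryPlan` or
`InMemoryTableScanExec` internals) of how normalizing scan attributes
to their ordinals in the cached relation's full output keeps the two
scans distinguishable:
```scala
// Toy model of canonicalization (hypothetical names; not Spark internals).
case class Relation(output: Seq[String])

case class TableScan(columns: Seq[String], relation: Relation) {
  // Rewrite each referenced column to "none#<ordinal in the relation output>",
  // mirroring how the fix normalizes the scan's attributes against the
  // underlying cached relation's output rather than the scan's own output.
  def canonicalized: TableScan = {
    val byOrdinal = columns.map(c => s"none#${relation.output.indexOf(c)}")
    TableScan(byOrdinal, Relation(relation.output.indices.map(i => s"none#$i")))
  }
}

object CanonicalizationDemo extends App {
  val cached = Relation(Seq("x", "y", "z"))
  val scan1 = TableScan(Seq("x", "y"), cached) // group1 reads (x, y)
  val scan2 = TableScan(Seq("x", "z"), cached) // group2 reads (x, z)
  // With ordinal-based normalization the two scans no longer compare equal,
  // so the exchange-reuse rule cannot wrongly substitute one for the other.
  println(scan1.canonicalized == scan2.canonicalized) // prints: false
}
```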
## How was this patch tested?
Added unit test.
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #20831 from viirya/SPARK-23614.
(cherry picked from commit b2edc30db1dcc6102687d20c158a2700965fdf51)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 1d0d0a5fc7ee009443797feb48823eb215d1940a)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala (diff)
Commit 45761ceb241cdf48b858ff28034a049a0ac62ea3 by hyukjinkwon
[MINOR][R] Fix R lint failure
## What changes were proposed in this pull request?
The lint failure bugged me:
```R
R/SQLContext.R:715:97: style: Trailing whitespace is superfluous.
#'        file-based streaming data source. \code{timeZone} to indicate a timezone to be used to
                                                                                                ^
tests/fulltests/test_streaming.R:239:45: style: Commas should always have a space after.
expect_equal(times[order(times$eventTime),][1, 2], 2)
                                            ^
lintr checks failed.
```
and I actually saw this at
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-ubuntu-test/500/console
too. If I understood correctly, there is an ongoing attempt to move the
builds to the Ubuntu machines.
## How was this patch tested?
Manually tested by `./dev/lint-r`:
```
... lintr checks passed.
```
Author: hyukjinkwon <gurwls223@apache.org>
Closes #20879 from HyukjinKwon/minor-r-lint.
(cherry picked from commit 92e952557dbd8a170d66d615e25c6c6a8399dd43)
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
(commit: 45761ceb241cdf48b858ff28034a049a0ac62ea3)
The file was modified R/pkg/tests/fulltests/test_streaming.R (diff)
The file was modified R/pkg/R/SQLContext.R (diff)
Commit ce0fbec6857a09b095c492bb93d9ebaaa1e5c074 by hyukjinkwon
[SPARK-23769][CORE] Remove comments that unnecessarily disable Scalastyle check
## What changes were proposed in this pull request?
We re-enabled the Scalastyle checker on a line of code where it had
previously been disabled. The code does not violate any of the rules,
so there is no reason to suppress the checker there. A generic example
of the removed pattern is shown below.
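For context, Scalastyle is suppressed with paired comment directives;
this is a generic, hypothetical example of the pattern, not the actual
lines from this diff:
```scala
object ScalastyleDirectiveExample {
  def main(args: Array[String]): Unit = {
    // Directives like these silence a named rule for the enclosed region.
    // The change deletes such pairs where the code no longer trips the rule.
    // scalastyle:off println
    println("output that would otherwise trip the 'println' rule")
    // scalastyle:on println
  }
}
```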
## How was this patch tested?
We tested this by running `build/mvn scalastyle:check` after removing
the comments that disable the checker. This check passed with no errors
or warnings for Spark Core
```
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Core 2.4.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- scalastyle-maven-plugin:1.0.0:check (default-cli) @ spark-core_2.11 ---
Saving to outputFile=<path to local dir>/spark/core/target/scalastyle-output.xml
Processed 485 file(s)
Found 0 errors
Found 0 warnings
Found 0 infos
```
We did not run all tests (with `dev/run-tests`) since this Scalastyle
check seemed sufficient.
## Co-contributors: chialun-yeh Hrayo712 vpourquie
Author: arucard21 <arucard21@gmail.com>
Closes #20880 from arucard21/scalastyle_util.
(cherry picked from commit 6ac4fba69290e1c7de2c0a5863f224981dedb919)
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
(commit: ce0fbec6857a09b095c492bb93d9ebaaa1e5c074)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockReplicationPolicy.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/util/CompletionIterator.scala (diff)
Commit ea44783ad479ea7c66abc2c280f2a3abf2a4d3af by vanzin
[SPARK-23759][UI] Unable to bind Spark UI to specific host name / IP
## What changes were proposed in this pull request?
Fixes SPARK-23759 by moving connector.start() after connector.setHost().
The problem occurred because connector.setHost(hostName) was called
after connector.start(), so the requested bind address took no effect.
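A minimal sketch of the ordering involved (plain Jetty usage with
placeholder host and port values, not Spark's actual `JettyUtils` code):
```scala
import org.eclipse.jetty.server.{Server, ServerConnector}

object UiBindSketch {
  def main(args: Array[String]): Unit = {
    val server = new Server()
    val connector = new ServerConnector(server)
    // Set the bind address before the connector is started; a host
    // configured after start() is ignored, which is the behavior
    // SPARK-23759 reports.
    connector.setHost("192.168.1.10") // e.g. the value of SPARK_LOCAL_IP
    connector.setPort(4040)
    server.addConnector(connector)
    server.start() // opens the connector on the configured host and port
    server.join()
  }
}
```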
## How was this patch tested?
The patch was tested after build and deployment. It requires the
SPARK_LOCAL_IP environment variable to be set in spark-env.sh.
Author: bag_of_tricks <falbani@hortonworks.com>
Closes #20883 from felixalbani/SPARK-23759.
(cherry picked from commit 8b56f16640fc4156aa7bd529c54469d27635b951)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(commit: ea44783ad479ea7c66abc2c280f2a3abf2a4d3af)
The file was modified core/src/main/scala/org/apache/spark/ui/JettyUtils.scala (diff)