SuccessChanges

Summary

  1. [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky (commit: f87785a76f443bcf996c60d190afc29aa2e3b6e4) (details)
  2. [SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not reduce starting (commit: 3a22feab4dc9f0cffe3aaec692e27ab277666507) (details)
Commit f87785a76f443bcf996c60d190afc29aa2e3b6e4 by wenchen
[SPARK-23775][TEST] Make DataFrameRangeSuite not flaky
## What changes were proposed in this pull request?
DataFrameRangeSuite.test("Cancelling stage in a query with Range.")
sometimes gets stuck in an infinite loop and times out the build.
There were multiple issues with the test:
1. The first valid stageId is zero when the test is run alone rather than
as part of a suite, so the following code waits until it times out:
```
eventually(timeout(10.seconds), interval(1.millis)) {
  assert(DataFrameRangeSuite.stageToKill > 0)
}
```
2. `DataFrameRangeSuite.stageToKill` was overwritten by the task's thread
after the reset, which ended up cancelling the same stage twice. This
caused the infinite wait.
This PR fixes the flakiness by removing the shared
`DataFrameRangeSuite.stageToKill` and using `onTaskStart`, where the stage
ID is provided. To make sure `cancelStage` is called for all stages,
`waitUntilEmpty` is called on the `ListenerBus`, as sketched below.
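As a rough illustration of that approach (a minimal sketch, not the actual patch: the session setup, object name, and the range query below are assumptions made only to keep the example self-contained and runnable):
```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskStart}
import org.apache.spark.sql.SparkSession

object CancelStageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("cancel-stage-sketch")
      .getOrCreate()
    val sc = spark.sparkContext
    import spark.implicits._

    // The stage ID comes straight from the onTaskStart event, so no shared
    // mutable field (like the old DataFrameRangeSuite.stageToKill) is needed.
    sc.addSparkListener(new SparkListener {
      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
        sc.cancelStage(taskStart.stageId)
      }
    })

    // Run a query over a huge Range; the listener cancels its stage as soon
    // as the first task starts, so the job fails quickly instead of running on.
    try {
      spark.range(0, 100000000000L, 1, 2).map(_ + 1L).collect()
    } catch {
      case e: Exception => println(s"Job was cancelled: ${e.getMessage}")
    }

    // In the suite itself, the package-private sc.listenerBus.waitUntilEmpty
    // is additionally used to drain pending events, so every cancelStage call
    // has been delivered before the next test starts.
    spark.stop()
  }
}
```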
An earlier attempt, [PR20888](https://github.com/apache/spark/pull/20888),
tried to solve this by:
* stopping the executor thread with `wait`
* waiting for all `cancelStage` calls
* killing the executor thread by setting
`SparkContext.SPARK_JOB_INTERRUPT_ON_CANCEL`
but killing the thread sometimes left the shared `SparkContext` in a state
where further jobs could not be submitted. As a result,
DataFrameRangeSuite.test("Cancelling stage in a query with Range.") passed,
but the next test in the suite hung.
## How was this patch tested?
The existing unit test was executed 10k times.
Author: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Closes #21214 from gaborgsomogyi/SPARK-23775_1.
(cherry picked from commit c5981976f1d514a3ad8a684b9a21cebe38b786fa)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: f87785a76f443bcf996c60d190afc29aa2e3b6e4)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala (diff)
Commit 3a22feab4dc9f0cffe3aaec692e27ab277666507 by ybliang8
[SPARK-23291][SQL][R][BRANCH-2.3] R's substr should not reduce starting
position by 1 when calling Scala API
## What changes were proposed in this pull request?
This PR backports
https://github.com/apache/spark/commit/24b5c69ee3feded439e5bb6390e4b63f503eeafe
and https://github.com/apache/spark/pull/21249.
There was no conflict, but I opened this PR to run the tests and make sure.
See the discussion in https://issues.apache.org/jira/browse/SPARK-23291
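For context on the underlying semantics, here is a minimal, hypothetical Scala illustration (spark-shell style; the session setup and sample data are assumptions, not the backported change itself). Spark's Scala `Column.substr` is 1-based, so the R front end should pass the user's start position through unchanged instead of subtracting 1 from it:
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[1]").appName("substr-demo").getOrCreate()
import spark.implicits._

val df = Seq("abcdef").toDF("s")

// Column.substr(startPos, len) in the Scala API is 1-based: this starts at
// the 2nd character of "abcdef" and takes 3 characters, yielding "bcd".
// The bug was that SparkR reduced the start position by 1 before calling
// this API, shifting the result by one character.
df.select(col("s").substr(2, 3)).show()
```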
## How was this patch tested?
Jenkins tests.
Author: hyukjinkwon <gurwls223@apache.org>
Author: Liang-Chi Hsieh <viirya@gmail.com>
Closes #21250 from HyukjinKwon/SPARK-23291-backport.
(commit: 3a22feab4dc9f0cffe3aaec692e27ab277666507)
The file was modified docs/sparkr.md (diff)
The file was modified R/pkg/tests/fulltests/test_sparkSQL.R (diff)
The file was modified R/pkg/R/column.R (diff)