SuccessChanges

Summary

  1. [SPARK-23408][SS][BRANCH-2.3] Synchronize successive AddData actions in (commit: abce8460a89ed1557d82100b68724c5ddac9b230) (details)
  2. [SPARK-23491][SS] Remove explicit job cancellation from (commit: 7f13fd0c5a79ab21c4ace2445127e6c69a7f745c) (details)
  3. [SPARK-23416][SS] Add a specific stop method for ContinuousExecution. (commit: 55d5a19c8e01de945c4c9e42752ed132df4b9110) (details)
Commit abce8460a89ed1557d82100b68724c5ddac9b230 by sean.owen
[SPARK-23408][SS][BRANCH-2.3] Synchronize successive AddData actions in
Streaming*JoinSuite
## What changes were proposed in this pull request?
**The best way to review this PR is to ignore whitespace/indent changes.
Use this link - https://github.com/apache/spark/pull/20650/files?w=1**
The stream-stream join tests add data to multiple sources and expect it
all to show up in the next batch. But there's a race condition; the new
batch might trigger when only one of the AddData actions has been
reached.
A prior attempt to solve this issue by jose-torres in #20646 tried to
synchronize on all memory sources together whenever consecutive AddData
actions were found. However, this carries the risk of deadlock as well
as unintended modification of the stress tests (see the above PR for a
detailed explanation). Instead, this PR does the following.
- A new action called `StreamProgressBlockedActions` that allows
multiple actions to be executed while the streaming query is blocked
from making progress. This lets data be added to multiple sources so
that it becomes visible simultaneously in the next batch.
- An alias of `StreamProgressBlockedActions` called `MultiAddData` is
explicitly used in the `Streaming*JoinSuites` to add data to two memory
sources simultaneously.
This should avoid unintentional modification of the stress tests (or any
other test for that matter) while making sure that the flaky tests are
deterministic.
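For illustration, here is a hedged sketch of how a test might use
`MultiAddData` (it assumes a suite that mixes in `StreamTest`; the exact
`MultiAddData(...)(...)` call shape and the sample values are assumptions
for illustration, not quoted from the patch):

```scala
// Sketch only: two memory sources feeding a stream-stream inner join.
// testStream, MemoryStream, MultiAddData and CheckAnswer come from the
// StreamTest harness; testImplicits supplies the encoders and $-syntax.
import testImplicits._

val input1 = MemoryStream[Int]
val input2 = MemoryStream[Int]
val left   = input1.toDF().select($"value" as "key", ($"value" * 2) as "leftValue")
val right  = input2.toDF().select($"value" as "key", ($"value" * 3) as "rightValue")
val joined = left.join(right, "key")

testStream(joined)(
  // One MultiAddData action replaces two back-to-back AddData actions, so the
  // next batch cannot fire until both sources have received their rows.
  MultiAddData(input1, 1, 2, 3)(input2, 3, 4, 5),
  CheckAnswer((3, 6, 9))
)
```

This mirrors the change described above: consecutive `AddData` actions in
the join suites become a single blocked-progress action.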
NOTE: This patch is modified slightly from the original PR (#20650) to
cover a DSv2 incompatibility between Spark 2.3 and 2.4:
StreamingDataSourceV2Relation is a plain class in 2.3, whereas it is a
case class in 2.4.
## How was this patch tested?
Modified test cases in `Streaming*JoinSuites` where there are
consecutive `AddData` actions.
Closes #23757 from
HeartSaVioR/fix-streaming-join-test-flakiness-branch-2.3.
Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com>
Co-authored-by: Tathagata Das <tathagata.das1565@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
(commit: abce8460a89ed1557d82100b68724c5ddac9b230)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala (diff)
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamTest.scala (diff)
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala (diff)
Commit 7f13fd0c5a79ab21c4ace2445127e6c69a7f745c by yamamuro
[SPARK-23491][SS] Remove explicit job cancellation from
ContinuousExecution reconfiguring
## What changes were proposed in this pull request?
Remove queryExecutionThread.interrupt() from ContinuousExecution. As
detailed in the JIRA, interrupting the thread is only relevant in the
microbatch case; for continuous processing the query execution can
quickly clean itself up without it.
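To illustrate that reasoning, here is a hedged, standalone Scala sketch
(not the actual ContinuousExecution code) of cooperative shutdown: the
coordinator flips a shared flag instead of interrupting the worker
thread, and the processing loop notices the change and exits on its own.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Sketch only: a stand-in for the continuous-processing loop that cleans
// itself up once it observes a reconfiguration request, with no
// Thread.interrupt() (and thus no InterruptedException handling) involved.
object ReconfigureSketch {
  def main(args: Array[String]): Unit = {
    val reconfiguring = new AtomicBoolean(false)

    val worker = new Thread(new Runnable {
      override def run(): Unit = {
        while (!reconfiguring.get()) {
          // ... run one epoch of the query ...
          Thread.sleep(50)
        }
        // Shut down sources here; the loop exits of its own accord.
        println("worker observed the reconfiguration request and exited cleanly")
      }
    })
    worker.start()

    Thread.sleep(200)
    reconfiguring.set(true) // previously this point would also interrupt the thread
    worker.join()
  }
}
```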
## How was this patch tested?
Existing tests.
Author: Jose Torres <jose@databricks.com>
Closes #20622 from jose-torres/SPARK-23441.
(commit: 7f13fd0c5a79ab21c4ace2445127e6c69a7f745c)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala (diff)
Commit 55d5a19c8e01de945c4c9e42752ed132df4b9110 by yamamuro
[SPARK-23416][SS] Add a specific stop method for ContinuousExecution.
## What changes were proposed in this pull request?
Add a specific stop method for ContinuousExecution. The previous
StreamExecution.stop() method had a race condition as applied to
continuous processing: if the cancellation was round-tripped to the
driver too quickly, the generic SparkException it caused would be
reported as the cause of the query's death. We earlier decided that
SparkException should not be added to the
StreamExecution.isInterruptionException() whitelist, so we need to
ensure this situation never happens instead.
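As a hedged illustration of the ordering involved (standalone Scala,
simplified, not the actual StreamExecution/ContinuousExecution code; the
names State, reportFailure and stop are invented for this example), the
key point is to record the terminal state before triggering cancellation,
so any exception the cancellation produces is attributed to the
deliberate stop rather than reported as the query's death cause.

```scala
import java.util.concurrent.atomic.AtomicReference

// Sketch only: classify exceptions differently once the query has been
// deliberately stopped.
object StopOrderingSketch {
  sealed trait State
  case object Active extends State
  case object Terminated extends State

  private val state = new AtomicReference[State](Active)

  // Stand-in for the death-cause reporting: after Terminated, cancellation
  // errors are ignored instead of becoming the query's failure cause.
  def reportFailure(e: Throwable): Unit =
    if (state.get() == Terminated) println(s"ignored shutdown-time error: ${e.getMessage}")
    else println(s"query died with cause: ${e.getMessage}")

  def stop(): Unit = {
    state.set(Terminated) // 1. mark the stop first
    // 2. only then cancel jobs / interrupt; a cancellation error arriving
    //    now cannot win the race and be misreported as the death cause.
    reportFailure(new RuntimeException("job group cancelled"))
  }

  def main(args: Array[String]): Unit = stop()
}
```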
## How was this patch tested?
Existing tests. I could consistently reproduce the previous flakiness by
putting Thread.sleep(1000) between the first job cancellation and thread
interruption in StreamExecution.stop().
Author: Jose Torres <torres.joseph.f+github@gmail.com>
Closes #21384 from jose-torres/fixKafka.
(commit: 55d5a19c8e01de945c4c9e42752ed132df4b9110)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala (diff)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala (diff)
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala (diff)