1. [SPARK-24014][PYSPARK] Add onStreamingStarted method to (commit: 32bec6ca3d9e47587c84f928d4166475fe29f596) (details)
  2. [SPARK-24021][CORE] fix bug in BlacklistTracker's (commit: 7fb11176f285b4de47e61511c09acbbb79e5c44c) (details)
  3. [SPARK-23989][SQL] exchange should copy data before non-serialized (commit: fb968215ca014c5cf40097a3c4588bbee11e2c02) (details)
  4. [SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4.3 (commit: be184d16e86f96a748d6bf1642c1c319d2a09f5c) (details)
  5. [SPARK-24022][TEST] Make SparkContextSuite not flaky (commit: 9b562d6fef765cb8357dbc31390e60b5947a9069) (details)
Commit 32bec6ca3d9e47587c84f928d4166475fe29f596 by jerryshao
[SPARK-24014][PYSPARK] Add onStreamingStarted method to
## What changes were proposed in this pull request?
The `StreamingListener` on the PySpark side lacks the
`onStreamingStarted` method. This patch adds it, along with a test for it.
This patch also includes a trivial doc improvement for
Original PR is #21057.
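A minimal plain-Python sketch of the listener pattern this patch extends (no Spark required; the dispatcher call and class here are illustrative stand-ins, not the actual PySpark implementation):

```python
# Illustrative analogue of the PySpark StreamingListener callback pattern.
# The real base class lives in pyspark.streaming.listener; this patch adds
# an onStreamingStarted hook alongside the existing batch callbacks.

class StreamingListener:
    """Base listener: subclasses override only the callbacks they need."""

    def onStreamingStarted(self, streamingStarted):
        pass  # the newly added no-op hook

    def onBatchCompleted(self, batchCompleted):
        pass


class MyListener(StreamingListener):
    def __init__(self):
        self.started = False

    def onStreamingStarted(self, streamingStarted):
        # invoked once when the streaming context starts
        self.started = True


# A driver-side dispatcher would fire the callback on context start:
listener = MyListener()
listener.onStreamingStarted(None)
assert listener.started
```

Subclasses that don't care about the start event inherit the no-op default, so existing listeners keep working unchanged.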
## How was this patch tested?
Added test.
Author: Liang-Chi Hsieh <>
Closes #21098 from viirya/SPARK-24014.
(cherry picked from commit 8bb0df2c65355dfdcd28e362ff661c6c7ebc99c0)
Signed-off-by: jerryshao <>
(commit: 32bec6ca3d9e47587c84f928d4166475fe29f596)
The file was modified python/pyspark/streaming/ (diff)
The file was modified python/pyspark/streaming/ (diff)
The file was modified python/pyspark/streaming/ (diff)
Commit 7fb11176f285b4de47e61511c09acbbb79e5c44c by irashid
[SPARK-24021][CORE] fix bug in BlacklistTracker's
## What changes were proposed in this pull request?
There's a typo in BlacklistTracker's updateBlacklistForFetchFailure:
```scala
val blacklistedExecsOnNode =
  nodeToBlacklistedExecs.getOrElseUpdate(exec, HashSet[String]())
blacklistedExecsOnNode += exec
```
where the first **exec** should be **host**.
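The effect of the wrong key can be shown with a small Python analogue, using `dict.setdefault` in place of Scala's `getOrElseUpdate` (the function names here are illustrative, not Spark's actual code):

```python
# Demonstrates the bug: keying the node-level map by the executor id
# instead of the host means the per-host blacklist entry is never created.

node_to_blacklisted_execs = {}

def blacklist_exec_buggy(host, exec_id):
    # bug: the lookup key should be `host`, not `exec_id`
    execs = node_to_blacklisted_execs.setdefault(exec_id, set())
    execs.add(exec_id)

def blacklist_exec_fixed(host, exec_id):
    execs = node_to_blacklisted_execs.setdefault(host, set())
    execs.add(exec_id)

blacklist_exec_buggy("hostA", "exec-1")
print(node_to_blacklisted_execs.get("hostA"))  # None: host entry missing

node_to_blacklisted_execs.clear()
blacklist_exec_fixed("hostA", "exec-1")
print(node_to_blacklisted_execs["hostA"])  # {'exec-1'}
```

With the buggy key, later host-level blacklist decisions never see the failed executors, which is exactly what the adjusted test checks for.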
## How was this patch tested?
Adjusted an existing test.
Author: wuyi <>
Closes #21104 from Ngone51/SPARK-24021.
(cherry picked from commit 0deaa5251326a32a3d2d2b8851193ca926303972)
Signed-off-by: Imran Rashid <>
(commit: 7fb11176f285b4de47e61511c09acbbb79e5c44c)
The file was modified core/src/test/scala/org/apache/spark/scheduler/BlacklistTrackerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/BlacklistTracker.scala (diff)
Commit fb968215ca014c5cf40097a3c4588bbee11e2c02 by hvanhovell
[SPARK-23989][SQL] exchange should copy data before non-serialized
## What changes were proposed in this pull request?
In Spark SQL, we usually reuse the `UnsafeRow` instance, so we need to
copy the data whenever a place buffers non-serialized objects.
The shuffle may buffer objects if we don't make it to the bypass-merge
shuffle or the unsafe shuffle.
`ShuffleExchangeExec.needToCopyObjectsBeforeShuffle` misses the case
where, if `spark.sql.shuffle.partitions` is large enough, we could fail
to run the unsafe shuffle and fall back to the non-serialized shuffle.
This bug is very hard to hit, since users wouldn't set such a large
number of partitions (16 million) for a Spark SQL exchange.
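Why buffering a reused mutable row without copying corrupts results can be sketched in plain Python (an analogue of the row-reuse pattern, not Spark's actual code):

```python
# If the shuffle buffers the reused row object itself rather than a copy,
# every buffered "row" ends up reflecting only the last value written.

class ReusedRow:
    def __init__(self):
        self.value = None

def produce(values, buffer, copy_before_buffer):
    reused = ReusedRow()              # one instance reused for every row
    for v in values:
        reused.value = v
        if copy_before_buffer:
            snapshot = ReusedRow()    # independent copy: safe to buffer
            snapshot.value = reused.value
            buffer.append(snapshot)
        else:
            buffer.append(reused)     # unsafe: same object appended N times

buf = []
produce([1, 2, 3], buf, copy_before_buffer=False)
print([r.value for r in buf])  # [3, 3, 3] -- corrupted

buf = []
produce([1, 2, 3], buf, copy_before_buffer=True)
print([r.value for r in buf])  # [1, 2, 3]
```

This is the hazard `needToCopyObjectsBeforeShuffle` exists to guard against; the fix makes it return true in the missed fallback case.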
TODO: test
## How was this patch tested?
Author: Wenchen Fan <>
Closes #21101 from cloud-fan/shuffle.
(cherry picked from commit 6e19f7683fc73fabe7cdaac4eb1982d2e3e607b7)
Signed-off-by: Herman van Hovell <>
(commit: fb968215ca014c5cf40097a3c4588bbee11e2c02)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala (diff)
Commit be184d16e86f96a748d6bf1642c1c319d2a09f5c by gatorsmile
[SPARK-23340][SQL][BRANCH-2.3] Upgrade Apache ORC to 1.4.3
## What changes were proposed in this pull request?
This PR updates the Apache ORC dependencies to 1.4.3, released on
February 9th. The Apache ORC 1.4.2 release removes unnecessary
dependencies, and 1.4.3 has 5 more patches (
In particular, the following ORC-285 issue is fixed in 1.4.3.
```scala
scala> val df = Seq(Array.empty[Float]).toDF()

scala> df.write.format("orc").save("/tmp/floatarray")

scala>"/tmp/floatarray")
res1: org.apache.spark.sql.DataFrame = [value: array<float>]

scala>"/tmp/floatarray").show()
18/02/12 22:09:10 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
Error reading file: ...
Caused by: Read past EOF for compressed stream Stream for column 2 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
```
## How was this patch tested?
Pass the Jenkins test.
Author: Dongjoon Hyun <>
Closes #21093 from dongjoon-hyun/SPARK-23340-2.
(commit: be184d16e86f96a748d6bf1642c1c319d2a09f5c)
The file was modified dev/deps/spark-deps-hadoop-2.7 (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala (diff)
The file was modified pom.xml (diff)
The file was modified dev/deps/spark-deps-hadoop-2.6 (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcQuerySuite.scala (diff)
Commit 9b562d6fef765cb8357dbc31390e60b5947a9069 by vanzin
[SPARK-24022][TEST] Make SparkContextSuite not flaky
## What changes were proposed in this pull request?
SparkContextSuite.test("Cancelling stages/jobs with custom reasons.")
could stay in an infinite loop because of the problem found and fixed in
This PR fixes the mentioned flakiness by removing shared-variable usage
when the cancel happens in a loop, and by using wait and CountDownLatch
for synchronization.
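The synchronization idea can be sketched in plain Python with a small latch (an illustrative analogue of Java's `java.util.concurrent.CountDownLatch`, not the test's actual code):

```python
# Instead of spinning on a shared flag (which can loop forever if an
# update is missed), the test thread blocks on a latch that the worker
# counts down exactly once when its work is done.
import threading

class CountDownLatch:
    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def count_down(self):
        with self._cond:
            self._count -= 1
            if self._count <= 0:
                self._cond.notify_all()

    def await_(self, timeout=None):
        # returns True once the count reaches zero, False on timeout
        with self._cond:
            return self._cond.wait_for(lambda: self._count <= 0, timeout)

latch = CountDownLatch(1)

def worker():
    # ... perform the cancellable work, then signal completion
    latch.count_down()

t = threading.Thread(target=worker)
t.start()
assert latch.await_(timeout=5)  # bounded wait: no busy loop, no hang
t.join()
```

A bounded `await_` also gives the test a clean failure mode (timeout) rather than an infinite loop when the signal never arrives.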
## How was this patch tested?
Existing unit test.
Author: Gabor Somogyi <>
Closes #21105 from gaborgsomogyi/SPARK-24022.
(cherry picked from commit e55953b0bf2a80b34127ba123417ee54955a6064)
Signed-off-by: Marcelo Vanzin <>
(commit: 9b562d6fef765cb8357dbc31390e60b5947a9069)
The file was modified core/src/test/scala/org/apache/spark/SparkContextSuite.scala (diff)