SuccessChanges

Summary

  1. [SPARK-23835][SQL] Add not-null check to Tuples' arguments deserialization (commit: 9857e249c20f842868cb4681ea374b8e316c3ead) (details)
  2. [SPARK-23986][SQL] freshName can generate non-unique names (commit: 564019b926eddec39d0485217e693a1e6b4b8e14) (details)
  3. [SPARK-23948] Trigger mapstage's job listener in submitMissingTasks (commit: 6b99d5bc3f3898a0aff30468a623a3f64bb20b62) (details)
Commit 9857e249c20f842868cb4681ea374b8e316c3ead by wenchen
[SPARK-23835][SQL] Add not-null check to Tuples' arguments deserialization
## What changes were proposed in this pull request?
There was no nullability check on the arguments of `Tuple`s. This could
lead to weird behavior when a null value had to be deserialized into a
non-nullable Scala object: in those cases, the `null` was silently
transformed into a valid value (like `-1` for `Int`), corresponding to
the default value used in the SQL codebase. This situation was very
likely to happen when deserializing to a Tuple of primitive Scala types
(like Double, Int, ...).
The PR adds `AssertNotNull` to the arguments of tuples that have been
asked to be converted to non-nullable types.
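For illustration, here is a minimal sketch of the behavior being fixed, assuming a local `SparkSession`; the object and app names below are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object TupleNullSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("TupleNullSketch")
      .getOrCreate()
    import spark.implicits._

    // One row whose second column is NULL.
    val df = spark.sql("SELECT 1 AS a, CAST(NULL AS INT) AS b")

    // Deserialize into a tuple of non-nullable primitives. Before this
    // patch, the NULL was silently turned into Int's default value (-1);
    // with AssertNotNull in place, collect() fails instead of returning
    // a corrupted value.
    df.as[(Int, Int)].collect().foreach(println)

    spark.stop()
  }
}
```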
## How was this patch tested?
added UT
Author: Marco Gaido <marcogaido91@gmail.com>
Closes #20976 from mgaido91/SPARK-23835.
(cherry picked from commit 0a9172a05e604a4a94adbb9208c8c02362afca00)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 9857e249c20f842868cb4681ea374b8e316c3ead)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (diff)
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousSinkSuite.scala (diff)
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/ScalaReflectionSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala (diff)
Commit 564019b926eddec39d0485217e693a1e6b4b8e14 by wenchen
[SPARK-23986][SQL] freshName can generate non-unique names
## What changes were proposed in this pull request?
We use `CodegenContext.freshName` to obtain a unique name for any new
variable we add. Unfortunately, this method currently fails to create a
unique name when we request more than one variable with the base name
`name1` together with a variable with the base name `name11`: the
counter appended to the second request for `name1` yields `name11`,
which collides with the other variable. The PR changes the way
`CodegenContext.freshName` generates a new name so that unique names
are produced in this scenario too.
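A short sketch of the collision, assuming the pre-fix counter behavior of `freshName` (the generated names in the comments are illustrative):

```scala
import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext

object FreshNameSketch {
  def main(args: Array[String]): Unit = {
    val ctx = new CodegenContext
    val a = ctx.freshName("name1")  // "name1"
    val b = ctx.freshName("name1")  // pre-fix: "name11" (counter appended)
    val c = ctx.freshName("name11") // pre-fix: "name11" -- same as b
    // All three names should be distinct; before the fix this prints false.
    println(Seq(a, b, c).distinct.size == 3)
  }
}
```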
## How was this patch tested?
added UT
Author: Marco Gaido <marcogaido91@gmail.com>
Closes #21080 from mgaido91/SPARK-23986.
(cherry picked from commit f39e82ce150b6a7ea038e6858ba7adbaba3cad88)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 564019b926eddec39d0485217e693a1e6b4b8e14)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (diff)
Commit 6b99d5bc3f3898a0aff30468a623a3f64bb20b62 by irashid
[SPARK-23948] Trigger mapstage's job listener in submitMissingTasks
## What changes were proposed in this pull request?
When SparkContext submits a map stage via `submitMapStage` to the
`DAGScheduler`, `markMapStageJobAsFinished` is called in only two places
(https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L933
and
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1314).
But consider the following scenario:
1. stage0 and stage1 are both `ShuffleMapStage`s, and stage1 depends on stage0;
2. we submit stage1 via `submitMapStage`;
3. while stage1 is running, a `FetchFailed` occurs, and stage0 and stage1 are resubmitted as stage0_1 and stage1_1;
4. while stage0_1 is running, speculated tasks from the old stage1 come back as succeeded, but stage1 is no longer in `runningStages`; so even though all splits (including the speculated tasks) of stage1 succeeded, stage1's job listener is never called;
5. stage0_1 finishes and stage1_1 starts running; in `submitMissingTasks` there are no missing tasks, but with the current code the job listener is still not triggered.
We should call the job listener for the map stage in step 5, as sketched below.
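Below is a hedged, self-contained sketch of the control flow this change adds; all types and helper names in it are stand-ins, not Spark's real `DAGScheduler` internals:

```scala
// Stand-in stage types for illustration only.
sealed trait Stage
final case class ShuffleMapStage(id: Int) extends Stage
final case class ResultStage(id: Int) extends Stage

object SubmitMissingTasksSketch {
  // Stand-in for notifying the listeners of map-stage jobs registered
  // on this stage.
  def markMapStageJobAsFinished(stage: ShuffleMapStage): Unit =
    println(s"map-stage job listener fired for stage ${stage.id}")

  def submitMissingTasks(stage: Stage, missingTasks: Seq[Int]): Unit = {
    if (missingTasks.nonEmpty) {
      // ... submit the missing tasks, as before ...
    } else {
      // No missing tasks: the stage is already complete, so notify any
      // map-stage job listeners waiting on it -- the call that was
      // missing in step 5 of the scenario above.
      stage match {
        case s: ShuffleMapStage => markMapStageJobAsFinished(s)
        case _: ResultStage     => // result-stage jobs complete elsewhere
      }
    }
  }

  def main(args: Array[String]): Unit =
    submitMissingTasks(ShuffleMapStage(1), Seq.empty) // step 5: no tasks
}
```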
## How was this patch tested?
Not added yet.
Author: jinxing <jinxing6042@126.com>
Closes #21085 from squito/cp.
(cherry picked from commit 3990daaf3b6ca2c5a9f7790030096262efb12cb2)
(commit: 6b99d5bc3f3898a0aff30468a623a3f64bb20b62)
The file was modified core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala (diff)