Success

Changes

Summary

  1. [SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE (commit: 8b7098053d207a6f6ab823c21498b52626acbb73)
  2. [SPARK-27065][CORE] avoid more than one active task set managers for a stage (commit: 877b8db25b70ffb0793a619342e7b8edda712b31)
  3. [SPARK-25863][SPARK-21871][SQL] Check if code size statistics is empty or not in updateAndGetCompilationStats (commit: dfde0c6501637cce4704ee0edd146a73f9119305)
Commit 8b7098053d207a6f6ab823c21498b52626acbb73 by dhyun
[SPARK-24669][SQL] Invalidate tables in case of DROP DATABASE CASCADE
## What changes were proposed in this pull request?
Before dropping a database, refresh the tables of that database so that all
cached entries associated with those tables are refreshed. The same is
already done when dropping a single table.
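The idea can be illustrated with a small toy catalog. This is a sketch under stated assumptions: `ToyCatalog`, `TableId`, `refreshTable` and the string-valued cache are hypothetical stand-ins, not Spark's `SessionCatalog` API. It only shows the ordering the change relies on: a cascading drop refreshes every table of the database first, just as a single-table drop does, so no cached entry outlives the database.

```scala
import scala.collection.mutable

// Hypothetical toy model, not Spark's SessionCatalog.
final case class TableId(db: String, table: String)

class ToyCatalog {
  private val tablesByDb = mutable.Map.empty[String, mutable.Set[String]]
  // Stand-in for cached relations/plans keyed by table.
  val cache = mutable.Map.empty[TableId, String]

  def createDatabase(db: String): Unit =
    if (!tablesByDb.contains(db)) tablesByDb(db) = mutable.Set.empty[String]

  def createTable(db: String, table: String): Unit = {
    tablesByDb(db) += table
    cache(TableId(db, table)) = s"cached entry for $db.$table"
  }

  // Drop any cached state associated with one table.
  def refreshTable(db: String, table: String): Unit =
    cache.remove(TableId(db, table))

  // Single-table drop: refresh first, then remove the table.
  def dropTable(db: String, table: String): Unit = {
    refreshTable(db, table)
    tablesByDb(db) -= table
  }

  def dropDatabase(db: String, cascade: Boolean): Unit = {
    val tables = tablesByDb.getOrElse(db, mutable.Set.empty[String]).toList
    if (tables.nonEmpty && !cascade)
      throw new IllegalStateException(s"Database $db is not empty")
    // Refresh every table before the database disappears.
    tables.foreach(t => refreshTable(db, t))
    tablesByDb.remove(db)
  }
}

val catalog = new ToyCatalog
catalog.createDatabase("db1")
catalog.createTable("db1", "t1")
catalog.createTable("db1", "t2")
catalog.dropDatabase("db1", cascade = true)
println(catalog.cache.isEmpty) // true
```

Without the refresh loop in `dropDatabase`, the entries for `db1.t1` and `db1.t2` would stay in the cache after the database is gone.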
## How was this patch tested?
A unit test is added.
Closes #23905 from Udbhav30/SPARK-24669.
Authored-by: Udbhav30 <u.agrawal30@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 9bddf7180e9e76e1cabc580eee23962dd66f84c3)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(commit: 8b7098053d207a6f6ab823c21498b52626acbb73)
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
Commit 877b8db25b70ffb0793a619342e7b8edda712b31 by irashid
[SPARK-27065][CORE] avoid more than one active task set managers for a stage
## What changes were proposed in this pull request?
This is another attempt to fix the
more-than-one-active-task-set-managers bug.
https://github.com/apache/spark/pull/17208 was the first attempt. It
marks the task set manager (TSM) as zombie before sending a task
completion event to DAGScheduler. This is necessary because, when
DAGScheduler gets the task completion event for the last partition, the
stage is finished. However, if it's a shuffle stage with missing map
outputs, DAGScheduler will resubmit it (see the
[code](https://github.com/apache/spark/blob/v2.4.0/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1416-L1422))
and create a new TSM for this stage. This leads to more than one active
TSM for the stage, which causes a failure.
This fix has a hole. Say a stage has 10 partitions and 2 task set
managers: TSM1 (zombie) and TSM2 (active). TSM1 has a running task for
partition 10, and it completes. TSM2 finishes tasks for partitions 1-9
and thinks it is still active because it hasn't finished partition 10
yet. However, DAGScheduler gets task completion events for all 10
partitions and thinks the stage is finished. Then the same problem
occurs: DAGScheduler may resubmit the stage and cause the
more-than-one-active-TSM error.
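The scenario can be replayed in a toy simulation. All names here (`ToyTsm`, `ownTaskFinished`, `finishedPartitions`) are hypothetical and this is not Spark's `TaskSetManager`; the sketch assumes each manager learns only about tasks it launched itself, while a single set stands in for DAGScheduler's view of the stage.

```scala
import scala.collection.mutable

// Hypothetical toy model, not Spark's TaskSetManager: each manager only
// learns about completions of tasks it launched itself.
class ToyTsm(numPartitions: Int) {
  var isZombie = false
  val pending: mutable.Set[Int] = mutable.Set.empty[Int] ++= (1 to numPartitions)

  def ownTaskFinished(partition: Int): Unit = {
    pending -= partition
    if (pending.isEmpty) isZombie = true
  }
}

// Stand-in for DAGScheduler's view: it hears about completions from every manager.
val finishedPartitions = mutable.Set.empty[Int]

val tsm1 = new ToyTsm(10)
tsm1.isZombie = true // TSM1 is already a zombie
val tsm2 = new ToyTsm(10)

// TSM1's still-running task for partition 10 completes.
tsm1.ownTaskFinished(10)
finishedPartitions += 10

// TSM2 finishes partitions 1-9; it never hears about partition 10.
(1 to 9).foreach { p =>
  tsm2.ownTaskFinished(p)
  finishedPartitions += p
}

val stageFinished = finishedPartitions.size == 10
// The stage looks finished, yet TSM2 still considers itself active.
println(s"stage finished: $stageFinished, TSM2 active: ${!tsm2.isZombie}")
```

If the stage is now resubmitted, a new manager is created while TSM2 is still active, which is exactly the error described above.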
https://github.com/apache/spark/pull/21131 fixed this hole by notifying
all the task set managers when a task finishes. In the case above, TSM2
learns that partition 10 is already completed, so it can mark itself as
zombie once partitions 1-9 are completed.
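A toy sketch of this notification scheme follows. `ToyTsm`, `ToyScheduler`, `register` and `taskFinished` are hypothetical names, not Spark's classes or methods; the sketch assumes only what the paragraph states, that every manager of the stage is told about every finished partition.

```scala
import scala.collection.mutable

// Hypothetical toy model of the #21131 idea, not Spark's real classes.
class ToyTsm(numPartitions: Int) {
  var isZombie = false
  private val pending: mutable.Set[Int] = mutable.Set.empty[Int] ++= (1 to numPartitions)

  def markPartitionCompleted(partition: Int): Unit = {
    pending -= partition
    // Once every partition is done (by any manager), stop scheduling tasks.
    if (pending.isEmpty) isZombie = true
  }
}

class ToyScheduler {
  val managers = mutable.ArrayBuffer.empty[ToyTsm]

  def register(tsm: ToyTsm): ToyTsm = { managers += tsm; tsm }

  // Broadcast a completion to all managers of the stage.
  def taskFinished(partition: Int): Unit =
    managers.foreach(_.markPartitionCompleted(partition))
}

val sched = new ToyScheduler
val tsm1 = sched.register(new ToyTsm(10))
tsm1.isZombie = true
val tsm2 = sched.register(new ToyTsm(10))

sched.taskFinished(10)               // TSM1's task for partition 10 completes
(1 to 9).foreach(sched.taskFinished) // TSM2's tasks for partitions 1-9
println(s"TSM2 zombie: ${tsm2.isZombie}") // TSM2 zombie: true
```

Note that the broadcast only reaches managers registered when the task finishes; a manager registered afterwards never sees partition 10, which is the remaining hole.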
However, #21131 still has a hole: TSM2 may be created after the task
from TSM1 is completed. Then TSM2 never gets notified about the task
completion, which leads to the more-than-one-active-TSM error.
#22806 and #23871 were created to fix this hole. However, the fix is
complicated and discussions are still ongoing.
This PR proposes a simple fix that is easy to backport: mark all
existing task set managers of the stage as zombie when creating a new
task set manager for it.
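A minimal sketch of this rule, assuming a scheduler that groups managers by stage. `ToyScheduler`, `submitTasks`, `activeManagers` and `ToyTsm` are hypothetical names for illustration, not the actual `TaskSchedulerImpl` code. Because earlier managers are retired at the moment a new one is registered, at most one manager per stage is ever active, no matter when earlier tasks complete.

```scala
import scala.collection.mutable

// Hypothetical toy model of the fix, not Spark's TaskSchedulerImpl.
class ToyTsm(val stageId: Int, val attempt: Int) {
  var isZombie = false
}

class ToyScheduler {
  private val byStage = mutable.Map.empty[Int, mutable.ArrayBuffer[ToyTsm]]

  def submitTasks(stageId: Int, attempt: Int): ToyTsm = {
    val existing = byStage.getOrElseUpdate(stageId, mutable.ArrayBuffer.empty[ToyTsm])
    // The simple fix: retire every earlier manager of this stage.
    existing.foreach(m => m.isZombie = true)
    val tsm = new ToyTsm(stageId, attempt)
    existing += tsm
    tsm
  }

  def activeManagers(stageId: Int): List[ToyTsm] =
    byStage.getOrElse(stageId, mutable.ArrayBuffer.empty[ToyTsm]).filterNot(_.isZombie).toList
}

val sched = new ToyScheduler
sched.submitTasks(stageId = 0, attempt = 0)
sched.submitTasks(stageId = 0, attempt = 1) // e.g. a resubmission
sched.submitTasks(stageId = 0, attempt = 2)
println(sched.activeManagers(0).map(_.attempt)) // List(2)
```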
After this PR, #21131 is still necessary to avoid launching unnecessary
tasks and to fix
[SPARK-25250](https://issues.apache.org/jira/browse/SPARK-25250).
#22806 and #23871 are its follow-ups to fix the hole.
## How was this patch tested?
Existing tests.
Closes #23927 from cloud-fan/scheduler.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Imran Rashid <irashid@cloudera.com>
(cherry picked from commit cb20fbc43e7f54af1ed30b9eb6d76ca50b4eb750)
Signed-off-by: Imran Rashid <irashid@cloudera.com>
(commit: 877b8db25b70ffb0793a619342e7b8edda712b31)
The file was modified: core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala
The file was modified: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala
Commit dfde0c6501637cce4704ee0edd146a73f9119305 by yamamuro
[SPARK-25863][SPARK-21871][SQL] Check if code size statistics is empty or not in updateAndGetCompilationStats
## What changes were proposed in this pull request?
`CodeGenerator.updateAndGetCompilationStats` throws an
`UnsupportedOperationException` for empty code size statistics. This PR
adds a check for whether the statistics are empty.
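The failure mode comes from calling `max` on an empty Scala collection, which throws `UnsupportedOperationException`. The sketch below shows a guarded version; `maxCodeSize` and its `Option[Int]` return type are hypothetical choices for illustration, not the code added to `CodeGenerator`.

```scala
// Hypothetical helper, not CodeGenerator's actual code: guard against an
// empty collection before calling max.
def maxCodeSize(codeSizes: Seq[Int]): Option[Int] =
  if (codeSizes.isEmpty) None else Some(codeSizes.max)

val sizes = Seq(120, 4096, 800)
println(maxCodeSize(sizes))          // Some(4096)
println(maxCodeSize(Seq.empty[Int])) // None

// Without the guard, max on an empty collection throws.
val unguarded =
  try { Seq.empty[Int].max; "no exception" }
  catch { case e: UnsupportedOperationException => s"threw: ${e.getMessage}" }
println(unguarded)
```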
## How was this patch tested?
Pass Jenkins.
Closes #23947 from maropu/SPARK-21871-FOLLOWUP.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(commit: dfde0c6501637cce4704ee0edd146a73f9119305)
The file was modified: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala