Failed

Changes

Summary

  1. [SPARK-32738][CORE][2.4] Should reduce the number of active threads if fatal error happens in `Inbox.process`
Commit b8e6fa7d86697ae9967764220c8220fad0f9d669 by mridulatgmail.com
[SPARK-32738][CORE][2.4] Should reduce the number of active threads if
fatal error happens in `Inbox.process`
This is a backport of
[PR #29580](https://github.com/apache/spark/pull/29580) to branch 2.4.
### What changes were proposed in this pull request?
Processing for a `ThreadSafeRpcEndpoint` is controlled by
`numActiveThreads` in `Inbox`. Currently, if a fatal error happens during
`Inbox.process`, `numActiveThreads` is not decremented. Other threads
then cannot process messages in that inbox, which causes the endpoint to
"hang". For other types of endpoints, `numActiveThreads` should also be
kept correct.
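
The failure mode can be illustrated with a minimal sketch. `LeakyInbox` and `LeakDemo` are hypothetical, heavily simplified stand-ins written for this illustration; they are not Spark's actual `Inbox` code, and the fatal error is simulated by throwing an `OutOfMemoryError` by hand.

```scala
import scala.util.control.NonFatal

// Hypothetical, simplified model of an inbox for a thread-safe endpoint.
// It is NOT Spark's Inbox; it only mimics the counter bookkeeping.
class LeakyInbox {
  private var numActiveThreads = 0

  def activeThreads: Int = synchronized { numActiveThreads }

  /** Returns false if another thread is believed to be processing. */
  def process(handler: () => Unit): Boolean = {
    val acquired = synchronized {
      if (numActiveThreads != 0) false
      else { numActiveThreads += 1; true }
    }
    if (acquired) {
      // Non-fatal errors are swallowed here; fatal ones propagate.
      try handler() catch { case NonFatal(_) => () }
      // Never reached when a fatal error escapes the line above.
      synchronized { numActiveThreads -= 1 }
    }
    acquired
  }
}

object LeakDemo {
  /** Returns (was the second message accepted?, leaked counter value). */
  def run(): (Boolean, Int) = {
    val inbox = new LeakyInbox
    try {
      inbox.process(() => throw new OutOfMemoryError("simulated fatal error"))
      ()
    } catch {
      case _: OutOfMemoryError => () // the caller survives, the counter does not
    }
    val accepted = inbox.process(() => ())
    (accepted, inbox.activeThreads)
  }
}
```

In this model the counter stays at 1 after the simulated fatal error, so every later call to `process` is turned away, which corresponds to the "hang" described above.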
This problem is more serious in previous Spark 2.x versions since the
driver, executor and block manager endpoints are all thread safe
endpoints.
To fix this, we should reduce the number of active threads if a fatal
error happens in `Inbox.process`.
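
A minimal sketch of this idea follows, under the assumption that it is enough to decrement the counter and then rethrow. `GuardedInbox` and `GuardedDemo` are hypothetical names for a simplified model; this is not the actual patch to `Inbox.scala`, which may differ in structure and logging.

```scala
import scala.util.control.NonFatal

// Hypothetical, simplified model; NOT the real Spark Inbox.
class GuardedInbox {
  private var numActiveThreads = 0

  def activeThreads: Int = synchronized { numActiveThreads }

  /** Returns false if another thread is currently processing. */
  def process(handler: () => Unit): Boolean = {
    val acquired = synchronized {
      if (numActiveThreads != 0) false
      else { numActiveThreads += 1; true }
    }
    if (acquired) {
      try handler() catch {
        case NonFatal(_) => () // non-fatal: keep going, release below
        case fatal: Throwable =>
          // Release the slot before rethrowing the fatal error.
          synchronized { numActiveThreads -= 1 }
          throw fatal
      }
      synchronized { numActiveThreads -= 1 }
    }
    acquired
  }
}

object GuardedDemo {
  /** Returns (was the second message accepted?, counter value afterwards). */
  def run(): (Boolean, Int) = {
    val inbox = new GuardedInbox
    try {
      inbox.process(() => throw new OutOfMemoryError("simulated fatal error"))
      ()
    } catch {
      case _: OutOfMemoryError => ()
    }
    val accepted = inbox.process(() => ())
    (accepted, inbox.activeThreads)
  }
}
```

Here the fatal error still propagates to the caller, but the counter returns to 0 first, so a later message can be processed.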
### Why are the changes needed?
`numActiveThreads` is not correct when a fatal error happens, which
causes the problem described above.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Add a new test.
Closes #29764 from wzhfy/deal_with_fatal_error_2.4.
Authored-by: Zhenhua Wang <wzh_zju@163.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
The file was modified: core/src/main/scala/org/apache/spark/rpc/netty/Inbox.scala
The file was modified: core/src/test/scala/org/apache/spark/rpc/netty/InboxSuite.scala