Failed

Changes

Summary

  1. [SPARK-31219][YARN] Enable closeIdleConnections in YarnShuffleService
  2. [SPARK-29574][K8S][FOLLOWUP] Fix bash comparison error in Docker entrypoint.sh
  3. [SPARK-30775][DOC] Improve the description of executor metrics in the monitoring documentation
  4. Revert "[SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery"
Commit 0d997e5156a751c99cd6f8be1528ed088a585d1f by tgraves
[SPARK-31219][YARN] Enable closeIdleConnections in YarnShuffleService
### What changes were proposed in this pull request?
Close idle connections on the shuffle server side when an `IdleStateEvent` is triggered after `spark.shuffle.io.connectionTimeout` or `spark.network.timeout` has elapsed. This change is based on the following investigations:
1. We found connections on our clusters building up continuously (> 10k on some nodes). Is that normal? We don't think so.
2. We looked into the connections on one node and found a lot of half-open connections (connections that existed on only one of the two nodes).
3. We also checked that those connections were very old (> 21 hours). (FYI, https://superuser.com/questions/565991/how-to-determine-the-socket-connection-up-time-on-linux)
4. Looking at the code, `TransportContext` registers an `IdleStateHandler`, which should fire an `IdleStateEvent` when the timeout elapses. We took a heap dump of the YarnShuffleService and checked the attributes of `IdleStateHandler`. It turned out that `firstAllIdleEvent` was already false for many `IdleStateHandler`s, so their `IdleStateEvent`s had already been fired.
5. Finally, we realized that the `IdleStateEvent` would never be handled, because `closeIdleConnections` is hardcoded to false for YarnShuffleService.
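Observation 1 can be roughly reproduced on a NodeManager host by counting the established TCP connections on the shuffle service port. This is a minimal sketch that assumes Linux with iproute2's `ss` and the default `spark.shuffle.service.port` of 7337. The function name `count_shuffle_connections` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count ESTABLISHED TCP sockets whose local (source)
# port is the shuffle service port. Assumes Linux + iproute2 `ss`, and the
# default spark.shuffle.service.port of 7337 unless another port is given.
count_shuffle_connections() {
  local port="${1:-7337}"
  # `ss` prints one header line followed by one line per matching socket;
  # `tail -n +2` drops the header, `wc -l` counts the remaining lines and
  # `tr` strips the padding some `wc` implementations add.
  ss -tn state established "( sport = :${port} )" 2>/dev/null \
    | tail -n +2 \
    | wc -l \
    | tr -d ' '
}

count_shuffle_connections 7337
```

If you sample this periodically and the count only grows over many hours, that is consistent with the buildup described above. On its own, though, the count does not distinguish half-open connections from busy ones.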
### Why are the changes needed?
Idle connections to YarnShuffleService could never be closed, so they kept accumulating and taking up memory and file descriptors.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #27998 from manuzhang/spark-31219.
Authored-by: manuzhang <owenzhang1990@gmail.com>
Signed-off-by: Thomas Graves <tgraves@apache.org>
The file was modified common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
Commit 1d0fc9aa85b3ad3326b878de49b748413dee1dd9 by dongjoon
[SPARK-29574][K8S][FOLLOWUP] Fix bash comparison error in Docker entrypoint.sh
### What changes were proposed in this pull request?
A small change to fix an error in the Docker `entrypoint.sh`.
### Why are the changes needed?
When running Spark on Kubernetes, I got the following logs:
```log
+ '[' -n ']'
+ '[' -z ']'
++ /bin/hadoop classpath
/opt/entrypoint.sh: line 62: /bin/hadoop: No such file or directory
+ export SPARK_DIST_CLASSPATH=
+ SPARK_DIST_CLASSPATH=
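```
This is because some variables in the bash comparisons are not quoted. The trace shows `[ -n ]`. When the unquoted variable is empty, the test receives a single argument, `-n`. A non-empty string on its own counts as true, so the script goes on to run `/bin/hadoop classpath`.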
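A minimal reproduction of this quoting pitfall follows. The function names are hypothetical and are not taken from `entrypoint.sh`. The empty string stands in for an unset Hadoop home directory.

```bash
#!/usr/bin/env bash
# Minimal reproduction of the quoting pitfall; function names are hypothetical.

# Unquoted: an empty value makes this `[ -n ]`, a one-argument test that
# is true because the string "-n" itself is non-empty.
is_set_unquoted() {
  local value="$1"
  # Deliberately left unquoted to show the bug:
  # shellcheck disable=SC2086,SC2070
  [ -n $value ]
}

# Quoted: an empty value makes this `[ -n "" ]`, which is false as intended.
is_set_quoted() {
  local value="$1"
  [ -n "$value" ]
}

hadoop_home=""  # simulates an empty Hadoop home variable
if is_set_unquoted "$hadoop_home"; then echo "unquoted: treated as set"; fi
if ! is_set_quoted "$hadoop_home"; then echo "quoted: treated as empty"; fi
```

Running it prints `unquoted: treated as set` followed by `quoted: treated as empty`. Quoting the expansion keeps the empty argument in place, so the test has two arguments and `-n` is applied as intended.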
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
CI
Closes #28075 from dungdm93/patch-1.
Authored-by: Đặng Minh Dũng <dungdm93@live.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
The file was modified resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
Commit aa98ac52dbbe3fc2d3b152af9324a71f48439a38 by dongjoon
[SPARK-30775][DOC] Improve the description of executor metrics in the monitoring documentation
### What changes were proposed in this pull request?
This PR (SPARK-30775) aims to improve the description of the executor metrics in the monitoring documentation.
### Why are the changes needed?
Improve and clarify the monitoring documentation by:
- adding a reference to the Prometheus endpoint, as implemented in [SPARK-29064]
- extending the list and description of executor metrics, following up on [SPARK-27157]
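The sketch below shows how the Prometheus endpoint mentioned above might be queried. It assumes Spark 3.0 or later, an application started with `spark.ui.prometheus.enabled=true` (which, as we understand it, is off by default), and a driver UI on its default port 4040. The endpoint path `/metrics/executors/prometheus` reflects our reading of the monitoring docs. The helper name `prometheus_url` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper that builds the executor-metrics Prometheus URL served
# by the driver UI. Assumes the application was started with
#   --conf spark.ui.prometheus.enabled=true
# and that the UI listens on the default port 4040.
prometheus_url() {
  local host="${1:-localhost}"
  local port="${2:-4040}"
  printf 'http://%s:%s/metrics/executors/prometheus\n' "$host" "$port"
}

# Example scrape (requires a running application):
#   curl -s "$(prometheus_url driver-host 4040)"
prometheus_url
```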
### Does this PR introduce any user-facing change?
Documentation update.
### How was this patch tested?
n.a.
Closes #27526 from LucaCanali/docPrometheusMetricsFollowupSpark29064.
Authored-by: Luca Canali <luca.canali@cern.ch>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
The file was modified docs/monitoring.md
Commit cda2e30e77dfb0d5c20fdd4dd147b329257994c1 by dongjoon
Revert "[SPARK-31280][SQL] Perform propagating empty relation after RewritePredicateSubquery"
This reverts commit f376d24ea1f40740864d38ceb424713372e7e6ce.
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala