SuccessChanges

Summary

  1. [SPARK-23867][SCHEDULER] use droppedCount in logWarning (commit: 908c681c6786ef0d772a43508285cb8891fc524a)
  2. [SPARK-23748][SS] Fix SS continuous process doesn't support SubqueryAlias issue (commit: 2995b79d6a78bf632aa4c1c99bebfc213fb31c54)
  3. [SPARK-23815][CORE] Spark writer dynamic partition overwrite mode may fail to write output on multi level partition (commit: dfdf1bb9be19bd31e398f97310391b391fabfcfd)
Commit 908c681c6786ef0d772a43508285cb8891fc524a by wenchen
[SPARK-23867][SCHEDULER] use droppedCount in logWarning
## What changes were proposed in this pull request?
Include the count of dropped events in the log warning message.
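A minimal sketch of the pattern (an illustrative queue class, not the actual `AsyncEventQueue` code; only the dropped-count-in-warning idea is taken from this commit):

```scala
import java.util.concurrent.atomic.AtomicLong

class DroppedEventsSketch {
  // Counts events dropped since the last warning was emitted.
  private val droppedCount = new AtomicLong(0L)

  def onDropped(): Unit = droppedCount.incrementAndGet()

  def maybeWarn(): Unit = {
    // getAndSet(0) atomically reads and resets the counter, so the warning
    // reports exactly the number of events dropped since the last report.
    val dropped = droppedCount.getAndSet(0L)
    if (dropped > 0) {
      println(s"Dropped $dropped events from the queue since the last report.")
    }
  }
}
```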
## How was this patch tested?
The fix is pretty trivial, but `./dev/run-tests` was run and passed.
Author: Patrick Pisciuneri <Patrick.Pisciuneri@target.com>
Closes #20977 from phpisciuneri/fix-log-warning.
(cherry picked from commit 682002b6da844ed11324ee5ff4d00fc0294c0b31)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 908c681c6786ef0d772a43508285cb8891fc524a)
The file was modified core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala
Commit 2995b79d6a78bf632aa4c1c99bebfc213fb31c54 by tathagata.das1565
[SPARK-23748][SS] Fix SS continuous process doesn't support
SubqueryAlias issue
## What changes were proposed in this pull request?
Currently, SS continuous processing does not support querying a temp table or
`df.as("xxx")`; SS throws an exception saying the LogicalPlan is not
supported, as described
[here](https://issues.apache.org/jira/browse/SPARK-23748).
This PR proposes to add that support.
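For illustration, a minimal sketch of how both cases wrap the logical plan in a `SubqueryAlias` node, which the continuous-processing checker rejected before this fix (the `rate` source and local master are assumptions chosen to keep the example self-contained):

```scala
import org.apache.spark.sql.SparkSession

object SubqueryAliasDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("demo").getOrCreate()

    // A streaming source; `rate` is convenient for self-contained examples.
    val df = spark.readStream.format("rate").load()

    // Aliasing a Dataset wraps its plan in SubqueryAlias("src", ...).
    val aliased = df.as("src")

    // Querying a temp view likewise resolves through a SubqueryAlias node.
    df.createOrReplaceTempView("tmp")
    val fromView = spark.sql("SELECT value FROM tmp")

    // Inspect the logical plans; both show SubqueryAlias at the root.
    println(aliased.queryExecution.logical)
    println(fromView.queryExecution.logical)
    spark.stop()
  }
}
```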
## How was this patch tested?
A new unit test.
Author: jerryshao <sshao@hortonworks.com>
Closes #21017 from jerryshao/SPARK-23748.
(cherry picked from commit 14291b061b9b40eadbf4ed442f9a5021b8e09597)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
(commit: 2995b79d6a78bf632aa4c1c99bebfc213fb31c54)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousSuite.scala
Commit dfdf1bb9be19bd31e398f97310391b391fabfcfd by wenchen
[SPARK-23815][CORE] Spark writer dynamic partition overwrite mode may
fail to write output on multi level partition
## What changes were proposed in this pull request?
SPARK-20236 introduced a new writer mode that overwrites only the relevant
partitions. While using this feature in our production cluster, we found a
bug when writing multi-level partitions on HDFS.
A simple test case reproduces the issue:

```scala
val df = Seq(("1", "2", "3")).toDF("col1", "col2", "col3")
df.write.partitionBy("col1", "col2").mode("overwrite").save("/my/hdfs/location")
```

If the HDFS location "/my/hdfs/location" does not exist, no output is
written.
This appears to be caused by the job-commit change SPARK-20236 made in
HadoopMapReduceCommitProtocol.
During job commit, the output has been written to the staging dir
/my/hdfs/location/.spark-staging.xxx/col1=1/col2=2, and the code then calls
fs.rename to move /my/hdfs/location/.spark-staging.xxx/col1=1/col2=2 to
/my/hdfs/location/col1=1/col2=2. In our case, however, the operation fails
on HDFS because /my/hdfs/location/col1=1 does not exist: HDFS rename cannot
create missing parent directories.
This does not happen in the unit test added with SPARK-20236, which uses the
local file system.
We propose a fix: when cleaning the current partition dir
/my/hdfs/location/col1=1/col2=2 before the rename, if the delete fails
(because /my/hdfs/location/col1=1/col2=2 may not exist), we call mkdirs to
create the parent dir /my/hdfs/location/col1=1 (if it does not already
exist) so that the subsequent rename can succeed.
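A minimal sketch of the fix in terms of the Hadoop FileSystem API (the method and variable names here are illustrative, not the exact HadoopMapReduceCommitProtocol code):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

object CommitSketch {
  // Move one dynamically written partition from the staging dir to its
  // final location, creating the missing parent directory if needed.
  def commitPartition(fs: FileSystem, stagingPart: Path, finalPart: Path): Unit = {
    // Clear any existing output for this partition. delete() returns false
    // when finalPart does not exist, in which case its parent may be missing.
    if (!fs.delete(finalPart, true) && !fs.exists(finalPart.getParent)) {
      // HDFS rename requires the destination's parent to exist ("dest must be
      // root, or have a parent that exists"), so create it before renaming.
      fs.mkdirs(finalPart.getParent)
    }
    fs.rename(stagingPart, finalPart)
  }
}
```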
Reference: the official HDFS FileSystem documentation
(https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html)
gives rename the precondition "dest must be root, or have a parent that
exists".
## How was this patch tested?
We have tested this patch on our production cluster, and it fixed the
problem.
Author: Fangshi Li <fli@linkedin.com>
Closes #20931 from fangshil/master.
(cherry picked from commit 4b07036799b01894826b47c73142fe282c607a57)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: dfdf1bb9be19bd31e398f97310391b391fabfcfd)
The file was modified core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala