Failed
Changes

Summary

  1. [MINOR][SQL][DOCS] consistency in argument naming for time functions (commit: 6ae400ccbe79376ec9f7c25a56b691cef8f6c10f) (details)
  2. [MINOR][DOCS] Fix typo in PySpark example in ml-datasource.md (commit: c56c84af473547c9e9cab7ef6422f2b550084b59) (details)
  3. [SPARK-32270][SQL] Use TextFileFormat in CSV's schema inference with a different encoding (commit: c4b0639f830cb5184328473db65e17b3fd0e74fc) (details)
  4. [SPARK-31831][SQL][TESTS] Use subclasses for mock in HiveSessionImplSuite (commit: ad90cbff42cce91ee106378f51552e438593c68d) (details)
  5. [SPARK-32245][INFRA][FOLLOWUP] Reenable Github Actions on commit (commit: bc3d4bacb598d57ad9d43ff8c313ff9b8132b572) (details)
  6. [SPARK-32258][SQL] NormalizeFloatingNumbers directly normalizes IF/CaseWhen/Coalesce child expressions (commit: b6229df16c02d9edcd53bc16ee12b199aaa0ee38) (details)
  7. [SPARK-32105][SQL] Refactor current ScriptTransformationExec code (commit: 6d499647b36c45ff43e190af754a670321c6b274) (details)
  8. [SPARK-32220][SQL][FOLLOW-UP] SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result (commit: 5521afbd227ecd0adf1a914698738d4ebe1bac8c) (details)
  9. [SPARK-32292][SPARK-32252][INFRA] Run the relevant tests only in GitHub Actions (commit: 27ef3629dd96d5ee3368cdb258561ff96e907880) (details)
  10. [SPARK-32004][ALL] Drop references to slave (commit: 90ac9f975bbb73e2f020a6c310e00fe1e71b6258) (details)
Commit 6ae400ccbe79376ec9f7c25a56b691cef8f6c10f by srowen
[MINOR][SQL][DOCS] consistency in argument naming for time functions
### What changes were proposed in this pull request?
Rename the documented argument `format` to `fmt`, to match the same
argument name in several other SQL date/time functions; to wit,
`date_format`, `date_trunc`, `trunc`, `to_date`, and `to_timestamp` all
use `fmt`. Also, `format_string` and `printf` use the same abbreviation
in their argument `strfmt`.
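For reference, a minimal PySpark sketch (illustrative only, not part of
this doc-only change) of a few functions whose documented format
argument is now consistently named `fmt`:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each of these SQL functions takes a format string; the docs now name it `fmt`.
spark.sql("""
    SELECT
      date_format(timestamp '2020-07-14 01:02:03', 'yyyy-MM-dd HH:mm') AS formatted,
      to_timestamp('2020-07-14', 'yyyy-MM-dd')                         AS parsed,
      trunc(date '2020-07-14', 'MONTH')                                AS truncated
""").show()
```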
### Why are the changes needed?
Consistency -- I was trying to scour the documentation for functions
whose arguments use Java string formatting; it would have been nice to
rely on searching for `fmt` instead of my more manual approach.
### Does this PR introduce _any_ user-facing change?
In the documentation only
### How was this patch tested?
No tests
Closes #29007 from MichaelChirico/sql-doc-format-fmt.
Authored-by: Michael Chirico <michael.chirico@grabtaxi.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
(commit: 6ae400ccbe79376ec9f7c25a56b691cef8f6c10f)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala (diff)
Commit c56c84af473547c9e9cab7ef6422f2b550084b59 by dongjoon
[MINOR][DOCS] Fix typo in PySpark example in ml-datasource.md
### What changes were proposed in this pull request?
This PR changes `true` to `True` in the Python code.
### Why are the changes needed?
The previous example has a syntax error.
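As context, a minimal PySpark sketch of the fix (assuming the image
data source example on that page; the option and path are taken from it
purely for illustration):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Python booleans are capitalized: lowercase `true` is an undefined name,
# while `True` is the valid literal the corrected example uses.
df = (spark.read.format("image")
      .option("dropInvalid", True)          # was: .option("dropInvalid", true)
      .load("data/mllib/images/origin/kittens"))
df.printSchema()
```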
### Does this PR introduce _any_ user-facing change?
Yes, but this is a doc-only typo fix.
### How was this patch tested?
Manually run the example.
Closes #29073 from ChuliangXiao/patch-1.
Authored-by: Chuliang Xiao <ChuliangX@gmail.com> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
(commit: c56c84af473547c9e9cab7ef6422f2b550084b59)
The file was modified docs/ml-datasource.md (diff)
Commit c4b0639f830cb5184328473db65e17b3fd0e74fc by dongjoon
[SPARK-32270][SQL] Use TextFileFormat in CSV's schema inference with a
different encoding
### What changes were proposed in this pull request?
This PR proposes to use the text datasource in CSV's schema inference.
This shares the same rationale as SPARK-18362, SPARK-19885 and
SPARK-19918 - we're currently using a Hadoop RDD when the encoding is
different, which is unnecessary. This PR completes SPARK-18362 and
addresses the comment at
https://github.com/apache/spark/pull/15813#discussion_r90751405.
We had better keep the code paths consistent with the existing CSV and
JSON datasources as well; currently, CSV schema inference with a
specified encoding other than UTF-8 takes a different path.
This PR might also amount to a bug fix: Spark session configurations,
say Hadoop configurations, are not respected during CSV schema inference
when the encoding is different (they have to be set on the Spark context
for schema inference when the encoding is different).
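For context, the affected path is CSV reading with schema inference and
a non-UTF-8 encoding; a minimal PySpark sketch (the file path is
hypothetical):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Schema inference with a non-UTF-8 encoding: this is the code path that
# previously went through a Hadoop RDD and now uses the text datasource.
df = (spark.read
      .option("encoding", "ISO-8859-1")
      .option("header", True)
      .option("inferSchema", True)
      .csv("/path/to/latin1_data.csv"))   # hypothetical path
df.printSchema()
```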
### Why are the changes needed?
For consistency, potentially better performance, and to fix a potential
corner-case bug.
### Does this PR introduce _any_ user-facing change?
Virtually no.
### How was this patch tested?
Existing tests should cover this.
Closes #29063 from HyukjinKwon/SPARK-32270.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
(commit: c4b0639f830cb5184328473db65e17b3fd0e74fc)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala (diff)
Commit ad90cbff42cce91ee106378f51552e438593c68d by kabhwan.opensource
[SPARK-31831][SQL][TESTS] Use subclasses for mock in
HiveSessionImplSuite
### What changes were proposed in this pull request?
Fix the flaky test
org.apache.spark.sql.hive.thriftserver.HiveSessionImplSuite by using
subclasses to avoid a classloader issue.
### Why are the changes needed?
It causes build instability.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
It is a fix for a flaky test, but it needs to be run multiple times
against Jenkins.
Closes #29069 from frankyin-factual/hive-tests.
Authored-by: Frank Yin <frank@factual.com> Signed-off-by: Jungtaek Lim
(HeartSaVioR) <kabhwan.opensource@gmail.com>
(commit: ad90cbff42cce91ee106378f51552e438593c68d)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveSessionImplSuite.scala (diff)
Commit bc3d4bacb598d57ad9d43ff8c313ff9b8132b572 by dongjoon
[SPARK-32245][INFRA][FOLLOWUP] Reenable Github Actions on commit
### What changes were proposed in this pull request?
This PR reenables GitHub Actions on every commit as a next step.
### Why are the changes needed?
We carefully enabled GitHub Actions on every PR, and it looks good so
far.
As we saw at https://github.com/apache/spark/pull/29072, GitHub Actions
is already triggered at every commit on every PR. Enabling GitHub
Actions on `master` branch commits doesn't make a big difference, and we
need to start testing at every commit as a next step.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manual.
Closes #29076 from dongjoon-hyun/reenable_gha_commit.
Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
(commit: bc3d4bacb598d57ad9d43ff8c313ff9b8132b572)
The file was modified .github/workflows/master.yml (diff)
Commit b6229df16c02d9edcd53bc16ee12b199aaa0ee38 by dongjoon
[SPARK-32258][SQL] NormalizeFloatingNumbers directly normalizes
IF/CaseWhen/Coalesce child expressions
### What changes were proposed in this pull request?
This patch proposes to let the `NormalizeFloatingNumbers` rule directly
normalize certain child expressions. It could simplify the expression
tree.
### Why are the changes needed?
Currently the NormalizeFloatingNumbers rule treats some expressions as a
black box, but we can optimize it a bit by directly normalizing the
inner child expressions.
Also see
https://github.com/apache/spark/pull/28962#discussion_r448526240.
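For background, the rule exists so that semantically equal
floating-point values (for example 0.0 and -0.0, or different NaN bit
patterns) hash and compare consistently in joins and aggregations; a
small PySpark illustration (not the optimizer change itself) of the kind
of expression, a `coalesce` over a float column, that the rule can now
normalize directly:
```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

left = spark.createDataFrame([(0.0,), (None,)], ["v"])
right = spark.createDataFrame([(-0.0,)], ["v"])

# Join keys wrapped in coalesce(): normalization has to reach inside the
# Coalesce so that 0.0 and -0.0 are treated as the same key.
joined = left.join(
    right,
    F.coalesce(left["v"], F.lit(0.0)) == F.coalesce(right["v"], F.lit(0.0)),
)
joined.show()
```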
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit tests.
Closes #29061 from viirya/SPARK-32258.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
(commit: b6229df16c02d9edcd53bc16ee12b199aaa0ee38)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingPointNumbersSuite.scala (diff)
Commit 6d499647b36c45ff43e190af754a670321c6b274 by wenchen
[SPARK-32105][SQL] Refactor current ScriptTransformationExec code
### What changes were proposed in this pull request?
* Renamed the Hive transform script class
`hive/execution/ScriptTransformationExec` to
`hive/execution/HiveScriptTransformationExec` (without renaming the file)
* Extracted a class `BaseScriptTransformationExec` holding the common
code shared between `SparkScriptTransformationExec` (added in the next
PR) and `HiveScriptTransformationExec`
* Extracted a class `BaseScriptTransformationWriterThread` for the
data-writing thread shared between
`SparkScriptTransformationWriterThread` (added next, to support
transform in sql/core) and `HiveScriptTransformationWriterThread`
* `HiveScriptTransformationWriterThread` additionally supports the Hive
SerDe format
* Renamed the current `Script` strategies in the hive module to
`HiveScript`; the next PR will add `SparkScript` strategies to support
transform in sql/core.
Todo list:
- Support transform in sql/core based on `BaseScriptTransformationExec`,
which would run the script operator in SQL mode (without Hive). The
output of the script would be read as a string and column values
extracted using a delimiter (default: the tab character)
- For Hive, by default only SerDes must be used; without Hive we can run
without a SerDe
- Clean up past hacks that are observed (and that people suggest /
report), such as
      - [Solve string value error about Date/Timestamp in
ScriptTransform](https://issues.apache.org/jira/browse/SPARK-31947)
      - [support use transform with
aggregation](https://issues.apache.org/jira/browse/SPARK-28227)
      - [support array/map as transform's
input](https://issues.apache.org/jira/browse/SPARK-22435)
- Use a code-gen projection to serialize rows to the output stream
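For orientation, script transformation is SQL's `TRANSFORM ... USING`
clause; a minimal PySpark sketch (the table and the `cat` script are
illustrative, and Hive support is still required at this point, which is
what the follow-up work aims to relax):
```python
from pyspark.sql import SparkSession

# enableHiveSupport() is needed here because TRANSFORM currently goes
# through the Hive script transformation path.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.range(5).createOrReplaceTempView("t")   # illustrative input table

# Pipe rows through an external script ('cat' simply echoes them back);
# output columns are split on the default tab delimiter.
spark.sql("""
    SELECT TRANSFORM (id)
    USING 'cat' AS (id_str STRING)
    FROM t
""").show()
```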
### Why are the changes needed?
Support running transform in SQL mode without Hive.
### Does this PR introduce any user-facing change?
Yes.
### How was this patch tested?
Added UT.
Closes #27983 from AngersZhuuuu/follow_spark_15694.
Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 6d499647b36c45ff43e190af754a670321c6b274)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala (diff)
The file was removed sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/ScriptTransformationSuite.scala
The file was added sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala
Commit 5521afbd227ecd0adf1a914698738d4ebe1bac8c by dongjoon
[SPARK-32220][SQL][FOLLOW-UP] SHUFFLE_REPLICATE_NL Hint should not
change Non-Cartesian Product join result
### What changes were proposed in this pull request?
Follow the comment at
https://github.com/apache/spark/pull/29035#discussion_r453468999 and add
an explanation for the PR.
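For context, `SHUFFLE_REPLICATE_NL` asks for a cartesian-product join
strategy, and the point of SPARK-32220 is that the hint must not change
the result of a non-cartesian (equi-) join; a minimal PySpark sketch
(table names are illustrative):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.range(3).createOrReplaceTempView("t1")   # illustrative tables
spark.range(3).createOrReplaceTempView("t2")

# An equi-join with the hint: the hint may steer the physical strategy,
# but it must not change the join result.
spark.sql("""
    SELECT /*+ SHUFFLE_REPLICATE_NL(t1) */ t1.id, t2.id
    FROM t1 JOIN t2 ON t1.id = t2.id
""").show()
```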
### Why are the changes needed?
To add a comment.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Not needed.
Closes #29084 from AngersZhuuuu/follow-spark-32220.
Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
(commit: 5521afbd227ecd0adf1a914698738d4ebe1bac8c)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala (diff)
Commit 27ef3629dd96d5ee3368cdb258561ff96e907880 by dongjoon
[SPARK-32292][SPARK-32252][INFRA] Run the relevant tests only in GitHub
Actions
### What changes were proposed in this pull request?
This PR mainly proposes to run only the relevant tests, just like the
Jenkins PR builder does. Currently, GitHub Actions always runs the full
tests, which wastes resources.
In addition, this PR also fixes three more very closely related issues
while I am here.
1. The main idea here is: it reuses the existing logic embedded in
`dev/run-tests.py`, which the Jenkins PR builder uses in order to run
only the related test cases.
2. While I am here, I fixed SPARK-32292 too so the doc tests run. They
were failing because other references were not available when the
repository is cloned via the `checkout` action (v2). With
`fetch-depth: 0`, the history is available.
3. In addition, it fixes `dev/run-tests.py` to match
`python/run-tests.py` in terms of its options. Environment variables
such as `TEST_ONLY_XXX` were turned into proper options. For example,
    ```bash
    dev/run-tests.py --modules sql,core
    ```
    which is consistent with `python/run-tests.py`, for example,
    ```bash
    python/run-tests.py --modules pyspark-core,pyspark-ml
    ```
4. Lastly, it also fixes a formatting issue in the module specification
in the matrix:
    ```diff
    -            network_common, network_shuffle, repl, launcher
    +            network-common, network-shuffle, repl, launcher,
    ```
    which caused the modules to be built and tested incorrectly.
### Why are the changes needed?
By running only the related tests, we can save a huge amount of
resources and avoid unrelated flaky tests, etc. Also, the doctests of
`dev/run-tests.py` now run properly, the usage is consistent between
`dev/run-tests.py` and `python/run-tests.py`, and the `network-common`,
`network-shuffle`, `launcher` and `examples` modules are run too.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Manually tested in my own forked Spark:
https://github.com/HyukjinKwon/spark/pull/7
https://github.com/HyukjinKwon/spark/pull/8
https://github.com/HyukjinKwon/spark/pull/9
https://github.com/HyukjinKwon/spark/pull/10
https://github.com/HyukjinKwon/spark/pull/11
https://github.com/HyukjinKwon/spark/pull/12
Closes #29086 from HyukjinKwon/SPARK-32292.
Authored-by: Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
(commit: 27ef3629dd96d5ee3368cdb258561ff96e907880)
The file was modified .github/workflows/master.yml (diff)
The file was modified dev/run-tests.py (diff)
Commit 90ac9f975bbb73e2f020a6c310e00fe1e71b6258 by hkarau
[SPARK-32004][ALL] Drop references to slave
### What changes were proposed in this pull request?
This change replaces the word slave with alternatives matching the
context.
### Why are the changes needed?
There is no need to call things slave; we might as well use better,
clearer names.
### Does this PR introduce _any_ user-facing change?
Yes, the output JSON does change. To allow backwards compatibility,
this is an additive change. The shell scripts for starting & stopping
workers are renamed, and for backwards compatibility the old scripts
call through to the new ones while printing a deprecation message to
stderr.
### How was this patch tested?
Existing tests.
Closes #28864 from holdenk/SPARK-32004-drop-references-to-slave.
Lead-authored-by: Holden Karau <hkarau@apple.com> Co-authored-by: Holden
Karau <holden@pigscanfly.ca> Signed-off-by: Holden Karau
<hkarau@apple.com>
(commit: 90ac9f975bbb73e2f020a6c310e00fe1e71b6258)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/ui/DriverPage.scala (diff)
The file was modified sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala (diff)
The file was added core/src/main/scala/org/apache/spark/storage/BlockManagerStorageEndpoint.scala
The file was modified core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/ExecutorLossReason.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerMessages.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/ContextCleanerSuite.scala (diff)
The file was removed core/src/main/scala/org/apache/spark/storage/BlockManagerSlaveEndpoint.scala
The file was modified core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala (diff)
The file was modified sbin/stop-slave.sh (diff)
The file was modified docs/spark-standalone.md (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala (diff)
The file was added sbin/decommission-worker.sh
The file was modified docs/configuration.md (diff)
The file was modified core/src/main/scala/org/apache/spark/SparkContext.scala (diff)
The file was added resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosScheduler.scala
The file was modified core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/util/RawTextHelper.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/deploy/mesos/ui/MesosClusterPage.scala (diff)
The file was modified sbin/start-slave.sh (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/StandaloneDynamicAllocationSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala (diff)
The file was modified sbin/stop-all.sh (diff)
The file was modified sbin/start-all.sh (diff)
The file was added sbin/workers.sh
The file was modified sbin/stop-slaves.sh (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala (diff)
The file was modified resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/Utils.scala (diff)
The file was added conf/workers.template
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala (diff)
The file was modified resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterSchedulerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala (diff)
The file was modified resource-managers/mesos/src/test/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackendSuite.scala (diff)
The file was added sbin/start-worker.sh
The file was modified docs/streaming-programming-guide.md (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerInfoSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala (diff)
The file was modified sbin/slaves.sh (diff)
The file was modified sbin/spark-daemons.sh (diff)
The file was added sbin/stop-workers.sh
The file was modified core/src/test/scala/org/apache/spark/DistributedSuite.scala (diff)
The file was modified sbin/start-slaves.sh (diff)
The file was modified core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala (diff)
The file was removed conf/slaves.template
The file was modified docs/running-on-mesos.md (diff)
The file was modified core/src/test/scala/org/apache/spark/CheckpointSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala (diff)
The file was added sbin/start-workers.sh
The file was modified sbin/decommission-slave.sh (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosFineGrainedSchedulerBackend.scala (diff)
The file was modified docs/job-scheduling.md (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala (diff)
The file was added sbin/stop-worker.sh
The file was modified core/src/main/scala/org/apache/spark/deploy/JsonProtocol.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManager.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/ExternalShuffleServiceSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala (diff)