SuccessChanges

Summary

  1. [SPARK-32596][CORE] Clear Ivy resolution files as part of finally block (details)
  2. [SPARK-32400][SQL] Improve test coverage of HiveScriptTransformationExec (details)
  3. [SPARK-31703][SQL] Parquet RLE float/double are read incorrectly on big (details)
  4. [SPARK-32599][SQL][TESTS] Check the TEXTFILE file format in (details)
  5. [SPARK-32250][SPARK-27510][CORE][TEST] Fix flaky MasterSuite.test(...) (details)
  6. [SPARK-32352][SQL] Partially push down support data filter if it mixed (details)
  7. [SPARK-31694][SQL] Add SupportsPartitions APIs on DataSourceV2 (details)
  8. [SPARK-31198][CORE] Use graceful decommissioning as part of dynamic (details)
Commit 2d6eb00256d1ebc7e2cd82b13286dd9571a6b331 by mridulatgmail.com
[SPARK-32596][CORE] Clear Ivy resolution files as part of finally block
### What changes were proposed in this pull request? Clear Ivy resolution
files as part of a finally block; otherwise, failures during artifact
resolution can leave the resolution files around. Also use tempIvyPath for
SparkSubmitUtils.buildIvySettings in tests; not doing so is why the test
"SPARK-10878: test resolution files cleaned after resolving artifact"
did not capture these issues.
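A minimal sketch of the cleanup pattern described above, assuming a hypothetical helper and directory layout (the real logic lives in SparkSubmit.scala and SparkSubmitUtils):
```scala
// Hypothetical sketch only: run artifact resolution and always clear the
// Ivy resolution files afterwards, whether resolution succeeded or failed.
import java.io.File

def resolveWithCleanup[T](ivyHome: File)(resolve: => T): T = {
  try {
    resolve  // may throw if an artifact cannot be resolved
  } finally {
    // Delete leftover resolution files (e.g. *.xml/*.properties reports)
    // regardless of the outcome, so failures do not leave them behind.
    Option(ivyHome.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.getName.endsWith(".xml") || f.getName.endsWith(".properties"))
      .foreach(_.delete())
  }
}
```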
### Why are the changes needed? This is a bug
### Does this PR introduce _any_ user-facing change? No
### How was this patch tested? Existing unit tests
Closes #29411 from venkata91/SPARK-32596.
Authored-by: Venkata krishnan Sowrirajan <vsowrirajan@linkedin.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
The file was modified core/src/test/scala/org/apache/spark/deploy/SparkSubmitUtilsSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala (diff)
Commit 4cf8c1d07d0e3a49fc95e1b59c5cab1aab180c77 by wenchen
[SPARK-32400][SQL] Improve test coverage of HiveScriptTransformationExec
### What changes were proposed in this pull request?
1. Extract the common (no-serde) test cases to `BasicScriptTransformationExecSuite`.
2. Add more test cases for the no-serde mode covering the supported data types and behavior in `BasicScriptTransformationExecSuite`.
3. Add more test cases for the Hive serde mode covering the supported types and behavior in `HiveScriptTransformationExecSuite`.
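For context, a minimal usage sketch of the script transformation feature these suites exercise, assuming a Hive-enabled session and a simple pass-through script (the actual contents of test_script.py are not shown in this changelog):
```scala
// Illustrative only: run a TRANSFORM query through an external script.
// 'test_script.py' is assumed to echo its tab-separated input back.
spark.sql(
  """
    |SELECT TRANSFORM (key, value)
    |USING 'python test_script.py'
    |AS (k STRING, v STRING)
    |FROM (SELECT 1 AS key, 'a' AS value) t
  """.stripMargin).show()
```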
### Why are the changes needed? Improve test coverage of Script
Transformation
### Does this PR introduce _any_ user-facing change? NO
### How was this patch tested? Added UT
Closes #29401 from AngersZhuuuu/SPARK-32400.
Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
The file was added sql/core/src/test/resources/test_script.py
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala (diff)
The file was removed sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/TestUncaughtExceptionHandler.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/TestUncaughtExceptionHandler.scala
The file was removed sql/hive/src/test/resources/test_script.py
The file was added sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala
Commit a418548dad57775fbb10b4ea690610bad1a8bfb0 by wenchen
[SPARK-31703][SQL] Parquet RLE float/double are read incorrectly on big
endian platforms
### What changes were proposed in this pull request? This PR fixes the
issue introduced during SPARK-26985.
SPARK-26985 changes the `putDoubles()` and `putFloats()` methods to
respect the platform's endian-ness.  However, that causes the RLE paths
in VectorizedRleValuesReader.java to read the RLE entries in parquet as
BIG_ENDIAN on big endian platforms (i.e., as is), even though parquet
data is always in little endian format.
The comments in `WritableColumnVector.java` say those methods are used
for "ieee formatted doubles in platform native endian" (or floats), but
since the data in parquet is always in little-endian format, using those
methods there appears to be inappropriate.
To demonstrate the problem with spark-shell:
```scala
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.types._

var data = Seq(
  (1.0, 0.1),
  (2.0, 0.2),
  (0.3, 3.0),
  (4.0, 4.0),
  (5.0, 5.0))
var df = spark.createDataFrame(data).write.mode(SaveMode.Overwrite).parquet("/tmp/data.parquet2")
var df2 = spark.read.parquet("/tmp/data.parquet2")
df2.show()
```
result:
```scala
+--------------------+--------------------+
|                  _1|                  _2|
+--------------------+--------------------+
|           3.16E-322|-1.54234871366845...|
|         2.0553E-320|         2.0553E-320|
|          2.561E-320|          2.561E-320|
|4.66726145843124E-62|         1.0435E-320|
|        3.03865E-319|-1.54234871366757...|
+--------------------+--------------------+
```
Also, tests in ParquetIOSuite that involve float/double data would fail, e.g.,
- basic data types (without binary)
- read raw Parquet file
The example /examples/src/main/python/mllib/isotonic_regression_example.py
would fail as well.
The proposed code change is to add `putDoublesLittleEndian()` and
`putFloatsLittleEndian()` methods for parquet to invoke, just like the
existing `putIntsLittleEndian()` and `putLongsLittleEndian()`. On
little-endian platforms they would call `putDoubles()` and `putFloats()`;
on big-endian platforms they would read the entries as little endian,
as before SPARK-26985.
No new unit-test is introduced as the existing ones are actually
sufficient.
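A small illustration of the underlying issue, using only standard java.nio (not the actual vectorized reader code):
```scala
// Parquet always writes IEEE doubles in little-endian byte order. Decoding
// the same eight bytes with the platform's native (big-endian) order yields
// the tiny denormal values seen in the corrupted output above.
import java.nio.{ByteBuffer, ByteOrder}

val bytes = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(1.0).array()

// Correct: interpret the bytes as little endian, as written by parquet.
val ok = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getDouble  // 1.0

// Incorrect on big-endian platforms: native order misreads the same bytes.
val bad = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getDouble    // a denormal near zero
```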
### Why are the changes needed? RLE float/double data in parquet files
will not be read back correctly on big endian platforms.
### Does this PR introduce _any_ user-facing change? No
### How was this patch tested? All unit tests (mvn test) were run and passed.
Closes #29383 from tinhto-000/SPARK-31703.
Authored-by: Tin Hang To <tinto@us.ibm.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedPlainValuesReader.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala (diff)
The file was modified sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java (diff)
Commit f664aaaab13997bf61381aecfd4703f7e32e8fa1 by gurwls223
[SPARK-32599][SQL][TESTS] Check the TEXTFILE file format in
`HiveSerDeReadWriteSuite`
### What changes were proposed in this pull request?
- Test TEXTFILE together with the PARQUET and ORC file formats in
`HiveSerDeReadWriteSuite`
- Remove the "SPARK-32594: insert dates to a Hive table" added by #29409
### Why are the changes needed?
- To improve test coverage and to exercise another row SerDe,
`org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe`.
- The removed test is not needed anymore because the bug reported in
SPARK-32594 is triggered by the TEXTFILE file format too.
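For reference, a minimal sketch of the TEXTFILE round trip this suite now covers (illustrative table name; requires a Hive-enabled session):
```scala
// A TEXTFILE Hive table uses LazySimpleSerDe as its row SerDe.
spark.sql("CREATE TABLE t_text (id INT, name STRING) STORED AS TEXTFILE")
spark.sql("INSERT INTO t_text VALUES (1, 'a'), (2, 'b')")
spark.sql("SELECT * FROM t_text").show()
```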
### Does this PR introduce _any_ user-facing change? No
### How was this patch tested? By running the modified test suite
`HiveSerDeReadWriteSuite`.
Closes #29417 from MaxGekk/textfile-HiveSerDeReadWriteSuite.
Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSerDeReadWriteSuite.scala (diff)
Commit c6ea98323fd23393541efadd814a611a25fa78b2 by gurwls223
[SPARK-32250][SPARK-27510][CORE][TEST] Fix flaky MasterSuite.test(...)
in Github Actions
### What changes were proposed in this pull request?
Set more dispatcher threads for the flaky test.
### Why are the changes needed?
When running the test on a Github Actions machine, only 2 processors are
available to the JVM, whereas Jenkins has 32. For this specific test,
2 available processors, which also determine the number of threads in the
Dispatcher, are not enough to consume the messages. In the worst case,
`MockExecutorLaunchFailWorker` occupies both threads handling the
`LaunchDriver` and `LaunchExecutor` messages at the same time and leaves
no thread for the driver to handle the `RegisteredApplication` message.
In the end this results in a deadlock and causes the test failure.
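A hedged sketch of the kind of adjustment involved: explicitly giving the RPC dispatcher more threads than the CI machine's two processors (the config key shown is an assumption here; the actual fix lives in MasterSuite):
```scala
import org.apache.spark.SparkConf

// Assumed config key: ensure enough dispatcher threads that LaunchDriver /
// LaunchExecutor handling cannot starve the driver's RegisteredApplication.
val conf = new SparkConf()
  .set("spark.rpc.netty.dispatcher.numThreads", "8")
```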
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
We can check whether the test is still flaky in Github Actions after
this fix.
Closes #29408 from Ngone51/spark-32250.
Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
The file was modified core/src/test/scala/org/apache/spark/deploy/master/MasterSuite.scala (diff)
Commit 643cd876e4cfc7faa307db9a2d1dd1b5ca0881f1 by wenchen
[SPARK-32352][SQL] Partially push down support data filter if it mixed
in partition filters
### What changes were proposed in this pull request? We have supported
partially pushing down partition filters since SPARK-28169. We can also
partially push down data filters when they are mixed with partition
filters. For example:
```scala
spark.sql(
  s"""
     |CREATE TABLE t(i INT, p STRING)
     |USING parquet
     |PARTITIONED BY (p)""".stripMargin)
spark.range(0, 1000).selectExpr("id as col").createOrReplaceTempView("temp")
for (part <- Seq(1, 2, 3, 4)) {
  sql(s"""
         |INSERT OVERWRITE TABLE t PARTITION (p='$part')
         |SELECT col FROM temp""".stripMargin)
}
spark.sql("SELECT * FROM t WHERE (p = '1' AND i = 1) OR (p = '2' AND i = 2)").explain()
```
We can then also push down the data filter `i = 1 OR i = 2`, as sketched below.
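A simplified sketch of the idea, using hypothetical case classes rather than Catalyst expressions: for a disjunction whose conjuncts mix partition and data predicates, the data filter that can still be pushed to the file source is the OR of the data-only parts of each conjunct.
```scala
// Illustrative model only; Spark's real implementation works on Catalyst
// Expression trees in FileSourceStrategy.
sealed trait Pred
case class DataPred(sql: String) extends Pred       // e.g. "i = 1"
case class PartitionPred(sql: String) extends Pred  // e.g. "p = '1'"
case class Conjunct(preds: Seq[Pred])

def extractPushableDataFilter(disjuncts: Seq[Conjunct]): Option[String] = {
  val perConjunct = disjuncts.map { c =>
    val parts = c.preds.collect { case DataPred(s) => s }
    if (parts.isEmpty) None else Some(parts.mkString("(", " AND ", ")"))
  }
  // If any conjunct has no data predicate, the whole OR cannot be pushed down.
  if (perConjunct.exists(_.isEmpty)) None
  else Some(perConjunct.flatten.mkString(" OR "))
}

// (p = '1' AND i = 1) OR (p = '2' AND i = 2)  =>  Some("(i = 1) OR (i = 2)")
extractPushableDataFilter(Seq(
  Conjunct(Seq(PartitionPred("p = '1'"), DataPred("i = 1"))),
  Conjunct(Seq(PartitionPred("p = '2'"), DataPred("i = 2")))))
```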
### Why are the changes needed? Extract more data filters and push them
down to FileSourceScanExec.
### Does this PR introduce _any_ user-facing change? NO
### How was this patch tested? Added UT
Closes #29406 from AngersZhuuuu/SPARK-32352.
Authored-by: angerszhu <angers.zhu@gmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategySuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileSourceStrategy.scala (diff)
Commit 60fa8e304d1284c269b8ac6be9a3fd65197ef6ec by wenchen
[SPARK-31694][SQL] Add SupportsPartitions APIs on DataSourceV2
### What changes were proposed in this pull request? There are no
partition commands, such as AlterTableAddPartition, supported in
DataSourceV2, even though they are widely used in MySQL, Hive, and other
data sources. It is therefore necessary to define partition APIs to
support these commands. We define the partition API as part of the Table
API, since it can change table data. A partition is composed of an
identifier and properties, where the identifier is defined as an
InternalRow and the properties are defined as a Map.
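An illustrative Scala sketch of the shape of such an API (the actual definitions are the Java interfaces SupportsPartitionManagement and SupportsAtomicPartitionManagement added below; the trait and method names here are assumptions for illustration):
```scala
import java.util.{Map => JMap}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types.StructType

// Hypothetical trait mirroring the idea: a partition is identified by an
// InternalRow matching the partition schema and carries string properties.
trait PartitionManagementSketch {
  def partitionSchema(): StructType
  def createPartition(ident: InternalRow, properties: JMap[String, String]): Unit
  def dropPartition(ident: InternalRow): Boolean
  def loadPartitionMetadata(ident: InternalRow): JMap[String, String]
}
```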
### Does this PR introduce _any_ user-facing change? Yes. This PR enables
users to use some partition commands.
### How was this patch tested? Ran all tests and added some partition API
tests.
Closes #28617 from stczwd/SPARK-31694.
Authored-by: stczwd <qcsd2011@163.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryAtomicPartitionTable.scala
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryPartitionTable.scala
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/SupportsPartitionManagementSuite.scala
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagementSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlreadyExistException.scala (diff)
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsPartitionManagement.java
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala (diff)
Commit 548ac7c4af2270a6bdbf7a6f29f4846eecdc0171 by hkarau
[SPARK-31198][CORE] Use graceful decommissioning as part of dynamic
scaling
### What changes were proposed in this pull request?
If graceful decommissioning is enabled, Spark's dynamic scaling uses
this instead of directly killing executors.
### Why are the changes needed?
When scaling down Spark we should avoid triggering recomputes as much as
possible.
### Does this PR introduce _any_ user-facing change?
Hopefully users' jobs run faster or at the same speed. It also enables
experimental shuffle-service-free dynamic scaling when graceful
decommissioning is enabled (using the same code path as the
shuffle-tracking dynamic scaling).
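A hedged configuration sketch of how this combination might be enabled (the decommissioning-related keys are assumptions for illustration; consult the configuration docs for the Spark release you run):
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("graceful-decommission-demo")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  // Assumed keys: opt in to graceful executor/storage decommissioning.
  .config("spark.decommission.enabled", "true")
  .config("spark.storage.decommission.enabled", "true")
  .getOrCreate()
```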
### How was this patch tested?
For now I've extended the ExecutorAllocationManagerSuite for both core &
streaming.
Closes #29367 from
holdenk/SPARK-31198-use-graceful-decommissioning-as-part-of-dynamic-scaling.
Lead-authored-by: Holden Karau <hkarau@apple.com> Co-authored-by: Holden
Karau <holden@pigscanfly.ca> Signed-off-by: Holden Karau
<hkarau@apple.com>
The file was modified resource-managers/kubernetes/docker/src/main/dockerfiles/spark/decom.sh (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManager.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/KubernetesSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/storage/BlockManagerDecommissionIntegrationSuite.scala (diff)
The file was modified streaming/src/test/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManagerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/dynalloc/ExecutorMonitor.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/ExecutorAllocationManagerSuite.scala (diff)
The file was modified project/SparkBuild.scala (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionExtendedSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala (diff)
The file was modified resource-managers/kubernetes/integration-tests/tests/decommissioning.py (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala (diff)