Failed

Changes

Summary

  1. [SPARK-31448][PYTHON] Fix storage level used in persist() in dataframe.py (commit: 6f36db1) (details)
  2. [SPARK-32827][SQL] Add spark.sql.maxMetadataStringLength config (commit: 888b343) (details)
  3. [SPARK-32481][SQL][TESTS][FOLLOW-UP] Skip the test if trash directory cannot be created (commit: 108c4c8) (details)
  4. [SPARK-32704][SQL][TESTS][FOLLOW-UP] Check any physical rule instead of a specific rule in the test (commit: b46c730) (details)
  5. [SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for float and double (commit: 6051755) (details)
  6. [SPARK-32861][SQL] GenerateExec should require column ordering (commit: 2e3aa2f) (details)
  7. [SPARK-32888][DOCS] Add user document about header flag and RDD as path for reading CSV (commit: 550c1c9) (details)
  8. [SPARK-32835][PYTHON] Add withField method to the pyspark Column class (commit: e884290) (details)
  9. [SPARK-32814][PYTHON] Replace __metaclass__ field with metaclass keyword (commit: c918909) (details)
  10. [SPARK-32706][SQL] Improve cast string to decimal type (commit: 3bc13e6) (details)
  11. [SPARK-32804][LAUNCHER][FOLLOWUP] Fix SparkSubmitCommandBuilderSuite test failure without jars (commit: 355ab6a) (details)
  12. [SPARK-32850][CORE] Simplify the RPC message flow of decommission (commit: 56ae950) (details)
  13. [SPARK-32816][SQL] Fix analyzer bug when aggregating multiple distinct DECIMAL columns (commit: 40ef5c9) (details)
  14. [SPARK-32897][PYTHON] Don't show a deprecation warning at SparkSession.builder.getOrCreate (commit: 657e39a) (details)
  15. [SPARK-32890][SQL] Pass all `sql/hive` module UTs in Scala 2.13 (commit: 7fdb571) (details)
Commit 6f36db1fa511940dd43d597b7fe337fc3d5c2558 by srowen
[SPARK-31448][PYTHON] Fix storage level used in persist() in
dataframe.py
### What changes were proposed in this pull request?
Since the data is serialized on the Python side, we should make cache() in
PySpark DataFrames use the StorageLevel.MEMORY_AND_DISK mode, which has
deserialized=false. This change was made to `pyspark/rdd.py` as part of
SPARK-2014 but was missed in `pyspark/dataframe.py`.
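For illustration, a minimal PySpark sketch of the resulting behavior (assumes a local `SparkSession`; not part of the patch itself):
```python
from pyspark import StorageLevel
from pyspark.sql import SparkSession

# Hedged sketch: after this change, DataFrame.cache() uses
# StorageLevel.MEMORY_AND_DISK, which on the Python side stores data
# serialized (deserialized=False), matching what pyspark/rdd.py already does.
spark = SparkSession.builder.master("local[1]").getOrCreate()
df = spark.range(10)
df.cache()
print(df.storageLevel)  # expected: StorageLevel(True, True, False, False, 1)
```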
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Using existing tests
Closes #29242 from abhishekd0907/SPARK-31448.
Authored-by: Abhishek Dixit <abhishekdixit0907@gmail.com> Signed-off-by:
Sean Owen <srowen@gmail.com>
(commit: 6f36db1)
The file was modified python/pyspark/storagelevel.py (diff)
The file was modified python/pyspark/sql/dataframe.py (diff)
Commit 888b343587c98ae0252311d72e20abbca8262ab3 by wenchen
[SPARK-32827][SQL] Add spark.sql.maxMetadataStringLength config
### What changes were proposed in this pull request?
Add a new config `spark.sql.maxMetadataStringLength`. This config aims
to limit metadata value length, e.g. file location.
### Why are the changes needed?
Some metadata values are abbreviated with `...`, which I ran into while
trying to add a test in `SQLQueryTestSuite`. We need to replace such values
(for example, the file location) with `notIncludedMsg`, but we cannot match
them by `className` because the `className` itself has already been
abbreviated.
Here is a case:
```
CREATE table  explain_temp1 (key int, val int) USING PARQUET;
EXPLAIN EXTENDED SELECT sum(distinct val) FROM explain_temp1;
-- ignore parsed,analyzed,optimized
-- The output like
== Physical Plan ==
*HashAggregate(keys=[], functions=[sum(distinct cast(val#x as bigint)#xL)], output=[sum(DISTINCT val)#xL])
+- Exchange SinglePartition, true, [id=#x]
   +- *HashAggregate(keys=[], functions=[partial_sum(distinct cast(val#x as bigint)#xL)], output=[sum#xL])
      +- *HashAggregate(keys=[cast(val#x as bigint)#xL], functions=[], output=[cast(val#x as bigint)#xL])
         +- Exchange hashpartitioning(cast(val#x as bigint)#xL, 4), true, [id=#x]
            +- *HashAggregate(keys=[cast(val#x as bigint) AS cast(val#x as bigint)#xL], functions=[], output=[cast(val#x as bigint)#xL])
               +- *ColumnarToRow
                  +- FileScan parquet default.explain_temp1[val#x] Batched: true, DataFilters: [], Format: Parquet, Location: InMemoryFileIndex[file:/home/runner/work/spark/spark/sql/core/spark-warehouse/org.apache.spark.sq...], PartitionFilters: ...
```
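As a hedged illustration of how the new config could be used (assumes a running `SparkSession` named `spark` and the `explain_temp1` table above; the chosen value is only an example, not the default):
```python
# Raise the metadata string limit so long values such as the file location
# are not abbreviated with "..." in EXPLAIN output.
spark.conf.set("spark.sql.maxMetadataStringLength", 1000)
spark.sql(
    "EXPLAIN EXTENDED SELECT sum(distinct val) FROM explain_temp1"
).show(truncate=False)
```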
### Does this PR introduce _any_ user-facing change?
No, a new config.
### How was this patch tested?
new test.
Closes #29688 from ulysses-you/SPARK-32827.
Authored-by: ulysses <youxiduo@weidian.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 888b343)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala (diff)
Commit 108c4c8fdc6c839bf5f43af7a55594aa024d2eb6 by gurwls223
[SPARK-32481][SQL][TESTS][FOLLOW-UP] Skip the test if trash directory
cannot be created
### What changes were proposed in this pull request?
This PR skips the test if the trash directory cannot be created. It is
possible that the trash directory cannot be created, for example, due to
permissions, and then the test fails as below:
```
- SPARK-32481 Move data to trash on truncate table if enabled *** FAILED
*** (154 milliseconds)
fs.exists(trashPath) was false (DDLSuite.scala:3184)
org.scalatest.exceptions.TestFailedException:
at
org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
at
org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
at
org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
at
org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:503)
```
### Why are the changes needed?
To make the tests pass independently.
### Does this PR introduce _any_ user-facing change?
No, test-only.
### How was this patch tested?
Manually tested.
Closes #29759 from HyukjinKwon/SPARK-32481.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 108c4c8)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
Commit b46c7302db73ee3671035ccfd8f51297b4d5e10e by gurwls223
[SPARK-32704][SQL][TESTS][FOLLOW-UP] Check any physical rule instead of
a specific rule in the test
### What changes were proposed in this pull request?
This PR only checks whether any physical rule runs, instead of checking a
specific rule. This is rather just a trivial fix to make the tests more
robust.
In fact, I faced a test failure from an in-house fork that applies a
different physical rule that makes `CollapseCodegenStages` ineffective.
### Why are the changes needed?
To make the test more robust against unrelated changes.
### Does this PR introduce _any_ user-facing change?
No, test-only
### How was this patch tested?
Manually tested. Jenkins tests should pass.
Closes #29766 from HyukjinKwon/SPARK-32704.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: b46c730)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala (diff)
Commit 6051755bfe23a0e4564bf19476ec34cd7fd6008d by yamamuro
[SPARK-32688][SQL][TEST] Add special values to LiteralGenerator for
float and double
### What changes were proposed in this pull request?
The `LiteralGenerator` for float and double datatypes was supposed to
yield special values (NaN, +-inf) among others, but the `Gen.chooseNum`
method does not yield values that are outside the defined range. The
`Gen.chooseNum` for a wide range of floats and doubles does not yield
values in the "everyday" range as stated in
https://github.com/typelevel/scalacheck/issues/113 .
There is a similar class, `RandomDataGenerator`, that is used in some
other tests. Added `-0.0` and `-0.0f` as special values there too.
These changes revealed an inconsistency with the equality check between
`-0.0` and `0.0`.
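For reference, a plain Python sketch (not the Scala test code) of why `-0.0` deserves to be a special value: it compares equal to `0.0` yet has a different IEEE-754 bit pattern, so different evaluation paths can disagree on it:
```python
import math
import struct

print(-0.0 == 0.0)                                         # True: equality cannot tell them apart
print(struct.pack(">d", -0.0) == struct.pack(">d", 0.0))   # False: the bit patterns differ
print(math.copysign(1.0, -0.0))                            # -1.0: the sign is still observable
```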
### Why are the changes needed?
The `LiteralGenerator` is mostly used in the
`checkConsistencyBetweenInterpretedAndCodegen` method in
`MathExpressionsSuite`. This change would have caught the bug fixed in
#29495 .
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Locally reverted #29495 and verified that the existing test cases caught
the bug.
Closes #29515 from tanelk/SPARK-32688.
Authored-by: Tanel Kiis <tanel.kiis@gmail.com> Signed-off-by: Takeshi
Yamamuro <yamamuro@apache.org>
(commit: 6051755)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala (diff)
Commit 2e3aa2f0232a539346da3df8a20cd8e7c2b7dd4f by wenchen
[SPARK-32861][SQL] GenerateExec should require column ordering
### What changes were proposed in this pull request?
This PR updates the `RemoveRedundantProjects` rule to make `GenerateExec`
require column ordering.
### Why are the changes needed?
`GenerateExec` was originally considered as a node that does not require
column ordering. However, `GenerateExec` binds its input rows directly
with its `requiredChildOutput` without using the child's output schema.
In `doExecute()`:
```scala
val proj = UnsafeProjection.create(output, output)
```
In `doConsume()`:
```scala
val values = if (requiredChildOutput.nonEmpty) {
  input
} else {
  Seq.empty
}
```
In this case, changing input column ordering will result in
`GenerateExec` binding the wrong schema to the input columns. For
example, if we do not require child columns to be ordered, the
`requiredChildOutput` [a, b, c] will directly bind to the schema of the
input columns [c, b, a], which is incorrect:
```
GenerateExec explode(array(a, b, c)), [a, b, c], false, [d]
  HashAggregate(keys=[a, b, c], functions=[], output=[c, b, a])
   ...
```
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Unit test
Closes #29734 from allisonwang-db/generator.
Authored-by: allisonwang-db
<66282705+allisonwang-db@users.noreply.github.com> Signed-off-by:
Wenchen Fan <wenchen@databricks.com>
(commit: 2e3aa2f)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantProjects.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantProjectsSuite.scala (diff)
Commit 550c1c9cfb5e6439cdd835388fe90a9ca1ebc695 by gurwls223
[SPARK-32888][DOCS] Add user document about header flag and RDD as path
for reading CSV
### What changes were proposed in this pull request?
This proposes to enhance the user documentation of the API for loading a
Dataset of strings storing CSV rows. If the header option is set to true,
the API will remove all lines that are identical to the header.
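A hedged PySpark sketch of the documented behavior (assumes a `SparkSession` named `spark`; the column names and data are made up for illustration):
```python
# Every line identical to the header is removed, not only the first one.
csv_rows = spark.sparkContext.parallelize(
    ["name,age", "Alice,1", "name,age", "Bob,2"])
spark.read.csv(csv_rows, header=True).show()
# expected: only the (Alice, 1) and (Bob, 2) rows survive
```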
### Why are the changes needed?
This behavior can confuse users. We should explicitly document it.
### Does this PR introduce _any_ user-facing change?
No. Only doc change.
### How was this patch tested?
Only doc change.
Closes #29765 from viirya/SPARK-32888.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 550c1c9)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)
The file was modified python/pyspark/sql/readwriter.py (diff)
Commit e88429058723572b95502fd369f7c2c609c561e6 by gurwls223
[SPARK-32835][PYTHON] Add withField method to the pyspark Column class
### What changes were proposed in this pull request?
This PR adds a `withField` method on the pyspark Column class to call
the Scala API method added in
https://github.com/apache/spark/pull/27066.
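A hedged usage sketch of the new Python API (assumes a `SparkSession` named `spark`; the column and field names are illustrative):
```python
from pyspark.sql.functions import col, lit

df = spark.createDataFrame([((1, 2),)], "a struct<b:int, c:int>")
# Replace (or add) a single field inside the struct column without
# rebuilding the whole struct by hand.
df.withColumn("a", col("a").withField("b", lit(10))).show()
# expected struct value: roughly {b: 10, c: 2}
```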
### Why are the changes needed?
To update the Python API to match a new feature in the Scala API.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New unit test
Closes #29699 from Kimahriman/feature/pyspark-with-field.
Authored-by: Adam Binford <adam.binford@radiantsolutions.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: e884290)
The file was modified python/pyspark/sql/tests/test_column.py (diff)
The file was modified python/pyspark/sql/column.py (diff)
Commit c918909c1a173505e9150f01ac7882fc621cd769 by gurwls223
[SPARK-32814][PYTHON] Replace __metaclass__ field with metaclass keyword
### What changes were proposed in this pull request?
Replace `__metaclass__` fields with `metaclass` keyword in the class
statements.
### Why are the changes needed?
`__metaclass__` is no longer supported in Python 3. This means, for
example, that types are no longer handled as singletons.
```
>>> from pyspark.sql.types import BooleanType
>>> BooleanType() is BooleanType()
False
```
and classes that are supposed to be abstract are not:
```
>>> import inspect
>>> from pyspark.ml import Estimator
>>> inspect.isabstract(Estimator)
False
```
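A standalone Python sketch of the change (the class below is illustrative only, not the actual `pyspark.ml.Estimator`):
```python
import inspect
from abc import ABCMeta, abstractmethod

# Python 2 style, silently ignored by Python 3 (the class is NOT abstract):
#     class Estimator(object):
#         __metaclass__ = ABCMeta

# Python 3 style used after this change:
class Estimator(metaclass=ABCMeta):
    @abstractmethod
    def _fit(self, dataset):
        ...

print(inspect.isabstract(Estimator))  # True: the metaclass is now honored
```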
### Does this PR introduce _any_ user-facing change?
Yes (classes that were no longer abstract or singletons in Python 3 now
are), though the visible changes should be considered a bug fix.
### How was this patch tested?
Existing tests.
Closes #29664 from zero323/SPARK-32138-FOLLOW-UP-METACLASS.
Authored-by: zero323 <mszymkiewicz@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
(commit: c918909)
The file was modified python/pyspark/ml/classification.py (diff)
The file was modified python/pyspark/ml/base.py (diff)
The file was modified python/pyspark/ml/param/__init__.py (diff)
The file was modified python/pyspark/sql/types.py (diff)
The file was modified python/pyspark/ml/evaluation.py (diff)
The file was modified python/pyspark/ml/wrapper.py (diff)
The file was modified python/pyspark/ml/regression.py (diff)
Commit 3bc13e641257182dde097d759555698701a2fcc3 by wenchen
[SPARK-32706][SQL] Improve cast string to decimal type
### What changes were proposed in this pull request?
This PR makes casting string type to decimal type fail fast if the
precision is larger than 38.
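A hedged sketch of the behavior (assumes the default, non-ANSI cast semantics where overflow yields NULL, and a `SparkSession` named `spark`):
```python
# The first literal implies a precision far beyond DECIMAL's maximum of 38,
# so the cast can fail fast and return NULL instead of spending seconds
# computing the precision; the second literal still casts normally.
spark.sql(
    "SELECT CAST('6.0790316E+25569151' AS DECIMAL(38,10)) AS too_big, "
    "       CAST('6.0790316E+25' AS DECIMAL(38,10)) AS ok"
).show(truncate=False)
```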
### Why are the changes needed?
It is very slow if the precision is very large.
Benchmark and benchmark result:
```scala
import org.apache.spark.benchmark.Benchmark

val bd1 = new java.math.BigDecimal("6.0790316E+25569151")
val bd2 = new java.math.BigDecimal("6.0790316E+25")

val benchmark = new Benchmark("Benchmark string to decimal", 1, minNumIters = 2)
benchmark.addCase(bd1.toString) { _ =>
  println(Decimal(bd1).precision)
}
benchmark.addCase(bd2.toString) { _ =>
  println(Decimal(bd2).precision)
}
benchmark.run()
```
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.6
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark string to decimal:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
6.0790316E+25569151                           9340           9381          57          0.0  9340094625.0       1.0X
6.0790316E+25                                    0              0           0          0.5        2150.0  4344230.1X
```
Stacktrace:
![image](https://user-images.githubusercontent.com/5399861/92941705-4c868980-f483-11ea-8a15-b93acde8c0f4.png)
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Unit test and benchmark test:

Dataset | Before this pr (Seconds) | After this pr (Seconds)
-- | -- | --
https://issues.apache.org/jira/secure/attachment/13011406/part-00000.parquet | 2640 | 2
Closes #29731 from wangyum/SPARK-32706.
Authored-by: Yuming Wang <yumwang@ebay.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 3bc13e6)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/types/DecimalSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala (diff)
Commit 355ab6ae94a972011d56b8449c612fd7ad30d860 by gurwls223
[SPARK-32804][LAUNCHER][FOLLOWUP] Fix SparkSubmitCommandBuilderSuite
test failure without jars
### What changes were proposed in this pull request?
It's a follow-up of https://github.com/apache/spark/pull/29653. Tests in
`SparkSubmitCommandBuilderSuite` may fail if you didn't build the project
first and have the jars in place before running the tests, so when
`isTesting` we should set a dummy `SparkLauncher.NO_RESOURCE`.
### Why are the changes needed?
Fix test failures.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
mvn clean test (test without jars built first).
Closes #29769 from KevinSmile/bug-fix-master.
Authored-by: KevinSmile <kevinwang013@hotmail.com> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: 355ab6a)
The file was modified launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java (diff)
The file was modified launcher/src/test/java/org/apache/spark/launcher/SparkSubmitCommandBuilderSuite.java (diff)
Commit 56ae95053df4afa9764df3f1d88f300896ca0183 by wenchen
[SPARK-32850][CORE] Simplify the RPC message flow of decommission
### What changes were proposed in this pull request?
This PR cleans up the RPC message flow among the multiple decommission
use cases, it includes changes:
* Keep `Worker`'s decommission status be consistent between the case
where decommission starts from `Worker` and the case where decommission
starts from the `MasterWebUI`: sending `DecommissionWorker` from
`Master` to `Worker` in the latter case.
* Change from two-way communication to one-way communication when
notifying decommission between driver and executor: it's obviously
unnecessary for the executor to acknowledge the decommission status to
the driver since the decommission request is from the driver. And the
same applies in reverse.
* Only send one message instead of two
(`DecommissionSelf`/`DecommissionBlockManager`) when decommissioning the
executor: the executor and the `BlockManager` are in the same JVM.
* Clean up the code around here.
### Why are the changes needed?
Before:
<img width="1948" alt="WeChat56c00cc34d9785a67a544dca036d49da"
src="https://user-images.githubusercontent.com/16397174/92850308-dc461c80-f41e-11ea-8ac0-287825f4e0c4.png">
After:
<img width="1968" alt="WeChat05f7afb017e3f0132394c5e54245e49e"
src="https://user-images.githubusercontent.com/16397174/93189571-de88dd80-f774-11ea-9300-1943920aa27d.png">
(Note the diagrams only count RPC calls that need to go through the
network; local RPC calls are not counted here.)
After this change, we removed 6 of the original RPC calls and added one
more RPC call to keep the Worker's decommission status consistent. The
RPC flow also becomes clearer.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Updated existing tests.
Closes #29722 from Ngone51/simplify-decommission-rpc.
Authored-by: yi.wu <yi.wu@databricks.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
(commit: 56ae950)
The file was modified core/src/test/scala/org/apache/spark/deploy/client/AppClientSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManager.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/DecommissionWorkerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/storage/BlockManagerStorageEndpoint.scala (diff)
The file was modified streaming/src/test/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManagerSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationClient.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/master/Master.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/cluster/StandaloneSchedulerBackend.scala (diff)
Commit 40ef5c91ade906b38169f959b3991ce8b0f45154 by wenchen
[SPARK-32816][SQL] Fix analyzer bug when aggregating multiple distinct
DECIMAL columns
### What changes were proposed in this pull request?
This PR fixes a conflict between `RewriteDistinctAggregates` and
`DecimalAggregates`. In some cases, `DecimalAggregates` will wrap the
decimal column in `UnscaledValue` using different rules for different
aggregates.
This means the same distinct column used with different aggregates can
become different distinct columns after `DecimalAggregates`. For example:
`avg(distinct decimal_col), sum(distinct decimal_col)` may change to
`avg(distinct UnscaledValue(decimal_col)), sum(distinct decimal_col)`
We assume after `RewriteDistinctAggregates`, there will be at most one
distinct column in aggregates, but `DecimalAggregates` breaks this
assumption. To fix this, we have to switch the order of these two rules.
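A hedged repro sketch of the conflicting pattern (assumes a `SparkSession` named `spark`; the table name and precision are chosen only for illustration, since the rewrite depends on whether `DecimalAggregates` can apply `UnscaledValue` to a given aggregate):
```python
# Two different aggregates over the same DISTINCT decimal column in one query;
# before this fix, this pattern could hit the planning error described above.
spark.sql("CREATE TABLE decimal_agg (d DECIMAL(12, 2)) USING PARQUET")
spark.sql("SELECT avg(DISTINCT d), sum(DISTINCT d) FROM decimal_agg").show()
```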
### Why are the changes needed?
Bug fix.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added test cases.
Closes #29673 from linhongliu-db/SPARK-32816.
Authored-by: Linhong Liu <linhong.liu@databricks.com> Signed-off-by:
Wenchen Fan <wenchen@databricks.com>
(commit: 40ef5c9)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/inputs/group-by.sql (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/group-by.sql.out (diff)
Commit 657e39a3346daf0c67cff3cf90fe68176c479747 by ueshin
[SPARK-32897][PYTHON] Don't show a deprecation warning at
SparkSession.builder.getOrCreate
### What changes were proposed in this pull request?
In PySpark shell, if you call `SparkSession.builder.getOrCreate` as
below:
```python
import warnings
from pyspark.sql import SparkSession, SQLContext

warnings.simplefilter('always', DeprecationWarning)
spark.stop()
SparkSession.builder.getOrCreate()
```
it shows the deprecation warning as below:
```
/.../spark/python/pyspark/sql/context.py:72: DeprecationWarning:
Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
DeprecationWarning)
```
via
https://github.com/apache/spark/blob/d3304268d3046116d39ec3d54a8e319dce188f36/python/pyspark/sql/session.py#L222
We shouldn't print the deprecation warning from it. The line linked above
is the only place where this happens.
### Why are the changes needed?
To avoid mistakenly informing users that `SparkSession.builder.getOrCreate`
is deprecated.
### Does this PR introduce _any_ user-facing change?
Yes, it won't show a deprecation warning to end users for calling
`SparkSession.builder.getOrCreate`.
### How was this patch tested?
Manually tested as above.
Closes #29768 from HyukjinKwon/SPARK-32897.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Takuya
UESHIN <ueshin@databricks.com>
(commit: 657e39a)
The file was modified python/pyspark/sql/context.py (diff)
Commit 7fdb57196313b0dfce1695fa4c165cf8998efbba by srowen
[SPARK-32890][SQL] Pass all `sql/hive` module UTs in Scala 2.13
### What changes were proposed in this pull request?
This PR fixes the failing cases in the sql/hive module in Scala 2.13, as
follows:
- HiveSchemaInferenceSuite (1 FAILED -> PASS)
- HiveSparkSubmitSuite (1 FAILED -> PASS)
- StatisticsSuite (1 FAILED -> PASS)
- HiveDDLSuite (1 FAILED -> PASS)
After this patch, all tests in the sql/hive module pass in Scala 2.13.
### Why are the changes needed?
We need to support a Scala 2.13 build.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Scala 2.12: Pass the Jenkins or GitHub Action
- Scala 2.13: All tests passed.
Do the following:
```
dev/change-scala-version.sh 2.13
mvn clean install -DskipTests -pl sql/hive -am -Pscala-2.13 -Phive
mvn clean test -pl sql/hive -Pscala-2.13 -Phive
```
**Before**
```
Tests: succeeded 3662, failed 4, canceled 0, ignored 601, pending 0
*** 4 TESTS FAILED ***
```
**After**
```
Tests: succeeded 3666, failed 0, canceled 0, ignored 601, pending 0
All tests passed.
```
Closes #29760 from LuciferYang/sql-hive-test.
Authored-by: yangjie01 <yangjie01@baidu.com> Signed-off-by: Sean Owen
<srowen@gmail.com>
(commit: 7fdb571)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSchemaInferenceSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala (diff)
The file was added sql/hive/src/test/resources/regression-test-SPARK-8489/test-2.13.jar
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveSparkSubmitSuite.scala (diff)