SuccessChanges

Summary

  1. [MINOR][CORE][TEST] Fix afterEach() in TaskSetManagerSuite and (commit: bad56bb7b2340d338eac8cea07e9f1bb3e08b1ac) (details)
  2. [SPARK-24934][SQL] Explicitly whitelist supported types in upper/lower (commit: aa51c070f8944fd2aa94ac891b45ff51ffcc1ef2) (details)
  3. [SPARK-24957][SQL] Average with decimal followed by aggregation returns (commit: 25ea27b09147ba1e5bd3ba654bce35d7008fa607) (details)
Commit bad56bb7b2340d338eac8cea07e9f1bb3e08b1ac by hyukjinkwon
[MINOR][CORE][TEST] Fix afterEach() in TaskSetManagerSuite and
TaskSchedulerImplSuite
## What changes were proposed in this pull request?
In the `afterEach()` method of both `TaskSetManagerSuite` and
`TaskSchedulerImplSuite`, `super.afterEach()` must be called at the
end, because it is what stops the SparkContext.
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93706/testReport/org.apache.spark.scheduler/TaskSchedulerImplSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
The test failure linked above is caused by this ordering: the newly
added `barrierCoordinator` requires the `rpcEnv`, which has already
been stopped before `TaskSchedulerImpl` performs its cleanup.
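A minimal ScalaTest sketch of the intended ordering (illustrative only,
not the actual suite code; `scheduler` stands in for the
`TaskSchedulerImpl` state the real suites manage):
```scala
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class ExampleSchedulerSuite extends FunSuite with BeforeAndAfterEach {
  private var scheduler: AutoCloseable = _  // stand-in for TaskSchedulerImpl

  override def afterEach(): Unit = {
    // Suite-specific cleanup first, while shared resources
    // (SparkContext, rpcEnv) are still alive.
    if (scheduler != null) {
      scheduler.close()
      scheduler = null
    }
    // super.afterEach() goes last: in the real suites it stops the
    // SparkContext, so anything still needing rpcEnv must already be
    // torn down at this point.
    super.afterEach()
  }
}
```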
## How was this patch tested?
Existing tests.
Author: Xingbo Jiang <xingbo.jiang@databricks.com>
Closes #21908 from jiangxb1987/afterEach.
(cherry picked from commit 3695ba57731a669ed20e7f676edee602c292fbed)
Signed-off-by: hyukjinkwon <gurwls223@apache.org>
(commit: bad56bb7b2340d338eac8cea07e9f1bb3e08b1ac)
The file was modified core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala (diff)
Commit aa51c070f8944fd2aa94ac891b45ff51ffcc1ef2 by wenchen
[SPARK-24934][SQL] Explicitly whitelist supported types in upper/lower
bounds for in-memory partition pruning
## What changes were proposed in this pull request?
It looks like we intentionally set `null` upper/lower bounds for
complex types and do not use them. However, these bounds are used in
in-memory partition pruning, which ends up producing incorrect results.
This PR proposes to explicitly whitelist the supported types.
```scala
val df = Seq(Array("a", "b"), Array("c", "d")).toDF("arrayCol")
df.cache().filter("arrayCol > array('a', 'b')").show()
```
```scala
val df = sql("select cast('a' as binary) as a")
df.cache().filter("a == cast('a' as binary)").show()
```
**Before:**
```
+--------+
|arrayCol|
+--------+
+--------+
```
```
+---+
|  a|
+---+
+---+
```
**After:**
```
+--------+
|arrayCol|
+--------+
|  [c, d]|
+--------+
```
```
+----+
|   a|
+----+
|[61]|
+----+
```
## How was this patch tested?
Unit tests were added and manually tested.
Author: hyukjinkwon <gurwls223@apache.org>
Closes #21882 from HyukjinKwon/stats-filter.
(cherry picked from commit bfe60fcdb49aa48534060c38e36e06119900140d)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: aa51c070f8944fd2aa94ac891b45ff51ffcc1ef2)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/PartitionBatchPruningSuite.scala (diff)
Commit 25ea27b09147ba1e5bd3ba654bce35d7008fa607 by wenchen
[SPARK-24957][SQL] Average with decimal followed by aggregation returns
wrong result
## What changes were proposed in this pull request?
When we compute an average, the result is obtained by dividing the sum
of the values by their count. When the result is a DecimalType, the way
we cast and manage the precision and scale is not well optimized and is
not consistent with how we normally handle decimal arithmetic.
In particular, a problem can arise when the `Divide` returns a result
whose precision and scale differ from the ones declared as the output
data type of the `Divide`. In the case reported in the JIRA, for
instance, the result of the `Divide` is a `Decimal(38, 36)`, while the
declared output data type for the `Divide` is `Decimal(38, 22)`.
This is not an issue when the `Divide` is followed by a `CheckOverflow`
or a `Cast` to the right data type, as these operations return a
decimal with the defined precision and scale. Although the `Average`
operator does contain a `Cast`, it may be bypassed if the result of the
`Divide` already has the type it would be cast to, so the issue
reported in the JIRA can arise.
The PR proposes to use the normal rules for arithmetic operators on the
Decimal data type, so we both reuse the existing code (keeping a single
logic for operations between decimals) and fix this problem, since the
result is always guarded by `CheckOverflow`.
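For reference, a hedged, simplified sketch of the precision/scale rule
that the shared decimal handling (`DecimalPrecision`) applies to a
division; the real code also covers the mode where precision loss is
not allowed:
```scala
import org.apache.spark.sql.types.DecimalType

def divideResultType(p1: Int, s1: Int, p2: Int, s2: Int): DecimalType = {
  // SQL-standard-style rule: enough scale for the quotient, then cap at
  // the 38-digit limit.
  val scale = math.max(6, s1 + p2 + 1)
  val precision = p1 - s1 + s2 + scale
  if (precision <= DecimalType.MAX_PRECISION) {
    DecimalType(precision, scale)
  } else {
    // Preserve integral digits first, keeping at least 6 fractional
    // digits (or the original scale, if smaller).
    val intDigits = precision - scale
    val adjustedScale =
      math.max(DecimalType.MAX_PRECISION - intDigits, math.min(scale, 6))
    DecimalType(DecimalType.MAX_PRECISION, adjustedScale)
  }
}
```
Routing `Average`'s division through these shared rules and wrapping
the result in `CheckOverflow` means the value always carries the
declared precision and scale, rather than whatever precision the
division happened to produce.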
## How was this patch tested?
Added unit tests.
Author: Marco Gaido <marcogaido91@gmail.com>
Closes #21910 from mgaido91/SPARK-24957.
(cherry picked from commit 85505fc8a58ca229bbaf240c6bc23ea876d594db)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 25ea27b09147ba1e5bd3ba654bce35d7008fa607)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala (diff)