SuccessChanges

Summary

  1. [SPARK-24007][SQL] EqualNullSafe for FloatType and DoubleType might (commit: a1c56b66970a683e458e3f44fd6788110e869093) (details)
  2. [SPARK-23963][SQL] Properly handle large number of columns in query on (commit: 5bcb7bdccf967ff5ad3d8c76f4ad8c9c4031e7c2) (details)
  3. [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky (commit: 130641102ceecf2a795d7f0dc6412c7e56eb03a8) (details)
Commit a1c56b66970a683e458e3f44fd6788110e869093 by gatorsmile
[SPARK-24007][SQL] EqualNullSafe for FloatType and DoubleType might
generate a wrong result by codegen.
## What changes were proposed in this pull request?
`EqualNullSafe` for `FloatType` and `DoubleType` might generate a wrong
result when code generation is enabled.
```scala
scala> val df = Seq((Some(-1.0d), None), (None, Some(-1.0d))).toDF()
df: org.apache.spark.sql.DataFrame = [_1: double, _2: double]
scala> df.show()
+----+----+
|  _1|  _2|
+----+----+
|-1.0|null|
|null|-1.0|
+----+----+
scala> df.filter("_1 <=> _2").show()
+----+----+
|  _1|  _2|
+----+----+
|-1.0|null|
|null|-1.0|
+----+----+
```
The result should be empty, but two rows are returned.
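The expected null-safe equality (`<=>`) semantics can be sketched in plain Scala. This is a hypothetical helper for illustration, not Spark's actual codegen: both sides null yields true, exactly one null yields false, otherwise the values are compared.

```scala
// Hypothetical sketch of null-safe equality (<=>) for nullable doubles.
// Not Spark's implementation; boxed java.lang.Double stands in for a
// nullable DoubleType column value.
def nullSafeEquals(a: java.lang.Double, b: java.lang.Double): Boolean =
  (a, b) match {
    case (null, null)          => true  // both null: equal
    case (null, _) | (_, null) => false // one null: not equal
    case _                     => a.doubleValue == b.doubleValue
  }
```

Under these semantics, `nullSafeEquals(-1.0, null)` is false for both rows in the example above, so the filter should return no rows.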
## How was this patch tested?
Added a test.
Author: Takuya UESHIN <ueshin@databricks.com>
Closes #21094 from ueshin/issues/SPARK-24007/equalnullsafe.
(cherry picked from commit f09a9e9418c1697d198de18f340b1288f5eb025c)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: a1c56b66970a683e458e3f44fd6788110e869093)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (diff)
Commit 5bcb7bdccf967ff5ad3d8c76f4ad8c9c4031e7c2 by gatorsmile
[SPARK-23963][SQL] Properly handle large number of columns in query on
text-based Hive table
## What changes were proposed in this pull request?
TableReader would get disproportionately slower as the number of columns
in the query increased.
I fixed the way TableReader was looking up metadata for each column in
the row. Previously, it had been looking up this data in linked lists,
accessing each linked list by an index (column number). Now it looks up
this data in arrays, where indexing by column number works better.
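The performance difference can be sketched in plain Scala (assumed, simplified data; not the actual TableReader code): indexed `get(i)` on a linked list walks the list each time, so a per-column lookup in every row is quadratic in the column count, while an array lookup is constant time.

```scala
// Sketch of the change: per-column metadata moved from a linked list
// (O(n) per indexed lookup) to an array (O(1) per indexed lookup).
import java.util.LinkedList

val numCols = 100
val asList = new LinkedList[String]()
(0 until numCols).foreach(i => asList.add(s"col$i"))

// Before: list.get(i) per column -> O(numCols^2) work per row.
val slow = (0 until numCols).map(i => asList.get(i))

// After: convert once up front; per-row lookups become O(1).
val asArray = new Array[String](numCols)
asList.toArray(asArray)
val fast = (0 until numCols).map(i => asArray(i))
```

The lookups return the same values; only the access cost changes, which matters once the per-column lookup is repeated for every row.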
## How was this patch tested?
Manual testing; all sbt unit tests; Python SQL tests.
Author: Bruce Robbins <bersprockets@gmail.com>
Closes #21043 from bersprockets/tabreadfix.
(commit: 5bcb7bdccf967ff5ad3d8c76f4ad8c9c4031e7c2)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala (diff)
Commit 130641102ceecf2a795d7f0dc6412c7e56eb03a8 by vanzin
[SPARK-23775][TEST] Make DataFrameRangeSuite not flaky
## What changes were proposed in this pull request?
DataFrameRangeSuite.test("Cancelling stage in a query with Range.")
sometimes gets stuck in an infinite loop and times out the build.
There were multiple issues with the test:
1. When the test runs alone rather than as part of a suite, the first
valid stageId is zero, so the following code waits until it times out:
```
eventually(timeout(10.seconds), interval(1.millis)) {
  assert(DataFrameRangeSuite.stageToKill > 0)
}
```
2. The `DataFrameRangeSuite.stageToKill` was overwritten by the task's
thread after the reset which ended up in canceling the same stage 2
times. This caused the infinite wait.
This PR fixes the flakiness by removing the shared
`DataFrameRangeSuite.stageToKill` and using `wait` and `CountDownLatch`
for synchronization.
## How was this patch tested?
Existing unit test.
Author: Gabor Somogyi <gabor.g.somogyi@gmail.com>
Closes #20888 from gaborgsomogyi/SPARK-23775.
(cherry picked from commit 0c94e48bc50717e1627c0d2acd5382d9adc73c97)
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
(commit: 130641102ceecf2a795d7f0dc6412c7e56eb03a8)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala (diff)