1. [SPARK-25714][BACKPORT-2.3] Fix Null Handling in the Optimizer rule (commit: d87896b63726ce60765ceee2802bf1fd2c9dfa0e) (details)
2. [SPARK-25674][FOLLOW-UP] Update the stats for each ColumnarBatch (commit: 0726bc56fce83c3ec30cfbb6c12dfcd68a85cd0f) (details)
Commit d87896b63726ce60765ceee2802bf1fd2c9dfa0e by wenchen
[SPARK-25714][BACKPORT-2.3] Fix Null Handling in the Optimizer rule
This PR backports the fix to branch 2.3.
## What changes were proposed in this pull request?
   val df1 = Seq(("abc", 1), (null, 3)).toDF("col1", "col2")
   df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
   val df2 ="/tmp/test1")
   df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()
Before the PR, the filter returns both rows. After the fix, it returns only
`Row("abc", 1)`. This fixes a NULL-handling bug in BooleanSimplification
that was introduced in the Spark 1.6 release.
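The NULL semantics behind this bug can be sketched without Spark. Below is a minimal Scala model of SQL three-valued logic (the `ThreeValuedLogic` object and `TriBool` alias are illustrative names, not Spark APIs) showing why rewriting `a OR (NOT a AND b)` to `a OR b` is unsound when `a` is NULL: the original predicate evaluates to NULL (row filtered out), while the simplified one evaluates to true (row kept).

```scala
// Minimal model of SQL three-valued logic; None stands for SQL NULL.
object ThreeValuedLogic {
  type TriBool = Option[Boolean]

  // SQL OR: true dominates NULL, NULL dominates false.
  def or(a: TriBool, b: TriBool): TriBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // SQL AND: false dominates NULL, NULL dominates true.
  def and(a: TriBool, b: TriBool): TriBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  def main(args: Array[String]): Unit = {
    // Row (null, 3): col1 = 'abc' and col1 != 'abc' are both NULL; col2 == 3 is true.
    val eq: TriBool = None       // NULL =  'abc' -> NULL
    val ne: TriBool = None       // NULL != 'abc' -> NULL
    val c2: TriBool = Some(true) // 3 == 3        -> true

    val original   = or(eq, and(ne, c2)) // NULL: the WHERE clause drops the row
    val simplified = or(eq, c2)          // true: the buggy rewrite keeps it
    println(s"original=$original simplified=$simplified")
    // prints: original=None simplified=Some(true)
  }
}
```

Because a WHERE clause keeps a row only when the predicate is true (not NULL), the two forms disagree exactly on the `(null, 3)` row.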
## How was this patch tested?
Added test cases.
Closes #22718 from gatorsmile/cherrypickSPARK-25714.
Authored-by: gatorsmile <> Signed-off-by: Wenchen Fan <>
(commit: d87896b63726ce60765ceee2802bf1fd2c9dfa0e)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala (diff)
Commit 0726bc56fce83c3ec30cfbb6c12dfcd68a85cd0f by sean.owen
[SPARK-25674][FOLLOW-UP] Update the stats for each ColumnarBatch
This PR is a follow-up of .
This alternative avoids unneeded computation on the hot code path:
- For the row-based scan, we keep the original way.
- For the columnar scan, we just need to update the stats after each batch.
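The idea in the second bullet can be sketched in a few lines of plain Scala (a hypothetical illustration, not Spark's actual FileScanRDD code; `BatchStats` and `scanColumnar` are invented names): for a columnar scan, the row-count metric is bumped once per batch, keeping the per-row loop free of metric updates.

```scala
// Hypothetical sketch: per-batch metric updates for a columnar scan.
object BatchStatsDemo {
  // Stand-in for the metrics object Spark updates during a scan.
  final class BatchStats { var rows = 0L }

  // Each Int stands in for a ColumnarBatch's row count. The stat is
  // updated once per batch, not once per row inside the hot loop.
  def scanColumnar(batchSizes: Iterator[Int], stats: BatchStats): Iterator[Int] = { n => stats.rows += n; n }

  def main(args: Array[String]): Unit = {
    val stats = new BatchStats
    // Three batches of 1024, 1024, and 512 rows.
    scanColumnar(Iterator(1024, 1024, 512), stats).foreach(_ => ())
    println(stats.rows) // prints: 2560
  }
}
```

With batches of around a thousand rows, this turns a thousand metric updates into one, which is why moving the update off the per-row path matters for scan performance.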
Closes #22731 from gatorsmile/udpateStatsFileScanRDD.
Authored-by: gatorsmile <> Signed-off-by: Wenchen Fan <>
(cherry picked from commit 4cee191c04f14d7272347e4b29201763c6cfb6bf)
Signed-off-by: Sean Owen <>
(commit: 0726bc56fce83c3ec30cfbb6c12dfcd68a85cd0f)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala (diff)