[SPARK-25714][BACKPORT-2.3] Fix Null Handling in the Optimizer rule
This PR backports the fix to branch 2.3.
## What changes were proposed in this pull request?
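    import org.apache.spark.sql.SaveMode
    import spark.implicits._ // `spark` is the SparkSession predefined in spark-shell
    val df1 = Seq(("abc", 1), (null, 3)).toDF("col1", "col2")
    df1.write.mode(SaveMode.Overwrite).parquet("/tmp/test1")
    val df2 = spark.read.parquet("/tmp/test1")
    df2.filter("col1 = 'abc' OR (col1 != 'abc' AND col2 == 3)").show()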
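Before the PR, it returns both rows. After the fix, it returns only `Row("abc", 1)`. For the second row `col1` is NULL, so both `col1 = 'abc'` and `col1 != 'abc'` evaluate to NULL, the whole predicate evaluates to NULL, and the row must be filtered out. This PR fixes the bug in the NULL handling of `BooleanSimplification`, which was introduced in the Spark 1.6 release.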
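The wrong answer can be reproduced without Spark. The sketch below is a minimal model of SQL three-valued logic, using `Option[Boolean]` with `None` standing for NULL. It assumes, for illustration only, that the faulty simplification rewrote `a OR (NOT a AND b)` into `a OR b`; the two forms agree whenever `a` is non-NULL but diverge on the `(null, 3)` row. `ThreeValuedLogic` and its methods are hypothetical names, not part of Spark's API.

```scala
// Illustrative model of SQL three-valued logic; None stands for SQL NULL.
// Hypothetical helper, not Spark's implementation of BooleanSimplification.
object ThreeValuedLogic {
  type SqlBool = Option[Boolean]

  // NOT NULL is NULL.
  def not(a: SqlBool): SqlBool = a.map(!_)

  // SQL AND: FALSE dominates, then NULL, then TRUE.
  def and(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(false), _) | (_, Some(false)) => Some(false)
    case (Some(true), Some(true))            => Some(true)
    case _                                   => None
  }

  // SQL OR: TRUE dominates, then NULL, then FALSE.
  def or(a: SqlBool, b: SqlBool): SqlBool = (a, b) match {
    case (Some(true), _) | (_, Some(true)) => Some(true)
    case (Some(false), Some(false))        => Some(false)
    case _                                 => None
  }

  // A filter keeps a row only when the predicate is exactly TRUE.
  def keeps(p: SqlBool): Boolean = p.contains(true)

  def main(args: Array[String]): Unit = {
    // Row (null, 3): a = (col1 = 'abc') is NULL, b = (col2 == 3) is TRUE.
    val a: SqlBool = None
    val b: SqlBool = Some(true)
    val original   = or(a, and(not(a), b)) // a OR (NOT a AND b): None, row dropped
    val simplified = or(a, b)              // a OR b: Some(true), row kept
    println(s"original=$original keeps=${keeps(original)}")
    println(s"simplified=$simplified keeps=${keeps(simplified)}")
  }
}
```

The rewrite is only sound when `a` cannot be NULL, which is why a NULL-safe version of such a rule has to take the nullability of `a` into account.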
## How was this patch tested?
Added test cases.
Closes #22718 from gatorsmile/cherrypickSPARK-25714.
Authored-by: gatorsmile
Fan <>
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BooleanSimplificationSuite.scala
[SPARK-25674][FOLLOW-UP] Update the stats for each ColumnarBatch
This PR is a follow-up of .
This alternative avoids unneeded computation in the hot code path.
- For the row-based scan, we keep the original approach.
- For the columnar scan, we just need to update the stats after each batch.
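To see why a per-batch update helps, here is a hypothetical sketch that counts how often the stats would be refreshed. It assumes the row-based path refreshes the stats whenever the record counter reaches a multiple of a fixed interval (1000 records here, an assumed value); all names are illustrative, not Spark's code. When a columnar scan advances the counter by a whole batch at a time, that modulo check can almost never fire, whereas refreshing once per batch keeps the stats current at the cost of one call per batch.

```scala
// Hypothetical model of scan-side stats bookkeeping; not Spark's FileScanRDD code.
object ScanStatsSketch {
  final case class Batch(numRows: Int)

  // Assumed refresh interval for the row-based path.
  val UpdateIntervalRecords = 1000L

  // Row-based scan: the counter grows by 1, so the modulo check fires every N rows.
  def rowBasedUpdates(totalRows: Long): Int = {
    var records = 0L
    var updates = 0
    while (records < totalRows) {
      records += 1
      if (records % UpdateIntervalRecords == 0) updates += 1
    }
    updates
  }

  // Reusing the same modulo check for batches: the counter jumps by numRows,
  // so it rarely lands exactly on a multiple of the interval.
  def naiveColumnarUpdates(batches: Seq[Batch]): Int = {
    var records = 0L
    var updates = 0
    for (b <- batches) {
      records += b.numRows
      if (records % UpdateIntervalRecords == 0) updates += 1
    }
    updates
  }

  // Per-batch approach: one refresh after every batch, regardless of its size.
  def perBatchUpdates(batches: Seq[Batch]): Int = {
    var updates = 0
    for (_ <- batches) updates += 1
    updates
  }

  def main(args: Array[String]): Unit = {
    val batches = Seq.fill(10)(Batch(4096)) // 40960 rows in total
    println(s"row-based: ${rowBasedUpdates(40960L)}")            // 40 refreshes
    println(s"naive columnar: ${naiveColumnarUpdates(batches)}") // 0 refreshes
    println(s"per-batch: ${perBatchUpdates(batches)}")           // 10 refreshes
  }
}
```

With 4096-row batches the running total is never a multiple of 1000 within the first 10 batches, so the naive check never refreshes the stats.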
Closes #22731 from gatorsmile/udpateStatsFileScanRDD.
Authored-by: gatorsmile
Fan <>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala