1. [SPARK-23799][SQL] FilterEstimation.evaluateInSet produces division by (commit: c2f4ee7baf07501cc1f8a23dd21d14aea53606c7)
Commit c2f4ee7baf07501cc1f8a23dd21d14aea53606c7 by gatorsmile
[SPARK-23799][SQL] FilterEstimation.evaluateInSet produces division by
zero in the case of an empty table with analyzed statistics
>What changes were proposed in this pull request?
During evaluation of IN conditions, if the source data frame is
represented by a plan that uses a Hive table whose columns were
previously analysed, and the plan has conditions on these columns that
cannot be satisfied (which leads to an empty data frame), the
FilterEstimation.evaluateInSet method throws NumberFormatException and
ClassCastException. To fix this bug, FilterEstimation.evaluateInSet
first checks that the distinct count is not zero and that colStat.min
and colStat.max are defined, and only then proceeds with the
calculation. If at least one of these conditions is not satisfied, zero
is returned.
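The guard described above can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: `ColStatSketch` and `InSetSelectivitySketch` are hypothetical stand-ins that only mirror the idea of the distinct count, `min` and `max` carried by column statistics, and the selectivity formula (in-range values divided by distinct count) is an assumption made for illustration.

```scala
// Hypothetical, simplified stand-in for column statistics (not Spark's class).
final case class ColStatSketch(
    distinctCount: BigInt,
    min: Option[Long],
    max: Option[Long])

object InSetSelectivitySketch {

  /** Estimated fraction of rows matching `col IN (hSet)`.
    *
    * Returns 0.0 when the statistics describe an empty table, that is,
    * when the distinct count is zero or min/max are undefined.
    */
  def evaluateInSet(stat: ColStatSketch, hSet: Set[Long]): Double =
    (stat.min, stat.max) match {
      case (Some(lo), Some(hi)) if stat.distinctCount > 0 =>
        // Only values inside [min, max] can match any row.
        val inRange = hSet.count(v => v >= lo && v <= hi)
        math.min(inRange.toDouble / stat.distinctCount.toDouble, 1.0)
      case _ =>
        // Without this branch the division above would be 0.0 / 0.0,
        // which yields NaN in floating-point arithmetic.
        0.0
    }
}
```

With statistics from an empty table, for example `ColStatSketch(BigInt(0), None, None)`, the sketch takes the fallback branch and never reaches the division.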
>How was this patch tested?
Two tests were added: one in FilterEstimationSuite, which tests a plan
whose statistics violate the conditions mentioned above, and another in
StatisticsCollectionSuite, which tests the whole process of
analysis/optimisation of a query that leads to the problems described
in the first section.
Author: Mykhailo Shtelma <>
Author: smikesh <>
Closes #21052 from mshtelma/filter_estimation_evaluateInSet_Bugs.
(cherry picked from commit c48085aa91c60615a4de3b391f019f46f3fcdbe3)
Signed-off-by: gatorsmile <>
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/StatisticsCollectionSuite.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala