SuccessChanges

Summary

  1. [SPARK-26422][R] Support to disable Hive support in SparkR even for (commit: b4aeb819163268f8d28723e763a9d26da37edc5e) (details)
  2. [SPARK-26366][SQL][BACKPORT-2.3] ReplaceExceptWithFilter should consider (commit: a7d50ae24a5f92e8d9b6622436f0bb4c2e06cbe1) (details)
  3. Revert "[SPARK-26366][SQL][BACKPORT-2.3] ReplaceExceptWithFilter should (commit: d9d3beafad8dbee5d21f062a181343b8640d2ccd) (details)
Commit b4aeb819163268f8d28723e763a9d26da37edc5e by gurwls223
[SPARK-26422][R] Support to disable Hive support in SparkR even for
Hadoop versions unsupported by Hive fork
## What changes were proposed in this pull request?
Currently, even if I explicitly disable Hive support in a SparkR session as below:
```r
sparkSession <- sparkR.session("local[4]", "SparkR",
                               Sys.getenv("SPARK_HOME"),
                               enableHiveSupport = FALSE)
```
it produces the following error when the Hadoop version is not supported by our Hive fork:
```
java.lang.reflect.InvocationTargetException
...
Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major version number: 3.1.1.3.1.0.0-78
  at org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
  at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
  at org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
  at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
  ... 43 more
Error in handleErrors(returnStatus, conn) :
  java.lang.ExceptionInInitializerError
  at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:348)
  at org.apache.spark.util.Utils$.classForName(Utils.scala:193)
  at org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1116)
  at org.apache.spark.sql.api.r.SQLUtils$.getOrCreateSparkSession(SQLUtils.scala:52)
  at org.apache.spark.sql.api.r.SQLUtils.getOrCreateSparkSession(SQLUtils.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
```
The root cause is that `SparkSession.hiveClassesArePresent` checks whether the Hive classes are loadable, i.e. on the classpath, but `org.apache.hadoop.hive.conf.HiveConf` performs its Hadoop version check in static initialization logic that runs as soon as the class is loaded. This throws an `IllegalArgumentException` that is not caught:
https://github.com/apache/spark/blob/36edbac1c8337a4719f90e4abd58d38738b2e1fb/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala#L1113-L1121
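To make the failure mode concrete, here is a minimal, hedged sketch of such a loadability probe; it only illustrates the pattern described above and is not the actual body of `hiveClassesArePresent`:
```scala
object LoadableCheckSketch {
  // Class.forName(name) also runs the class's static initializer, so
  // HiveConf's Hadoop-version check executes during the probe; its
  // IllegalArgumentException surfaces as an ExceptionInInitializerError,
  // which this catch clause does not handle, so the error escapes.
  def classIsLoadable(name: String): Boolean =
    try {
      Class.forName(name)
      true
    } catch {
      case _: ClassNotFoundException => false
    }
}
```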
So, currently, if users have a Hive built-in Spark with a Hadoop version unsupported by our fork (namely 3+), there's no way to use SparkR even though it could work.
This PR proposes to change the order of the boolean comparison so that we do not evaluate `SparkSession.hiveClassesArePresent` when:
  1. `enableHiveSupport` is explicitly disabled, or
  2. `spark.sql.catalogImplementation` is `in-memory`,
so that, by short-circuiting, we **only** check `SparkSession.hiveClassesArePresent` when Hive support is explicitly enabled (see the sketch below).
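A minimal sketch of that short-circuit ordering follows; the helper name and parameters here are illustrative assumptions, while the actual change lives in `sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala`:
```scala
object HiveCatalogCheckSketch {
  // `&&` short-circuits left to right, so the by-name `hiveClassesArePresent`
  // argument is never forced when Hive support is disabled or the catalog is
  // in-memory; HiveConf's static Hadoop-version check therefore never runs.
  def shouldUseHiveCatalog(
      enableHiveSupport: Boolean,
      catalogImplementation: String,
      hiveClassesArePresent: => Boolean): Boolean = {
    enableHiveSupport &&
      catalogImplementation.equalsIgnoreCase("hive") &&
      hiveClassesArePresent
  }
}
```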
## How was this patch tested?
It's difficult to write a test since we don't run tests against Hadoop 3
yet. See https://github.com/apache/spark/pull/21588. Manually tested.
Closes #23356 from HyukjinKwon/SPARK-26422.
Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 305e9b5ad22b428501fd42d3730d73d2e09ad4c5)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(commit: b4aeb819163268f8d28723e763a9d26da37edc5e)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala (diff)
Commit a7d50ae24a5f92e8d9b6622436f0bb4c2e06cbe1 by gatorsmile
[SPARK-26366][SQL][BACKPORT-2.3] ReplaceExceptWithFilter should consider
NULL as False
## What changes were proposed in this pull request?
In `ReplaceExceptWithFilter` we do not properly handle the case in which the condition evaluates to NULL. Since negating NULL still returns NULL, the assumption that negating the condition returns all the rows which didn't satisfy it does not hold: rows for which the condition evaluates to NULL may not be returned. This happens when the constraints inferred by `InferFiltersFromConstraints` are not enough, as happens with `OR` conditions.
The rule also had problems with non-deterministic conditions: in such a scenario, it would change the probability of the output.
The PR fixes these problems by:
- returning False for the condition when it is NULL (in this way we do return all the rows which didn't satisfy it), as illustrated by the sketch below;
- avoiding any transformation when the condition is non-deterministic.
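A small, hedged sketch of why NULL must be treated as False before negation; it models SQL three-valued logic with `Option[Boolean]` and is not the optimizer code itself:
```scala
object NullAsFalseSketch {
  // SQL three-valued logic modeled with Option[Boolean]: None stands for NULL.
  def not(v: Option[Boolean]): Option[Boolean] = v.map(!_)

  // The fix described above: treat a NULL condition as false before negating.
  def nullAsFalse(v: Option[Boolean]): Option[Boolean] = Some(v.getOrElse(false))

  val condOnRow: Option[Boolean] = None    // condition is NULL for this row
  val naive = not(condOnRow)               // None: the negated filter drops the row
  val fixed = not(nullAsFalse(condOnRow))  // Some(true): the row is correctly kept
}
```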
## How was this patch tested?
added UTs
Closes #23350 from mgaido91/SPARK-26366_2.3.
Authored-by: Marco Gaido <marcogaido91@gmail.com>
Signed-off-by: gatorsmile <gatorsmile@gmail.com>
(commit: a7d50ae24a5f92e8d9b6622436f0bb4c2e06cbe1)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceExceptWithFilter.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceOperatorSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
Commit d9d3beafad8dbee5d21f062a181343b8640d2ccd by dongjoon
Revert "[SPARK-26366][SQL][BACKPORT-2.3] ReplaceExceptWithFilter should
consider NULL as False"
This reverts commit a7d50ae24a5f92e8d9b6622436f0bb4c2e06cbe1.
(commit: d9d3beafad8dbee5d21f062a181343b8640d2ccd)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceExceptWithFilter.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceOperatorSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala (diff)