FailedChanges

Summary

  1. [SPARK-33228][SQL] Don't uncache data when replacing a view having the same logical plan
  2. [SPARK-33234][INFRA] Generates SHA-512 using shasum
  3. [SPARK-32388][SQL] TRANSFORM with schema-less mode should keep the same with hive
Commit 87b498462b82fce02dd50286887092cf7858d2e8 by dhyun
[SPARK-33228][SQL] Don't uncache data when replacing a view having the
same logical plan
### What changes were proposed in this pull request?
SPARK-30494 updated the `CreateViewCommand` code to implicitly drop the
cache when replacing an existing view. But this change drops the cache
even when the replacement view has the same logical plan. A sequence of
queries to reproduce this follows:
```
// Spark v2.4.6+
scala> val df = spark.range(1).selectExpr("id a", "id b")
scala> df.cache()
scala> df.explain()
== Physical Plan ==
*(1) ColumnarToRow
+- InMemoryTableScan [a#2L, b#3L]
      +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
               +- *(1) Range (0, 1, step=1, splits=4)

scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
== Physical Plan ==
*(1) ColumnarToRow
+- InMemoryTableScan [a#2L, b#3L]
      +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
            +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
               +- *(1) Range (0, 1, step=1, splits=4)

// If one re-runs the same query `df.createOrReplaceTempView("t")`, the cache is swept away
scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
== Physical Plan ==
*(1) Project [id#0L AS a#2L, id#0L AS b#3L]
+- *(1) Range (0, 1, step=1, splits=4)

// Until v2.4.6
scala> val df = spark.range(1).selectExpr("id a", "id b")
scala> df.cache()
scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
20/10/23 22:33:42 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
== Physical Plan ==
*(1) InMemoryTableScan [a#2L, b#3L]
   +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
            +- *(1) Range (0, 1, step=1, splits=4)

scala> df.createOrReplaceTempView("t")
scala> sql("select * from t").explain()
== Physical Plan ==
*(1) InMemoryTableScan [a#2L, b#3L]
   +- InMemoryRelation [a#2L, b#3L], StorageLevel(disk, memory, deserialized, 1 replicas)
         +- *(1) Project [id#0L AS a#2L, id#0L AS b#3L]
            +- *(1) Range (0, 1, step=1, splits=4)
```
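Conceptually, the fix is to compare the replacement view's logical plan with the existing one and skip uncaching when they match. A minimal, self-contained Scala sketch of that decision (the plan types, `ViewCache`, and method names are hypothetical stand-ins, not Spark's actual classes):

```scala
// Hypothetical model of the uncache decision in CreateViewCommand;
// names and types are illustrative, not Spark's real API.
sealed trait Plan
case class RangePlan(end: Long) extends Plan
case class ProjectPlan(exprs: Seq[String], child: Plan) extends Plan

object ViewCache {
  // cached plans keyed by temp-view name
  private var cache = Map.empty[String, Plan]

  def markCached(name: String, plan: Plan): Unit = cache += name -> plan
  def isCached(name: String): Boolean = cache.contains(name)

  // Returns true if the cache entry was dropped.
  def createOrReplace(name: String, plan: Plan): Boolean =
    cache.get(name) match {
      case Some(old) if old == plan =>
        false                  // same logical plan: keep the cache
      case Some(_) =>
        cache -= name          // different plan: drop the stale cache
        true
      case None =>
        false
    }
}
```

With this check, re-running `createOrReplaceTempView` with an identical plan leaves the cache intact, which is the behavior the transcript above shows for v2.4.6.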
### Why are the changes needed?
Bugfix.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Added tests.
Closes #30140 from maropu/FixBugInReplaceView.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala (diff)
Commit ce0ebf5f023b1d2230bbd4b9ffad294edef3bca7 by dhyun
[SPARK-33234][INFRA] Generates SHA-512 using shasum
### What changes were proposed in this pull request?
I am generating the SHA-512 using the standard `shasum` tool, which also
produces cleaner output than GPG.
### Why are the changes needed?
This makes the hash much easier to verify for users who don't have GPG:
a user with GPG can check the signing keys, but a user without GPG will
have a hard time validating the SHA-512 in the 'pretty printed' format.
Apache Spark is the only project where I've seen this format; most other
Apache projects publish a one-line hash file.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
This patch assumes the build system has shasum (it should, but I can't
test this).
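For reference, the one-line format `shasum -a 512 <file>` emits is `<hex digest>  <filename>` (two spaces). The same line can be reproduced with the JDK's `MessageDigest`, as a sketch of what the format contains (the helper name is hypothetical; this reads the whole file into memory, so it is only suitable for small files):

```scala
import java.nio.file.{Files, Paths}
import java.security.MessageDigest

// Produce a shasum-style line: "<sha512 hex>  <filename>"
def sha512Line(path: String): String = {
  val md     = MessageDigest.getInstance("SHA-512")
  val digest = md.digest(Files.readAllBytes(Paths.get(path)))
  val hex    = digest.map("%02x".format(_)).mkString
  s"$hex  $path" // two spaces between digest and name, as shasum prints
}
```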
Closes #30123 from emilianbold/master.
Authored-by: Emi <emilian.bold@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
The file was modified dev/create-release/release-build.sh (diff)
Commit 56ab60fb7ae37ca64d668bc4a1f18216cc7186fd by gurwls223
[SPARK-32388][SQL] TRANSFORM with schema-less mode should keep the same
with hive
### What changes were proposed in this pull request?
In Spark's current script transformation with Hive serde mode, the
schema-less result differs from Hive's. This PR makes the result match
Hive's script transform with serde.
#### Hive Script Transform with serde in schema-less mode
```
hive> create table t (c0 int, c1 int, c2 int);
hive> INSERT INTO t VALUES (1, 1, 1);
hive> INSERT INTO t VALUES (2, 2, 2);
hive> CREATE VIEW v AS SELECT TRANSFORM(c0, c1, c2) USING 'cat' FROM t;

hive> DESCRIBE v;
key                 string
value               string

hive> SELECT * FROM v;
1	1 1
2	2 2

hive> SELECT key FROM v;
1
2

hive> SELECT value FROM v;
1 1
2 2
```
#### Spark script transform with Hive serde in schema-less mode
```
hive> create table t (c0 int, c1 int, c2 int);
hive> INSERT INTO t VALUES (1, 1, 1);
hive> INSERT INTO t VALUES (2, 2, 2);
hive> CREATE VIEW v AS SELECT TRANSFORM(c0, c1, c2) USING 'cat' FROM t;

hive> SELECT * FROM v;
1   1
2   2
```
**No serde mode in hive (ROW FORMATTED DELIMITED)**
![image](https://user-images.githubusercontent.com/46485123/90088770-55841e00-dd52-11ea-92dd-7fe52d93f0b3.png)
### Why are the changes needed?
To keep the same behavior as Hive script transform.
### Does this PR introduce _any_ user-facing change?
Before this PR, with Hive serde script transform:
```
select transform(*) USING 'cat' from (
  select 1, 2, 3, 4
) tmp

key     value
1       2
```
After:
```
select transform(*) USING 'cat' from (
  select 1, 2, 3, 4
) tmp

key     value
1       2   3  4
```
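The schema-less mapping shown above (first field of each output row becomes `key`, the rest of the line becomes `value`) can be sketched as plain string handling. This is a simplification of what the serde does, with a hypothetical helper name, not Spark's actual code:

```scala
// Split one tab-delimited transform-output row into (key, value),
// as Hive does in schema-less mode: key = first field, value = the rest.
def schemaLessRow(line: String): (String, String) =
  line.split("\t", 2) match {
    case Array(k, v) => (k, v)
    case Array(k)    => (k, null) // no second field: value is NULL
  }
```

So a `cat` output row `1\t2\t3\t4` yields key `1` and value `2\t3\t4`, which is why the fixed output keeps columns 3 and 4 inside `value` instead of dropping them.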
### How was this patch tested?
UT
Closes #29421 from AngersZhuuuu/SPARK-32388.
Authored-by: angerszhu <angers.zhu@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveScriptTransformationSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/BaseScriptTransformationSuite.scala (diff)