SuccessChanges

Summary

  1. [SPARK-25313][BRANCH-2.3][SQL] Fix regression in FileFormatWriter output (commit: 9db81fd864dcc97bed2bf5bd2028787c3f07a6d0) (details)
  2. [SPARK-25072][PYSPARK] Forbid extra value for custom Row (commit: 31dab7140a4b271e7b976762af7a36f8bfbb8381) (details)
Commit 9db81fd864dcc97bed2bf5bd2028787c3f07a6d0 by wenchen
[SPARK-25313][BRANCH-2.3][SQL] Fix regression in FileFormatWriter output names
Port https://github.com/apache/spark/pull/22320 to branch-2.3
## What changes were proposed in this pull request?
Consider the following example:
```
val location = "/tmp/t"
val df = spark.range(10).toDF("id")
df.write.format("parquet").saveAsTable("tbl")
spark.sql("CREATE VIEW view1 AS SELECT id FROM tbl")
spark.sql(s"CREATE TABLE tbl2(ID long) USING parquet location '$location'")
spark.sql("INSERT OVERWRITE TABLE tbl2 SELECT ID FROM view1")
println(spark.read.parquet(location).schema)
spark.table("tbl2").show()
```
The output column name in the schema will be `id` instead of `ID`, so
the last query returns nothing from `tbl2`. With the debug message
enabled we can see that the output name is changed from `ID` to `id`:
the `outputColumns` in `InsertIntoHadoopFsRelationCommand` are rewritten
by `RemoveRedundantAliases`.
![wechatimg5](https://user-images.githubusercontent.com/1097932/44947871-6299f200-ae46-11e8-9c96-d45fe368206c.jpeg)
![wechatimg4](https://user-images.githubusercontent.com/1097932/44947866-56ae3000-ae46-11e8-8923-8b3bbe060075.jpeg)
**To guarantee correctness**, we should change the output columns from
`Seq[Attribute]` to `Seq[String]` so that their names cannot be replaced
by the optimizer.
I will fix the project-elimination-related rules in
https://github.com/apache/spark/pull/22311 after this one.
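The attribute-vs-string distinction above can be sketched with a toy model (hypothetical names, not Spark's actual API): if the write command holds live attribute references, an optimizer rename leaks into the output schema, whereas a snapshot of plain name strings stays stable.

```python
# Toy model of the regression: mutable attribute objects stand in for
# Spark's Attribute, and the in-place rename stands in for what
# RemoveRedundantAliases does to the plan's output attributes.
class Attribute:
    def __init__(self, name):
        self.name = name

def output_names(attrs):
    """Derive output column names from live attributes (Seq[Attribute]-style)."""
    return [a.name for a in attrs]

attrs = [Attribute("ID")]
snapshot = [a.name for a in attrs]  # copy names as plain strings at plan time

attrs[0].name = "id"  # an optimizer pass rewrites the attribute in place

print(output_names(attrs))  # ['id'] - the regression: names followed the rewrite
print(snapshot)             # ['ID'] - stable, as with Seq[String] in the fix
```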
## How was this patch tested?
Unit test.
Closes #22346 from gengliangwang/portSchemaOutputName2.3.
Authored-by: Gengliang Wang <gengliang.wang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(commit: 9db81fd864dcc97bed2bf5bd2028787c3f07a6d0)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala (diff)
Commit 31dab7140a4b271e7b976762af7a36f8bfbb8381 by cutlerb
[SPARK-25072][PYSPARK] Forbid extra value for custom Row
## What changes were proposed in this pull request?
Add a value-length check in `_create_row` to forbid extra values for
custom Rows in PySpark.
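The check this patch adds can be sketched as follows (an illustrative standalone function, not PySpark's actual `_create_row` source): a Row factory that rejects more values than declared fields.

```python
# Minimal sketch of a value-length check when building a custom Row.
# `create_row` is a hypothetical stand-in for PySpark's _create_row.
def create_row(fields, values):
    if len(values) > len(fields):
        raise ValueError(
            "Can not create Row with fields %s, expected %d values "
            "but got %s" % (fields, len(fields), values))
    # Represent the row as a field -> value mapping for this sketch.
    return dict(zip(fields, values))

create_row(("name", "age"), ("Alice", 11))        # accepted
# create_row(("name", "age"), ("Alice", 11, 80))  # raises ValueError
```

Before this fix, extra values were silently dropped when zipping fields and values; raising instead surfaces the mismatch to the caller.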
## How was this patch tested?
New unit test in pyspark-sql.
Closes #22140 from xuanyuanking/SPARK-25072.
Lead-authored-by: liyuanjian <liyuanjian@baidu.com>
Co-authored-by: Yuanjian Li <xyliyuanjian@gmail.com>
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
(cherry picked from commit c84bc40d7f33c71eca1c08f122cd60517f34c1f8)
Signed-off-by: Bryan Cutler <cutlerb@gmail.com>
(commit: 31dab7140a4b271e7b976762af7a36f8bfbb8381)
The file was modified python/pyspark/sql/types.py (diff)
The file was modified python/pyspark/sql/tests.py (diff)