Failed

Changes

Summary

  1. [MINOR][SQL][TEST-HIVE1.2] Fix scalastyle error due to length line in (details)
  2. [SPARK-30416][SQL] Log a warning for deprecated SQL config in `set()` (details)
  3. [SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD (details)
  4. [SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark test (details)
  5. [SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax (details)
  6. [SPARK-30447][SQL] Constant propagation nullability issue (details)
  7. Revert "[SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark (details)
  8. [SPARK-30234][SQL] ADD FILE cannot add directories from sql CLI (details)
  9. [SPARK-30448][CORE] accelerator aware scheduling enforce cores as (details)
  10. [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates (details)
  11. [SPARK-30468][SQL] Use multiple lines to display data columns for show (details)
  12. [SPARK-29779][CORE] Compact old event log files and cleanup (details)
  13. [SPARK-30312][SQL] Preserve path permission and acl when truncate table (details)
  14. [SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for (details)
  15. [SPARK-30489][BUILD] Make build delete pyspark.zip file properly (details)
  16. [SPARK-30312][SQL][FOLLOWUP] Use inequality check instead to be robust (details)
Commit 4d239388933cf27c8cf1dab5ce9ca61b50747aba by incomplete
[MINOR][SQL][TEST-HIVE1.2] Fix scalastyle error due to length line in
hive-1.2 profile
### What changes were proposed in this pull request?
fixing a broken build:
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-1.2/3/console
### Why are the changes needed?
the build is borked!
### Does this PR introduce any user-facing change?
nope
### How was this patch tested?
by the build system
Closes #27156 from shaneknapp/fix-scala-style.
Authored-by: shane knapp <incomplete@gmail.com> Signed-off-by: shane
knapp <incomplete@gmail.com>
The file was modified sql/core/v1.2/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala (diff)
Commit 1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4 by gurwls223
[SPARK-30416][SQL] Log a warning for deprecated SQL config in `set()`
and `unset()`
### What changes were proposed in this pull request? 1. Put all
deprecated SQL configs into the map `SQLConf.deprecatedSQLConfigs`, with extra
info about when each config was deprecated and additional comments that
explain why a config was deprecated and what a user can use instead.
Here is the list of already deprecated configs:
   - spark.sql.hive.verifyPartitionPath
   - spark.sql.execution.pandas.respectSessionTimeZone
   - spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName
   - spark.sql.parquet.int64AsTimestampMillis
   - spark.sql.variable.substitute.depth
   - spark.sql.execution.arrow.enabled
   - spark.sql.execution.arrow.fallback.enabled
2. Output a warning in `set()` and `unset()` for deprecated SQL configs.
### Why are the changes needed? This should improve UX with Spark SQL
and notify users about already deprecated SQL configs.
### Does this PR introduce any user-facing change? Yes, before:
```
spark-sql> set spark.sql.hive.verifyPartitionPath=true;
spark.sql.hive.verifyPartitionPath true
```
After:
```
spark-sql> set spark.sql.hive.verifyPartitionPath=true;
20/01/03 21:28:17 WARN RuntimeConfig: The SQL config 'spark.sql.hive.verifyPartitionPath' has been deprecated in Spark v3.0.0 and may be removed in the future. This config is replaced by spark.files.ignoreMissingFiles.
spark.sql.hive.verifyPartitionPath true
```
### How was this patch tested? Added a new test which registers a log
appender and captures all logging to check that `set()` and `unset()` log
the warning.
Closes #27092 from MaxGekk/group-deprecated-sql-configs.
Authored-by: Maxim Gekk <max.gekk@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/RuntimeConfig.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala (diff)
Commit 0ec0355611e7ce79599f86862a90611f7cde6227 by gurwls223
[SPARK-30439][SQL] Support non-nullable column in CREATE TABLE, ADD
COLUMN and ALTER TABLE
### What changes were proposed in this pull request?
Allow users to specify NOT NULL in CREATE TABLE and ADD COLUMN column
definitions, and add a new SQL syntax to alter column nullability: ALTER
TABLE ... ALTER COLUMN SET/DROP NOT NULL. This follows the SQL standard syntax:
```
<alter column definition> ::=
ALTER [ COLUMN ] <column name> <alter column action>
<alter column action> ::=
   <set column default clause>
| <drop column default clause>
| <set column not null clause>
| <drop column not null clause>
| ...
<set column not null clause> ::=
SET NOT NULL
<drop column not null clause> ::=
DROP NOT NULL
```
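A minimal usage sketch of the new syntax (the catalog, table, and column names below are hypothetical, and assume a v2 catalog that supports non-nullable columns):
```sql
-- Declare a non-nullable column at creation time.
CREATE TABLE testcat.ns.people (id BIGINT NOT NULL, name STRING) USING foo;

-- Add a non-nullable column to an existing table.
ALTER TABLE testcat.ns.people ADD COLUMN age INT NOT NULL;

-- Change nullability with the new ALTER COLUMN clauses.
ALTER TABLE testcat.ns.people ALTER COLUMN name SET NOT NULL;
ALTER TABLE testcat.ns.people ALTER COLUMN age DROP NOT NULL;
```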
### Why are the changes needed?
Previously we didn't support it because table schemas in the Hive catalog
are always nullable. Now that we have the catalog plugin, it makes more
sense to support NOT NULL on the Spark side, and let catalog implementations
decide whether they support it or not.
### Does this PR introduce any user-facing change?
Yes, this is a new feature
### How was this patch tested?
new tests
Closes #27110 from cloud-fan/nullable.
Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/TableChange.java (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalogSuite.scala (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/TableCatalogSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statements.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala (diff)
Commit afd70a0f6fc1b44164e41a57dfc4fd8a5df642e1 by gurwls223
[SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark test
### What changes were proposed in this pull request?
This patch increases the memory limit in the test 'test_memory_limit'
from 1m to 8m. Credit to srowen and HyukjinKwon for raising the suspicion
and guiding how to fix it.
### Why are the changes needed?
We observed consistent Pyspark test failures on multiple PRs (#26955,
#26201, #27064) which block the PR builds whenever the test is included.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Jenkins builds passed in WIP PR (#27159)
Closes #27162 from HeartSaVioR/SPARK-30480.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
The file was modified python/pyspark/tests/test_worker.py (diff)
Commit bcf07cbf5f760f0959b8178ef807cb61adec8cc3 by wenchen
[SPARK-30018][SQL] Support ALTER DATABASE SET OWNER syntax
### What changes were proposed in this pull request? In this pull
request, we are going to support `SET OWNER` syntax for databases and
namespaces,
```sql ALTER (DATABASE|SCHEMA|NAMESPACE) database_name SET OWNER
[USER|ROLE|GROUP] user_or_role_group;
``` Before this commit
https://github.com/apache/spark/commit/332e252a1448a27cfcfc1d1d794f7979e6cd331a,
we didn't care much about ownership for catalog objects. In
https://github.com/apache/spark/commit/332e252a1448a27cfcfc1d1d794f7979e6cd331a,
we decided to use properties to store ownership information, and
temporarily used `alter database ... set dbproperties ...` to support
switching the ownership of a database. This PR aims to replace that with
the formal syntax.
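A minimal illustration of the new statement (the database and principal names below are hypothetical):
```sql
-- Transfer ownership of a database to a user, role, or group.
ALTER DATABASE test1 SET OWNER USER alice;
ALTER NAMESPACE test1 SET OWNER ROLE admins;
```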
In Hive, `ownerName`/`ownerType` are fields of the database objects, but they
can also be set as normal properties.
``` create schema test1 with dbproperties('ownerName'='yaooqinn')
``` The create/alter database syntax will not change the owner to
`yaooqinn`, but will store it in the parameters, e.g.
```
+----------+----------+---------------------------------------------------------------+-------------+-------------+-----------------------+--+
| db_name  | comment  |                           location                             | owner_name  | owner_type  |      parameters       |
+----------+----------+---------------------------------------------------------------+-------------+-------------+-----------------------+--+
| test1    |          | hdfs://quickstart.cloudera:8020/user/hive/warehouse/test1.db   | anonymous   | USER        | {ownerName=yaooqinn}  |
+----------+----------+---------------------------------------------------------------+-------------+-------------+-----------------------+--+
``` In this pull request, because we make `ownerName` a reserved property,
it will neither change the owner nor be stored in dbproperties; it is just
omitted silently.
### Why are the changes needed?
Formal syntax support for changing database ownership
### Does this PR introduce any user-facing change?
yes, add a new syntax
### How was this patch tested?
add unit tests
Closes #26775 from yaooqinn/SPARK-30018.
Authored-by: Kent Yao <yaooqinn@hotmail.com> Signed-off-by: Wenchen Fan
<wenchen@databricks.com>
The file was modified docs/sql-keywords.md (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (diff)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsNamespaces.java (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateNamespaceExec.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala (diff)
The file was modified sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalogSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2SessionCatalog.scala (diff)
Commit 418f7dc9731403d820a8167d5ddcb99a6246668f by yamamuro
[SPARK-30447][SQL] Constant propagation nullability issue
## What changes were proposed in this pull request?
This PR fixes the `ConstantPropagation` rule, as the current implementation
produces incorrect results in some cases. E.g.
``` SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1)
``` returns those rows where `c` is null due to `1 + 1 = 1` propagation
but it shouldn't.
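A hedged reproduction sketch (the table and data below are illustrative): constant propagation substitutes `c = 1` into `c + 1 = 1`, folding the predicate to `NOT(c = 1 AND false)`, which is true for every row, so rows where `c` is NULL are also returned even though the original predicate evaluates to NULL for them.
```sql
-- Illustrative data; only the NULL row exposes the bug.
CREATE TABLE t (c INT) USING parquet;
INSERT INTO t VALUES (1), (2), (NULL);

-- Correct result: the rows with 1 and 2.
-- With the buggy propagation, the NULL row is returned as well.
SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1);
```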
## Why are the changes needed?
To fix a bug.
## Does this PR introduce any user-facing change?
Yes, fixes a bug.
## How was this patch tested?
New UTs.
Closes #27119 from peter-toth/SPARK-30447.
Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Takeshi
Yamamuro <yamamuro@apache.org>
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
Commit d0983af38ffb123fa440bc5fcf3912db7658dd28 by gurwls223
Revert "[SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark
test"
This reverts commit afd70a0f6fc1b44164e41a57dfc4fd8a5df642e1.
The file was modified python/pyspark/tests/test_worker.py (diff)
Commit 2a629e5d105461e12499503d2e4e95292d66a7fc by gurwls223
[SPARK-30234][SQL] ADD FILE cannot add directories from sql CLI
### What changes were proposed in this pull request? Now users can add
directories from the SQL CLI as well, using the ADD FILE command and setting
spark.sql.addDirectory.recursive to true.
### Why are the changes needed? In SPARK-4687, support was added for
adding directories as resources, but SQL users cannot use that feature
from the CLI.
`ADD FILE /path/to/folder` gives the following error:
`org.apache.spark.SparkException: Added file /path/to/folder is a
directory and recursive is not turned on.`
Users need to turn on `recursive` for adding directories, so a
configuration was required that allows users to turn it on. Also, Hive
allows users to add directories from its shell.
### Does this PR introduce any user-facing change? Yes. Users can set
recursive using `spark.sql.addDirectory.recursive`.
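A hedged example of the workflow in the SQL CLI (the path is the illustrative one from above):
```sql
-- Allow ADD FILE to add directories recursively, then add a folder as a resource.
SET spark.sql.addDirectory.recursive=true;
ADD FILE /path/to/folder;
```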
### How was this patch tested? Manually. Will add test cases soon.
SPARK SCREENSHOTS
When `spark.sql.addDirectory.recursive` is not turned on:
![Screenshot from 2019-12-13 08-02-13](https://user-images.githubusercontent.com/15366835/70765124-c6b4a100-1d7f-11ea-9352-9c010af5b38b.png)
After setting `spark.sql.addDirectory.recursive` to true:
![Screenshot from 2019-12-13 08-02-59](https://user-images.githubusercontent.com/15366835/70765118-be5c6600-1d7f-11ea-9faf-0b1c46ee299b.png)
HIVE SCREENSHOT
![Screenshot from 2019-12-13 14-44-41](https://user-images.githubusercontent.com/15366835/70788979-17e08700-1db8-11ea-9c0c-b6d6f6e80a35.png)
`RELEASE_NOTES.txt` is a text file while `dummy` is a directory.
Closes #26863 from iRakson/SPARK-30234.
Lead-authored-by: root1 <raksonrakesh@gmail.com> Co-authored-by: iRakson
<raksonrakesh@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
Commit d6532c7079f22f32e90e1c69c25bdfab51c7c53e by tgraves
[SPARK-30448][CORE] accelerator aware scheduling enforce cores as
limiting resource
### What changes were proposed in this pull request?
This PR makes sure cores is the limiting resource when using
accelerator-aware scheduling, and fixes a few issues with
SparkContext.checkResourcesPerTask.
For the first version of accelerator-aware scheduling (SPARK-27495), the
SPIP had a condition that we could support dynamic allocation because we
were going to have a strict requirement that we don't waste any
resources. This means that the number of slots each executor has could
be calculated from the number of cores and task cpus, just as is done
today.
Somewhere along the line of development we relaxed that and only warn
when we are wasting resources. This breaks the dynamic allocation logic
if the limiting resource is no longer the cores, because it uses the
cores and task cpus to calculate the number of executors it needs. This
means we will request fewer executors than we really need to run
everything. We have to enforce that cores is always the limiting
resource, so we should throw if it is not.
The only issue with enforcing this is on cluster managers (standalone
and Mesos coarse-grained) where we don't know the executor cores up
front by default: the spark.executor.cores config defaults to 1, but when
the executor is started it gets all the cores of the Worker by default.
So we have to add logic specifically to handle that; we can't enforce the
requirement there, we can just warn when dynamic allocation is enabled.
### Why are the changes needed?
Fixes a bug in dynamic allocation when cores is not the limiting resource,
and corrects the warnings.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
A unit test was added, and the conditions were manually tested in local
mode, local-cluster mode, standalone mode, and on YARN.
Closes #27138 from tgravescs/SPARK-30446.
Authored-by: Thomas Graves <tgraves@nvidia.com> Signed-off-by: Thomas
Graves <tgraves@apache.org>
The file was modified core/src/test/scala/org/apache/spark/SparkContextSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/SparkContext.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala (diff)
Commit b942832bd3bd3bbca6f73606e3f7b1d423e19120 by yamamuro
[SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates
### What changes were proposed in this pull request?
This PR intends to skip unnecessary checks in RewriteDistinctAggregates
that most aggregate queries don't need.
### Why are the changes needed?
For minor optimization.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #26997 from maropu/OptDistinctAggRewrite.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by:
Takeshi Yamamuro <yamamuro@apache.org>
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala (diff)
Commit 2bd8731813850180bab5887317ecf7fe83f6e8e1 by srowen
[SPARK-30468][SQL] Use multiple lines to display data columns for show
create table command
### What changes were proposed in this pull request? Currently, data
columns are displayed on one line for the show create table command; when
a table has many columns (and, to make things even worse, columns may have
long names or comments), the displayed result is really hard to read.
To improve readability, we print each column on a separate line. Note
that other systems like Hive/MySQL also display them this way.
Also, for data columns, table properties and options, we put the right
parenthesis at the end of the last column/property/option, instead of
on a separate line.
### Why are the changes needed? for better readability
### Does this PR introduce any user-facing change? before the change:
``` spark-sql> show create table test_table; CREATE TABLE `test_table`
(`col1` INT COMMENT 'This is comment for column 1', `col2` STRING
COMMENT 'This is comment for column 2', `col3` DOUBLE COMMENT 'This is
comment for column 3') USING parquet OPTIONS (
`bar` '2',
`foo` '1'
) TBLPROPERTIES (
'a' = 'x',
'b' = 'y'
)
``` after the change:
``` spark-sql> show create table test_table; CREATE TABLE `test_table` (
`col1` INT COMMENT 'This is comment for column 1',
`col2` STRING COMMENT 'This is comment for column 2',
`col3` DOUBLE COMMENT 'This is comment for column 3') USING parquet
OPTIONS (
`bar` '2',
`foo` '1') TBLPROPERTIES (
'a' = 'x',
'b' = 'y')
```
### How was this patch tested? modified existing tests
Closes #27147 from wzhfy/multi_line_columns.
Authored-by: Zhenhua Wang <wzh_zju@163.com> Signed-off-by: Sean Owen
<srowen@gmail.com>
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveShowCreateTableSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/show-create-table.sql.out (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ShowCreateTableSuite.scala (diff)
Commit 7fb17f59435a76d871251c1b5923f96943f5e540 by vanzin
[SPARK-29779][CORE] Compact old event log files and cleanup
### What changes were proposed in this pull request?
This patch proposes to compact old event log files when end users enable
rolling event logs, and to clean up these files after compaction.
Here the "compaction" really mean is filtering out listener events for
finished/removed things - like jobs which take most of space for event
log file except SQL related events. To achieve this, compactor does two
phases reading: 1) tracking the live jobs (and more to add) 2) filtering
events via leveraging the information about live things and rewriting to
the "compacted" file.
This approach retains the ability of compatibility on event log file and
adds the possibility of reducing the overall size of event logs. There's
a downside here as well: executor metrics for tasks would be inaccurate,
as compactor will filter out the task events which job is finished, but
I don't feel it as a blocker.
Please note that SPARK-29779 leaves the functionality below to future
JIRA issues, as the patch for SPARK-29779 is too huge and we decided to
break it down:
* apply filter in SQL events
* integrate compaction into FsHistoryProvider
* documentation about new configuration
### Why are the changes needed?
One of the major goals of SPARK-28594 is to prevent event logs from becoming
too huge, and SPARK-29779 achieves that goal. We had another approach
before, but the old approach required models in both KVStore and live
entities to guarantee compatibility, while they're not designed to do
so.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added UTs.
Closes #27085 from HeartSaVioR/SPARK-29779-part1.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
The file was modified core/src/test/scala/org/apache/spark/deploy/history/EventLogFileReadersSuite.scala (diff)
The file was added core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
The file was added core/src/test/scala/org/apache/spark/deploy/history/BasicEventFilterSuite.scala
The file was added core/src/main/scala/org/apache/spark/deploy/history/BasicEventFilterBuilder.scala
The file was modified core/src/test/scala/org/apache/spark/deploy/history/EventLogFileWritersSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/history/EventLogTestHelper.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was added core/src/main/scala/org/apache/spark/deploy/history/EventFilter.scala
The file was added core/src/main/resources/META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder
The file was modified core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala (diff)
The file was added core/src/test/scala/org/apache/spark/status/ListenerEventsTestHelper.scala
The file was modified core/src/main/scala/org/apache/spark/deploy/history/EventLogFileReaders.scala (diff)
The file was added core/src/test/scala/org/apache/spark/deploy/history/BasicEventFilterBuilderSuite.scala
The file was modified core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala (diff)
The file was added core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala
Commit b5bc3e12a629e547e32e340ee0439bc53745d862 by dhyun
[SPARK-30312][SQL] Preserve path permission and acl when truncate table
### What changes were proposed in this pull request?
This patch proposes to preserve the existing permissions/ACLs of paths when
truncating a table/partition.
### Why are the changes needed?
When Spark SQL truncates a table, it deletes the paths of the
table/partitions, then re-creates new ones. If permissions/ACLs were set
on the paths, the existing permissions/ACLs will be deleted.
We should preserve the permissions/ACLs if possible.
### Does this PR introduce any user-facing change?
Yes. When truncating a table/partition, Spark will keep the permissions/ACLs
of the paths.
### How was this patch tested?
Unit test.
Manual test:
1. Create a table. 2. Manually change its permission/ACL. 3. Truncate the
table. 4. Check the permission/ACL.
```scala
val df = Seq(1, 2, 3).toDF
df.write.mode("overwrite").saveAsTable("test.test_truncate_table")
val testTable = spark.table("test.test_truncate_table")
testTable.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
+-----+
// hdfs dfs -setfacl ...
// hdfs dfs -getfacl ...
sql("truncate table test.test_truncate_table")
// hdfs dfs -getfacl ...
val testTable2 = spark.table("test.test_truncate_table")
testTable2.show()
+-----+
|value|
+-----+
+-----+
```
![Screen Shot 2019-12-30 at 3 12 15 PM](https://user-images.githubusercontent.com/68855/71604577-c7875a00-2b17-11ea-913a-ba88096d20ab.jpg)
Closes #26956 from viirya/truncate-table-permission.
Lead-authored-by: Liang-Chi Hsieh <liangchi@uber.com> Co-authored-by:
Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon Hyun
<dhyun@apple.com>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
Commit f372d1cf4fff535bcd0b0be0736da18037457fde by cutlerb
[SPARK-29748][PYTHON][SQL] Remove Row field sorting in PySpark for
version 3.6+
### What changes were proposed in this pull request?
Removing the sorting of PySpark SQL Row fields that were previously
sorted alphabetically by name for Python versions 3.6 and above. Field
order will now match the order in which the fields were entered. Rows will
be used like tuples and are applied to a schema by position. For Python
versions < 3.6, the order of kwargs is not guaranteed, and therefore fields
will be sorted automatically as in previous versions of Spark.
### Why are the changes needed?
This caused inconsistent behavior in that local Rows could be applied to
a schema by matching names, but once serialized the Row could only be
used by position and the fields were possibly in a different order.
### Does this PR introduce any user-facing change?
Yes, Row fields are no longer sorted alphabetically but will be in the
order entered. For Python < 3.6, `kwargs` cannot guarantee the order as
entered, so `Row`s will be sorted automatically.
The environment variable "PYSPARK_ROW_FIELD_SORTING_ENABLED" can be set
to override the construction of `Row` and maintain compatibility with
Spark 2.x.
### How was this patch tested?
Existing tests are run with PYSPARK_ROW_FIELD_SORTING_ENABLED=true, and a
new test with unsorted fields was added for Python 3.6+.
Closes #26496 from BryanCutler/pyspark-remove-Row-sorting-SPARK-29748.
Authored-by: Bryan Cutler <cutlerb@gmail.com> Signed-off-by: Bryan
Cutler <cutlerb@gmail.com>
The file was modified docs/pyspark-migration-guide.md (diff)
The file was modified python/pyspark/sql/tests/test_types.py (diff)
The file was modified python/pyspark/sql/types.py (diff)
The file was modified python/run-tests.py (diff)
Commit 582509b7ae76bc298c31a68bcfd7011c1b9e23a7 by dhyun
[SPARK-30489][BUILD] Make build delete pyspark.zip file properly
### What changes were proposed in this pull request?
A small fix to the Maven build file under the `assembly` module,
switching the "dir" attribute to "file".
### Why are the changes needed?
To make the `<delete>` task properly delete an existing zip file.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Ran a build with the change and confirmed that a corrupted zip file was
replaced with the correct one.
Closes #27171 from jeff303/SPARK-30489.
Authored-by: Jeff Evans <jeffrey.wayne.evans@gmail.com> Signed-off-by:
Dongjoon Hyun <dhyun@apple.com>
The file was modified assembly/pom.xml (diff)
Commit b04407169b8165fe634c9c2214c0f54e45642fa6 by dhyun
[SPARK-30312][SQL][FOLLOWUP] Use inequality check instead to be robust
### What changes were proposed in this pull request?
This is a followup to fix a brittle assert in a test case.
### Why are the changes needed?
The original assert assumes that the default permission is `rwxr-xr-x`, but
in the Jenkins
[env](https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7-hive-1.2/6/testReport/junit/org.apache.spark.sql.execution.command/InMemoryCatalogedDDLSuite/SPARK_30312__truncate_table___keep_acl_permission/)
it could be `rwxrwxr-x`.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Unit test.
Closes #27175 from viirya/hot-fix.
Authored-by: Liang-Chi Hsieh <viirya@gmail.com> Signed-off-by: Dongjoon
Hyun <dhyun@apple.com>
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)