SuccessChanges

Summary

  1. [SPARK-30447][SQL] Constant propagation nullability issue (details)
  2. Revert "[SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark (details)
  3. [SPARK-30234][SQL] ADD FILE cannot add directories from sql CLI (details)
  4. [SPARK-30448][CORE] accelerator aware scheduling enforce cores as (details)
  5. [SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates (details)
  6. [SPARK-30468][SQL] Use multiple lines to display data columns for show (details)
  7. [SPARK-29779][CORE] Compact old event log files and cleanup (details)
Commit 418f7dc9731403d820a8167d5ddcb99a6246668f by yamamuro
[SPARK-30447][SQL] Constant propagation nullability issue
## What changes were proposed in this pull request?
This PR fixes the `ConstantPropagation` rule, as the current implementation
produces incorrect results in some cases. E.g.
```
SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1)
```
returns those rows where `c` is null due to `1 + 1 = 1` propagation, but it shouldn't.
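For reference, a minimal spark-shell reproduction sketch (the table and its rows are illustrative; spark-shell's default `spark.implicits._` import is assumed):
```
// Illustrative data: `c` is nullable, and the null row must not be returned.
Seq[Option[Int]](Some(1), Some(2), None).toDF("c").createOrReplaceTempView("t")
// Before the fix, ConstantPropagation rewrote `c + 1 = 1` into `1 + 1 = 1`
// (false), so the predicate folded to NOT(false) = true and the null row
// leaked through; the correct result filters it out, since NOT(NULL) is NULL.
spark.sql("SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1)").show()
```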
## Why are the changes needed?
To fix a bug.
## Does this PR introduce any user-facing change?
Yes, fixes a bug.
## How was this patch tested?
New UTs.
Closes #27119 from peter-toth/SPARK-30447.
Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Takeshi
Yamamuro <yamamuro@apache.org>
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConstantPropagationSuite.scala (diff)
Commit d0983af38ffb123fa440bc5fcf3912db7658dd28 by gurwls223
Revert "[SPARK-30480][PYSPARK][TESTS] Fix 'test_memory_limit' on pyspark
test"
This reverts commit afd70a0f6fc1b44164e41a57dfc4fd8a5df642e1.
The file was modified python/pyspark/tests/test_worker.py (diff)
Commit 2a629e5d105461e12499503d2e4e95292d66a7fc by gurwls223
[SPARK-30234][SQL] ADD FILE cannot add directories from sql CLI
### What changes were proposed in this pull request? Now users can add
directories from the SQL CLI as well, using the ADD FILE command and setting
spark.sql.addDirectory.recursive to true.
### Why are the changes needed? In SPARK-4687, support was added for
adding directories as resources, but SQL users cannot use that feature
from the CLI.
`ADD FILE /path/to/folder` gives the following error:
`org.apache.spark.SparkException: Added file /path/to/folder is a
directory and recursive is not turned on.`
Users need to turn on `recursive` to add directories, so a configuration
was required to let them do that. Also, Hive allows users to add
directories from its shell.
### Does this PR introduce any user-facing change? Yes. Users can set
recursive using `spark.sql.addDirectory.recursive`.
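A hedged spark-shell sketch of the new behavior (the path is illustrative, and the config is assumed to be settable at runtime):
```
// Illustrative: /path/to/folder stands in for any directory.
spark.conf.set("spark.sql.addDirectory.recursive", "true")
spark.sql("ADD FILE /path/to/folder")  // previously threw SparkException
```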
### How was this patch tested? Manually. Will add test cases soon.
SPARK SCREENSHOTS When `spark.sql.addDirectory.recursive` is not turned
on.
![Screenshot from 2019-12-13 08-02-13](https://user-images.githubusercontent.com/15366835/70765124-c6b4a100-1d7f-11ea-9352-9c010af5b38b.png)
After setting `spark.sql.addDirectory.recursive` to true.
![Screenshot from 2019-12-13 08-02-59](https://user-images.githubusercontent.com/15366835/70765118-be5c6600-1d7f-11ea-9faf-0b1c46ee299b.png)
HIVE SCREENSHOT
![Screenshot from 2019-12-13 14-44-41](https://user-images.githubusercontent.com/15366835/70788979-17e08700-1db8-11ea-9c0c-b6d6f6e80a35.png)
`RELEASE_NOTES.txt` is a text file while `dummy` is a directory.
Closes #26863 from iRakson/SPARK-30234.
Lead-authored-by: root1 <raksonrakesh@gmail.com> Co-authored-by: iRakson
<raksonrakesh@gmail.com> Signed-off-by: HyukjinKwon
<gurwls223@apache.org>
The file was modified docs/sql-migration-guide.md (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala (diff)
Commit d6532c7079f22f32e90e1c69c25bdfab51c7c53e by tgraves
[SPARK-30448][CORE] accelerator aware scheduling enforce cores as
limiting resource
### What changes were proposed in this pull request?
This PR is to make sure cores is the limiting resource when using
accelerator aware scheduling, and to fix a few issues with
SparkContext.checkResourcesPerTask.
For the first version of accelerator aware scheduling (SPARK-27495), the
SPIP had a condition that we could support dynamic allocation because we
were going to have a strict requirement that we don't waste any
resources. This means that the number of slots each executor has could
be calculated from the number of cores and task cpus, just as is done
today.
Somewhere along the line of development we relaxed that and only warn
when we are wasting resources. This breaks the dynamic allocation logic
if the limiting resource is no longer the cores, because it uses the
cores and task cpus to calculate the number of executors it needs. This
means we will request fewer executors than we really need to run
everything. We have to enforce that cores is always the limiting
resource, so we should throw if it's not.
The only issue with enforcing this is on cluster managers (standalone
and mesos coarse grained) where we don't know the executor cores up
front by default: the spark.executor.cores config defaults to 1, but
when the executor is started it gets all the cores of the Worker by
default. So we have to add logic specifically to handle that, and we
can't enforce this requirement there; we can just warn when dynamic
allocation is enabled for those.
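For illustration, a hedged sketch of a configuration where cores would not be the limiting resource (all numbers hypothetical):
```
import org.apache.spark.SparkConf

// Hypothetical numbers: 4 cores and 2 GPUs per executor, with each task
// needing 1 CPU and 1 GPU. The GPU limits an executor to 2 concurrent tasks,
// but dynamic allocation sizes its executor requests from cores / task cpus
// = 4 slots, so it would ask for roughly half the executors actually needed.
val conf = new SparkConf()
  .set("spark.executor.cores", "4")
  .set("spark.task.cpus", "1")
  .set("spark.executor.resource.gpu.amount", "2")
  .set("spark.task.resource.gpu.amount", "1")
// With this change, SparkContext.checkResourcesPerTask throws for such a
// setup instead of only warning (except on standalone/mesos coarse grained,
// where executor cores are not known up front).
```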
### Why are the changes needed?
A bug in dynamic allocation when cores is not the limiting resource, and
incorrect warnings.
### Does this PR introduce any user-facing change?
no
### How was this patch tested?
Unit test added, and manually tested the conditions on local mode,
local cluster mode, standalone mode, and yarn.
Closes #27138 from tgravescs/SPARK-30446.
Authored-by: Thomas Graves <tgraves@nvidia.com> Signed-off-by: Thomas
Graves <tgraves@apache.org>
The file was modified core/src/test/scala/org/apache/spark/SparkContextSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/SparkContext.scala (diff)
Commit b942832bd3bd3bbca6f73606e3f7b1d423e19120 by yamamuro
[SPARK-30343][SQL] Skip unnecessary checks in RewriteDistinctAggregates
### What changes were proposed in this pull request?
This PR intends to skip the unnecessary checks that most aggregate
queries don't need in RewriteDistinctAggregates.
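For context, a hedged sketch of which queries need the rewrite (assuming a table `t` with columns `a` and `b`):
```
// The Expand-based rewrite is only required when a query has multiple
// DISTINCT aggregate groups:
spark.sql("SELECT COUNT(DISTINCT a), COUNT(DISTINCT b) FROM t")
// A single distinct group is handled directly by the aggregation strategy,
// so most queries like this one can skip the rule's checks entirely:
spark.sql("SELECT COUNT(DISTINCT a), SUM(b) FROM t")
```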
### Why are the changes needed?
For minor optimization.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #26997 from maropu/OptDistinctAggRewrite.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by:
Takeshi Yamamuro <yamamuro@apache.org>
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala (diff)
Commit 2bd8731813850180bab5887317ecf7fe83f6e8e1 by srowen
[SPARK-30468][SQL] Use multiple lines to display data columns for show
create table command
### What changes were proposed in this pull request? Currently, data
columns are displayed on a single line by the show create table command.
When the table has many columns (and, to make things even worse, columns
may have long names or comments), the displayed result is really hard to
read. To improve readability, we print each column on a separate line.
Note that other systems like Hive/MySQL also display them this way.
Also, for data columns, table properties and options, we put the right
parenthesis at the end of the last column/property/option instead of on
a separate line.
### Why are the changes needed? for better readability
### Does this PR introduce any user-facing change? before the change:
```
spark-sql> show create table test_table;
CREATE TABLE `test_table` (`col1` INT COMMENT 'This is comment for column 1', `col2` STRING COMMENT 'This is comment for column 2', `col3` DOUBLE COMMENT 'This is comment for column 3')
USING parquet
OPTIONS (
`bar` '2',
`foo` '1'
)
TBLPROPERTIES (
'a' = 'x',
'b' = 'y'
)
```
after the change:
```
spark-sql> show create table test_table;
CREATE TABLE `test_table` (
`col1` INT COMMENT 'This is comment for column 1',
`col2` STRING COMMENT 'This is comment for column 2',
`col3` DOUBLE COMMENT 'This is comment for column 3')
USING parquet
OPTIONS (
`bar` '2',
`foo` '1')
TBLPROPERTIES (
'a' = 'x',
'b' = 'y')
```
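For reference, a sketch of DDL that would produce a table like the one in the example (names, comments, and properties copied from the output above):
```
// Hypothetical spark-shell session reproducing the example output.
spark.sql("""
  CREATE TABLE test_table (
    col1 INT COMMENT 'This is comment for column 1',
    col2 STRING COMMENT 'This is comment for column 2',
    col3 DOUBLE COMMENT 'This is comment for column 3'
  )
  USING parquet
  OPTIONS (bar '2', foo '1')
  TBLPROPERTIES ('a' = 'x', 'b' = 'y')
""")
// Print the generated statement to compare before/after formatting.
println(spark.sql("SHOW CREATE TABLE test_table").first().getString(0))
```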
### How was this patch tested? modified existing tests
Closes #27147 from wzhfy/multi_line_columns.
Authored-by: Zhenhua Wang <wzh_zju@163.com> Signed-off-by: Sean Owen
<srowen@gmail.com>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ShowCreateTableSuite.scala (diff)
The file was modified sql/core/src/test/resources/sql-tests/results/show-create-table.sql.out (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveShowCreateTableSuite.scala (diff)
Commit 7fb17f59435a76d871251c1b5923f96943f5e540 by vanzin
[SPARK-29779][CORE] Compact old event log files and cleanup
### What changes were proposed in this pull request?
This patch proposes to compact old event log files when end users enable
rolling event log, and clean up these files after compaction.
Here the "compaction" really mean is filtering out listener events for
finished/removed things - like jobs which take most of space for event
log file except SQL related events. To achieve this, compactor does two
phases reading: 1) tracking the live jobs (and more to add) 2) filtering
events via leveraging the information about live things and rewriting to
the "compacted" file.
This approach retains the ability of compatibility on event log file and
adds the possibility of reducing the overall size of event logs. There's
a downside here as well: executor metrics for tasks would be inaccurate,
as compactor will filter out the task events which job is finished, but
I don't feel it as a blocker.
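For context, a sketch of the rolling event log settings (introduced by SPARK-28594) that compaction builds on; the values here are illustrative:
```
import org.apache.spark.SparkConf

// Rolling event logs split one application's log into multiple files,
// which is what gives the compactor old files to compact and clean up.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.rolling.enabled", "true")
  .set("spark.eventLog.rolling.maxFileSize", "128m")
```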
Please note that SPARK-29779 leaves the functionalities below for future
JIRA issues, as the patch for SPARK-29779 is too huge and we decided to
break it down:
* apply filter in SQL events
* integrate compaction into FsHistoryProvider
* documentation about new configuration
### Why are the changes needed?
One of the major goals of SPARK-28594 is to prevent event logs from
becoming too huge, and SPARK-29779 achieves that goal. We had another
approach previously, but the old approach required models in both the
KVStore and live entities to guarantee compatibility, while they're not
designed to do so.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Added UTs.
Closes #27085 from HeartSaVioR/SPARK-29779-part1.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Marcelo Vanzin <vanzin@cloudera.com>
The file was modified core/src/main/scala/org/apache/spark/internal/config/package.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/history/EventLogFileReadersSuite.scala (diff)
The file was added core/src/test/scala/org/apache/spark/deploy/history/BasicEventFilterSuite.scala
The file was added core/src/test/scala/org/apache/spark/deploy/history/EventLogFileCompactorSuite.scala
The file was added core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala
The file was modified core/src/test/scala/org/apache/spark/deploy/history/EventLogFileWritersSuite.scala (diff)
The file was added core/src/main/scala/org/apache/spark/deploy/history/BasicEventFilterBuilder.scala
The file was modified core/src/test/scala/org/apache/spark/deploy/history/EventLogTestHelper.scala (diff)
The file was added core/src/main/scala/org/apache/spark/deploy/history/EventFilter.scala
The file was added core/src/test/scala/org/apache/spark/deploy/history/BasicEventFilterBuilderSuite.scala
The file was modified core/src/main/scala/org/apache/spark/deploy/history/EventLogFileReaders.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala (diff)
The file was added core/src/main/resources/META-INF/services/org.apache.spark.deploy.history.EventFilterBuilder
The file was added core/src/test/scala/org/apache/spark/status/ListenerEventsTestHelper.scala