SuccessChanges

Summary

  1. [SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI faster (details)
  2. [SPARK-32303][PYTHON][TESTS] Remove leftover from editable mode installation in PIP test (details)
  3. [SPARK-32301][PYTHON][TESTS] Add a test case for toPandas to work with empty partitioned Spark DataFrame (details)
  4. [MINOR][R] Match collectAsArrowToR with non-streaming collectAsArrowToPython (details)
  5. [SPARK-32316][TESTS][INFRA] Test PySpark with Python 3.8 in Github Actions (details)
  6. [SPARK-32276][SQL] Remove redundant sorts before repartition nodes (details)
  7. [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode (details)
  8. Revert "[SPARK-32276][SQL] Remove redundant sorts before repartition nodes" (details)
  9. [SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's Scan Node (details)
  10. [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY (details)
  11. [SPARK-32036] Replace references to blacklist/whitelist language with more appropriate terminology, excluding the blacklisting feature (details)
  12. [SPARK-32140][ML][PYSPARK] Add training summary to FMClassificationModel (details)
  13. [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation (details)
Commit 90b0c26b222dcb8f207f152494604aac090eb940 by kabhwan.opensource
[SPARK-31608][CORE][WEBUI] Add a new type of KVStore to make loading UI
faster
### What changes were proposed in this pull request?
Add a new class HybridStore to make the history server faster when loading event files. When rebuilding the application state from event logs, HybridStore first writes data to an InMemoryStore and uses a background thread to dump the data to LevelDB once writing to the InMemoryStore is completed. HybridStore makes content serving faster by using more memory, so it is only safe to enable when the cluster is not under heavy load.
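For illustration only, the pattern is roughly the sketch below; the trait and class names are made up for the example and this is not the actual HybridStore code:
```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical minimal KV interface standing in for Spark's KVStore.
trait SimpleKVStore {
  def write(key: String, value: String): Unit
  def read(key: String): Option[String]
}

class InMemoryKV extends SimpleKVStore {
  private val map = new ConcurrentHashMap[String, String]()
  override def write(key: String, value: String): Unit = map.put(key, value)
  override def read(key: String): Option[String] = Option(map.get(key))
  def dumpTo(target: SimpleKVStore): Unit = map.forEach((k, v) => target.write(k, v))
}

// Serve reads from memory while events are replayed; once replay finishes,
// copy everything to the persistent store on a background thread and switch.
class HybridKV(memory: InMemoryKV, disk: SimpleKVStore) extends SimpleKVStore {
  @volatile private var backing: SimpleKVStore = memory
  override def write(key: String, value: String): Unit = backing.write(key, value)
  override def read(key: String): Option[String] = backing.read(key)

  // Call once writing to the in-memory store is complete.
  def switchToDisk(): Unit = {
    val dumper = new Thread(() => {
      memory.dumpTo(disk)
      backing = disk // future reads hit the persistent store
    })
    dumper.setDaemon(true)
    dumper.start()
  }
}
```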
### Why are the changes needed?
HybridStore can greatly reduce event log loading time, especially for large log files; in general it yields a 4x-6x UI loading speed improvement for large log files. Detailed results are shown in the PR comments.
### Does this PR introduce any user-facing change?
This PR adds new configs `spark.history.store.hybridStore.enabled` and `spark.history.store.hybridStore.maxMemoryUsage`.
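For reference, a history server deployment would turn this on in `spark-defaults.conf` roughly as below; the `maxMemoryUsage` value is illustrative, and setting `spark.history.store.path` (the existing on-disk store location) is assumed here as the LevelDB target:
```
spark.history.store.path                         /var/spark/history-store
spark.history.store.hybridStore.enabled          true
spark.history.store.hybridStore.maxMemoryUsage   2g
```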
### How was this patch tested?
A test suite for HybridStore is added. I also manually tested it on 3.1.0 on macOS.
This is a follow-up to the work done by Hieu Huynh in 2019.
Closes #28412 from baohe-zhang/SPARK-31608.
Authored-by: Baohe Zhang <baohe.zhang@verizonmedia.com> Signed-off-by:
Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
The file was modified core/src/main/scala/org/apache/spark/internal/config/History.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (diff)
The file was added core/src/main/scala/org/apache/spark/deploy/history/HistoryServerMemoryManager.scala
The file was added core/src/main/scala/org/apache/spark/deploy/history/HybridStore.scala
The file was modified docs/monitoring.md (diff)
Commit 902e1342a324c9e1e01dc68817850d9241a58227 by dongjoon
[SPARK-32303][PYTHON][TESTS] Remove leftover from editable mode
installation in PIP test
### What changes were proposed in this pull request?
Currently the Jenkins PIP packaging test fails intermittently as below:
```
Installing dist into virtual env
Processing ./python/dist/pyspark-3.1.0.dev0.tar.gz
Collecting py4j==0.10.9 (from pyspark==3.1.0.dev0)
  Downloading https://files.pythonhosted.org/packages/9e/b6/6a4fb90cd235dc8e265a6a2067f2a2c99f0d91787f06aca4bcf7c23f3f80/py4j-0.10.9-py2.py3-none-any.whl (198kB)
Installing collected packages: py4j, pyspark
  Found existing installation: py4j 0.10.9
    Uninstalling py4j-0.10.9:
      Successfully uninstalled py4j-0.10.9
  Found existing installation: pyspark 3.1.0.dev0
Exception:
Traceback (most recent call last):
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 179, in main
    status = self.run(options, args)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 393, in run
    use_user_site=options.use_user_site,
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 50, in install_given_reqs
    auto_confirm=True
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 816, in uninstall
    uninstalled_pathset = UninstallPathSet.from_dist(dist)
  File "/home/anaconda/envs/py36/lib/python3.6/site-packages/pip/_internal/req/req_uninstall.py", line 505, in from_dist
    '(at %s)' % (link_pointer, dist.project_name, dist.location)
AssertionError: Egg-link /home/jenkins/workspace/SparkPullRequestBuilder3/python does not match installed
```
- https://github.com/apache/spark/pull/29099#issuecomment-658073453
(amp-jenkins-worker-04)
- https://github.com/apache/spark/pull/29090#issuecomment-657819973
(amp-jenkins-worker-03)
It seems the leftover from a previous editable-mode installation affects other PRs. This PR works around the problem by removing the symbolic link left by the previous editable installation, which is a common workaround to my knowledge.
### Why are the changes needed?
To recover the Jenkins build.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Jenkins build will test it out.
Closes #29102 from HyukjinKwon/SPARK-32303.
Lead-authored-by: HyukjinKwon <gurwls223@apache.org> Co-authored-by:
Hyukjin Kwon <gurwls223@apache.org> Signed-off-by: Dongjoon Hyun
<dongjoon@apache.org>
The file was modified dev/run-pip-tests (diff)
Commit 676d92ecceb3d46baa524c725b9f9a14450f1e9d by gurwls223
[SPARK-32301][PYTHON][TESTS] Add a test case for toPandas to work with
empty partitioned Spark DataFrame
### What changes were proposed in this pull request?
This PR proposes to port the test case from
https://github.com/apache/spark/pull/29098 to branch-3.0 and master. In master and branch-3.0, this was fixed together at
https://github.com/apache/spark/commit/ecaa495b1fe532c36e952ccac42f4715809476af, but the empty-partition case is not being tested.
### Why are the changes needed?
To improve test coverage.
### Does this PR introduce _any_ user-facing change?
No, test-only.
### How was this patch tested?
Unit test was forward-ported.
Closes #29099 from HyukjinKwon/SPARK-32300-1.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
The file was modified python/pyspark/sql/tests/test_arrow.py (diff)
Commit 03b5707b516187aaa8012049fce8b1cd0ac0fddd by gurwls223
[MINOR][R] Match collectAsArrowToR with non-streaming
collectAsArrowToPython
### What changes were proposed in this pull request?
This PR proposes to port forward #29098 to `collectAsArrowToR`. `collectAsArrowToR` follows `collectAsArrowToPython` in branch-2.4 due to the limitation of ARROW-4512: SparkR vectorization currently cannot use the Arrow streaming format.
### Why are the changes needed?
For simplicity and consistency.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The same code is being tested in `collectAsArrowToPython` of branch-2.4.
Closes #29100 from HyukjinKwon/minor-parts.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala (diff)
Commit 6bdd710c4d4125b0801a93d57f53e05e301ebebd by dongjoon
[SPARK-32316][TESTS][INFRA] Test PySpark with Python 3.8 in Github
Actions
### What changes were proposed in this pull request?
This PR aims to test PySpark with Python 3.8 in Github Actions. On the script side, it is already ready:
https://github.com/apache/spark/blob/4ad9bfd53b84a6d2497668c73af6899bae14c187/python/run-tests.py#L161
This PR includes small related fixes together:
1. Install Python 3.8.
2. Only install one Python implementation, instead of many, for the SQL and Yarn test cases, because they need only one Python executable higher than Python 2.
3. Do not install Python 2, which is not needed anymore after we dropped Python 2 at SPARK-32138.
4. Remove a comment about installing PyPy3 on Jenkins (SPARK-32278); it is already installed.
### Why are the changes needed?
Currently, only PyPy3 and Python 3.6 are being tested with PySpark in Github Actions. We should test the latest version of Python as well, because some optimizations can only be enabled with Python 3.8+. See also https://github.com/apache/spark/pull/29114.
### Does this PR introduce _any_ user-facing change?
No, dev-only.
### How was this patch tested?
Was not tested. Github Actions build in this PR will test it out.
Closes #29116 from HyukjinKwon/test-python3.8-togehter.
Authored-by: HyukjinKwon <gurwls223@apache.org> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
The file was modified python/run-tests.py (diff)
The file was modified .github/workflows/master.yml (diff)
Commit af8e65fca989518cf65ec47f77eea2ce649bd6bb by dongjoon
[SPARK-32276][SQL] Remove redundant sorts before repartition nodes
### What changes were proposed in this pull request?
This PR removes redundant sorts before repartition nodes that shuffle, and before `repartitionByExpression` with deterministic expressions.
### Why are the changes needed?
It looks like our `EliminateSorts` rule can be extended further to remove sorts before repartition nodes that shuffle data, as such repartition operations change the ordering and distribution of data. That is why it seems safe to perform the following rewrites:
- `Repartition -> Sort -> Scan` as `Repartition -> Scan`
- `Repartition -> Project -> Sort -> Scan` as `Repartition -> Project -> Scan`
We do not apply this optimization to coalesce, as it uses `DefaultPartitionCoalescer`, which may preserve the ordering of data if there is no locality info in the parent RDD; at the same time, there is no guarantee that will happen.
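As a hedged spark-shell illustration (the DataFrame here is made up), the first rewrite corresponds to a plan shape like this:
```scala
scala> val df = spark.range(100).toDF("id")

// Plan shape: Repartition -> Sort -> Scan. The sort is redundant because the
// shuffle performed by repartition(10) destroys any ordering, so
// EliminateSorts can rewrite this to the equivalent of df.repartition(10).
scala> df.sort("id").repartition(10).explain()
```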
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
More test cases.
Closes #29089 from aokolnychyi/spark-32276.
Authored-by: Anton Okolnychyi <aokolnychyi@apple.com> Signed-off-by:
Dongjoon Hyun <dongjoon@apache.org>
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was added sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsBeforeRepartitionSuite.scala
Commit 542aefb4c4dd5ca2734773ffe983ba740729d074 by kabhwan.opensource
[SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in
continuous mode
### What changes were proposed in this pull request?
This removes the undocumented and incomplete "stateful aggregation" feature in continuous mode, cutting 1100+ lines of code.
### Why are the changes needed?
Work on the feature has been stopped for over a year, and no one in the community has asked for it to be made available. The current state of the feature is that it only works with `coalesce(1)`, which forces the query to read, process, and write in a single task; that doesn't make sense in production.
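For illustration, the pattern being removed looked roughly like the sketch below (the source, sink, and trigger interval are made up); after this PR, `UnsupportedOperationChecker` rejects such queries:
```scala
import org.apache.spark.sql.streaming.Trigger

// Stateful aggregation in continuous mode only ever worked when forced into
// a single task via coalesce(1).
val counts = spark.readStream.format("rate").load()
  .coalesce(1)
  .groupBy("value").count()

counts.writeStream
  .outputMode("update")
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()
```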
The remaining code also increases the work on DSv2 changes; that is why I don't simply propose reverting the relevant commits: the code path has been changed due to DSv2 evolution.
### Does this PR introduce _any_ user-facing change?
Technically no, because the feature was never documented and can't be used in production in its current shape.
### How was this patch tested?
Existing tests.
Closes #29077 from HeartSaVioR/SPARK-31985.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
The file was removed sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/ContinuousAggregationSuite.scala
The file was removed sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/RPCContinuousShuffleWriter.scala
The file was removed sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala (diff)
The file was removed sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/RPCContinuousShuffleReader.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
The file was removed sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceExec.scala
The file was removed sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala
The file was removed sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReader.scala
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala (diff)
The file was removed sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleWriter.scala
The file was removed sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDD.scala (diff)
Commit 2527fbc896dc8a26f5a281ed719fb59b5df8cd2f by dongjoon
Revert "[SPARK-32276][SQL] Remove redundant sorts before repartition
nodes"
This reverts commit af8e65fca989518cf65ec47f77eea2ce649bd6bb.
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was removed sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsBeforeRepartitionSuite.scala
Commit e4499932da03743cb05c6bcc5d0149728380383a by dkbiswal
[SPARK-31480][SQL] Improve the EXPLAIN FORMATTED's output for DSV2's
Scan Node
### What changes were proposed in this pull request?
Improve the EXPLAIN FORMATTED output of DSV2 Scan nodes (file-based ones).
**Before**
```
== Physical Plan ==
* Project (4)
+- * Filter (3)
  +- * ColumnarToRow (2)
     +- BatchScan (1)
(1) BatchScan
Output [2]: [value#7, id#8]
Arguments: [value#7, id#8], ParquetScan(org.apache.spark.sql.test.TestSparkSession17477bbb,Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml,org.apache.spark.sql.execution.datasources.InMemoryFileIndexa6c363ce,StructType(StructField(value,IntegerType,true)),StructType(StructField(value,IntegerType,true)),StructType(StructField(id,IntegerType,true)),[Lorg.apache.spark.sql.sources.Filter;40fee459,org.apache.spark.sql.util.CaseInsensitiveStringMapfeca1ec6,Vector(isnotnull(id#8), (id#8 > 1)),List(isnotnull(value#7), (value#7 > 2)))
(2) ...
(3) ...
(4) ...
```
**After**
```
== Physical Plan ==
* Project (4)
+- * Filter (3)
  +- * ColumnarToRow (2)
     +- BatchScan (1)
(1) BatchScan
Output [2]: [value#7, id#8]
DataFilters: [isnotnull(value#7), (value#7 > 2)]
Format: parquet
Location: InMemoryFileIndex[....]
PartitionFilters: [isnotnull(id#8), (id#8 > 1)]
PushedFilers: [IsNotNull(id), IsNotNull(value), GreaterThan(id,1), GreaterThan(value,2)]
ReadSchema: struct<value:int>
(2) ...
(3) ...
(4) ...
```
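A plan like the one above can be produced with the formatted explain mode; a minimal spark-shell sketch (the table path, schema, and filter values are illustrative):
```scala
scala> Seq((1, 1), (3, 2)).toDF("value", "id").write.partitionBy("id").parquet("/tmp/t")

scala> spark.read.parquet("/tmp/t").createOrReplaceTempView("t")

scala> sql("EXPLAIN FORMATTED SELECT * FROM t WHERE id > 1 AND value > 2").show(false)
```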
### Why are the changes needed?
The old format is not very readable. This improves the readability of the plan.
### Does this PR introduce any user-facing change?
Yes, the explain output will be different.
### How was this patch tested?
Added a test case in ExplainSuite.
Closes #28425 from dilipbiswal/dkb_dsv2_explain.
Lead-authored-by: Dilip Biswal <dkbiswal@gmail.com> Co-authored-by:
Dilip Biswal <dkbiswal@apache.org> Signed-off-by: Dilip Biswal
<dkbiswal@apache.org>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileScan.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala (diff)
The file was added sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SupportsMetadata.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScan.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/csv/CSVScan.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScan.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala (diff)
The file was modified external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala (diff)
Commit 8950dcbb1cafccc2ba8bbf030ab7ac86cfe203a4 by dongjoon
[SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for
ORDER BY in DISTRIBUTE BY
### What changes were proposed in this pull request?
This PR aims to add a test case to EliminateSortsSuite to protect a valid use case: using ORDER BY inside a DISTRIBUTE BY statement.
### Why are the changes needed?
```scala
scala> scala.util.Random.shuffle((1 to 100000).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")
```
```
$ ls -al /tmp/master/
total 56
drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
drwxrwxrwt  15 root      wheel  480 Jul 14 22:12 ../
-rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 .part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
-rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
```
The following was found during SPARK-32276: if the Spark optimizer removes the inner `ORDER BY`, the file sizes increase.
```scala
scala> scala.util.Random.shuffle((1 to 100000).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")

scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")
```
```
$ ls -al /tmp/SPARK-32276/
total 632
drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
-rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
-rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
-rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
-rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
-rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
```
### Does this PR introduce _any_ user-facing change?
No. This only improves the test coverage.
### How was this patch tested?
Pass the GitHub Action or Jenkins.
Closes #29118 from dongjoon-hyun/SPARK-32318.
Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala (diff)
Commit cf22d947fb8f37aa4d394b6633d6f08dbbf6dc1c by tgraves
[SPARK-32036] Replace references to blacklist/whitelist language with
more appropriate terminology, excluding the blacklisting feature
### What changes were proposed in this pull request?
This PR removes references to the "blacklist" and "whitelist" terms, apart from the blacklisting feature as a whole, which can be handled in a separate JIRA/PR.
This touches quite a few files, but the changes are straightforward (variable/method/etc. name changes) and most are quite self-contained.
### Why are the changes needed?
As per discussion on the Spark dev list, it will be beneficial to remove references to problematic language that can alienate potential community members. Two such references are "blacklist" and "whitelist". While it seems to me that there is some valid debate as to whether these terms have racist origins, the cultural connotations are inescapable in today's world.
### Does this PR introduce _any_ user-facing change?
In the test file `HiveQueryFileTest`, a developer has the ability to
specify the system property `spark.hive.whitelist` to specify a list of
Hive query files that should be tested. This system property has been
renamed to `spark.hive.includelist`. The old property has been kept for
compatibility, but will log a warning if used. I am open to feedback
from others on whether keeping a deprecated property here is unnecessary
given that this is just for developers running tests.
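As a sketch of that compatibility shim (the helper below is illustrative, not the actual test code; only the property names come from this PR):
```scala
// Read the new property, falling back to the deprecated one with a warning.
def includelistProperty(): Option[String] = {
  val deprecated = Option(System.getProperty("spark.hive.whitelist"))
  deprecated.foreach { _ =>
    System.err.println(
      "Warning: spark.hive.whitelist is deprecated; use spark.hive.includelist instead.")
  }
  Option(System.getProperty("spark.hive.includelist")).orElse(deprecated)
}
```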
### How was this patch tested?
Existing tests should be suitable since no behavior changes are expected
as a result of this PR.
Closes #28874 from xkrogen/xkrogen-SPARK-32036-rename-blacklists.
Authored-by: Erik Krogen <ekrogen@linkedin.com> Signed-off-by: Thomas
Graves <tgraves@apache.org>
The file was modified python/run-tests.py (diff)
The file was modified resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala (diff)
The file was modified R/pkg/tests/fulltests/test_context.R (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonOutputWriter.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala (diff)
The file was removed sql/hive/src/test/resources/ql/src/test/queries/clientpositive/alter_partition_with_whitelist.q
The file was added sql/hive/src/test/resources/ql/src/test/queries/clientpositive/alter_partition_with_includelist.q
The file was added sql/hive/src/test/resources/ql/src/test/queries/clientpositive/add_partition_no_includelist.q
The file was modified external/docker-integration-tests/src/test/scala/org/apache/spark/util/DockerUtils.scala (diff)
The file was modified sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala (diff)
The file was modified dev/sparktestsupport/modules.py (diff)
The file was modified core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala (diff)
The file was modified examples/src/main/scala/org/apache/spark/examples/streaming/RecoverableNetworkWordCount.scala (diff)
The file was modified project/SparkBuild.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/util/JsonProtocol.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala (diff)
The file was modified python/pylintrc (diff)
The file was modified core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala (diff)
The file was modified examples/src/main/python/streaming/recoverable_network_wordcount.py (diff)
The file was removed sql/hive/src/test/resources/ql/src/test/queries/clientpositive/add_partition_no_whitelist.q
The file was modified core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/AggregationQuerySuite.scala (diff)
The file was modified python/pyspark/cloudpickle.py (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala (diff)
The file was modified streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala (diff)
The file was modified sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala (diff)
The file was modified R/pkg/tests/run-all.R (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala (diff)
The file was modified sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQueryFileTest.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala (diff)
The file was added sql/core/src/test/resources/sql-tests/inputs/ignored.sql
The file was removed sql/core/src/test/resources/sql-tests/inputs/blacklist.sql
The file was modified resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (diff)
The file was modified core/src/main/scala/org/apache/spark/scheduler/OutputCommitCoordinator.scala (diff)
The file was modified python/pyspark/sql/functions.py (diff)
The file was modified python/pyspark/sql/pandas/typehints.py (diff)
The file was removed sql/hive/src/test/resources/ql/src/test/queries/clientpositive/add_partition_with_whitelist.q
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/security/HiveHadoopDelegationTokenManagerSuite.scala (diff)
The file was modified examples/src/main/java/org/apache/spark/examples/streaming/JavaRecoverableNetworkWordCount.java (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullupCorrelatedPredicatesSuite.scala (diff)
The file was modified R/pkg/tests/fulltests/test_sparkSQL.R (diff)
The file was added sql/hive/src/test/resources/ql/src/test/queries/clientpositive/add_partition_with_includelist.q
The file was modified core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala (diff)
The file was modified common/network-common/src/main/java/org/apache/spark/network/crypto/README.md (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/ThreadAudit.scala (diff)
The file was modified docs/streaming-programming-guide.md (diff)
Commit b05f309bc9e51e8f7b480b5d176589773b5d59f7 by huaxing
[SPARK-32140][ML][PYSPARK] Add training summary to FMClassificationModel
### What changes were proposed in this pull request?
Add a training summary for FMClassificationModel...
### Why are the changes needed?
So that users can get the training process status, such as the loss value of each iteration and the total iteration count.
### Does this PR introduce _any_ user-facing change?
Yes: `FMClassificationModel.summary` and `FMClassificationModel.evaluate`.
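A hedged Scala sketch of how the new summary might be used (the tiny training set is made up, and `objectiveHistory`/`totalIterations` are assumed to be the summary's loss-per-iteration and iteration-count accessors):
```scala
import org.apache.spark.ml.classification.FMClassifier
import org.apache.spark.ml.linalg.Vectors

val train = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1)),
  (0.0, Vectors.dense(2.0, 1.0)),
  (1.0, Vectors.dense(0.1, 1.3))
)).toDF("label", "features")

val model = new FMClassifier().fit(train)

// Training summary added by this PR.
val summary = model.summary
println(summary.objectiveHistory.mkString(", "))  // loss value of each iteration
println(summary.totalIterations)                  // total iteration count
```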
### How was this patch tested?
New tests.
Closes #28960 from huaxingao/fm_summary.
Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Huaxin Gao
<huaxing@us.ibm.com>
The file was modified mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/classification/FMClassifierSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/optimization/LBFGS.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/classification/FMClassifier.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/regression/FMRegressor.scala (diff)
The file was modified python/pyspark/ml/classification.py (diff)
The file was modified python/pyspark/ml/tests/test_training_summary.py (diff)
Commit c28a6fa5112c9ba3839f52b737266f24fdfcf75b by dongjoon
[SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc)
for Scala 2.13 compilation
### What changes were proposed in this pull request?
Same as https://github.com/apache/spark/pull/29078 and
https://github.com/apache/spark/pull/28971 . This makes the rest of the
default modules (i.e. those you get without specifying `-Pyarn` etc)
compile under Scala 2.13. It does not close the JIRA, as a result. this
also of course does not demonstrate that tests pass yet in 2.13.
Note, this does not fix the `repl` module; that's separate.
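For context, a typical fix of the kind this PR makes (an illustrative example, not taken from the diff): `scala.Seq` is immutable in 2.13, so code that passed a mutable buffer where a `Seq` is expected now needs an explicit conversion.
```scala
import scala.collection.mutable.ArrayBuffer

def sumAll(xs: Seq[Int]): Int = xs.sum

val buf = ArrayBuffer(1, 2, 3)
// Under 2.12 this call compiled with `buf` directly because scala.Seq was
// collection.Seq; under 2.13 scala.Seq is immutable.Seq, so the explicit
// .toSeq keeps it compiling in both versions.
sumAll(buf.toSeq)
```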
### Why are the changes needed?
Eventually, we need to support a Scala 2.13 build, perhaps in Spark 3.1.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing tests. (2.13 was not tested; this is about getting it to
compile without breaking 2.12)
Closes #29111 from srowen/SPARK-29292.3.
Authored-by: Sean Owen <srowen@gmail.com> Signed-off-by: Dongjoon Hyun
<dongjoon@apache.org>
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/Word2Vec.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/clustering/BisectingKMeansSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Variance.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/recommendation/ALSSuite.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala (diff)
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/util/NumericParser.scala (diff)
The file was modified external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/clustering/GaussianMixture.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/Estimator.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Gini.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveShowCreateTableSuite.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveOptions.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala (diff)
The file was modified examples/src/main/scala/org/apache/spark/examples/SparkKMeans.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala (diff)
The file was modified external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala (diff)
The file was modified mllib/src/test/scala/org/apache/spark/ml/feature/NormalizerSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/RobustScaler.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/tree/impurity/Entropy.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/param/params.scala (diff)
The file was modified sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/clustering/GaussianMixture.scala (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala (diff)
The file was modified examples/src/main/java/org/apache/spark/examples/ml/JavaTokenizerExample.java (diff)
The file was modified sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala (diff)