Success: Changes

Summary

  1. [SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0 (commit: eb74d55fb541cdb10c1772d58d0105e3d4bc38b3) (details)
  2. [SPARK-32319][PYSPARK] Disallow the use of unused imports (commit: 9fcf0ea71820f7331504073045c38820e50141c7) (details)
  3. [MINOR][DOCS] Fix typos at ExecutorAllocationManager.scala (commit: dc3fac81848f557e3dac3f35686af325a18d0291) (details)
  4. [SPARK-32555][SQL] Add unique ID on query execution (commit: 8062c1f777691d74c9c75f53f0a32cd0a3005fed) (details)
  5. [SPARK-32564][SQL][TEST][FOLLOWUP] Re-enable TPCDSQuerySuite with empty tables (commit: 1df855bef2b2dbe330cafb0d10e0b4af813a311a) (details)
  6. [SPARK-32462][WEBUI] Reset previous search text for datatable (commit: 34d9f1cf4c97647dc7ff6cb49ff067258c91ab45) (details)
Commit eb74d55fb541cdb10c1772d58d0105e3d4bc38b3 by gurwls223
[SPARK-32568][BUILD][SS] Upgrade Kafka to 2.6.0
### What changes were proposed in this pull request?
This PR aims to upgrade the Kafka client library to 2.6.0 for Apache Spark
3.1.0.
### Why are the changes needed?
This will bring client-side bug fixes like KAFKA-10134 and KAFKA-10223.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the existing tests.
Closes #29386 from dongjoon-hyun/SPARK-32568.
Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by:
HyukjinKwon <gurwls223@apache.org>
(commit: eb74d55fb541cdb10c1772d58d0105e3d4bc38b3)
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala (diff)
The file was modified pom.xml (diff)
Commit 9fcf0ea71820f7331504073045c38820e50141c7 by dongjoon
[SPARK-32319][PYSPARK] Disallow the use of unused imports
Disallow the use of unused imports:
- They unnecessarily increase the memory footprint of the application
- This PR also moves the imports required for the docstring examples from
file scope into the examples themselves. This keeps the files themselves
clean, and gives a more complete example as it also includes the imports :)
Before:
```
fokkodriesprongFan spark % flake8 python | grep -i "imported but unused"
python/pyspark/cloudpickle.py:46:1: F401 'functools.partial' imported but unused
python/pyspark/cloudpickle.py:55:1: F401 'traceback' imported but unused
python/pyspark/heapq3.py:868:5: F401 '_heapq.*' imported but unused
python/pyspark/__init__.py:61:1: F401 'pyspark.version.__version__' imported but unused
python/pyspark/__init__.py:62:1: F401 'pyspark._globals._NoValue' imported but unused
python/pyspark/__init__.py:115:1: F401 'pyspark.sql.SQLContext' imported but unused
python/pyspark/__init__.py:115:1: F401 'pyspark.sql.HiveContext' imported but unused
python/pyspark/__init__.py:115:1: F401 'pyspark.sql.Row' imported but unused
python/pyspark/rdd.py:21:1: F401 're' imported but unused
python/pyspark/rdd.py:29:1: F401 'tempfile.NamedTemporaryFile' imported but unused
python/pyspark/mllib/regression.py:26:1: F401 'pyspark.mllib.linalg.SparseVector' imported but unused
python/pyspark/mllib/clustering.py:28:1: F401 'pyspark.mllib.linalg.SparseVector' imported but unused
python/pyspark/mllib/clustering.py:28:1: F401 'pyspark.mllib.linalg.DenseVector' imported but unused
python/pyspark/mllib/classification.py:26:1: F401 'pyspark.mllib.linalg.SparseVector' imported but unused
python/pyspark/mllib/feature.py:28:1: F401 'pyspark.mllib.linalg.DenseVector' imported but unused
python/pyspark/mllib/feature.py:28:1: F401 'pyspark.mllib.linalg.SparseVector' imported but unused
python/pyspark/mllib/feature.py:30:1: F401 'pyspark.mllib.regression.LabeledPoint' imported but unused
python/pyspark/mllib/tests/test_linalg.py:18:1: F401 'sys' imported but unused
python/pyspark/mllib/tests/test_linalg.py:642:5: F401 'pyspark.mllib.tests.test_linalg.*' imported but unused
python/pyspark/mllib/tests/test_feature.py:21:1: F401 'numpy.random' imported but unused
python/pyspark/mllib/tests/test_feature.py:21:1: F401 'numpy.exp' imported but unused
python/pyspark/mllib/tests/test_feature.py:23:1: F401 'pyspark.mllib.linalg.Vector' imported but unused
python/pyspark/mllib/tests/test_feature.py:23:1: F401 'pyspark.mllib.linalg.VectorUDT' imported but unused
python/pyspark/mllib/tests/test_feature.py:185:5: F401 'pyspark.mllib.tests.test_feature.*' imported but unused
python/pyspark/mllib/tests/test_util.py:97:5: F401 'pyspark.mllib.tests.test_util.*' imported but unused
python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.Vector' imported but unused
python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.SparseVector' imported but unused
python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.DenseVector' imported but unused
python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.VectorUDT' imported but unused
python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg._convert_to_vector' imported but unused
python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.DenseMatrix' imported but unused
python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.SparseMatrix' imported but unused
python/pyspark/mllib/tests/test_stat.py:23:1: F401 'pyspark.mllib.linalg.MatrixUDT' imported but unused
python/pyspark/mllib/tests/test_stat.py:181:5: F401 'pyspark.mllib.tests.test_stat.*' imported but unused
python/pyspark/mllib/tests/test_streaming_algorithms.py:18:1: F401 'time.time' imported but unused
python/pyspark/mllib/tests/test_streaming_algorithms.py:18:1: F401 'time.sleep' imported but unused
python/pyspark/mllib/tests/test_streaming_algorithms.py:470:5: F401 'pyspark.mllib.tests.test_streaming_algorithms.*' imported but unused
python/pyspark/mllib/tests/test_algorithms.py:295:5: F401 'pyspark.mllib.tests.test_algorithms.*' imported but unused
python/pyspark/tests/test_serializers.py:90:13: F401 'xmlrunner' imported but unused
python/pyspark/tests/test_rdd.py:21:1: F401 'sys' imported but unused
python/pyspark/tests/test_rdd.py:29:1: F401 'pyspark.resource.ResourceProfile' imported but unused
python/pyspark/tests/test_rdd.py:885:5: F401 'pyspark.tests.test_rdd.*' imported but unused
python/pyspark/tests/test_readwrite.py:19:1: F401 'sys' imported but unused
python/pyspark/tests/test_readwrite.py:22:1: F401 'array.array' imported but unused
python/pyspark/tests/test_readwrite.py:309:5: F401 'pyspark.tests.test_readwrite.*' imported but unused
python/pyspark/tests/test_join.py:62:5: F401 'pyspark.tests.test_join.*' imported but unused
python/pyspark/tests/test_taskcontext.py:19:1: F401 'shutil' imported but unused
python/pyspark/tests/test_taskcontext.py:325:5: F401 'pyspark.tests.test_taskcontext.*' imported but unused
python/pyspark/tests/test_conf.py:36:5: F401 'pyspark.tests.test_conf.*' imported but unused
python/pyspark/tests/test_broadcast.py:148:5: F401 'pyspark.tests.test_broadcast.*' imported but unused
python/pyspark/tests/test_daemon.py:76:5: F401 'pyspark.tests.test_daemon.*' imported but unused
python/pyspark/tests/test_util.py:77:5: F401 'pyspark.tests.test_util.*' imported but unused
python/pyspark/tests/test_pin_thread.py:19:1: F401 'random' imported but unused
python/pyspark/tests/test_pin_thread.py:149:5: F401 'pyspark.tests.test_pin_thread.*' imported but unused
python/pyspark/tests/test_worker.py:19:1: F401 'sys' imported but unused
python/pyspark/tests/test_worker.py:26:5: F401 'resource' imported but unused
python/pyspark/tests/test_worker.py:203:5: F401 'pyspark.tests.test_worker.*' imported but unused
python/pyspark/tests/test_profiler.py:101:5: F401 'pyspark.tests.test_profiler.*' imported but unused
python/pyspark/tests/test_shuffle.py:18:1: F401 'sys' imported but unused
python/pyspark/tests/test_shuffle.py:171:5: F401 'pyspark.tests.test_shuffle.*' imported but unused
python/pyspark/tests/test_rddbarrier.py:43:5: F401 'pyspark.tests.test_rddbarrier.*' imported but unused
python/pyspark/tests/test_context.py:129:13: F401 'userlibrary.UserClass' imported but unused
python/pyspark/tests/test_context.py:140:13: F401 'userlib.UserClass' imported but unused
python/pyspark/tests/test_context.py:310:5: F401 'pyspark.tests.test_context.*' imported but unused
python/pyspark/tests/test_appsubmit.py:241:5: F401 'pyspark.tests.test_appsubmit.*' imported but unused
python/pyspark/streaming/dstream.py:18:1: F401 'sys' imported but unused
python/pyspark/streaming/tests/test_dstream.py:27:1: F401 'pyspark.RDD' imported but unused
python/pyspark/streaming/tests/test_dstream.py:647:5: F401 'pyspark.streaming.tests.test_dstream.*' imported but unused
python/pyspark/streaming/tests/test_kinesis.py:83:5: F401 'pyspark.streaming.tests.test_kinesis.*' imported but unused
python/pyspark/streaming/tests/test_listener.py:152:5: F401 'pyspark.streaming.tests.test_listener.*' imported but unused
python/pyspark/streaming/tests/test_context.py:178:5: F401 'pyspark.streaming.tests.test_context.*' imported but unused
python/pyspark/testing/utils.py:30:5: F401 'scipy.sparse' imported but unused
python/pyspark/testing/utils.py:36:5: F401 'numpy as np' imported but unused
python/pyspark/ml/regression.py:25:1: F401 'pyspark.ml.tree._TreeEnsembleParams' imported but unused
python/pyspark/ml/regression.py:25:1: F401 'pyspark.ml.tree._HasVarianceImpurity' imported but unused
python/pyspark/ml/regression.py:29:1: F401 'pyspark.ml.wrapper.JavaParams' imported but unused
python/pyspark/ml/util.py:19:1: F401 'sys' imported but unused
python/pyspark/ml/__init__.py:25:1: F401 'pyspark.ml.pipeline' imported but unused
python/pyspark/ml/pipeline.py:18:1: F401 'sys' imported but unused
python/pyspark/ml/stat.py:22:1: F401 'pyspark.ml.linalg.DenseMatrix' imported but unused
python/pyspark/ml/stat.py:22:1: F401 'pyspark.ml.linalg.Vectors' imported but unused
python/pyspark/ml/tests/test_training_summary.py:18:1: F401 'sys' imported but unused
python/pyspark/ml/tests/test_training_summary.py:364:5: F401 'pyspark.ml.tests.test_training_summary.*' imported but unused
python/pyspark/ml/tests/test_linalg.py:381:5: F401 'pyspark.ml.tests.test_linalg.*' imported but unused
python/pyspark/ml/tests/test_tuning.py:427:9: F401 'pyspark.sql.functions as F' imported but unused
python/pyspark/ml/tests/test_tuning.py:757:5: F401 'pyspark.ml.tests.test_tuning.*' imported but unused
python/pyspark/ml/tests/test_wrapper.py:120:5: F401 'pyspark.ml.tests.test_wrapper.*' imported but unused
python/pyspark/ml/tests/test_feature.py:19:1: F401 'sys' imported but unused
python/pyspark/ml/tests/test_feature.py:304:5: F401 'pyspark.ml.tests.test_feature.*' imported but unused
python/pyspark/ml/tests/test_image.py:19:1: F401 'py4j' imported but unused
python/pyspark/ml/tests/test_image.py:22:1: F401 'pyspark.testing.mlutils.PySparkTestCase' imported but unused
python/pyspark/ml/tests/test_image.py:71:5: F401 'pyspark.ml.tests.test_image.*' imported but unused
python/pyspark/ml/tests/test_persistence.py:456:5: F401 'pyspark.ml.tests.test_persistence.*' imported but unused
python/pyspark/ml/tests/test_evaluation.py:56:5: F401 'pyspark.ml.tests.test_evaluation.*' imported but unused
python/pyspark/ml/tests/test_stat.py:43:5: F401 'pyspark.ml.tests.test_stat.*' imported but unused
python/pyspark/ml/tests/test_base.py:70:5: F401 'pyspark.ml.tests.test_base.*' imported but unused
python/pyspark/ml/tests/test_param.py:20:1: F401 'sys' imported but unused
python/pyspark/ml/tests/test_param.py:375:5: F401 'pyspark.ml.tests.test_param.*' imported but unused
python/pyspark/ml/tests/test_pipeline.py:62:5: F401 'pyspark.ml.tests.test_pipeline.*' imported but unused
python/pyspark/ml/tests/test_algorithms.py:333:5: F401 'pyspark.ml.tests.test_algorithms.*' imported but unused
python/pyspark/ml/param/__init__.py:18:1: F401 'sys' imported but unused
python/pyspark/resource/tests/test_resources.py:17:1: F401 'random' imported but unused
python/pyspark/resource/tests/test_resources.py:20:1: F401 'pyspark.resource.ResourceProfile' imported but unused
python/pyspark/resource/tests/test_resources.py:75:5: F401 'pyspark.resource.tests.test_resources.*' imported but unused
python/pyspark/sql/functions.py:32:1: F401 'pyspark.sql.udf.UserDefinedFunction' imported but unused
python/pyspark/sql/functions.py:34:1: F401 'pyspark.sql.pandas.functions.pandas_udf' imported but unused
python/pyspark/sql/session.py:30:1: F401 'pyspark.sql.types.Row' imported but unused
python/pyspark/sql/session.py:30:1: F401 'pyspark.sql.types.StringType' imported but unused
python/pyspark/sql/readwriter.py:1084:5: F401 'pyspark.sql.Row' imported but unused
python/pyspark/sql/context.py:26:1: F401 'pyspark.sql.types.IntegerType' imported but unused
python/pyspark/sql/context.py:26:1: F401 'pyspark.sql.types.Row' imported but unused
python/pyspark/sql/context.py:26:1: F401 'pyspark.sql.types.StringType' imported but unused
python/pyspark/sql/context.py:27:1: F401 'pyspark.sql.udf.UDFRegistration' imported but unused
python/pyspark/sql/streaming.py:1212:5: F401 'pyspark.sql.Row' imported but unused
python/pyspark/sql/tests/test_utils.py:55:5: F401 'pyspark.sql.tests.test_utils.*' imported but unused
python/pyspark/sql/tests/test_pandas_map.py:18:1: F401 'sys' imported but unused
python/pyspark/sql/tests/test_pandas_map.py:22:1: F401 'pyspark.sql.functions.pandas_udf' imported but unused
python/pyspark/sql/tests/test_pandas_map.py:22:1: F401 'pyspark.sql.functions.PandasUDFType' imported but unused
python/pyspark/sql/tests/test_pandas_map.py:119:5: F401 'pyspark.sql.tests.test_pandas_map.*' imported but unused
python/pyspark/sql/tests/test_catalog.py:193:5: F401 'pyspark.sql.tests.test_catalog.*' imported but unused
python/pyspark/sql/tests/test_group.py:39:5: F401 'pyspark.sql.tests.test_group.*' imported but unused
python/pyspark/sql/tests/test_session.py:361:5: F401 'pyspark.sql.tests.test_session.*' imported but unused
python/pyspark/sql/tests/test_conf.py:49:5: F401 'pyspark.sql.tests.test_conf.*' imported but unused
python/pyspark/sql/tests/test_pandas_cogrouped_map.py:19:1: F401 'sys' imported but unused
python/pyspark/sql/tests/test_pandas_cogrouped_map.py:21:1: F401 'pyspark.sql.functions.sum' imported but unused
python/pyspark/sql/tests/test_pandas_cogrouped_map.py:21:1: F401 'pyspark.sql.functions.PandasUDFType' imported but unused
python/pyspark/sql/tests/test_pandas_cogrouped_map.py:29:5: F401 'pandas.util.testing.assert_series_equal' imported but unused
python/pyspark/sql/tests/test_pandas_cogrouped_map.py:32:5: F401 'pyarrow as pa' imported but unused
python/pyspark/sql/tests/test_pandas_cogrouped_map.py:248:5: F401 'pyspark.sql.tests.test_pandas_cogrouped_map.*' imported but unused
python/pyspark/sql/tests/test_udf.py:24:1: F401 'py4j' imported but unused
python/pyspark/sql/tests/test_pandas_udf_typehints.py:246:5: F401 'pyspark.sql.tests.test_pandas_udf_typehints.*' imported but unused
python/pyspark/sql/tests/test_functions.py:19:1: F401 'sys' imported but unused
python/pyspark/sql/tests/test_functions.py:362:9: F401 'pyspark.sql.functions.exists' imported but unused
python/pyspark/sql/tests/test_functions.py:387:5: F401 'pyspark.sql.tests.test_functions.*' imported but unused
python/pyspark/sql/tests/test_pandas_udf_scalar.py:21:1: F401 'sys' imported but unused
python/pyspark/sql/tests/test_pandas_udf_scalar.py:45:5: F401 'pyarrow as pa' imported but unused
python/pyspark/sql/tests/test_pandas_udf_window.py:355:5: F401 'pyspark.sql.tests.test_pandas_udf_window.*' imported but unused
python/pyspark/sql/tests/test_arrow.py:38:5: F401 'pyarrow as pa' imported but unused
python/pyspark/sql/tests/test_pandas_grouped_map.py:20:1: F401 'sys' imported but unused
python/pyspark/sql/tests/test_pandas_grouped_map.py:38:5: F401 'pyarrow as pa' imported but unused
python/pyspark/sql/tests/test_dataframe.py:382:9: F401 'pyspark.sql.DataFrame' imported but unused
python/pyspark/sql/avro/functions.py:125:5: F401 'pyspark.sql.Row' imported but unused
python/pyspark/sql/pandas/functions.py:19:1: F401 'sys' imported but unused
```
After:
```
fokkodriesprongFan spark % flake8 python | grep -i "imported but unused"
fokkodriesprongFan spark %
```
### What changes were proposed in this pull request?
Removing unused imports from the Python files to keep everything nice
and tidy.
### Why are the changes needed?
Cleaning up the imports that aren't used, and suppressing the imports
that are used as references to other modules, preserving backward
compatibility.
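The "suppressing" mentioned above refers to flake8's per-line `noqa` marker: an import kept only as a re-export for other modules looks unused in its own file, so it would trip F401 unless explicitly exempted. A minimal sketch (the module and alias below are hypothetical, not taken from the PR):

```python
# Hypothetical backward-compatibility re-export. The name is never used
# in this file, so flake8 would report it as F401 'imported but unused'
# unless the line carries a suppression marker.
from collections import OrderedDict as LegacyDict  # noqa: F401

# Downstream code may still do `from this_module import LegacyDict`,
# so the "unused" import must stay; `# noqa: F401` keeps flake8 quiet
# while the name remains importable from this module.
assert "LegacyDict" in globals()
```

This is the standard way to keep a file-scope import that exists purely for API compatibility while still enforcing F401 everywhere else.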
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Adding the rule to the existing Flake8 checks.
Closes #29121 from Fokko/SPARK-32319.
Authored-by: Fokko Driesprong <fokko@apache.org> Signed-off-by: Dongjoon
Hyun <dongjoon@apache.org>
(commit: 9fcf0ea71820f7331504073045c38820e50141c7)
The file was modified dev/tox.ini (diff)
The file was modified python/pyspark/mllib/tests/test_linalg.py (diff)
The file was modified python/pyspark/mllib/tests/test_stat.py (diff)
The file was modified python/pyspark/ml/tests/test_image.py (diff)
The file was modified python/pyspark/tests/test_profiler.py (diff)
The file was modified python/pyspark/tests/test_shuffle.py (diff)
The file was modified python/pyspark/rdd.py (diff)
The file was modified python/pyspark/sql/tests/test_pandas_cogrouped_map.py (diff)
The file was modified python/pyspark/sql/tests/test_catalog.py (diff)
The file was modified python/pyspark/sql/tests/test_conf.py (diff)
The file was modified python/pyspark/ml/tests/test_linalg.py (diff)
The file was modified python/pyspark/tests/test_worker.py (diff)
The file was modified python/pyspark/ml/param/__init__.py (diff)
The file was modified examples/src/main/python/status_api_demo.py (diff)
The file was modified python/pyspark/mllib/feature.py (diff)
The file was modified python/pyspark/streaming/dstream.py (diff)
The file was modified dev/lint-python (diff)
The file was modified python/pyspark/ml/tests/test_pipeline.py (diff)
The file was modified python/pyspark/sql/tests/test_types.py (diff)
The file was modified python/pyspark/sql/tests/test_pandas_udf_grouped_agg.py (diff)
The file was modified python/pyspark/mllib/tests/test_util.py (diff)
The file was modified python/pyspark/sql/tests/test_udf.py (diff)
The file was modified python/pyspark/sql/session.py (diff)
The file was modified python/pyspark/tests/test_rdd.py (diff)
The file was modified python/pyspark/resource/tests/test_resources.py (diff)
The file was modified dev/pip-sanity-check.py (diff)
The file was modified examples/src/main/python/sql/hive.py (diff)
The file was modified python/pyspark/tests/test_pin_thread.py (diff)
The file was modified python/pyspark/ml/tests/test_tuning.py (diff)
The file was modified python/pyspark/tests/test_broadcast.py (diff)
The file was modified python/pyspark/tests/test_util.py (diff)
The file was modified python/pyspark/sql/tests/test_group.py (diff)
The file was modified python/pyspark/sql/tests/test_pandas_udf.py (diff)
The file was modified python/pyspark/sql/pandas/functions.py (diff)
The file was modified python/pyspark/streaming/tests/test_kinesis.py (diff)
The file was modified python/pyspark/streaming/tests/test_dstream.py (diff)
The file was modified python/pyspark/tests/test_context.py (diff)
The file was modified python/pyspark/mllib/classification.py (diff)
The file was modified python/pyspark/mllib/tests/test_feature.py (diff)
The file was modified python/pyspark/sql/tests/test_context.py (diff)
The file was modified python/pyspark/sql/tests/test_pandas_udf_typehints.py (diff)
The file was modified python/pyspark/ml/regression.py (diff)
The file was modified python/pyspark/mllib/clustering.py (diff)
The file was modified python/pyspark/sql/tests/test_pandas_map.py (diff)
The file was modified python/pyspark/mllib/tests/test_algorithms.py (diff)
The file was modified python/pyspark/sql/streaming.py (diff)
The file was modified python/pyspark/sql/tests/test_column.py (diff)
The file was modified python/pyspark/ml/tests/test_evaluation.py (diff)
The file was modified python/pyspark/ml/tests/test_feature.py (diff)
The file was modified python/pyspark/tests/test_readwrite.py (diff)
The file was modified python/pyspark/ml/tests/test_algorithms.py (diff)
The file was modified python/pyspark/sql/tests/test_utils.py (diff)
The file was modified python/pyspark/sql/readwriter.py (diff)
The file was modified python/pyspark/sql/tests/test_dataframe.py (diff)
The file was modified python/pyspark/sql/tests/test_pandas_udf_window.py (diff)
The file was modified python/pyspark/ml/tests/test_param.py (diff)
The file was modified python/pyspark/ml/tests/test_training_summary.py (diff)
The file was modified python/pyspark/sql/tests/test_serde.py (diff)
The file was modified python/pyspark/streaming/tests/test_listener.py (diff)
The file was modified python/pyspark/tests/test_serializers.py (diff)
The file was modified python/pyspark/tests/test_daemon.py (diff)
The file was modified python/pyspark/tests/test_taskcontext.py (diff)
The file was modified python/pyspark/ml/tests/test_wrapper.py (diff)
The file was modified python/pyspark/testing/utils.py (diff)
The file was modified python/pyspark/ml/pipeline.py (diff)
The file was modified dev/create-release/releaseutils.py (diff)
The file was modified python/pyspark/mllib/regression.py (diff)
The file was modified python/pyspark/sql/tests/test_arrow.py (diff)
The file was modified python/pyspark/__init__.py (diff)
The file was modified python/pyspark/mllib/tests/test_streaming_algorithms.py (diff)
The file was modified python/pyspark/sql/tests/test_pandas_grouped_map.py (diff)
The file was modified python/pyspark/ml/stat.py (diff)
The file was modified python/pyspark/sql/context.py (diff)
The file was modified python/pyspark/streaming/tests/test_context.py (diff)
The file was modified python/pyspark/ml/__init__.py (diff)
The file was modified python/pyspark/ml/util.py (diff)
The file was modified python/pyspark/sql/avro/functions.py (diff)
The file was modified python/pyspark/tests/test_rddbarrier.py (diff)
The file was modified python/pyspark/ml/tests/test_persistence.py (diff)
The file was modified python/pyspark/sql/tests/test_streaming.py (diff)
The file was modified python/pyspark/sql/tests/test_readwriter.py (diff)
The file was modified python/pyspark/sql/tests/test_datasources.py (diff)
The file was modified python/pyspark/sql/functions.py (diff)
The file was modified python/pyspark/ml/tests/test_base.py (diff)
The file was modified python/pyspark/ml/tests/test_stat.py (diff)
The file was modified dev/run-tests.py (diff)
The file was modified python/pyspark/tests/test_join.py (diff)
The file was modified python/pyspark/sql/tests/test_functions.py (diff)
The file was modified python/pyspark/tests/test_appsubmit.py (diff)
The file was modified python/pyspark/tests/test_conf.py (diff)
The file was modified python/pyspark/sql/tests/test_session.py (diff)
The file was modified python/pyspark/sql/tests/test_pandas_udf_scalar.py (diff)
Commit dc3fac81848f557e3dac3f35686af325a18d0291 by dongjoon
[MINOR][DOCS] Fix typos at ExecutorAllocationManager.scala
### What changes were proposed in this pull request?
This PR fixes some typos in
<code>core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala</code>
file.
### Why are the changes needed?
<code>spark.dynamicAllocation.sustainedSchedulerBacklogTimeout</code>
(N) is used only after the
<code>spark.dynamicAllocation.schedulerBacklogTimeout</code> (M) is
exceeded.
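The interaction between these two timeouts can be sketched as a small simulation (a hedged illustration of the documented behavior, not Spark's actual implementation): the first executor request fires only after the backlog has persisted for M seconds, and each subsequent request fires every N seconds thereafter.

```python
def request_times(m, n, backlog_duration):
    """Times (in seconds) at which executor requests would be triggered,
    given schedulerBacklogTimeout m, sustainedSchedulerBacklogTimeout n,
    and a task backlog that persists for backlog_duration seconds."""
    times = []
    t = m  # the first request waits for the initial timeout M
    while t <= backlog_duration:
        times.append(t)
        t += n  # subsequent requests use the sustained timeout N
    return times

# Backlog lasts 7s; first request after 5s, then every 1s:
print(request_times(m=5, n=1, backlog_duration=7))  # [5, 6, 7]
```

This makes the typo fix's point concrete: N only matters once M has already elapsed.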
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
No test needed.
Closes #29351 from JoeyValentine/master.
Authored-by: JoeyValentine <rlaalsdn0506@naver.com> Signed-off-by:
Dongjoon Hyun <dongjoon@apache.org>
(commit: dc3fac81848f557e3dac3f35686af325a18d0291)
The file was modified core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala (diff)
Commit 8062c1f777691d74c9c75f53f0a32cd0a3005fed by dongjoon
[SPARK-32555][SQL] Add unique ID on query execution
### What changes were proposed in this pull request?
This PR adds a unique ID to QueryExecution, so that listeners can leverage
the ID to deduplicate redundant calls.
### Why are the changes needed?
I've observed that Spark calls QueryExecutionListener multiple times on
the same QueryExecution instance (even with the same funcName for
onSuccess). There's no unique ID on QueryExecution, so it's a bit tricky
for a listener that wants to handle each query execution only once.
Note that a streaming query has both a query ID and a run ID, which can
be leveraged as a unique ID.
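The deduplication pattern this enables can be sketched as follows (a hypothetical illustration in Python, not the actual Scala API; the class and method names are assumptions for the example):

```python
import itertools

class QueryExecution:
    """Stand-in for a query execution that carries a unique, monotonically
    increasing id, analogous to the field added by this PR."""
    _ids = itertools.count(1)

    def __init__(self):
        self.id = next(QueryExecution._ids)

class DedupListener:
    """Listener that uses the unique id to process each execution once,
    even if the callback fires multiple times for the same instance."""
    def __init__(self):
        self.seen = set()
        self.handled = []

    def on_success(self, qe):
        if qe.id in self.seen:
            return  # redundant callback for an already-handled execution
        self.seen.add(qe.id)
        self.handled.append(qe.id)

qe = QueryExecution()
listener = DedupListener()
listener.on_success(qe)
listener.on_success(qe)  # duplicate call on the same instance: ignored
print(listener.handled)  # [1]
```

Without a stable ID on the execution object, the listener has no reliable key for this kind of set-based deduplication.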
### Does this PR introduce _any_ user-facing change?
Yes, for those who use a query execution listener - they'll see an `id`
field in QueryExecution and can leverage it.
### How was this patch tested?
Manually tested. I think the change is obvious, hence I don't think it
warrants a new UT. StreamingQueryListener has been using UUIDs as
`queryId` and `runId`, so the same approach should work here.
Closes #29372 from HeartSaVioR/SPARK-32555.
Authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan.opensource@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(commit: 8062c1f777691d74c9c75f53f0a32cd0a3005fed)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala (diff)
Commit 1df855bef2b2dbe330cafb0d10e0b4af813a311a by dongjoon
[SPARK-32564][SQL][TEST][FOLLOWUP] Re-enable TPCDSQuerySuite with empty
tables
### What changes were proposed in this pull request?
This is the follow-up PR of #29384 to address the cloud-fan comment:
https://github.com/apache/spark/pull/29384#issuecomment-670595111 This
PR re-enables `TPCDSQuerySuite` with empty tables for better test
coverage.
### Why are the changes needed?
For better test coverage.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests.
Closes #29391 from maropu/SPARK-32564-FOLLOWUP.
Authored-by: Takeshi Yamamuro <yamamuro@apache.org> Signed-off-by:
Dongjoon Hyun <dongjoon@apache.org>
(commit: 1df855bef2b2dbe330cafb0d10e0b4af813a311a)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/TPCDSSchema.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/TPCDSQuerySuite.scala (diff)
Commit 34d9f1cf4c97647dc7ff6cb49ff067258c91ab45 by dongjoon
[SPARK-32462][WEBUI] Reset previous search text for datatable
### What changes were proposed in this pull request?
This PR proposes to change the behavior of DataTable for stage-page and
executors-page not to save the previous search text.
### Why are the changes needed?
DataTable is used in the stage page and executors page for pagination and
for filtering tasks/executors by search text. In the current
implementation, the search text is saved, so when we visit the stage page
for a job, the previous search text is pre-filled in the textbox and the
task table is filtered. This behavior is sometimes surprising: the stage
page appears to list no tasks because they are filtered by the previous
search text. I don't think it's useful.
### Does this PR introduce _any_ user-facing change?
Yes. Search text is no longer saved.
### How was this patch tested?
A new test case, run with the following command.
```
$ build/sbt -Dguava.version=27.0-jre -Dtest.default.exclude.tags=
-Dspark.test.webdriver.chrome.driver=/path/to/chromedriver "testOnly
org.apache.spark.ui.ChromeUISeleniumSuite -- -z Search"
```
Closes #29265 from sarutak/fix-search-box.
Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com> Signed-off-by:
Dongjoon Hyun <dongjoon@apache.org>
(commit: 34d9f1cf4c97647dc7ff6cb49ff067258c91ab45)
The file was modified core/src/test/scala/org/apache/spark/ui/ChromeUISeleniumSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/ui/UISeleniumSuite.scala (diff)
The file was modified core/src/test/scala/org/apache/spark/ui/RealBrowserUISeleniumSuite.scala (diff)
The file was modified core/src/main/resources/org/apache/spark/ui/static/utils.js (diff)