SuccessChanges

Summary

  1. [SPARK-27990][SPARK-29903][PYTHON] Add recursiveFileLookup option to Python DataFrameReader (commit: 3dd3a623f293bc7fd4937c95f06b967fa187b0f1) (details)
  2. [SPARK-30109][ML] PCA use BLAS.gemv for sparse vectors (commit: 5496e980e9a4dc20e84db1fa6c4b5426dce60b19) (details)
  3. [SPARK-30111][K8S] Apt-get update to fix debian issues (commit: 708cf16be9bb131ba8980c494ca497a4d187e160) (details)
Commit 3dd3a623f293bc7fd4937c95f06b967fa187b0f1 by gurwls223
[SPARK-27990][SPARK-29903][PYTHON] Add recursiveFileLookup option to
Python DataFrameReader
### What changes were proposed in this pull request?
As a follow-up to #24830, this PR adds the `recursiveFileLookup` option
to the Python DataFrameReader API.
### Why are the changes needed?
This PR maintains Python feature parity with Scala.
### Does this PR introduce any user-facing change?
Yes.
Before this PR, you'd only be able to use this option as follows:
```python
spark.read.option("recursiveFileLookup", True).text("test-data").show()
```
With this PR, you can reference the option from within the
format-specific method:
```python
spark.read.text("test-data", recursiveFileLookup=True).show()
```
This option now also shows up in the Python API docs.
### How was this patch tested?
I tested this manually by creating the following directories with dummy
data:
```
test-data
├── 1.txt
└── nested
    └── 2.txt

test-parquet
├── nested
│   ├── _SUCCESS
│   ├── part-00000-...-.parquet
├── _SUCCESS
├── part-00000-...-.parquet
```
I then ran the following tests and confirmed the output looked good:
```python
spark.read.parquet("test-parquet", recursiveFileLookup=True).show()
spark.read.text("test-data", recursiveFileLookup=True).show()
spark.read.csv("test-data", recursiveFileLookup=True).show()
```
`python/pyspark/sql/tests/test_readwriter.py` seems pretty sparse. I'm
happy to add my tests there, though it seems we have been deferring
testing like this to the Scala side of things.
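For illustration, a test along these lines might look like the sketch below (hypothetical; it is not part of this PR, and the temporary-directory layout and assertion are assumptions):
```python
# Hypothetical sketch of a recursiveFileLookup test; not part of this PR.
import os
import tempfile

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

with tempfile.TemporaryDirectory() as d:
    # One file at the top level and one inside a nested directory.
    os.makedirs(os.path.join(d, "nested"))
    with open(os.path.join(d, "1.txt"), "w") as f:
        f.write("a")
    with open(os.path.join(d, "nested", "2.txt"), "w") as f:
        f.write("b")

    # With recursiveFileLookup=True both files should be read.
    assert spark.read.text(d, recursiveFileLookup=True).count() == 2
```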
Closes #26718 from nchammas/SPARK-27990-recursiveFileLookup-python.
Authored-by: Nicholas Chammas <nicholas.chammas@gmail.com>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(commit: 3dd3a623f293bc7fd4937c95f06b967fa187b0f1)
The file was modified python/pyspark/sql/readwriter.py (diff)
The file was modified python/pyspark/sql/streaming.py (diff)
Commit 5496e980e9a4dc20e84db1fa6c4b5426dce60b19 by ruifengz
[SPARK-30109][ML] PCA use BLAS.gemv for sparse vectors
### What changes were proposed in this pull request?
When PCA was first implemented in
[SPARK-5521](https://issues.apache.org/jira/browse/SPARK-5521),
Matrix.multiply (BLAS.gemv internally) did not support sparse vectors,
so the code worked around this by applying a sparse matrix
multiplication. Since
[SPARK-7681](https://issues.apache.org/jira/browse/SPARK-7681),
BLAS.gemv supports sparse vectors, so we can use Matrix.multiply
directly now.
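Conceptually, the simplification amounts to the following (a minimal NumPy/SciPy sketch of the idea only; the real change lives in Spark's Scala PCA code and uses Matrix.multiply/BLAS.gemv, not these libraries):
```python
# Conceptual NumPy/SciPy sketch of the simplification; the actual change is in
# Spark's Scala PCA code, which now calls Matrix.multiply (BLAS.gemv) for
# sparse vectors instead of a sparse matrix multiplication workaround.
import numpy as np
from scipy.sparse import csr_matrix

pc = np.random.rand(5, 3)                              # principal components (n_features x k)
x = csr_matrix(np.array([[0.0, 2.0, 0.0, 0.0, 1.0]]))  # a sparse input row

# Old workaround: treat the sparse vector as a 1 x n sparse matrix and
# multiply it against the principal components.
projected_old = np.asarray(x.dot(pc)).ravel()

# gemv-style path: multiply the dense matrix by the vector directly.
projected_new = pc.T.dot(x.toarray().ravel())

assert np.allclose(projected_old, projected_new)
```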
### Why are the changes needed?
For simplicity.
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
Existing test suites.
Closes #26745 from zhengruifeng/pca_mul.
Authored-by: zhengruifeng <ruifengz@foxmail.com>
Signed-off-by: zhengruifeng <ruifengz@foxmail.com>
(commit: 5496e980e9a4dc20e84db1fa6c4b5426dce60b19)
The file was modified mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala (diff)
The file was modified mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala (diff)
Commit 708cf16be9bb131ba8980c494ca497a4d187e160 by incomplete
[SPARK-30111][K8S] Apt-get update to fix debian issues
### What changes were proposed in this pull request?
Added `apt-get update` as per [Docker best practices](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#apt-get).
### Why are the changes needed?
The builder is failing because, without running `apt-get update`, the
APT lists become outdated and begin referring to package versions that
no longer exist, hence the 404 errors when trying to download them
(Debian does not keep old versions in the archive when a package is
updated).
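For reference, the linked best practice is to run the update and the install in the same RUN layer so the package index cannot go stale relative to the install. A generic sketch (illustrative only, not the exact contents of the Spark Dockerfiles; `some-package` is a placeholder):
```dockerfile
# Illustrative only; the actual Spark Dockerfiles differ. Updating and
# installing in one RUN layer avoids 404s from a stale APT package index.
RUN apt-get update && \
    apt-get install -y --no-install-recommends some-package && \
    rm -rf /var/lib/apt/lists/*
```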
### Does this PR introduce any user-facing change?
No.
### How was this patch tested?
The k8s builder.
Closes #26753 from ifilonenko/SPARK-30111.
Authored-by: Ilan Filonenko <ifilonenko@bloomberg.net>
Signed-off-by: shane knapp <incomplete@gmail.com>
(commit: 708cf16be9bb131ba8980c494ca497a4d187e160)
The file was modified resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/R/Dockerfile (diff)
The file was modified resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile (diff)