1. [SPARK-30452][ML][PYSPARK] Add predict and numFeatures in Python
2. [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
3. [SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in data source v2
Commit c88124a2460149ba6129778a332878c552791fd5 by srowen
[SPARK-30452][ML][PYSPARK] Add predict and numFeatures in Python
### What changes were proposed in this pull request?
Add ```predict``` and ```numFeatures``` in Python ```IsotonicRegressionModel```.
### Why are the changes needed?
```IsotonicRegressionModel``` doesn't extend ```JavaPredictionModel```,
so it doesn't get ```predict``` and ```numFeatures``` from the superclass.
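To make the two members concrete, here is a minimal pure-Python sketch of what `predict` and `numFeatures` mean for an isotonic model. It is not the PySpark implementation: `IsotonicModelSketch` is a hypothetical class, and it assumes the model is described by sorted `boundaries` with matching `predictions`, that inputs between two boundaries are linearly interpolated, and that inputs outside the range are clamped to the first or last prediction.

```python
from bisect import bisect_left
from typing import Sequence


class IsotonicModelSketch:
    """Hypothetical stand-in for a fitted isotonic model (not the PySpark class)."""

    def __init__(self, boundaries: Sequence[float], predictions: Sequence[float]):
        if not boundaries or len(boundaries) != len(predictions):
            raise ValueError("boundaries and predictions must be non-empty and of equal length")
        self.boundaries = list(boundaries)    # assumed sorted ascending
        self.predictions = list(predictions)  # one prediction per boundary

    @property
    def numFeatures(self) -> int:
        # Assumption: isotonic regression is fit on a single feature.
        return 1

    def predict(self, value: float) -> float:
        b, p = self.boundaries, self.predictions
        # Clamp inputs that fall outside the fitted range.
        if value <= b[0]:
            return p[0]
        if value >= b[-1]:
            return p[-1]
        i = bisect_left(b, value)
        if b[i] == value:
            return p[i]  # exact boundary match
        # Linear interpolation between the two surrounding boundaries.
        x0, x1 = b[i - 1], b[i]
        y0, y1 = p[i - 1], p[i]
        return y0 + (y1 - y0) * (value - x0) / (x1 - x0)
```

With such a model, a single value can be scored locally without building a DataFrame, which is the convenience the new Python methods are meant to expose.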
### Does this PR introduce any user-facing change?
Yes. The Python version adds ```IsotonicRegressionModel.predict``` and
```IsotonicRegressionModel.numFeatures```.
### How was this patch tested?
doctest
Closes #27122 from huaxingao/spark-30452.
Authored-by: Huaxin Gao
Signed-off-by: Sean Owen
The file was modified python/pyspark/ml/
Commit f8d59572b014e5254b0c574b26e101c2e4157bdd by brkyvz
[SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
### What changes were proposed in this pull request?
This PR introduces `SupportsCatalogOptions` as an interface for
`TableProvider`. Through `SupportsCatalogOptions`, V2 DataSources can
implement the two methods `extractIdentifier` and `extractCatalog` to
support the creation and existence check of tables without requiring a
formal TableCatalog implementation.
We currently don't support all SaveModes for DataSourceV2. The idea here
is that eventually writes to file-based tables will create a
PathIdentifier where the name is `path`, and the V2SessionCatalog will
be able to perform FileSystem checks at `path` to support ErrorIfExists
and Ignore SaveModes.
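The flow described above can be sketched in plain Python. This is only an illustration of the idea, not Spark's API: `SupportsCatalogOptionsSketch`, `PathProviderSketch`, `plan_write`, and the `"path"` and `"catalog"` option keys are all hypothetical names, and the existence check is injected so the example does not depend on a real file system.

```python
import os
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional


class SupportsCatalogOptionsSketch(ABC):
    """Hypothetical Python analogue of the interface; names are illustrative."""

    @abstractmethod
    def extract_identifier(self, options: Dict[str, str]) -> str:
        """Return the table identifier encoded in the write options."""

    @abstractmethod
    def extract_catalog(self, options: Dict[str, str]) -> Optional[str]:
        """Return the catalog name encoded in the options, if any."""


class PathProviderSketch(SupportsCatalogOptionsSketch):
    """Hypothetical file-based provider whose identifier is the path itself."""

    def extract_identifier(self, options: Dict[str, str]) -> str:
        return options["path"]

    def extract_catalog(self, options: Dict[str, str]) -> Optional[str]:
        return options.get("catalog")


def plan_write(
    provider: SupportsCatalogOptionsSketch,
    options: Dict[str, str],
    mode: str,
    exists: Callable[[str], bool] = os.path.exists,
) -> str:
    """Decide what a write should do, given an existence check on the identifier."""
    ident = provider.extract_identifier(options)
    present = exists(ident)
    mode = mode.lower()
    if present and mode == "errorifexists":
        raise FileExistsError(ident)
    if present and mode == "ignore":
        return "skip"   # table already exists, so the write is a no-op
    return "write"      # every other case proceeds with the write
```

The point of the sketch is that an identifier plus an existence check is enough to decide ErrorIfExists and Ignore, without a full catalog implementation.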
### Why are the changes needed?
To support all Save modes for V2 data sources with DataFrameWriter.
Since we can now support table creation, we will be able to provide
partitioning information when first creating the table as well.
### Does this PR introduce any user-facing change?
Introduces a new interface.
### How was this patch tested?
Will add tests once interface is vetted.
Closes #26913 from brkyvz/catalogOptions.
Lead-authored-by: Burak Yavuz
Co-authored-by: Burak Yavuz
Signed-off-by: Burak Yavuz
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala
The file was added sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
The file was added sql/core/src/test/scala/org/apache/spark/sql/connector/SupportsCatalogOptionsSuite.scala
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/TestV2SessionCatalogBase.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala
The file was modified external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala
Commit c0e9f9ffb1eefa1fdbc2ba731a2f01d0e370343e by
[SPARK-30459][SQL] Fix ignoreMissingFiles/ignoreCorruptFiles in data source v2
### What changes were proposed in this pull request?
Fix ignoreMissingFiles/ignoreCorruptFiles in DSv2:
When `FilePartitionReader` finds a missing or corrupt file, it should
just skip it and continue reading the next file rather than stopping.
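The intended skip-and-continue behavior can be illustrated with a small Python generator. This is a sketch under assumptions, not the `FilePartitionReader` code: `read_partition` and `open_file` are hypothetical names, `FileNotFoundError` stands in for a missing file, and `ValueError` or any other `OSError` stands in for a corrupt one. As a simplification, each file is read fully before its rows are emitted, so a file that fails midway contributes no rows.

```python
from typing import Callable, Iterable, Iterator


def read_partition(
    paths: Iterable[str],
    open_file: Callable[[str], Iterable],
    ignore_missing: bool = False,
    ignore_corrupt: bool = False,
) -> Iterator:
    """Yield rows from each file in order, skipping bad files when allowed."""
    for path in paths:
        try:
            # Materialize the file first so a failure emits no partial rows.
            rows = list(open_file(path))
        except FileNotFoundError:
            if not ignore_missing:
                raise
            continue  # skip the missing file and move on to the next one
        except (OSError, ValueError):
            if not ignore_corrupt:
                raise
            continue  # skip the corrupt file and move on to the next one
        yield from rows
```

The important detail is the `continue`: a bad file ends only its own iteration of the loop, not the whole partition read.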
### Why are the changes needed?
ignoreMissingFiles/ignoreCorruptFiles in DSv2 is wrong compared to
### Does this PR introduce any user-facing change?
### How was this patch tested?
Updated the existing test for `ignoreMissingFiles`. Note that I didn't
update the tests for `ignoreCorruptFiles`, because various data sources
have their own tests for `ignoreCorruptFiles`. I'm not sure it's worth
touching all of those tests, since the basic logic of `ignoreCorruptFiles`
should be the same as that of `ignoreMissingFiles`.
Closes #27136 from Ngone51/improve-missing-files.
Authored-by: yi.wu
Signed-off-by: Gengliang Wang
The file was modified sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FilePartitionReader.scala