Failed Changes

Summary

  1. [SPARK-33623][SQL] Add canDeleteWhere to SupportsDelete (details)
  2. [SPARK-33634][SQL][TESTS] Use Analyzer in PlanResolutionSuite (details)
Commit aa13e207c9091e24aae1edcf3bb5cd35d3a27cbb by dongjoon
[SPARK-33623][SQL] Add canDeleteWhere to SupportsDelete

### What changes were proposed in this pull request?

This PR provides us with a way to check if a data source is going to reject the delete via `deleteWhere` at planning time.

### Why are the changes needed?

The only way to support delete statements right now is to implement `SupportsDelete`. According to its Javadoc, that interface is meant for cases when we can delete data without much effort (e.g., deleting a complete partition in a Hive table).

Right now, a data source that cannot handle a delete only surfaces that as an exception during execution. With `canDeleteWhere`, Spark can check at planning time whether the data source is going to reject the delete via `deleteWhere`. In the future, we can use this functionality to decide whether Spark should rewrite the delete and execute a distributed query, or simply pass a set of filters to the data source.
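As a rough, illustrative sketch (not the actual `DataSourceV2Strategy` code), the planning-time check could look like the following; the helper name `checkDeletable` and the error message are assumptions made for this example:

```scala
import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.connector.catalog.SupportsDelete
import org.apache.spark.sql.sources.Filter

// Hypothetical planning-time check: ask the table up front whether it will
// accept the pushed-down filters, and fail during planning (or, in the
// future, fall back to a row-level rewrite) instead of at execution time.
def checkDeletable(table: SupportsDelete, tableName: String, filters: Array[Filter]): Unit = {
  if (!table.canDeleteWhere(filters)) {
    throw new AnalysisException(
      s"Cannot delete from table $tableName where ${filters.mkString("[", ", ", "]")}")
  }
  // If the check passes, the physical plan can call table.deleteWhere(filters)
  // at execution time, e.g. as a metadata-only partition drop.
}
```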

Consider an example of a partitioned Hive table. If we have a delete predicate like `part_col = '2020'`, we can just drop the matching partition to satisfy this delete. In this case, the data source should return `true` from `canDeleteWhere` and use the filters it accepts in `deleteWhere` to drop the partition. I consider this a delete without significant effort. At the same time, if we have a delete predicate like `id = 10`, Hive tables would not be able to execute this delete using a metadata-only operation without rewriting files. In that case, the data source should return `false` from `canDeleteWhere` and we should use a more sophisticated row-level API to find out which records should be removed (the API is yet to be discussed, but we need this PR as a basis).
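To make the contract concrete, here is a minimal, hypothetical sketch of such a partitioned data source; the class name, partition column, and `dropPartition` helper are invented for illustration, and the remaining `Table` methods (name, schema, capabilities) are omitted:

```scala
import org.apache.spark.sql.connector.catalog.SupportsDelete
import org.apache.spark.sql.sources.{EqualTo, Filter}

// Hypothetical table partitioned by `part_col`. Deletes are cheap only when
// every filter touches the partition column, so whole partitions can be dropped.
abstract class PartitionedDeleteTable extends SupportsDelete {
  private val partitionColumn = "part_col"

  // Accept the delete only if all filters reference the partition column
  // (e.g. part_col = '2020'); otherwise Spark should use a row-level rewrite.
  override def canDeleteWhere(filters: Array[Filter]): Boolean =
    filters.forall(_.references.forall(_ == partitionColumn))

  override def deleteWhere(filters: Array[Filter]): Unit =
    filters.foreach {
      case EqualTo(`partitionColumn`, value) => dropPartition(value.toString)
      case _ => // unreachable if canDeleteWhere was consulted first
    }

  // Metadata-only partition drop, left abstract for this sketch.
  protected def dropPartition(partitionValue: String): Unit
}
```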

If we decide to support subqueries and all delete use cases by simply extending the existing API, this will mean all data sources will have to implement a lot of Spark logic to determine which records changed. I don't think we want to go that way as the Spark logic to determine which records should be deleted is independent of the underlying data source. So the assumption is that Spark will execute a plan to find which records must be deleted for data sources that return `false` from `canDeleteWhere`.

### Does this PR introduce _any_ user-facing change?

Yes, but it is backward compatible.

### How was this patch tested?

This PR comes with a new test.

Closes #30562 from aokolnychyi/spark-33623.

Authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
The file was modified sql/core/src/test/scala/org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala (diff)
The file was modified sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsDelete.java (diff)
The file was modified sql/catalyst/src/test/scala/org/apache/spark/sql/connector/InMemoryTable.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala (diff)
Commit 63f9d474b9ec4b66741fcca1d3c3865c32936a85 by dongjoon
[SPARK-33634][SQL][TESTS] Use Analyzer in PlanResolutionSuite

### What changes were proposed in this pull request?

Instead of using several analyzer rules, this PR uses the actual analyzer to run tests in `PlanResolutionSuite`.
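As a hedged sketch of the idea (not the suite's actual wiring), a resolution test can run the parsed plan through a full `Analyzer` rather than invoking individual rules; here `buildTestAnalyzer` stands in for whatever catalog and configuration setup the suite performs:

```scala
import org.apache.spark.sql.catalyst.analysis.Analyzer
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Assumed helper: constructs an Analyzer over the test catalogs and conf.
// The real suite's setup will differ; this only shows the shape of the change.
def buildTestAnalyzer(): Analyzer = ???

// Parse the SQL text and run every analyzer batch over it, so the test
// exercises the same resolution path as a real query instead of a
// hand-picked subset of rules.
def parseAndResolve(sql: String): LogicalPlan = {
  val parsed = CatalystSqlParser.parsePlan(sql)
  buildTestAnalyzer().execute(parsed)
}
```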

### Why are the changes needed?

Make the test suite match reality.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

This is a test-only change.

Closes #30574 from cloud-fan/test.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
The file was modified sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala (diff)