SuccessChanges

Summary

  1. [SPARK-32364][SQL][2.4] Use CaseInsensitiveMap for DataFrameReader/Writer options (details)
Commit 6a653a2faeb05c1d0f91cbbcaf3c8e37b0d6e0bc by dongjoon
[SPARK-32364][SQL][2.4] Use CaseInsensitiveMap for
DataFrameReader/Writer options
### What changes were proposed in this pull request?
This PR is a backport of SPARK-32364
(https://github.com/apache/spark/pull/29160,
https://github.com/apache/spark/pull/29191).
When a user supplies multiple options such as `path`, `paTH`, and `PATH` for
the same key `path`, `option/options` is non-deterministic because
`extraOptions` is a `HashMap`. This PR uses `CaseInsensitiveMap`
instead of `HashMap` to fix this bug fundamentally.
As shown below, DataFrame's `option/options` have been non-deterministic
with respect to case-insensitivity because the options are stored in
`extraOptions`, which is a `HashMap`.
```scala
spark.read
  .option("paTh", "1")
  .option("PATH", "2")
  .option("Path", "3")
  .option("patH", "4")
  .load("5")
...
org.apache.spark.sql.AnalysisException: Path does not exist: file:/.../1;
```
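For illustration, here is a minimal, standalone sketch (plain Scala, not Spark code) of why a `HashMap` is ambiguous for case-variant keys while a case-insensitive map is not; the `CIMap` helper below is hypothetical:

```scala
import scala.collection.immutable.HashMap

// Hypothetical minimal case-insensitive map: the last write wins,
// regardless of how the key is cased.
final case class CIMap(entries: Map[String, String] = Map.empty) {
  def updated(k: String, v: String): CIMap =
    CIMap(entries.filterNot { case (ek, _) => ek.equalsIgnoreCase(k) } + (k -> v))
  def get(k: String): Option[String] =
    entries.collectFirst { case (ek, v) if ek.equalsIgnoreCase(k) => v }
}

object DuplicateKeyDemo extends App {
  // A plain HashMap keeps all four case variants as distinct keys, so which
  // value ends up treated as "the" path depends on lookup/iteration details.
  val hashMap = HashMap("paTh" -> "1", "PATH" -> "2", "Path" -> "3", "patH" -> "4")
  println(hashMap.size)      // 4

  // A case-insensitive map keeps a single entry, and the most recent value wins.
  val ciMap = List("paTh" -> "1", "PATH" -> "2", "Path" -> "3", "patH" -> "4")
    .foldLeft(CIMap()) { case (m, (k, v)) => m.updated(k, v) }
  println(ciMap.get("path")) // Some(4)
}
```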
Also, this PR adds the following:
1. Add explicit documentation to `DataFrameReader`/`DataFrameWriter`.
2. Add `toMap` to `CaseInsensitiveMap` so that it returns `originalMap:
Map[String, T]`, since returning the original, case-sensitively keyed map is
more consistent with existing code patterns such as
`AppendData.byName(..., extraOptions.toMap)`, where `toMap` previously
resolved to `HashMap.toMap` (see the sketch after this list).
3. As part of (2), the following change is needed to keep the original
logic, using `CaseInsensitiveMap.++`:
```scala
- val params = extraOptions.toMap ++ connectionProperties.asScala.toMap
+ val params = extraOptions ++ connectionProperties.asScala
```
4. Additionally, use `.toMap` in the following because
`dsOptions.asCaseSensitiveMap()` is used later.
```scala
- val options = sessionOptions ++ extraOptions
+ val options = sessionOptions.filterKeys(!extraOptions.contains(_)) ++ extraOptions.toMap
  val dsOptions = new CaseInsensitiveStringMap(options.asJava)
```
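The sketch referenced after item 2 above: a simplified, self-contained approximation of the idea behind (2) and (3), not Spark's actual `CaseInsensitiveMap`. Keys are compared case-insensitively for lookup and for overrides via `++`, while `toMap` hands back the originally cased entries:

```scala
import java.util.Locale

// Simplified sketch (assumed behavior, not the real Spark class): keys are
// compared case-insensitively, but the originally cased entries are kept so
// that toMap can return them unchanged.
final class SimpleCIMap[T] private (val originalMap: Map[String, T]) {
  private def norm(k: String): String = k.toLowerCase(Locale.ROOT)
  private val lowerCased: Map[String, T] =
    originalMap.map { case (k, v) => norm(k) -> v }

  def get(key: String): Option[T] = lowerCased.get(norm(key))

  // Adding an entry replaces any existing entry whose key differs only by case.
  def +(kv: (String, T)): SimpleCIMap[T] =
    new SimpleCIMap(originalMap.filterNot { case (k, _) => norm(k) == norm(kv._1) } + kv)

  // ++ applies + for every right-hand-side element, so later (case-variant)
  // keys override earlier ones.
  def ++(xs: Iterable[(String, T)]): SimpleCIMap[T] = xs.foldLeft(this)(_ + _)

  // toMap returns the original, case-sensitively keyed map rather than the
  // lower-cased lookup map.
  def toMap: Map[String, T] = originalMap
}

object SimpleCIMap {
  def apply[T](m: Map[String, T]): SimpleCIMap[T] =
    m.foldLeft(new SimpleCIMap[T](Map.empty))(_ + _)
}
```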
`extraOptions.toMap` is used in several places (e.g. `DataFrameReader`)
to hand over a `Map[String, T]`. In this case, `CaseInsensitiveMap[T]
private (val originalMap: Map[String, T])` should return `originalMap`.
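A brief usage example with the hypothetical `SimpleCIMap` from the sketch above, illustrating why returning `originalMap` is the natural choice when handing over a `Map[String, T]`:

```scala
val extraOptions = SimpleCIMap(Map("Path" -> "a")) ++ Seq("paTH" -> "b")

extraOptions.get("PATH") // Some(b): lookup is case-insensitive
extraOptions.toMap       // Map(paTH -> b): the original key casing is preserved,
                         // which is what callers expecting a Map[String, T]
                         // (e.g. AppendData.byName(..., extraOptions.toMap)) rely on
```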
### Why are the changes needed?
This fixes the non-deterministic behavior described above.
### Does this PR introduce _any_ user-facing change?
Yes.
### How was this patch tested?
Pass Jenkins with the existing tests and newly added test cases.
Closes #29209 from dongjoon-hyun/SPARK-32364-2.4.
Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
The file was modified sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala (diff)
The file was modified sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/CaseInsensitiveMap.scala (diff)
The file was modified sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)