1. [SPARK-25684][SQL] Organize header related codes in CSV datasource (details)
Commit 39872af882e3d73667acfab93c9de962c9c8939d by hyukjinkwon
[SPARK-25684][SQL] Organize header related codes in CSV datasource
## What changes were proposed in this pull request?
1. Move `CSVDataSource.makeSafeHeader` to `CSVUtils.makeSafeHeader` (as is).
    - Historically, when I first refactored this code, I intended to put all
CSV-specific handling (options, filtering, header extraction, etc.) in
`CSVUtils`.
    - See `JsonDataSource`. With this change, `CSVDataSource` is quite
consistent with `JsonDataSource`. Since the CSV code path is quite
complicated, we should match the two as closely as we can.
2. Create `CSVHeaderChecker` and move the `enforceSchema` logic into it.
    - Header checking and column pruning were added in earlier changes, but
some of the code is duplicated.
    - Also, the header-checking code is scattered here and there, which was
quite error-prone. We had better put it in a single place.
3. Move `CSVDataSource.checkHeaderColumnNames` to
`CSVHeaderChecker.checkHeaderColumnNames` (as is).
    - Same reasoning as in item 1.
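
To illustrate the kind of check that item 2 gathers into one place, here is a minimal, self-contained sketch. It is not the code from this patch: `HeaderCheckerSketch`, `findMismatch`, and `check` are hypothetical names, and the behavior shown (report a mismatch when `enforceSchema` is true, throw when it is false) is a simplified assumption about how such a checker could work.

```scala
import java.util.Locale

// Hypothetical, simplified stand-in for a centralized header check.
// It does not depend on Spark and is not the class added by this patch.
final class HeaderCheckerSketch(
    fieldNames: Seq[String],   // column names from the user-provided schema
    enforceSchema: Boolean,    // assumed: true = apply schema anyway, false = fail on mismatch
    caseSensitive: Boolean) {

  private def norm(s: String): String =
    if (caseSensitive) s else s.toLowerCase(Locale.ROOT)

  /** Describes the first mismatch between header and schema, if any. */
  def findMismatch(header: Seq[String]): Option[String] = {
    if (header.length != fieldNames.length) {
      Some(s"Header has ${header.length} columns but schema has ${fieldNames.length}")
    } else {
      header.zip(fieldNames).zipWithIndex.collectFirst {
        case ((h, f), i) if norm(h) != norm(f) =>
          s"Column $i: header has '$h' but schema has '$f'"
      }
    }
  }

  /**
   * Single entry point for the check. With enforceSchema = false a mismatch
   * raises IllegalArgumentException; with enforceSchema = true the mismatch
   * is only returned so the caller can log it.
   */
  def check(header: Seq[String]): Option[String] = {
    val mismatch = findMismatch(header)
    if (!enforceSchema) {
      mismatch.foreach(msg => throw new IllegalArgumentException(msg))
    }
    mismatch
  }
}
```

Keeping the comparison and the `enforceSchema` decision in one class means each call site only passes the header row in, instead of repeating the comparison.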
## How was this patch tested?
Existing tests should cover this.
Closes #22676 from HyukjinKwon/refactoring-csv.
Authored-by: hyukjinkwon <>
Signed-off-by: hyukjinkwon <>
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVUtils.scala (diff)
The file was added sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVHeaderChecker.scala
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala (diff)
The file was modified sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVFileFormat.scala (diff)