1. [SPARK-25669][SQL] Check CSV header only when it exists
Commit 46fe40838aa682a7073dd6f1373518b0c8498a94 by hyukjinkwon
[SPARK-25669][SQL] Check CSV header only when it exists
## What changes were proposed in this pull request?
Currently, the first row of a dataset of CSV strings is compared to the
field names of the user-specified or inferred schema regardless of whether
a CSV header is present. This causes false-positive error messages. For
example, parsing `"1,2"` produces the error:
```java
java.lang.IllegalArgumentException: CSV header does not conform to the schema.
 Header: 1, 2
 Schema: _c0, _c1
Expected: _c0 but found: 1
```
In this PR, I propose:
- Checking the CSV header only when it exists
- Filtering the header from the input dataset only if it exists
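The two points above can be illustrated with a simplified, Spark-free model. This is only a sketch of the intended behavior, not the code changed in `DataFrameReader.scala`; the names `HeaderCheckSketch`, `checkHeader`, `readCsvLines`, and `hasHeader` are hypothetical and chosen for illustration.

```scala
// Simplified model of the proposed behavior; NOT the actual Spark implementation.
// All names here (HeaderCheckSketch, checkHeader, readCsvLines, hasHeader) are hypothetical.
object HeaderCheckSketch {

  // Throws IllegalArgumentException if the header columns differ from the schema field names.
  def checkHeader(header: Seq[String], schema: Seq[String]): Unit = {
    require(header.length == schema.length,
      s"CSV header does not conform to the schema. Header: ${header.mkString(", ")}")
    header.zip(schema).foreach { case (found, expected) =>
      require(found == expected,
        s"CSV header does not conform to the schema. Expected: $expected but found: $found")
    }
  }

  // Splits each line on commas. The first line is validated against the schema
  // and dropped ONLY when the caller says a header exists.
  def readCsvLines(
      lines: Seq[String],
      schema: Seq[String],
      hasHeader: Boolean): Seq[Seq[String]] = {
    val rows = lines.map(_.split(",").map(_.trim).toSeq)
    if (hasHeader && rows.nonEmpty) {
      checkHeader(rows.head, schema)
      rows.tail
    } else {
      rows
    }
  }
}
```

In this model, calling `readCsvLines(Seq("1,2"), Seq("_c0", "_c1"), hasHeader = false)` treats `1,2` as data, so no header comparison takes place and the false-positive error does not occur.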
## How was this patch tested?
Added a test to `CSVSuite` which reproduces the issue.
Closes #22656 from MaxGekk/inferred-header-check.
Authored-by: Maxim Gekk <>
Signed-off-by: hyukjinkwon <>
The file was modified: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
The file was modified: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala