Success

Changes

Summary

  1. [SPARK-25539][BUILD] Upgrade lz4-java to 1.5.0 get speed improvement
Commit fba722e319e356113a69c54f59e23150017634ae by sean.owen
[SPARK-25539][BUILD] Upgrade lz4-java to 1.5.0 get speed improvement
## What changes were proposed in this pull request?
This PR upgrades `lz4-java` to 1.5.0 to get its speed improvements.
**General speed improvements**
LZ4 decompression speed has always been a strong point. In v1.8.2 it
gets even better: decompression speed improves by about 10%,
thanks in large part to a suggestion from svpv.
For example, on a Mac OS X laptop with an Intel Core i7-5557U CPU @
3.10GHz, running `lz4 -b silesia.tar` compiled with the default compiler, llvm
v9.1.0:
| Version | v1.8.1 | v1.8.2 | Improvement |
| -- | -- | -- | -- |
| Decompression speed | 2490 MB/s | 2770 MB/s | +11% |
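The MB/s figures are throughput numbers: uncompressed bytes processed divided by elapsed time. Python's standard library has no LZ4 binding, so this minimal sketch substitutes `zlib` purely to show how a decompression-throughput figure can be measured. The function name `decompression_throughput_mb_s`, the synthetic input, and the choice of 1 MB = 10**6 bytes are assumptions, and its numbers are not comparable to the table above.

```python
import time
import zlib


def decompression_throughput_mb_s(data: bytes, rounds: int = 5) -> float:
    """Return the best observed decompression speed in MB/s (assumes 1 MB = 10**6 bytes).

    zlib is a stand-in for LZ4 here; only the measurement method is illustrated.
    """
    compressed = zlib.compress(data)
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        restored = zlib.decompress(compressed)
        elapsed = time.perf_counter() - start
        # Sanity check: the round trip must reproduce the input exactly.
        assert restored == data
        best = min(best, elapsed)
    # Throughput is reported against the uncompressed size, using the fastest round.
    return len(data) / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    sample = b"spark lz4 benchmark " * 500_000  # 10 MB of highly repetitive text
    print(f"{decompression_throughput_mb_s(sample):.0f} MB/s")
```

Taking the fastest of several rounds, rather than the mean, reduces the influence of scheduler noise on a single short measurement.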
Compression speeds also receive a welcome boost, though the improvement is
not evenly distributed: higher compression levels benefit considerably more.
| Version | v1.8.1 | v1.8.2 | Improvement |
| -- | -- | -- | -- |
| lz4 -1 | 504 MB/s | 516 MB/s | +2% |
| lz4 -9 | 23.2 MB/s | 25.6 MB/s | +10% |
| lz4 -12 | 3.5 MB/s | 9.5 MB/s | +170% |
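The Improvement column in both tables is the relative change in throughput, (new - old) / old. This short sketch recomputes it from the MB/s values quoted above; `RESULTS` and `improvement_pct` are illustrative names, not part of lz4 or lz4-java.

```python
# (v1.8.1 MB/s, v1.8.2 MB/s) pairs copied from the two tables above.
RESULTS = {
    "decompression": (2490.0, 2770.0),
    "lz4 -1": (504.0, 516.0),
    "lz4 -9": (23.2, 25.6),
    "lz4 -12": (3.5, 9.5),
}


def improvement_pct(old: float, new: float) -> float:
    """Relative speed-up of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


for name, (old, new) in RESULTS.items():
    print(f"{name:>14}: {improvement_pct(old, new):+.1f}%")
```

It prints +11.2%, +2.4%, +10.3% and +171.4%, which the tables round to +11%, +2%, +10% and +170%.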
More details: https://github.com/lz4/lz4/releases/tag/v1.8.3
**Below are my benchmark results.** I set
`spark.sql.parquet.compression.codec` to `lz4`, disabled the ORC
benchmark, and then ran `FilterPushdownBenchmark`.
lz4-java 1.5.0:
```
[success] Total time: 5585 s, completed Sep 26, 2018 5:22:16 PM
```
lz4-java 1.4.0:
```
[success] Total time: 5591 s, completed Sep 26, 2018 5:22:24 PM
```
Some benchmark results:
lz4-java 1.5.0:
```
Select 1 row with 500 filters:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            1953 / 1980          0.0  1952502908.0       1.0X
Parquet Vectorized (Pushdown)                 2541 / 2585          0.0  2541019869.0       0.8X
```
lz4-java 1.4.0:
```
Select 1 row with 500 filters:           Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            1979 / 2103          0.0  1979328144.0       1.0X
Parquet Vectorized (Pushdown)                 2596 / 2909          0.0  2596222118.0       0.8X
```
Complete benchmark results:
https://issues.apache.org/jira/secure/attachment/12941360/FilterPushdownBenchmark-lz4-java-140-results.txt
https://issues.apache.org/jira/secure/attachment/12941361/FilterPushdownBenchmark-lz4-java-150-results.txt
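To put the numbers above in proportion, this sketch computes the percentage of run time saved by lz4-java 1.5.0 over 1.4.0, using only the best times and total run times quoted in this description. The names `BEST_MS`, `TOTAL_S` and `time_saved_pct` are hypothetical. It also assumes the Relative column is the first row's best time divided by each row's best time; that reading is an assumption that happens to reproduce the 0.8X shown.

```python
# Best times (ms) for "Select 1 row with 500 filters", copied from the output above.
BEST_MS = {
    "Parquet Vectorized": {"1.4.0": 1979, "1.5.0": 1953},
    "Parquet Vectorized (Pushdown)": {"1.4.0": 2596, "1.5.0": 2541},
}
# Total FilterPushdownBenchmark run times (s) from the two "[success]" lines.
TOTAL_S = {"1.4.0": 5591, "1.5.0": 5585}


def time_saved_pct(before: float, after: float) -> float:
    """Percentage reduction in run time going from `before` to `after`."""
    return (before - after) / before * 100.0


for case, t in BEST_MS.items():
    print(f"{case}: {time_saved_pct(t['1.4.0'], t['1.5.0']):.1f}% less time")

print(f"Whole run: {time_saved_pct(TOTAL_S['1.4.0'], TOTAL_S['1.5.0']):.2f}% less time")

# Assumed definition of "Relative": baseline (first row) best time / this row's best time.
base = BEST_MS["Parquet Vectorized"]["1.5.0"]
pushdown = BEST_MS["Parquet Vectorized (Pushdown)"]["1.5.0"]
print(f"Pushdown relative: {base / pushdown:.1f}X")
```

Run as-is, it reports about 1.3% and 2.1% less time for the two cases, 0.11% for the whole run, and 0.8X for the pushdown row. The Best/Avg spread within a single run (for example 1979 / 2103 ms) is larger than these differences, so they are best read as indicative.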
## How was this patch tested?
manual tests
Closes #22551 from wangyum/SPARK-25539.
Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
The file was modified: pom.xml
The file was modified: dev/deps/spark-deps-hadoop-2.6
The file was modified: dev/deps/spark-deps-hadoop-2.7
The file was modified: dev/deps/spark-deps-hadoop-3.1