Use the LZO compressor/decompressor in Apache Spark 2.x/3.x
Below are the steps to include LZO support in a Spark application.
1. Clone the hadoop-lzo project from GitHub
git clone https://github.com/twitter/hadoop-lzo.git
cd hadoop-lzo/
2. Package the project using Maven. Note that the build also compiles the native gplcompression library, so the LZO development headers need to be installed on the build machine (e.g. lzo-devel via yum, or brew install lzo on macOS).
mvn clean package -Dmaven.test.skip=true
3. This will create a jar file in the target folder of the project
e.g. hadoop-lzo-0.4.21-SNAPSHOT.jar
4. Include the above jar as a dependency in the Spark application's build definition (build.sbt), for example:
libraryDependencies += "com.hadoop.gplcompression" % "hadoop-lzo" % "0.4.21-SNAPSHOT" from "file:///Users/xyz/hadoop-lzo/target/hadoop-lzo-0.4.21-SNAPSHOT.jar"
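Once the jar is on the classpath, the LZO codecs can be used directly from the application. Below is a minimal sketch, assuming a hypothetical input path and app name; the codec and input-format classes (com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec, com.hadoop.mapreduce.LzoTextInputFormat) are the ones shipped in the hadoop-lzo jar built above.

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.sql.SparkSession
import com.hadoop.mapreduce.LzoTextInputFormat

object LzoReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lzo-read-example")
      // register the hadoop-lzo codecs with the underlying Hadoop configuration
      .config("spark.hadoop.io.compression.codecs",
        "org.apache.hadoop.io.compress.DefaultCodec," +
          "com.hadoop.compression.lzo.LzoCodec," +
          "com.hadoop.compression.lzo.LzopCodec")
      .getOrCreate()

    // Read an .lzo file as lines of text (the path here is hypothetical).
    val lines = spark.sparkContext
      .newAPIHadoopFile[LongWritable, Text, LzoTextInputFormat]("hdfs:///data/input.lzo")
      .map(_._2.toString)

    lines.take(10).foreach(println)

    spark.stop()
  }
}

Alternatively, instead of the build.sbt dependency, the same jar can be passed at submit time with spark-submit --jars /Users/xyz/hadoop-lzo/target/hadoop-lzo-0.4.21-SNAPSHOT.jar.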
The above steps can resolve exceptions like the following:
1. java.lang.RuntimeException: native-lzo library not available
2. java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
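If the UnsatisfiedLinkError persists even with the jar on the classpath, the native gplcompression library usually also has to be visible on the JVM's library path. The sketch below is one way to point Spark at it, assuming the native libraries ended up under a hypothetical hadoop-lzo/target/native/lib directory after the Maven build; in practice these settings are often passed with --conf to spark-submit (or set in spark-defaults.conf) so they take effect before the driver and executor JVMs start.

import org.apache.spark.sql.SparkSession

// Hypothetical directory containing libgplcompression produced by the Maven build;
// the exact platform-specific subdirectory depends on the build machine.
val nativeLibDir = "/Users/xyz/hadoop-lzo/target/native/lib"

val spark = SparkSession.builder()
  .appName("lzo-native-path-example")
  // make the native library directory visible to the driver and executor JVMs
  .config("spark.driver.extraLibraryPath", nativeLibDir)
  .config("spark.executor.extraLibraryPath", nativeLibDir)
  .getOrCreate()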