Use the LZO compressor/decompressor in Apache Spark 2.x/3.x
Below are the steps to include LZO support in a Spark application.
1. Clone the hadoop-lzo project from GitHub
git clone https://github.com/twitter/hadoop-lzo.git
cd hadoop-lzo/
2. Package the project using Maven. Note that the build also compiles the native gplcompression library, so the LZO development headers need to be installed on the build machine (e.g. lzo-devel via yum, or brew install lzo on macOS).
mvn clean package -Dmaven.test.skip=true
3. This will create a jar file in the target folder of the project
e.g. hadoop-lzo-0.4.21-SNAPSHOT.jar
4. Include the above jar as a dependency in the Spark application's build definition (build.sbt), for example:
libraryDependencies += "com.hadoop.gplcompression" % "hadoop-lzo" % "0.4.21-SNAPSHOT" from "file:///Users/xyz/hadoop-lzo/target/hadoop-lzo-0.4.21-SNAPSHOT.jar"
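Once the jar is on the classpath, the LZO codecs can be used directly from the application. Below is a minimal sketch, assuming a hypothetical input path and app name; the codec and input-format classes (com.hadoop.compression.lzo.LzoCodec, com.hadoop.compression.lzo.LzopCodec, com.hadoop.mapreduce.LzoTextInputFormat) are the ones shipped in the hadoop-lzo jar built above.

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.spark.sql.SparkSession
import com.hadoop.mapreduce.LzoTextInputFormat

object LzoReadExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lzo-read-example")
      // register the hadoop-lzo codecs with the underlying Hadoop configuration
      .config("spark.hadoop.io.compression.codecs",
        "org.apache.hadoop.io.compress.DefaultCodec," +
          "com.hadoop.compression.lzo.LzoCodec," +
          "com.hadoop.compression.lzo.LzopCodec")
      .getOrCreate()

    // Read an .lzo file as lines of text (the path here is hypothetical).
    val lines = spark.sparkContext
      .newAPIHadoopFile[LongWritable, Text, LzoTextInputFormat]("hdfs:///data/input.lzo")
      .map(_._2.toString)

    lines.take(10).foreach(println)

    spark.stop()
  }
}

Alternatively, instead of the build.sbt dependency, the same jar can be passed at submit time with spark-submit --jars /Users/xyz/hadoop-lzo/target/hadoop-lzo-0.4.21-SNAPSHOT.jar.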
The above steps can resolve exceptions like the following:
1. java.lang.RuntimeException: native-lzo library not available
2. java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
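If the UnsatisfiedLinkError persists even with the jar on the classpath, the native gplcompression library usually also has to be visible on the JVM's library path. The sketch below is one way to point Spark at it, assuming the native libraries ended up under a hypothetical hadoop-lzo/target/native/lib directory after the Maven build; in practice these settings are often passed with --conf to spark-submit (or set in spark-defaults.conf) so they take effect before the driver and executor JVMs start.

import org.apache.spark.sql.SparkSession

// Hypothetical directory containing libgplcompression produced by the Maven build;
// the exact platform-specific subdirectory depends on the build machine.
val nativeLibDir = "/Users/xyz/hadoop-lzo/target/native/lib"

val spark = SparkSession.builder()
  .appName("lzo-native-path-example")
  // make the native library directory visible to the driver and executor JVMs
  .config("spark.driver.extraLibraryPath", nativeLibDir)
  .config("spark.executor.extraLibraryPath", nativeLibDir)
  .getOrCreate()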