Building the Complete Hadoop 3.3.4 Native Libraries Suite on amd64

The official pre-built packages have incomplete support for many Native Libraries extensions, so a rebuild is required. This article demonstrates compiling on a Debian 11 (amd64) environment.

Process

The base environment is a minimal Debian 11 (amd64) installation. You can manually configure a domestic mirror (see the TUNA mirror documentation for details). It is recommended to fully upgrade the system and reboot before compiling, as shown after the source list below.

deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye main
deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye main

#deb https://security.debian.org/debian-security bullseye-security main
#deb-src https://security.debian.org/debian-security bullseye-security main

deb https://mirrors.tuna.tsinghua.edu.cn/debian-security bullseye-security main
deb-src https://mirrors.tuna.tsinghua.edu.cn/debian-security bullseye-security main

# bullseye-updates, to get updates before a point release is made;
# see https://www.debian.org/doc/manuals/debian-reference/ch02.en.html#_updates_and_backports
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-updates main
deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bullseye-updates main

# This system was installed using small removable media
# (e.g. netinst, live or single CD). The matching "deb cdrom"
# entries were disabled at the end of the installation process.
# For information about how to configure apt package sources,
# see the sources.list(5) manual.
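
The full upgrade mentioned above can be done as follows (a minimal sketch, assuming apt is already pointed at the mirror above):

sudo apt update
sudo apt -y full-upgrade
sudo reboot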

After the upgrade, check the system version (if the command is not found, install lsb-release manually; see the command after the output below):

$ lsb_release -a
No LSB modules are available.
Distributor ID:    Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
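
If lsb_release is missing, the Debian package to install is lsb-release:

sudo apt -y install lsb-release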

Environment

Oracle JDK

Since Hadoop and its related components are written in Java, install the base Java environment first. In theory OpenJDK also works, but to be safe, download the JDK archive from the Oracle JDK website.

## Assuming the downloaded version is 8u341
sudo tar xf jdk-8u341-linux-x64.tar.gz -C /opt/
echo -e 'export JAVA_HOME="/opt/jdk1.8.0_341"\nexport PATH=$JAVA_HOME/bin:$PATH' | sudo tee /etc/profile.d/maven.sh

After installation, check the version:

$ java -version
java version "1.8.0_341"
Java(TM) SE Runtime Environment (build 1.8.0_341-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.341-b10, mixed mode)

Maven

Next, deploy Maven to provide the Java build environment.

wget https://archive.apache.org/dist/maven/maven-3/3.8.6/binaries/apache-maven-3.8.6-bin.tar.gz
sudo tar xf apache-maven-3.8.6-bin.tar.gz -C /opt/

Add global environment variables:

cat <<"EOF" | sudo tee -a /etc/profile.d/maven.sh
export M2_HOME="/opt/apache-maven-3.8.7"
export MAVEN_HOME="/opt/apache-maven-3.8.7"
export PATH='$M2_HOME/bin:$PATH'
EOF
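
A quick way to verify the Maven setup (assuming a new login shell, or source the profile script manually first):

source /etc/profile.d/maven.sh
mvn -version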

System Dependencies

Next, prepare the Native Libraries build environment. The official build instructions target Ubuntu; the package names are the same on Debian, so install them as required.

sudo apt -y install build-essential autoconf automake libtool zlib1g-dev pkg-config libssl-dev libsasl2-dev
sudo apt -y install g++-9 gcc-9

Because the cmake version in the official Debian repository is too old to meet the requirements, compile and install it manually:

wget https://cmake.org/files/v3.21/cmake-3.21.7.tar.gz
tar xf cmake-3.21.7.tar.gz
cd cmake-3.21.7/
./bootstrap
make -j$(nproc)
sudo make install

Then install the native dependencies:

sudo apt -y install libbz2-dev \
                    libfuse-dev \
                    libprotobuf-dev \
                    libsasl2-dev \
                    libssl-dev \
                    libzstd-dev \
                    libsnappy-dev \
                    zlib1g-dev

Build Tools

Next, install the build tools:

## Protocol Buffers 3.7.1 (required to build native code)
curl -L -s -S https://github.com/protocolbuffers/protobuf/releases/download/v3.7.1/protobuf-java-3.7.1.tar.gz -o protobuf-3.7.1.tar.gz
mkdir protobuf-3.7-src
tar xf protobuf-3.7.1.tar.gz --strip-components 1 -C protobuf-3.7-src && cd protobuf-3.7-src
./configure
make -j$(nproc)
sudo make install
## Boost 1.72.0
wget https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download -O boost_1_72_0.tar.bz2
tar xf boost_1_72_0.tar.bz2
cd boost_1_72_0/
./bootstrap.sh --prefix=/usr/
./b2 --without-python
sudo ./b2 --without-python install

After installation, check the component versions:

$ cmake --version
cmake version 3.21.7

CMake suite maintained and supported by Kitware (kitware.com/cmake).
$ protoc --version
libprotoc 3.7.1

Tip: if running the command reports an error, refresh the shared library cache first with sudo ldconfig.
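
Boost does not provide a version command; one way to confirm the installed version (assuming the headers landed under /usr/include as configured above) is to check the version header:

grep "BOOST_LIB_VERSION" /usr/include/boost/version.hpp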

Build

Download the Hadoop 3.3.4 source package, extract it, and start the build:

wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4-src.tar.gz
tar xf hadoop-3.3.4-src.tar.gz
cd hadoop-3.3.4-src/
mvn clean package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true -X

The build succeeded when you see output like the following:

[INFO] No site descriptor found: nothing to attach.
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache Hadoop Main 3.3.4:
[INFO] 
[INFO] Apache Hadoop Main ................................. SUCCESS [02:01 min]
[INFO] Apache Hadoop Build Tools .......................... SUCCESS [01:01 min]
[INFO] Apache Hadoop Project POM .......................... SUCCESS [ 37.537 s]
[INFO] Apache Hadoop Annotations .......................... SUCCESS [ 16.711 s]
[INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.047 s]
[INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [ 50.656 s]
[INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [01:33 min]
[INFO] Apache Hadoop MiniKDC .............................. SUCCESS [ 35.818 s]
[INFO] Apache Hadoop Auth ................................. SUCCESS [02:48 min]
[INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  5.573 s]
[INFO] Apache Hadoop Common ............................... SUCCESS [02:42 min]
[INFO] Apache Hadoop NFS .................................. SUCCESS [  7.504 s]
[INFO] Apache Hadoop KMS .................................. SUCCESS [  8.095 s]
[INFO] Apache Hadoop Registry ............................. SUCCESS [  5.720 s]
[INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.019 s]
[INFO] Apache Hadoop HDFS Client .......................... SUCCESS [01:41 min]
[INFO] Apache Hadoop HDFS ................................. SUCCESS [ 57.229 s]
[INFO] Apache Hadoop HDFS Native Client ................... SUCCESS [01:12 min]
[INFO] Apache Hadoop HttpFS ............................... SUCCESS [  7.278 s]
[INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [  1.293 s]
[INFO] Apache Hadoop HDFS-RBF ............................. SUCCESS [ 14.657 s]
[INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.028 s]
[INFO] Apache Hadoop YARN ................................. SUCCESS [  0.045 s]
[INFO] Apache Hadoop YARN API ............................. SUCCESS [ 21.863 s]
[INFO] Apache Hadoop YARN Common .......................... SUCCESS [ 41.097 s]
[INFO] Apache Hadoop YARN Server .......................... SUCCESS [  0.025 s]
[INFO] Apache Hadoop YARN Server Common ................... SUCCESS [ 17.717 s]
[INFO] Apache Hadoop YARN NodeManager ..................... SUCCESS [ 47.475 s]
[INFO] Apache Hadoop YARN Web Proxy ....................... SUCCESS [  4.568 s]
[INFO] Apache Hadoop YARN ApplicationHistoryService ....... SUCCESS [  7.874 s]
[INFO] Apache Hadoop YARN Timeline Service ................ SUCCESS [  3.912 s]
[INFO] Apache Hadoop YARN ResourceManager ................. SUCCESS [  9.536 s]
[INFO] Apache Hadoop YARN Server Tests .................... SUCCESS [  0.583 s]
[INFO] Apache Hadoop YARN Client .......................... SUCCESS [  5.542 s]
[INFO] Apache Hadoop YARN SharedCacheManager .............. SUCCESS [  1.138 s]
[INFO] Apache Hadoop YARN Timeline Plugin Storage ......... SUCCESS [  1.066 s]
[INFO] Apache Hadoop YARN TimelineService HBase Backend ... SUCCESS [  0.036 s]
[INFO] Apache Hadoop YARN TimelineService HBase Common .... SUCCESS [ 20.700 s]
[INFO] Apache Hadoop YARN TimelineService HBase Client .... SUCCESS [ 42.930 s]
[INFO] Apache Hadoop YARN TimelineService HBase Servers ... SUCCESS [  0.018 s]
[INFO] Apache Hadoop YARN TimelineService HBase Server 1.2  SUCCESS [  1.809 s]
[INFO] Apache Hadoop YARN TimelineService HBase tests ..... SUCCESS [ 41.403 s]
[INFO] Apache Hadoop YARN Router .......................... SUCCESS [  1.524 s]
[INFO] Apache Hadoop YARN TimelineService DocumentStore ... SUCCESS [ 41.767 s]
[INFO] Apache Hadoop YARN Applications .................... SUCCESS [  0.068 s]
[INFO] Apache Hadoop YARN DistributedShell ................ SUCCESS [  1.118 s]
[INFO] Apache Hadoop YARN Unmanaged Am Launcher ........... SUCCESS [  0.819 s]
[INFO] Apache Hadoop MapReduce Client ..................... SUCCESS [  0.194 s]
[INFO] Apache Hadoop MapReduce Core ....................... SUCCESS [  5.284 s]
[INFO] Apache Hadoop MapReduce Common ..................... SUCCESS [  2.601 s]
[INFO] Apache Hadoop MapReduce Shuffle .................... SUCCESS [  1.258 s]
[INFO] Apache Hadoop MapReduce App ........................ SUCCESS [  5.154 s]
[INFO] Apache Hadoop MapReduce HistoryServer .............. SUCCESS [  1.950 s]
[INFO] Apache Hadoop MapReduce JobClient .................. SUCCESS [  2.218 s]
[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  0.494 s]
[INFO] Apache Hadoop YARN Services ........................ SUCCESS [  0.096 s]
[INFO] Apache Hadoop YARN Services Core ................... SUCCESS [  6.723 s]
[INFO] Apache Hadoop YARN Services API .................... SUCCESS [  0.769 s]
[INFO] Apache Hadoop YARN Application Catalog ............. SUCCESS [  0.082 s]
[INFO] Apache Hadoop YARN Application Catalog Webapp ...... SUCCESS [05:20 min]
[INFO] Apache Hadoop YARN Application Catalog Docker Image  SUCCESS [  0.023 s]
[INFO] Apache Hadoop YARN Application MaWo ................ SUCCESS [  0.082 s]
[INFO] Apache Hadoop YARN Application MaWo Core ........... SUCCESS [  1.033 s]
[INFO] Apache Hadoop YARN Site ............................ SUCCESS [  0.017 s]
[INFO] Apache Hadoop YARN Registry ........................ SUCCESS [  0.241 s]
[INFO] Apache Hadoop YARN UI .............................. SUCCESS [  0.064 s]
[INFO] Apache Hadoop YARN CSI ............................. SUCCESS [ 37.246 s]
[INFO] Apache Hadoop YARN Project ......................... SUCCESS [  7.038 s]
[INFO] Apache Hadoop MapReduce HistoryServer Plugins ...... SUCCESS [  0.807 s]
[INFO] Apache Hadoop MapReduce NativeTask ................. SUCCESS [ 15.014 s]
[INFO] Apache Hadoop MapReduce Uploader ................... SUCCESS [  0.840 s]
[INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  1.622 s]
[INFO] Apache Hadoop MapReduce ............................ SUCCESS [  2.737 s]
[INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [  6.104 s]
[INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  1.699 s]
[INFO] Apache Hadoop Client Aggregator .................... SUCCESS [  0.994 s]
[INFO] Apache Hadoop Dynamometer Workload Simulator ....... SUCCESS [  1.170 s]
[INFO] Apache Hadoop Dynamometer Cluster Simulator ........ SUCCESS [  1.490 s]
[INFO] Apache Hadoop Dynamometer Block Listing Generator .. SUCCESS [  0.930 s]
[INFO] Apache Hadoop Dynamometer Dist ..................... SUCCESS [  3.411 s]
[INFO] Apache Hadoop Dynamometer .......................... SUCCESS [  0.083 s]
[INFO] Apache Hadoop Archives ............................. SUCCESS [  0.887 s]
[INFO] Apache Hadoop Archive Logs ......................... SUCCESS [  1.075 s]
[INFO] Apache Hadoop Rumen ................................ SUCCESS [  1.840 s]
[INFO] Apache Hadoop Gridmix .............................. SUCCESS [  1.439 s]
[INFO] Apache Hadoop Data Join ............................ SUCCESS [  1.114 s]
[INFO] Apache Hadoop Extras ............................... SUCCESS [  1.082 s]
[INFO] Apache Hadoop Pipes ................................ SUCCESS [  1.907 s]
[INFO] Apache Hadoop OpenStack support .................... SUCCESS [  1.298 s]
[INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [ 33.725 s]
[INFO] Apache Hadoop Kafka Library support ................ SUCCESS [  7.707 s]
[INFO] Apache Hadoop Azure support ........................ SUCCESS [  8.595 s]
[INFO] Apache Hadoop Aliyun OSS support ................... SUCCESS [ 15.269 s]
[INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  1.651 s]
[INFO] Apache Hadoop Resource Estimator Service ........... SUCCESS [  9.306 s]
[INFO] Apache Hadoop Azure Data Lake support .............. SUCCESS [ 13.255 s]
[INFO] Apache Hadoop Image Generation Tool ................ SUCCESS [  1.232 s]
[INFO] Apache Hadoop Tools Dist ........................... SUCCESS [  9.532 s]
[INFO] Apache Hadoop Tools ................................ SUCCESS [  0.017 s]
[INFO] Apache Hadoop Client API ........................... SUCCESS [01:16 min]
[INFO] Apache Hadoop Client Runtime ....................... SUCCESS [01:02 min]
[INFO] Apache Hadoop Client Packaging Invariants .......... SUCCESS [  3.033 s]
[INFO] Apache Hadoop Client Test Minicluster .............. SUCCESS [01:46 min]
[INFO] Apache Hadoop Client Packaging Invariants for Test . SUCCESS [  0.076 s]
[INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [  5.577 s]
[INFO] Apache Hadoop Distribution ......................... SUCCESS [ 26.695 s]
[INFO] Apache Hadoop Client Modules ....................... SUCCESS [  0.040 s]
[INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [  0.171 s]
[INFO] Apache Hadoop Tencent COS Support .................. SUCCESS [  5.319 s]
[INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [  0.014 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  38:03 min
[INFO] Finished at: 2022-12-24T10:47:07+08:00
[INFO] ------------------------------------------------------------------------

The generated packages are placed under hadoop-dist/target/; hadoop-3.3.4.tar.gz is the final build artifact.
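
To confirm that the native libraries are included, you can list them inside the dist output (the path follows the standard layout of the Hadoop dist module; adjust if yours differs):

ls hadoop-dist/target/hadoop-3.3.4/lib/native/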


Optional

ISA-L Support

ISA-L (Intelligent Storage Acceleration Library) is a storage acceleration library developed by Intel that improves HDFS performance (it is mainly used to accelerate HDFS erasure coding). It can be built on both ARMv8 (aarch64) and AMD64 (x86_64).

This component is detected automatically at build time, so install the ISA-L library as described below before compiling, and the resulting build will support ISA-L natively.

## Install dependencies
sudo apt -y install nasm help2man libtool
## Clone the source
git clone https://github.com/intel/isa-l
cd isa-l/
./autogen.sh
./configure --prefix=/usr --libdir=/usr/lib
make
sudo make install
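
After installation, it may help to refresh the linker cache and confirm the library is visible (a hedged check; the exact path depends on the --libdir passed to configure above):

sudo ldconfig
ldconfig -p | grep isal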

PMDK Support

PMDK (Persistent Memory Development Kit) provides user-space libraries for reading and writing data on persistent memory, reducing user/kernel mode switches and file system overhead and improving cluster read/write performance.

Unlike the other extensions, PMDK support is not compiled in by default even when the required libraries are detected on the system; you must add a build flag and recompile. First install the required dependencies (see the sketch below).
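
The original post does not list the exact dependency commands; a minimal sketch, assuming the libpmem packages shipped with Debian 11 are sufficient for the PMDK build flag:

sudo apt -y install libpmem1 libpmem-dev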

Then build with the following command:

mvn clean package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true -Drequire.pmdk -X

After the build, deploy the new package and run the check again:

$ hadoop checknative
2022-12-25 02:10:45,702 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
2022-12-25 02:10:45,703 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2023-01-03 02:10:45,729 INFO nativeio.NativeIO: The native code was built with PMDK support, and PMDK libs were loaded successfully.
Native library checking:
hadoop:  true /opt/hadoop-3.3.4/lib/native/libhadoop.so.1.0.0
zlib:    true /lib/x86_64-linux-gnu/libz.so.1
zstd  :  true /lib/x86_64-linux-gnu/libzstd.so.1
bzip2:   true /lib/x86_64-linux-gnu/libbz2.so.1
openssl: true /lib/x86_64-linux-gnu/libcrypto.so.1.1
ISA-L:   true /lib/libisal.so.2
PMDK:    true /usr/lib/x86_64-linux-gnu/libpmem.so.1.0.0

Common Issues

a) Some libraries fail to download
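
The original post does not include a fix here; one common remedy (an assumption, not from the original) is to point Maven at a domestic mirror in ~/.m2/settings.xml, written here in the same heredoc style used earlier:

mkdir -p ~/.m2
cat <<"EOF" > ~/.m2/settings.xml
<settings>
  <mirrors>
    <mirror>
      <id>aliyun</id>
      <mirrorOf>central</mirrorOf>
      <name>Aliyun Maven Mirror</name>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>
EOF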

b) help2man: command not found

When building and installing isa-l, the following error is reported:

/bin/bash: line 1: help2man: command not found
make[2]: [Makefile:4791: programs/igzip.1] Error 127 (ignored)

This happens because the dependencies were not installed as required, so help2man is missing; install it manually.
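
On Debian the package name is simply help2man:

sudo apt -y install help2man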

c) libisal.so.2: cannot open shared object file

After compiling and installing isa-l, the error libisal.so.2: cannot open shared object file: No such file or directory still appears.

This is because Debian-family and RHEL-family distributions keep 64-bit libraries in different default locations (Debian under /lib/, RedHat under /lib64/); creating a symlink manually resolves it:

sudo ln -s /usr/lib64/libisal.so.2 /usr/lib/libisal.so.2

d) checknative reports false for some native components

For example:

$ hadoop checknative
Native library checking:
hadoop:  true /opt/hadoop-3.3.4/lib/native/libhadoop.so.1.0.0
zlib:    true /lib/x86_64-linux-gnu/libz.so.1
zstd  :  true /lib/x86_64-linux-gnu/libzstd.so.1
bzip2:   true /lib/x86_64-linux-gnu/libbz2.so.1
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
ISA-L:   false Loading ISA-L failed: Failed to load libisal.so.2 (libisal.so.2: cannot open shared object file: No such file or directory)
PMDK:    false The native code was built without PMDK support.

The components and their corresponding package names are listed in the following table:

Object Name    Package Name    Source Name
zlib           zlib1g-dev      /
zstd           libzstd-dev     /
bzip2          libbz2-dev      /
openssl        libssl-dev      /
ISA-L          /               https://github.com/intel/isa-l
PMDK           /               https://pmem.io
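
Based on the table above, a hedged example of installing the missing packages and re-running the check (on Debian the -dev packages also pull in the runtime libraries, and libssl-dev provides the unversioned libcrypto.so that checknative looks for):

sudo apt -y install zlib1g-dev libzstd-dev libbz2-dev libssl-dev
sudo ldconfig
hadoop checknative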

