Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at massive scale. This article documents the installation and deployment process for an older release.
Hive
Configuring the backend database works the same way across all versions.
Configure the Backend Storage
Hive stores its metadata (schema information and other bookkeeping data) in a backend database. Several types of relational databases are supported, either external or embedded; because MySQL is simple to work with and widely used, this article uses MySQL as the example.
Install the MySQL 5.7 server (other versions work too; this article uses MySQL 5.7 as the example, and the installation package can be downloaded from the official website).
sudo dnf install mysql-server
Start the database
sudo systemctl start mysqld
sudo systemctl enable mysqld
Log in to the database (if no password has been set yet, simply press Enter at the password prompt).
mysql -uroot -p
Tip: in newer versions a random password is generated when the service first starts; it can be retrieved with the following command.
sudo grep -i 'password' /var/log/mysqld.log
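If a temporary password was generated, log in with it and reset it before running anything else; a minimal sketch, reusing the example password from the troubleshooting section below (substitute your own):

mysql -uroot -p
> ALTER USER 'root'@'localhost' IDENTIFIED BY 'Uosuos123!';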
Run the following SQL to create a dedicated user for Hive's metastore storage. Using the root account is not recommended; the hive database itself does not need to be created here, because the createDatabaseIfNotExist=true parameter in the JDBC URL configured later will create it automatically.
> CREATE USER 'hive'@'%' IDENTIFIED BY 'Uos@2022';
> GRANT ALL ON *.* TO 'hive'@'%' IDENTIFIED BY 'Uos@2022';
> FLUSH PRIVILEGES;
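To confirm that the account exists and carries the expected grants, you can check from the same MySQL prompt:

> SELECT User, Host FROM mysql.user WHERE User = 'hive';
> SHOW GRANTS FOR 'hive'@'%';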
Common Issues
a) If you see the following error
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.
This happens because MySQL installed from the official repository requires the default password to be reset before any other statement can be executed
SET PASSWORD = PASSWORD('Uosuos123!');
b) If you see the following error
ERROR 1819 (HY000): Your password does not satisfy the current policy requirements
This happens because the default password policy is fairly strict: a password must mix uppercase and lowercase letters, digits, and special characters. Choose a stronger password.
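To see exactly which rules are in effect, inspect the validate_password variables (this assumes MySQL 5.7 with the validate_password plugin loaded); in a throwaway test environment the policy can also be relaxed:

> SHOW VARIABLES LIKE 'validate_password%';
> SET GLOBAL validate_password_policy = LOW;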
Download
Download the Hive 2.3.9 stable release
wget https://dlcdn.apache.org/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz
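Optionally verify the archive before extracting by comparing its SHA-256 hash with the checksum published on the Apache download page:

sha256sum apache-hive-2.3.9-bin.tar.gz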
Extract
sudo tar xf apache-hive-2.3.9-bin.tar.gz -C /opt/
Change ownership
sudo chown -R $USER:$USER /opt/apache-hive-2.3.9-bin/
Download the MySQL Connector/J JDBC driver into Hive's lib directory. (Note: $HIVE_HOME is only defined after the environment-variable step below; either complete that step first, or cd to the full path /opt/apache-hive-2.3.9-bin/lib instead.)
cd $HIVE_HOME/lib
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.18/mysql-connector-java-8.0.18.jar
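Confirm the driver jar is now in place:

ls -lh mysql-connector-java-8.0.18.jar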
Configure Environment Variables
sudo vim /etc/profile.d/hadoop.sh
## Add the following lines
export HIVE_HOME=/opt/apache-hive-2.3.9-bin
export PATH=$HIVE_HOME/bin:$PATH
Apply the environment variables, or log out and open a new terminal session
## Reload the environment variables
source /etc/profile
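Verify that the variables took effect:

echo $HIVE_HOME
hive --version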
Configure Hive
Create the Hive configuration file
cd $HIVE_HOME
vim conf/hive-site.xml
Write the following content (replace hadoop1 with the actual hostname of your MySQL and metastore node; note that the & characters in the JDBC URL must be escaped as &amp;amp; inside XML, otherwise the file will fail to parse)
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Metastore backend: MySQL connection (& escaped as &amp; for valid XML) -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop1:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false&amp;serverTimezone=GMT</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>Uos@2022</value>
  </property>
  <!-- Warehouse location on HDFS -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <!-- HiveServer2 web UI -->
  <property>
    <name>hive.server2.webui.host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
  </property>
  <property>
    <name>hive.server2.webui.max.threads</name>
    <value>50</value>
  </property>
  <!-- Skip strict schema version verification -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification.record.version</name>
    <value>false</value>
  </property>
  <!-- CLI conveniences: show current database and column headers -->
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <!-- HiveServer2 Thrift endpoint -->
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>0.0.0.0</value>
  </property>
  <!-- Metastore service endpoint -->
  <property>
    <name>hive.metastore.port</name>
    <value>9083</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop1:9083</value>
  </property>
  <!-- Run queries as the HiveServer2 process user rather than the connected user -->
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
</configuration>
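Before initializing, it is worth confirming that the hive user can actually reach MySQL from this node; a minimal check, assuming the mysql client is installed locally:

mysql -h hadoop1 -uhive -p'Uos@2022' -e 'SELECT VERSION();'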
Initialization
Create the Hive temporary and warehouse directories on HDFS and grant group write permissions
hdfs dfs -mkdir /tmp
hdfs dfs -chmod g+w /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /user/hive/warehouse
Check that the HDFS paths were created successfully
$ hdfs dfs -ls /
Found 2 items
drwxrwxr-x - uos supergroup 0 2022-12-02 12:48 /tmp
drwxr-xr-x - uos supergroup 0 2022-12-02 12:48 /user
Initialize the Hive metastore schema
schematool -initSchema -dbType mysql
Plenty of log output will scroll past; after a moment, the following lines indicate that the schema was initialized successfully.
Initialization script completed
schemaTool completed
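If you want to double-check, the metastore tables should now exist in the hive database on MySQL:

mysql -h hadoop1 -uhive -p'Uos@2022' -e 'USE hive; SHOW TABLES;'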
Start the Services
Start the services (while debugging, you can open two extra terminal windows and run each service in the foreground)
hive --service metastore
hive --service hiveserver2
Once everything checks out, run them in the background
nohup hive --service metastore > /dev/null 2>&1 &
nohup hive --service hiveserver2 > /dev/null 2>&1 &
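You can confirm both services are listening (9083 for the metastore, 10000 and 10002 for HiveServer2); note that HiveServer2 may take a little while to open its ports after launch. A quick check, assuming ss is available:

ss -tlnp | grep -E ':(9083|10000|10002)'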
Test the Services
Test that the hive CLI connects properly (lines after a double hash are comments marking the actual hive commands being run; everything else is output)
## 进入 cli
$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/opt/apache-hive-2.3.9-bin/lib/hive-common-2.3.9.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
## List the databases
hive (default)> show databases;
OK
database_name
default
Time taken: 0.473 seconds, Fetched: 1 row(s)
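For a slightly deeper smoke test, create and drop a throwaway table from the same prompt (the table name here is arbitrary):

hive (default)> create table smoke_test (id int);
hive (default)> show tables;
hive (default)> drop table smoke_test;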
Test that the beeline connection works
$ beeline
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.9-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.9 by Apache Hive
## Connect to HiveServer2 (replace the address with your cluster's actual hadoop1 address)
beeline> !connect jdbc:hive2://172.16.16.231:10000
Connecting to jdbc:hive2://172.16.16.231:10000
Enter username for jdbc:hive2://172.16.16.231:10000: hive
Enter password for jdbc:hive2://172.16.16.231:10000: **********
Connected to: Apache Hive (version 2.3.9)
Driver: Hive JDBC (version 2.3.9)
Transaction isolation: TRANSACTION_REPEATABLE_READ
## List the databases
0: jdbc:hive2://172.16.16.231:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
+----------------+
1 row selected (0.6 seconds)
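For scripting, beeline can also connect and run a statement non-interactively; a sketch using the same address and credentials as above:

beeline -u jdbc:hive2://172.16.16.231:10000 -n hive -p 'Uos@2022' -e 'show databases;'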
Open http://hadoop1:10002 in a browser to view the HiveServer2 web UI.
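The same check works from a terminal; an HTTP 200 response code indicates the UI is serving:

curl -s -o /dev/null -w '%{http_code}\n' http://hadoop1:10002/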