Hadoop Installation and Configuration
Install Linux using Vagrant
Vagrantfile:

```ruby
Vagrant.configure("2") do |config|
  (1..3).each do |i|
    config.vm.define "hadoop-node#{i}" do |node|
      node.vm.box = "centos7"
      node.vm.hostname = "hadoop-node#{i}"
      node.vm.network "private_network", ip: "192.168.33.#{110+i}", netmask: "255.255.255.0"
      node.vm.provider "virtualbox" do |v|
        v.name = "hadoop-node#{i}"
        v.memory = 4096
        v.cpus = 4
      end
    end
  end
end
```
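With this Vagrantfile in place, the three nodes can be brought up and entered in the usual way (this assumes a box named centos7 has already been added locally, e.g. via `vagrant box add`):

```shell
# boot hadoop-node1..3; requires a local box named "centos7"
vagrant up
# log in to the first node
vagrant ssh hadoop-node1
```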
Cluster Planning
| Framework | hadoop-node1 | hadoop-node2 | hadoop-node3 |
| --- | --- | --- | --- |
| HDFS | NameNode, DataNode | DataNode | SecondaryNameNode, DataNode |
| YARN | NodeManager | NodeManager | NodeManager, ResourceManager |
Cluster Configuration
Hadoop cluster configuration = HDFS configuration + MapReduce configuration + YARN configuration
HDFS cluster configuration:
- Point HDFS at the JDK explicitly (edit hadoop-env.sh)
- Specify the NameNode host and the data storage directory (edit core-site.xml)
- Specify the SecondaryNameNode host (edit hdfs-site.xml)
- Specify the DataNode worker nodes (edit etc/hadoop/slaves; one host per line)

MapReduce cluster configuration:
- Point MapReduce at the JDK explicitly (edit mapred-env.sh)
- Run the MapReduce framework on the YARN resource scheduler (edit mapred-site.xml)

YARN cluster configuration:
- Point YARN at the JDK explicitly (edit yarn-env.sh)
- Specify the host that runs the ResourceManager master (edit yarn-site.xml)
- Specify the NodeManager nodes (determined by the contents of the slaves file)
HDFS Cluster Configuration
Edit hadoop-env.sh to point HDFS at the JDK explicitly.
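For example (the JDK path below is a placeholder; substitute your actual install location — the same kind of line also goes into mapred-env.sh and yarn-env.sh later):

```shell
# hadoop-env.sh — the path is a placeholder; point it at your real JDK
export JAVA_HOME=/usr/local/jdk1.8.0_231
```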
Specify the NameNode host and the data storage directory (edit core-site.xml):
```xml
<!-- Address of the NameNode (default filesystem) -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-node1:9000</value>
</property>
<!-- Base directory for Hadoop data and temporary files -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/servers/hadoop-2.9.2/data/tmp</value>
</property>
```
Specify the SecondaryNameNode host (edit hdfs-site.xml):
```xml
<!-- HTTP address of the SecondaryNameNode -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop-node3:50090</value>
</property>
<!-- Number of HDFS block replicas -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
```
Specify the DataNode worker nodes (edit the slaves file; one host per line):
```
hadoop-node1
hadoop-node2
hadoop-node3
```
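The configuration has to be identical on all three machines, and the NameNode must be formatted once before the first start. A minimal sketch, assuming the install location used above and commands run from the Hadoop home directory:

```shell
# sync the configuration directory to the other nodes (run from hadoop-node1)
rsync -av /opt/servers/hadoop-2.9.2/etc/hadoop/ hadoop-node2:/opt/servers/hadoop-2.9.2/etc/hadoop/
rsync -av /opt/servers/hadoop-2.9.2/etc/hadoop/ hadoop-node3:/opt/servers/hadoop-2.9.2/etc/hadoop/

# format the NameNode once, on hadoop-node1 only; re-running wipes HDFS metadata
bin/hdfs namenode -format
```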
MapReduce Cluster Configuration
Specify the JDK path for MapReduce (edit mapred-env.sh).
Run the MapReduce framework on the YARN resource scheduler (edit mapred-site.xml):
```xml
<!-- Run MapReduce jobs on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
```
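Note that a stock Hadoop 2.9.2 distribution ships mapred-site.xml.template rather than mapred-site.xml, so the file usually has to be created first:

```shell
cd /opt/servers/hadoop-2.9.2/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
```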
YARN Cluster Configuration
Specify the JDK path (edit yarn-env.sh).
Specify the ResourceManager master host (edit yarn-site.xml):
```xml
<!-- Host that runs the ResourceManager -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-node3</value>
</property>
<!-- Auxiliary shuffle service required by MapReduce -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
```
Specify the NodeManager nodes (already covered by the slaves file).
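With HDFS, MapReduce, and YARN all configured, the cluster can be started and checked. A sketch of the usual Hadoop 2.x sequence (start-yarn.sh should run on the ResourceManager node, here hadoop-node3):

```shell
# on hadoop-node1 (NameNode)
sbin/start-dfs.sh

# on hadoop-node3 (ResourceManager)
sbin/start-yarn.sh

# on each node, verify the daemons from the planning table are running
jps
```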
Configuring the Job History Server (mapred-site.xml)
```xml
<!-- RPC address of the job history server -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-node1:10020</value>
</property>
<!-- Web UI address of the job history server -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-node1:19888</value>
</property>
```
```shell
sbin/mr-jobhistory-daemon.sh start historyserver
```
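A quick check that the daemon came up (it runs under the name JobHistoryServer):

```shell
# should list the JobHistoryServer process on hadoop-node1
jps | grep JobHistoryServer
```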
Configuring Log Aggregation
Log aggregation: after an application (job) finishes, its run logs are collected from the individual tasks and uploaded to HDFS.
Benefit of log aggregation: it makes the details of a program run easy to inspect, which is convenient for development and debugging.
Note: enabling log aggregation requires restarting the NodeManager, ResourceManager, and job history server, as shown in the sketch after the configuration below.
```xml
<!-- yarn-site.xml: enable log aggregation -->
<property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<!-- retain aggregated logs for 7 days (604800 seconds) -->
<property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
</property>
```
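A sketch of the restart mentioned above, using the standard Hadoop 2.x per-daemon scripts:

```shell
# on hadoop-node3 (ResourceManager) and on every NodeManager node, respectively
sbin/yarn-daemon.sh stop resourcemanager && sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh stop nodemanager   && sbin/yarn-daemon.sh start nodemanager

# on hadoop-node1: restart the job history server
sbin/mr-jobhistory-daemon.sh stop historyserver
sbin/mr-jobhistory-daemon.sh start historyserver
```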
Hive Configuration
Install MySQL first.
Create a hive user:
```sql
CREATE USER 'hive'@'%' IDENTIFIED BY '12345678';
GRANT ALL ON *.* TO 'hive'@'%';
FLUSH PRIVILEGES;
```
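A quick way to confirm the account works before wiring it into Hive (assuming MySQL runs on hadoop-node3, as the JDBC URL below implies):

```shell
# should print 1 if the grant took effect
mysql -uhive -p12345678 -h hadoop-node3 -e "SELECT 1;"
```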
Hive configuration:

```shell
vim hive-site.xml
```
```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- metastore database; note the & in the URL must be escaped as &amp; in XML -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop-node3:3306/hivemetadata?createDatabaseIfNotExist=true&amp;useSSL=false</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>username to use against metastore database</description>
    </property>
    <!-- must match the password used in CREATE USER above -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>12345678</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
        <description>Whether to include the current database in the Hive prompt.</description>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.exec.mode.local.auto</name>
        <value>true</value>
        <description>Let Hive determine whether to run in local mode automatically</description>
    </property>
</configuration>
```
Copy the MySQL JDBC driver jar into Hive's lib directory.
Initialize the metastore schema:
```shell
schematool -dbType mysql -initSchema
```
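If the schema initialized cleanly, a quick smoke test from the Hive CLI should work:

```shell
# lists the default database if the metastore is reachable
hive -e "show databases;"
```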
Installing Hue
Build and install from source
Download hue-release-4.3.0.zip from the official website, upload it to the server, and unpack it:
```shell
yum install unzip
unzip hue-release-4.3.0.zip
```
Install the build dependencies:
```shell
# check the Python version first (Hue 4.3 builds against Python 2.7)
python --version

yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel
yum install -y rsync
```
Notes:
- The dependencies above apply to CentOS/RHEL 7.x only; for other platforms see https://docs.gethue.com/administrator/installation/dependencies/
- Ideally the node where Hue is installed has never had MySQL installed, otherwise version conflicts are possible
- The build needs network access; a poor connection leads to all kinds of strange problems
Building Hue also requires a Maven environment, so install Maven before compiling: download apache-maven-3.6.3-bin.tar.gz, upload it to the VM, unpack it, and add it to the environment variables.
```shell
vim /etc/profile

# add the following lines:
export MAVEN_HOME=/opt/lagou/servers/apache-maven-3.6.3
export PATH=$PATH:$MAVEN_HOME/bin

source /etc/profile
mvn --version
```
Build:
```shell
cd /opt/software/hue-release-4.3.0
PREFIX=/opt/lagou/servers make install

cd /opt/lagou/servers
# clean stale registration and build artifacts, then rebuild the apps
rm app.reg
rm -r build
make apps
```
Modifying the Hadoop configuration
Add the following to hdfs-site.xml:
```xml
<!-- Enable WebHDFS so Hue can access HDFS over HTTP -->
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
<!-- Disable HDFS permission checking (test environment only) -->
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
```
Add the following to core-site.xml:
```xml
<!-- Allow the hue user to proxy requests from any host and any group -->
<property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
</property>
<!-- Same for the hdfs user -->
<property>
    <name>hadoop.proxyuser.hdfs.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.hdfs.groups</name>
    <value>*</value>
</property>
```
Edit httpfs-site.xml:
```xml
<configuration>
    <property>
        <name>httpfs.proxyuser.hue.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>httpfs.proxyuser.hue.groups</name>
        <value>*</value>
    </property>
</configuration>
```
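The proxyuser and WebHDFS changes only take effect after the Hadoop daemons reload their configuration, so restart HDFS (and YARN) before starting Hue:

```shell
# on hadoop-node1
sbin/stop-dfs.sh && sbin/start-dfs.sh

# on hadoop-node3
sbin/stop-yarn.sh && sbin/start-yarn.sh
```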
Hue configuration
```shell
cd /opt/servers/hue
cd desktop/conf
cp pseudo-distributed.ini.tmpl pseudo-distributed.ini
vim pseudo-distributed.ini
```

Key settings:

```ini
# Hue web server settings ([desktop] section)
http_host=hadoop-node2
http_port=8000
is_hue_4=true
time_zone=Asia/Shanghai
dev=true
server_user=hue
server_group=hue
default_user=hue
app_blacklist=search

# Hue's own metadata database ([desktop] -> [[database]]);
# the password must match the MySQL hive account created earlier
engine=mysql
host=hadoop-node3
port=3306
user=hive
password=12345678
name=hue

# path to the Hadoop configuration directory
hadoop_conf_dir=/opt/servers/hadoop-2.9.2/etc/hadoop
```

Create the hue database and initialize Hue's tables:

```shell
mysql -uhive -p12345678
mysql> create database hue;

build/env/bin/hue syncdb
build/env/bin/hue migrate
```
Start the Hue service
```shell
# create the hue user and group referenced by server_user/server_group above
groupadd hue
useradd -g hue hue

build/env/bin/supervisor
```
Web address: http://hadoop-node2:8000
Integrating Hue with Hadoop and Hive
Edit the parameter file /opt/servers/hue/desktop/conf/pseudo-distributed.ini.
Integrating HDFS and YARN
```ini
# [desktop] section
app_blacklist=search

# HDFS ([hadoop] -> [[hdfs_clusters]] -> [[[default]]])
fs_defaultfs=hdfs://hadoop-node1:9000
webhdfs_url=http://hadoop-node1:50070/webhdfs/v1
hadoop_conf_dir=/opt/servers/hadoop-2.9.2/etc/hadoop

# YARN ([hadoop] -> [[yarn_clusters]] -> [[[default]]])
resourcemanager_host=hadoop-node3
resourcemanager_port=8032
submit_to=True
resourcemanager_api_url=http://hadoop-node3:8088
proxy_api_url=http://hadoop-node3:8088
history_server_api_url=http://hadoop-node3:19888
```
Integrating Hive
Integrating Hive requires the HiveServer2 service; start HiveServer2 on the hadoop-node3 node.
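For example, on hadoop-node3, using the standard Hive service launcher (nohup keeps it running after logout):

```shell
# start HiveServer2 in the background; it listens on port 10000 by default
nohup hive --service hiveserver2 > hiveserver2.log 2>&1 &
```

Then point Hue at it in pseudo-distributed.ini: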
```ini
# [beeswax] section
hive_server_host=hadoop-node3
hive_server_port=10000
hive_conf_dir=/opt/lagou/hive-2.3.7/conf
```
Integrating MySQL
```ini
[[[mysql]]]
nice_name="My SQL DB"
name=hue
engine=mysql
host=hadoop-node3
port=3306
user=hive
password=12345678
```
Note: here name is the database name, i.e. the MySQL database that Hue will browse.