1. Overview
The HAWQ source is hosted on GitHub at https://github.com/apache/hawq
Before installing the PXF plugin, check the expected versions of the underlying software, listed in pxf/gradle.properties under the hawq source tree.
In my case Hadoop and HAWQ were already installed before building PXF; because PXF needed a lower-version HDFS, I later had to point the configuration at the lower version's install path (mainly the jar locations).
Versions used here: Hadoop 2.9.0, HAWQ 2.4, HBase 1.4.3.
2. Download the source
git clone https://github.com/apache/hawq.git
3. Build PXF
cd /hawq/pxf  # enter the PXF source directory
make          # build
If the build stops with an error, deleting the comment text on the line the error message points to was enough to get past it in my case.
4. Install PXF
mkdir -p /opt/pxf         # create the PXF install directory
export PXF_HOME=/opt/pxf  # set the environment variable
make install              # install PXF into /opt/pxf
5. Edit the configuration files
1) pxf-env.sh
export LD_LIBRARY_PATH=/usr/local/hadoop-2.7.1/lib/native:${LD_LIBRARY_PATH} ---Hadoop lib/native directory
export PXF_LOGDIR=/opt/pxf/logs ---PXF log directory
export PXF_RUNDIR=/opt/pxf ---PXF install directory
export PXF_USER=${PXF_USER:-pxf} ---user PXF runs as
export PXF_PORT=${PXF_PORT:-51200} ---PXF port
export PXF_JVM_OPTS="-Xmx512M -Xss256K" ---JVM options
export HADOOP_DISTRO=CUSTOM
export HADOOP_ROOT=/usr/local/hadoop-2.7.1 ---path of the Hadoop version PXF should use
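The `${VAR:-default}` expansions used in pxf-env.sh mean each setting can be overridden from the environment before PXF is started; the default applies only when the variable is unset or empty. A minimal sketch of the behavior:

```shell
# ${VAR:-default}: use the existing value if set, otherwise the default.
unset PXF_PORT
PXF_PORT=${PXF_PORT:-51200}   # not set beforehand -> falls back to 51200
echo "default:  $PXF_PORT"    # prints "default:  51200"

PXF_PORT=51300
PXF_PORT=${PXF_PORT:-51200}   # already set -> the existing value wins
echo "override: $PXF_PORT"    # prints "override: 51300"
```

So `export PXF_PORT=51300` before starting PXF would take precedence over the default in the file.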
2) pxf-log4j.properties
log4j.appender.ROLLINGFILE.File=/opt/pxf/logs/pxf-service.log ---log file location
Use an absolute path here; do not rely on an environment variable.
3) pxf-private.classpath
# PXF Configuration
/opt/pxf/conf
# PXF Libraries
/opt/pxf/lib/pxf-hbase.jar
/opt/pxf/lib/pxf-hdfs.jar
/opt/pxf/lib/pxf-hive.jar
/opt/pxf/lib/pxf-json.jar
/opt/pxf/lib/pxf-jdbc.jar
/opt/pxf/lib/pxf-ignite.jar
# Hadoop/Hive/HBase configurations
/usr/local/hadoop-2.7.1/etc/hadoop
#/usr/local/hadoop/hive/conf
/usr/local/hbase/conf
/usr/local/hadoop-2.7.1/share/hadoop/common/hadoop-common-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/hadoop-auth-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/asm-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/avro-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-cli-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-codec-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-collections-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-configuration-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-io-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-lang-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-logging-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-compress-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/guava-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/htrace-core*.jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jetty-*.jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jersey-core-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jersey-server-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/log4j-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/protobuf-java-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/slf4j-api-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/snappy-java-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/gson-*[0-9].jar
# Pick Jackson 1.9 jars from the hdfs dir for the HDP tar and from mapreduce1 for the CDH tar
/usr/local/hadoop-2.7.1/share/hadoop/[hdfs|mapreduce1]/lib/jackson-core-asl-1.9*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/[hdfs|mapreduce1]/lib/jackson-mapper-asl-1.9*[0-9].jar
# Hive Libraries
# HBase Libraries
/usr/local/hbase/lib/hbase-client*.jar
/usr/local/hbase/lib/hbase-common*.jar
/usr/local/hbase/lib/hbase-protocol*.jar
/usr/local/hbase/lib/htrace-core*.jar
/usr/local/hbase/lib/netty*.jar
/usr/local/hbase/lib/zookeeper*.jar
/usr/local/hbase/lib/metrics-core*.jar
Note: use the absolute paths of your own Hadoop/HBase installs. Since I did not install Hive, I commented out every Hive-related entry; leaving them in leads to missing-jar or bad-path errors later.
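The `*[0-9].jar` patterns in the classpath entries are shell globs: they match only jars whose name ends in a digit before `.jar`, so versioned jars are picked up while `-tests` jars and unversioned symlinks are skipped, and the file survives minor version bumps. A throwaway-directory check (the directory and file names are made up for illustration):

```shell
# Demonstrate how the *[0-9].jar globs in pxf-private.classpath resolve.
dir=$(mktemp -d)
touch "$dir/hadoop-common-2.7.1.jar" \
      "$dir/hadoop-common-2.7.1-tests.jar" \
      "$dir/hadoop-common.jar"

# Only the jar ending in a digit before ".jar" matches:
ls "$dir"/hadoop-common-*[0-9].jar   # prints only .../hadoop-common-2.7.1.jar
rm -rf "$dir"
```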
4) Make the paths in the following classpath files consistent with the paths configured above:
pxf-privatebigtop.classpath
pxf-privatehdp.classpath
pxf-privatephd.classpath
pxf-public.classpath
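Rather than editing each classpath file by hand, the Hadoop prefix can be rewritten in all of them with sed. A sketch under assumptions: `OLD` is whatever prefix your copies actually contain (check first), and the demo below runs on a throwaway file; for the real files you would point sed at the classpath files in your PXF conf directory.

```shell
# Rewrite the Hadoop install prefix across classpath entries.
# OLD and NEW are placeholders -- substitute what your files really contain.
OLD=/usr/local/hadoop
NEW=/usr/local/hadoop-2.7.1

# Demonstrated on a temporary file, not the live config:
f=$(mktemp)
echo "$OLD/share/hadoop/common/hadoop-common-*[0-9].jar" > "$f"
# Trailing slash in the pattern avoids touching already-suffixed paths.
sed -i "s|$OLD/|$NEW/|g" "$f"
cat "$f"   # entry now starts with /usr/local/hadoop-2.7.1/
rm -f "$f"
```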
6. Initialize PXF
cd /opt/pxf/bin  # enter the PXF install directory
Run the command: pxf init
7、启动PXF
pxf start
8. Access file data on HDFS
1) hdfs dfs -mkdir -p /data/pxf_examples  # create the HDFS directory
2) Create the text data file pxf_hdfs_simple.txt:
echo 'Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt
3) Put the data file into HDFS:
hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/
4) Check the file stored in HDFS:
hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
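The sample file is plain comma-delimited text, which is why the external table below declares delimiter E','; each line's four fields map to location, month, num_orders, and total_sales in order. A quick local check of the field split using the first two sample rows:

```shell
# Show how PXF will split each comma-delimited line into the table's columns.
printf 'Prague,Jan,101,4875.33\nRome,Mar,87,1557.39\n' |
awk -F, '{ printf "location=%s month=%s num_orders=%s total_sales=%s\n", $1, $2, $3, $4 }'
# location=Prague month=Jan num_orders=101 total_sales=4875.33
# location=Rome month=Mar num_orders=87 total_sales=1557.39
```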
5) Create a queryable HAWQ external table over pxf_hdfs_simple.txt using the HdfsTextSimple profile:
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=E',');
6) Query the table:
gpadmin=# SELECT * FROM pxf_hdfs_textsimple;
The query should return the four rows loaded above.