Installing the PXF Plugin on HAWQ and Accessing HDFS File Data

1. Overview

HAWQ is hosted on GitHub at https://github.com/apache/hawq

Before installing the PXF plugin, check the versions of the underlying components it expects; these are listed in the pxf/gradle.properties file under the hawq source directory.
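For example, a quick way to see the pinned component versions (the exact property names may differ between HAWQ releases):

cd hawq
grep -iE 'hadoop|hive|hbase|tomcat' pxf/gradle.properties   # list the component versions the PXF build expects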

In my case, Hadoop and HAWQ were already installed before I built PXF, and PXF turned out to require a lower-version HDFS, so I later had to point the configuration at a separate lower-version installation (mainly its jar paths).

The cluster runs Hadoop 2.9.0, HAWQ 2.4, and HBase 1.4.3; for the reason above, the PXF configuration below points at a separate hadoop-2.7.1 installation.

2. Download the source

git clone https://github.com/apache/hawq.git

3. Build PXF

cd /hawq/pxf  # enter the PXF source directory
make          # build

If the build stops with an error, deleting the comment text on the line the error message points at is enough to get past it.
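The pxf Makefile drives a Gradle build under the hood, so a JDK (and network access for the Gradle wrapper on first run) is required; worth checking before running make (the JDK path below is illustrative, not prescriptive):

java -version                                  # a JDK must be installed and on PATH
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk   # illustrative path; point at your actual JDK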

4. Install PXF

mkdir -p /opt/pxf         # create the PXF install directory
export PXF_HOME=/opt/pxf  # set the environment variable
make install              # install PXF into /opt/pxf
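A quick sanity check that the install landed where expected (directory layout assumed from the install target):

ls /opt/pxf          # expect bin, conf, lib, ...
ls /opt/pxf/lib      # the pxf-*.jar files referenced later in pxf-private.classpath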

5. Edit the configuration files

1) pxf-env.sh

export LD_LIBRARY_PATH=/usr/local/hadoop-2.7.1/lib/native:${LD_LIBRARY_PATH}  # Hadoop's lib/native directory
export PXF_LOGDIR=/opt/pxf/logs            # PXF log directory
export PXF_RUNDIR=/opt/pxf                 # PXF install directory
export PXF_USER=${PXF_USER:-pxf}           # user PXF runs as
export PXF_PORT=${PXF_PORT:-51200}         # PXF port
export PXF_JVM_OPTS="-Xmx512M -Xss256K"    # JVM options
export HADOOP_DISTRO=CUSTOM
export HADOOP_ROOT=/usr/local/hadoop-2.7.1 # path to the Hadoop version PXF should use
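Before moving on, it is worth confirming that the directories referenced above actually exist and that the PXF user can write the log directory; a minimal check (the pxf user here is whatever PXF_USER resolves to):

ls /usr/local/hadoop-2.7.1/lib/native   # should list libhadoop.so and friends
mkdir -p /opt/pxf/logs                  # ensure PXF_LOGDIR exists
chown pxf:pxf /opt/pxf/logs             # make it writable by PXF_USER (pxf here)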

2) pxf-log4j.properties

log4j.appender.ROLLINGFILE.File=/opt/pxf/logs/pxf-service.log    # log file location
Use an absolute path here rather than an environment variable.

3) pxf-private.classpath

# PXF Configuration
/opt/pxf/conf

# PXF Libraries
/opt/pxf/lib/pxf-hbase.jar
/opt/pxf/lib/pxf-hdfs.jar
/opt/pxf/lib/pxf-hive.jar
/opt/pxf/lib/pxf-json.jar
/opt/pxf/lib/pxf-jdbc.jar
/opt/pxf/lib/pxf-ignite.jar

# Hadoop/Hive/HBase configurations
/usr/local/hadoop-2.7.1/etc/hadoop
#/usr/local/hadoop/hive/conf
/usr/local/hbase/conf
/usr/local/hadoop-2.7.1/share/hadoop/common/hadoop-common-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/hadoop-auth-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/asm-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/avro-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-cli-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-codec-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-collections-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-configuration-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-io-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-lang-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-logging-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/commons-compress-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/guava-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/htrace-core*.jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jetty-*.jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jersey-core-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/jersey-server-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/log4j-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/protobuf-java-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/slf4j-api-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/snappy-java-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-common-*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/common/lib/gson-*[0-9].jar

# Pick Jackson 1.9 jars from hdfs dir for HDP tar and from mapreduce1 for CDH tar
/usr/local/hadoop-2.7.1/share/hadoop/[hdfs|mapreduce1]/lib/jackson-core-asl-1.9*[0-9].jar
/usr/local/hadoop-2.7.1/share/hadoop/[hdfs|mapreduce1]/lib/jackson-mapper-asl-1.9*[0-9].jar

# Hive Libraries

# HBase Libraries
/usr/local/hbase/lib/hbase-client*.jar
/usr/local/hbase/lib/hbase-common*.jar
/usr/local/hbase/lib/hbase-protocol*.jar
/usr/local/hbase/lib/htrace-core*.jar
/usr/local/hbase/lib/netty*.jar
/usr/local/hbase/lib/zookeeper*.jar
/usr/local/hbase/lib/metrics-core*.jar

Note: use the absolute paths of your Hadoop/HBase installations. Since Hive is not installed here, all Hive-related entries are commented out; leaving them in would later produce missing-jar or bad-path errors.
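A quick way to catch bad entries before initialization is to expand every non-comment line of the file and flag any pattern that matches no files (a sketch; adjust the file path if you installed elsewhere):

while read -r line; do
  case "$line" in ''|'#'*) continue ;; esac        # skip blank lines and comments
  ls $line >/dev/null 2>&1 || echo "no match: $line"
done < /opt/pxf/conf/pxf-private.classpath

The [hdfs|mapreduce1] bracket patterns on the Jackson lines are resolved by PXF itself rather than by shell globbing, so those two lines may be reported here and need checking by hand.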

4) Bring the paths in the following configuration files in line with the ones set above:

pxf-privatebigtop.classpath
pxf-privatehdp.classpath
pxf-privatephd.classpath
pxf-public.classpath
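These files ship with distribution-specific default prefixes, so first grep to see what they reference, then substitute your own paths (the sed placeholder below is illustrative, not the actual stock value):

cd /opt/pxf/conf
grep -n '^/' pxf-private*.classpath pxf-public.classpath   # list the path entries in each file
sed -i 's|<stock-prefix>|/usr/local/hadoop-2.7.1|g' pxf-public.classpath   # replace <stock-prefix> with what grep showed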

6. Initialize PXF

cd /opt/pxf/bin   # enter the PXF install directory
pxf init          # initialize

7. Start PXF

pxf start
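If startup worked, the PXF REST service should answer on the port set in pxf-env.sh; a quick check against the default 51200 (endpoint name as given in the HAWQ PXF docs):

curl http://localhost:51200/pxf/ProtocolVersion   # expect a protocol version string back
tail -n 20 /opt/pxf/logs/pxf-service.log          # if curl fails, check the log configured earlier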

8. Access HDFS file data

1) Create the HDFS directory:
hdfs dfs -mkdir -p /data/pxf_examples
2) Create a text data file, pxf_hdfs_simple.txt:
echo 'Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67' > /tmp/pxf_hdfs_simple.txt
3) Copy the data file into HDFS:
hdfs dfs -put /tmp/pxf_hdfs_simple.txt /data/pxf_examples/
4) Display the file as stored in HDFS:
hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt
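The output should be the four rows written above:

Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67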
5) Use the HdfsTextSimple profile to create a queryable HAWQ external table over pxf_hdfs_simple.txt:
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
          FORMAT 'TEXT' (delimiter=E',');
6) Query it:
gpadmin=# SELECT * FROM pxf_hdfs_textsimple;
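The query should return the four rows from the file (psql formatting approximate):

  location  | month | num_orders | total_sales
------------+-------+------------+-------------
 Prague     | Jan   |        101 |     4875.33
 Rome       | Mar   |         87 |     1557.39
 Bangalore  | May   |        317 |     8936.99
 Beijing    | Jul   |        411 |    11600.67
(4 rows)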

Reposted from blog.csdn.net/xuexi_39/article/details/83932170