hbase 二级索引方案

目录

hbase+es

OBServer

Endpoint

Coprocessor加载

   静态加载

   动态加载

phoenix+hbase构建二级索引方案

phoenix使用

Global Indexing的二级索引

Local Indexing的二级索引


本文介绍基于hbase+es 和 phoenix两种方案。 

hbase+es

hbase+es索引方案即可根据协处理器在数据写入hbase前或者写入hbase后进行同步到es.可以选择所有数据进行同步,空间换时间。在数据量大的情况下也可通过具体逻辑获取rowkey和需要查询的字段保存到es.通过es查找要查询的字段,锁定rowkey在去hbase种获取。

Hbase Coprocessor分为2种。 OBServer 和 Endpoint. 

OBServer

Observer类似于Mysql的触发器。当发生某些事件的时候这类协处理器会被server调用。本质就是一些散布在Hbase Server端代码种的hook钩子,在固定的事件时被调用。

比如: 1. put操作之前有钩子函数prePut, put操作之后有钩子函数postPut函数

         2.get操作之前有钩子函数preGet,get操作之后有钩子函数postGet函数等等

 Hbase为我们提供了4种Observer接口。

  1. RegionObserver   主要监听region相关的事件,比如get,put,scan,delete以及flush等。
  2. RegionServerObserver  主要监听RegionServer相关事件,比如RegionServer启动,关闭,或者执行Region合并等事件。
  3. WALObserver  主要监听WAL相关的事件,比如WAL写入,滚动等。
  4. MasterObserver  主要监听Master相关事件,比如建表,删表,以及修改表结构等。

Endpoint

Endpoint类似于mysql的存储过程,客户端可以在调用这些Endpoint处理器执行一段Server端代码,也就是可以将计算逻辑下推到RegionServer执行。

Coprocessor加载

   静态加载

   在hbase-site.xml添加对应的协处理器类,并将jar添加到hbse lib目录下。重启集群。

例:

<property>
	<name>hbase.coprocessor.region.classes</name>
	<value>org.myname.hbase.coprocessor.AggregateImplementation</value>
</property>

   动态加载

两种动态加载的方式。

  1. hbase shell.
  •  disable 'mytable'
  • alter 'mytable', METHOD => 'table_att', 'Coprocessor'=>'hdfs://namenode:por/
    user/hadoop-user/coprocessor.jar| org.myname.hbase.Coprocessor.RegionObserverExample|1073741823|
    arg1=1,arg2=2'
  • enable 'mytable'

卸载协处理器的方式: 主需要把第二步替换成: alter 'mytable',METHOD=>'table_att_unset',NAME=>'coprocessor$1'

     2.通过api编码

  

phoenix+hbase构建二级索引方案

全局索引。适用于==读多写少==的业务场景。

本地索引。适用于==写操作频繁==以及空间受限制的场景。

针对本地索引和全局索引。又分为不可变索引(适用于数据==只增加不更新并且按照时间先后顺序存储 )

                                                       可变索引(适用于数据有==增删改==的场景。默认可变 )。

phoenix使用

操作phoenix 批量脚本执行

(批量任务脚本执行)


1. 编写创表脚本 //注意表名不加双引号,会全部大写
vim user_phoenix.sql
create table if not exists user_phoenix (
state varchar(10) NOT NULL,  
city varchar(20) NOT NULL, 
population BIGINT  CONSTRAINT my_pk PRIMARY KEY (state, city));

2.//准备数据
vim user_phoenix.csv 
NY,New York,8143197
CA,Los Angeles,3844829
IL,Chicago,2842518
TX,Houston,2016582
PA,Philadelphia,1463281
AZ,Phoenix,1461575
TX,San Antonio,1256509
CA,San Diego,1255540
TX,Dallas,1213825
CA,San Jose,912332


3.编写查看脚本
vim user_phoenix_query.sql

select state as "userState",count(city) as "City Count",sum(population) as "Population Sum" FROM user_phoenix GROUP BY state; 

4.脚本批量执行。使用psql.py在phoenix安装目录bin/
psql.py /home/hadoop/user_phoenix.sql /home/hadoop/user_phoenix.csv  /home/hadoop/user_phoenix_query.sql


 

结果:

操作phoenix第二种方式: shell

1.进入hbase  hbase shell

2.hbase添加数据
create 'employee','company','family'
put 'employee','row1','company:name','ted'
put 'employee','row1','company:position','worker'
put 'employee','row1','family:tel','13600912345'
put 'employee','row1','family:age','18'
put 'employee','row2','company:name','michael'
put 'employee','row2','company:position','manager'
put 'employee','row2','family:tel','1894225698'
put 'employee','row2','family:age','20'

3.建立hhbase到phoenix的映射表
/usr/hdp/current/phoenix-client/bin/sqlline.py

CREATE TABLE IF NOT EXISTS "employee" (
"no" VARCHAR(10) NOT NULL PRIMARY KEY,
"company"."name" VARCHAR(30),
"company"."position" VARCHAR(20), 
"family"."tel" VARCHAR(20), 
"family"."age" VARCHAR(20)) column_encoded_bytes=0;

结果:

操作phoenix第三种方式:API

public class PhoenixSearch {
    private Connection connection;
    private Statement statement;
    private ResultSet rs;
    
    @BeforeTest
    public void init() throws SQLException {
        //定义phoenix的连接url地址
        String url="jdbc:phoenix:node01:2181";
        connection = DriverManager.getConnection(url);
        //构建Statement对象
        statement = connection.createStatement();
    }
    @Test
    public void queryTable() throws SQLException {
        //定义查询的sql语句,注意大小写
        String sql="select * from US_POPULATION";
        //执行sql语句
        try {
            rs=statement.executeQuery(sql);
            while(rs.next()){
                System.out.println("state:"+rs.getString("state"));
                System.out.println("city:"+rs.getString("city"));
                System.out.println("population:"+rs.getInt("population"));
                System.out.println("-------------------------");
            }
        } catch (SQLException e) {
            e.printStackTrace();
        }finally {
            if(connection!=null){
                connection.close();
            }
        }
    }
}

二级索引使用:

1.bin/sqlline.py

2.
create  table user (
"session_id" varchar(100) not null primary key, 
"f"."cookie_id" varchar(100), 
"f"."visit_time" varchar(100), 
"f"."user_id" varchar(100), 
"f"."age" varchar(100), 
"f"."sex" varchar(100), 
"f"."visit_url" varchar(100), 
"f"."visit_os" varchar(100), 
"f"."browser_name" varchar(100),
"f"."visit_ip" varchar(100), 
"f"."province" varchar(100),
"f"."city" varchar(100),
"f"."page_id" varchar(100), 
"f"."goods_id" varchar(100),
"f"."shop_id" varchar(100)) column_encoded_bytes=0;


3.准备数据
vim  user50w.csv
d0a13bf9-debe-414e-a7ec-1f4dc6ffec81,d0a13bf9-debe-414e,2016-12-10 07:35:02,e8d3e1e9-7-0,75,male,a.jsp,Unix,Chrome,61.235.105.44,海南省,三亚,page_65,goods_522,shop_319
fe46f7bf-68e0-42c3-8b4b-5181ae992e6b,fe46f7bf-68e0-42c3,2016-12-10 09:46:36,82844fff-0-1,35,male,i.jsp,Unix,Safari,222.67.119.139,云南省,昆明,page_79,goods_961,shop_224
af64005e-d8ef-4523-952b-a797ccc87eae,af64005e-d8ef-4523,2016-12-10 08:59:52,cc13d758-5-2,47,female,f.jsp,window,Chrome,182.90.120.126,福建省,厦门,page_94,goods_355,shop_844
79ccd15c-5eec-41e9-948f-a8c90e2ab561,79ccd15c-5eec-41e9,2016-12-10 09:17:47,8e174007-f-3,16,female,f.jsp,window,Safari,61.235.158.199,河北省,保定,page_77,goods_661,shop_983
acf9cf99-c02b-4e58-a543-592d5ddd15da,acf9cf99-c02b-4e58,2016-12-10 08:35:24,d0205151-b-4,76,male,f.jsp,window,Opera,182.87.170.107,江西省,南昌,page_58,goods_917,shop_321
76507d6a-7e8c-429c-9c44-51235fa50e42,76507d6a-7e8c-429c,2016-12-10 08:33:07,7e3e9aa5-8-5,26,male,f.jsp,Unix,IE,171.13.22.69,四川省,成都,page_73,goods_831,shop_652
31b4e67d-2603-46a2-9152-ef08d51e5b44,31b4e67d-2603-46a2,2016-12-10 07:36:16,7a21a9dd-1-6,17,male,b.jsp,Linux,Firefox,171.8.205.253,湖北省,黄冈,page_79,goods_624,shop_73
ee79a7b8-0440-46d3-a5a6-e7cf973a59bc,ee79a7b8-0440-46d3,2016-12-10 08:07:06,eca940cb-3-7,61,male,a.jsp,Unix,Firefox,182.89.187.67,海南省,三亚,page_77,goods_213,shop_500
4ee59f5e-7a27-40cf-acdc-9bf7252fdaae,4ee59f5e-7a27-40cf,2016-12-10 08:17:39,092a7454-9-8,66,male,i.jsp,Linux,Firefox,123.234.240.235,江西省,南昌,page_75,goods_479,shop_952
6c63e315-ac55-4968-b803-4620654a4cf2,6c63e315-ac55-4968,2016-12-10 08:25:23,72ec3e29-a-9,23,male,g.jsp,window,Firefox,123.235.233.65,河北省,保定,page_21,goods_870,shop_846
241ced55-fe21-46e5-8171-1dc8903e645a,241ced55-fe21-46e5,2016-12-10 07:35:36,9abb589e-0-10,52,male,d.jsp,window,360,222.71.166.211,山西省,太原,page_66,goods_565,shop_381
74afd1ac-3bb4-4001-a6a6-5a87b6d034cd,74afd1ac-3bb4-4001,2016-12-10 09:06:50,ecab0619-1-11,26,female,g.jsp,Unix,360,36.58.133.237,河北省,石家庄,page_92,goods_720,shop_102
6cd8b9a5-ec5d-4ed3-9fd2-a5b6b2a6c0f8,6cd8b9a5-ec5d-4ed3,2016-12-10 09:40:39,430f6b45-8-12,9,female,b.jsp,window,Firefox,61.235.63.41,江西省,南昌,page_0,goods_302,shop_964
37836cbb-5edc-42b1-b567-96fd373c49a1,37836cbb-5edc-42b1,2016-12-10 08:25:39,fc177b4f-6-13,65,male,a.jsp,Unix,IE,182.83.197.94,上海,上海,page_35,goods_671,shop_589
0eaf16e3-455b-4a6f-aae8-8338c560233a,0eaf16e3-455b-4a6f,2016-12-10 07:59:58,f9c54b17-a-14,66,female,f.jsp,Unix,Chrome,182.91.164.116,河南省,郑州,page_88,goods_670,shop_521
a4452fbf-df1c-4a0f-81f5-6ea2f29ce37c,a4452fbf-df1c-4a0f,2016-12-10 09:43:39,ffc3f499-7-15,14,male,c.jsp,window,Opera,171.14.65.223,海南省,三亚,page_84,goods_506,shop_539
542f9fd1-d43a-417f-bd35-067fc809e97e,542f9fd1-d43a-417f,2016-12-10 07:41:36,6f12025e-f-16,78,female,g.jsp,window,Safari,121.76.65.104,河北省,石家庄,page_32,goods_233,shop_115
9a728dae-4246-4b85-8163-4340312f8513,9a728dae-4246-4b85,2016-12-10 07:55:07,82fc4ce6-f-17,50,male,i.jsp,window,360,106.85.80.251,河北省,保定,page_30,goods_558,shop_514
4191896a-5d49-4639-b686-6ea5bd4adc21,4191896a-5d49-4639,2016-12-10 09:54:04,b13ff532-3-18,10,female,f.jsp,window,Firefox,139.214.246.126,湖北省,武汉,page_27,goods_790,shop_952
e9cf805e-9dd0-4f6f-a557-2d4dd77787d9,e9cf805e-9dd0-4f6f,2016-12-10 07:25:28,fa96f2e8-3-19,12,female,i.jsp,window,Chrome,139.197.132.59,福建省,泉州,page_56,goods_859,shop_829
e640a4b8-e3e8-4243-8288-10174a27ebcd,e640a4b8-e3e8-4243,2016-12-10 07:11:22,b0602438-6-20,4,female,c.jsp,Linux,Opera,182.81.203.77,湖北省,武汉,page_60,goods_572,shop_599
98908358-6786-4bc1-aec0-95df25de8625,98908358-6786-4bc1,2016-12-10 08:05:06,90aea264-9-21,71,male,g.jsp,Unix,Chrome,210.46.30.153,海南省,三亚,page_83,goods_684,shop_164
1b59b298-ffcd-48b0-be87-cfd3616a9eb6,1b59b298-ffcd-48b0,2016-12-10 09:52:49,3cca179f-3-22,16,female,c.jsp,window,Opera,121.76.38.151,海南省,三亚,page_61,goods_935,shop_892
7ca8f087-e1a8-40a2-af98-2b6acde257cd,7ca8f087-e1a8-40a2,2016-12-10 08:08:20,ddb9a1e1-f-23,48,male,h.jsp,Unix,Firefox,61.233.133.84,湖北省,黄冈,page_3,goods_800,shop_499
522ee5e4-df0c-4f44-9386-e636f71be21a,522ee5e4-df0c-4f44,2016-12-10 07:13:10,e608db19-8-24,77,female,c.jsp,Unix,Chrome,123.232.74.186,云南省,昆明,page_47,goods_633,shop_193
9162d621-bafd-4320-a563-d9414735feb7,9162d621-bafd-4320,2016-12-10 10:15:37,0e6dc16d-9-25,8,male,f.jsp,window,Chrome,182.85.84.124,湖北省,武汉,page_75,goods_780,shop_261
3fb5df6e-ffab-4d65-a630-96aa411aaea5,3fb5df6e-ffab-4d65,2016-12-10 07:16:20,badf55e0-8-26,40,female,b.jsp,Linux,360,171.10.101.56,黑龙江,哈尔滨,page_80,goods_393,shop_807
8920282e-d47b-4898-a7d8-50003111b790,8920282e-d47b-4898,2016-12-10 10:32:48,ed726c62-7-27,61,male,h.jsp,Unix,Opera,139.205.253.141,湖南省,长沙,page_22,goods_956,shop_757
605dce05-de7c-40ca-b580-71eb5d53fb88,605dce05-de7c-40ca,2016-12-10 08:53:47,ab2ea182-2-28,28,female,d.jsp,Linux,Opera,139.205.238.221,四川省,成都,page_62,goods_377,shop_9


4./usr/hdp/current/phoenix-client/bin/psql.py -t USER  /home/hadoop/phoenixData/user50w.csv

现在表中已经有数据了

Global Indexing的二级索引

建立索引的时候还要多写一份include字段,一个索引表对应一个独立的Hbase表,读的时候直接从索引表定位并读出信息。所以这种表的应用场景定位是写的慢,读得快

添加索引
create index USER_COOKIE_ID_INDEX on USER ("f"."cookie_id"); 

注意:不是索引字段,是不会使用到索引表去查询的。

如果有字段不是索引字段是不会走索引查询,此时可以使用强制索引表。

创建covered index 

 create index on USER_COOKIE_ID_INDEX(user_id) include(sex);

Local Indexing的二级索引

 针对具体字段添加索引表。 这种索引设计方式只写索引数据,省了不少空间的占用,根据索引信息拿到原rowkey后再根据rowkey到原数据里面获取其它信息。所以这种表的应用场景定位是写的快,读得慢

创建local Indexing 索引:
create local index USER_USER_ID_INDEX on USER ("f"."user_id");

注意:本地索引查询字段如果不再索引表中,也可以走索引表查询。

删除某个表的某张索引:
drop index USER_COOKIE_ID_INDEX on user;
索引重建
alter index USER_COOKIE_ID_INDEX on user rebuild;

对phoenix映射的hbase表创建索引,直接使用hbase API写Hbase表,索引不会同步
只有通过phoenix语法写入才能使用二级索引同步机制。

猜你喜欢

转载自blog.csdn.net/weixin_40954192/article/details/107062808