Hive内置函数和自定义函数UDF

版权声明:本文为博主原创文章,未经博主允许不得转载。 https://blog.csdn.net/weixin_43215250/article/details/84837350
  1. Hive内置函数

    Hive为我们提供了很多内置函数

    查看方式:

    1. 官方地址
    2. 启动hive后通过命令查看函数
      SHOW FUNCTIONS;
      DESCRIBE FUNCTION <function_name>;
      DESCRIBE FUNCTION EXTENDED <function_name>;
      
      hive> show functions;
      ...
      Time taken: 1.933 seconds, Fetched: 220 row(s)
      hive> DESCRIBE FUNCTION case;
      OK
      CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f
      Time taken: 0.009 seconds, Fetched: 1 row(s)
      hive>  DESCRIBE FUNCTION EXTENDED case;
      OK
      CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END - When a = b, returns c; when a = d, return e; else return f
      Example:
       SELECT
       CASE deptno
         WHEN 1 THEN Engineering
         WHEN 2 THEN Finance
         ELSE admin
       END,
       CASE zone
         WHEN 7 THEN Americas
         ELSE Asia-Pac
       END
       FROM emp_details
      Time taken: 0.005 seconds, Fetched: 13 row(s)
      
  2. 自定义函数

    临时函数

    1. 自定义UDF需要继承org.apache.hadoop.hive.ql.UDF

    2. 需要实现evaluate函数,evaluate函数支持重载。(注意:在一个库中创建的UDF函数,不能在另一个库中使用 )

      package com.cnyimi.udf;
      import org.apache.hadoop.hive.ql.exec.Description;
      import org.apache.hadoop.hive.ql.exec.UDF;
      
      @Description(name = "myadd",
      		value = "myadd(int a , int b) ==> return a + b ",
      		extended = "Example:\n"
      				+ " myadd(1,1) ==> 2 \n"
      				+ " myadd(1,2,3) ==> 6;")
      public class AddUDF extends UDF {
      
      	public int evaluate(int a ,int b) {
      		return a + b ;
      	}
      
      	public int evaluate(int a ,int b , int c) {
      		return a + b + c;
      	}
      }
      
    3. 将程序打成jar包放到目标机器上去

      cmd>cd {classes所在目录}
      cmd>jar cvf HiveDemo.jar -C x/x/x/x/classes/ .
      

      或 使用maven插件打包

    4. 进入hive客户端,添加jar包

      hive> add jar /root/test/HiveDemo.jar;(使用hdfs)
      hive> add jar hdfs://nameservice1/user/hive/udf/HiveDemo.jar;
      
    5. 创建临时函数

      hive> CREATE TEMPORARY FUNCTION add_example AS 'com.cnyimi.udf.AddUDF';
      
    6. 查询HQL语句

      SELECT add_example(8, 9) FROM scores;
      SELECT add_example(scores.math, scores.art) FROM scores;
      SELECT add_example(6, 7, 8, 6.8) FROM scores;
      
    7. 销毁临时函数

      hive> DROP TEMPORARY FUNCTION add_example;
      

    注意UDF只能实现一进一出的操作,如果需要实现多进一出,则需要实现UDAF
    一进多出,则需要实现UDTF

    永久函数

    1. 自定义UDF需要继承org.apache.hadoop.hive.ql.UDF。(注意该类的包名必须是org.apache.hadoop.hive.ql.udf)
    2. 需要实现evaluate函数,evaluate函数支持重载。
    3. 修改FunctionRegistry这个类,注册定义的udf
    4. 把udf函数编译成class放到hive-exec-0.12.0-cdh5.0.0.jar中org\apache\hadoop\hive\ql\udf 路径下面
    5. 将新的FunctionRegistry编译后的class文件替换hive-exec-0.12.0-cdh5.0.0.jar中的org.apache.hadoop.hive.ql.exec

    1. 将jar包上传到hdfs上
      [root@node00 ~]# hdfs dfs -put /root/test/substring_index_UDF.jar /user/hive/udf
      [root@node00 ~]#  hdfs dfs -ls /user/hive/udf
      Found 1 items
      -rw-r--r--   3 yarn hive      24859 2018-12-21 20:48 /user/hive/udf/substring_index_UDF.jar
      
    2. 创建永久函数
      hive> CREATE FUNCTION substring_index AS 'com.hnb.data.hive.SubStringIndexUDF' using jar 'hdfs://nameservice1/user/hive/udf/substring_index_UDF.jar';
      

    注意:执行这条语句创建永久函数,show functiuons 会加上默认的数据库名在函数名前。(default.say_hello1)

    1. 在MySQL中查询创建的自定义函数
      mysql> use hive;
      Reading table information for completion of table and column names
      You can turn off this feature to get a quicker startup with -A
      
      Database changed
      mysql> select * from funcs;
      ERROR 1146 (42S02): Table 'hive.funcs' doesn't exist
      mysql> SELECT * FROM FUNCS;
      +---------+-------------------------------------+-------------+-------+-----------------+-----------+------------+------------+
      | FUNC_ID | CLASS_NAME                          | CREATE_TIME | DB_ID | FUNC_NAME       | FUNC_TYPE | OWNER_NAME | OWNER_TYPE |
      +---------+-------------------------------------+-------------+-------+-----------------+-----------+------------+------------+
      |       1 | com.hnb.data.hive.SubStringIndexUDF |  1545016176 |     1 | substring_index |         1 | NULL       | USER       |
      +---------+-------------------------------------+-------------+-------+-----------------+-----------+------------+------------+
      1 row in set (0.00 sec)
      
      mysql> SELECT * FROM FUNC_RU;
      +---------+---------------+-----------------------------------------------------------+-------------+
      | FUNC_ID | RESOURCE_TYPE | RESOURCE_URI                                              | INTEGER_IDX |
      +---------+---------------+-----------------------------------------------------------+-------------+
      |       1 |             1 | hdfs://nameservice1/user/hive/udf/substring_index_UDF.jar |           0 |
      +---------+---------------+-----------------------------------------------------------+-------------+
      1 row in set (0.00 sec)
      
      mysql> 
      

猜你喜欢

转载自blog.csdn.net/weixin_43215250/article/details/84837350