Hive: Using RegexSerDe to Process Standard-Format Apache Web Logs

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/l1028386804/article/details/88617592

The Hive version used here is 2.3.4.

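SerDe is short for serializer/deserializer. Internally, the Hive engine uses the configured InputFormat to read one record (one line of data) at a time. That record is then passed to the SerDe.deserialize() method for processing.
The example below uses a RegexSerDe to process Apache web logs in the standard format. This RegexSerDe ships with Hive as a standard feature, in the hive-contrib module.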

First, create the table serde_regex in Hive:

CREATE TABLE serde_regex (
  host STRING,
  identity STRING,
  username STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
WITH SERDEPROPERTIES (
"input.regex" = "([\\d|.]+)\\s+([^ ]+)\\s+([^ ]+)\\s+\\[(.+)\\]\\s+\"([^ ]+)\\s(.+)\\s([^ ]+)\"\\s+([^ ]+)\\s+([^ ]+)\\s+\"(.+)\"\\s+\"(.+)\"?",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s %9$s"
)
STORED AS TEXTFILE ;
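
To see what the input.regex above actually captures, here is a minimal sketch that applies the same pattern to one line in the standard Apache log format. It uses Python's re module as a stand-in for the java.util.regex engine that Hive runs on, assuming the two behave the same for this pattern. SAMPLE_LINE and the helper capture_groups are names made up for this illustration.

```python
import re

# The pattern from "input.regex", as it presumably looks once the
# backslash escapes of the quoted HiveQL string are resolved.
INPUT_REGEX = (
    r'([\d|.]+)\s+([^ ]+)\s+([^ ]+)\s+\[(.+)\]\s+'
    r'"([^ ]+)\s(.+)\s([^ ]+)"\s+([^ ]+)\s+([^ ]+)\s+'
    r'"(.+)"\s+"(.+)"?'
)

# One log line in the standard Apache format (illustrative input).
SAMPLE_LINE = (
    '61.160.224.137 - - [11/Jul/2014:15:41:50 +0800] '
    '"GET /uc_server/avatar.php?uid=4579&size=middle HTTP/1.0" 301 426 '
    '"http://www.aboutyun.com/thread-7648-1-1.html" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"'
)


def capture_groups(line):
    """Return the capture groups for a whole-line match, or None if no match."""
    match = re.fullmatch(INPUT_REGEX, line)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    # Print each group with its 1-based number, as used in output.format.string.
    for number, value in enumerate(capture_groups(SAMPLE_LINE), start=1):
        print(number, value)
```

The pattern defines 11 capture groups (the request line is split into method, URL and protocol), while the table declares only 9 columns. Judging from the query output further down, the columns appear to be filled from groups 1 to 9 in order, so the columns named request, status, size, referer and agent end up holding the method, URL, protocol, status code and size. Also, because the closing quote is optional and (.+) is greedy, group 11 keeps the trailing double quote of the user agent.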

Next, load the data, register the hive-contrib JAR (which contains RegexSerDe), and run a query:

hive> load data local inpath '/usr/local/src/apache_log.txt' into table serde_regex;
hive> add jar /usr/local/hive-2.3.4/lib/hive-contrib-2.3.4.jar;
hive> select * from serde_regex order by time limit 10;
Query ID = root_20190317125919_86f6f8d9-ddb5-44e2-95fb-595efc165a68
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1552794249837_0003, Tracking URL = http://liuyazhuang11:8088/proxy/application_1552794249837_0003/
Kill Command = /usr/local/hadoop-2.9.2/bin/hadoop job  -kill job_1552794249837_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-03-17 12:59:28,518 Stage-1 map = 0%,  reduce = 0%
2019-03-17 12:59:36,797 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.71 sec
2019-03-17 12:59:42,971 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.6 sec
MapReduce Total cumulative CPU time: 7 seconds 600 msec
Ended Job = job_1552794249837_0003
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 7.6 sec   HDFS Read: 15991118 HDFS Write: 1281 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 600 msec
OK
61.160.224.138  -       -       11/Jul/2014:01:01:13 +0800      GET     /home.php?mod=space&do=notice&view=system       HTTP/1.0        200     7519
61.160.224.138  -       -       11/Jul/2014:01:01:15 +0800      GET     /home.php?mod=spacecp&ac=follow&op=checkfeed&rand=1405011673    HTTP/1.0        200     815
61.160.224.138  -       -       11/Jul/2014:01:01:15 +0800      GET     /home.php?mod=spacecp&ac=pm&op=checknewpm&rand=1405011673       HTTP/1.0        200     709
113.17.174.44   -       -       11/Jul/2014:01:01:17 +0800      POST    /api/manyou/my.php      HTTP/1.0        200     653
61.160.224.145  -       -       11/Jul/2014:01:01:18 +0800      GET     /static/image/feed/favorite_b.png       HTTP/1.0        200     2581
61.160.224.143  -       -       11/Jul/2014:01:01:18 +0800      GET     /static/image/feed/thread_b.png HTTP/1.0        200     2947
61.160.224.144  -       -       11/Jul/2014:01:01:18 +0800      GET     /static/image/feed/friend_b.png HTTP/1.0        200     2887
61.160.224.138  -       -       11/Jul/2014:01:01:18 +0800      GET     /forum.php?mod=ajax&action=forumjump&jfid=0&inajax=1&ajaxtarget=fjump_menu      HTTP/1.0        200     1670
61.160.224.145  -       -       11/Jul/2014:01:01:21 +0800      GET     /static/image/feed/magic_b.png  HTTP/1.0        200     2909
61.160.224.138  -       -       11/Jul/2014:01:01:22 +0800      GET     /forum.php      HTTP/1.0        200     16169
Time taken: 24.908 seconds, Fetched: 10 row(s)
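
One thing to keep in mind with order by time: the time column is a STRING, so rows are ordered by text comparison rather than chronologically. Within a single day, as in the output above, the two orders coincide, but they can differ across days or months. Here is a minimal Python sketch of the difference. The second timestamp and the helper parse_apache_time are invented for this illustration, and the parsing assumes English month abbreviations.

```python
from datetime import datetime

# Layout of the timestamp inside the brackets, e.g. 11/Jul/2014:01:01:13 +0800.
# %b relies on English month abbreviations (the default C locale).
APACHE_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_apache_time(text):
    """Parse an Apache log timestamp into a timezone-aware datetime."""
    return datetime.strptime(text, APACHE_TIME_FORMAT)


# Hypothetical timestamps: the August one is later in time,
# yet it sorts first when compared as plain text.
stamps = ["11/Jul/2014:01:01:13 +0800", "02/Aug/2014:09:30:00 +0800"]

as_text = sorted(stamps)                         # text order, like ORDER BY on a STRING
as_time = sorted(stamps, key=parse_apache_time)  # chronological order

if __name__ == "__main__":
    print("text order:", as_text)
    print("time order:", as_time)
```

In HiveQL the same idea would be to sort on a parsed value, for example something like unix_timestamp(time, 'dd/MMM/yyyy:HH:mm:ss Z'), rather than on the raw string. This was not tested against the setup above.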

Appendix:

The standard Apache log format looks like this:

61.160.224.145 - - [11/Jul/2014:15:41:50 +0800] "GET /data/attachment/forum/201405/08/204852bmroqgmzigwqkaqm.png HTTP/1.0" 200 25136 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
61.160.224.144 - - [11/Jul/2014:15:41:50 +0800] "GET /data/attachment/forum/201405/08/204710dfaifzt954lk4atu.png HTTP/1.0" 200 26939 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
61.160.224.143 - - [11/Jul/2014:15:41:50 +0800] "GET /data/attachment/forum/201405/08/204901ry3om753yt330sq7.png HTTP/1.0" 200 16326 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
61.160.224.145 - - [11/Jul/2014:15:41:50 +0800] "GET /data/attachment/forum/201405/08/204851xfgog3tbiofypoog.png HTTP/1.0" 200 34306 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
61.160.224.144 - - [11/Jul/2014:15:41:50 +0800] "GET /data/attachment/forum/201405/08/205513bf020h08bo033jb4.png HTTP/1.0" 200 64326 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
61.160.224.145 - - [11/Jul/2014:15:41:50 +0800] "GET /data/attachment/forum/201405/08/205810ma0aa7cfbwlbeb0b.png HTTP/1.0" 200 63042 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
61.160.224.137 - - [11/Jul/2014:15:41:50 +0800] "GET /uc_server/avatar.php?uid=4579&size=middle HTTP/1.0" 301 426 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
61.160.224.143 - - [11/Jul/2014:15:41:50 +0800] "GET /data/attachment/forum/201405/08/204859ubv2zsz25lz0elrf.png HTTP/1.0" 200 42997 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
61.160.224.145 - - [11/Jul/2014:15:41:50 +0800] "GET /data/attachment/forum/201405/08/204110xrvv9s74r9gr4427.png HTTP/1.0" 200 136137 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
61.160.224.137 - - [11/Jul/2014:15:41:50 +0800] "GET /uc_server/avatar.php?uid=5216&size=middle HTTP/1.0" 301 426 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
61.160.224.137 - - [11/Jul/2014:15:41:50 +0800] "GET /uc_server/avatar.php?uid=6165&size=middle HTTP/1.0" 301 426 "http://www.aboutyun.com/thread-7648-1-1.html" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"
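
Each line above has nine fields: host, identity, user, a bracketed timestamp, a quoted request line, status, size, a quoted referer and a quoted user agent. That lines up with the nine columns of serde_regex. As a rough illustration, the Python sketch below uses a pattern with exactly one capture group per field, so the whole request line lands in a single group. NINE_FIELD_REGEX, SAMPLE_LINE and parse_line are assumptions made for this example, not the regex used in the table definition, and Python's re again stands in for Hive's Java regex engine. The pattern also assumes that quoted fields contain no embedded double quotes.

```python
import re

# Column names taken from the serde_regex table definition.
COLUMNS = ["host", "identity", "username", "time", "request",
           "status", "size", "referer", "agent"]

# Hypothetical pattern: one capture group per column.
NINE_FIELD_REGEX = (
    r'([^ ]+) ([^ ]+) ([^ ]+) \[([^\]]+)\] '
    r'"([^"]*)" ([^ ]+) ([^ ]+) "([^"]*)" "([^"]*)"'
)

# First sample line from the listing above.
SAMPLE_LINE = (
    '61.160.224.145 - - [11/Jul/2014:15:41:50 +0800] '
    '"GET /data/attachment/forum/201405/08/204852bmroqgmzigwqkaqm.png HTTP/1.0" '
    '200 25136 "http://www.aboutyun.com/thread-7648-1-1.html" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36 SE 2.X MetaSr 1.0"'
)


def parse_line(line):
    """Map one log line to a column-name -> value dict, or None if no match."""
    match = re.fullmatch(NINE_FIELD_REGEX, line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


if __name__ == "__main__":
    for name, value in parse_line(SAMPLE_LINE).items():
        print(f"{name}: {value}")
```

To try a pattern like this in Hive, the double quotes and backslashes would need to be escaped for the string literal, in the same way as in the input.regex shown earlier.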

Tip:

You can download the Apache log file for testing from https://download.csdn.net/download/l1028386804/11029052
