Presto 一篇不太正经的全面学习笔记

写在前面

本文内容基本来源于官网,然后我结合我自己实际工作中的一些场景 把我认为 重要 且常用 的一些操作给记录下来了。目前presto是拆分为了两个项目 prestodbprestosql,有兴趣的小伙伴可以看一下,这里选用的是prestodb。

如果有任何不对的地方,欢迎小伙伴指正!笔芯~~
咱们开始把 !

#01 概述

不是数据库!()
Presto是一种分布式高效 大量数据的即席查询工具,简单来说就是对于 TB 或 PB 级数据,能够比较快的得到查询结果。

数据可以是传统关系型数据库,也可以是Hive表数据。

也可以在Join多个 不同的库的数据

是Facebook开源和社区共同维护发展的。

目前在公司的场景中,对于一些临时的数据需求,可以很快方便的得到查询结果,
这里是区别于Kylin这个框架,Kylin是提前把所有维度的情况的结果 提前计算好,需要使用的时候,直接查询的是结果 所以会比较快。

#02 核心概念

There are two types of Presto servers: coordinators and workers
在这里插入图片描述

1.Coordinators# :协调器 主节点

Coordinators communicate with workers and clients using a REST API.
用途:
0.接收客户端提交过来的语句
1.parsing statements(解析语句)
2.planning queries (生成查询计划)
3.managing Presto worker nodes(管理从节点)

具体:
1.creates a logical model(query involving a series of stages )
创建一个逻辑模型(涉及一系列的 stages )
2.translated into a series of connected tasks running on a cluster of Presto workers
转化为一系列的 tasks 运行在 worker 节点上
3.fetching results from the workers and returning the final results to the client

2.Worker# :工作节点 从节点

Workers communicate with other workers and Presto coordinators using a REST API.
用途:
executing tasks and processing data.
starts up,启动从节点之后,它会自爆自己使得主节点发现它 ,这样主节点才能管理到该从节点
具体:

  1. fetch data from connectors and exchange intermediate(中间) data with each other

3.Data Sources

整个Presto的数据源模型有以下概念:
connector, catalog, schema, and table.

4.Connector

1.You can think of a connector the same way you think of a driver for a database
(简单可以类比为比如MySQL数据库的驱动程序)
2.Presto contains several built-in connectors
(内置连接器)
3.Many third-party developers have contributed connectors so that Presto can access data in a variety of data sources.
(第三方开发的connectors )

4.Every catalog is associated with a specific connector.

catalog configuration file(contains property connector.name )
你会发现每个catalog的配置文件都会包含 connector.name 这个属性

---------------pro.college_pro.test_table-------------

5.Catalog

catalog contains schemas and references a data source via a connector
包含 schemas 并且通过 连接器反射 得到数据库的数据
Schema/Table

When a statement is executed, Presto creates a query along with a query plan that is then distributed across a series of Presto workers.

#03 Data Types

1.Boolean

values true and false.

2.Integer

TINYINT ==> minimum value of -2^7 and a maximum value of 2^7 - 1
SMALLINT ==> minimum value of -2^15 and a maximum value of 2^15 - 1
INTEGER/INT  ==> minimum value of -2^31 and a maximum value of 2^31 - 1
BIGINT ==> minimum value of -2^63 and a maximum value of 2^63 - 1

3.Floating-Point 浮点型

REAL 不常用
DOUBLE 

4.Fixed-Precision 固定精度

DECIMAL -> A fixed precision decimal number. Precision up to 38 digits is supported but performance is best up to 18 digits.
DECIMAL(10,3), DECIMAL(20) 总位数 小数后面几位

5.String

VARCHAR -> Variable length character data
CHAR 不常用
VARBINARY -> Variable length binary data.

6.JSON

JSON value type, which can be a JSON object, a JSON array, a JSON number, a JSON string, true, false or null.

7.Date and Time

DATE -> DATE '2001-08-22'
TIME -> TIME '01:02:03.456'
TIME WITH TIME ZONE  -> TIME '01:02:03.456 America/Los_Angeles'
TIMESTAMP -> TIMESTAMP '2001-08-22 03:04:05.321'
TIMESTAMP WITH TIME ZONE -> TIMESTAMP '2001-08-22 03:04:05.321 America/Los_Angeles'

INTERVAL '3' MONTH
INTERVAL '2' DAY

8.Structural

ARRAY -> ARRAY[1, 2, 3]
MAP -> MAP(ARRAY['foo', 'bar'], ARRAY[1, 2])
ROW -> A structure made up of named fields. The fields may be of any SQL type, and are accessed with field reference operator .
Example: CAST(ROW(1, 2.0) AS ROW(x BIGINT, y DOUBLE))

可以通过下面的命令查看所有的函数 。不同的presto版本支持的函数也是不同的。
SHOW FUNCTIONS
SHOW FUNCTIONS
SHOW FUNCTIONS

#04 Functions and Operators

1.Logical Operators# 逻辑运算函数

AND  OR   NOT
null and true -> null
null and false -> false
null or true -> true
null or false -> null

2.Comparison Functions and Operators 比较类函数

2.1Comparison Operators#
	< > <= >= <> != 
2.2Range Operator: BETWEEN#
	SELECT 3 BETWEEN 2 AND 6   等价于 SELECT 3 >= 2 AND 3 <= 6
2.3IS NULL and IS NOT NULL#
	SELECT NULL IS NULL; -- true
	SELECT 3.0 IS NULL; -- false
2.4Quantified Comparison Predicates: ALL, ANY and SOME#  【可以关注下】
	SELECT 'hello' = ANY (VALUES 'hello', 'world'); -- true
	SELECT 21 < ALL (VALUES 19, 20, 21); -- false
	SELECT 42 >= SOME (SELECT 41 UNION ALL SELECT 42 UNION ALL SELECT 43); -- true
2.5 LIKE#

3.Conditional Expressions# 条件表达式

3.1CASE
3.2IF
3.3COALESCE
3.4NULLIF
3.5TRY 不常用

4.Lambda Expressions#

x -> x + 1
(x, y) -> x + y
x -> regexp_like(x, 'a+')
x -> x[1] / x[2]
x -> IF(x > 0, x, -x)
x -> COALESCE(x, 0)
x -> CAST(x AS JSON)
x -> x + TRY(1 / 0)

5.Conversion Functions# 类型转换函数

很坑的点:Presto不会自动将bigint值转换为等效的varchar

语法:
cast(value AS type) → type#
try_cast(value AS type) 和cast类似,但如果投射失败,则返回null。

#05 Mathematical Functions and Operators

1.Mathematical Operators#

+ - * / % 

2.Mathematical Functions# 这块的数学函数有很多,可以去官网查阅 ,这里仅展示常用的

abs(x) -> 返回x的绝对值
ceil(x)/ceiling(x) ->  向上取整
mod(n, m) -> n 对 m 取模 
random()/rand() -> 随机  0.0 <= x < 1.0
round(x) ->  x rounded to the nearest integer
round(x, d)  -> Returns x rounded to d decimal places

---------------Bitwise Functions#--------------
这块没看 但是感觉很重要

#06 String Functions and Operators

1.String Functions#

concat(...) /  operator (||) -> 字符连接
length(string)  -> Returns the length of string in characters.
lower(string) → varchar : Converts string to lowercase / upper(string) → varchar#

lpad(string, size, padstring) → varchar / rpad(string, size, padstring) → varchar# 
ltrim(string) → varchar# : 删除左边的空格 / rtrim(string) → varchar# / trim(string) → varchar 
replace(string, search) → varchar# 

split(string, delimiter)# -> array

split_to_map(string, entryDelimiter, keyValueDelimiter) → map<varchar, varchar>#
demo:SELECT(split_to_map(‘a:1;b:2;a:3’, ‘;’, ‘:’, (k, v1, v2) -> v1));{
    
    “a”: “1”, “b”: “2”} SELECT(split_to_map(‘a:1;b:2;a:3’, ‘;’, ‘:’, (k, v1, v2) -> CONCAT(v1, v2)));{
    
    “a”: “13”, “b”: “2”}

position(substring IN string) → bigint# 指定字符在字符中的位置,1开始,如果未找到返回0 

substr(string, start) → varchar#
substr(string, start, length) → varchar#

#07 Regular Expression Functions

正则表达式
所有正则表达式函数都使用Java模式语法,具有一些显着的例外
1.regexp_extract_all(string, pattern)# 返回所有匹配项

SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- [1, 2, 14]
SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); -- ['a', 'b', 'm']

2.regexp_extract(string, pattern) → varchar# 返回第一个匹配值

SELECT regexp_extract('1a 2b 14m', '\d+'); -- 1
SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); -- 'a'

3.regexp_like(string, pattern) → boolean# 区别是返回true or false

SELECT regexp_like('1a 2b 14m', '\d+b'); -- true

4.regexp_replace(string, pattern) → varchar# 匹配上的值直接删除

regexp_replace(string, pattern, replacement) → varchar# 匹配上的值直接替换为后面的值 

SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m'
SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); -- '3ca 3cb 14m'
SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2])); --'New York'

5.regexp_split(string, pattern)# 没看懂

SELECT regexp_split('1a 2b 14m', '\s*[a-z]+\s*'); -- [1, 2, 14, ]

#08 JSON Functions and Operators

基本的数据类型都是可以cast与Json类型互相转换的
JSON Functions#

1.json_array_contains(json, value) → boolean#

SELECT json_array_contains('[1, 2, 3]', 2);

2.json_array_get(json_array, index) → json#

SELECT json_array_get('["a", [3, 9], "c"]', 1); -- JSON '[3,9]'

3.json_array_length(json) → bigint#

SELECT json_array_length('[1, 2, 3]');

4.json_extract(json, json_path) → json#

SELECT json_extract(json, '$.store.book');

5.json_format(json) → varchar#

SELECT json_format(JSON '[1, 2, 3]'); -- '[1,2,3]'

6.json_parse(string) → json#

SELECT json_parse('[1, 2, 3]'); -- JSON '[1,2,3]'

7.json_size(json, json_path) → bigint#

SELECT json_size('{"x": [1, 2, 3]}', '$.x'); -- 3

#09 Date and Time Functions and Operators

重点

Aggregate Functions#
Window Functions#

#10 Array Functions and Operators#

array_distinct(x) → array# 删除重复值
cardinality(x) → bigint#
contains(x, element) → boolean#
element_at(array(E), index)  
array_sum(array(T)) → bigint/double#

filter(array(T), function(T, boolean)) -> array(T)#
SELECT filter(ARRAY [5, -6, NULL, 7], x -> x > 0); -- [5, 7]


transform(array(T), function(T, U)) -> array(U)#
SELECT transform(ARRAY [5, NULL, 6], x -> COALESCE(x, 0) + 1); -- [6, 1, 7]

#11 Map Functions and Operators

cardinality(x) → bigint
element_at(map(K, V), key)
map() → map<unknown, unknown>#


map(array(K), array(V)) -> map(K, V)#
SELECT map(ARRAY[1,3], ARRAY[2,4]); -- {
    
    1 -> 2, 3 -> 4}


map_entries(map(K, V)) -> array(row(K, V))#  Returns an array of all entries in the given map.

map_keys(x(K, V)) -> array(K)#

map_values(x(K, V)) -> array(V)#

#12 其他

SHOW TABLES [ FROM schema ]
SHOW STATS FOR table  : 返回表的统计信息  
SHOW SCHEMAS [ FROM catalog ] [ LIKE pattern ]
SHOW FUNCTIONS

to_char(timestamp, format) → varchar#
to_timestamp(string, format) → timestamp#
to_date(string, format)date#

猜你喜欢

转载自blog.csdn.net/liuge36/article/details/121508074