ClickHouse Series 6: Basic ClickHouse Usage

1. Environment Preparation

MySQL side:

mysql> select count(*) from fact_sale;
+-----------+
| count(*)  |
+-----------+
| 767830000 |
+-----------+
1 row in set (3 min 8.65 sec)

mysql> desc fact_sale;
+------------+--------------+------+-----+-------------------+-----------------------------+
| Field      | Type         | Null | Key | Default           | Extra                       |
+------------+--------------+------+-----+-------------------+-----------------------------+
| id         | bigint(8)    | NO   | PRI | NULL              | auto_increment              |
| sale_date  | timestamp    | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| prod_name  | varchar(200) | NO   |     | NULL              |                             |
| sale_nums  | int(11)      | YES  |     | NULL              |                             |
| reserverd1 | varchar(100) | YES  |     | NULL              |                             |
+------------+--------------+------+-----+-------------------+-----------------------------+
5 rows in set (0.01 sec)

ClickHouse side:

CREATE TABLE fact_sale (
  id UInt32,
  sale_date Datetime,
  prod_name String,
  sale_nums UInt32
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(sale_date)
PRIMARY KEY id
ORDER BY id;
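Note that PARTITION BY toYYYYMMDD(sale_date) creates one partition per calendar day, which becomes important later. Once data is loaded, the partitions a MergeTree table actually has can be inspected through the system.parts table (a quick check, not part of the original post):

-- List active data parts per partition for fact_sale
SELECT
    partition,
    count() AS parts,
    sum(rows) AS total_rows
FROM system.parts
WHERE database = 'test' AND table = 'fact_sale' AND active
GROUP BY partition
ORDER BY partition;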

2. Copying the Large MySQL Table into ClickHouse

2.1 First attempt at syncing the data

Code:

-- Log in to ClickHouse
clickhouse-client -m --password abc123 -d test

-- Template for the mysql() table function
mysql('host:port', 'database', 'table', 'user', 'password'[, replace_query, 'on_duplicate_clause']);

-- Run the actual import
INSERT INTO fact_sale SELECT id,sale_date,prod_name,sale_nums FROM mysql('10.31.1.122:3306', 'test', 'fact_sale', 'root', 'abc123');
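Before launching the full-table INSERT, it can be worth sanity-checking that the remote connection works at all (a minimal sketch, reusing the host and credentials above):

-- Quick connectivity check against the remote MySQL table
SELECT * FROM mysql('10.31.1.122:3306', 'test', 'fact_sale', 'root', 'abc123') LIMIT 5;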

Timeout:
Timeout exceeded while receiving data from server. Waited for 300 seconds, timeout is 300 seconds.
Cancelling query.

Adjusted the following parameters, but it still had no effect:

set max_execution_time=3600;
set distributed_ddl_task_timeout = 1800;
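Whether the new values actually took effect in the session can be read back from system.settings (a quick check, not part of the original post):

-- Confirm the adjusted settings
SELECT name, value, changed
FROM system.settings
WHERE name IN ('max_execution_time', 'distributed_ddl_task_timeout');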

Then modified the script, adding socket_timeout to the connection string; still no effect.

set max_execution_time=3600;
set distributed_ddl_task_timeout = 1800;
INSERT INTO fact_sale SELECT id,sale_date,prod_name,sale_nums FROM mysql('10.31.1.122:3306?socket_timeout=3600000', 'test', 'fact_sale', 'root', 'abc123');

Next, searched for settings whose value is 300 and raised the likely culprits:

select * from system.settings where value = '300';
set receive_timeout = 3600;
set send_timeout = 3600;
set external_storage_rw_timeout_sec=3600;
INSERT INTO fact_sale SELECT id,sale_date,prod_name,sale_nums FROM mysql('10.31.1.122:3306?socket_timeout=3600000', 'test', 'fact_sale', 'root', 'abc123');

In the end I killed the process on the MySQL side, and only then discovered the real cause: a single INSERT cannot write to more than 100 partitions:

hp5 :) INSERT INTO fact_sale SELECT id,sale_date,prod_name,sale_nums FROM mysql('10.31.1.122:3306?socket_timeout=3600000', 'test', 'fact_sale', 'root', 'abc123');

INSERT INTO fact_sale SELECT
    id,
    sale_date,
    prod_name,
    sale_nums
FROM mysql('10.31.1.122:3306?socket_timeout=3600000', 'test', 'fact_sale', 'root', 'abc123')

Query id: 56c07ef3-fc5b-4b41-9bcd-b67183516a31

Timeout exceeded while receiving data from server. Waited for 300 seconds, timeout is 300 seconds.
Cancelling query.
Query was cancelled.

0 rows in set. Elapsed: 480.769 sec. 

Received exception from server (version 22.2.2):
Code: 252. DB::Exception: Received from localhost:9000. DB::Exception: Too many partitions for single INSERT block (more than 100). The limit is controlled by 'max_partitions_per_insert_block' setting. Large number of partitions is a common misconception. It will lead to severe negative performance impact, including slow server startup, slow INSERT queries and slow SELECT queries. Recommended total number of partitions for a table is under 1000..10000. Please note, that partitioning is not intended to speed up SELECT queries (ORDER BY key is sufficient to make range queries fast). Partitions are intended for data manipulation (DROP PARTITION, etc).. (TOO_MANY_PARTS)
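The root cause is the daily partitioning: a single INSERT block touches one partition per distinct sale_date day in the source, which here exceeds the default limit of 100. How many daily partitions the data would create can be estimated on the source side (a minimal sketch, run in MySQL):

-- How many daily partitions would this INSERT create?
SELECT COUNT(DISTINCT DATE(sale_date)) FROM fact_sale;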

After raising the partition limit for a single INSERT, the progress display finally looked normal.

set max_partitions_per_insert_block = 3600;


2.2 Running again with all parameters adjusted

Code:

-- Create the target table
CREATE TABLE fact_sale_new (
  id UInt32,
  sale_date Datetime,
  prod_name String,
  sale_nums UInt32
) ENGINE = MergeTree
PARTITION BY toYYYYMMDD(sale_date)
PRIMARY KEY id
ORDER BY id;

-- Log in to ClickHouse
clickhouse-client -m --password abc123 -d test

-- Adjust session settings
set max_execution_time=3600;
set distributed_ddl_task_timeout = 1800;
set receive_timeout = 3600;
set send_timeout = 3600;
set external_storage_rw_timeout_sec=3600;
set max_partitions_per_insert_block = 3600;

-- Import the data
INSERT INTO fact_sale_new SELECT id,sale_date,prod_name,sale_nums FROM mysql('10.31.1.122:3306?socket_timeout=3600000', 'test', 'fact_sale', 'root', 'abc123');

Error:
DB::Exception: Received from localhost:9000. DB::Exception: Memory limit (for query) exceeded: would use 9.32 GiB (attempt to allocate chunk of 4223199 bytes), maximum: 9.31 GiB. (MEMORY_LIMIT_EXCEEDED)


A quick online search (see the reference at the end) suggested raising the memory limit, so I adjusted the following parameter and continued:

SET max_memory_usage = 128000000000;

The same error occurred. My guess is that the machine simply has too little memory (only 8 GB), so there was no headroom left to raise the limit.
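Since the whole remote result set streams through one query whose memory usage is capped, one possible workaround on a small machine (not the route taken below, and the id ranges are purely illustrative) is to split the copy into id ranges; simple comparison predicates in the WHERE clause are pushed down to the MySQL server:

-- Illustrative: copy in chunks so each INSERT stays under the memory limit
-- (max_partitions_per_insert_block still needs to be raised as above)
INSERT INTO fact_sale_new SELECT id,sale_date,prod_name,sale_nums
FROM mysql('10.31.1.122:3306', 'test', 'fact_sale', 'root', 'abc123')
WHERE id >= 1 AND id < 100000001;

INSERT INTO fact_sale_new SELECT id,sale_date,prod_name,sale_nums
FROM mysql('10.31.1.122:3306', 'test', 'fact_sale', 'root', 'abc123')
WHERE id >= 100000001 AND id < 200000001;
-- ...and so on for the remaining ranges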

2.3 Changing the table schema

Code:

-- Recreate the table, partitioned by prod_name instead of by day
-- (the old daily-partitioned table must be dropped first)
DROP TABLE IF EXISTS fact_sale;
CREATE TABLE fact_sale (
  id UInt32,
  sale_date Datetime,
  prod_name String,
  sale_nums UInt32
) ENGINE = MergeTree
PARTITION BY prod_name
PRIMARY KEY prod_name
ORDER BY prod_name;

-- Log in to ClickHouse
clickhouse-client -m --password abc123 -d test

-- Sync the data
INSERT INTO fact_sale SELECT id,sale_date,prod_name,sale_nums FROM mysql('10.31.1.122:3306?socket_timeout=3600000', 'test', 'fact_sale', 'root', 'abc123');
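Partitioning by prod_name works because the number of distinct products is evidently low enough that one INSERT block only touches a handful of partitions. Whether a candidate partition key is low-cardinality enough can be checked on the source first (a minimal sketch, run in MySQL):

-- How many partitions would prod_name produce?
SELECT COUNT(DISTINCT prod_name) FROM fact_sale;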

The progress indicator kept refreshing this time, which was a good sign.

All ~700 million rows were synced in about 20 minutes, which is a decent speed.

A final query confirmed the row count matches the source data.
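The consistency check presumably comes down to comparing row counts on both sides (a minimal sketch):

-- On the ClickHouse side; should match the MySQL count of 767,830,000
SELECT count(*) FROM fact_sale;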

Reference:

  1. https://help.aliyun.com/document_detail/197622.html

Reprinted from blog.csdn.net/u010520724/article/details/123984223