ClickHouse 读"写"分离

写隔离 ,因为ClickHouse 都是本地表写入,所有用户通过system.clusters 来设计不同的写入规则即可

所以我们这里介绍可以匹配的写得查询隔离

Row 1:
──────
cluster:                 test_cluster_two_shards
shard_num:               1
shard_weight:            1
replica_num:             1
host_name:               127.0.0.1
host_address:            127.0.0.1
port:                    9000
is_local:                1
user:                    default
default_database:
errors_count:            0
slowdowns_count:         0
estimated_recovery_time: 0
​
Row 2:
──────
cluster:                 test_cluster_two_shards
shard_num:               2
shard_weight:            1
replica_num:             1
host_name:               127.0.0.2
host_address:            127.0.0.2
port:                    9000
is_local:                0
user:                    default
default_database:
errors_count:            0
slowdowns_count:         0
estimated_recovery_time: 0

一 基础知识

1 load_balancing

Specifies the algorithm of replicas selection that is used for distributed query processing.

ClickHouse supports the following algorithms of choosing replicas:

See also:

Random (by Default)
load_balancing = random

The number of errors is counted for each replica. The query is sent to the replica with the fewest errors, and if there are several of these, to anyone of them. Disadvantages: Server proximity is not accounted for; if the replicas have different data, you will also get different data.

Nearest Hostname
load_balancing = nearest_hostname

The number of errors is counted for each replica. Every 5 minutes, the number of errors is integrally divided by 2. Thus, the number of errors is calculated for a recent time with exponential smoothing. If there is one replica with a minimal number of errors (i.e. errors occurred recently on the other replicas), the query is sent to it. If there are multiple replicas with the same minimal number of errors, the query is sent to the replica with a hostname that is most similar to the server’s hostname in the config file (for the number of different characters in identical positions, up to the minimum length of both hostnames).

For instance, example01-01-1 and example01-01-2 are different in one position, while example01-01-1 and example01-02-2 differ in two places. This method might seem primitive, but it does not require external data about network topology, and it does not compare IP addresses, which would be complicated for our IPv6 addresses.

Thus, if there are equivalent replicas, the closest one by name is preferred. We can also assume that when sending a query to the same server, in the absence of failures, a distributed query will also go to the same servers. So even if different data is placed on the replicas, the query will return mostly the same results.

In Order
load_balancing = in_order

Replicas with the same number of errors are accessed in the same order as they are specified in the configuration. This method is appropriate when you know exactly which replica is preferable.

First or Random
load_balancing = first_or_random

This algorithm chooses the first replica in the set or a random replica if the first is unavailable. It’s effective in cross-replication topology setups, but useless in other configurations.

The first_or_random algorithm solves the problem of the in_order algorithm. With in_order, if one replica goes down, the next one gets a double load while the remaining replicas handle the usual amount of traffic. When using the first_or_random algorithm, the load is evenly distributed among replicas that are still available.

It's possible to explicitly define what the first replica is by using the setting load_balancing_first_offset. This gives more control to rebalance query workloads among replicas.

Round Robin
load_balancing = round_robin

This algorithm uses a round-robin policy across replicas with the same number of errors (only the queries with round_robin policy is accounted).

2 distributed_replica_max_ignored_errors

  • Type: unsigned int
  • Default value: 0

The number of errors that will be ignored while choosing replicas (according to load_balancing algorithm).

See also:

3 priority

        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>false</internal_replication>
            <replica>
                <!-- Optional. Priority of the replica for load balancing (see also load_balancing setting). Default: 1 (less value has more priority). -->
                <priority>1</priority>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>

4 load_balancing in users.xml

<?xml version="1.0"?>
<clickhouse>
    <profiles replace="replace">
        <default>
            <load_balancing>first_or_random</load_balancing>
        </default>
    </profiles>
</clickhouse>

5 load_balancing in metrika.xml

        <shard>
            <weight>1</weight>
            <internal_replication>false</internal_replication>
            <replica_list load_balancing="first_or_random">
            <replica>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>

6 vcluster

<remote_servers>
    <cluster1>
    </cluster1>
    <cluster2>
    </cluster2>
</remote_servers>

7 Replicated + mutli cdw cluster + auxiliary_zookeepers

CREATE TABLE table_name
(
    EventDate DateTime,
    CounterID UInt32,
    UserID UInt32
) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{layer}-{shard}/table_name', '{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID)
<auxiliary_zookeepers>
    <zookeeper2>
        <node>
            <host>example_2_1</host>
            <port>2181</port>
        </node>
        <node>
            <host>example_2_2</host>
            <port>2181</port>
        </node>
        <node>
            <host>example_2_3</host>
            <port>2181</port>
        </node>
    </zookeeper2>
    <zookeeper3>
        <node>
            <host>example_3_1</host>
            <port>2181</port>
        </node>
    </zookeeper3>
</auxiliary_zookeepers>

二 方案

1 priority

查询弱隔离 调整 不同 replica priority 即可

        <shard>
            <!-- Optional. Shard weight when writing data. Default: 1. -->
            <weight>1</weight>
            <!-- Optional. Whether to write data to just one of the replicas. Default: false (write data to all replicas). -->
            <internal_replication>false</internal_replication>
            <replica>
                <!-- Optional. Priority of the replica for load balancing (see also load_balancing setting). Default: 1 (less value has more priority). -->
                <priority>1</priority>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>

2 user.xml load_balancing

查询强隔离 可设置到不同 profile 粒度, 设置 first_or_random

<?xml version="1.0"?>
<clickhouse>
    <profiles replace="replace">
        <default>
            <load_balancing>first_or_random</load_balancing>
        </default>
    </profiles>
</clickhouse>

3 metika.xml load_balancing

    <shard>
        <weight>1</weight>
        <internal_replication>false</internal_replication>
        <replica_list load_balancing="first_or_random">
        <replica>
            <host>example01-01-1</host>
            <port>9000</port>
        </replica>
        <replica>
            <host>example01-01-2</host>
            <port>9000</port>
        </replica>
    </shard>

4 vcluster

读集群

写集群

创建Schema集群

<remote_servers>
    <read_cluster1>
         <shard>
            <replica>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
        </shard>
    </read_cluster1>
    <write_cluster1>
        <shard>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>
    </write_cluster1>
    <schema_cluster1>
         <shard>
            <replica>
                <host>example01-01-1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>example01-01-2</host>
                <port>9000</port>
            </replica>
        </shard>
    </schema_cluster1>
</remote_servers>

5 multi cluster

Cluster 1 + Cluster2 + auxiliary_zookeepers

On shard use on zookeeper

cluster1
<remote_servers>
    <write_cluster>
         <shard>
            <replica>
                <host>cluster1-shard1-replica1</host>
                <port>9000</port>
            </replica>
        </shard>
    </write_cluster>
    <cluster>
         <shard>
            <replica>
                <host>cluster1-shard1-replica1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>cluster2-shard1-replica1</host>
                <port>9000</port>
            </replica>
        </shard>
    </cluster>
</remote_servers>
​
​
cluster2 
<remote_servers>
    <write_cluster>
         <shard>
            <replica>
                <host>cluster2-shard1-replica1</host>
                <port>9000</port>
            </replica>
        </shard>
    </write_cluster>
    <cluster>
         <shard>
            <replica>
                <host>cluster1-shard1-replica1</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>cluster2-shard1-replica1</host>
                <port>9000</port>
            </replica>
        </shard>
    </cluster>
</remote_servers>

6 其他

以上方案组合拳

原文地址:https://cloud.tencent.com/developer/article/1957660

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐


学习编程是顺着互联网的发展潮流,是一件好事。新手如何学习编程?其实不难,不过在学习编程之前你得先了解你的目的是什么?这个很重要,因为目的决定你的发展方向、决定你的发展速度。
IT行业是什么工作做什么?IT行业的工作有:产品策划类、页面设计类、前端与移动、开发与测试、营销推广类、数据运营类、运营维护类、游戏相关类等,根据不同的分类下面有细分了不同的岗位。
女生学Java好就业吗?女生适合学Java编程吗?目前有不少女生学习Java开发,但要结合自身的情况,先了解自己适不适合去学习Java,不要盲目的选择不适合自己的Java培训班进行学习。只要肯下功夫钻研,多看、多想、多练
Can’t connect to local MySQL server through socket \'/var/lib/mysql/mysql.sock问题 1.进入mysql路径
oracle基本命令 一、登录操作 1.管理员登录 # 管理员登录 sqlplus / as sysdba 2.普通用户登录
一、背景 因为项目中需要通北京网络,所以需要连vpn,但是服务器有时候会断掉,所以写个shell脚本每五分钟去判断是否连接,于是就有下面的shell脚本。
BETWEEN 操作符选取介于两个值之间的数据范围内的值。这些值可以是数值、文本或者日期。
假如你已经使用过苹果开发者中心上架app,你肯定知道在苹果开发者中心的web界面,无法直接提交ipa文件,而是需要使用第三方工具,将ipa文件上传到构建版本,开...
下面的 SQL 语句指定了两个别名,一个是 name 列的别名,一个是 country 列的别名。**提示:**如果列名称包含空格,要求使用双引号或方括号:
在使用H5混合开发的app打包后,需要将ipa文件上传到appstore进行发布,就需要去苹果开发者中心进行发布。​
+----+--------------+---------------------------+-------+---------+
数组的声明并不是声明一个个单独的变量,比如 number0、number1、...、number99,而是声明一个数组变量,比如 numbers,然后使用 nu...
第一步:到appuploader官网下载辅助工具和iCloud驱动,使用前面创建的AppID登录。
如需删除表中的列,请使用下面的语法(请注意,某些数据库系统不允许这种在数据库表中删除列的方式):
前不久在制作win11pe,制作了一版,1.26GB,太大了,不满意,想再裁剪下,发现这次dism mount正常,commit或discard巨慢,以前都很快...
赛门铁克各个版本概览:https://knowledge.broadcom.com/external/article?legacyId=tech163829
实测Python 3.6.6用pip 21.3.1,再高就报错了,Python 3.10.7用pip 22.3.1是可以的
Broadcom Corporation (博通公司,股票代号AVGO)是全球领先的有线和无线通信半导体公司。其产品实现向家庭、 办公室和移动环境以及在这些环境...
发现个问题,server2016上安装了c4d这些版本,低版本的正常显示窗格,但红色圈出的高版本c4d打开后不显示窗格,
TAT:https://cloud.tencent.com/document/product/1340