Manage a Databend Meta Service Cluster

Tip

Expected deployment time: 5 minutes ⏱

At any time a databend-meta node can be added or removed without service downtime.

1. Add Node

1.1 Create databend-meta-n.toml for the new node

The new node must have a unique id and unique listening addresses. For example, to add a new node with id 7, the config file would look like:

databend-meta-7.toml
log_dir = "metadata/_logs7"
admin_api_address = "0.0.0.0:28701"
grpc_api_address = "0.0.0.0:28702"

[raft_config]
id = 7
raft_dir = "metadata/datas7"
raft_api_port = 28703
raft_listen_host = "127.0.0.1"
raft_advertise_host = "localhost"
join = ["localhost:28103"]

The join argument specifies a list of raft addresses (<raft_advertise_host>:<raft_api_port>) of nodes in the existing cluster that the new node wants to join.
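When several nodes are added, the per-node config can be generated instead of edited by hand. The following is a minimal Python sketch, not a Databend tool: `raft_address` and `render_meta_config` are hypothetical helpers, and the port scheme (28<id>01 for admin, 28<id>02 for gRPC, 28<id>03 for raft) is only an assumption inferred from the examples on this page, so it is limited to single-digit ids. Paths and hosts are copied from the sample config and should be adjusted for a real deployment.

```python
def raft_address(advertise_host: str, raft_api_port: int) -> str:
    """Format a raft address as <raft_advertise_host>:<raft_api_port>."""
    return f"{advertise_host}:{raft_api_port}"


def render_meta_config(node_id: int, join: list[str]) -> str:
    """Render the text of databend-meta-<id>.toml (hypothetical helper).

    Assumes the 28<id>0x port convention used in this page's examples,
    so node_id must be a single digit.
    """
    if not 0 <= node_id <= 9:
        raise ValueError("this sketch only supports single-digit node ids")
    base = 28000 + node_id * 100
    join_list = ", ".join(f'"{addr}"' for addr in join)
    return "\n".join([
        f'log_dir = "metadata/_logs{node_id}"',
        f'admin_api_address = "0.0.0.0:{base + 1}"',
        f'grpc_api_address = "0.0.0.0:{base + 2}"',
        "",
        "[raft_config]",
        f"id = {node_id}",
        f'raft_dir = "metadata/datas{node_id}"',
        f"raft_api_port = {base + 3}",
        'raft_listen_host = "127.0.0.1"',
        'raft_advertise_host = "localhost"',
        f"join = [{join_list}]",
    ]) + "\n"


if __name__ == "__main__":
    # Join via node 1, whose raft address is localhost:28103 in the example.
    print(render_meta_config(7, [raft_address("localhost", 28103)]))
```

Redirecting the printed text to databend-meta-7.toml reproduces the sample config above.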

Databend-meta skips the join argument if it has already joined a cluster. It checks whether the committed membership contains its id to decide whether to join. The reasoning behind this policy (optional reading):

  • It cannot rely on whether the node has logs. The leader may have set up replication to this new node but not yet added it as a voter. If the presence of logs were taken as proof of membership, such a node would skip joining and never be added to the cluster automatically.

  • It must detect whether a committed membership config includes this node. A node therefore skips the join process only when it has already joined a cluster, that is, when the leader has committed the membership and replicated it to this node.

Why uncommitted membership in raft logs is ignored:

A leader may have replicated a non-committed membership to this node and then crashed. In that case the next leader does not know about this new node.

Only when the membership is committed can this node be sure it is in a cluster.
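The decision rule can be condensed into a few lines. This is an illustrative Python sketch and not Databend's actual implementation; `NodeState`, its fields, and `should_join` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical view of what a starting node finds in its local store."""
    node_id: int
    has_logs: bool = False
    # Node ids in the latest membership seen in logs, committed or not.
    membership_in_logs: set = field(default_factory=set)
    # Node ids in the latest committed membership.
    committed_membership: set = field(default_factory=set)


def should_join(state: NodeState) -> bool:
    """Return True if the node should act on its `join` setting.

    Only the committed membership is consulted; has_logs and
    membership_in_logs are deliberately ignored.
    """
    return state.node_id not in state.committed_membership


# Leader started replicating to node 7 but never added it as a voter.
replicated_only = NodeState(node_id=7, has_logs=True)
# Leader replicated an uncommitted membership, then crashed.
uncommitted = NodeState(node_id=7, has_logs=True, membership_in_logs={1, 2, 7})
# A membership containing node 7 was committed and replicated to it.
joined = NodeState(node_id=7, has_logs=True,
                   membership_in_logs={1, 2, 7}, committed_membership={1, 2, 7})

print(should_join(replicated_only), should_join(uncommitted), should_join(joined))
```

Only the third state skips the join; the first two still send a join request on startup, which is what keeps a half-added node from being stranded.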

1.2 Start the new node

./databend-meta -c ./databend-meta-7.toml > meta7.log 2>&1 &

2. Remove Node

Remove a node with: databend-meta --leave-id <node_id_to_remove> --leave-via <node_addr_1> <node_addr_2>...

This command can be run from any machine where databend-meta is installed. It sends a leave request to the first <node_addr_i> it can connect to. While the command runs, the node is blocked from interacting with the cluster until the leave request has completed or an error has occurred.

databend-meta --leave-via exits as soon as the leave RPC is done.

  • --leave-via specifies a list of node advertise addresses to send the leave request to. See: --raft-advertise-host

  • --leave-id specifies the id of the node to remove. It can be the id of any node in the cluster.
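For scripted removals, the command line can be assembled programmatically. The sketch below is in Python and uses only the two flags described above; `build_leave_command` and `remove_node` are hypothetical helpers, and the node id and addresses in the usage are example values based on this page's sample cluster.

```python
import subprocess


def build_leave_command(leave_id: int, leave_via: list[str],
                        binary: str = "databend-meta") -> list[str]:
    """Build the argv for: databend-meta --leave-id <id> --leave-via <addr>..."""
    if not leave_via:
        raise ValueError("at least one address is required for --leave-via")
    return [binary, "--leave-id", str(leave_id), "--leave-via", *leave_via]


def remove_node(leave_id: int, leave_via: list[str]) -> int:
    """Run the leave command and return the process exit code."""
    return subprocess.run(build_leave_command(leave_id, leave_via)).returncode


if __name__ == "__main__":
    # Prints the command only; call remove_node(...) to actually execute it.
    print(" ".join(build_leave_command(7, ["localhost:28103", "localhost:28203"])))
```

Listing more than one address under --leave-via gives the command a fallback if the first node is unreachable.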

3. Examine cluster members

At every step of adding or removing a node, the cluster state should be checked to ensure everything goes well.

The admin-api-address defined in the config provides an administration HTTP service for examining cluster state. For example, curl -s localhost:28101/v1/cluster/nodes displays the members of a cluster:

[
  {
    "name": "1",
    "endpoint": {
      "addr": "localhost",
      "port": 28103
    }
  },
  {
    "name": "2",
    "endpoint": {
      "addr": "localhost",
      "port": 28203
    }
  }
]
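To check membership from a script after each add or remove step, the response can be parsed with the Python standard library. This is a sketch under assumptions: `parse_members` and `fetch_members` are hypothetical helpers, the response is assumed to have exactly the shape shown above, and the "name" field is assumed to carry the node id, as the sample suggests.

```python
import json
from urllib.request import urlopen


def parse_members(body: str) -> dict[str, str]:
    """Map each node name to its "addr:port" endpoint."""
    return {
        node["name"]: f'{node["endpoint"]["addr"]}:{node["endpoint"]["port"]}'
        for node in json.loads(body)
    }


def fetch_members(admin_api_address: str = "localhost:28101") -> dict[str, str]:
    """Query /v1/cluster/nodes on a running node's admin API."""
    with urlopen(f"http://{admin_api_address}/v1/cluster/nodes", timeout=5) as resp:
        return parse_members(resp.read().decode("utf-8"))


# Sample response copied from the output above.
SAMPLE = """
[
  {"name": "1", "endpoint": {"addr": "localhost", "port": 28103}},
  {"name": "2", "endpoint": {"addr": "localhost", "port": 28203}}
]
"""

members = parse_members(SAMPLE)
print(members)
print("7" in members)  # whether node 7 is currently listed as a member
```

Against a live cluster, replace `parse_members(SAMPLE)` with `fetch_members()` and compare the result before and after the change.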