cdr define shardCollection
The cdr define shardCollection command creates a sharding definition for distributing a table or collection across multiple shard servers.
Syntax
cdr define shardCollection definition_name database:user.{table | collection}
   [Connect Option]
   --type={delete | keep | informational | informational_noer}
   --key={column | "expression"}
   [--versionCol={field | column}]
   --strategy=expression ER_group "WHERE_expression" [ER_group "WHERE_expression" ...] ER_group REMAINDER
   --strategy=chash [--partitions=partitions] ER_group [ER_group ...]
   --strategy=hash ER_group [ER_group ...]
Specify exactly one of the three --strategy forms. Repeat ER_group, and for the expression strategy its quoted WHERE_expression, once for each shard server that receives sharded data.
Element | Description | Restrictions |
---|---|---|
collection | The name of the collection that is distributed across database servers. | Must be the name of an existing collection. |
column | The name of a table column. | Must be the name of an existing column. |
database | The name of the database that contains the table or collection that is distributed across database servers. | Must be the name of an existing database. |
definition_name | The name of the sharding definition that is created. | |
ER_group | The ER-group name of a database server that receives sharded data. | Must be the ER-group name of an existing database server. |
expression | The WHERE-clause expression that is used to select rows or documents by shard key value. | |
field | The name of a collection field. | Must be the name of an existing field. |
REMAINDER | Specifies the database server that receives rows or documents with a shard key value that is not selected by any of the other expressions. | |
partitions | The number of hashing partitions to create on each shard server. | Must be a positive integer. |
table | The name of the table that is distributed across database servers. | Must be the name of an existing table. |
user | The owner of the table or collection that is distributed across database servers. | Must be the name of an existing user. |
The following table describes the cdr define shardCollection parameters.
Long Form | Short Form | Description |
---|---|---|
--key= | -k | Defines the shard key on all database servers. Possible values are the name of a column (column) or an expression that is enclosed in quotation marks ("expression"). All database servers in a shard cluster must use the same column or expression as the shard key. |
--partitions= | -p | Specifies the number of hashing partitions to create on each shard server when the sharding strategy is consistent hashing. Default is 3. The more hashing partitions, the more evenly the data is distributed among shard servers. However, if you specify more than 10 hashing partitions, the resulting SQL statement to create the sharded table might fail because it exceeds the maximum character limit for an SQL statement. |
--strategy= | -s | Specifies the method for determining which database server an inserted row or document is distributed to. Possible values are: expression (rows or documents are distributed according to the quoted WHERE-clause expression that is specified for each ER group, with REMAINDER receiving values that no expression selects), chash (rows or documents are distributed by a consistent hashing algorithm), and hash (rows or documents are distributed by a hashing algorithm). |
--type= | -t | Specifies the action that is taken on the shard server where a row or document was inserted after the row or document is replicated to the appropriate shard server. Possible values are delete (the default), keep, informational, and informational_noer. With delete, the source row or document is deleted after it is replicated; with keep, it remains on the source server. |
--versionCol= | -v | When --type=delete is specified in the sharding definition, Enterprise Replication must verify that a source row or document was not updated before it can delete the row or document on the shard server. Possible values are the name of the column (for a table) or the field (for a collection) that holds that version information. If --type=delete is set in the sharding definition but --versionCol is not, changes that are made to rows and documents can be lost during the replication process. This parameter is required if any rows have out-of-row data, such as data that is stored in smart large objects, or if collections have BSON documents that are larger than 4 KB. |
Usage
Use the cdr define shardCollection command to create a sharding definition for distributing a table or collection across multiple shard servers. The replicates that are created as part of the cdr define shardCollection command are mastered replicates that use always-apply conflict resolution and row scope. You cannot specify that triggers fire.
Multiple sharding definitions are not allowed on the same table or collection.
You cannot manually define an Enterprise Replication replicate for a table that is sharded.
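For example, a definition that distributes rows by consistent hashing but leaves each inserted row in place on the source server might look like the following sketch. All names are hypothetical (database db1, table orders owned by user informix, shard-key column order_num, and ER groups g_srv_1 through g_srv_3); the examples later in this topic show complete, realistic commands. Because this sketch does not use --type=delete, the --versionCol parameter is omitted.
cdr define shardCollection def_orders db1:informix.orders
   --type=keep --key=order_num --strategy=chash --partitions=4
   g_srv_1 g_srv_2 g_srv_3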
Return codes
A return code of 0 indicates that the command was successful.
If the command is not successful, one of the following error codes is returned: 3, 18, 39, 52, 83, 99, 125, 196, 215, 229.
For information about these error codes, see Return Codes for the cdr Utility.
Example: Creating a sharding definition that uses a consistent hash algorithm
The following example creates a sharding definition that is named collection_1. Rows that are inserted on any of the shard servers are distributed, based on a consistent hash algorithm, to the appropriate shard server. Enterprise Replication must verify that a replicated row or document was not updated before the row or document can be deleted on the source server. The b column in the customers table that is owned by user john is the shard key. Each shard server has three hashing partitions.
cdr define shardCollection collection_1 db_1:john.customers
--type=delete --key=b --strategy=chash --partitions=3 --versionCol=column_3
g_shard_server_1 g_shard_server_2 g_shard_server_3
The partition range for each shard server is calculated based on the server group name. The data is distributed according to the following sharding definition:
g_shard_server_1 (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 4019 and 5469)
or (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 5719 and 6123)
or (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 2113 and 2652)
g_shard_server_2 (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 6124 and 7415)
or (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 5470 and 5718)
or (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 7416 and 7873)
g_shard_server_3 (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 2653 and 3950)
or mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) >= 7874
or mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) < 2113
or (mod(abs(ifx_checksum(b::LVARCHAR, 0)), 10000) between 3951 and 4018)
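If the default of three hashing partitions does not spread the data evenly enough, the same definition could instead be created with a larger partition count, as described for the --partitions parameter. The following variant is a sketch only: the partition ranges that Enterprise Replication generates for it would differ from the listing above, and a very large partition count can make the generated SQL statement exceed the maximum statement length.
cdr define shardCollection collection_1 db_1:john.customers
   --type=delete --key=b --strategy=chash --partitions=8 --versionCol=column_3
   g_shard_server_1 g_shard_server_2 g_shard_server_3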
Example: Creating a sharding definition that uses a hash algorithm
The following example creates a sharding definition that is named collection_1. Rows that are inserted on any of the shard servers are distributed, based on a hash algorithm, to the appropriate shard server. Enterprise Replication must verify that a replicated row or document was not updated before the row or document can be deleted on the source server. The state column in the customers table that is owned by user john is the shard key.
cdr define shardCollection collection_1 db_1:john.customers
--type=delete --key=state --strategy=hash --versionCol=version
g_shard_server_A g_shard_server_B g_shard_server_C g_shard_server_D
Example: Creating a sharding definition that uses an IN expression
The following example creates a sharding definition that is named collection_2. The state column in the clients table that is owned by user joe is the shard key. Rows that are inserted on any of the shard servers are distributed, based on the defined expression, to the appropriate shard server. Enterprise Replication must verify that a replicated row or document was not updated before the row or document can be deleted on the source shard server.
cdr define shardCollection collection_2 db_2:joe.clients
--type=delete --key=state --strategy=expression --versionCol=version
g_shard_server_A "IN ('TX','OK')"
g_shard_server_B "IN ('NY','NJ')"
g_shard_server_C "IN ('AL','GA')"
g_shard_server_D REMAINDER
- Inserted rows that have a value of AL in the state column are sent to g_shard_server_C.
- Inserted rows that have a value of NJ in the state column are sent to g_shard_server_B.
- Inserted rows that have a value of CA in the state column are sent to g_shard_server_D.
Example: Creating a sharding definition that uses a BETWEEN expression
The following example creates a definition that is named collection_3. The age column in the users table that is owned by user charles is the shard key. Rows that are inserted on any of the shard servers are distributed, based on the defined expression, to the appropriate shard server. Enterprise Replication must verify that a replicated row or document was not updated before the row or document can be deleted on the source shard server.
cdr define shardCollection collection_3 db_3:charles.users
--type=delete --key=age --strategy=expression --versionCol=version
g_shard_server_A "BETWEEN 0 and 20"
g_shard_server_B "BETWEEN 21 and 62"
g_shard_server_C "BETWEEN 63 and 100"
g_shard_server_D REMAINDER
- Inserted rows that have a value of 35 in the age column are sent to g_shard_server_B.
- Inserted rows that have a value of 102 in the age column are sent to g_shard_server_D.
- Inserted rows that have a value of 15 in the age column are sent to g_shard_server_A.
Example: Creating a sharding definition that defines a shard key by function
The following example creates a sharding definition that is named collection_4. The shard key is the value of the COLOR field, which the bson_value_lvarchar function extracts from the data column of the cars table that is owned by user mike. Documents that are inserted on any of the shard servers are distributed, based on the defined expression, to the appropriate shard server.
cdr define shardCollection collection_4 db_4:mike.cars
-t delete -k "bson_value_lvarchar(data,'COLOR')" -s expression -v version
g_shard_server_E "IN ('blue','green')"
g_shard_server_F "IN ('black','white')"
g_shard_server_G "IN ('brown','gray')"
g_shard_server_H "IN ('red','yellow')"
g_shard_server_I REMAINDER
- Inserted documents that have a value of yellow in the COLOR key are sent to g_shard_server_H.
- Inserted documents that have a value of blue in the COLOR key are sent to g_shard_server_E.
- Inserted documents that have a value of pink in the COLOR key are sent to g_shard_server_I.