Creating your Shard Map Manager for Elastic Scale

Fri Jul 17, 2015 by Jan de Vries in Azure, SQL

When implementing a sharding solution, you will need something that knows which shard a specific shardlet lives in. This is something you will want to store in a single location, so you know for sure you are always using the most recent information. When using the Elastic Scale libraries this is called the Shard Map Manager. The Shard Map Manager keeps track of the location and state of the shardlets and shards. As you can imagine this is quite an important aspect of the sharding solution.

In a perfect world you will generate the Shard Map Manager (SMM) once, telling it which shardlets reside in which shard, and never update it again. Since the Shard Map Manager only exists in one location and hardly ever changes, it’s a great candidate for caching. This is why the Elastic Scale libraries cache the content of the Shard Map Manager right after the first call to the database. This way you only pay the latency between your application and the SMM database once; after this first call the data is held in memory by the calling application.

In the real world however, the SMM will get some changes from time to time. For example, if you are sharding by continent you might decide you want to narrow them down a bit by changing US to West US and East US. When sharding with ranges (0..100, 100..200, etc.) you might have to add some new ranges from time to time.

When such updates happen, you want these changes to be reflected on all remote locations. When using the Elastic Scale libraries this cache invalidation and fetching new data for the cache is done automatically for you. This way you only have to focus on your data access code and all this managing is handled for you.
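
The "fetch once, serve from memory, refresh after a change" idea can be sketched without the library. This is only an illustration of the principle, not how Elastic Scale implements its cache; the class name, the delegate and the StoreReads counter are all made up for this example, and the sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of a cached shardlet-to-shard lookup.
public class CachedLookupSketch
{
    private readonly Func<IDictionary<string, string>> loadFromStore;
    private IDictionary<string, string> cache;

    // Counts how often the (slow) central store was actually read.
    public int StoreReads { get; private set; }

    public CachedLookupSketch(Func<IDictionary<string, string>> loadFromStore)
    {
        this.loadFromStore = loadFromStore;
    }

    // Returns the shard for a shardlet, or null when it is unknown.
    public string FindShard(string shardlet)
    {
        if (cache == null)
        {
            // First call (or first call after an invalidation): hit the store.
            cache = loadFromStore();
            StoreReads++;
        }

        string shard;
        return cache.TryGetValue(shardlet, out shard) ? shard : null;
    }

    // Drops the cached copy so the next lookup reloads it.
    public void Invalidate()
    {
        cache = null;
    }
}
```

Looking up the same shardlet twice reads the store only once. After the store changes, a lookup keeps returning the old answer until Invalidate is called, which is exactly the bookkeeping the library takes off your hands.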

When using Elastic Scale, one of the first steps is creating an SMM and a shard. This can be done by calling the appropriate library methods like in the example below.

// shardMapManagerConnectionString is the connection string of the database that hosts the Shard Map Manager.
// Try to get a reference to the Shard Map Manager in that database. If it doesn't already exist, then create it.
ShardMapManager shardMapManager;
var shardMapManagerExists = ShardMapManagerFactory.TryGetSqlShardMapManager(shardMapManagerConnectionString, ShardMapManagerLoadPolicy.Lazy, out shardMapManager);
if (shardMapManagerExists)
{
	ConsoleUtils.WriteInfo("Shard Map Manager already exists");
}
else
{
	// The Shard Map Manager does not exist, so create it
	shardMapManager = ShardMapManagerFactory.CreateSqlShardMapManager(shardMapManagerConnectionString);
	ConsoleUtils.WriteInfo("Created Shard Map Manager");
}

This will create your first SMM for the specified connection string (shardMapManagerConnectionString).

The newly created SMM is still empty, as it doesn’t have any reference to a shard database and doesn’t know about any shard map (the routing table that tells where a specific shardlet resides).

The first thing you want to do is add a new shard map. This shard map is necessary to add new shards and will also contain information about shardlets. A shard map is created like so:

var shardMap = this.shardMapManager.CreateOrGetListShardMap("MyExampleShardMap");

A new shard map with the name “MyExampleShardMap” is created and can be used to add new shards.
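
As far as I know, CreateOrGetListShardMap is not a method on the library's ShardMapManager itself, so treat the call above as a small helper. A minimal sketch of such a helper, written as an extension method and assuming an int sharding key, could look like this. The class name is made up; TryGetListShardMap and CreateListShardMap are the library calls it relies on.

```csharp
using Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement;

// Hypothetical helper; only the two ShardMapManager calls come from the library.
public static class ShardMapManagerExtensions
{
    // Returns the list shard map with the given name, creating it when it does not exist yet.
    public static ListShardMap<int> CreateOrGetListShardMap(
        this ShardMapManager shardMapManager, string shardMapName)
    {
        ListShardMap<int> shardMap;
        if (shardMapManager.TryGetListShardMap<int>(shardMapName, out shardMap))
        {
            // The shard map was registered earlier, so reuse it.
            return shardMap;
        }

        // No shard map with this name yet, so create an empty one.
        return shardMapManager.CreateListShardMap<int>(shardMapName);
    }
}
```

Checking first keeps the call safe to repeat, which is handy in a management application that may be started more than once.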

Adding a new, empty shard is also done by using the appropriate library calls. The sample below is an excerpt from my own management application.

// Choose the shard name
var databaseName = string.Format(CreateShard.ShardNameFormat, shardMap.GetShards().Count());
var serverName = this.configurationManager.GetAppSetting(ConfigurationManager.AppSettings.Database.ShardMapManagerServerName);
// Only create the database if it doesn't already exist. It might already exist if
// we tried to create it previously but hit a transient fault.
if (!this.sqlDatabaseUtils.DatabaseExists(serverName, databaseName))
{
	this.sqlDatabaseUtils.CreateDatabase(serverName, databaseName, this.log);
}
// Create schema and populate reference data on that database
// The initialize script must be idempotent, in case it was already run on this database
// and we failed to add it to the shard map previously
this.sqlDatabaseUtils.ExecuteSqlScript(serverName, databaseName, CreateShard.InitializeShardScriptFile, this.log);

// Add it to the shard map
var shardLocation = new ShardLocation(serverName, databaseName);
var shard = this.shardManagementUtils.CreateOrGetShard(shardMap, shardLocation);

The basis of this code can also be found in the sample application of Elastic Scale.

As you can see, the first check is if a database already exists. If not, it is created.

Once it is confirmed that the empty database exists, a T-SQL script is executed which populates the database with all necessary tables, stored procedures, functions, etc. This new database (shard) is then added to the specified shard map.

At this time, the system still doesn’t know which shardlets have to be stored in a specific shard. You still have to create (several) mappings (routes) in your shard map.
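
To get a feeling for what these mappings do, here is a small library-free sketch of a routing table. It is not the Elastic Scale implementation and every name in it is made up. A real shard map is either a list map (points) or a range map, while this sketch holds both to keep it short. Ranges are treated as half-open, so 0..100 contains 0 up to and including 99.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of the routes stored in a shard map.
public class RoutingTableSketch
{
    private readonly Dictionary<int, string> points = new Dictionary<int, string>();
    private readonly List<(int Low, int High, string Shard)> ranges =
        new List<(int Low, int High, string Shard)>();

    // Maps a single key to a shard; throws ArgumentException if the key is already mapped.
    public void AddPoint(int key, string shard)
    {
        points.Add(key, shard);
    }

    // Maps the half-open range [low, high) to a shard; overlapping ranges are rejected.
    public void AddRange(int low, int high, string shard)
    {
        if (low >= high) throw new ArgumentException("low must be smaller than high");
        foreach (var r in ranges)
        {
            if (low < r.High && r.Low < high)
                throw new InvalidOperationException("Range overlaps an existing mapping");
        }
        ranges.Add((low, high, shard));
    }

    // Returns the shard that holds the key, or null when no mapping covers it.
    public string FindShard(int key)
    {
        string shard;
        if (points.TryGetValue(key, out shard)) return shard;
        foreach (var r in ranges)
        {
            if (key >= r.Low && key < r.High) return r.Shard;
        }
        return null;
    }
}
```

With the ranges 0..100 and 100..200 registered, key 99 routes to the first shard and key 100 to the second. A key without any mapping has no route at all, which is why the mappings have to be created before you can store data.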

When using a ListShardMap you have to create a PointMapping.

shardMap.CreatePointMapping(sampleId, shard);

This code will add a new mapping to the specified shard with the specified identifier. Creating a mapping for a RangeShardMap works about the same.

var sampleRange = new Range<int>(0, 100);
var mappingForNewShard = shardMap.CreateRangeMapping(sampleRange, shard);
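
As far as I know, creating a mapping for a key that is already mapped makes the library throw, so a management tool that can be re-run may want to check first. A minimal sketch for a ListShardMap with an int key, using the library's TryGetMappingForKey and CreatePointMapping; the class and method names are made up.

```csharp
using Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement;

// Hypothetical helper that makes creating a point mapping safe to repeat.
public static class MappingSketch
{
    public static PointMapping<int> CreateOrGetPointMapping(
        ListShardMap<int> shardMap, int key, Shard shard)
    {
        PointMapping<int> mapping;
        if (shardMap.TryGetMappingForKey(key, out mapping))
        {
            // The key is already routed somewhere; return the existing mapping untouched.
            return mapping;
        }

        // No mapping yet, so route this key to the given shard.
        return shardMap.CreatePointMapping(key, shard);
    }
}
```

Note that the existing mapping may point to a different shard than the one passed in; this sketch deliberately doesn't try to move it.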

After you have added all mappings to the shard map, there is still one thing to do: adding the database schema info to the shard map manager.

As I’ve written earlier in this post and the previous post, when using Elastic Scale you have the power to distribute your data across multiple databases. You also have the opportunity to split and merge existing shard maps. For example, if you have a shard map defined with the range 0..100, you are able to change this on-the-fly to a 0..50 and a 50..100 shard map. All data which has to move will then be migrated to a new shard. In order to do these complex mutations the shard map manager has to know about the schema of the database. This is necessary to determine which records have to be moved.
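
The reason the schema matters becomes clearer with a small library-free sketch of a split. It only models the bookkeeping and is not how the split and merge tooling works internally; all names are made up and ranges are again half-open.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of splitting a range and finding the rows that have to move.
public static class SplitSketch
{
    // Splits the half-open range [low, high) at splitPoint into [low, splitPoint) and [splitPoint, high).
    public static ((int Low, int High) Left, (int Low, int High) Right) Split(
        int low, int high, int splitPoint)
    {
        if (splitPoint <= low || splitPoint >= high)
            throw new ArgumentOutOfRangeException(nameof(splitPoint));
        return ((low, splitPoint), (splitPoint, high));
    }

    // Returns the sharding key values that fall inside the range that moves to the new shard.
    public static List<int> KeysToMove(IEnumerable<int> shardingKeys, (int Low, int High) movedRange)
    {
        return shardingKeys
            .Where(key => key >= movedRange.Low && key < movedRange.High)
            .ToList();
    }
}
```

Splitting 0..100 at 50 and moving the right half means every row whose sharding key is 50 or higher has to be migrated. Finding those rows is only possible when it is known which column in each table holds the sharding key, and that is what the schema info provides.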

When defining a schema you have to choose if a table is a Reference Table or a Sharded Table. A Reference Table is a table which contains information that’s the same on all shards, for example a Users table. A Sharded Table is a table which contains subsets of data. The data in these tables can be split and merged across different shards, depending on your defined shard maps.

The following code block creates a new database schema and adds it to the specified shard map in the shard map manager.

var schemaInfo = new SchemaInfo();
schemaInfo.Add(new ReferenceTableInfo("News"));
schemaInfo.Add(new ShardedTableInfo("Companies", "Id"));
schemaInfo.Add(new ShardedTableInfo("Orders", "CompanyId"));
schemaInfo.Add(new ReferenceTableInfo("Users"));
// Register it with the shard map manager for the given shard map name
this.shardMapManager.RegisterSchemaToShard("MyExampleShardMap", schemaInfo);

Keep in mind that for a Sharded Table you also have to specify the name of the column that holds the sharding key, which in practice is a foreign key to the shardlet. This means all sharded tables have to contain a column with a foreign key relation to their matching shardlet. The Elastic Scale libraries aren’t smart enough to check for cascading foreign key relations yet. Maybe this will be added in the future, maybe not.
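
RegisterSchemaToShard in the snippet above is, to my knowledge, not part of the library's public API, so read it as a wrapper. The Elastic Scale sample registers the schema through the schema info collection of the shard map manager. A minimal sketch of that call, with a made-up class and method name and only two of the tables from above:

```csharp
using Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement;
using Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.Schema;

// Hypothetical wrapper around the library's schema info collection.
public static class SchemaRegistrationSketch
{
    public static void RegisterSchema(ShardMapManager shardMapManager, string shardMapName)
    {
        var schemaInfo = new SchemaInfo();

        // Same content on every shard.
        schemaInfo.Add(new ReferenceTableInfo("Users"));

        // Split across shards; the second argument is the column holding the sharding key.
        schemaInfo.Add(new ShardedTableInfo("Orders", "CompanyId"));

        // Store the schema under the name of the shard map it belongs to.
        shardMapManager.GetSchemaInfoCollection().Add(shardMapName, schemaInfo);
    }
}
```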

Now that you have also added a proper schema to the shard map manager, you are good to go and are able to do some more advanced stuff, like querying your databases.
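
As a small taste of that querying, the sketch below opens a connection to whichever shard holds a given key by calling OpenConnectionForKey on the shard map. The class, method and parameter names are made up, the query assumes the Orders table and CompanyId column from the schema above, and I'm assuming the connection string passed in only carries credentials, since the library supplies the server and database of the shard.

```csharp
using System.Data.SqlClient;
using Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement;

// Hypothetical example of routing a query to the shard that holds a key.
public static class RoutingSketch
{
    public static int CountOrders(
        ListShardMap<int> shardMap, int companyId, string credentialsConnectionString)
    {
        // The shard map looks up the shard for this key and opens a connection to it.
        using (SqlConnection connection = shardMap.OpenConnectionForKey(
            companyId, credentialsConnectionString, ConnectionOptions.Validate))
        using (SqlCommand command = connection.CreateCommand())
        {
            command.CommandText = "SELECT COUNT(*) FROM Orders WHERE CompanyId = @companyId";
            command.Parameters.AddWithValue("@companyId", companyId);
            return (int)command.ExecuteScalar();
        }
    }
}
```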

I would advise you to check out the sample applications provided by the Visual Studio gallery and look into the specifics. After having done this you can add this awesome library to your own solution.