Replication in MS SQL Server

Keshav Singh

4.61/5 (31 votes)

Jun 23, 2011

CPOL

12 min read

176074

An article on replication in MS SQL Server

Introduction

There are a lot of people who subscribe for magazines like readers digest or India today or get organizational news update every month in their mail box. Information about the latest organizational/national development is communicated over these email/magazines. The information is directly dropped in mail box or at the door step. We simply need to subscriber for the desired magazine or mailer. Similarly the replication feature in MS SQL Server moves the data from a remote server to our local server boxes via publications and subscriptions mechanism. There are various reasons and scenario where replications can be considered a very strong tool for data relay. We could consider replication for,

Getting the data closer to the user, consider a server stationed in Germany and since the business also operates at Bangalore in India we need the data quite frequently. Now every time we need the data from the Products or Sales table we need to use a linked server, connect to the German Server and pull the data. This will have impacts like:

We need to rely heavily on the network connectivity each time we pull the data.
Secondly the source server will have to bear the load of data reads.
Also these ad-hoc queries might create conflicts if there is any exclusive lock on a record on account of any transaction taking place on the source data. Further to this, the ad-hoc pull queries will run for a really long time in this case eating up network bandwidth and also causing unnecessary load on the source.
And if the data is to be pulled on a regular basis this ad-hoc queries seem really out of place option.

Consider replication for removing impacts of heavy read intensive operations like report generation etc. Replication is a very good option when the desired data is simply read only and the update to the source is not intended.
Consider replication when server pulling the data intends to own the pulled data i.e. make changes to the pulled version without impacting the source. Replication provides the desired autonomy to the subscriber.

Getting Started with Replication

Before getting into the details on how to setup replication let’s try to get acquainted with the terms involved with this exciting feature. Replication traditionally takes the Publisher/Subscriber analogy. It’s quite similar to the magazine example. For any magazine there is a publisher who publishes information in the form of articles. Once the magazine (which is collection of article and is called publication) is published there needs to be a distributor who will distribute it to people like you and me who are actually the subscribers. This forms the standard of the entire Publisher/Subscriber cycle. But there could be changes in the setup like there is a publisher who also acts as distributor or there could be a distributor who is also a subscriber. The key terms are:

Article: The article is the information that is going to be replicated. It could be a table, a procedure or a filtered table etc.

Publisher: The publisher is the database on the source server which is actually replicating the data. The publication which is collection of articles (various objects in the database) is published by the publisher.

Distributor: The distributor can be considered as the delivery boy who brings the publications to the subscriber. The distributor could himself be a publisher or a subscriber.

Subscriber: Subscriber is the end receiver of the publication who gets the data in the form of subscriptions. The changes published are propagated to all the subscribers of the publications through the distributor. The subscriber simply has to create a subscription on the publication from its end to receive the data.

There are various types of replication:

Transactional Replication
Transactional Replication with Updatable Subscriptions
Snapshot Replication
Merge replication

Configuring a Distributor

Before trying to get insights about each of the replications and how to configure it, it’s important to setup a distributor.

Select the server which is to act as the distributor and the right click on the replication folder and then click configure Distribution.

This will lead to the below screen, click next.

The next screen as below will as to either configure the current server as the Distributor or connect to the different desired server and configure it to be a distributor. Let’s select the current server and click next.

The next screen point to the folder path where the snapshots of the publications will be kept by the snapshot agent, we will keep the default value and click next.

The Next screen configures the Distribution database its data (.MDF) and Log (.ldf) files. Bear in mind once the distribution has been configured on a server, the system databases will have an additional database added to it “Distribution”. Click next.

On clicking next it brings you to the screen where you can add all the servers which will be the publishers and use the currently being configured distributor to distribute its publications. By default the current server will be added as the publisher once could add more servers. Click next.

Click next on the below screen and proceed.

This will bring you to the last screen which will have the summary of the configurations. Click finish to complete the Distributor configuration.

Alternatively please watch the distribution configuration video below.

Transactional Replication

In the transactional replication the transactions occurring on the published articles from the publisher are forwarded on to the distributor who in turn replicates the same and commits them on the subscribers. Subscribers can use this data for read only purposes. As transactions are small items to publish the latency for transactional replication is very low. As far as the autonomy is concerned as the data is read only type each of the subscribers cannot update the data and hence there is absolutely no autonomy in this type of replication.

Suppose there is a ticket booking web site, all the tickets booked are centrally stored in the database hosted at New Delhi. There are distribution centers in every city in the country where the bookings are received and the booked ticket shipped at the addresses provided. All the tickets booked from Hyderabad needs to be shipped to the respective customers. The Hyderabad distribution center could setup a filtered (get bookings for Hyderabad only) transactional replication so that every new booking (transaction) is replicated to their center with minimal delay (almost immediately). They need a read only access to the replicated data so transactional replication fits the bill. They could dispatch the booked ticket ASAP with transactional replication setup.

Key facts of transactional replication:

As replication happens on a transaction, the latency of replication is very low.
The subscription is read only, hence there almost no autonomy for the subscribers.

Please follow the configuration video here.

In the video we have seen something like below: The Publisher (Server 1) of the article was acting itself as the dristibutor and the servers 2 & 3 subscribed the published aritcle.

Scenario: Let’s consider a slightly complex scenario which I have faced sometime back. Consider a hypothetical scenario, as per the diagram below, there are 2 servers both acting as a publisher and distributor themselves and they are publishing an article which consists of a table say “Student”. Server 1 publishes records of class 10 students in the table “Student” and Server 2 publishes class 12 student details. The schema and other details of the student table are consistent and identical but the data contained is not the same. Now the server 3 (which is the server of the school’s principal) needs read only data from both these server into its student table i.e. the student table at the principal’s end should have data from class 10^th and 12^th both.

To see the transactional replications setup configurations for this scenario please see the video above.

Transactional Replication with Updatable Subscriptions

This is similar to the transactional replication with the additional capability for the subscribers to be able to update the published articles. This way there is a gain in the autonomy as the subscribers can update the existing data, the latency is maintained at the same level. In most of the scenarios this considered to be the best solution. This setup uses the two-phase commit process to keep the Publisher/Subscriber in sync. The two phase commit process uses the MSDTC so that the changes are made simultaneously at the publisher-subscribers end. So before setting up this kind of replication it’s important to have the MSTDC up and running.

To activate the MSDTC, In Windows Explorer -> Start -> Control Panel -> Administrative Tools -> Component Services.

In Component Services, open Services. In the services list, you will find the item on the DTC. Activate it! Once the DTC is activated your object explorer will show DTC as green and runnable as shown in the below screen shot.

This kind of setup is extremely useful for something like railways reservation system with multiple reservation centers. Bookings are replicated transactionally all across to all the subscribers (centers) and also at the publishers (primary server) end. The synchronization takes place all across on a transaction and not on the whole of the article. But what if a transaction happens on the same record at the same time at the subscriber1 and subscriber2 and publisher, in this case there is a conflict resolution that is specified at the time of setting up of the replication. Example the setup makes the publisher transaction winner in case of any conflicts. So the Publisher’s change will be the final one and will be replicated across all the subscribers in case of conflicts.

To see the transactional replications with immediate updating subscribers setup configurations please see the video below.

Snapshot Replication

Snapshot replications as the distribution method moves the entire copy of published articles throught the distributor to the subscriber. This type of replication method provides the subscribers with a very high autonomy. That means that they can update or changes their local copy of subscribed data without causing any changes on the publishers data. The next time the data gets syncronized the entire snapshot gets overwritten. Latency for such configuration setup is also high because as the entire publication gets syncronized. This kind of replicatons mainly finds use in OLAP servers. The data for OLAP might be pulled every week or fortnight and would be readonly for reporting and analytical processing purposes.

To see the snapshot replication setup configurations please see video below.

Merge Replication

Merge replication allows each of the subscribers to edit their piece of subscriptions independently and at some point these changes are merged together and synced amongst all the subscribers and publisher on the whole. This seems quite complex isn’t it? It is, but with wise configuration, this setup could be a real asset as this solution provides the highest level of autonomy to each of the subscribers. This setup requires a very wise conflict resolution. Suppose a record is updated with value 10 by a subsA and the same record gets updated to 5 by subsB and also the publisher updates it to 15 now changes from both the subscribers will be merged followed by the publisher’s change. In this case the publisher’s change wins. For changes involving only the subscribers we need to have a conflict resolution in place, example it has been configured that for only subscriber’s changes SubsA wins.

To see the merge replication setup configurations please see video below.

Conclusion

While a subscription is configured they are either created as PULL or PUSH subscriptions. Push subscription means the control for syncing the changes falls on the shoulders of the centralized distributor. This configurations helps when the intentions is receive the changes whenever there is a change on the publisher side. Moreover since this is a centralized way it’s helpful as all the subscribers to be updated by the distributor itself.

Pull subscriptions are designed to be triggered from the subscriber’s side. The subscriber pulls the data from the publisher on need basis or can conveniently schedule the job to run on its own discretion without depending on the publisher. This is also a good way of doing things for system which are not constantly connected to the system. Hence it helps do away from unnecessary push jobs even when the subscriber is not connected. Rather whenever the subscriber gets connected to the system it simply pulls the data and syncs up.

For either of the subscriptions there are few involved agent jobs which take care of syncing things up. They are,

Log reader Agent: The log reader agent sits on the distribution server with the entrusted job to constantly monitor the transactions logs of the published databases which uses this distributor. Whenever there a transaction happens on the database which acts as the publisher, the log reader agent stores the transaction in its Distribution system database and further these changes are forwarded to the respective subscribers via distribution agent or merge agent.

Distribution Agent: The distribution agent is responsible for moving the stored transaction from the distribution server to the subscribers.

Snapshot Agent: For any replication type this agent is responsible for copying the initial snapshot (i.e. complete schema and data) of the publication from the publisher to the subscribers. This forms the base after which the depending upon the kind of replication configured further changes are copied to the subscribers.

Merger Agent:This agent is responsible for syncing all the changes across the subscribers and publisher.

Queue Reader Agent: This agent is used for updating publishers/subscribers replication where all the changes are queued up at the distributor’s end and are reapplied on the publications.

These are the SQL Server agent job which gets created upon configuring replication. This is a sincere try to explain replication briefly and clearly. Your comments on its improvement are dearly welcome.