Introduction
There are a
lot of people who subscribe for magazines like readers digest or India today or
get organizational news update every month in their mail box. Information about
the latest organizational/national development is communicated over these
email/magazines. The information is directly dropped in mail box or at the door
step. We simply need to subscriber for the desired magazine or mailer. Similarly
the replication feature in MS SQL Server moves the data from a remote server to
our local server boxes via publications and subscriptions mechanism. There are
various reasons and scenario where replications can be considered a very strong
tool for data relay. We could consider replication for,
- Getting the data closer
to the user, consider a server stationed in Germany and since the business also
operates at Bangalore in India we need the data quite frequently. Now every
time we need the data from the Products or Sales table we need to use a linked
server, connect to the German Server and pull the data. This will have impacts
like:
- We need to rely
heavily on the network connectivity each time we pull the data.
- Secondly the
source server will have to bear the load of data reads.
- Also these ad-hoc
queries might create conflicts if there is any exclusive lock on a record on
account of any transaction taking place on the source data. Further to this,
the ad-hoc pull queries will run for a really long time in this case eating up
network bandwidth and also causing unnecessary load on the source.
- And if the data
is to be pulled on a regular basis this ad-hoc queries seem really out of place
option.
- Consider
replication for removing impacts of heavy read intensive operations like report
generation etc. Replication is a very good option when the desired data is simply
read only and the update to the source is not intended.
- Consider
replication when server pulling the data intends to own the pulled data i.e.
make changes to the pulled version without impacting the source. Replication provides
the desired autonomy to the subscriber.
Getting
Started with Replication
Before
getting into the details on how to setup replication let’s try to get
acquainted with the terms involved with this exciting feature. Replication
traditionally takes the Publisher/Subscriber analogy. It’s quite similar to the
magazine example. For any magazine there is a publisher who publishes
information in the form of articles. Once the magazine (which is
collection of article and is called publication) is published there
needs to be a distributor who will distribute it to people like you and
me who are actually the subscribers. This forms the standard of the
entire Publisher/Subscriber cycle. But there could be changes in the setup like
there is a publisher who also acts as distributor or there could be a
distributor who is also a subscriber. The key terms are:
Article: The article is the information that
is going to be replicated. It could be a table, a procedure or a filtered table
etc.
Publisher:
The publisher is the
database on the source server which is actually replicating the data. The publication
which is collection of articles (various objects in the database) is published
by the publisher.
Distributor: The distributor can be considered as
the delivery boy who brings the publications to the subscriber. The distributor
could himself be a publisher or a subscriber.
Subscriber:
Subscriber is the
end receiver of the publication who gets the data in the form of subscriptions.
The changes published are propagated to all the subscribers of the publications
through the distributor. The subscriber simply has to create a subscription on
the publication from its end to receive the data.
There are various types of
replication:
- Transactional
Replication
- Transactional
Replication with Updatable Subscriptions
- Snapshot
Replication
- Merge replication
Configuring a Distributor
Before
trying to get insights about each of the replications and how to configure it,
it’s important to setup a distributor.
Select the
server which is to act as the distributor and the right click on the
replication folder and then click configure Distribution.
![Image 1]()
This will lead to the below screen, click next.
![Image 2]()
The next screen as below will as to either configure the
current server as the Distributor or connect to the different desired server
and configure it to be a distributor. Let’s select the current server and click
next.
![Image 3]()
The next screen point to the folder path where the snapshots
of the publications will be kept by the snapshot agent, we will keep the
default value and click next.
![Image 4]()
The Next screen configures the Distribution database its
data (.MDF) and Log (.ldf) files. Bear in mind once the distribution has been
configured on a server, the system databases will have an additional database
added to it “Distribution”. Click next.
![Image 5]()
![Image 6]()
On clicking next it brings you to the screen where you can
add all the servers which will be the publishers and use the currently being
configured distributor to distribute its publications. By default the current
server will be added as the publisher once could add more servers. Click next.
![Image 7]()
Click next on the below screen and proceed.
![Image 8]()
This will bring you to the last screen which will have the
summary of the configurations. Click finish to complete the Distributor
configuration.
![Image 9]()
Alternatively please watch the distribution configuration video below.
Transactional
Replication
In the
transactional replication the transactions occurring on the published articles
from the publisher are forwarded on to the distributor who in turn replicates
the same and commits them on the subscribers. Subscribers can use this data for
read only purposes. As transactions are small items to publish the latency for
transactional replication is very low. As far as the autonomy is concerned as
the data is read only type each of the subscribers cannot update the data and
hence there is absolutely no autonomy in this type of replication.
Suppose
there is a ticket booking web site, all the tickets booked are centrally stored
in the database hosted at New Delhi. There are distribution centers in every
city in the country where the bookings are received and the booked ticket
shipped at the addresses provided. All the tickets booked from Hyderabad needs
to be shipped to the respective customers. The Hyderabad
distribution center could setup a filtered (get bookings for Hyderabad only)
transactional replication so that every new booking (transaction) is replicated
to their center with minimal delay (almost immediately). They need a read only
access to the replicated data so transactional replication fits the bill. They
could dispatch the booked ticket ASAP with transactional replication setup.
Key facts of
transactional replication:
- As replication
happens on a transaction, the latency of replication is very low.
- The subscription
is read only, hence there almost no autonomy for the subscribers.

Please follow the configuration video here.
In the video we have seen something
like below: The Publisher (Server 1) of the article was acting itself as the
dristibutor and the servers 2 & 3 subscribed the published aritcle.
Scenario:
Let’s consider a
slightly complex scenario which I have faced sometime back. Consider a
hypothetical scenario, as per the diagram below, there are 2 servers both
acting as a publisher and distributor themselves and they are publishing an
article which consists of a table say “Student”. Server 1 publishes records of
class 10 students in the table “Student” and Server 2 publishes class 12 student
details. The schema and other details of the student table are consistent and
identical but the data contained is not the same. Now the server 3 (which is
the server of the school’s principal) needs read only data from both these
server into its student table i.e. the student table at the principal’s end
should have data from class 10th and 12th both.

To
see the transactional replications setup configurations for this scenario
please see the video above.
Transactional
Replication with Updatable Subscriptions
This is
similar to the transactional replication with the additional capability for the
subscribers to be able to update the published articles. This way there is a
gain in the autonomy as the subscribers can update the existing data, the
latency is maintained at the same level. In most of the scenarios this
considered to be the best solution. This setup uses the two-phase commit
process to keep the Publisher/Subscriber in sync. The two phase commit process
uses the MSDTC so that the changes are made simultaneously at the
publisher-subscribers end. So before setting up this kind of replication it’s
important to have the MSTDC up and running.
To activate
the MSDTC, In Windows Explorer -> Start -> Control Panel ->
Administrative Tools -> Component Services.
In Component Services, open Services. In the services list, you will find the
item on the DTC. Activate it! Once the DTC is activated your object explorer
will show DTC as green and runnable as shown in the below screen shot.
This kind of
setup is extremely useful for something like railways reservation system with
multiple reservation centers. Bookings are replicated transactionally all
across to all the subscribers (centers) and also at the publishers (primary
server) end. The synchronization takes place all across on a transaction and
not on the whole of the article. But what if a transaction happens on the same
record at the same time at the subscriber1 and subscriber2 and publisher, in
this case there is a conflict resolution that is specified at the time of
setting up of the replication. Example the setup makes the publisher
transaction winner in case of any conflicts. So the Publisher’s change will be
the final one and will be replicated across all the subscribers in case of conflicts.
To
see the transactional replications with immediate updating subscribers setup configurations
please see the video below.
Snapshot
Replication
Snapshot
replications as the distribution method moves the entire copy of published
articles throught the distributor to the subscriber. This type of replication
method provides the subscribers with a very high autonomy. That means that they
can update or changes their local copy of subscribed data without causing any
changes on the publishers data. The next time the data gets syncronized the
entire snapshot gets overwritten. Latency for such configuration setup is also
high because as the entire publication gets syncronized. This kind of
replicatons mainly finds use in OLAP servers. The data for OLAP might be pulled
every week or fortnight and would be readonly for reporting and analytical
processing purposes.
To
see the snapshot replication setup configurations please see video below.
Merge
Replication
Merge
replication allows each of the subscribers to edit their piece of subscriptions
independently and at some point these changes are merged together and synced
amongst all the subscribers and publisher on the whole. This seems quite
complex isn’t it? It is, but with wise configuration, this setup could be a
real asset as this solution provides the highest level of autonomy to each of
the subscribers. This setup requires a very wise conflict resolution. Suppose a
record is updated with value 10 by a subsA and the same record gets updated to
5 by subsB and also the publisher updates it to 15 now changes from both the
subscribers will be merged followed by the publisher’s change. In this case the
publisher’s change wins. For changes involving only the subscribers we need to
have a conflict resolution in place, example it has been configured that for
only subscriber’s changes SubsA wins.
To
see the merge replication setup configurations please see video below.
Conclusion
While a
subscription is configured they are either created as PULL or PUSH
subscriptions. Push subscription means the control for syncing the changes
falls on the shoulders of the centralized distributor. This configurations
helps when the intentions is receive the changes whenever there is a change on
the publisher side. Moreover since this is a centralized way it’s helpful as
all the subscribers to be updated by the distributor itself.
Pull subscriptions
are designed to be triggered from the subscriber’s side. The subscriber pulls
the data from the publisher on need basis or can conveniently schedule the job
to run on its own discretion without depending on the publisher. This is also a
good way of doing things for system which are not constantly connected to the
system. Hence it helps do away from unnecessary push jobs even when the
subscriber is not connected. Rather whenever the subscriber gets connected to
the system it simply pulls the data and syncs up.
For either
of the subscriptions there are few involved agent jobs which take care of
syncing things up. They are,
Log
reader Agent: The
log reader agent sits on the distribution server with the entrusted job to
constantly monitor the transactions logs of the published databases which uses
this distributor. Whenever there a transaction happens on the database which
acts as the publisher, the log reader agent stores the transaction in its
Distribution system database and further these changes are forwarded to the
respective subscribers via distribution agent or merge agent.
Distribution
Agent: The
distribution agent is responsible for moving the stored transaction from the
distribution server to the subscribers.
Snapshot
Agent: For any replication
type this agent is responsible for copying the initial snapshot (i.e. complete
schema and data) of the publication from the publisher to the subscribers. This
forms the base after which the depending upon the kind of replication
configured further changes are copied to the subscribers.
Merger
Agent:This agent is
responsible for syncing all the changes across the subscribers and publisher.
Queue
Reader Agent: This
agent is used for updating publishers/subscribers replication where all the
changes are queued up at the distributor’s end and are reapplied on the
publications.
These are
the SQL Server agent job which gets created upon configuring replication. This
is a sincere try to explain replication briefly and clearly. Your comments on
its improvement are dearly welcome.