|
Can some one explain to me what a Staging area is ?
And I also want to know whether it is a database, a file, or something else altogether..?
|
|
|
|
|
Here are some answers: What is a staging area? Do we need it? What is the purpose of a staging area?[^].
Right now I'm using staging tables to temporarily store data imported from an outside source before pushing it to production so that it can be verified in isolation and altered as required.
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.
nils illegitimus carborundum
me, me, me
|
|
|
|
|
So you are using a staging area which is in between development and production phases. You said this staging area was a set of temporary tables. Can it be a whole separate database by itself? And if it is a database how do we make it temporary(as in what is its scope)?
And whatever operations you do on the staging area do you write the code for it or generate it using code generators?
PS: Don't mind the long question?
|
|
|
|
|
No, I said the data was temporarily stored: the tables are a permanent fixture and are used to transition that data before it gets exposed in the production tables. For my purposes it's okay for the tables to reside in the prod database but that might not suit you. There is no reason it can't be a wholly separate database: that's up to you and the requirements.
All of the code is either in stored procedures and/or c#. The usual scenario is that the tables are truncated and (in my case) filtered data is dumped into them and then other processes may work on them and then the data is moved to the primary tables.
Note that this is just how our solution works for us: it may not be right for you. Use anything like this with extreme caution, especially near production data.
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.
nils illegitimus carborundum
me, me, me
|
|
|
|
|
Aha that gives a lot of clarity....So these staging tables ...are they exact replica of production tables or do we have to follow some separate design principles to create them?
|
|
|
|
|
In my case they are straight copies of the production table but with all indexes, etc removed. Again, that suits my purpose. May not suit yours.
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair.
nils illegitimus carborundum
me, me, me
|
|
|
|
|
We do it somewhat different to Mark, our staging table are exact replicas of the source data, this allows us to use Bulk Copy when loading the data. It also insures there are no errors in the Load part of the ELT process.
We then use stored procedures to do the transforms to the production format (these may also be staging tables in some solutions). The major benefit is that you are doing any transforms on a limited data set and it is easy to debug the procedure. It then becomes a simple matter to dump in the staging data to the production tables.
I find a well crafted ELT process leave any ETL application in the dust. There are some ETL apps that this may not apply to but the cost is extraordinary.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
Mycroft Holmes wrote: our staging table are exact replicas of the source data, this allows us to use
Bulk Copy when loading the data. It also insures there are no errors in the Load
part of the ELT process.
I do the same thing, works great plus if there are any questions about the source data you already have a loaded copy in a table to query against.
Common sense is admitting there is cause and effect and that you can exert some control over what you understand.
|
|
|
|
|
The second greatest scam after Y2K that the IT industry has ever perpetrated is ETL! Extract - Load - Transform simplifies and speeds up the entire process.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
Mycroft Holmes wrote: The second greatest scam
I have worked for some CIO's that have come up with way more atrocities than ETL or Y2K.
Mycroft Holmes wrote: Extract - Load - Transform simplifies and speeds up the entire process.
I think you can write bad code / processes in any language or tool. It's up to the person to find the best way to do it that works for them. Not sure about you but I have some amazingly fast SSIS packages that load a gig of data in seconds, but have seen some others that take hours doing a lot less data.
Common sense is admitting there is cause and effect and that you can exert some control over what you understand.
|
|
|
|