My team and I have been tasked with rewriting a VB6 application in .NET. The current logic takes a fixed-format file, performs some validation on each record, and then loads a hierarchy of database tables one record at a time. This is incredibly slow: it currently takes 30 minutes to load 45k records. There are a couple of design issues we need to overcome. The first is how to bulk load the database. In the illustration below, each fixed file has a file header and file footer record, a batch header and batch footer, and one to many detail records:

File Header
Batch Header
Detail
Detail
:
Detail
Detail
Batch Footer
File Footer

The second challenge is dynamic input file formats. Currently, we process one fixed file format. We want to be able to process at least three other file formats (2 fixed and 1 XML). What management wants is for the file format to be recognized from its name (i.e. file100f.txt or fileXML.txt) and the appropriate "format" pulled from a database table. That way, if we need to change a format or add a new one in the future, it's a simple database change. I'm thinking the solutions to both issues are interconnected.
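To make the second point concrete, this is roughly the kind of lookup we have in mind - the FileFormatDef table, its columns, and the naming patterns below are just illustrations, not an existing schema:

using System.Data.SqlClient;

public static class FormatLookup
{
    // Sketch only: FileFormatDef, NamePattern, FormatDefinition and Priority are invented names.
    public static string GetFormatDefinition(string connectionString, string fileName)
    {
        const string sql =
            @"SELECT TOP 1 FormatDefinition
              FROM   FileFormatDef
              WHERE  @fileName LIKE NamePattern   -- e.g. '%100f.txt' or '%XML.txt'
              ORDER  BY Priority";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@fileName", fileName);
            conn.Open();
            return (string)cmd.ExecuteScalar();   // the stored layout (fixed-width spec, XSD name, etc.)
        }
    }
}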
Any help would be appreciated.
Max W.

And people mock COBOL - which is really good at things like this :-)

Have you thought about (in the case of the fixed file formats) uploading them 'raw', i.e. to a staging table in the DB, and then using DB tools to load your more permanent hierarchy? (It's hard to know where your bottleneck is from the info given.)
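Something along these lines for the 'raw' upload - SqlBulkCopy into a staging table of plain lines (the StagingRecord table and its columns here are just an illustration, not a real schema):

using System.Data;
using System.Data.SqlClient;
using System.IO;

public static class StagingLoader
{
    // Sketch only: the StagingRecord table and its columns are invented for the example.
    public static void LoadRawFile(string connectionString, string path)
    {
        var table = new DataTable();
        table.Columns.Add("LineNumber", typeof(int));
        table.Columns.Add("RawLine", typeof(string));

        int lineNo = 0;
        foreach (string line in File.ReadLines(path))
            table.Rows.Add(++lineNo, line);

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "StagingRecord";
            bulk.BatchSize = 5000;        // send rows in chunks rather than one at a time
            bulk.WriteToServer(table);    // a stored proc can then build the hierarchy from staging
        }
    }
}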

I've played a lot with external file format definitions over the years, and these days I'd be more inclined to use something like BizTalk to handle the ETL stuff - otherwise you're going to end up writing a system that looks like BizTalk anyway.

[updated]
I'll add some more thoughts to this - one is, have a look at http://www.unidex.com/overview.htm[^] - if you had a pre-step that translated your XML files to 'flat' files, then you could treat them all the same - a description format like xflat's definitions can be used to hold the 'definition' of the files ... I've seen (even had) a book on this sort of thing, using something like an xsd descriptor file to process full EDI files - I'll see if I can find the name.
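The pre-step could be as simple as the sketch below - flatten the XML input into the same line-per-record layout the fixed files use. The element names and field widths here are pure guesses about what your XML input might contain:

using System.IO;
using System.Xml.Linq;

public static class XmlFlattener
{
    // Sketch only: 'Detail', 'Account' and 'Amount' are invented element names.
    public static void Flatten(string xmlPath, string flatPath)
    {
        XDocument xml = XDocument.Load(xmlPath);
        using (var writer = new StreamWriter(flatPath))
        {
            foreach (XElement detail in xml.Descendants("Detail"))
            {
                // pad each field to a fixed width so the downstream loader sees one format
                writer.WriteLine(
                    (((string)detail.Element("Account")) ?? "").PadRight(12) +
                    (((string)detail.Element("Amount")) ?? "").PadLeft(10));
            }
        }
    }
}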

[update 2]
See if you can find a copy of "Using XML with Legacy Business Applications" by Michael C. Rawlins (no eBook, sorry). If you adopt a pipeline processing approach, converting all XML files to 'flat', then you can have the same upload process for everything - after that you just need to work out how to speed up the load into your hierarchy - maybe you just need to look at transactions etc.
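On the transaction point: even if you keep per-record inserts, wrapping a whole file in one transaction instead of auto-committing 45k individual statements can make a big difference. A rough sketch, where the proc name and record fields are invented:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public class DetailRecord
{
    public int BatchTempId { get; set; }
    public string RawLine { get; set; }
}

public static class HierarchyLoader
{
    // Sketch only: usp_InsertDetail and the DetailRecord fields are invented names.
    public static void LoadDetails(string connectionString, IEnumerable<DetailRecord> details)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tran = conn.BeginTransaction())
            {
                foreach (var detail in details)
                {
                    using (var cmd = new SqlCommand("usp_InsertDetail", conn, tran))
                    {
                        cmd.CommandType = CommandType.StoredProcedure;
                        cmd.Parameters.AddWithValue("@BatchTempId", detail.BatchTempId);
                        cmd.Parameters.AddWithValue("@Payload", detail.RawLine);
                        cmd.ExecuteNonQuery();
                    }
                }
                tran.Commit();   // one commit for the whole file
            }
        }
    }
}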
 
LOL. I started my career as a COBOL programmer a long time ago. I actually did think of that solution and I'm glad you brought it up - it's reassuring to know someone else arrived at the same idea.

So my initial solution builds each record with temporary reference numbers. The File has a reference number which is carried by each Batch. In turn, each Batch has a temporary number that is carried by each Detail record. And for good measure, each Detail record has a temporary record ID.

In this way, I don't carry the hierarchy as classes contained within classes. I load the records into an XML message and then push the entire file up to the database, where a stored procedure does the inserts based on the temp IDs and assigns the permanent IDs.
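Roughly like this - the element names and usp_LoadFile are placeholders for the sketch, not our real schema or proc:

using System.Data;
using System.Data.SqlClient;
using System.Xml.Linq;

public static class FileUploader
{
    // Sketch only: element names and usp_LoadFile are placeholders.
    public static void Upload(string connectionString)
    {
        var doc = new XDocument(
            new XElement("File", new XAttribute("TempFileId", 1),
                new XElement("Batch", new XAttribute("TempBatchId", 1),
                    new XElement("Detail", new XAttribute("TempDetailId", 1), "raw detail line 1"),
                    new XElement("Detail", new XAttribute("TempDetailId", 2), "raw detail line 2"))));

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("usp_LoadFile", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@FileXml", SqlDbType.Xml).Value = doc.ToString();
            conn.Open();
            cmd.ExecuteNonQuery();   // the proc shreds the XML, inserts by temp IDs, assigns permanent IDs
        }
    }
}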

Is there a better way though? How does Entity Framework apply "updates"? Can I build the hierarchy and then say update? Any other thoughts?

Thank you for the response
 
Comments
Garth J Lancaster 22-Aug-14 20:32pm    
There's probably 'lots of ways' :-) but I don't get to spend time on Entity Framework, so I can't vouch for it, except to say that you'd be going from file -> object hierarchy -> db, and possibly adding another potential bottleneck. Depending on what your DB is doing the rest of the time, doing 'some of the work' in the DB layer may make sense, which is where I was heading - and if you don't need the data in real time, you could schedule the grunt work for overnight / when real-time daily demands on the db are low ... you'll see I added a couple of updates to my original post btw.

Last thought - if you can get the files into a suitable state for SSIS or a DB batch loader to use, then you'll get them into 'staging' much quicker.
Garth J Lancaster 22-Aug-14 20:37pm    
God, I hate my brain sometimes .. you should, of course, design a logging process so you can track your files from the moment they enter your company/system through to the final process - that way you know where they are at each step, in case something goes wrong. We wrote log records to a db table from all the processes in the pipeline and then had a simple html front-end & email process so we could track files - also good for creating reports for mgmt. on throughput, and for cross-checking files you expect (or identifying files that should have been sent but didn't arrive, etc.)
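Something as simple as the sketch below is usually enough - the FileProcessLog table and its columns are invented for the example:

using System.Data.SqlClient;

public static class FileTracker
{
    // Sketch only: FileProcessLog and its columns are invented names.
    public static void LogStep(string connectionString, string fileName, string step, string status)
    {
        const string sql =
            @"INSERT INTO FileProcessLog (FileName, Step, Status, LoggedAt)
              VALUES (@FileName, @Step, @Status, GETUTCDATE())";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@FileName", fileName);
            cmd.Parameters.AddWithValue("@Step", step);      // e.g. "Received", "Staged", "Loaded"
            cmd.Parameters.AddWithValue("@Status", status);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}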
Member 11030334 22-Aug-14 21:58pm    
Thank you Garth for all the advice. We'll go ahead and put some file tracking in there.
