I have an odd problem and Google hasn't been all that helpful so far.
I have a project that I've been working on for my job. Essentially I am rewriting old-school VB6 code into more modern C# (.NET Standard etc.). I can't give too much in the way of details, but the short of my issue is this: the original code ingests a bunch of different Excel files, imports the data, and does stuff with it.
Unfortunately this data is notoriously unreliable. Fields will be missing or misspelled. Data will be invalid or missing in individual cells. Two or three very similarly named fields will be included (and naturally only one is valid). The format has also changed over time as new fields have been added or old fields removed, usually without anyone letting us know ahead of time.
It's damned aggravating, but unfortunately the entity giving us the input data isn't going to start giving us decent, consistent data any time soon. This causes a lot of problems deep in the code when something comes up that wasn't planned for.
I can easily read the data using technologies such as EPPlus and Dapper. But what I'd like to do is look over the spreadsheet BEFORE it gets into the system and validate its contents against what we're expecting. That way, if the spreadsheet contains bad data, we can refuse it and preferably prompt the user to let them know what they need to fix.
Presumably, to do this there would need to be some sort of schema describing what the data should look like. But the forms of these Excel documents are varied. I'm wondering if there are any libraries available out there that already do this, as I'd rather not have to build my own system from scratch.
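To make the question concrete, here's a minimal sketch of the kind of pre-import check I have in mind. The names (SheetSchema, Validate) are just illustrative, and the header strings here are stand-ins for values that would really be read with EPPlus (e.g. worksheet.Cells[1, col].Text across the first row):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical schema type: a list of columns the sheet must contain.
class SheetSchema
{
    public List<string> RequiredColumns { get; } = new List<string>();

    // Returns human-readable errors so we can show the user what to fix.
    public List<string> Validate(IEnumerable<string> actualHeaders)
    {
        var errors = new List<string>();
        var actual = new HashSet<string>(actualHeaders, StringComparer.OrdinalIgnoreCase);

        foreach (var required in RequiredColumns)
        {
            if (actual.Contains(required))
                continue;

            // Flag likely misspellings so the message is actually useful.
            var close = actual.FirstOrDefault(h => Distance(h, required) <= 2);
            errors.Add(close != null
                ? $"Column '{required}' is missing; did you mean '{close}'?"
                : $"Column '{required}' is missing.");
        }
        return errors;
    }

    // Plain Levenshtein edit distance, case-insensitive.
    static int Distance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;
        for (int i = 1; i <= a.Length; i++)
            for (int j = 1; j <= b.Length; j++)
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] +
                        (char.ToLowerInvariant(a[i - 1]) == char.ToLowerInvariant(b[j - 1]) ? 0 : 1));
        return d[a.Length, b.Length];
    }
}
```

So given RequiredColumns of "CustomerName" and "OrderDate" and a sheet whose header row reads "CustomrName", "OrderDate", the check would reject the file with a "did you mean 'CustomrName'?" style message rather than letting the bad data deep into the code. What I'm hoping for is a library that already does this sort of thing declaratively.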
Has anyone done anything like this?
What I have tried:
EPPlus and Dapper along with various extensions.