This week I very quickly put together a simply database of some fifty tables and about three hundred columns all told, with every table having at least one foreign key. Then I noticed some spelling mistakes and or typos in some table and column names, and this inspired me to develop a tool that can perform a spelling check on all identifiers used in all database objects.
This is no mean feat. My rought draft strategy is to
1. Build a list of all identifiers in the database. To start with, I will limit this to table, column, view, and key and index names.
2. Use patterns and heuristics to split identifiers into word parts and ignore parts such as 'FK_' etc.
3. Check spellings of split identifiers, and those not split, and mark possible errors.
4. Where an error is e.g. found and corrected in a column name, correct all other occurrences of that column name in the DB.
I know none of these is simple, but I think this could make an interesting hobby project and an article or two, if I start simple and incrementally make the app more intelligent.
What warnings, criticisms, suggestions, or general comments do you have?
Lobster Thermidor aux crevettes with a Mornay sauce, served in a Provençale manner with shallots and aubergines, garnished with truffle pate, brandy and a fried egg on top and Spam - Monty Python Spam Sketch
I'll be using a set of algorithms to split names in various ways, hopefully into separate English words, as my example above would do quite nicely. Then I'll have a configurable set of standard non- or semi-English tokens that can be checked for deviation from their definitions, and good old ignore lists.
I would check for consistency first -- if a token is repeated many times, then it is probably what you meant and a token with a small Levenshtein Distance from it is probably misspelled.
On one project I decided that (by golly) I was going to spell "threshold" with three Hs (threshhold), as it's pronounced! I later changed it back, but before doing so I'd want the dozen "threshhold"s to outweigh an outlying "threshold".
Likewise, an incorrectly named item could be a valid word.
I've got a query going out to a Cache database via ODBC.
It almost always works (90% or so).
The other 10% of the time, it returns the wrong data for one of the columns (a LONGVARCHAR column)
By wrong, I mean the data is from another row.
At first, I though maybe there was too much text in the column and it was trashing a buffer or something, so I cahange my parameter from a VarChar to Text, but it made no difference. I just retried it again now without specifying the datatype at all, and it made no difference.
Further investigation showed that the correct data was only 1600 characters, so that isn't going to be the problem, anyway.
I've even cleared the parameters for each loop of the read and it still does it.
I just now tried to do a select specifically on one of the bad rows and it was still bad, so ti has to be something going on with ODBC or the remote server. Right?
Usually when I have problems like this, I first try and eliminate something like ODBC. Is there any way to connect natively to the database. Preferably by running something on the server that is hosting the database? If you can do something like that, it will help to eliminate whether the database itself is doing something unusual or unpredictable. As far as ODBC goes, can you try other drivers and duplicate the issue still?
I am Canadian. [heard in a local bar]
In theory there is no difference between theory and practice. In practice there is. [Yogi Berra]
posting about Crystal Reports here is like discussing gay marriage on a catholic church’s website.[Nishant Sivakumar]
Could you post the query? Be aware that Cache ::pinch of salt over the shoulder:: isn't SQL-92 compliant and the worst thing (in my opinion) is that it doesn't support operator precedence and that could be a factor here.
When I have used it (against my will) I used an ADO.net connector when I could, but prod used ODBC. I don't recall any differences.