|
This week I very quickly put together a simply database of some fifty tables and about three hundred columns all told, with every table having at least one foreign key. Then I noticed some spelling mistakes and or typos in some table and column names, and this inspired me to develop a tool that can perform a spelling check on all identifiers used in all database objects.
This is no mean feat. My rought draft strategy is to
1. Build a list of all identifiers in the database. To start with, I will limit this to table, column, view, and key and index names.
2. Use patterns and heuristics to split identifiers into word parts and ignore parts such as 'FK_' etc.
3. Check spellings of split identifiers, and those not split, and mark possible errors.
4. Where an error is e.g. found and corrected in a column name, correct all other occurrences of that column name in the DB.
I know none of these is simple, but I think this could make an interesting hobby project and an article or two, if I start simple and incrementally make the app more intelligent.
What warnings, criticisms, suggestions, or general comments do you have?
|
|
|
|
|
You are taking all the fun out of debugging. The hours of pure joy spent trying to understand a bug just because my typing skills were dyslexic are boundless. And you want to take this away!
Serious note: good idea and an article would be a good read for sure.
Chris Meech
I am Canadian. [heard in a local bar]
In theory there is no difference between theory and practice. In practice there is. [Yogi Berra]
posting about Crystal Reports here is like discussing gay marriage on a catholic church’s website.[Nishant Sivakumar]
|
|
|
|
|
I would be very happy to read such an article.
Lobster Thermidor aux crevettes with a Mornay sauce, served in a Provençale manner with shallots and aubergines, garnished with truffle pate, brandy and a fried egg on top and Spam - Monty Python Spam Sketch
|
|
|
|
|
Are you going to look into only key fields or all field names, for us dyslexics who can spell ModifiedBy in multiple ways.
All fields would actually be simpler, build a distinct from sysobjects, then do some matching and soundex comparisons might be just as useful.
It is going to totally fai on some naming conventions but for us who use sensible naming it could be good.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
I'm going to do all columns. My naming conventions nearlly always use full names, e.g.
SecondaryEducation ExamNumber
SecondaryEducation HighestLevelId
SecondaryEducation HighestLevelOther
SecondaryEducation SchoolName
SecondaryEducation EnglishProficiencyLevelId
SecondaryEducation EnglishProficiencyInstitution
SecondaryEducationLevel Id
SecondaryEducationLevel Description
I'll be using a set of algorithms to split names in various ways, hopefully into separate English words, as my example above would do quite nicely. Then I'll have a configurable set of standard non- or semi-English tokens that can be checked for deviation from their definitions, and good old ignore lists.
|
|
|
|
|
I have some vb code around somewhere that split words fairly reliably, used to make DGV headers readable before MS built it into their DGV.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
Maybe you should article it.
There, how's that for a really dirty verbing?
|
|
|
|
|
That entire phrase belongs in the nasty buzzword forum.
It would barely make a tip IIRC. I'll try and dig it up if you feel you have a need.
Never underestimate the power of human stupidity
RAH
|
|
|
|
|
I've found quite a list of splitting algorithms, so no need for you to go digging for it now, but thanks anyway.
|
|
|
|
|
Brady Kelly wrote: into separate English words
I would check for consistency first -- if a token is repeated many times, then it is probably what you meant and a token with a small Levenshtein Distance from it is probably misspelled.
On one project I decided that (by golly) I was going to spell "threshold" with three Hs (threshhold), as it's pronounced! I later changed it back, but before doing so I'd want the dozen "threshhold"s to outweigh an outlying "threshold".
Likewise, an incorrectly named item could be a valid word.
|
|
|
|
|
PIEBALDconsult wrote: I would check for consistency first -- if a token is repeated many times, then it is probably what you meant and a token with a small Levenshtein Distance from it is probably misspelled.
True, but e.g. using the designer to add foreign keys, where a misspelled column or table name is then used in the key name and DDL.
PIEBALDconsult wrote: Likewise, an incorrectly named item could be a valid word.
I plan to present a list of potential misspellings, with options to add to dictionary, ignore, propagate correction, etc. eventually.
|
|
|
|
|
Aside: I just found a table named ReportGalley rather than ReportGallery ! (The team lead was in the Navy, maybe this is something he cooked up, but it certainly is a mess.)
|
|
|
|
|
Groan!
|
|
|
|
|