|
The way spirit is mixed with drug I gather. (6)
"I didn't mention the bats - he'd see them soon enough" - Hunter S Thompson - RIP
|
|
|
|
|
I feel like Structured Query Language was designed by someone who felt like everything should be *work* or it's not worth the effort.
The syntax is inexplicable. Not so much designed by committee as designed by a committee on a bender in 'Vegas. It's impossible to remember without using it every day, and that prospect makes me want to bash my head bloody on the nearest hard surface.
Plus, doing simple things you'd expect to be able to do in other languages, like type conversion or, heaven forbid, string manipulation, is like pulling teeth.
Now I know there are reasons for *some* of this, but most of it just seems like it was a result of throwing things at a wall to see what sticks.
Real programmers use butterflies
|
|
|
|
|
You just uncovered a deep truth. But shush, there are those who would burn you at the stake for casting light on the shadow of the arcane arts! ....
|
|
|
|
|
My next article will give SQL people fits, I'm sure.
|
|
|
|
|
Unless it gives you the fits first.
Doing advanced string manipulation in SQL-Server is a sure sign of masochism.
|
|
|
|
|
That's why I'm writing a tool to generate SQL for the regex matching. I at least only have to write it once. =)
|
|
|
|
|
In many cases you won't need to write it at all.
Oracle, Postgres, DB2 and even MySQL have full support for regex. Of the big ones, it's basically SQL-Server that's the exception.
And for SQL-Server I wrote a CLR function to take care of that (REGEXP_INSTR, REGEXP_REPLACE and REGEXP_SUBSTR, as in most other implementations).
And yes, you can also write CLR for Linux, since Linux works fine with .NET Core.
|
|
|
|
|
But can they tokenize? Most regular expression engines leave that out.
Also, is their regex a different flavor for every DB vendor? If so, they may as well not have it at all, because for real-world regular expressions, reliably translating one flavor to another is a bear.
The code is nasty. I've tried. It's honestly easier just to generate matching code that is consistent across platforms.
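To illustrate the difference between matching and tokenizing: a tokenizer walks the whole input and labels every lexeme, including the ones that match nothing, instead of just answering "does it match?". A minimal sketch in Python using the standard `re` module's named-group alternation; the token names are hypothetical and this is not the code Reggie generates.

```python
import re

# Hypothetical token set for illustration; each alternative is a named group.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("WS", r"\s+"),
    ("ERROR", r"."),  # anything else becomes an error token instead of being skipped
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value, position) for every lexeme, dropping whitespace."""
    for m in MASTER.finditer(text):
        kind = m.lastgroup  # name of the alternative that matched
        if kind != "WS":
            yield kind, m.group(), m.start()
```

For example, `list(tokenize("x = 42 + y$"))` yields `[('IDENT', 'x', 0), ('OP', '=', 2), ('NUMBER', '42', 4), ('OP', '+', 7), ('IDENT', 'y', 9), ('ERROR', '$', 10)]`. Note that Python's alternation takes the first alternative that matches rather than the longest one, so the order of `TOKEN_SPEC` matters; that is exactly the kind of flavor difference that makes porting patterns between engines painful.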
|
|
|
|
|
|
It depends on what you are doing. For column validation, in most cases you wouldn't need it.
However, if you intended to, say, store chunks of JSON in your database (a surprisingly common practice these days), you could use a tokenizer (along with a parser generated by Norm targeting SQL) to validate or even normalize the JSON content, so that invalid JSON is rejected rather than stored.
Typically this is done in the middleware, but in theory such validation should (where possible) take place in each of the "three tiers", including the database, for maximum assurance of data integrity.
Another use might be searching long structured text for particular things, such as finding all <B> tags in an HTML document and yielding them as a table, which could be useful for content management systems.
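The "<B> tags as a table" idea can be sketched in a few lines. This is a rough Python illustration using a plain regex, not a real HTML parser: it assumes well-formed, non-nested `<b>` elements and ignores comments and CDATA. The function name `bold_rows` is hypothetical; each returned tuple plays the role of one result row.

```python
import re

# Non-greedy match of <b ...>...</b>, case-insensitive, allowed to span newlines.
# "\b" after "<b" keeps <br> and <body> from matching.
BOLD = re.compile(r"<b\b[^>]*>(.*?)</b\s*>", re.IGNORECASE | re.DOTALL)


def bold_rows(html):
    """Return one (row_id, offset, inner_text) tuple per <b> element."""
    return [
        (row_id, m.start(), m.group(1))
        for row_id, m in enumerate(BOLD.finditer(html), start=1)
    ]
```

For example, `bold_rows("<p>a <B>one</B> and <b class='x'>two</b></p>")` returns `[(1, 5, 'one'), (2, 20, 'two')]`.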
|
|
|
|
|
Do. Not. Do. String manipulation. In. SQL.
Having to do so is a sign that you have done something wrong upstream.
You know this.
|
|
|
|
|
There's a very legitimate use case for it when you want to validate denormalized string fields like phone numbers if they're not stored numerically just as an example - in practice databases are messy in the real world often with fields that are denormalized lists of ints, json data, xml, etc.
Doing string manipulation in a database function or stored procedure allows you to reject data that comes in in an invalid format before it gets into the database.
Particularly these days, it's popular to store denormalized JSON in the database, for better or worse.
With Norm and Reggie, you can target parsing code to SQL that will allow you to validate or normalize your string data at the database level.
The point isn't performance; it's security and data integrity. You wouldn't want to run something like this as part of a query. You'd primarily use it on data updates, where performance can be less of an issue depending on the scenario.
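The validate-or-normalize step on update boils down to: parse the incoming string, then either store a canonical form or refuse the write. A minimal Python sketch of that logic, standing in for what a generated SQL parser would do inside a stored procedure. The function and exception names are hypothetical, and "normalize" is assumed here to mean sorted keys and no insignificant whitespace.

```python
import json


class RejectedValue(ValueError):
    """Raised when incoming text should not be stored."""


def _reject_constant(name):
    # json.loads accepts NaN, Infinity and -Infinity by default;
    # strict JSON does not, so refuse them.
    raise RejectedValue(f"non-standard JSON constant {name}")


def normalize_json_or_reject(text):
    """Return a canonical form of text (sorted keys, compact separators),
    or raise RejectedValue if it is not valid JSON."""
    try:
        value = json.loads(text, parse_constant=_reject_constant)
    except json.JSONDecodeError as exc:
        raise RejectedValue(f"invalid JSON at position {exc.pos}: {exc.msg}") from exc
    return json.dumps(value, sort_keys=True, separators=(",", ":"))
```

For example, `normalize_json_or_reject('{ "b": 1, "a": [1, 2] }')` returns `'{"a":[1,2],"b":1}'`, while `'{"a": }'` or `'[NaN]'` raise `RejectedValue`, so the update never happens.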
|
|
|
|
|
But you should do that up-front, before it hits the database. Or, more likely, when populating a landing/staging table. With what I'm doing now (mostly ETL), we land the data and then go through doing stuff like that, making IP addresses binary, splitting FQDNs into their parts, checking dates, etc.
One of the things I challenge you to do in SQL is take big-endian DNS names and reformat them as little-endian FQDNs. Examples:
.com.acme.southwest.anvils -- anvils.southwest.acme.com.
.com.widgetco.sales -- sales.widgetco.com.
This is something I need to do and I use a CLR function to do it. It's more complex than these examples.
(This is not a programming question.)
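For the simple examples above, the core transformation is split, reverse, join. A minimal Python sketch, assuming exactly the format shown (one leading dot, plain dot separators, no escaped dots inside labels). The real data is described as more complex, so this covers only the basic step, and the function name is hypothetical.

```python
def big_endian_to_fqdn(name):
    """Turn '.com.acme.southwest.anvils' into 'anvils.southwest.acme.com.'."""
    body = name[1:] if name.startswith(".") else name
    labels = body.split(".")
    # Reject empty labels (e.g. '.com..acme' or '') instead of guessing.
    if any(label == "" for label in labels):
        raise ValueError(f"empty label in {name!r}")
    # Reverse the label order and add the trailing root dot of an FQDN.
    return ".".join(reversed(labels)) + "."
```

Rejecting empty labels rather than silently dropping them is a design choice: in an ETL landing table, a malformed name is probably something you want flagged, not quietly "fixed".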
|
|
|
|
|
Of course you "should" - there are a lot of things that people "should" and "shouldn't" do that don't actually happen the way they "should" in the real world.
The result is almost always a database that accepts denormalized data for some of its content.
Over a long enough timeline / large enough project this is an inevitability.
That means in an enterprise sized application of any significant age, you're dealing with denormalized data.
Furthermore, in some cases with an RDBMS it is more efficient to accept denormalized data and normalize it in a stored proc to get results based on it. A good example is passing an array of integer cache ids so you only get particular rows from the database (such as when updating a stale list). There is simply no good (standard SQL) way to pass a small array of ints to the database, so 9 times out of 10 you'll see a procedure that takes a string, or even a varbinary, with the data encoded in it. This is far more efficient than updating some kind of local session table that you use to pass ids to your routines (the other way to get a list of rows by ids).
With a tool like this you can harden that, and the bottleneck is the network, not DB CPU, and not row locks (unless you're using the table-based version, which I actually don't recommend).
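The two encodings mentioned, a delimited string and a varbinary, can be sketched from the client side. A minimal Python sketch, assuming a comma-separated string with no spaces and a varbinary of fixed-width 4-byte big-endian signed ints; these formats and function names are illustrative choices, not a standard. The strict parse is the "hardening" part: input that doesn't fit the grammar is rejected outright rather than half-parsed. Whether your database converts those 4-byte chunks as big- or little-endian is something to check rather than assume.

```python
import re
import struct

# Strict grammar for the string form: digits separated by single commas, no spaces.
_ID_LIST = re.compile(r"\d+(?:,\d+)*", re.ASCII)


def parse_id_list(text):
    """Parse '3,17,42' into [3, 17, 42]; reject anything else outright."""
    if text == "":
        return []
    if _ID_LIST.fullmatch(text) is None:
        raise ValueError(f"malformed id list: {text!r}")
    return [int(part) for part in text.split(",")]


def pack_ids(ids):
    """Encode ids as consecutive 4-byte big-endian signed ints (varbinary form)."""
    return struct.pack(f">{len(ids)}i", *ids)


def unpack_ids(blob):
    """Decode pack_ids output, rejecting blobs that aren't a whole number of ints."""
    if len(blob) % 4 != 0:
        raise ValueError(f"blob length {len(blob)} is not a multiple of 4")
    return list(struct.unpack(f">{len(blob) // 4}i", blob))
```

The fixed-width binary form is also why the varbinary variant is attractive: the procedure can slice it into 4-byte chunks without any delimiter scanning.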
I'm not sure about reformatting your domain names. They're easy enough to parse, but I don't know enough about the allowable formats to discern the logic for it without delving into RFCs.
|
|
|
|
|
PIEBALDconsult wrote: But you should do that up-front, before it hits the database.
True, we should. But in the real world, this doesn't necessarily happen. Like when dealing with a legacy Oracle system written in C++ (appears to be VS97 or VS6), where the DB design defies rational logic. Or any logic.
My queries have as many as 20 nested REGEXP_REPLACE statements to make sense out of the mess that some fields present. Ugly as sin, but it gets the job done. I'm counting the days until the replacement system goes online!
|
|
|
|
|
honey the codewitch wrote: There's a very legitimate use case for it when you want to validate denormalized string fields like phone numbers if they're not stored numerically just as an example - in practice databases are messy in the real world often with fields that are denormalized lists of ints, json data, xml, etc.
In the real world, normalization means something different. I go to BCNF.
honey the codewitch wrote: Doing string manipulation in a database function or stored procedure allows you to reject data that comes in in an invalid format before it gets into the database.
That means you're storing it wrong. There's no format in storage.
"Normalization" might be different from your idea. Go look it up
--edit
I didn't see your name; I didn't intend to challenge you. Still, explain to me: is that normal(ization) to you?
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
|
|
|
|
According to a cursory google search:
> Normalization or normalisation refers to a process that makes something more normal or regular.
That's what I'm referring to.
Normal, as in regular, as in consistently structured, as in I can impose a structure on it.
Further googling it sounds like there's a technical use for it that has to do with databases specifically, and how to impose some sort of notion of a "standard" order over the mess that is an RDBMS. In practice I can see why nobody cares.
Which is probably also why I don't care about that, as nobody has ever paid me to care about it. If I can star cluster my DB and get it to perform, nobody cares about BCNF, at least they never have in my 20+ years of development.
|
|
|
|
|
honey the codewitch wrote: Which is probably also why I don't care about that, as nobody has ever paid me to care about it. If I can star cluster my DB and get it to perform, nobody cares about BCNF, at least they never have in my 20+ years of development.
The fact that you had to look it up says enough.
And, well, I do.
|
|
|
|
|
If someone pays me to care about it I will. I promise.
|
|
|
|
|
Avoid using user-defined SQL functions in a SQL query. They absolutely destroy query performance. Just get the data, return it to the calling app, and let the app do the validation/correction (C#/C++ is much better for that than SQL). If you want to get really fancy, you can perform the validation and re-save the corrected data, so that the next time you run the query it returns fewer invalid records.
".45 ACP - because shooting twice is just silly" - JSOP, 2010 ----- You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010 ----- When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013
|
|
|
|
|
I didn't implement them as functions, but I could have. They are stored procedures in order to discourage their use in views and such.
This isn't for performance. This is for data integrity. Often those two things are somewhat at odds, meaning there are many situations where you have to rob Peter to pay Paul.
|
|
|
|
|
I don't know SQL but loved your rant nevertheless. And a too.
|
|
|
|
|
Greg Utas wrote: I don't know SQL
You're a fortunate soul. I've worn many hats in my development lifetime. "DBA" was one of my least favorite.
|
|
|
|
|
I think you just outed where in the US you're originally from with that title. Ssshhh, don't leak our secret
|
|
|
|
|
If you're gauging based on the title, I'm probably not where you think I'm from, but it means what you think it does. I also use various Britishisms like "whilst".
I steal vernacular from wherever. I'm a thief like that.
|
|
|
|