|
OriginalGriff wrote: the vitriol is strong ...
... in this one.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
I do the same thing with my code, for similar reasons.
Real programmers use butterflies
|
Best part is, it really catches them off-guard, and if they want to follow up with a legitimate question, they actually have to put the work in. Win-win.
[Edit]
Wow. This got downvoted?
Someone took offense.
modified 2-Nov-21 16:46pm.
|
You are soooo polite!
Get me coffee and no one gets hurt!
|
Unfortunately, if you read some of the posts first, and not the "message", you get the impression it may be programming (also). Maybe the message should add "unless you're a fixture".
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food
|
That sounds very hard. I'm not sure it's possible at all unless you use AI, an array of the fastest computers on Earth AND power the whole solution with a private nuclear plant...
Anything that is unrelated to elephants is irrelephant - Anonymous
The problem with quotes on the internet is that you can never tell if they're genuine - Winston Churchill, 1944
Never argue with a fool. Onlookers may not be able to tell the difference. - Mark Twain
|
Tsk, tsk!
This is the modern age: you have to use Green Energy from renewable sources or your ex-mates will ritually disembowel your iPhone*!
* Manufactured in a Chinese factory powered by coal plants using slave labour, from non-recyclable plastics, and shipped to you in diesel powered cargo ships and bought at an inflated price to replace the one the manufacturer slowed down "to save your battery life" just before the new one launched.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
If you need help with that and you're an adult, then you need to rethink your career.
If you're not an adult, then yes, that's where you start. Show us how far you got with the assignment.
Bastard Programmer from Hell
"If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.
|
Holy state machines, Batman!
The SQL code to make this work is ridic.
It took me hours, including relearning some lesser-used SQL commands. No weirdness like .NET in the DB.
Calling this
EXEC [dbo].[Example_MatchCommentBlock] @value= N'foo /* test //* bar /* baz *//* */ fubar'
Yields this resultset (positions are 0 based, unlike most DB string pos functions)
Position  Value                      Length
5         /* test //* bar /* baz */  27
32        /* */                      5
It works using SELECTs over state machine tables. I'm about to make the tokenizer rendition, and then finally, the ones that don't use tables, but are just SQL procs.
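For anyone trying to picture what matching over state machine tables amounts to, here is a minimal Python sketch of the idea. It is not the generated SQL. It assumes transition rows shaped like (state, min codepoint, max codepoint, to-state) plus a second set of rows for the block end. The names START_ROWS, END_ROWS, step, run and match_blocks are made up for illustration, and the two tiny machines only recognise /* and */.

```python
# Hypothetical transition rows: (state, min_codepoint, max_codepoint, to_state).
START_ROWS = [(0, ord("/"), ord("/"), 1), (1, ord("*"), ord("*"), 2)]
START_ACCEPT = {2}
END_ROWS = [(0, ord("*"), ord("*"), 1), (1, ord("/"), ord("/"), 2)]
END_ACCEPT = {2}


def step(rows, state, codepoint):
    """Look up the next state, or -1 when no row covers the codepoint."""
    for from_state, lo, hi, to_state in rows:
        if from_state == state and lo <= codepoint <= hi:
            return to_state
    return -1


def run(rows, accept, text, start):
    """Walk the machine from text[start] as far as it goes.

    Returns the index just past the walk if it stopped in an accepting
    state, otherwise -1.
    """
    state = 0
    i = start
    while i < len(text):
        to_state = step(rows, state, ord(text[i]))
        if to_state == -1:
            break
        state = to_state
        i += 1
    return i if state in accept else -1


def match_blocks(text):
    """Return (position, value, length) for each block, 0-based positions."""
    results = []
    i = 0
    while i < len(text):
        body = run(START_ROWS, START_ACCEPT, text, i)
        if body == -1:
            i += 1  # no block starts here, try the next character
            continue
        k = body
        while k < len(text):
            end = run(END_ROWS, END_ACCEPT, text, k)
            if end != -1:
                results.append((i, text[i:end], end - i))
                i = end
                break
            k += 1
        else:
            break  # assumption: an unterminated block is simply dropped
    return results


print(match_blocks("foo /* a */ bar"))
```

Python strings index by code point, so the positions come out as 0-based code point offsets without any surrogate handling. That part is exactly what makes the SQL version harder.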
Hooaah! *cracks knuckles*
Triple tier validation generation coming soon. wowza. I'm on fire today.
Real programmers use butterflies
|
Are you going to post the SQL so we can see how you did it?
|
Keep in mind this was generated by a tool, so the table names would be different for a different input specification file. The trick with this routine is filling the state tables properly.
The reason it's so nasty is that fetching the next UTF-32 codepoint in SQL is a pain in my backside.
CREATE PROCEDURE [dbo].[Example_Match] @value NVARCHAR(MAX), @symbolId INT
AS
BEGIN
    -- @valueEnd is one past the last UTF-16 code unit (SUBSTRING is 1-based)
    DECLARE @valueEnd INT = DATALENGTH(@value)/2+1
    DECLARE @index INT = 1
    DECLARE @ch BIGINT
    DECLARE @ch1 NCHAR
    DECLARE @ch2 NCHAR
    DECLARE @tch BIGINT
    DECLARE @state INT = 0
    DECLARE @toState INT = -1
    DECLARE @accept INT = -1
    DECLARE @position BIGINT = 0
    DECLARE @capture NVARCHAR(MAX)
    DECLARE @blockEndId INT
    DECLARE @result INT = 0
    DECLARE @len INT = 0
    DECLARE @done INT = 0
    CREATE TABLE #Results (
        [Position] BIGINT NOT NULL,
        [Value] NVARCHAR(MAX) NOT NULL,
        [Length] INT NOT NULL
    )
    -- Read the first codepoint; -1 marks end of input
    IF @index >= @valueEnd
    BEGIN
        SET @ch = -1
    END
    ELSE
    BEGIN
        SET @ch1 = SUBSTRING(@value,@index,1)
        SET @ch = UNICODE(@ch1)
        SET @tch = @ch - 0xd800
        IF @tch < 0 SET @tch = @tch + 2147483648
        -- @tch < 2048 means @ch1 is in the surrogate range 0xD800-0xDFFF:
        -- combine it with the next code unit into one UTF-32 value
        IF @tch < 2048
        BEGIN
            SET @ch = @ch * 1024
            SET @index = @index + 1
            -- input ends inside a surrogate pair
            IF @index >= @valueEnd RETURN -1
            SET @ch2 = SUBSTRING(@value,@index,1);
            SET @ch = @ch + UNICODE(@ch2) - 0x35fdc00
        END
    END
    WHILE @ch <> -1
    BEGIN
        SET @capture = N''
        SET @position = @index - 1
        SET @done = 0
        -- Walk the start expression (BlockEndId = -1) as far as it will go
        WHILE @done = 0
        BEGIN
            SET @done = 1
            SET @toState = -1
            SELECT @toState = t.[ToStateId]
            FROM [dbo].[ExampleState] AS s
            INNER JOIN [dbo].[ExampleStateTransition] AS t
                ON s.[StateId] = t.[StateId]
                AND s.[SymbolId] = t.[SymbolId]
                AND t.[BlockEndId] = -1
            WHERE s.[SymbolId] = @symbolId
                AND s.[StateId] = @state
                AND s.[BlockEndId] = -1
                AND @ch BETWEEN t.[Min] AND t.[Max]
            IF @toState <> -1
            BEGIN
                SET @done = 0
                SET @state = @toState;
                SET @capture = @capture + @ch1
                IF @tch < 2048 SET @capture = @capture + @ch2
                SET @index = @index + 1
                IF @index >= @valueEnd
                BEGIN
                    SET @ch = -1
                    SET @done = 1
                END
                ELSE
                BEGIN
                    SET @ch1 = SUBSTRING(@value,@index,1)
                    SET @ch = UNICODE(@ch1)
                    SET @tch = @ch - 0xd800
                    IF @tch < 0 SET @tch = @tch + 2147483648
                    IF @tch < 2048
                    BEGIN
                        SET @ch = @ch * 1024
                        SET @index = @index + 1
                        IF @index >= @valueEnd RETURN -1
                        SET @ch2 = SUBSTRING(@value,@index,1);
                        SET @ch = @ch + UNICODE(@ch2) - 0x35fdc00
                    END
                END
            END
        END
        -- Did the walk stop in an accepting state?
        SET @accept = -1
        SELECT @accept = s.[SymbolId]
        FROM [dbo].[ExampleState] AS s
        WHERE s.[SymbolId] = @symbolId
            AND s.[StateId] = @state
            AND s.[BlockEndId] = -1
            AND s.[Accepts] = 1
        IF @accept <> -1
        BEGIN
            -- Look up whether this symbol has a block end
            SELECT TOP 1 @blockEndId = s.[BlockEndId]
            FROM [dbo].[ExampleState] AS s
            WHERE s.[SymbolId] = @symbolId
                AND s.[BlockEndId] <> -1
            IF @blockEndId <> -1
            BEGIN
                SET @result = 0
                SET @state = 0
                -- Scan for the block end, capturing everything along the way
                WHILE @ch <> -1
                BEGIN
                    SET @done = 0
                    WHILE @done = 0
                    BEGIN
                        SET @done = 1
                        SET @toState = -1
                        SELECT @toState = t.[ToStateId]
                        FROM [dbo].[ExampleState] AS s
                        INNER JOIN [dbo].[ExampleStateTransition] AS t
                            ON s.[StateId] = t.[StateId]
                            AND s.[SymbolId] = t.[SymbolId]
                            AND t.[BlockEndId] = @blockEndId
                        WHERE s.[SymbolId] = @symbolId
                            AND s.[StateId] = @state
                            AND s.[BlockEndId] = @blockEndId
                            AND @ch BETWEEN t.[Min] AND t.[Max]
                        IF @toState <> -1
                        BEGIN
                            SET @done = 0
                            SET @state = @toState
                            SET @capture = @capture + @ch1
                            IF @tch < 2048 SET @capture = @capture + @ch2
                            SET @index = @index + 1
                            IF @index >= @valueEnd
                            BEGIN
                                SET @ch = -1
                                SET @done = 1
                            END
                            ELSE
                            BEGIN
                                SET @ch1 = SUBSTRING(@value,@index,1)
                                SET @ch = UNICODE(@ch1)
                                SET @tch = @ch - 0xd800
                                IF @tch < 0 SET @tch = @tch + 2147483648
                                IF @tch < 2048
                                BEGIN
                                    SET @ch = @ch * 1024
                                    SET @index = @index + 1
                                    IF @index >= @valueEnd RETURN -1
                                    SET @ch2 = SUBSTRING(@value,@index,1);
                                    SET @ch = @ch + UNICODE(@ch2) - 0x35fdc00
                                END
                            END
                        END
                    END
                    SET @accept = -1
                    SELECT @accept = s.[SymbolId]
                    FROM [dbo].[ExampleState] AS s
                    WHERE s.[SymbolId] = @symbolId
                        AND s.[StateId] = @state
                        AND s.[BlockEndId] = @blockEndId
                        AND s.[Accepts] = 1
                    IF @accept <> -1
                    BEGIN
                        -- Block end found: report the whole block
                        INSERT INTO #Results SELECT @position AS [Position], @capture AS [Value], DATALENGTH(@capture)/2 as [Length]
                        SET @state = 0
                        BREAK
                    END
                    ELSE
                    BEGIN
                        -- Not the block end yet: keep the current character and move on
                        SET @capture = @capture + @ch1
                        IF @tch < 2048 SET @capture = @capture + @ch2
                        SET @index = @index + 1
                        IF @index >= @valueEnd
                        BEGIN
                            SET @ch = -1
                            SET @done = 1
                        END
                        ELSE
                        BEGIN
                            SET @ch1 = SUBSTRING(@value,@index,1)
                            SET @ch = UNICODE(@ch1)
                            SET @tch = @ch - 0xd800
                            IF @tch < 0 SET @tch = @tch + 2147483648
                            IF @tch < 2048
                            BEGIN
                                SET @ch = @ch * 1024
                                SET @index = @index + 1
                                IF @index >= @valueEnd RETURN -1
                                SET @ch2 = SUBSTRING(@value,@index,1);
                                SET @ch = @ch + UNICODE(@ch2) - 0x35fdc00
                            END
                        END
                    END
                    SET @state = 0
                END
                SET @state = 0
                CONTINUE
            END
            ELSE
            BEGIN
                -- No block end for this symbol: report the match as captured
                SET @len = DATALENGTH(@capture)/2
                IF(@len>0) INSERT INTO #Results SELECT @position AS [Position], @capture AS [Value], @len as [Length]
            END
        END
        -- Advance to the next codepoint
        SET @index = @index + 1
        IF @index >= @valueEnd
        BEGIN
            SET @ch = -1
            SET @done = 1
        END
        ELSE
        BEGIN
            SET @ch1 = SUBSTRING(@value,@index,1)
            SET @ch = UNICODE(@ch1)
            SET @tch = @ch - 0xd800
            IF @tch < 0 SET @tch = @tch + 2147483648
            IF @tch < 2048
            BEGIN
                SET @ch = @ch * 1024
                SET @index = @index + 1
                IF @index >= @valueEnd RETURN -1
                SET @ch2 = SUBSTRING(@value,@index,1);
                SET @ch = @ch + UNICODE(@ch2) - 0x35fdc00
            END
        END
    END
    SELECT * FROM #Results
    DROP TABLE #Results
END
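The pain point is that NVARCHAR holds UTF-16, so anything above U+FFFF arrives as two code units that have to be stitched together by hand. A rough Python sketch of that arithmetic follows. The helper names utf16_units and next_codepoint are hypothetical. Unlike the procedure above, the sketch only treats 0xD800-0xDBFF as the start of a pair and checks that a low surrogate follows.

```python
def utf16_units(text):
    """Split a string into its UTF-16 code units (what NVARCHAR stores)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def next_codepoint(units, index):
    """Decode one code point starting at units[index].

    Returns (codepoint, next_index). The pair formula is the one used in
    the procedure: high * 1024 + low - 0x35FDC00.
    """
    ch = units[index]
    if 0xD800 <= ch <= 0xDBFF:  # high surrogate: a second unit must follow
        if index + 1 >= len(units):
            raise ValueError("input ends inside a surrogate pair")
        low = units[index + 1]
        if not 0xDC00 <= low <= 0xDFFF:
            raise ValueError("high surrogate not followed by a low surrogate")
        return ch * 1024 + low - 0x35FDC00, index + 2
    return ch, index + 1


units = utf16_units("a\U0001F600b")
print([hex(u) for u in units])
print(next_codepoint(units, 1))
```

The constant 0x35FDC00 is (0xD800 * 1024) + 0xDC00 - 0x10000, so subtracting it removes both surrogate offsets and adds back the 0x10000 base in one step.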
Real programmers use butterflies
|
Better to post an article, so it doesn't get lost in the limbo of the lounge past pages...
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
I will be. The code is half done. I just got it tokenizing using tables in SQL! I'll be putting in non-table versions (compiled into stored procs) and then I'll update my Reggie article with it.
Gosh, when this is done, it won't be too hard to build parsers into SQL. Talk about normalizing content - give it a grammar and it can take your text fields and parse them into trees. That's actually pretty useful when you need to submit complicated content. You could easily make the DB normalize JSON values with such a beast. That's the next thing after Reggie: Norm - the data normalizing parser for C#, databases and beyond
Real programmers use butterflies
|
You should permalink this thread and ask @Sean-Ewington or @Chris-Maunder to bring some conversations to the message board of the article once published. There are interesting comments in them.
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
Argh this code is a bear. I had the C# stuff working right, and then I tuned the database code to match it. Then I decided to add AbsolutePosition and AbsoluteLength properties so you could ALSO get the position and length in native characters instead of just UTF32 codepoints (it matters)
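A small Python sketch of why the two kinds of position differ, assuming "native characters" here means UTF-16 code units (what .NET strings and NVARCHAR store). The function names are hypothetical and are not Reggie's API.

```python
def utf16_length(text):
    """Number of UTF-16 code units needed to store text."""
    return len(text.encode("utf-16-le")) // 2


def utf16_offset(text, codepoint_index):
    """UTF-16 code-unit offset of the code point at codepoint_index."""
    return utf16_length(text[:codepoint_index])


sample = "a\U0001F600/* x */"           # one character outside the BMP
codepoint_pos = sample.index("/*")      # position counted in code points
native_pos = utf16_offset(sample, codepoint_pos)
print(codepoint_pos, native_pos)
```

The two numbers only drift apart once a character above U+FFFF appears before the match, which is why tests on plain ASCII input would never show the difference.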
Then as I added that code, my positioning code started to break. I added a couple of minor hacks that hopefully evened it out, and then my error handling code isn't quite working right in any of them anymore.
For a run of errors like "...." I should be reporting 1 error of "...." instead of 4 errors of "." each.
I'm not sure where things went south, as the C# version worked at one point.
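One way to get a single error for a run like that is to merge adjacent error tokens after the fact. This is only a sketch of the idea in Python, not how Reggie does it: the (symbol, position, value) token shape, the ERROR value of -1 and the function name coalesce_errors are all assumptions.

```python
ERROR = -1  # assumed symbol id for an error token


def coalesce_errors(tokens):
    """Merge runs of touching error tokens into one error token each.

    tokens is a list of (symbol, position, value) tuples in input order.
    """
    merged = []
    for symbol, position, value in tokens:
        if symbol == ERROR and merged:
            prev_symbol, prev_position, prev_value = merged[-1]
            # Only merge when the previous error ends exactly where this one starts
            if prev_symbol == ERROR and prev_position + len(prev_value) == position:
                merged[-1] = (ERROR, prev_position, prev_value + value)
                continue
        merged.append((symbol, position, value))
    return merged


tokens = [(1, 0, "ab"), (ERROR, 2, "."), (ERROR, 3, "."),
          (ERROR, 4, "."), (ERROR, 5, "."), (1, 6, "cd")]
print(coalesce_errors(tokens))
```

Doing the merge inside the matcher instead would avoid the extra pass, at the cost of one more piece of state to carry between characters.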
Real programmers use butterflies
|
We can certainly move this thread to the article but I think the article should start fresh
cheers
Chris Maunder
|
Sorry. I get carried away with this stuff sometimes. My hubby hates it.
Real programmers use butterflies
|
Never, ever apologise for getting carried away with code!
It's awesome. Truly awesome.
cheers
Chris Maunder
|
I'm dying to make this update to Reggie. It's worth at least two articles - one for the SQL targeting alone.
It's just this bloody error handling, and then backporting any changes I make to this C# code to the relevant templates used to generate it (both for it and for SQL, and there are two implementations for each target - one for tables and one for compiled - meaning 2x2 = 4 different places I need to alter the code templates).
Even with my tools to ease maintenance, this project is getting a little too big for me.
Just wait till I make your RDBMS normalize structured text like JSON or XML submitted to stored procedures. Parsing's coming next, once I have a good tokenizer. The parser's called Norm, because it's my data "normalizer".
Eventually I intend to target JS, C++, Python, PHP and maybe Java or something, but it depends on whether I can get any help.
Reggie and Norm will make triple tier validation for all kinds of content possible, and then also so much more than that.
Real programmers use butterflies
modified 31-Oct-21 13:19pm.
|
Python, eh?
Keep me posted.
cheers
Chris Maunder
|
Oh you know I will. I don't even like python but a lot of people do so I figure it's probably worth targeting, so it's worth teaching myself a little more of it - right now I can read it but not write it.
I think in the end what I want is something you can use to generate validation code for any kind of middleware platform, as well as front ends and back ends.
Real programmers use butterflies
|
So, um, probably a dumb question. Why not use SQL's built in regex capability?
SELECT * FROM #Sample WHERE Field LIKE '%[^a-z0-9 .]%'
|
It's not full regex, and it works very badly. Plus its practical Unicode support (such as having character classes for letters and numbers) is dodgy, if it has any at all.
It's not really regex. LIKE is simple pattern matching, closer to glorified DOS filename wildcards than anything regex-like, though I haven't checked whether they've improved on it since, say, SQL Server 2000.
Also, with this tool you can have a single spec file that holds the same regexes for your C# and SQL code (and potentially other targets like JS).
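To show the gap concretely, here is a short sketch using Python's re module as a stand-in for a real regex engine. The LIKE pattern from the question maps onto a single character class, while the block comment expression needs a lazy quantifier and reports match positions, neither of which LIKE has syntax for. The names has_other, word_chars, block_comment and find_comments are made up for illustration, and case handling differs from LIKE, which follows the column collation.

```python
import re

# Rough counterpart of LIKE '%[^a-z0-9 .]%': any character outside the set
has_other = re.compile(r"[^a-z0-9 .]")

# \w covers Unicode letters, digits and underscore for str patterns
word_chars = re.compile(r"\w+")

# Lazy quantifier: shortest run from /* to the next */
block_comment = re.compile(r"/\*.*?\*/", re.DOTALL)


def find_comments(text):
    """Return (position, value, length) for each block comment found."""
    return [(m.start(), m.group(), len(m.group()))
            for m in block_comment.finditer(text)]


print(has_other.search("abc 123.") is None)
print(word_chars.fullmatch("h\u00e9llo") is not None)
print(find_comments("foo /* a */ bar /* b */"))
```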
Real programmers use butterflies
|