|
Can numbers be considered a palindrome also?
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
|
|
|
|
|
|
Maciej Los wrote: Do I have a chance to win the lottery?
After I down-vote your QA answers until your rep is #666, Satan will be in touch.
«One day it will have to be officially admitted that what we have christened reality is an even greater illusion than the world of dreams.» Salvador Dali
|
|
|
|
|
Thanks for all your down-votes, Bill.
I hope that Satan-signed reputation will help me win the lottery.
|
|
|
|
|
Since Satan has made me the lottery, I guarantee you'll win.
|
|
|
|
|
When you capture with a regex, you can specify named capture groups like
(?<identifier>[A-Za-z_][A-Za-z0-9_]*)
What I can do with that is take those capture groups and make JSON fields out of them.
So basically you can define a JSON object using a series of capture groups in a regex, which you run over some text to scrape it.
Finally, you can use that to create a web scraper that presents a facade of JSON objects based on regex scrape expressions.
Anyone seen anything like this already? And is it stupid? It's very simple, so I don't know. That can be good or bad.
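To make the idea concrete, here is a minimal sketch in Python rather than .NET. Python spells named groups `(?P<name>...)` where .NET uses `(?<name>...)`. The `key = value` input format, the `value` group, and the `scrape` helper are all made up for illustration.

```python
import json
import re

# Python writes named groups as (?P<name>...); .NET writes (?<name>...).
# The "key = value" input format is invented for this example.
PATTERN = re.compile(r"(?P<identifier>[A-Za-z_][A-Za-z0-9_]*)\s*=\s*(?P<value>\d+)")

def scrape(text):
    """Return one dict per match, keyed by capture-group name."""
    return [m.groupdict() for m in PATTERN.finditer(text)]

records = scrape("width = 640; height = 480")
print(json.dumps(records))
```

Each match becomes one object, and each named group becomes one field of that object.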
When I was growin' up, I was the smartest kid I knew. Maybe that was just because I didn't know that many kids. All I know is now I feel the opposite.
|
|
|
|
|
honey the codewitch wrote: Anyone seen anything like this already?
Yes, I have seen and used Regex before... not sure what else you are suggesting you may have discovered here?
Also, JSON is just a string (it's basically a serialisation format). There are no fields or objects; perhaps you mean JavaScript objects?
|
|
|
|
|
conceptually a JSON object is
{
<fieldname>: <value>,
<fieldname>: <value>,
<fieldname>: <value>
}
A JSON array is
[
<element>,
<element>,
<element>
]
So now when I say these things you can know what I mean.
Furthermore, when people use JSON they don't work with it as a raw string, but as normalized data.
My JSON engine, like every single one in existence, normalizes JSON into queryable objects.
For example, a JSON object (see above) may become an instance of a class implementing IDictionary<string, object>. (Json: A Fairly Powerful JSON Engine in a Small Package)
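A small sketch of that distinction, using Python's standard `json` module as a stand-in for the C# engine mentioned above (the sample document is invented): the JSON text is a plain string, and parsing it gives a dictionary you can query.

```python
import json

text = '{"name": "Ada", "tags": ["x", "y"]}'  # JSON: just a string
data = json.loads(text)                        # parsed: a dict holding a list

print(type(text).__name__, type(data).__name__)
print(data["tags"][0])

# Serializing turns the in-memory object back into JSON text.
round_trip = json.dumps(data)
```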
I'm sorry, I figured most people would know what I meant already.
|
|
|
|
|
honey the codewitch wrote: conceptually a JSON object is
{
<fieldname>: <value>,
<fieldname>: <value>,
<fieldname>: <value>
}
No, that's not a JSON object. As the OP alluded to, there isn't any such thing as a JSON object; there are simply objects (JavaScript objects, C# objects, it doesn't matter). The "N" in JSON is "notation" - what you posted above is how an object is represented in string form using the JSON format/standard.
|
|
|
|
|
I think you're being pedantic.
JSON
|
|
|
|
|
I don't mind if you say "JSON object", I know what you mean, but when someone clarifies your terminology and you talk down to them as if they are the one who doesn't understand rather than you, then... yeah, I'mma gonna get real pedantic.
|
|
|
|
|
I believe they talked down to me first, but maybe I misread the situation. It's early here and I shouldn't be up.
|
|
|
|
|
When working with JavaScript and JSON it's important to know that they are different. In my experience, there are a lot of people, especially those new to JavaScript, that don't realise they are different. This misunderstanding can lead to problems.
The reason I pointed it out to you is because you were speaking as if you didn't know they are different, and like I said, I think it's an important thing to be aware of... so I made you aware of it.
Also, being corrected about something isn't "talking down".
|
|
|
|
|
Well, like I said to F-ES Sitecore, it's early here and I guess I misread the situation.
I apologize.
|
|
|
|
|
Fair enough. And don't worry, I wasn't offended, you are entitled to defend yourself as and when you feel the need to.
|
|
|
|
|
I'm glad you weren't. Don't mind me, I shouldn't even be up right now. Meh.
(I'm a Radiohead fan myself, but I also like Muse. *hides*)
|
|
|
|
|
honey the codewitch wrote: And is it stupid?
Well, it depends on what you are trying to do with it. Regexes aren't exactly brilliant performance-wise: see Counting Lines in a String - which makes sense when you think just how "general purpose" they are.
I'd suspect that a specific-to-JSON solution would be a damn sight more efficient.
Sent from my Amstrad PC 1640
Never throw anything away, Griff
Bad command or file name. Bad, bad command! Sit! Stay! Staaaay...
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
If I'm scraping a web page, I'm worried about network performance, not regex performance.
Short of being on a SAN, which there'd be no reason to scrape except for legacy integration, the network IO will overshadow any potential regex performance issues by a large margin.
So I'm not worried about that.
Adding: If it really became an issue, I could switch over to a non-backtracking engine like the one I wrote in C#.
modified 20-Sep-19 8:50am.
|
|
|
|
|
I think there's a huge variation in performance between different regex engines.
Yes, it's never going to be lightning because of all the backward/forward matching going on but it can be a damned sight quicker than .NET would make it seem!
Whenever you find yourself on the side of the majority, it is time to pause and reflect. - Mark Twain
|
|
|
|
|
|
Ah, cool, so someone has done it before, just not quite the same way.
I think my solution is simpler. They're munging the objects using Python.
I want to make it so you can define JSON objects with just the regex. It will use nested group captures to build the JSON hierarchy.
Different enough to satisfy me that it's worth it.
Thanks for the link.
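As a rough sketch of building a hierarchy from captures alone: Python's `re` module does not report which group is nested inside which, so this swaps in a different trick and encodes the path in the group name. The `__` separator, the `nest` helper, and the sample text are all hypothetical, not the design described above.

```python
import json
import re

# Hypothetical convention: "__" in a group name separates nesting levels.
PATTERN = re.compile(
    r"(?P<person__name>[A-Z][a-z]+) \((?P<person__born__year>\d{4})\)"
)

def nest(flat, sep="__"):
    """Turn {'a__b': 1} into {'a': {'b': 1}}."""
    root = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = root
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return root

match = PATTERN.search("Born: Ada (1815), mathematician")
result = nest(match.groupdict())
print(json.dumps(result))
```

The regex is still the only thing you write per site; the nesting falls out of how the groups are named.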
|
|
|
|
|
Can it go arbitrarily deep?
And remember that .NET's regular expression engine is far richer than most, so you may not be describing a generally applicable technique.
I've been working at loading data from JSON files to SQL Server for only about a year now. I convert JSON to XML on-the-fly and pass the XML elements to SQL Server for further processing and storage -- using SQL Server's built-in XML functions.
In my situation, it's all about getting data from file to table as quickly as possible (with an eye toward not hogging resources) and I never need to have all the objects in memory at once.
As in your case, IO seems to be the main bottleneck for me, with writing to the database being slower than reading from the disk.
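For anyone curious what a JSON-to-XML conversion can look like, here is a minimal sketch in Python. It is not the loader described above: the element names `row` and `item` are invented, it handles one parsed object at a time rather than streaming a file, and it assumes every JSON key is a valid XML element name.

```python
import json
import xml.etree.ElementTree as ET

def to_xml(value, tag):
    """Recursively convert parsed JSON into an ElementTree element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(child, key))   # assumes key is a valid XML name
    elif isinstance(value, list):
        for child in value:
            elem.append(to_xml(child, "item"))
    elif value is not None:
        elem.text = str(value)
    return elem

doc = json.loads('{"id": 7, "tags": ["a", "b"]}')
xml_text = ET.tostring(to_xml(doc, "row"), encoding="unicode")
print(xml_text)
```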
|
|
|
|
|
I'm targeting .NET, so I'm not worried about it. If I were to port it to anything, it would be something that at least used PCRE, which is about as rich as .NET's regex, IIRC.
But yeah, I'm looking at going arbitrarily deep. If I can't do it using nested group captures, I'll do it by allowing you to define a pseudo-JSON document where each of the values is a regular expression instead of an actual value.
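The pseudo-JSON fallback could look something like this sketch, again in Python and under my own assumptions: the template is an ordinary nested dict, every leaf is a regex, and the first capture group of each regex supplies the value. `TEMPLATE`, `fill`, and the sample HTML are hypothetical.

```python
import json
import re

# Hypothetical template: same shape as the desired output, but each leaf
# is a regex whose first capture group supplies the value.
TEMPLATE = {
    "title": r"<h1>(.*?)</h1>",
    "author": {"name": r'class="author">(.*?)<'},
}

def fill(template, text):
    """Walk the template and replace each regex leaf with its first match."""
    if isinstance(template, dict):
        return {key: fill(sub, text) for key, sub in template.items()}
    match = re.search(template, text)
    return match.group(1) if match else None

html = '<h1>Regex Scraping</h1><span class="author">Ada</span>'
result = fill(TEMPLATE, html)
print(json.dumps(result))
```

Because the walk is recursive, the template can nest as deep as the output needs to.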
|
|
|
|
|
What problem does this solve? I am sure people are scraping websites just fine, judging by the posts on freelance work websites.
"It is easy to decipher extraterrestrial signals after deciphering Javascript and VB6 themselves.", ISanti
|
|
|
|
|
It would present a JSON based facade you can apply over a non-webservice, traditional website, so you can basically front the website with a JSON REST service using some regex.
So say I declare some regex captures over Wikipedia so I can scrape encyclopedia information with search queries.
That site is then exposed as a REST service that I can use as though it was designed for that.
That's the basic idea anyway. I'm simplifying here as much as I can - in truth it might not scale to complexity. I'm still toying with the idea.
One of the things you can do with it is expose it to client browsers so they can query that way, or you can handle it using JSON processing on the server, or whatever. Maybe insert it into MongoDB or something. There are lots of potential use cases out there, I'd think.
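A very rough sketch of the facade idea, in Python and with everything site-specific invented: the pattern, the page layout, and the `facade` and `fake_fetch` names are placeholders. The fetch step is passed in as a callable so the sketch runs without a network. Wiring `facade` into an actual HTTP endpoint is left out.

```python
import json
import re

# Hypothetical pattern and page layout; a real site needs its own expressions.
PATTERN = re.compile(r"<li>(?P<term>[^<:]+): (?P<summary>[^<]+)</li>")

def facade(query, fetch):
    """Fetch the page for `query` and return its matches as a JSON string.

    `fetch` is any callable mapping a query to HTML text, so the network
    layer (an HTTP client, a cache, a test stub) stays swappable.
    """
    html = fetch(query)
    results = [m.groupdict() for m in PATTERN.finditer(html)]
    return json.dumps({"query": query, "results": results})

def fake_fetch(query):
    # Stand-in for an HTTP request; returns canned HTML.
    return "<ul><li>Regex: a pattern language</li><li>JSON: a data format</li></ul>"

body = facade("notation", fake_fetch)
print(body)
```

A REST handler would then just return `body` with a JSON content type.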
|
|
|
|