I'm attempting to use Regex to parse downloaded strings of text containing recipes. The ultimate goal is to feed each recipe into a database component by component, so that recipes can later be randomly selected for, say, a weekly menu, and an overall ingredients list (a shopping list) can be generated by summing the quantities of like ingredients across all the selected recipes.
My specific problem is capturing blocks of text from a file containing one or more recipes, so that each block can then be parsed further.
These blocks would be things like the Title, Meal Type (e.g. Breakfast, Snack), Nutritional Qualities (e.g. Calories, Carbs, etc.), Directions and, of course, Ingredients. Starting from the very beginning of a set of recipes in a downloaded 7-day diet menu, the first block of text I want to find is everything between "Day" and "XXX". I know this specific word search will have to be improved later, but if I can get this example working, I believe I'll be able to handle the issues that follow.
I've been searching the web for a couple of days now, since I'm convinced someone has asked the same or a similar question before, but I can't find anything that gets me all the way there.
I've used Regex before on smaller tasks, but I don't consider myself fluent (or even "good") with it. I've also studied a couple of very good Regex reference/tutorial web sites, without success. Please note, I'm using VB.Net.
What I have tried:
Here's an example of an input string. This obviously is greatly shortened for use here, but it is representative of the overall problem.
"
Day 1 3479 calories • 42g carbs (15g fiber) • 272g fat • 216g protein
BREAKFAST1144 calories • 3g carbs (0g fiber) • 98g fat • 60g protein
Bun-less Egg Sandwich
Ingredients:
1 1/2 tbsp Butter (21 g)
3 Egg (150 g)
III
Directions:
Heat a nonstick pan over low heat and brush butter around the pan.
Bla Bla Bla
DDD
XXX
Day 2 3560 calories • 48g carbs (13g fiber) • 269g fat • 239g protein
BREAKFAST1107 calories • 11g carbs (2g fiber) • 90g fat • 61g protein
Mushroom and Cheddar Omelette
Ingredients:
1 1/2 tbsp Butter (21 g)
2 Slices Cheddar Cheese
3 Egg (150 g)
III
Directions:
Heat a nonstick pan over medium heat and let butter melt until bubbling.
Bla Bla Bla
DDD
XXX
"
The desired result from Regex would be two matches: one containing all of Day 1 and a second containing all of Day 2. Note that I've manually inserted "XXX" as a delimiter, both for clarity in the example and because the format, block order, terminology, etc. of other downloaded recipes will likely be indeterminate, or at least very hard to specify. You'll also see "DDD" and "III" inserted as future helper delimiters, which may turn out to be avoidable with more elegant programming (TBD).
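To pin down what I mean by "two matches", here is a minimal sketch of the expected output. It is written in Python rather than VB.Net purely so it is quick to run, and it deliberately uses plain string splitting instead of Regex, so it only illustrates the result I'm after, not the approach. The names `sample` and `expected_blocks` are made up for this illustration, and the input is a further-shortened stand-in for the sample above.

```python
# Shortened stand-in for the sample input above (hypothetical data and names).
sample = (
    "Day 1 3479 calories • 42g carbs\n"
    "Bun-less Egg Sandwich\n"
    "DDD\n"
    "XXX\n"
    "Day 2 3560 calories • 48g carbs\n"
    "Mushroom and Cheddar Omelette\n"
    "DDD\n"
    "XXX\n"
)


def expected_blocks(text, delimiter="XXX"):
    """Return one block per day, each running from "Day" to the delimiter."""
    blocks = []
    for chunk in text.split(delimiter):
        chunk = chunk.strip()
        # Ignore the empty remainder after the final delimiter.
        if chunk.startswith("Day"):
            blocks.append(chunk + "\n" + delimiter)
    return blocks


blocks = expected_blocks(sample)
print(len(blocks))  # 2: one block for Day 1 and one for Day 2
for block in blocks:
    print(block.splitlines()[0])  # first line of each block
```

Each block should start at its own "Day" line and stop at the first "XXX" that follows it, never running on into the next day.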
Here are several of the unsuccessful regexes I've tried, with the result each one gave:
Day[\u0000-\uFFFF.*?]+XXX - 1 Match
\ADay[\u0000-\uFFFF.*?]+XXX - 1 Match
\ADay[\u0000-\uFFFF.*?]+\ZXXX - No Match
I've also tried every combination of Multi-Line, Single-Line and Global mode, on and off. I did this out of frustration, and it isn't clear that any of them helped or hurt the result.
The best I've come up with is a single match that captures everything from the first "Day" to the last "XXX". In other words, it matches the entire string/file, skipping over the "XXX" in the interior of the string (this is what I can't solve) instead of ending the first match there and then moving on to find the next "Day". I've tried everything that seemed likely to help, but I get either "No Match" or 1 match.
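For anyone who wants to reproduce the single-match behaviour quickly, here is a minimal sketch. It uses Python's `re` module instead of VB.Net, on the assumption that the two engines treat this particular pattern the same way; I have not confirmed that for every option combination. The `sample` string is a shortened, hypothetical stand-in for my real input.

```python
import re

# Shortened stand-in for the sample input (hypothetical data).
sample = (
    "Day 1 3479 calories • 42g carbs\n"
    "Bun-less Egg Sandwich\n"
    "DDD\n"
    "XXX\n"
    "Day 2 3560 calories • 48g carbs\n"
    "Mushroom and Cheddar Omelette\n"
    "DDD\n"
    "XXX\n"
)

# The first pattern from my list of attempts, unchanged.
pattern = r"Day[\u0000-\uFFFF.*?]+XXX"

matches = [m.group(0) for m in re.finditer(pattern, sample)]

print(len(matches))              # 1: a single match instead of the two I want
print(matches[0].count("XXX"))   # 2: the match runs past the interior "XXX"
print(matches[0].startswith("Day 1"), matches[0].endswith("XXX"))
```

The one match starts at "Day 1" and ends at the final "XXX", swallowing the whole of Day 2 along the way, which is exactly the result I see in my VB.Net tests.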
BTW, I use a Unicode range in the 'capture everything' token because of the " • " character and because I'll never know what characters other recipes I download may contain. After more elementary parsing I may yet have to deal with this in a different way, but again, first things first.
Any help would be greatly appreciated, and hopefully instructive to me and others.
Thanks in advance