I have a string containing nested tags, and I want to count how many direct child elements each tag contains.

<L1 //2 element
 <L1 //1 element
  <H1 content> 
 > 
 <L1 //3 element
  <H2 content> 
  <P content>
  <L1 //1 element
   <H3 content>
  >
 >
>


This is my C# code snippet:

var str = "<L1\r\n <L1\r\n  <H1 content> \r\n > \r\n <L1\r\n  <H2 content> \r\n  <P content>\r\n  <L1\r\n   <H3 content>\r\n  >\r\n >\r\n>";
var list = str.Split(new string[] { "\r\n" }, StringSplitOptions.None);
var array_num = new List<string>();
for (int i = 0; i < list.Length; i++)
{
    if (list[i].Contains("<L1"))
    {
        int ElementNum = 0;
        for (int Lindex = i + 1; Lindex <= list.Length; Lindex++)
        {
            int end = list[Lindex].IndexOf(">");
            int start = list[Lindex].IndexOf("<");
            int def = list[i].IndexOf("<L1");
            if (end == def)
            {
                break;
            }
            if (start == def + 2)
            {
                ElementNum = ElementNum + 1;
            }
        }
        array_num.Add("index :" + i.ToString() + " have element: " + ElementNum.ToString());
    }
}


But when I run it, the result is not what I expect. I can find the start position of each tag, but array_num ends up containing 4,0,1,0, which is wrong. The correct result should be 2,1,3,1. Any ideas? Thanks, all.

What I have tried:

I have spent a lot of time debugging and looking for ideas, but I have not found a solution so far.
Updated 6-Jul-23 21:37pm
Comments
PIEBALDconsult 6-Jul-23 10:27am    
RegularExpressions.
headshot9x 9-Jul-23 2:57am    
So, that looks like another idea.

After debugging line by line, I tried the code below, but the result is still not as expected, and I am not sure my code would work correctly for other inputs.

start:0 end: 11 count: 2 //correct
start:1 end: 3 count: 1 //correct
start:4 end: 11 count: 1 //incorrect
start:7 end: 9 count: 1 //correct


var list = str.Split(new string[] { "\r\n" }, StringSplitOptions.None);
var array_num = new List<string>();
int startpos = 0, endpos = 0, total = 0, newstartpos = 0;
bool newtag = false;
for (int i = 0; i < list.Length; i++)
{
    if (list[i].Trim() == "<L1")
    {
        startpos = i;
        for (int Lindex = i + 1; Lindex < list.Length ; Lindex++)
        {
            var item = list[Lindex].Trim().ToString();
            if (list[Lindex].Trim().StartsWith("<L1") && list[Lindex].Trim().EndsWith(">"))
            {
                total += 1;
            }
            if (list[Lindex].Trim() == "<L1")
            {
                total += 2;
                newstartpos = Lindex;
                newtag = true;
            }
            if (list[Lindex].Trim() == ">" && newstartpos != 0)
            {
                total -= 1;
                endpos = Lindex;
                newtag = false;
            }
            if (list[Lindex].Trim().StartsWith("<") && list[Lindex].Trim().EndsWith(">") && !newtag)
            {
                total += 1;
            }
            if (list[Lindex].Trim() == ">" && newstartpos == 0)
            {
                endpos = Lindex;
                break;
            }
        }
        array_num.Add("start: " + startpos + " end: " + endpos + " count: " + total);
        startpos = 0;
        endpos = 0;
        total = 0;
        newstartpos = 0;
        newtag = false;
    }
}
 
Comments
OriginalGriff 7-Jul-23 3:38am    
Normally, I wouldn't post a second solution, but in this case ... see #3
That's because you only count and process lines containing "<L1". The other elements in your sample start with "<H" or "<P", so they are ignored.
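One more detail that may be worth checking, assuming the sample really is indented by exactly one space per nesting level: the inner loop tests `start == def + 2`, but with that layout a direct child starts at `def + 1`. The loop bound `Lindex <= list.Length` would also index past the end of the array if no closing line were found. A minimal sketch of the line-based count with both points adjusted (the `LineCounter` class, the `CountByIndent` method and the `indentStep` parameter are names made up for this illustration, not part of the original code):

```csharp
using System;
using System.Collections.Generic;

public static class LineCounter
{
    // Counts the direct children of every "<L1" line by comparing column positions.
    // Assumption: each nesting level is indented by exactly indentStep characters
    // and every closing ">" sits on its own line in the same column as its "<L1".
    public static List<int> CountByIndent(string input, int indentStep = 1)
    {
        var lines = input.Split(new[] { "\r\n" }, StringSplitOptions.None);
        var counts = new List<int>();
        for (int i = 0; i < lines.Length; i++)
        {
            int def = lines[i].IndexOf("<L1", StringComparison.Ordinal);
            if (def < 0) continue;              // not a container line

            int count = 0;
            for (int j = i + 1; j < lines.Length; j++)   // "<", not "<=", to stay in range
            {
                int start = lines[j].IndexOf('<');
                int end = lines[j].IndexOf('>');
                if (start < 0 && end == def) break;      // closing line of this tag
                if (start == def + indentStep) count++;  // one level deeper: direct child
            }
            counts.Add(count);
        }
        return counts;
    }
}
```

Calling `LineCounter.CountByIndent(str)` on the sample string from the question gives the counts 2, 1, 3, 1. The sketch still depends entirely on the indentation being consistent, which is the weakness of any line-and-column approach.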

Quote:
Really? Does that mean you tried it before? Were the results like that or not?
No, but I just tried it: it's not a complex task.
Create a class to hold tag info:
C#
protected sealed class TagItem
    {
    public int StartIndex { get; set; }
    public int EndIndex { get; set; }
    public int TagsCount { get; set; }
    public TagItem Parent { get; set; }
    public override string ToString()
        {
        return $"Start   : {StartIndex}\nEnd     : {EndIndex}\nContains: {TagsCount}";
        }
    }
Then create a List to hold them as you find them and a "current tag" variable so you know what a new tag belongs to:
C#
List<TagItem> tags = new List<TagItem>();
TagItem currentTag = null;
Then look at each character in the input string (no need to Split it!) looking for start and end tags.

For a start tag ('<'), count it on the current tag (if there is one), create a new TagItem with the character index and the current tag as its parent, then add that to the List and make it the new current tag.

For an end tag ('>'), set the end index on the current tag, then set the current tag back to its parent.

After the loop, print counts for all the non-zero tags:
C#
Console.WriteLine(string.Join(",", tags.Where(t => t.TagsCount > 0).Select(t => t.TagsCount)));
5 minutes of typing, or so?
Result: "2,1,3,1"
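The steps above can be put together roughly as follows. This is only a sketch of the described approach, not the exact code used to produce the result: the `TagCounter` class and `CountChildren` method are names assumed for this example, the nested `TagItem` is a trimmed-down version of the class shown earlier, and the check for unclosed tags at the end is an addition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagCounter
{
    // Trimmed-down stand-in for the TagItem class described above.
    private sealed class TagItem
    {
        public int StartIndex { get; set; }
        public int EndIndex { get; set; }
        public int TagsCount { get; set; }
        public TagItem Parent { get; set; }
    }

    // Single pass over the characters; returns the child count of every tag
    // that has at least one direct child, in order of appearance.
    public static List<int> CountChildren(string input)
    {
        var tags = new List<TagItem>();
        TagItem currentTag = null;

        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '<')
            {
                // A new tag is a direct child of whatever tag is currently open.
                if (currentTag != null) currentTag.TagsCount++;
                currentTag = new TagItem { StartIndex = i, Parent = currentTag };
                tags.Add(currentTag);
            }
            else if (input[i] == '>')
            {
                if (currentTag == null)
                    throw new ArgumentException($"Unexpected \">\" at position {i}");
                currentTag.EndIndex = i;
                currentTag = currentTag.Parent;   // step back out one level
            }
        }

        // Added check (an assumption of this sketch): every tag must be closed.
        if (currentTag != null)
            throw new ArgumentException("Input ends with an unclosed tag");

        return tags.Where(t => t.TagsCount > 0).Select(t => t.TagsCount).ToList();
    }
}
```

With the sample string from the question, `string.Join(",", TagCounter.CountChildren(str))` gives "2,1,3,1". Because only the '<' and '>' characters matter, line breaks and indentation play no part in the result.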
 
Comments
headshot9x 6-Jul-23 8:22am    
Did you try it with my code? It loops over all the elements.
OriginalGriff 6-Jul-23 8:35am    
Your code is pretty nasty - I'd use a single pass through the input string with no nested loops and build up a tree structure of all tag elements with their counts as I went. Then a single pass through that collection and you print all the results.

It's a lot clearer what is going on, and doesn't need your IndexOf calls or nested loops at all.
headshot9x 6-Jul-23 8:44am    
Really? Does that mean you tried it before? Were the results like that or not?
OriginalGriff 6-Jul-23 9:01am    
Answer updated.
headshot9x 6-Jul-23 9:09am    
Hm, now I get your point: we need an intermediate class to hold the elements. But can it be done inside the loop, without another class? That is my concern.
Seriously, the way I suggested is way, way easier!
C#
        protected sealed class TagItem
            {
            public int StartIndex { get; set; }
            public int EndIndex { get; set; }
            public int TagsCount { get; set; }
            public TagItem Parent { get; set; }
            public override string ToString()
                {
                return $"Start   : {StartIndex}\nEnd     : {EndIndex}\nContains: {TagsCount}";
                }
            }
        /// <summary>
        /// Called once when form displayed for first time
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void FrmMain_Shown(object sender, EventArgs e)
            {
            string str = "<L1\r\n <L1\r\n  <H1 content> \r\n > \r\n <L1\r\n  <H2 content> \r\n  <P content>\r\n  <L1\r\n   <H3 content>\r\n  >\r\n >\r\n>";
            List<TagItem> tags = new List<TagItem>();
            TagItem currentTag = null;   
            for (int i = 0; i < str.Length; i++)
                {
                if (str[i] == '<')
                    {
                    if (currentTag != null) currentTag.TagsCount++;
... 2 lines removed
                    }
                else if (str[i] == '>')
                    {
                    if (currentTag == null) throw new ArgumentException($"Input contains mismatched tags: unexpected \">\" at position {i}");
... 2 lines removed
                    }
                }
            Console.WriteLine(string.Join(",", tags.Where(t => t.TagsCount > 0).Select(t => t.TagsCount)));
            }
Now, you can't just hand that in - I removed a few lines of code - but you are really struggling with the way you are doing it because it's a clumsy way and you keep hitting it with a hammer to bend it around another edge case! Ask yourself this: "Why am I at all concerned with lines of text? Do they actually matter?"
If the answer is "No, they don't matter at all, but I thought they did and now I'm stuck with them", then think about throwing your existing approach away and starting again, with a plan for the end result in mind from the start.

Sometimes, the hardest thing to do is admit you made a mistake and the whole approach is flawed - but we all do it, and it's always a wrench to discard work we laboured on. You do get a "better" solution though!
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


