How to count the child elements of each tag in C#?

I have a string containing nested tags, and I want to count how many elements each tag directly contains.

<L1 //2 element
 <L1 //1 element
  <H1 content> 
 > 
 <L1 //3 element
  <H2 content> 
  <P content>
  <L1 //1 element
   <H3 content>
  >
 >
>

This is my C# code snippet:

var str = "<L1\r\n <L1\r\n  <H1 content> \r\n > \r\n <L1\r\n  <H2 content> \r\n  <P content>\r\n  <L1\r\n   <H3 content>\r\n  >\r\n >\r\n>";
   var list = str.Split(new string[] { "\r\n" }, StringSplitOptions.None);
   var array_num = new List<string>();
   for (int i = 0; i < list.Length; i++)
   {
       if (list[i].Contains("<L1"))
       {
           int ElementNum = 0;
           for (int Lindex = i + 1; Lindex <= list.Length; Lindex++)
           {
               int end = list[Lindex].IndexOf(">");
               int start = list[Lindex].IndexOf("<");
               int def = list[i].IndexOf("<L1");
               if (end == def)
               {
                   break;
               }
               if (start == def + 2)
               {
                   ElementNum = ElementNum + 1;
               }
           }
           array_num.Add("index :" + i.ToString() + " have element: " + ElementNum.ToString());
       }
   }

But when I run it, the result is not what I expect. I can find the start position of each tag, but array_num contains the counts 4,0,1,0, which is wrong. The correct result should be 2,1,3,1. Any ideas? Thanks, all.

What I have tried:

I have spent a lot of time debugging and looking for ideas, but I have not found a solution yet.

Posted 6-Jul-23 0:49am

headshot9x

Updated 6-Jul-23 21:37pm

Comments

PIEBALDconsult 6-Jul-23 10:27am

RegularExpressions.

headshot9x 9-Jul-23 2:57am

So, it looks like that is another idea.

3 solutions

Solution 2

After debugging line by line, I tried the code below, but the result is still not as expected. I am also not sure my code will work for other inputs.

start:0 end: 11 count: 2 //correct
start:1 end: 3 count: 1 //correct
start:4 end: 11 count: 1 //incorrect
start:7 end: 9 count: 1 //correct

var list = str.Split(new string[] { "\r\n" }, StringSplitOptions.None);
var array_num = new List<string>();
int startpos = 0, endpos = 0, total = 0, newstartpos = 0;
bool newtag = false;
for (int i = 0; i < list.Length; i++)
{
    if (list[i].Trim() == "<L1")
    {
        startpos = i;
        for (int Lindex = i + 1; Lindex < list.Length ; Lindex++)
        {
            var item = list[Lindex].Trim().ToString();
            if (list[Lindex].Trim().StartsWith("<L1") && list[Lindex].Trim().EndsWith(">"))
            {
                total += 1;
            }
            if (list[Lindex].Trim() == "<L1")
            {
                total += 2;
                newstartpos = Lindex;
                newtag = true;
            }
            if (list[Lindex].Trim() == ">" && newstartpos != 0)
            {
                total -= 1;
                endpos = Lindex;
                newtag = false;
            }
            if (list[Lindex].Trim().StartsWith("<") && list[Lindex].Trim().EndsWith(">") && !newtag)
            {
                total += 1;
            }
            if (list[Lindex].Trim() == ">" && newstartpos == 0)
            {
                endpos = Lindex;
                break;
            }
        }
        array_num.Add("start: " + startpos + " end: " + endpos + " count: " + total);
        startpos = 0;
        endpos = 0;
        total = 0;
        newstartpos = 0;
        newtag = false;
    }
}

Posted 6-Jul-23 21:01pm

headshot9x

Comments

OriginalGriff 7-Jul-23 3:38am

Normally, I wouldn't post a second solution, but in this case ... see #3

Solution 3

Seriously, the way I suggested is way, way easier!

C#

        protected sealed class TagItem
            {
            public int StartIndex { get; set; }
            public int EndIndex { get; set; }
            public int TagsCount { get; set; }
            public TagItem Parent { get; set; }
            public override string ToString()
                {
                return $"Start   : {StartIndex}\nEnd     : {EndIndex}\nContains: {TagsCount}";
                }
            }
        /// <summary>
        /// Called once when form displayed for first time
        /// </summary>
        /// <param name="sender"></param>
        /// <param name="e"></param>
        private void FrmMain_Shown(object sender, EventArgs e)
            {
            string str = "<L1\r\n <L1\r\n  <H1 content> \r\n > \r\n <L1\r\n  <H2 content> \r\n  <P content>\r\n  <L1\r\n   <H3 content>\r\n  >\r\n >\r\n>";
            List<TagItem> tags = new List<TagItem>();
            TagItem currentTag = null;   
            for (int i = 0; i < str.Length; i++)
                {
                if (str[i] == '<')
                    {
                    if (currentTag != null) currentTag.TagsCount++;
... 2 lines removed
                    }
                else if (str[i] == '>')
                    {
                    if (currentTag == null) throw new ArgumentException($"Input contains mismatched tags: unexpected \">\" at position {i}");
... 2 lines removed
                    }
                }
            Console.WriteLine(string.Join(",", tags.Where(t => t.TagsCount > 0).Select(t => t.TagsCount)));
            }

Now, you can't just hand that in - I removed a few lines of code - but you are really struggling with the way you are doing it because it's a clumsy way and you keep hitting it with a hammer to bend it around another edge case! Ask yourself this: "Why am I at all concerned with lines of text? Do they actually matter?"
If the answer is "No, they don't matter at all, but I thought they did and now I'm stuck with them" then think about throwing your existing way of doing it away and start again, with a plan for the end result in mind to start with.

Sometimes, the hardest thing to do is admit you made a mistake and the whole approach is flawed - but we all do it, and it's always a wrench to discard work we laboured on. You do get a "better" solution though!

Posted 6-Jul-23 21:37pm

OriginalGriff

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Accepted Answer · 2023-07-06T01:18:00

That's because you only count and process lines containing "<L1". The other elements in your sample start with "<H" or "<P", so they are ignored.

Quote:
Really? Does that mean you have tried it before? And did you get that result or not?

No, but I just tried it: it's not a complex task.
Create a class to hold tag info:

C#

protected sealed class TagItem
    {
    public int StartIndex { get; set; }
    public int EndIndex { get; set; }
    public int TagsCount { get; set; }
    public TagItem Parent { get; set; }
    public override string ToString()
        {
        return $"Start   : {StartIndex}\nEnd     : {EndIndex}\nContains: {TagsCount}";
        }
    }

Then create a List to hold them as you find them and a "current tag" variable so you know what a new tag belongs to:

C#

List<TagItem> tags = new List<TagItem>();
TagItem currentTag = null;

Then look at each character in the input string (no need to Split it!) looking for start and end tags.

For a start tag ("<"), count it on the current tag (if there is one), create a new TagItem with the character index as its start index and the current tag as its parent, then add it to the List and make it the new current tag.

For an end tag (">"), set the end index on the current tag, then set the current tag back to its parent.

After the loop, print counts for all the non-zero tags:

C#

Console.WriteLine(string.Join(",", tags.Where(t => t.TagsCount > 0).Select(t => t.TagsCount)));

5 minutes of typing, or so?
Result: "2,1,3,1"
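
The steps described above can be put together roughly as follows. This is only a sketch of one possible reading of the description, not necessarily the exact code used in the answer. The wrapper class TagCounter and the method CountChildren are hypothetical names introduced here for illustration; the TagItem class and the loop structure follow the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper class (name is an assumption, not from the answer).
public static class TagCounter
{
    // Holds the information collected for one tag, as described in the answer.
    private sealed class TagItem
    {
        public int StartIndex { get; set; }
        public int EndIndex { get; set; }
        public int TagsCount { get; set; }
        public TagItem Parent { get; set; }
    }

    // Hypothetical method: returns the direct-child count of every tag
    // that contains at least one child, in order of appearance.
    public static List<int> CountChildren(string input)
    {
        var tags = new List<TagItem>();
        TagItem currentTag = null;

        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == '<')
            {
                // A new tag starts: count it on its parent, if there is one.
                if (currentTag != null) currentTag.TagsCount++;

                // Record the new tag and make it the current tag.
                var tag = new TagItem { StartIndex = i, Parent = currentTag };
                tags.Add(tag);
                currentTag = tag;
            }
            else if (input[i] == '>')
            {
                if (currentTag == null)
                    throw new ArgumentException(
                        $"Input contains mismatched tags: unexpected \">\" at position {i}");

                // The current tag ends: record where, then step back to its parent.
                currentTag.EndIndex = i;
                currentTag = currentTag.Parent;
            }
        }

        // Keep only tags that actually contain other tags.
        return tags.Where(t => t.TagsCount > 0).Select(t => t.TagsCount).ToList();
    }
}
```

Calling TagCounter.CountChildren(str) with the sample string from the question and joining the result with "," should give the same "2,1,3,1" as above. Note that this sketch treats every "<" as a start and every ">" as an end, so it assumes the content itself never contains those characters.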