rickard rickard - 1 month ago 8
C# Question

Multiple split between two chars to string array

I need to split a string that consists of html elements.

I want to split between two chars "<" and ">".

var htmlElements = "<p>lorem ipsum</p><span>nisi sapien</span><ul><li>list items</li></ul>";
string arrayOfElements = htmlElements.Split('<', '>')[1];


This code only pulls out the first "p". I need to pull every element name into a string array. The closing tag </p> doesn't matter; I only need the opening tag of each element.

Desired output is a string array containing
p span ul li

Answer

I suggest using regular expressions in order to extract (match) the required values:

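using System;
using System.Linq;
using System.Text.RegularExpressions;

string htmlElements = "<p>lorem ipsum</p><span>nisi sapien</span><ul><li>list items</li></ul>";

// Group 1 captures the tag name; closing tags such as </p> are skipped
// because '/' is not matched by \w.
string[] arrayOfElements = Regex
  .Matches(htmlElements, @"<(\w+)>")
  .OfType<Match>()
  .Select(m => m.Groups[1].Value)
  .ToArray();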

Test

// p span ul li
Console.Write(string.Join(" ", arrayOfElements));
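
If you would rather keep the Split call from the question, here is a rough sketch of why [1] returned only "p" and how the same split could yield every opening tag. Splitting on '<' and '>' produces pieces that alternate between text outside a tag (even indexes) and text inside a tag (odd indexes), so index 1 is just the first tag. The class and method names below are hypothetical, and the sketch assumes well-formed markup with no stray '<' or '>' in the text.

```csharp
using System;
using System.Linq;

static class TagSplitter
{
    // Hypothetical helper, not part of the original answer.
    // Assumes every '<' is closed by a '>' and neither char appears in text.
    public static string[] OpeningTagNames(string html) =>
        html.Split('<', '>')
            // Odd positions hold what sat between '<' and '>'.
            .Where((part, index) => index % 2 == 1)
            // Drop empty pieces and closing tags such as "/p".
            .Where(part => part.Length > 0
                        && !part.StartsWith("/", StringComparison.Ordinal))
            .ToArray();
}
```

Calling TagSplitter.OpeningTagNames(htmlElements) on the string from the question gives p, span, ul, li. Tags with attributes would come back with the attributes attached, so this only suits input as simple as the example.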

In the general case, parsing HTML with regular expressions is a bad idea, but if you just want to extract the tag names from simple markup like this, it can be good enough.
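
One place where the simple pattern stops being good enough is attributes: <(\w+)> will not match <p class="x">. As a sketch only, a slightly more tolerant pattern could capture the name and then skip everything up to the closing '>'. The pattern and the TagNames class are my own assumptions, not something from the question, and this still breaks on cases such as a '>' inside an attribute value, which is where a real HTML parser becomes the better choice.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class TagNames
{
    // Assumed pattern: '<', a name that starts with a letter,
    // then any attributes (anything except '>'), then '>'.
    // Closing tags and comments are skipped because '/' and '!' are not letters.
    private static readonly Regex OpeningTag =
        new Regex(@"<([A-Za-z][A-Za-z0-9-]*)\b[^>]*>");

    public static string[] Extract(string html) =>
        OpeningTag.Matches(html)
                  .Cast<Match>()
                  .Select(m => m.Groups[1].Value)
                  .ToArray();
}
```

For the string in the question, TagNames.Extract returns the same p, span, ul, li as before; it additionally picks up tags that carry attributes and self-closing tags such as <br/>.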