MrHaga - 3 months ago
C# Question

Extract multiple urls from a string

I have a string containing HTML source code. The source contains many URLs, but I find it hard to separate them from the rest of the string. I've been trying to get all the text between "http:" and ".jpg", but I haven't found a way that works, at least not one that finds multiple URLs. As you have probably guessed, I haven't been using C# for long. Any help will be appreciated.

Sample from the source I'm trying to extract the URLs from:

<td class="rad">
<input type="hidden" name="filenames[]" value="1270000_12_2.jpg">
<a href="http://xxxxxxxxx/files/orders/120000/127200/12700000/Originals/1200000_12_2.jpg" target="_blank"><img src="http://xxxxxxxxxxxx/files/orders/120000/127200/120000/Originals/127000_12_2_thumb.jpg" border="0"></a>
<br />
120000_12_2.jpg </td>
<td class="rad" width="300" valign="top">
<label>Enter comment to photographer:</label><br />
<textarea rows="7" cols="35" name="comment[]"></textarea>
</td>
<td class="rad" width="300" valign="top">
<label for="comment_from_editor">Comment from editor</label><br />
<textarea rows="4" cols="35" name="comment_from_editor[]" id="comment_from_editor">




Answer

In C#

using System.Collections.Generic;
using System.Text.RegularExpressions;

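    static string[] ParseLinkToJpg(string str)
    {
        // Match from "http:" to the first ".jpg" without crossing whitespace,
        // quotes, or angle brackets, so host names that contain dots
        // (such as www.example.com) are handled too.
        Regex regex = new Regex(@"http:[^\s""'<>]*?\.jpg", RegexOptions.IgnoreCase);
        List<string> result = new List<string>();
        // Matches() finds every non-overlapping occurrence in the input.
        foreach (Match match in regex.Matches(str))
        {
            result.Add(match.Value);
        }
        return result.ToArray();
    }

This function returns an array of links to .jpg images. Values without an "http:" prefix, such as the bare file name in the hidden input, are not matched.

You can change the regular expression http:[^\s"'<>]*?\.jpg to what you need, for example by starting it with https?: to also match HTTPS links.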
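
As one possible adaptation, here is a minimal sketch that accepts both http and https links and a caller-supplied list of file extensions. The class name ImageUrlExtractor and the method ExtractImageUrls are hypothetical names chosen for illustration. They are not part of the answer above, and the pattern assumes the URLs sit inside quoted HTML attributes, as in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ImageUrlExtractor
{
    // Hypothetical helper (an assumption, not from the original answer).
    // Returns every http/https URL in the input that ends in one of the
    // given extensions, e.g. ExtractImageUrls(html, "jpg", "png").
    public static string[] ExtractImageUrls(string html, params string[] extensions)
    {
        if (extensions == null || extensions.Length == 0)
        {
            throw new ArgumentException("At least one extension is required.", "extensions");
        }

        // Escape each extension so regex metacharacters are taken literally.
        List<string> escaped = new List<string>();
        foreach (string ext in extensions)
        {
            escaped.Add(Regex.Escape(ext));
        }

        // The URL body may not contain whitespace, quotes, or angle brackets,
        // so a match cannot run past the end of an attribute value. The
        // lookahead requires the extension to be the last part of the URL.
        string pattern = @"https?://[^\s""'<>]+?\.(?:" + string.Join("|", escaped)
                       + @")(?=[\s""'<>]|$)";

        List<string> result = new List<string>();
        foreach (Match match in Regex.Matches(html, pattern, RegexOptions.IgnoreCase))
        {
            result.Add(match.Value);
        }
        return result.ToArray();
    }
}
```

Called as ImageUrlExtractor.ExtractImageUrls(source, "jpg") on the sample in the question, this should return the two URLs from the href and src attributes and skip the bare file names. Whether to match case-insensitively or to allow more extensions depends on your pages, so treat those choices as starting points.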

https://www.debuggex.com/ is an excellent service for testing regular expressions.
