MrHaga - 2 months ago
C# Question

Extract multiple urls from a string

I have a string containing HTML source code. The source contains many URLs, but I find it hard to separate them from the rest of the string. I've been trying to get all the text between "http:" and ".jpg", but have not found a way that works, at least not for multiple URLs. As you have probably guessed, I haven't been using C# for long. Any help will be appreciated.

Sample from the source I'm trying to extract the urls from:

<td class="rad">
<input type="hidden" name="filenames[]" value="1270000_12_2.jpg">
<a href="http://xxxxxxxxx/files/orders/120000/127200/12700000/Originals/1200000_12_2.jpg" target="_blank"><img src="http://xxxxxxxxxxxx/files/orders/120000/127200/120000/Originals/127000_12_2_thumb.jpg" border="0"></a>
<br />
120000_12_2.jpg </td>
<td class="rad" width="300" valign="top">
<label>Enter comment to photographer:</label><br />
<textarea rows="7" cols="35" name="comment[]"></textarea>
<td class="rad" width="300" valign="top">
<label for="comment_from_editor">Comment from editor</label><br />
<textarea rows="4" cols="35" name="comment_from_editor[]" id="comment_from_editor">


In C#

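using System.Collections.Generic;
using System.Text.RegularExpressions;

    static string[] ParseLinkToJpg(string str)
    {
        // Group 1: the candidate URL; group 2: the text after its first dot.
        Regex regex = new Regex(@"(http:.*?\.(.*?)).\s");
        Match match = regex.Match(str);
        List<string> result = new List<string>();
        while (match.Success)
        {
            // Keep only matches whose captured extension is exactly "jpg".
            if (match.Groups[2].ToString() == "jpg")
            {
                result.Add(match.Groups[1].ToString());
            }
            match = match.NextMatch();
        }
        return result.ToArray();
    }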

This function will return an array of links to images.

You can change the regular expression (http:.*?\.(.*?)).\s to whatever you need. An online regular expression tester is an excellent way to try out patterns before putting them in code.
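As one example of adapting the pattern: (http:.*?\.(.*?)).\s treats everything after the first dot as the extension, so it works on the sample above (where the host has no dots) but would likely skip URLs with a host such as www.example.com. The sketch below is one possible alternative, not the original answer's method. It assumes the URLs sit inside quoted attributes or are delimited by whitespace, and the names UrlExtractor and ExtractJpgUrls are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Matches http:// or https://, then any run of characters that cannot
    // appear unescaped in a quoted attribute value (no whitespace, quotes,
    // or angle brackets), ending in ".jpg" right before a delimiter or the
    // end of the input.
    private static readonly Regex JpgUrl = new Regex(
        @"https?://[^\s""'<>]+?\.jpg(?=[\s""'<>]|$)",
        RegexOptions.IgnoreCase);

    // Returns every .jpg URL found in the given text, in order of appearance.
    public static List<string> ExtractJpgUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in JpgUrl.Matches(html))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

Calling UrlExtractor.ExtractJpgUrls(html) on the sample source should pick up the href and src values and ignore the bare file name in the hidden input, since that value has no http:// prefix.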