paul frith - 1 year ago
HTML Question

Extracting the "for" attribute from a label element in an html webpage

I have some code that analyses various attributes of a webpage when parts of that webpage are clicked. One of the elements which is picked out is the ID of the clicked element.

Sometimes, however, there is no ID; instead, the clicked element is a label that uses the "for" attribute to reference an ID. In these cases I want to pick up the "for" attribute value.

I have attempted to do this as follows:

txtID.Text = TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos).GetAttribute("id")
If txtID.Text = "" Then
    txtID.Text = TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos).GetAttribute("for")
End If

For some reason GetAttribute("for") always returns blank. Am I referencing this attribute wrongly, or is something else going on?

HTML Example below:

<div class="question legal-owner active">

<a class="help-trigger help-trigger-layout">
<span class="help-text-icon"></span>

<div class="quote-help quote-help-layout">
<a class="quote-help-close-container">
<div class="quote-help-close"></div>
<h3>Car ownership</h3>

We need to know whether the car belongs to you. If you don’t own the car but you’re the registered keeper, you should answer ‘No’
(the owner of the car and the registered keeper can be different people).


<span class="editor-label question-layout">
<label for="OwningAndUsingCarPanel_LegalOwner">Are you (or will you be) the legal owner of this car?</label>
<ul class="question-layout yesno-radio-list">
<input name="OwningAndUsingCarPanel.LegalOwner" id="OwningAndUsingCarPanel_LegalOwner_true" type="radio" value="True">
<label for="OwningAndUsingCarPanel_LegalOwner_true">
<input name="OwningAndUsingCarPanel.LegalOwner" id="OwningAndUsingCarPanel_LegalOwner_false" type="radio" value="False">
<label for="OwningAndUsingCarPanel_LegalOwner_false">
<span class="editor-validation">
<span class="field-validation-valid" id="OwningAndUsingCarPanel_LegalOwner_validationMessage"></span>

Answer Source

I resolved this by writing my own function, getUnknown, which searches for an attribute within a tag. It should work for any attribute whose value is surrounded by double quotes. The function takes two arguments: the first is a string containing the element's tag, complete with attributes and values, and the second is the name of the attribute whose value you want to extract. If the attribute is not found, it returns "Nothing Found".

'Requires: Imports System.Text.RegularExpressions
Private Function getUnknown(myText As String, myAttr As String) As String
    Dim myResult As String = ""
    Dim myStart As Integer = 0
    Dim myLen As Integer = 0
    'remove any spaces around the "=" sign
    Dim myCleanText As String = Regex.Replace(myText, "\s*=\s*", "=")
    'add =" to the attribute to avoid finding non-attributes when using IndexOf function
    Dim myFullAttr As String = myAttr.Trim().ToLower() + "="""

    Try
        myStart = myCleanText.ToLower().IndexOf(myFullAttr)
        If myStart = -1 Then
            myResult = "Nothing Found"
        Else
            'skip past attr=" and read up to the closing double quote
            myStart = myStart + myFullAttr.Length
            myLen = myCleanText.IndexOf("""", myStart) - myStart
            myResult = myCleanText.Substring(myStart, myLen)
        End If
    Catch ex As Exception
        'e.g. no closing quote was found, so Substring was given a negative length
        myResult = "Nothing Found"
    End Try

    Return myResult

End Function
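
As noted above, getUnknown only handles values wrapped in double quotes, and its IndexOf search can also hit a longer attribute name that merely ends in the one requested (for example "data-for"). The following is a minimal sketch of a more tolerant variant, not part of the original answer: GetAttributeValue is a hypothetical name, it returns an empty string rather than "Nothing Found", and it assumes the input is the markup of a single tag.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AttributeSketch
    ' Hypothetical helper (not from the original answer): returns the value of
    ' attrName found in one tag's markup, or "" when the attribute is absent.
    Function GetAttributeValue(tagText As String, attrName As String) As String
        ' (?<![\w-]) keeps "for" from matching inside names such as "data-for".
        ' The value may be double-quoted, single-quoted, or unquoted.
        Dim pattern As String = "(?<![\w-])" & Regex.Escape(attrName.Trim()) &
            "\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s>]+))"
        Dim m As Match = Regex.Match(tagText, pattern, RegexOptions.IgnoreCase)
        If Not m.Success Then Return ""
        ' Exactly one of the three capture groups holds the value.
        For i As Integer = 1 To 3
            If m.Groups(i).Success Then Return m.Groups(i).Value
        Next
        Return ""
    End Function

    Sub Main()
        Dim tag As String = "<label for=""OwningAndUsingCarPanel_LegalOwner"">"
        ' Prints: OwningAndUsingCarPanel_LegalOwner
        Console.WriteLine(GetAttributeValue(tag, "for"))
    End Sub
End Module
```

A regular expression is still not an HTML parser: text inside another attribute's value that happens to look like for=... would also match, so this is only reasonable for short tag strings such as the OuterHtml of a single label.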

In the context of my original question I have used getUnknown as follows:

Dim myElement As String = _
 TryCast(myHTMLDocument, HtmlDocument).GetElementFromPoint(lastMousePos).OuterHtml.ToString

txtID.Text = getUnknown(myElement, "for")
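
A possible explanation for the blank result in the question, offered as an assumption rather than something confirmed in this thread: HtmlElement.GetAttribute goes through Internet Explorer's DOM, which in its older document modes looks attributes up by DOM property name. Under that assumption a label's "for" attribute would be exposed as htmlFor, in the same way that class is exposed as className. The sketch below tries that name; GetIdOrLabelTarget is a hypothetical helper, and the code has not been run against the page in the question.

```vbnet
Imports System.Drawing
Imports System.Windows.Forms

Module LabelLookupSketch
    ' Hypothetical helper: returns the clicked element's id, or, when it has
    ' none, the id its "for" attribute points at ("" if neither is available).
    Function GetIdOrLabelTarget(doc As HtmlDocument, pos As Point) As String
        If doc Is Nothing Then Return ""
        ' Look the element up once instead of once per attribute.
        Dim el As HtmlElement = doc.GetElementFromPoint(pos)
        If el Is Nothing Then Return ""

        Dim id As String = el.GetAttribute("id")
        If id <> "" Then Return id

        ' Assumption: the legacy IE DOM exposes <label for="..."> as htmlFor.
        Return el.GetAttribute("htmlFor")
    End Function
End Module
```

With the names from the question, usage might look like txtID.Text = GetIdOrLabelTarget(TryCast(myHTMLDocument, HtmlDocument), lastMousePos). If htmlFor also comes back empty on a given page, the string-parsing approach above remains the fallback.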