An exploration of IE browser methods, part III

digital circuit

In previous articles I've written about how to extract information from webpages, and interact with them to navigate through a website.

This article won't strictly focus on IE, but on several collections available through the MSHTML Object Model.

Previous articles:

The GetMSXML Function

I've started using this function to return the XMLHTTP object. Hopefully I'll remember to use it consistently.

We'll use it because it's much faster than automating IE, as we learned previously, and we're more concerned with the HTML document once it's created rather than the method of actually creating it.

Function GetMSXML() As Object ' MSXML2.XMLHTTP
  On Error Resume Next
  Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End Function

Creating MSHTML Documents

Another function we'll need creates the HTML document which we'll load the HTML into. We also saw this code in An exploration of IE browser methods, part II.

Function CreateHTMLDoc() As Object ' MSHTML.HTMLDocument
  Set CreateHTMLDoc = CreateObject("htmlfile")
End Function

HTML Document Object Collections

These are the three collections we'll be working with here: Anchors, Images, Links. We dealt with them in An exploration of IE browser methods, part II, but here I'll be going into a bit more detail.

Each one accepts a HTML Document Object (which we've already loaded with content from a website) and returns a reference to the relevant collection. The collection can then be iterated to look for a value, compare values, and so on.

Anchors Collection

The Anchors Collection "retrieves a collection of all a objects that have a name and/or id property. Objects in this collection are in HTML source order."

Function GetHTMLAnchors(htmlDoc As Object) As Object ' MSHTML.IHTMLElementCollection
  Set GetHTMLAnchors = htmlDoc.anchors
End Function

The Images Collection

The Images Collection "retrieves a collection, in source order, of img objects in the document."

Function GetHTMLImages(htmlDoc As Object) As Object ' MSHTML.IHTMLElementCollection
  Set GetHTMLImages = htmlDoc.images
End Function

The Links Collection

The Links Collection "retrieves a collection of all a objects that specify the href property and all area objects in the document.".

Function GetHTMLLinks(htmlDoc As Object) As Object ' MSHTML.IHTMLElementCollection
  Set GetHTMLLinks = htmlDoc.links
End Function

Note that each collection returns objects "in source order", which means that the first link you see on a page will be the first link in the Anchors/Links Collection. Also, all three Collections have Length and Item Properties, which let you check the amount of items in the collection and return each individual item, respectively.

Looping through each collection

In the sample procedure below, we create a new HTML document and populate it with the source code from Google's homepage (hey, got a better site to test? Leave a comment!). Then we grab each collection in turn and print various attributes from each member.

Sub TestHTMLCollections()

On Error GoTo ErrorHandler

Dim xml As Object ' MSXML2.XMLHTTP
Dim htmlDoc As Object ' MSHTML.HTMLDocument
Dim htmlBody As Object ' MSHTML.HTMLBody
Dim anchors As Object ' MSHTML.IHTMLElementCollection
Dim anchor As Object ' MSHTML.IHTMLElement
Dim images As Object ' MSHTML.IHTMLElementCollection
Dim image As Object ' MSHTML.IHTMLElement
Dim links As Object ' MSHTML.IHTMLElementCollection
Dim link As Object ' MSHTML.IHTMLElement

Set xml = GetMSXML
Set htmlDoc = CreateHTMLDoc
Set htmlBody = htmlDoc.body

With xml
  .Open "GET", "http://www.google.com/", False
  .send
End With

  ' put xml response into HTML document
  htmlBody.innerHTML = xml.responseText

  ' get all anchors
  Set anchors = GetHTMLAnchors(htmlDoc)

  ' number of anchor tags
  Debug.Print anchors.Length

  ' print each anchor tag contents
  For Each anchor In anchors
    Debug.Print anchor.innerHTML
  Next anchor

  ' get all images
  Set images = GetHTMLImages(htmlDoc)

  ' number of images
  Debug.Print images.Length

  ' print each image URL (img 'src' attribute)
  For Each image In images
    Debug.Print image.getAttribute("src")
  Next image

  ' get links collection
  Set links = GetHTMLLinks(htmlDoc)

  ' number of links
  Debug.Print links.Length

  ' print link target
  For Each link In links
    Debug.Print link.getAttribute("href")
  Next link

ProgramExit:
  Exit Sub
ErrorHandler:
  MsgBox Err.number & " - " & Err.description
  Resume ProgramExit
End Sub

Related Articles:

About JP

I'm just an average guy who writes VBA code for a living. This is my personal blog. Excel and Outlook are my thing, with a sprinkle of Access and Word here and there. Follow this space to learn more about VBA. Keep Reading »

Share This Article:

Share and bookmark this articledelicious buttonfacebook buttonlinkedin buttonstumbleupon buttontwitter button

comment bubble 8 Comments:

  1. whel writes:

    hi JP… this is my 1st time to see other method to automate ie..

    i tried the above codes including the function but, it's returning an ERROR:

    -2147024891 – access is denied

  2. Matti writes:

    Can I ask here if it is possible to capture the events of html elements via vba? – I would need to attach the click event on each table cell of a table and then be able to capture the particular td-element fired by user. Is this possible via VBA?

  3. Aidan writes:

    Hi JP,

    Thanks for this, it is fantastic stuff!

    Is there any way to automate the use of dropdown boxes? I have used
    'Set UserN = appIE.Document.getElementsByName' to define the element, but can't get it to automatically select the dropdown option I need, usually the second option on the dropdown list.

    Grateful for any assistance.

  4. joby John writes:

    Hi,

    Thanks for IE automation information! I tried these methods and it works for me.. but the issue is that my program not giving me consistant result. My vba program is for accessing a password protected site and got a particular page and copy the content to excel.. on an avg 70% the program gives correct result..but some times it copies and paste some other pages
    …any reason for this behaviour? thanks a lot for the site informations

    Regards,
    John

  5. Dinesh writes:

    Hi JP,
    I am trying to build a code but i am getting some Automation Error – The interface is unknown or The object invoked has disconnected from its clients.

    Following is the code, I had made reference to shdocvw.dll, and mshtml.tlb..
    Dim objIE As SHDocVw.InternetExplorer
    Dim htmlDoc As MSHTML.HTMLDocument
    Dim htmlInput As MSHTML.HTMLInputElement

    Set objIE = New SHDocVw.InternetExplorer
    With objIE
    .Visible = True
    .navigate "https://xyz.com"

    Do While .readyState 4: DoEvents: Loop
    Set htmlDoc = .document
    ……… so on..

    Now the place i get this error is immediately after the step
    objIE.navigate "https://xyz.com"

    How to resolve this issue?
    I am using Excel 2007 – VBA and IE 8 on Window 7.
    Please advice

    Thank you

Note: Comments are subject to the Blog Comment Policy and may not appear immediately. To post VBA code in your comment, use code tags like this: [vb]your code goes here[/vb]

Add a Comment:

*

Site last updated: February 3, 2012