
In previous articles I've written about how to extract information from webpages, and interact with them to navigate through a website.
This article won't strictly focus on IE, but on several collections available through the MSHTML Object Model.
Previous articles:
- An exploration of IE browser methods, part I
- An exploration of IE browser methods, part II
- Automate Internet Explorer
The GetMSXML Function
I've started using this function to return the XMLHTTP object. Hopefully I'll remember to use it consistently.
We'll use it because it's much faster than automating IE, as we learned previously, and we're more concerned with the HTML document once it's created rather than the method of actually creating it.
Function GetMSXML() As Object ' MSXML2.XMLHTTP
On Error Resume Next
Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End FunctionCreating MSHTML Documents
Another function we'll need creates the HTML document which we'll load the HTML into. We also saw this code in An exploration of IE browser methods, part II.
Function CreateHTMLDoc() As Object ' MSHTML.HTMLDocument
Set CreateHTMLDoc = CreateObject("htmlfile")
End FunctionHTML Document Object Collections
These are the three collections we'll be working with here: Anchors, Images, Links. We dealt with them in An exploration of IE browser methods, part II, but here I'll be going into a bit more detail.
Each one accepts a HTML Document Object (which we've already loaded with content from a website) and returns a reference to the relevant collection. The collection can then be iterated to look for a value, compare values, and so on.
Anchors Collection
The Anchors Collection "retrieves a collection of all a objects that have a name and/or id property. Objects in this collection are in HTML source order."
Function GetHTMLAnchors(htmlDoc As Object) As Object ' MSHTML.IHTMLElementCollection Set GetHTMLAnchors = htmlDoc.anchors End Function
The Images Collection
The Images Collection "retrieves a collection, in source order, of img objects in the document."
Function GetHTMLImages(htmlDoc As Object) As Object ' MSHTML.IHTMLElementCollection Set GetHTMLImages = htmlDoc.images End Function
The Links Collection
The Links Collection "retrieves a collection of all a objects that specify the href property and all area objects in the document.".
Function GetHTMLLinks(htmlDoc As Object) As Object ' MSHTML.IHTMLElementCollection Set GetHTMLLinks = htmlDoc.links End Function
Note that each collection returns objects "in source order", which means that the first link you see on a page will be the first link in the Anchors/Links Collection. Also, all three Collections have Length and Item Properties, which let you check the amount of items in the collection and return each individual item, respectively.
Looping through each collection
In the sample procedure below, we create a new HTML document and populate it with the source code from Google's homepage (hey, got a better site to test? Leave a comment!). Then we grab each collection in turn and print various attributes from each member.
Sub TestHTMLCollections()
On Error GoTo ErrorHandler
Dim xml As Object ' MSXML2.XMLHTTP
Dim htmlDoc As Object ' MSHTML.HTMLDocument
Dim htmlBody As Object ' MSHTML.HTMLBody
Dim anchors As Object ' MSHTML.IHTMLElementCollection
Dim anchor As Object ' MSHTML.IHTMLElement
Dim images As Object ' MSHTML.IHTMLElementCollection
Dim image As Object ' MSHTML.IHTMLElement
Dim links As Object ' MSHTML.IHTMLElementCollection
Dim link As Object ' MSHTML.IHTMLElement
Set xml = GetMSXML
Set htmlDoc = CreateHTMLDoc
Set htmlBody = htmlDoc.body
With xml
.Open "GET", "http://www.google.com/", False
.send
End With
' put xml response into HTML document
htmlBody.innerHTML = xml.responseText
' get all anchors
Set anchors = GetHTMLAnchors(htmlDoc)
' number of anchor tags
Debug.Print anchors.Length
' print each anchor tag contents
For Each anchor In anchors
Debug.Print anchor.innerHTML
Next anchor
' get all images
Set images = GetHTMLImages(htmlDoc)
' number of images
Debug.Print images.Length
' print each image URL (img 'src' attribute)
For Each image In images
Debug.Print image.getAttribute("src")
Next image
' get links collection
Set links = GetHTMLLinks(htmlDoc)
' number of links
Debug.Print links.Length
' print link target
For Each link In links
Debug.Print link.getAttribute("href")
Next link
ProgramExit:
Exit Sub
ErrorHandler:
MsgBox Err.number & " - " & Err.description
Resume ProgramExit
End Sub




hi JP… this is my 1st time to see other method to automate ie..
i tried the above codes including the function but, it's returning an ERROR:
-2147024891 – access is denied
You'll need to be more specific. What line causes the error?
it's now working, instead of http://www.google.com, it now works using http://www.google.com.ph
i've noticed that every time i type http://www.google.com and hit enter, the browser automatically goes to http://www.google.com.ph
so i decided to edit and include .ph…
thanks JP…
by the way, do you have a sample code for ie automation for webpage with HTMLFrames? I'm interested to grab some data from http://www.technistock.net, enter symbol then click "GO", and grab all the data below "GO"
Can I ask here if it is possible to capture the events of html elements via vba? – I would need to attach the click event on each table cell of a table and then be able to capture the particular td-element fired by user. Is this possible via VBA?
Hi JP,
Thanks for this, it is fantastic stuff!
Is there any way to automate the use of dropdown boxes? I have used
'Set UserN = appIE.Document.getElementsByName' to define the element, but can't get it to automatically select the dropdown option I need, usually the second option on the dropdown list.
Grateful for any assistance.
Hi,
Thanks for IE automation information! I tried these methods and it works for me.. but the issue is that my program not giving me consistant result. My vba program is for accessing a password protected site and got a particular page and copy the content to excel.. on an avg 70% the program gives correct result..but some times it copies and paste some other pages
…any reason for this behaviour? thanks a lot for the site informations
Regards,
John
Hi JP,
I am trying to build a code but i am getting some Automation Error – The interface is unknown or The object invoked has disconnected from its clients.
Following is the code, I had made reference to shdocvw.dll, and mshtml.tlb..
Dim objIE As SHDocVw.InternetExplorer
Dim htmlDoc As MSHTML.HTMLDocument
Dim htmlInput As MSHTML.HTMLInputElement
Set objIE = New SHDocVw.InternetExplorer
With objIE
.Visible = True
.navigate "https://xyz.com"
Do While .readyState 4: DoEvents: Loop
Set htmlDoc = .document
……… so on..
Now the place i get this error is immediately after the step
objIE.navigate "https://xyz.com"
How to resolve this issue?
I am using Excel 2007 – VBA and IE 8 on Window 7.
Please advice
Thank you
Also to add one more point to this post,
When i use, "www.google.com" as my URL i dont get this error.
Only when i use the particular AUT url.. " https://xyz.abc.com" i have the error