Screen Scraping 101 with VBA

scraper

No, not that kind of scraping. You can capture and download virtually any webpage using VBA. Here's one method for doing so.

I've written a simple function that lets you screen scrape almost any webpage you can pass as a parameter. I say "almost" because some pages are rendered in Javascript or Flash (or frames!) and don't really reveal any useful source code. The function will return the raw client-side output as a String with no parsing whatsoever. How you choose to parse the information is up to you.

The GetWebpage Function

This function should be familiar to anyone who has read any of my articles on XMLHTTP parsing. It uses the MSXML.XMLHTTP object to open and grab a web page. Simple as that. If you specify a filename as the second parameter, the webpage contents are written to a file.

Function GetWebpage(url As String, Optional fileName As String) As String

Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String

  Set xml = GetMSXML

  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With

  GetWebpage = xml.responseText

  ' write to file?
  If Len(fileName) > 0 Then
    If Not FileExists(fileName) Then
      Call CreateFile(fileName, GetWebpage)
    Else ' file exists
      If MsgBox("File already exists, overwrite?", vbYesNo) = vbYes Then
        Call CreateFile(fileName, GetWebpage)
      End If
    End If
  End If

End Function

To create an instance of the XMLHTTP Object, I've started using this function:

Function GetMSXML() As Object
  On Error Resume Next
  Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End Function

If you do choose to write the output to a file, you'll need these two functions:

Sub CreateFile(fileName As String, contents As String)
' create file from string contents

Dim tempFile As String
Dim nextFileNum As Long

  nextFileNum = FreeFile

  tempFile = fileName

  Open tempFile For Output As #nextFileNum
  Print #nextFileNum, contents
  Close #nextFileNum

End Sub

Function FileExists(fileName As String) As Boolean
  FileExists = (Len(Dir(fileName)) > 0)
End Function

Sample usage

Sub TestGetWebpage()

Dim webpage As String

webpage = GetWebpage("http://www.jpsoftwaretech.com/blog/")

Debug.Print webpage

End Sub

Related Articles:

About JP

I'm just an average guy who writes VBA code for a living. This is my personal blog. Excel and Outlook are my thing, with a sprinkle of Access and Word here and there. Follow this space to learn more about VBA. Keep Reading »

Share This Article:

Share and bookmark this articledelicious buttonfacebook buttonlinkedin buttonstumbleupon buttontwitter button

comment bubble 9 Comments:

  1. Johnny writes:

    Hey JP, hoping you could help, i have a created in excel 2003 a vba that goes to the website in internet explorer, logs in and submits, now i need it to click a specific tab or access the key below. Would you be able to assist? i tried to use keyboard keys (tab) but for some reason its not working. Below is the key i am trying to access. i am trying to access the last key. Please let me know if you have any suggestions. thanks !

    var cmdMap = new Array();
    cmdMap[cmdMap.length] = { Key:"110221", Method:"NextApplet", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"110219", Method:"PrevApplet", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"11086", Method:"ViewList", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"110187", Method:"NextFrame", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"011187", Method:"NextFrame", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"110189", Method:"PrevFrame", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"011189", Method:"PrevFrame", Applet:"", Service:"", Argument:""};

  2. Glenn writes:

    JP:

    I appreciate this code, it really helped me out. However, I notice that once the function has been used for a particular site within an excel session, repeated calls generate the same data even though the target website may change the retrieved data each time. I have to exit excel in order to retrieve an update. Is there a way to force a refresh?

    • JP writes:

      Do you have an example?

      • Glenn writes:

        I've been trying to extract a puzzle from a web page using the following, which uses some of your code:

        Sub GetWebPuzzle()

        Dim webPuz As String
        Dim iStart As Integer
        Dim String2 As String

        String2 = ""
        webPuz = GetWebpage("http://www.wordles.com/getcrypto.aspx")
        '"http://www.jpsoftwaretech.com/blog/")

        ' Parse puzzle
        'Stop
        iStart = InStr(1, webPuz, String2)
        'Debug.Print webPuz
        webPuz = Right(webPuz, Len(webPuz) – iStart – Len(String2) + 1)

        iStart = InStr(1, webPuz, "<")
        webPuz = Left(webPuz, iStart – 1)
        Range("puzzle").Value = webPuz

        End Sub

        Function GetWebpage(url As String, Optional fileName As String) As String
        ' http://www.jpsoftwaretech.com/blog/2010/08/screen-scraping-101-with-vba/
        Dim xml As Object ' MSXML2.XMLHTTP
        Dim result As String
        Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
        ' grab webpage
        With xml
        .Open "GET", url, False
        .send
        End With
        GetWebpage = xml.responseText
        Set xml = Nothing
        End Function

        When I run it the first time, it works great. However, for successive tries, it just repeats the same string, which leads me to suspect the web query is not refreshing, or that I need to close the connection, which is why I added the "Set xml =Nothing" statement. I don't really understand the mechanics of grabbing stuff from the web, and was trying to do something simple to learn about this. If you can point me in the right direction, I'd appreciate it.

        —Glenn

  3. Hemant writes:

    how do you account for Ajax or Js based pages, can we check the ready state and then call "Get", if so how?

    • JP writes:

      Some pages can't be scraped using this method, they simply show different text than what is found using View » Source. What is the URL and can you provide a sample query?

Note: Comments are subject to the Blog Comment Policy and may not appear immediately. To post VBA code in your comment, use code tags like this: [vb]your code goes here[/vb]

Add a Comment:

*

Site last updated: February 3, 2012