Screen Scraping 101 with VBA

scraper

No, not that kind of scraping. You can capture and download virtually any webpage using VBA. Here's one method for doing so.

I've written a simple function that lets you screen scrape almost any webpage you can pass as a parameter. I say "almost" because some pages are rendered in Javascript or Flash (or frames!) and don't really reveal any useful source code. The function will return the raw client-side output as a String with no parsing whatsoever. How you choose to parse the information is up to you.

The GetWebpage Function

This function should be familiar to anyone who has read any of my articles on XMLHTTP parsing. It uses the MSXML.XMLHTTP object to open and grab a web page. Simple as that. If you specify a filename as the second parameter, the webpage contents are written to a file.

Function GetWebpage(url As String, Optional fileName As String) As String

Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String

  Set xml = GetMSXML

  ' grab webpage
  With xml
    .Open "GET", url, False
    .send
  End With

  GetWebpage = xml.responseText

  ' write to file?
  If Len(fileName) > 0 Then
    If Not FileExists(fileName) Then
      Call CreateFile(fileName, GetWebpage)
    Else ' file exists
      If MsgBox("File already exists, overwrite?", vbYesNo) = vbYes Then
        Call CreateFile(fileName, GetWebpage)
      End If
    End If
  End If

End Function

The GetMSXML and CreateFile functions may be found on the MSXML Object Library page.

If you do choose to write the output to a file, you'll need this function:

Function FileExists(fileName As String) As Boolean
  FileExists = (Len(Dir(fileName)) > 0)
End Function

Sample usage

Sub TestGetWebpage()
  Dim webpage As String

  webpage = GetWebpage("http://www.jpsoftwaretech.com/blog/")
  Debug.Print webpage
End Sub
About JP

I'm just an average guy who writes VBA code for a living. This is my personal blog. Excel and Outlook are my thing, with a sprinkle of Access and Word here and there. Follow this space to learn more about VBA. Keep Reading »



Share This Article:

Share and bookmark this articledelicious buttonfacebook buttonlinkedin buttonstumbleupon buttontwitter button

comment bubble 9 Comment(s) on Screen Scraping 101 with VBA:

  1. Hey JP, hoping you could help, i have a created in excel 2003 a vba that goes to the website in internet explorer, logs in and submits, now i need it to click a specific tab or access the key below. Would you be able to assist? i tried to use keyboard keys (tab) but for some reason its not working. Below is the key i am trying to access. i am trying to access the last key. Please let me know if you have any suggestions. thanks !

    var cmdMap = new Array();
    cmdMap[cmdMap.length] = { Key:"110221", Method:"NextApplet", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"110219", Method:"PrevApplet", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"11086", Method:"ViewList", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"110187", Method:"NextFrame", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"011187", Method:"NextFrame", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"110189", Method:"PrevFrame", Applet:"", Service:"", Argument:""};
    cmdMap[cmdMap.length] = { Key:"011189", Method:"PrevFrame", Applet:"", Service:"", Argument:""};

  2. JP:

    I appreciate this code, it really helped me out. However, I notice that once the function has been used for a particular site within an excel session, repeated calls generate the same data even though the target website may change the retrieved data each time. I have to exit excel in order to retrieve an update. Is there a way to force a refresh?

    • Do you have an example?

      • I've been trying to extract a puzzle from a web page using the following, which uses some of your code:

        Sub GetWebPuzzle()

        Dim webPuz As String
        Dim iStart As Integer
        Dim String2 As String

        String2 = ""
        webPuz = GetWebpage("http://www.wordles.com/getcrypto.aspx")
        '"http://www.jpsoftwaretech.com/blog/")

        ' Parse puzzle
        'Stop
        iStart = InStr(1, webPuz, String2)
        'Debug.Print webPuz
        webPuz = Right(webPuz, Len(webPuz) – iStart – Len(String2) + 1)

        iStart = InStr(1, webPuz, "<")
        webPuz = Left(webPuz, iStart – 1)
        Range("puzzle").Value = webPuz

        End Sub

        Function GetWebpage(url As String, Optional fileName As String) As String
        ' http://www.jpsoftwaretech.com/blog/2010/08/screen-scraping-101-with-vba/
        Dim xml As Object ' MSXML2.XMLHTTP
        Dim result As String
        Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
        ' grab webpage
        With xml
        .Open "GET", url, False
        .send
        End With
        GetWebpage = xml.responseText
        Set xml = Nothing
        End Function

        When I run it the first time, it works great. However, for successive tries, it just repeats the same string, which leads me to suspect the web query is not refreshing, or that I need to close the connection, which is why I added the "Set xml =Nothing" statement. I don't really understand the mechanics of grabbing stuff from the web, and was trying to do something simple to learn about this. If you can point me in the right direction, I'd appreciate it.

        —Glenn

  3. how do you account for Ajax or Js based pages, can we check the ready state and then call "Get", if so how?

    • Some pages can't be scraped using this method, they simply show different text than what is found using View » Source. What is the URL and can you provide a sample query?

This article is closed to any future comments.
Random Data Generator