
No, not that kind of scraping. You can capture and download virtually any webpage using VBA. Here's one method for doing so.
I've written a simple function that lets you screen scrape almost any webpage you can pass as a parameter. I say "almost" because some pages are rendered in Javascript or Flash (or frames!) and don't really reveal any useful source code. The function will return the raw client-side output as a String with no parsing whatsoever. How you choose to parse the information is up to you.
The GetWebpage Function
This function should be familiar to anyone who has read any of my articles on XMLHTTP parsing. It uses the MSXML.XMLHTTP object to open and grab a web page. Simple as that. If you specify a filename as the second parameter, the webpage contents are written to a file.
Function GetWebpage(url As String, Optional fileName As String) As String
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Set xml = GetMSXML
' grab webpage
With xml
.Open "GET", url, False
.send
End With
GetWebpage = xml.responseText
' write to file?
If Len(fileName) > 0 Then
If Not FileExists(fileName) Then
Call CreateFile(fileName, GetWebpage)
Else ' file exists
If MsgBox("File already exists, overwrite?", vbYesNo) = vbYes Then
Call CreateFile(fileName, GetWebpage)
End If
End If
End If
End FunctionThe GetMSXML and CreateFile functions may be found on the MSXML Object Library page.
If you do choose to write the output to a file, you'll need this function:
Function FileExists(fileName As String) As Boolean FileExists = (Len(Dir(fileName)) > 0) End Function
Sample usage
Sub TestGetWebpage()
Dim webpage As String
webpage = GetWebpage("http://www.jpsoftwaretech.com/blog/")
Debug.Print webpage
End Sub
Hey JP, hoping you could help, i have a created in excel 2003 a vba that goes to the website in internet explorer, logs in and submits, now i need it to click a specific tab or access the key below. Would you be able to assist? i tried to use keyboard keys (tab) but for some reason its not working. Below is the key i am trying to access. i am trying to access the last key. Please let me know if you have any suggestions. thanks !
var cmdMap = new Array();
cmdMap[cmdMap.length] = { Key:"110221", Method:"NextApplet", Applet:"", Service:"", Argument:""};
cmdMap[cmdMap.length] = { Key:"110219", Method:"PrevApplet", Applet:"", Service:"", Argument:""};
cmdMap[cmdMap.length] = { Key:"11086", Method:"ViewList", Applet:"", Service:"", Argument:""};
cmdMap[cmdMap.length] = { Key:"110187", Method:"NextFrame", Applet:"", Service:"", Argument:""};
cmdMap[cmdMap.length] = { Key:"011187", Method:"NextFrame", Applet:"", Service:"", Argument:""};
cmdMap[cmdMap.length] = { Key:"110189", Method:"PrevFrame", Applet:"", Service:"", Argument:""};
cmdMap[cmdMap.length] = { Key:"011189", Method:"PrevFrame", Applet:"", Service:"", Argument:""};
What do you mean by "access"? Follow a link?
JP:
I appreciate this code, it really helped me out. However, I notice that once the function has been used for a particular site within an excel session, repeated calls generate the same data even though the target website may change the retrieved data each time. I have to exit excel in order to retrieve an update. Is there a way to force a refresh?
Do you have an example?
I've been trying to extract a puzzle from a web page using the following, which uses some of your code:
Sub GetWebPuzzle()
Dim webPuz As String
Dim iStart As Integer
Dim String2 As String
String2 = ""
webPuz = GetWebpage("http://www.wordles.com/getcrypto.aspx")
'"http://www.jpsoftwaretech.com/blog/")
' Parse puzzle
'Stop
iStart = InStr(1, webPuz, String2)
'Debug.Print webPuz
webPuz = Right(webPuz, Len(webPuz) – iStart – Len(String2) + 1)
iStart = InStr(1, webPuz, "<")
webPuz = Left(webPuz, iStart – 1)
Range("puzzle").Value = webPuz
End Sub
Function GetWebpage(url As String, Optional fileName As String) As String
' http://www.jpsoftwaretech.com/blog/2010/08/screen-scraping-101-with-vba/
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Set xml = CreateObject("MSXML2.XMLHTTP.6.0")
' grab webpage
With xml
.Open "GET", url, False
.send
End With
GetWebpage = xml.responseText
Set xml = Nothing
End Function
When I run it the first time, it works great. However, for successive tries, it just repeats the same string, which leads me to suspect the web query is not refreshing, or that I need to close the connection, which is why I added the "Set xml =Nothing" statement. I don't really understand the mechanics of grabbing stuff from the web, and was trying to do something simple to learn about this. If you can point me in the right direction, I'd appreciate it.
—Glenn
The nature of the page is probably preventing MSXML scraping. I think you might need to use Internet Explorer automation here.
Okay, thanks
how do you account for Ajax or Js based pages, can we check the ready state and then call "Get", if so how?
Some pages can't be scraped using this method, they simply show different text than what is found using View » Source. What is the URL and can you provide a sample query?