InfoScraper
Tools and techniques to extract information from web pages and newsletters



 

 

 
 September 16, 2002
  8:44:12 AM  

Daily comics

This project locates today's comics on their web pages, and builds a table with the comics all in one place.
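One way to picture the "all in one place" step: once a snippet of HTML has been pulled out for each comic, the snippets can be dropped into the rows of a single table. The Python sketch below is only an assumption about the layout (one row per comic, name in the first cell); `build_table` and the sample snippet are hypothetical names and data, not part of the original project.

```python
from html import escape


def build_table(comics: dict) -> str:
    """Build one HTML table with a row per comic.

    `comics` maps a comic's name to the HTML snippet scraped for it.
    Names are escaped; snippets are inserted as-is because they are
    already HTML.
    """
    rows = []
    for name, snippet in comics.items():
        rows.append(f"<tr><td>{escape(name)}</td><td>{snippet}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Hypothetical snippet standing in for a scraped comic.
page = build_table({"Doonesbury": '<a href="#"><img src="today.gif"></a>'})
print(page)
```

Because the snippets are trusted to be HTML already, only the comic names are escaped; a stricter version would sanitize the scraped markup too.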

Doonesbury: in the page source, find the tag that begins <a href="http://www.ucomics.com/cgi-bin/shopping/buycomic.cgi and extract everything up to and including the closing </a>. This gives today's comic with a hyperlink to the order form.
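The Doonesbury rule amounts to a plain substring search. Here is a minimal Python sketch of it, assuming the page has already been downloaded into a string. The function name `extract_link` and the sample HTML (including its query string and image name) are made up for illustration; only the start marker comes from the rule above.

```python
from typing import Optional

# Start marker taken from the rule above; everything else is illustrative.
START = '<a href="http://www.ucomics.com/cgi-bin/shopping/buycomic.cgi'
END = "</a>"


def extract_link(html: str, start: str = START, end: str = END) -> Optional[str]:
    """Return the text from `start` through the first following `end`, or None."""
    begin = html.find(start)
    if begin == -1:
        return None  # marker not on the page (the layout may have changed)
    stop = html.find(end, begin)
    if stop == -1:
        return None  # opening tag found but never closed
    return html[begin:stop + len(end)]


# Hypothetical page fragment, not real ucomics.com markup.
sample = (
    "<p>intro</p>"
    '<a href="http://www.ucomics.com/cgi-bin/shopping/buycomic.cgi?x=1">'
    '<img src="today.gif"></a>'
    "<p>footer</p>"
)
snippet = extract_link(sample)
print(snippet)
```

Returning `None` when a marker is missing makes it easy to notice when the site changes its markup, which is the usual way a scraper like this breaks.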

A script normally has no built-in way to load another web page directly. In VBScript and classic ASP, this is usually done with a COM component.

 

How to make a HTTP connection in VBS
Newsgroups: microsoft.public.inetsdk.programming.scripting.vbscript, microsoft.public.scripting.vbscript
From: Johnny Xia (johnny_xia@wistron.com.cn) Date: 2001-08-20 04:32:17 PST
Is there any component which can make a HTTP request in VBS? I don't need any UI, just want to GET/POST a URL.
 
From: Adrian Forbes (noemail@noemail.xxx) Date: 2001-08-20 06:55:27 PST
set obj = CreateObject("Microsoft.XMLHTTP")
 
From: oxygen (oxygen@swbell.net) Date: 2001-08-27 08:57:16 PST
Yes, there is... You need to have an XML parser installed on the server. There are three that I know of: Microsoft's xmlhttp, ASPTear, and ASPHTTP. I personally use the Microsoft version just to keep my server all Microsoft (uniformity, I guess).

Here is some code that I wrote to access a remote URL and grab the source code:
<%@ Language=VBScript %>
<%
  Response.Buffer = True
  Dim xml, strURL, strSource
  strURL = "http://www.someurl.com"
  ' Create a server-side XMLHTTP object:
  Set xml = Server.CreateObject("MSXML2.ServerXMLHTTP")
  ' Prepare a GET request; False makes the call synchronous.
  xml.Open "GET", strURL, False
  ' Send the request and wait for the response:
  xml.Send
  ' Copy the response body into a string for later use.
  strSource = xml.responseText
  ' Release the object.
  Set xml = Nothing
  Response.Write strSource
%>

And there you have it.

 


It's trivial in ASP.NET:

03/06/2002 [VB.NET Snippets] (c)Zidler 2002
How to read the content of an external website in a variable
This snippet explains how some search engines 'crawl' your website and cache it in their database.
'VB.Net
Function readHtmlPage(url As String) As String

   Dim objRequest As System.Net.WebRequest
   Dim objResponse As System.Net.WebResponse
   Dim result As String

   'request the page and read the whole response body
   objRequest = System.Net.WebRequest.Create(url)
   objResponse = objRequest.GetResponse()
   Dim sr As New System.IO.StreamReader(objResponse.GetResponseStream())
   result = sr.ReadToEnd()

   'clean up StreamReader and response
   sr.Close()
   objResponse.Close()
   Return result

End Function
Source: Dotnet4all
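For comparison outside the Microsoft stack, the same "read a page into a variable" step can be sketched in Python with the standard library. This is a rough counterpart of readHtmlPage, not a translation of it: the name `read_html_page`, the 10-second timeout, and the UTF-8 fallback are assumptions made for this sketch.

```python
from urllib.request import urlopen


def read_html_page(url: str, timeout: float = 10.0) -> str:
    """Fetch `url` and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset the server declares; fall back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example call (placeholder URL from the ASP sample above; not run here):
# html = read_html_page("http://www.someurl.com")
```

The returned string can then be searched for the markers that surround each comic.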

Other links for ASP.NET:

 


© Copyright 2002 Eric Hartwell.
Last update: 03/10/2002; 10:59:50 AM.



"Data! data! data!" he cried impatiently. "I can't make bricks without clay."
— Sherlock Holmes to Dr. Watson in "The Adventure of the Copper Beeches" by Arthur Conan Doyle. 


"I like deadlines," cartoonist Scott Adams once said. "I especially like the whooshing sound they make as they fly by."


"There is nothing like that feeling of spending days and days banging your head against a wall trying to solve a programming problem then suddenly finding that one tiny obscure and seemingly unrelated piece of the puzzle that unlocks the solution. Oh yeah!"

- Chris Maunder, CodeProject Newsletter 28 Jan 2002


"Management at eSnipe, which is me, is also feeling the pain of the 2002 bear market. So rather than pout about it, I bought some stuff on eBay that I really didn’t need, but made me feel better."

- Tom Campbell, president of eSnipe