tclscript.com

Monday, January 05, 2009

How to use egghttp.tcl


This lesson assumes the reader has some Tcl knowledge.
With some minor tweaks, the tutorial can also be applied to the
Tcl http package, but for our purposes we will focus on egghttp.tcl.

1. Loading and Checking for egghttp.tcl

To load egghttp.tcl, you must use the source command within your
eggdrop's config file, or within a script.

ie. source scripts/egghttp.tcl
(assuming egghttp.tcl is in your scripts directory)

To check from within your script whether egghttp.tcl has been successfully
loaded, test whether the variable $egghttp(version) exists.

ie.

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
}
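
The check above works at the top level of a script, where egghttp refers to
the global array. Inside a procedure the same test would look for a local
variable instead. A minimal sketch of a reusable check, assuming a helper
name (egghttp_loaded) that is not part of egghttp.tcl:

```tcl
# egghttp_loaded is a hypothetical helper name, not part of egghttp.tcl.
# It returns 1 if the global array element egghttp(version) exists, else 0.
proc egghttp_loaded {} {
  # The leading :: forces a lookup in the global namespace, so the
  # check also works when called from inside another procedure.
  return [info exists ::egghttp(version)]
}
```

A script could then test {![egghttp_loaded]} before binding any commands
that depend on egghttp.tcl.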


2. Opening a connection to a web page

To open a connection to a page, we use the egghttp:geturl command. It
returns the connection descriptor, which we may or may not need later.

You need to specify which server/page you want to obtain, and which
procedure should be called once the connection has been established
and all the required information has been retrieved.

ie.
set sock [egghttp:geturl http://www.yourserver.tld/index.shtml your_callbackproc]

The procedure your_callbackproc (which, btw, you can name whatever you wish)
will be called with one parameter by default: the connection descriptor
(a.k.a. socket id).


3. Callback procedure and obtaining data

As mentioned, the callback procedure will be called with the connection descriptor,
which identifies the server connection we are dealing with. This comes in handy
in case we have issued multiple connections to multiple servers.

Your callback procedure will take the form of:

proc your_callbackproc {sock} {
# stuff here
}
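
When several requests are open at once, the callback only receives the
descriptor, so it can help to remember which URL each descriptor belongs to.
The following is only a sketch of one way to do that bookkeeping in plain Tcl.
The helper names (remember_request, take_request), the array name (pending),
the descriptor values and the second URL are all made up for illustration.

```tcl
# Hypothetical bookkeeping helpers (not part of egghttp.tcl): remember
# which URL each connection descriptor belongs to.
proc remember_request {sock url} {
  global pending
  set pending($sock) $url
}

# Return the URL stored for a descriptor and forget it afterwards.
# Returns an empty string if the descriptor is unknown.
proc take_request {sock} {
  global pending
  if {![info exists pending($sock)]} {
    return ""
  }
  set url $pending($sock)
  unset pending($sock)
  return $url
}

# Simulated descriptors, standing in for values returned by egghttp:geturl.
remember_request sock5 "http://www.yourserver.tld/index.shtml"
remember_request sock6 "http://www.yourserver.tld/stats.shtml"
puts [take_request sock6]
```

In a real script, remember_request would be called with the value returned by
egghttp:geturl, and the callback would call take_request with its sock
parameter.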


To obtain the HTML data from the page within your callback procedure, we
need to use the procedures egghttp:headers, which returns all the data
before the <HTML> tag, and egghttp:data, which returns all other data, including
the <HTML> tag. For both of these routines, you must specify the connection
descriptor for which you want to obtain data.

ie.

proc your_callbackproc {sock} {
  set headers [egghttp:headers $sock]
  set body [egghttp:data $sock]
}


4. Parsing the HTML data

This is probably the most important part of the procedure, and probably the most difficult.
One thing to note is that eggdrop sockets insert newlines after a certain number
of characters (256, if I remember correctly) and also strip out blank lines, so the data
obtained in the variable $body won't resemble the original page itself. A good strategy
to handle this is to get rid of all newlines (\n's) and insert your own where
you want them.

For example, let's say the page we are interested in looks like this:

<HTML>
  <BODY>
    <font>Welcome to My site</font><br>

    <center>
     this is some random text... blah... blah...blah.....<br>
     blah...blah....<br>

     <br>
     We have served <b>578</b> people to date!<br>
     <br>

     </center>
   </BODY>
  </HTML>


In other words, by the time egghttp:data is called, any lines longer than 256 characters will have
been wrapped onto the next line and blank lines will have been removed. This could cause problems in
finding what we are looking for if we assume that the data looks like the original HTML source.

To handle this, we will format it ourselves, so we know how it will look.
For our example, we will get rid of all newlines (\n's) and insert our own wherever there is a <br>:

proc your_callbackproc {sock} {
  set headers [egghttp:headers $sock]
  set body [egghttp:data $sock]
  
  regsub -all "\n" $body "" body
  regsub -all -nocase {<br>} $body "<br>\n" body
}


So now, we know the HTML source in $body will look something like:

<HTML>  <BODY>  <font>Welcome to My site</font><br>
  <center>  this is some random text... blah... blah...blah.....<br>
  blah...blah....<br>
  <br>
  We have served <b>578</b> people to date!<br>
  <br>
  </center>  </BODY></HTML>
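
The cleanup step can also be wrapped in a small helper so it can be tried
outside of eggdrop. This is only a sketch: normalize_html is a made-up name,
string map is swapped in for the first regsub (it also drops any \r
characters), and the <br> pattern is widened to accept <br/> and <br />.
The sample string is invented for the demonstration.

```tcl
# normalize_html is a hypothetical helper name used for illustration.
proc normalize_html {body} {
  # Remove the line breaks inserted by the socket layer (and any stray \r).
  set body [string map [list "\r" "" "\n" ""] $body]
  # Start a new line after every <br>, <BR>, <br/> or <br />.
  # The & in the replacement stands for the text that was matched.
  regsub -all -nocase {<br\s*/?>} $body "&\n" body
  return $body
}

# Invented sample data with a newline in an inconvenient place.
set sample "<HTML><BODY>We have\n served <b>578</b> people to date!<br>\nBye<BR>\n</BODY></HTML>"
foreach line [split [normalize_html $sample] "\n"] {
  puts $line
}
```

With this sample, the sentence that was split across two lines ends up on a
single line again, which is what the later searches rely on.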


Now, let us say we are interested in finding out how many people have been served to date in the webpage.
One strategy is to use regexp to pull that data.

For example:
regexp {We have served <b>(.*)</b> people to date!} $body - served

This will store "578" in the variable $served. Without going into too much detail about regexp,
the (.*) in our regular expression tells Tcl we are interested in extracting the text between
"We have served <b>" and "</b> people to date!", and storing it in a variable, which we named
"served". The - before it is a throwaway variable name that receives the whole matched text.
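
regexp returns 1 when the pattern matches and 0 when it does not, and in the
second case the variable is never set. A minimal sketch of a more defensive
version, assuming a helper name (extract_served) that is made up for this
example and a stricter pattern that only accepts digits:

```tcl
# extract_served is a hypothetical helper name used for illustration.
# Returns the counter as a string, or "" when the pattern is not found.
proc extract_served {body} {
  # [0-9]+ accepts digits only, so the match cannot run past the counter.
  if {[regexp {We have served <b>([0-9]+)</b> people to date!} $body -> served]} {
    return $served
  }
  return ""
}

puts [extract_served "We have served <b>578</b> people to date!<br>"]
puts [string length [extract_served "<b>Counter offline</b>"]]
```

The first call prints 578. The second prints 0, because the text is missing
and the helper falls back to an empty string instead of raising an error.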

Now, some of you may not be comfortable working with regular expressions yet, so another method is to loop through the data line by line and use string functions.

For example:

foreach line [split $body \n] {
  if {[string match "*We have served*" $line]} {
    set start [string first "We have served <b>" $line]
    set start [expr {$start + 18}] ;# 18 is the number of characters in "We have served <b>"
    set end [string first "</b> people to date!" $line]
    set end [expr {$end - 1}];# We don't want the "<" from "</b>"
    set served [string range $line $start $end]
  }
}
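
The same idea can be written without the hard-coded 18 by letting Tcl measure
the marker text. The following is a sketch only; text_between is a made-up
helper name, not something provided by eggdrop or egghttp.tcl.

```tcl
# text_between is a hypothetical helper name used for illustration.
# Returns the text found between $before and $after in $line,
# or "" if either marker is missing.
proc text_between {line before after} {
  set start [string first $before $line]
  if {$start < 0} { return "" }
  # Skip past the opening marker, whatever its length.
  set start [expr {$start + [string length $before]}]
  # Search for the closing marker only after the opening one.
  set end [string first $after $line $start]
  if {$end < 0} { return "" }
  return [string range $line $start [expr {$end - 1}]]
}

puts [text_between "We have served <b>578</b> people to date!<br>" \
    "We have served <b>" "</b> people to date!"]
```

This prints 578. Because the markers are passed in as arguments, the same
helper can be reused for other parts of the page.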


Note: Because HTML code often repeats the same patterns of text, a single 'regexp' may match the wrong occurrence. You may need to use
a loop with 'regexp' as well, and only apply it once you are at a position where you know the text you want is.
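
As one possible illustration of looping over repeated patterns, regexp can
return every match at once with its -all and -inline switches. The page text
below is invented for this sketch.

```tcl
# A page with two counters in the same markup; the wording is made up
# for this example.
set body "Today: <b>12</b> visits<br>\nTotal: <b>578</b> visits<br>\n"

# -all -inline returns a flat list: full match, capture, full match, capture...
set counts {}
foreach {full number} [regexp -all -inline {<b>([0-9]+)</b>} $body] {
  lappend counts $number
}
puts $counts
```

This prints "12 578". The script can then pick the entry it needs by position,
or test each full match against the surrounding text before using it.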


5. Final product

Putting together all of what we have discussed in this tutorial, we end up with something like this:

--------------------------------
# egghttp_example.tcl

# Config
set url "http://www.yourserver.tld/index.shtml"
set dcctrigger "example"
# End of config

if {![info exists egghttp(version)]} {
  putlog "egghttp.tcl was NOT successfully loaded."
  putlog "egghttp_example.tcl has not been loaded as a result."
} else {
  proc your_callbackproc {sock} {
    global url
    set headers [egghttp:headers $sock]
    set body [egghttp:data $sock]
  
    regsub -all "\n" $body "" body
    regsub -all -nocase {<br>} $body "<br>\n" body

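    # regexp returns 0 if the text is not found, in which case $served is never set
    if {[regexp {We have served <b>(.*)</b> people to date!} $body - served]} {
      putlog "Website '$url' has served $served people so far."
    } else {
      putlog "Could not find the number of people served on '$url'."
    }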
  }

  bind dcc o|o $dcctrigger our:dcctrigger
  proc our:dcctrigger {hand idx text} {
    global url 
    set sock [egghttp:geturl $url your_callbackproc]
    return 1
  }  

  putlog "egghttp_example.tcl has been successfully loaded."
}
------------------------------------