Downloading and searching a web page...

Hey. I just got into Sidebar development and have a question. I've already got my gadget displaying nicely, but have run into problems making it functional.

My JavaScript is really rusty, so bear with me here. I'd like to be able to download a web page's source (plain HTML, the HTML generated by a PHP page, whatever; not the server-side script source, just what the end user's browser receives) and then sift through it for a certain element. Specifically, I'd like to download the index of http://www.charas-project.net and find the first link after this bit of source:

<span class='title'>Last 10 Forum Discussions</span><br><hr>

I can probably find the string handling functions to do the latter myself, but the downloading to a string part is vexing me. I can't find anything via web searches or help sites.

Any help would be greatly appreciated.



  • geoff hirst

    osmose1000, do you think you could post a sample of your project? I'm setting up something similar, and I think it would make a good reference guide :)


  • Shippa

    Now it's complaining that xmlReq doesn't exist. I tried declaring it as a new XMLHttpRequest outside of the function and it didn't help. Hum.

    EDIT: I got it to work without errors, except it's getting stuck at the if statement in the handler. Anything outside of the if that tests the state of xmlReq works fine, but anything inside doesn't. Any suggestions?

  • Shadow Chaser

    Is that code in the gadget.html file, or is it in a separate .js script?

    Because when I load the code in my gadget.html file, I get no output.


  • Shaka

    Ah. Thanks for the tip. There is a problem, however. I used your code exactly as it is above like so:

    getURL('http://www.charas-project.net/index.php');

    And it returned an error saying xmlReq.readyState is null or does not exist. Is there something wrong with how I'm calling it?

  • j2associates

    It should go into the HEAD section, so either in the Gadget.html or an external JS - it doesn't matter.

  • Coggsa

    After reading that last code snippet, I tried checking whether readyState is greater than or equal to 4 instead of just equal to 4, and it's working after a delay. I'm assuming this is a sign that it is indeed downloading the web page, but now I can't test it: I'm attempting to use a flyout, but it can't find the div tag I'm trying to reference. I've gotten the flyout to appear before with HTML in it, but now it gives me an error that it cannot find the tag with the "content" ID and then shows nothing.

    PS: The search in the page is for cutting out the part that I need, and the commented out code is for how I'm hoping to extract the info, but if there's a better way, please mention it.

    Main gadget file:

    System.Gadget.Flyout.file = "flyout.html";

    function getURL(url) {
        xmlReq = new XMLHttpRequest();
        xmlReq.open("GET", url);
        xmlReq.setRequestHeader("If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT");
        xmlReq.onreadystatechange = retrievedData;
        xmlReq.send(null);
    }

    function retrievedData() {
        if (xmlReq.readyState >= 4 && xmlReq.status == 200) {
            urlData = xmlReq.responseText;
            listpos = 62 + urlData.search('Last 10 Forum Discussions');
            /*endlistpos = 13 + urlData.search('<//a>)<br><br>');*/
            System.Gadget.Flyout.show = true;
            System.Gadget.Flyout.document.getElementById("content").innerHTML = listpos;
            /*urlData.slice(listpos,endlistpos);*/
        }
    }

    flyout.html:

    <html>
    <head>
    <title>Charas Top Ten</title>
    </head>
    <body>
    <div id="content"></div>
    </body>
    </html>

    EDIT: Aha! I was trying to set something in the flyout before it was displayed! I dug through the forums for a while and found something, and now the flyout is popping out quite nicely.
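
For anyone hitting the same "cannot find the tag" error: one way to avoid touching the flyout's document too early is to fill it in from the flyout's onShow event. This is only a minimal sketch of that idea, not necessarily the fix used above. The helper name showInFlyout is made up for illustration, and the flyout object is passed in as a parameter (in a real gadget it would be System.Gadget.Flyout) so the logic can be tried outside Sidebar.

```javascript
// Hypothetical helper (not part of the Sidebar API): writes `html` into the
// flyout's "content" element once the flyout has actually been shown.
// `flyout` is assumed to behave like System.Gadget.Flyout.
function showInFlyout(flyout, html) {
  // Register the handler before opening the flyout so the event is not missed.
  flyout.onShow = function () {
    var target = flyout.document.getElementById("content");
    if (target) {
      target.innerHTML = html;
    }
  };
  flyout.show = true;
}
```

Inside retrievedData this would be called as showInFlyout(System.Gadget.Flyout, listpos), assuming flyout.html contains the div with the "content" ID shown above.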

    Thanks for all your help. If anyone still has a better method for cutting out that list than what I have, you're welcome to share it.
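
On the question of a better way to cut out the link: instead of adding fixed offsets such as 62 to the result of search, one option is to find the marker text with indexOf and then look for the first href after it. A rough sketch follows; the function name firstLinkAfter and the sample markup are invented for illustration, and the pattern only handles quoted href values.

```javascript
// Hypothetical helper: returns the href of the first <a> tag that follows
// `marker` in `html`, or null if the marker or a link cannot be found.
function firstLinkAfter(html, marker) {
  var start = html.indexOf(marker);
  if (start === -1) {
    return null; // marker text is not on the page
  }
  // Only look at the part of the page that comes after the marker.
  var rest = html.slice(start + marker.length);
  // Match <a ... href="..."> or <a ... href='...'>; unquoted hrefs are not handled.
  var match = /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1/i.exec(rest);
  return match ? match[2] : null;
}

// Made-up sample input in the same shape as the snippet from the question.
var sample = "<span class='title'>Last 10 Forum Discussions</span><br><hr>" +
             "<a href='http://example.com/topic1'>First topic</a><br>";
var firstLink = firstLinkAfter(sample, "Last 10 Forum Discussions");
```

In the gadget, urlData from the handler would take the place of sample. Scanning HTML with a regular expression is fragile, so this assumes the page layout stays roughly as it is.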


  • pu132

    Use XMLHttpRequest to get the URL. There are plenty of examples on here, e.g.:

    var xmlReq;

    function getURL(url) {
        xmlReq = new XMLHttpRequest();
        xmlReq.open("GET", url);
        xmlReq.setRequestHeader("If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT");
        xmlReq.onreadystatechange = retrievedData;
        xmlReq.send(null);
    }

    function retrievedData() {
        if (xmlReq.readyState == 4 && xmlReq.status == 200) {
            var urlData = xmlReq.responseText;

            //Do something with urlData
        }
    }

    EDIT: Corrected code above
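
The code above shares the request through a global xmlReq. A variation that avoids the global altogether is to keep the request in a closure and hand the downloaded text to a callback. This is just a sketch of that alternative: the onLoaded callback and the optional createRequest parameter are assumptions added here (the latter only exists so the function can be exercised with a stand-in object outside the gadget).

```javascript
// Sketch: same request as above, but the request object lives in a closure.
// `onLoaded` is called with the page text once the download succeeds.
// `createRequest` is a hypothetical, optional factory for a stand-in request.
function getURL(url, onLoaded, createRequest) {
  var req = createRequest ? createRequest() : new XMLHttpRequest();
  req.open("GET", url, true);
  // Same cache-defeating header as in the thread's code.
  req.setRequestHeader("If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT");
  req.onreadystatechange = function () {
    if (req.readyState !== 4) {
      return; // still loading
    }
    if (req.status === 200) {
      onLoaded(req.responseText);
    }
  };
  req.send(null);
}
```

In the gadget it would be called with just the first two arguments, for example getURL('http://www.charas-project.net/index.php', function (urlData) { /* search urlData here */ });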

  • Ricardo Pinto

    Is there an error being reported?

    You could try splitting up the if to see where it's failing:

    var xmlReq;

    function getURL(url) {
        xmlReq = new XMLHttpRequest();
        xmlReq.open("GET", url);
        xmlReq.setRequestHeader("If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT");
        xmlReq.onreadystatechange = retrievedData;
        xmlReq.send(null);
    }

    function retrievedData() {
        if (xmlReq.readyState < 4)
            return;

        if (xmlReq.status == 200) {
            var urlData = xmlReq.responseText;

            //Do something with urlData
        }
    }
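
To see which of the two checks is failing, it can help to turn the request's state into a message and display it somewhere in the gadget. A small sketch, with the helper name describeRequest made up for illustration. Note that readyState never goes above 4, so testing for >= 4 and == 4 should behave the same; any delay is most likely just the download itself.

```javascript
// Hypothetical debugging helper: summarises an XMLHttpRequest-like object.
function describeRequest(req) {
  if (req.readyState < 4) {
    return "still loading (readyState " + req.readyState + ")";
  }
  if (req.status === 200) {
    return "done, " + req.responseText.length + " characters received";
  }
  // Finished, but the server did not answer with 200 OK.
  return "finished with HTTP status " + req.status;
}
```

Calling describeRequest(xmlReq) at the top of retrievedData and writing the result into a div on the gadget shows whether the handler stops at the readyState test or at the status test.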


  • Mark The Archer Evans

    Sorry, my mistake.  Remove the var declaration on the xmlReq line above:

    function getURL(url) {
    xmlReq = new XMLHttpRequest();
    ...
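
Why the var matters, as far as I can tell: writing var in front of xmlReq inside getURL creates a new local variable, so the global xmlReq that the handler reads is never assigned. That would explain the "readyState is null or does not exist" error. The sketch below shows the same effect with plain objects; all names in it are made up for illustration.

```javascript
var sharedReq; // global, playing the role of `var xmlReq;` at the top of the script

function startWithLocalVar() {
  // `var` here declares a separate local variable; the global stays undefined.
  var sharedReq = { readyState: 4 };
}

function startWithGlobal() {
  // No `var`: this assigns to the global that the handler reads.
  sharedReq = { readyState: 4 };
}

// Stands in for the onreadystatechange handler, which only sees the global.
function handlerSees() {
  return typeof sharedReq === "undefined"
    ? "undefined"
    : "readyState " + sharedReq.readyState;
}
```

After startWithLocalVar() the handler still sees an undefined request; after startWithGlobal() it sees the object with its readyState.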

