Posts tagged with web crawling
I love data scraping :) So far I have heard of the following scraping tools, and I hope to have time to look at more of them in the future:
PHP Scraping Tools
Java Scraping Tools
- CasperJS, PhantomJS
- Node.js request
Screen Scraping APIs
I have only used curl, simplehtmldom, tidy, and casperjs/phantomjs. It is quite easy and straightforward. Diffbot...
In my experiment with CasperJS to extract data from an aspx page, I faced some issues with dynamic drop-downs. There can be 2-3 drop-down boxes that depend on each other, e.g. the user selects a category in dropdown1, and an AJAX request is triggered to create and populate the sub-categories in dropdown2.
My first reaction to this problem was to use the Chrome Network tool to capture the POST request sent when the form is submitted and find out all the parameters. Then, I attempted to simulate this by filling the form with all the...
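The idea of replaying a captured POST can be sketched roughly as follows. This is only an illustration, assuming the aspx page is an ASP.NET WebForms page that round-trips hidden fields such as `__VIEWSTATE`, `__EVENTVALIDATION` and `__EVENTTARGET`; the function name `buildPostbackBody` and the drop-down field names in the usage note are hypothetical, and the real names would come from the request captured in the Network tool.

```javascript
// Sketch: build the form-encoded body for a drop-down postback.
// Assumption: hidden state fields were read from the page that was just loaded.
function buildPostbackBody(hiddenFields, selections, eventTarget) {
  const params = new URLSearchParams();

  // Copy the hidden state fields verbatim (e.g. __VIEWSTATE, __EVENTVALIDATION).
  for (const [name, value] of Object.entries(hiddenFields)) {
    params.append(name, value);
  }

  // Add the values currently chosen in each drop-down.
  for (const [name, value] of Object.entries(selections)) {
    params.append(name, value);
  }

  // Name the control that triggers the postback (the drop-down that changed).
  params.set('__EVENTTARGET', eventTarget);

  // URLSearchParams takes care of percent-encoding.
  return params.toString();
}
```

For example, `buildPostbackBody({ __VIEWSTATE: '...' }, { ddlCategory: '3' }, 'ddlCategory')` would produce a string that can be sent as an `application/x-www-form-urlencoded` request body. The response then contains fresh hidden fields, which have to be read again before the next dependent drop-down is posted.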
Recently, I have been playing around with CasperJS and PhantomJS for web scraping. I always find screen scraping fun and fascinating. I mean there are just so many applications:
We have bills/accounts all over the place on different websites. Scraping tools can be used to develop a program for personal use that combines the results in a single place. They can also be used to trigger notifications, e.g. bill payment reminders, manga notifications, movie notifications. The possibilities are endless :)
We want to find and compare the...
I just wrote my very first web spider, which first crawls the mangafox and mangastream websites. Then, it emails me automatically about new releases of the mangas that I am currently following. This is fun :)
I basically make use of simple parsing functions and the cURL and tidy extensions:
1) Get the HTML using curl
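The "what is new?" step of such a spider can be sketched like this. It is written in JavaScript only for brevity and is not the spider's actual code (which uses PHP); the function name `findNewChapters` and the data shapes are assumptions made for the example.

```javascript
// Sketch: pick out releases that are both followed and not yet seen.
// Assumption: each crawled entry looks like { title: string, chapter: number },
// and `seen` is a list of "title#chapter" keys remembered from earlier runs.
function findNewChapters(followed, latest, seen) {
  const followedSet = new Set(followed);
  const seenSet = new Set(seen);

  return latest.filter((entry) => {
    const key = `${entry.title}#${entry.chapter}`;
    return followedSet.has(entry.title) && !seenSet.has(key);
  });
}
```

Whatever this returns is what would go into the notification email; the keys of the returned entries are then added to the seen list so that the next run stays quiet about them.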
public static function http($target, $ref, $method, $data_array, $incl_head)
# Initialize PHP/CURL handle
$ch = curl_init();
# Process data, if present