• List of open source screen scraping tools

    I love data scraping :) So far I have heard of the following scraping tools, and I hope to have time to look at more of them in the future:

    PHP Scraping Tools

    Java Scraping Tools

    Javascript

    • CasperJS, PhantomJS
    • Readability
    • Node.js request
    • Cheerio

    Screen Scraping APIs

    I have only used curl, simplehtmldom, tidy, and casperjs/phantomjs. They are quite easy and straightforward to use. Diffbot...

  • Funny music and talent videos

    After a really long day, it is time for us to relax. I just finished watching the James Bond movie Skyfall. It was decent. The movie lasted more than 2 hours.

    Time to relax:

    1) Funny video about Michael Lang, a guy who imitates Michael Jackson. This is just hilarious.

    2) The shortest talent drama ever. It lasts for 2 seconds :))

    3) Rotating hands talent show :))

  • Understand Simple & Effective Search Engine Optimizations

    Recently, I have been reading about search engine optimization. I am quite new to this game :) It is quite interesting. Previously I was only aware of on-site optimization techniques. Basically, we can optimize the following for our website:

    1. Friendly URLs containing keywords
    2. Friendly HTML titles, meta tags, keywords & description
    3. Image "alt" attributes
    4. Hyperlink anchor text containing the keywords
    5. Include keywords in headers (h1, h2, h3)
    6. Do internal linking
    7. Have a proper sitemap.xml
    8. Put javascript in external files.
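
    Two of the items above, keyword-friendly URLs and sitemap.xml, can be sketched in a few lines of Python. This is only an illustration: the function names `slugify` and `build_sitemap` and the example.com URL are made up for the example, and the sitemap contains only the required `loc` element for each page.

```python
import re
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def slugify(title):
    """Lowercase the title and join its alphanumeric words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


def build_sitemap(urls):
    """Return a minimal sitemap.xml document (as a string) listing the URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page title and domain, used only for illustration.
slug = slugify("Understand Simple & Effective Search Engine Optimizations")
sitemap_xml = build_sitemap(["https://example.com/" + slug])
print(slug)
print(sitemap_xml)
```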

    The more interesting aspect...

  • Dealing with dynamic dropdowns with CasperJS

    In my experiment with CasperJS to extract data from an aspx page, I faced some issues with dynamic drop-downs. There can be 2-3 dropdown boxes that depend on each other, e.g. the user selects a category in dropdown1, and an AJAX request is triggered to create and populate the sub-categories in dropdown2.

    My first reaction to this problem was to use the Chrome Network tool to capture the POST request sent when the form is submitted, in order to find out all the parameters. Then, I attempted to simulate this by filling the form with all the...
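
    As a rough sketch of that first idea, replaying a captured form POST outside the browser might look like the following in Python. The URL and the field names (`category`, `subcategory`) are hypothetical placeholders, not the actual page's parameters, and a real aspx form usually also carries hidden state fields that would have to be copied from the page. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, standing in for whatever URL the
# Chrome Network tool shows for the real form submission.
FORM_URL = "https://example.com/search.aspx"


def build_form_post(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers)


# Hypothetical field names and values for the two dependent dropdowns.
request = build_form_post(
    FORM_URL, {"category": "books", "subcategory": "sci-fi"}
)
print(request.get_method(), request.data.decode("utf-8"))
```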

  • Fun scraping with casperjs and phantomjs

    Recently, I have been playing around with CasperJS and PhantomJS for web scraping. I always find screen scraping fun and fascinating. I mean there are just so many applications:

    1. We have bills/accounts scattered across different websites. Scraping tools can be used to develop a program for personal use that combines the results in a single place. It can also be used to trigger notifications, e.g. bill payment reminders, manga notifications, movie notifications. The possibilities are just endless :)

    2. We want to find and compare the...
