Diffbot develops machine learning and computer vision algorithms, along with public APIs for extracting data from web pages. Its web scraping tool lets software developers analyze web home pages and article pages and extract their information while ignoring elements deemed not core to the primary content.

Diffbot's web scraping tool has attracted interest for its application of computer vision to web pages: it visually parses a page for important elements and returns them in a structured format. Diffbot has two APIs:

An on-demand API for processing web pages. For example, it can extract the main elements of a web page while ignoring other features such as ads or navigation elements.

A follow API, which detects changes in a web page and extracts relevant information that illustrates the change.
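The change-detection idea behind the follow API can be illustrated with a minimal sketch. It assumes that two snapshots of the same page have already been reduced to lists of extracted items (for example, headlines), and it simply reports what appeared and what disappeared between them. The `detect_changes` function and `PageChange` type are hypothetical illustrations, not Diffbot's implementation or API.

```python
"""Toy change detector in the spirit of a follow API (hypothetical; not Diffbot's code)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class PageChange:
    added: list[str]    # items present in the new snapshot only
    removed: list[str]  # items present in the old snapshot only


def detect_changes(old_items: list[str], new_items: list[str]) -> PageChange:
    """Compare two snapshots of a page's extracted items.

    Duplicates are collapsed and order of first appearance is preserved,
    so the report reads in the same order as the page.
    """
    old_set = set(old_items)
    new_set = set(new_items)
    added = [item for item in dict.fromkeys(new_items) if item not in old_set]
    removed = [item for item in dict.fromkeys(old_items) if item not in new_set]
    return PageChange(added=added, removed=removed)


if __name__ == "__main__":
    yesterday = ["Markets rally", "Storm hits coast", "New phone launched"]
    today = ["Storm hits coast", "New phone launched", "Election results in"]
    change = detect_changes(yesterday, today)
    print("added:", change.added)
    print("removed:", change.removed)
```

A real service would also need to fetch pages on a schedule and extract the items in the first place; the sketch covers only the comparison step.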

By running its APIs on the AWS cloud, Diffbot can focus its resources on developing cutting-edge machine learning algorithms rather than worrying about hardware failures. AWS lets Diffbot run on the same kind of world-class infrastructure that large software companies use to operate their businesses. According to Diffbot, the reliability, performance, and scale it gains as a result would have been impossible to achieve by building out its own servers.

Diffbot's APIs analyze a web page and return a JavaScript Object Notation (JSON) object in real time. Because some of its APIs are on-demand, traffic can spike throughout the day as new web pages are created across the web.
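To illustrate the idea of returning only core content as a JSON object, here is a toy sketch that drops ad and navigation blocks and serializes the rest. It assumes each page block arrives already labeled; Diffbot identifies page regions with computer vision, which this sketch does not attempt. The `kind` labels, the `extract_article` function, and the output fields are hypothetical and do not reproduce Diffbot's response schema.

```python
"""Toy extractor: keep core content blocks and emit JSON (hypothetical sketch)."""
import json
from typing import TypedDict


class Block(TypedDict):
    kind: str  # hypothetical labels: "title", "text", "ad", "nav"
    text: str


# Labels treated as non-core and dropped (assumed for illustration).
NON_CORE_KINDS = {"ad", "nav"}


def extract_article(url: str, blocks: list[Block]) -> str:
    """Return a JSON object with the page title and body text, ignoring non-core blocks."""
    core = [b for b in blocks if b["kind"] not in NON_CORE_KINDS]
    # First title block wins; empty string if the page has none.
    title = next((b["text"] for b in core if b["kind"] == "title"), "")
    # Body paragraphs are joined in page order.
    body = "\n".join(b["text"] for b in core if b["kind"] == "text")
    return json.dumps({"url": url, "title": title, "text": body})


if __name__ == "__main__":
    page: list[Block] = [
        {"kind": "nav", "text": "Home | World | Sports"},
        {"kind": "title", "text": "Storm hits coast"},
        {"kind": "ad", "text": "Buy now!"},
        {"kind": "text", "text": "Heavy rain fell overnight."},
    ]
    print(extract_article("https://example.com/storm", page))
```

Keeping the output a flat JSON object makes it easy for clients in any language to consume, which is the main appeal of a JSON response.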

Diffbot monitors its resources with Amazon CloudWatch and uses Auto Scaling, combined with custom predictive logic, to scale up its analysis fleet during periods of high demand. This lets Diffbot maintain high performance regardless of how much traffic it receives. Diffbot uses Amazon Machine Images (AMIs) to define the images of its worker roles, which greatly simplifies deployment and rollback, and it stores the AMIs in Amazon Simple Storage Service (Amazon S3).
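Diffbot's predictive logic is not described here, so the following is only a minimal sketch of what a predictive scaling rule could look like. It assumes per-minute request-rate samples (such as those a CloudWatch metric might provide), a fixed per-worker throughput, and a simple linear-trend forecast a few minutes ahead; `forecast_rate`, `desired_workers`, and all parameter values are hypothetical.

```python
"""Hypothetical predictive scaling rule (not Diffbot's actual logic)."""
import math


def forecast_rate(samples: list[float], horizon: int) -> float:
    """Extrapolate the average per-minute change across the window `horizon` minutes ahead."""
    if len(samples) < 2:
        return samples[-1] if samples else 0.0
    slope = (samples[-1] - samples[0]) / (len(samples) - 1)
    # Never forecast negative traffic.
    return max(0.0, samples[-1] + slope * horizon)


def desired_workers(
    samples: list[float],
    per_worker_rate: float,
    horizon: int = 5,
    headroom: float = 1.2,
    min_workers: int = 2,
    max_workers: int = 100,
) -> int:
    """Size the fleet for the forecast load plus a safety margin, within fixed bounds."""
    predicted = forecast_rate(samples, horizon)
    needed = math.ceil(predicted * headroom / per_worker_rate)
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    # Traffic rising by 20 requests/minute each minute.
    recent = [100, 120, 140, 160, 180]
    print(desired_workers(recent, per_worker_rate=50))
```

With the sample data, the trend of 20 requests per minute projects 280 requests per minute five minutes ahead; with 20% headroom that is 336, which at 50 requests per worker rounds up to 7 workers. Forecasting ahead, rather than reacting to the current rate, is what lets capacity arrive before a spike peaks. In an AWS deployment, the resulting number could be applied to an Auto Scaling group, for example through the boto3 Auto Scaling client's `set_desired_capacity` call.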

