TOP web scraping libraries for Python, PHP, Ruby: web scraping tools open source
TOP Libraries for Web scraper Development
Python
Scrapy
An open source and collaborative framework for extracting the data you need from websites.
Scrapy project web site: http://scrapy.org/
Type
Framework
First release date
2008
Issues count
221
License
BSD License
Programming language
Python
Current version
1
Last release date
2015
Open source
Yes
BeautifulSoup
Beautiful Soup is a Python library for pulling data out of HTML and XML files.
BeautifulSoup project web site: http://www.crummy.com/software/BeautifulSoup/
Last release date
2015
Open source
Yes
Type
Library
First release date
2004
Issues count
58
License
MIT License
Programming language
Python
Current version
4.4.1
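A small sketch of typical Beautiful Soup usage, assuming the `bs4` package (Beautiful Soup 4) is installed. The HTML snippet is invented for the example, and Python's built-in `html.parser` is used so no extra parser is required.

```python
from bs4 import BeautifulSoup

# Invented sample document; a real scraper would get this from an HTTP response.
HTML = """
<html><body>
  <h1>Scraping libraries</h1>
  <ul>
    <li><a href="http://scrapy.org/">Scrapy</a></li>
    <li><a href="https://github.com/html5lib/html5lib-python">html5lib</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Navigate by tag name, then collect the text and href of every link.
title = soup.h1.get_text(strip=True)
links = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]

print(title)
for text, href in links:
    print(text, href)
```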
mechanize (Python)
Stateful programmatic web browsing in Python, after Andy Lester's Perl module WWW::Mechanize.
mechanize (Python) project web site: https://github.com/jjlee/mechanize/
First release date
2010
Issues count
60
License
BSD-style License
Programming language
Python
Current version
0.2.5
Last release date
2011
Open source
Yes
Type
Library
Requests (Python)
Python HTTP Requests for Humans
Requests (Python) project web site: https://github.com/kennethreitz/requests/
Current version
2.9.1
Last release date
2015
Open source
Yes
Programming language
Python
First release date
2011
Issues count
70
License
Apache 2 License
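To give a feel for the Requests API, the sketch below builds and prepares a GET request without sending it, so it runs offline. The URL, query parameters, and User-Agent string are placeholders chosen for this example.

```python
import requests

# Describe the request: method, URL, query parameters, and headers.
request = requests.Request(
    "GET",
    "https://example.com/search",  # placeholder URL
    params={"q": "web scraping", "page": 2},
    headers={"User-Agent": "example-scraper/0.1"},
)

# Preparing the request encodes the parameters into the final URL.
prepared = request.prepare()
print(prepared.method, prepared.url)

# To actually send it, something like the following would be used:
# with requests.Session() as session:
#     response = session.send(prepared, timeout=10)
#     response.raise_for_status()
#     html = response.text
```

For one-off calls, `requests.get(url, params=..., timeout=...)` does the same in a single step.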
html5lib
html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all major web browsers.
html5lib project web site: https://github.com/html5lib/html5lib-python
Type
Library
First release date
2013
Issues count
56
License
MIT License
Programming language
Python
Current version
1.0b8
Last release date
2015
Open source
Yes
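The value of a spec-conforming parser shows up on malformed markup. The sketch below feeds html5lib a deliberately broken fragment (invented for this example) and reads the repaired tree back through the standard `xml.etree` API; it assumes the `html5lib` package is installed.

```python
import html5lib

# Deliberately broken input: no <html>/<body>, unclosed <p> and <b> tags.
BROKEN_HTML = "<title>Demo</title><p>First<p>Second <b>bold"

# namespaceHTMLElements=False keeps tag names plain ("p" rather than "{...}p").
root = html5lib.parse(BROKEN_HTML, treebuilder="etree", namespaceHTMLElements=False)

# The parser has supplied <head> and <body> and closed the open elements.
title = root.findtext("head/title")
paragraphs = ["".join(p.itertext()) for p in root.iter("p")]

print(title)
print(paragraphs)
```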
urllib2
urllib2 is an extensible library for opening URLs.
urllib2 project web site: https://docs.python.org/2/library/urllib2.html
First release date
1990
Open source
Yes
Programming language
Python
Current version
Stable
Last release date
2015
License
Python Software Foundation License
Type
Library
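Note that urllib2 is a Python 2 module; in Python 3 its functionality lives in `urllib.request` and `urllib.error`. The sketch below uses the Python 3 names. The `fetch` helper and the User-Agent string are made up for this example, and a `data:` URL stands in for a real page so the code runs without network access.

```python
import urllib.request


def fetch(url, user_agent="example-scraper/0.1"):
    """Open a URL with a custom User-Agent header and return the decoded body."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


# A data: URL keeps the sketch offline; swap in an http(s) URL for real use.
SAMPLE_URL = "data:text/plain,Hello%20from%20urllib"
body = fetch(SAMPLE_URL)
print(body)
```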
PHP
Requests (PHP)
Requests for PHP is a humble HTTP request library. It simplifies how you interact with other sites and takes away all your worries.
Requests (PHP) project web site: https://github.com/rmccue/Requests
Type
Library
First release date
2012
Issues count
29
License
ISC License
Programming language
PHP
Current version
1.6.1
Last release date
2015
Open source
Yes
Buzz
Buzz is a lightweight PHP 5.3 library for issuing HTTP requests.
Buzz project web site: https://github.com/kriswallsmith/Buzz
Type
Library
First release date
2010
Issues count
44
License
MIT License
Programming language
PHP
Current version
0.15
Last release date
2015
Open source
Yes
Guzzle
Guzzle is a PHP HTTP client that makes it easy to send HTTP requests and integrate with web services.
Guzzle project web site: https://github.com/guzzle/guzzle
Programming language
PHP
Current version
6.1.1
License
MIT License
Type
Library
Open source
Yes
Goutte
Goutte is a web scraping library. It provides a nice API to crawl websites and extract data from the HTML/XML responses.
Goutte project web site: https://github.com/FriendsOfPHP/Goutte
First release date
2012
Issues count
40
License
MIT License
Programming language
PHP
Current version
3.1.0
Last release date
2015
Open source
Yes
Type
Library
Ruby
data_miner
Download, unpack from a ZIP/TAR/GZ/BZ2 archive, parse, correct, convert units and import Google Spreadsheets, XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses RemoteTable gem internally.
data_miner project web site: https://github.com/seamusabshere/data_miner
Type
Library
First release date
2009
Issues count
8
License
MIT License
Programming language
Ruby
Current version
3.0.0
Last release date
2014
Open source
Yes
pismo
pismo - Web page content analysis and metadata extraction
pismo project web site: https://github.com/peterc/pismo
Issues count
11
License
MIT License
Programming language
Ruby
Current version
0.7.4
Last release date
2013
Open source
Yes
Type
Library
First release date
2010
Nokogiri
Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser with XPath and CSS selector support.
Nokogiri project web site: https://github.com/sparklemotion/nokogiri
Last release date
2015
Open source
Yes
Type
Library
First release date
2008
Issues count
180
License
MIT License
Programming language
Ruby
Current version
1.6.8.rc1