Python
Scrapy
An open source and collaborative framework for extracting the data you need from websites in a fast, simple, yet extensible way.
Scrapy project web site: http://scrapy.org/
Type | Framework |
First release date | 2008 |
Issues count | 221 |
License | BSD License |
Programming language | Python |
Current version | 1 |
Last release date | 2015 |
Open source | Yes |
BeautifulSoup
Beautiful Soup is a Python library for pulling data out of HTML and XML files.
BeautifulSoup project web site: http://www.crummy.com/software/BeautifulSoup/
Last release date | 2015 |
Open source | Yes |
Type | Library |
First release date | 2004 |
Issues count | 58 |
License | BSD License |
Programming language | Python |
Current version | 4.4.1 |
mechanize (Python)
Stateful programmatic web browsing in Python, after Andy Lester’s Perl module WWW::Mechanize.
mechanize (Python) project web site: https://github.com/jjlee/mechanize/
https://youtube.com/watch?v=p4dOPXWaeLI
First release date | 2010 |
Issues count | 60 |
License | BSD-style License |
Programming language | Python |
Current version | 0.2.5 |
Last release date | 2011 |
Open source | Yes |
Type | Library |
Requests (Python)
Python HTTP Requests for Humans
Requests (Python) project web site: https://github.com/kennethreitz/requests/
Current version | 2.9.1 |
Last release date | 2015 |
Open source | Yes |
Programming language | Python |
First release date | 2011 |
Issues count | 70 |
License | Apache 2 License |
html5lib
html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as implemented by all major web browsers.
html5lib project web site: https://github.com/html5lib/html5lib-python
https://youtube.com/watch?v=dWlhrL1l3QU
Type | Library |
First release date | 2013 |
Issues count | 56 |
License | MIT License |
Programming language | Python |
Current version | 1.0b8 |
Last release date | 2015 |
Open source | Yes |
urllib2
urllib2 is an extensible library for opening URLs.
urllib2 project web site: https://docs.python.org/2/library/urllib2.html
First release date | 1990 |
Open source | Yes |
Programming language | Python |
Current version | Stable |
Last release date | 2015 |
License | Python Software Foundation License |
Type | Library |
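As a rough illustration of what "opening URLs" looks like in practice, here is a minimal sketch. It assumes Python 3, where the functionality of urllib2 lives in urllib.request and urllib.error. The helper name fetch_text and the data: URL are made up for this example so that it runs without network access.

```python
import urllib.error
import urllib.request


def fetch_text(url, timeout=10.0):
    """Open a URL and return its body as text (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Use the charset declared by the response, defaulting to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset)
    except urllib.error.URLError as exc:
        # Wrap low-level failures in a single, easy-to-catch error.
        raise RuntimeError("could not open {}: {}".format(url, exc.reason)) from exc


# A data: URL keeps the example offline; swap in an http(s) URL for real use.
page = fetch_text("data:text/html;charset=utf-8,<h1>Hello</h1>")
print(page)
```

The returned markup can then be handed to a parser such as BeautifulSoup or html5lib from the entries above.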
PHP
Requests (PHP)
Requests for PHP is a humble HTTP request library. It simplifies how you interact with other sites and takes away all your worries.
Requests (PHP) project web site: https://github.com/rmccue/Requests
Type | Library |
First release date | 2012 |
Issues count | 29 |
License | ISC License |
Programming language | PHP |
Current version | 1.6.1 |
Last release date | 2015 |
Open source | Yes |
Buzz
Buzz is a lightweight PHP 5.3 library for issuing HTTP requests.
Buzz project web site: https://github.com/kriswallsmith/Buzz
Type | Library |
First release date | 2010 |
Issues count | 44 |
License | MIT License |
Programming language | PHP |
Current version | 0.15 |
Last release date | 2015 |
Open source | Yes |
Guzzle
Guzzle is a PHP HTTP client that makes it easy to send HTTP requests and trivial to integrate with web services.
Guzzle project web site: https://github.com/guzzle/guzzle
Programming language | PHP |
Current version | 6.1.1 |
License | MIT License |
Type | Library |
Open source | Yes |
Goutte
Goutte is a web scraping library. It provides a nice API to crawl websites and extract data from the HTML/XML responses.
Goutte project web site: https://github.com/FriendsOfPHP/Goutte
First release date | 2012 |
Issues count | 40 |
License | MIT License |
Programming language | PHP |
Current version | 3.1.0 |
Last release date | 2015 |
Open source | Yes |
Type | Library |
Ruby
data_miner
Download, unpack from a ZIP/TAR/GZ/BZ2 archive, parse, correct, convert units and import Google Spreadsheets, XLS, ODS, XML, CSV, HTML, etc. into your ActiveRecord models. Uses RemoteTable gem internally.
data_miner project web site: https://github.com/seamusabshere/data_miner
Type | Library |
First release date | 2009 |
Issues count | 8 |
License | MIT License |
Programming language | Ruby |
Current version | 3.0.0 |
Last release date | 2014 |
Open source | Yes |
pismo
pismo – Web page content analysis and metadata extraction
pismo project web site: https://github.com/peterc/pismo
Issues count | 11 |
License | MIT License |
Programming language | Ruby |
Current version | 0.7.4 |
Last release date | 2013 |
Open source | Yes |
Type | Library |
First release date | 2010 |
Nokogiri
Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser with XPath and CSS selector support
Nokogiri project web site: https://github.com/sparklemotion/nokogiri
Last release date | 2015 |
Open source | Yes |
Type | Library |
First release date | 2008 |
Issues count | 180 |
License | MIT License |
Programming language | Ruby |
Current version | 1.6.8.rc1 |