Some of the data extraction features include disparate data collection, document extraction, email address extraction, image extraction, IP address extraction, phone number extraction, pricing extraction and web data extraction.
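To make a few of these extraction types concrete, here is a minimal Python sketch that pulls email addresses, IPv4 addresses, and phone numbers out of plain text with regular expressions. It is only an illustration of the idea, not how Parsehub does it internally. The patterns, the `extract_fields` helper, and the sample text are all assumptions made for this example, and real-world data will need looser or stricter patterns.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real emails,
# IP addresses and phone numbers come in more shapes than these cover.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def extract_fields(text):
    """Return every email, IPv4-looking address and phone number found in text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "ips": IPV4_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


# Hypothetical sample text
sample = "Contact info@example.com or call 555-123-4567. Server: 192.168.0.1"
result = extract_fields(sample)
print(result)
```

Note that the IPv4 pattern only checks the shape of the address; it does not verify that each number is between 0 and 255.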
The more dynamic a website is, the trickier it is to get Parsehub to recognize patterns of information automatically when you make selections. The Parsehub website has tutorials on working with more dynamic websites, but there is still a bit of a learning curve before things work right. Here is how to use the tool.
First, open the app and browse to the first page of the website you want to extract data from. As long as the browser arrow tool is selected, you can navigate around and use Parsehub as a regular browser until you reach the view you need. Lists tend to work best for extracting data, so search for the data you need and bring it up as a list of results before you start issuing extraction commands. It's okay for the results to span several pages, since Parsehub has an excellent way to navigate through them all.
Once you have the first page, the first thing to select is the first column of data you want in your dataset, which in this example is the venue names. After selecting the names, I instruct Parsehub to create a list from them and extract them into my dataset. This is done with the appropriately named list and extract tools. The next step is to extract the address and telephone information for each venue, and I want Parsehub to recognize that this information belongs to each individual venue. So I choose the relative select tool, click on the venue name, and then click on the associated address or telephone number to link the two. Lastly, I want to extract the accessibility information for each venue. This information is shown when I hover over the icons next to the venues, and Parsehub can still extract it if the settings under the extract tool are changed.
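The workflow above (list the names, relatively select the address and phone number, then read the hover text) can be pictured with a small Python sketch built on the standard library's `html.parser`. This is not Parsehub's implementation. The sample markup, the class names, and the `VenueParser` class are hypothetical, and the sketch assumes the hover text is stored in each icon's `title` attribute, which will not be true of every site.

```python
from html.parser import HTMLParser

# Hypothetical markup for a results list; real pages will differ.
SAMPLE_HTML = """
<ul>
  <li class="venue"><span class="name">Cafe One</span>
      <span class="address">1 Main St</span>
      <span class="phone">555-0101</span>
      <img class="access" title="Step-free entrance"></li>
  <li class="venue"><span class="name">Hall Two</span>
      <span class="address">2 Side Rd</span>
      <span class="phone">555-0102</span>
      <img class="access" title="Accessible toilet"></li>
</ul>
"""


class VenueParser(HTMLParser):
    """Collect one row per venue, keeping related fields together."""

    def __init__(self):
        super().__init__()
        self.venues = []   # one dict per list item
        self.field = None  # class of the <span> currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class")
        if tag == "li" and cls == "venue":
            self.venues.append({})  # a new list item starts a new row
        elif tag == "span" and cls in ("name", "address", "phone"):
            self.field = cls  # following text belongs to the current venue
        elif tag == "img" and cls == "access" and self.venues:
            # Assumption: the hover text lives in the title attribute
            self.venues[-1]["accessibility"] = attrs.get("title", "")

    def handle_data(self, data):
        if self.field and self.venues:
            self.venues[-1][self.field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self.field = None


parser = VenueParser()
parser.feed(SAMPLE_HTML)
venues = parser.venues
for row in venues:
    print(row)
```

Each dictionary plays the role of one row in the dataset: the name is the list entry, and the address, phone number, and accessibility text are attached to it in the same way a relative selection ties them to their venue.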
The desktop application supports Windows, Mac OS X, and Linux. There is also a web app that runs in the browser. Parsehub comes in a free version and in a paid version for large-scale data extraction.
Parsehub supports more systems than Octoparse. It is also flexible enough to handle scraping jobs with different requirements.
Parsehub works well with programs that have API access. The free version limits users to 5 projects and 200 pages per run.
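Before starting a large job, it can help to estimate whether it will fit inside the 200-pages-per-run limit of the free version. The short Python sketch below does that arithmetic for a paginated list of results. The helper names and the example figures are hypothetical, and it assumes that every results page, and every detail page you open, counts as one page toward the limit; check how Parsehub actually counts pages before relying on it.

```python
import math

FREE_PLAN_PAGES_PER_RUN = 200  # free-version limit quoted in the text above


def pages_to_visit(total_results, results_per_page, visit_detail_pages=False):
    """Estimate how many pages a run would load for a paginated result list."""
    list_pages = math.ceil(total_results / results_per_page)
    # Assumption: opening each result's own page costs one extra page load
    detail_pages = total_results if visit_detail_pages else 0
    return list_pages + detail_pages


def fits_free_plan(pages, limit=FREE_PLAN_PAGES_PER_RUN):
    """True when the estimated page count stays within the per-run limit."""
    return pages <= limit


# Hypothetical job: 450 venues shown 25 per page
list_only = pages_to_visit(450, 25)
with_details = pages_to_visit(450, 25, visit_detail_pages=True)
print(list_only, fits_free_plan(list_only))
print(with_details, fits_free_plan(with_details))
```

In this made-up case, scraping only the list pages stays well under the limit, while opening every venue's own page would exceed it.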