In previous articles we learned how to create a simple web scraper using Node.js and axios.
The axios module for Node.js lets us load the HTML source of a page. Now we have to extract the required fields from that HTML, and we will do that with the Cheerio module for Node.js.
Let's add this function to the script where we fetched the HTML page source for each product URL.
Let's write the extraction code for this test page.
Here is the part of its HTML that holds the SKU, price, and currency (the title is taken from the page's h1 heading):
[code lang="html"]
<div class="nv-content-wrap entry-content">
<table width="200px">
<tbody>
<tr>
<td>Sku:
</td>
<td>
<div class="sku">
testSku
</div>
</td>
<td>Price:
</td>
<td>
<div class="price-value">
123
</div>
</td>
<td>
Currency:
</td>
<td>
<div class="price-currency">
USD
</div>
</td>
</tr>
</tbody>
</table>
</div>
[/code]
Here is the code that extracts the fields:
[code lang="js"]
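// Cheerio provides a jQuery-like API over the parsed HTML.
const cheerio = require('cheerio');

async function extractData(html)
{
    // Parse the HTML string into a queryable document.
    const $ = cheerio.load(html);
    // .text() returns an empty string when nothing matches, so a missing
    // element does not throw the way .html().trim() would on null.
    const sku = $('.sku').first().text().trim();
    const title = $('h1').first().text().trim();
    const price = $('.price-value').first().text().trim();
    const currency = $('.price-currency').first().text().trim();
    console.log('sku:' + sku);
    console.log('title:' + title);
    console.log('price:' + price);
    console.log('currency:' + currency);
}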
[/code]
Here is the same Cheerio code sample in Visual Studio:
I hope this sample helps you extract the fields you need from source sites and HTML pages.