If you are reading this article, you probably have a problem using a proxy with authentication in Puppeteer.
It is an interesting case because Puppeteer does not accept a proxy URL with credentials out of the box, so you have to take care of it yourself. The code samples below show how to manage that.
What Can You Do With Puppeteer?
Here are some of the things that you can do with Puppeteer:
- Create page screenshots and PDFs.
- Crawl an SPA (Single-Page Application) to generate pre-rendered content.
- Perform form submission automation, user interface testing, keyboard input, and other comparable tasks.
- Create an up-to-date, automated testing environment for your Chrome tests using the most recent JavaScript and browser features.
- Conduct runtime analysis to identify performance issues.
- Experiment with various Chrome Extensions.
What Is a Proxy Server?
A proxy server acts as a gateway between the client browser and the internet.
Simply put, you browse the internet through a proxy rather than directly requesting resources from the websites you visit. Your browser sends a request to the proxy server, which then passes the request to the website. The website delivers the information to the proxy, and the proxy sends it back to you.
The primary benefit of using a proxy server to browse the web is that your IP address remains hidden from the websites you visit, which makes it much harder for them to trace the origin of a request.
Why it does not work by default:
1. Puppeteer uses Chromium by default, and Chromium accepts only proxy addresses without credentials.
2. The format you usually get from a proxy provider is: https://user:password@host:port.
3. Chromium accepts only this format for proxies: https://host:port.
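To make the limitation concrete, here is a minimal sketch of how a proxy without credentials is usually handed to Chromium, assuming the standard `--proxy-server` command-line switch. The helper names `buildLaunchArgs` and `launchWithProxy` are hypothetical and used only for illustration.

```javascript
// Hypothetical helper: builds the Chromium argument list for a proxy.
// Only scheme, host and port are passed; credentials are rejected here
// because they have to be supplied separately.
function buildLaunchArgs(hostAndPort) {
  if (hostAndPort.includes("@")) {
    throw new Error("Pass proxy credentials separately, not in the proxy address");
  }
  return [`--proxy-server=${hostAndPort}`];
}

// Hypothetical helper: launches Chromium through the proxy.
// Requires the `puppeteer` package, loaded only when this function runs.
async function launchWithProxy(hostAndPort) {
  const puppeteer = require("puppeteer");
  return puppeteer.launch({ args: buildLaunchArgs(hostAndPort) });
}
```

With an address such as `http://3.8.19.10:9` this is enough. As soon as the proxy asks for a user name and password, the credentials need a separate path, which is what the rest of the article covers.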
The code below parses a proxy URL that contains credentials:
[code lang=”js”]
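// Splits a proxy URL into the credential-free part that Chromium accepts
// and the separate user name and password.
function chromeProxyParse(chromeProxy)
{
    const r = {
        ProtocolHostAndPort: chromeProxy,
        User: null,
        Password: null,
    };
    if (chromeProxy.includes("@")) // e.g. http://user:pwd@3.8.19.10:9
    {
        // The built-in URL parser also copes with ':' or '/' inside a
        // percent-encoded password, which plain string splitting does not.
        const url = new URL(chromeProxy);
        r.ProtocolHostAndPort = url.protocol + "//" + url.host;
        r.User = decodeURIComponent(url.username);
        r.Password = decodeURIComponent(url.password);
    }
    // Debug output; note that it prints the password.
    console.log(JSON.stringify(r, null, 4));
    return r;
}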
[/code]
OK, the main part of the task is done. The next step is to apply the proxy to your active Puppeteer instance.
Method 1 via page.authenticate
[code lang=”js”]
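// proxyuserName and proxyPassword are the User and Password values
// produced by chromeProxyParse. Call this before page.goto.
await page.authenticate({
    username: proxyuserName,
    password: proxyPassword,
});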
[/code]
I should mention that I copied similar code from Stack Overflow and it did not work. The code I published here is 100% up to date.
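Putting the pieces together, the flow could look roughly like the sketch below. It assumes an object with the shape returned by chromeProxyParse (`ProtocolHostAndPort`, `User`, `Password`) and the Chromium `--proxy-server` switch. The helper names `toLaunchOptions`, `toCredentials` and `openThroughProxy` are hypothetical.

```javascript
// `parsed` has the shape returned by chromeProxyParse:
// { ProtocolHostAndPort, User, Password }.
function toLaunchOptions(parsed) {
  return { args: [`--proxy-server=${parsed.ProtocolHostAndPort}`] };
}

// Returns the object expected by page.authenticate, or null when the
// proxy URL carried no credentials.
function toCredentials(parsed) {
  if (parsed.User === null) return null;
  return { username: parsed.User, password: parsed.Password || "" };
}

// Hypothetical end-to-end helper. Requires the `puppeteer` package.
async function openThroughProxy(parsed, targetUrl) {
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch(toLaunchOptions(parsed));
  try {
    const page = await browser.newPage();
    const credentials = toCredentials(parsed);
    // Register the credentials before navigating, so the proxy's
    // authentication challenge can be answered.
    if (credentials) await page.authenticate(credentials);
    await page.goto(targetUrl);
    return await page.title();
  } finally {
    await browser.close();
  }
}
```

Keeping the two small conversion functions separate from the browser code makes it easy to check them without starting Chromium.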
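Method 2 via page.setExtraHTTPHeaders
[code lang=”js”]
// u and p are the proxy user name and password.
// The Basic scheme expects "user:password" encoded as base64.
const basic = "Basic " + Buffer.from(u + ":" + p).toString("base64");
await page.setExtraHTTPHeaders({
    'Proxy-Authorization': basic,
    // Note: this header is also sent to the target website.
    Authorization: basic,
});
[/code]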
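For reference, the value of a Basic authentication header is the word `Basic` followed by the base64 encoding of `user:password`. The sketch below shows one way to build it in Node.js. The helper name `basicAuthHeader` is hypothetical, and whether a given proxy honours a header set this way is an assumption that has to be checked against that proxy.

```javascript
// Hypothetical helper: builds a Basic authentication header value.
function basicAuthHeader(user, password) {
  // Buffer is a Node.js global; the credentials are encoded as UTF-8 bytes
  // and then converted to base64.
  const token = Buffer.from(`${user}:${password}`, "utf8").toString("base64");
  return `Basic ${token}`;
}

// Hypothetical helper: the header object for page.setExtraHTTPHeaders.
function proxyAuthHeaders(user, password) {
  return { "Proxy-Authorization": basicAuthHeader(user, password) };
}
```

Decoding the token again is a quick way to confirm that the right credentials went in.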
We all know the frustration of trying to web scrape behind a proxy. It’s like being in a maze – every time you think you’ve found the exit, you hit another wall.
Luckily, there are some tools and tricks that can help you get around this hurdle. In this article, we’ll show you how to use Puppeteer with a proxy. First, let’s take a look at some of the common issues you might encounter when using Puppeteer with a proxy. Then, we’ll share some tips on how to overcome them.
Common Issue
One of the most common issues is having Puppeteer return an error when trying to connect to the proxy. This can happen for a number of reasons, but the most likely cause is that your proxy requires authentication.
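One way to tell a proxy problem apart from a problem with the target site is to look at the HTTP status and the error message. The sketch below assumes that a proxy demanding credentials answers with status 407 (Proxy Authentication Required) and that navigation errors carry Chromium net error names in their message. The list of names is a starting point, not a complete catalogue, and the helper name `looksLikeProxyProblem` is hypothetical.

```javascript
// Chromium net error names that usually point at the proxy rather than
// at the target website. Assumption: the error message contains the name.
const PROXY_ERROR_MARKERS = [
  "ERR_PROXY_CONNECTION_FAILED",
  "ERR_TUNNEL_CONNECTION_FAILED",
  "ERR_NO_SUPPORTED_PROXIES",
  "ERR_INVALID_AUTH_CREDENTIALS",
];

// Hypothetical helper: `status` is the HTTP status of the response (if any),
// `errorMessage` is the message of the error thrown by page.goto (if any).
function looksLikeProxyProblem({ status = null, errorMessage = "" } = {}) {
  if (status === 407) return true; // Proxy Authentication Required
  return PROXY_ERROR_MARKERS.some((marker) => errorMessage.includes(marker));
}
```

In practice you would wrap `page.goto` in a try/catch and pass `error.message` or `response.status()` to the helper.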
Tips and Tricks
Now that we’ve covered an issue that you might have when utilizing Puppeteer with a proxy, let’s go over some tips and tricks for getting around it.
Using a rotating proxy is one helpful tip. It helps you avoid the problems that can come from using a single proxy for a long period of time, such as that IP address being blocked. You can use one of several rotating proxy providers, or you can create your own.
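If you build your own rotation, a simple round-robin over a list of proxies is often enough to start with. The sketch below is one possible shape. The helper name `createProxyRotator` and the proxy addresses in the usage note are hypothetical.

```javascript
// Hypothetical helper: returns a function that hands out proxies in
// round-robin order, starting again from the first one after the last.
function createProxyRotator(proxies) {
  if (!Array.isArray(proxies) || proxies.length === 0) {
    throw new Error("At least one proxy is required");
  }
  let index = 0;
  return function nextProxy() {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length;
    return proxy;
  };
}
```

For example, `const nextProxy = createProxyRotator(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])` gives you a function to call before each browser launch. In this sketch each proxy is meant for its own browser instance, because the proxy is set when the browser starts.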
Using a VPN in addition to a proxy is another useful tip. It conceals your identity even further and makes it more difficult for websites to block your IP address. You can use one of several VPN companies, or you can set up your own. Hopefully, these tips and tricks will help you make the most of Puppeteer with a proxy.
A bit about proxies
Proxies have numerous functions, but one of the most important in the scraping and automation world is getting past the various protections that some sites employ. Some websites, for example, can determine whether a request comes from a server or from a real person on their computer or phone, and deny the requests that come from servers.
Normally, when you use a proxy that requires authentication in a non-headless browser (specifically Chrome), you are asked to enter your credentials in a popup dialog.
The problem with running in headless mode is that this dialog never appears, because a headless browser has no UI.
I have never tested the setExtraHTTPHeaders method, so I am not sure that it works correctly. If it works for you, please write about it in the comments.