Why Puppeteer is more popular than Selenium

Mark

Is Selenium worth using? Or is it better to use a more modern library like Puppeteer? Our platform supports both Selenium and Puppeteer. Our clients' choice is influenced by various factors: company culture, tools and languages, and more. However, we always recommend against using Selenium. Here is why.

Selenium is an HTTP-based JSON API

It's better to choose puppeteer or any other CDP-based (Chrome DevTools Protocol) library. But what exactly makes Selenium so inconvenient? Let's look at a specific example: loading a website, clicking a button, and getting the title. Puppeteer does all of this over a single socket connection, which opens when we connect and closes when we are done. Selenium, by contrast, needs six or more HTTP JSON payloads for the same task.
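
To make the difference concrete, here is a minimal sketch of the HTTP requests a WebDriver client typically sends for the load, click, and read-title flow. The endpoint paths follow the W3C WebDriver specification; the function name `webdriverRequests` and the placeholder IDs are hypothetical, and a real client may send additional calls.

```javascript
// Sketch: list the HTTP requests a WebDriver client sends to load a page,
// click a button, and read the title. Paths follow the W3C WebDriver spec;
// the session and element IDs are placeholders normally returned by the server.
function webdriverRequests(sessionId, elementId) {
  const base = `/session/${sessionId}`;
  return [
    { method: 'POST', path: '/session' },                           // create a session
    { method: 'POST', path: `${base}/url` },                        // navigate to the page
    { method: 'POST', path: `${base}/element` },                    // find the button
    { method: 'POST', path: `${base}/element/${elementId}/click` }, // click it
    { method: 'GET', path: `${base}/title` },                       // read the title
    { method: 'DELETE', path: base },                               // end the session
  ];
}

const requests = webdriverRequests('SESSION_ID', 'ELEMENT_ID');
for (const { method, path } of requests) {
  console.log(method, path);
}
console.log(`${requests.length} separate HTTP requests`);
```

Each entry is a full HTTP request and response, whereas a CDP-based library sends the equivalent commands as messages over one already-open socket.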

  • Every Selenium HTTP call has to go through a standard TCP handshake and then be routed to its final destination. Unless the connection settings allow connections to be reused, each call pays that setup cost again.
  • Selenium makes a lot of API calls. These calls arrive in "batch" patterns, which makes them difficult to rate limit.
  • Selenium makes load balancing with round-robin routing difficult. For a session's requests to complete, you need sticky sessions: an algorithm that distributes the load not only by the number of connections to each server but also by the IP addresses involved. You may have to build this algorithm yourself.
  • Somewhere inside, Selenium still has a binary that simply sends CDP messages to Chrome. So why should users have to deal with all this HTTP machinery? Over time, you will have to put in serious effort to understand how Selenium works. With puppeteer, you do not have to learn a separate set of capabilities and operating principles from scratch, and you can immediately use almost any load balancer (nginx, apache, envoy, etc.). In general, Selenium requires specialized knowledge, while libraries like puppeteer and playwright let you get up and running quickly.
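
The routing problem from the list above can be sketched in a few lines. This is an illustration under assumptions, not how any particular load balancer works: the backend host names are hypothetical, and the sticky router keys on a session ID rather than on IP addresses.

```javascript
// Hypothetical backend hosts, for illustration only.
const backends = ['grid-a:4444', 'grid-b:4444', 'grid-c:4444'];

// Round robin: every request goes to the next backend in turn.
function makeRoundRobin(hosts) {
  let next = 0;
  return () => hosts[next++ % hosts.length];
}

// Sticky routing: the first request of a session picks a backend,
// and every later request with the same session ID is pinned to it.
function makeStickyRouter(hosts) {
  const pick = makeRoundRobin(hosts);
  const owners = new Map();
  return (sessionId) => {
    if (!owners.has(sessionId)) owners.set(sessionId, pick());
    return owners.get(sessionId);
  };
}

const roundRobin = makeRoundRobin(backends);
const sticky = makeStickyRouter(backends);

// Three consecutive calls belonging to one browser session.
const plain = [roundRobin(), roundRobin(), roundRobin()];
const pinned = [sticky('session-1'), sticky('session-1'), sticky('session-1')];

console.log('round robin:', plain);
console.log('sticky:', pinned);
```

With plain round robin the three calls land on three different backends, two of which have never heard of the session. The sticky router keeps them on one backend, at the cost of extra state that the balancer has to maintain.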

Selenium Requires more Binaries to Track

Both puppeteer and playwright ship with a matching version of their browser. All you have to do is start using them, and everything just works. Selenium complicates your life. You have to figure out on your own which version of chromedriver matches which version of Chrome, and which of those the version of Selenium you are using can work with. That is at least three places where your integration can break. Finally, there is Selenium Grid, which will also give you headaches if you don't keep an eye on it. All of this is a clear disadvantage of Selenium compared with more universal and accessible tools.
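
As a rough illustration of the bookkeeping involved, the sketch below flags a Chrome and chromedriver pairing whose major versions differ. It assumes the common rule that the two major versions must be equal, it does not cover the Selenium client version at all, and the function names and version strings are only examples.

```javascript
// Extract the major version from a dotted version string such as "114.0.5735.90".
function majorVersion(version) {
  return Number.parseInt(version.split('.')[0], 10);
}

// Assumption: a chromedriver build is compatible with Chrome when the
// major versions are equal. Check the driver's release notes for the real rule.
function versionsMatch(chromeVersion, chromedriverVersion) {
  return majorVersion(chromeVersion) === majorVersion(chromedriverVersion);
}

console.log(versionsMatch('114.0.5735.198', '114.0.5735.90')); // same major version
console.log(versionsMatch('115.0.5790.102', '114.0.5735.90')); // Chrome updated, driver did not
```

A check like this is something you have to write and maintain yourself with Selenium; with a bundled browser there is nothing to compare.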

In Selenium Basic Things are more Complicated

If you are using Selenium, you will run into problems with basic things as well. For example, say you want to add headers to the browser's requests. You might need this to load test your site or to apply a header to certain authenticated network requests. With Selenium, doing this, or using a proxy with authentication, requires additional drivers or a plugin, so you spend extra time finding and installing them. Both puppeteer and playwright support this directly in their libraries. Once again, Selenium is less convenient than the more universal libraries.
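
For comparison, here is a minimal sketch of how this could look with puppeteer, using its `page.setExtraHTTPHeaders` and `page.authenticate` methods. The helper name `preparePage` and the shape of its `options` argument are hypothetical and exist only for this example.

```javascript
// Sketch: apply extra headers and HTTP credentials to a puppeteer page.
// `page` is expected to be a puppeteer Page; `options` is a hypothetical shape.
async function preparePage(page, options = {}) {
  if (options.headers) {
    // These headers are sent with every request the page makes.
    await page.setExtraHTTPHeaders(options.headers);
  }
  if (options.proxyCredentials) {
    // Credentials for HTTP authentication, commonly used for authenticated proxies.
    await page.authenticate(options.proxyCredentials);
  }
  return page;
}
```

With a real browser this would be called right after `browser.newPage()` and before `page.goto(...)`, for example `await preparePage(page, { headers: { 'X-Load-Test': '1' } })`, where the header name is again just an illustration.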

You Can Customize a lot more in Selenium

Setting up even a simple script in Selenium takes effort, because Selenium caters to many browsers. Here's an example: retrieving the title of example.com with Selenium in NodeJS looks like this:

const { Builder, Capabilities } = require('selenium-webdriver');

(async function example() {
  // Chrome-specific options are passed through the capabilities object.
  const chromeCapabilities = Capabilities.chrome();
  chromeCapabilities.set('goog:chromeOptions', {
    prefs: {
      homepage: 'about:blank',
    },
    args: [
      '--headless',
      '--no-sandbox',
    ],
  });

  // Connect to a remote WebDriver endpoint instead of a local driver.
  const driver = new Builder()
    .forBrowser('chrome')
    .withCapabilities(chromeCapabilities)
    .usingServer('http://localhost:3000/webdriver')
    .build();

  try {
    await driver.get('http://www.example.com/');
    console.log(await driver.getTitle());
  } catch (e) {
    console.log('Error', e.message);
  } finally {
    await driver.quit();
  }
})();

That is about 30 lines of code. It looks fine on its own, but it loses badly when you compare it with the same script in another library. Take puppeteer, which needs about half as much code:

const puppeteer = require('puppeteer');

(async function example() {
  let browser;
  try {
    // Attach to an already running browser over a single WebSocket.
    browser = await puppeteer.connect({
      browserWSEndpoint: 'ws://localhost:3000',
    });
    const page = await browser.newPage();
    await page.goto('https://example.com');
    console.log(await page.title());
  } catch (e) {
    console.log('Error', e.message);
  } finally {
    // browser is still undefined if connect() failed, so guard the close.
    if (browser) await browser.close();
  }
})();

Novice developers and users who write simple scripts may never run into the issues described above. That is normal, because simple and basic tasks do not touch them. However, once you move to larger deployments with different browsers and capabilities, Selenium will become a headache.

So Selenium or...?

So far we have written about the disadvantages of Selenium. In some ways, it is inferior to more modern libraries. Puppeteer and playwright offer more out of the box, are simpler to configure, and are more flexible in use. They do not require specialized software, and they integrate more easily with other technologies. These are all clear advantages. However, Selenium is still popular, because it has a simple and clear API. Through that API, Selenium abstracts away the different browsers, their respective protocols, and their integration issues. Many large projects use Selenium under the hood, where the problems associated with it are already solved for you, so they never seriously bother you.