If you want to write a script that scrapes pages that rely on JavaScript, Selenium and Playwright are both suitable tools. In this article, we take a closer look at how they differ and what each of them does well.
Playwright
Playwright is a Node.js library for automating Chromium, Firefox and WebKit with a single API. Built for end-to-end (E2E) testing, it works with most web applications and sites and offers a wide range of configuration options.
The main advantages of Playwright include:
- Cross-browser compatibility: Playwright supports all modern render engines, including Chromium, Firefox and WebKit.
- Cross-platform: You can test under Windows, Linux, macOS, locally or on a continuous integration server, with or without a graphical interface.
- Multilingual: Use the Playwright API in TypeScript, JavaScript, Python, .NET or Java.
- Automatic waits: Before performing an action, Playwright waits for the target element to become available. This removes the need for artificial timeouts.
- Advanced Features: Playwright supports multi-page scripts, network capture, and other features that make your workflows faster and easier.
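The single API across engines listed above can be sketched roughly as follows. This is a minimal illustration using Playwright's synchronous Python API; the helper name `fetch_title`, the `ENGINES` tuple and the URL `https://example.com` are assumptions made for the example, not part of Playwright itself.

```python
ENGINES = ("chromium", "firefox", "webkit")


def fetch_title(playwright, engine, url):
    """Open `url` in the given engine and return the page title."""
    browser_type = getattr(playwright, engine)  # e.g. playwright.chromium
    browser = browser_type.launch()             # headless by default
    try:
        page = browser.new_page()
        page.goto(url)
        return page.title()
    finally:
        browser.close()  # always release the browser process


def main():
    # Imported here so the helper above can be read without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        for engine in ENGINES:
            # https://example.com is a placeholder URL.
            print(engine, fetch_title(p, engine, "https://example.com"))
```

Calling `main()` requires Playwright and its browsers to be installed. The same `fetch_title` body runs unchanged against all three engines, which is the point of the single API.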
Despite these clear advantages, Playwright has several disadvantages:
- High Resource Usage: Playwright may require significant system resources when performing complex tasks or rendering, especially if you need to run multiple instances.
- Browser Specifics: Some features behave differently depending on the browser you are running.
- Need to Know Node.js: If you use Playwright from Node.js and are not familiar with its asynchronous patterns, you will need some time to get the hang of them. Without them, Playwright is quite difficult to use.
- Risk of Detection: Even with the most advanced tooling, bot detection systems may still detect your actions. This risk cannot be ruled out.
Selenium
Selenium is an open source test automation framework for web applications. It is designed to test the functional aspects of web applications across different browsers and platforms.
Selenium has several advantages:
- Cross-Browser Compatibility: Selenium scripts can run in multiple browsers, such as Chrome, Firefox, IE and Opera, allowing for easy cross-browser testing.
- Multi-language support: Selenium lets you write tests in multiple programming languages, including Java, C#, Python, JavaScript, Ruby and Kotlin.
- Open Source and Community Support: Being an open source product, it is free to use and supported by a large community, providing extensive resources and assistance.
- Flexibility: Test scripts integrate easily with other tools, such as TestNG and JUnit for test case management, and Maven and Jenkins for builds and continuous integration.
Despite these advantages, Selenium has several disadvantages:
- Handling dynamic content: Web applications with dynamic content are difficult to test and may require additional tools or frameworks to handle such scenarios effectively.
- Depends on web drivers: To use Selenium for scraping pages, you need an additional driver that matches the browser you are running.
- Slowness: Because Selenium has to drive a full-fledged browser, scripts run comparatively slowly, even in headless mode.
- Resource-Intensive: Like Playwright, Selenium is resource-intensive when running complex tasks and multi-page scripts.
Setup and ease of use
Both Playwright and Selenium are multilingual: each supports multiple programming languages through a single API. Before you start using these frameworks, you need to install the binding library for the programming language in which you write your scripts. For example, in Python, Playwright is installed through the pytest-playwright package, and Selenium through the selenium package.
However, Selenium requires one more step: obtaining the WebDriver for the browser you are using, for example ChromeDriver for Chrome. Selenium 4.6 and later include Selenium Manager, which can download a matching driver automatically, but the driver remains a separate component for each browser. Playwright, unlike Selenium, has one driver and supports all browsers equally effectively. In Playwright, you only need one command, playwright install, and all the necessary files are installed at the same time.
After installation and configuration, both libraries work in much the same way. In the early stages, however, Playwright is easier for beginners to understand. It offers extensive customization options that help with writing your first simple scripts.
The Playwright documentation is also more complete and easier to follow than the Selenium documentation.
Features
Both Playwright and Selenium offer everything you need to locate elements on a page. You can find elements using CSS or XPath selectors:
# Playwright
heading = page.locator('h1')
accept_button = page.locator('//button[text()="Accept"]')
# Selenium
from selenium.webdriver.common.by import By
heading = driver.find_element(By.CSS_SELECTOR, 'h1')
accept_button = driver.find_element(By.XPATH, '//button[text()="Accept"]')
Playwright also supports additional locators that find elements by user-facing attributes such as text, placeholder, title, and role. These locators are convenient for experienced developers and especially helpful for beginners who are not yet comfortable writing selectors:
accept_button = page.get_by_text("Accept")
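The other locator types mentioned above (placeholder, title, role) can be sketched in the same way. The function name `find_login_controls` and the form texts ("Email", "Help", "Sign in", "Forgot password?") are hypothetical and describe an imagined login page; only the `get_by_*` methods come from Playwright.

```python
def find_login_controls(page):
    """Return locators for a hypothetical login form, keyed by purpose."""
    return {
        # <input placeholder="Email">
        "email": page.get_by_placeholder("Email"),
        # any element with title="Help"
        "help": page.get_by_title("Help"),
        # a button whose accessible name is "Sign in"
        "submit": page.get_by_role("button", name="Sign in"),
        # any element containing this text
        "notice": page.get_by_text("Forgot password?"),
    }
```

With a real Playwright `page`, something like `find_login_controls(page)["submit"].click()` would then click the button. Locators are resolved lazily, so building the dictionary does not touch the page yet.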
When scraping web applications, timing matters: actions should not be performed on elements that have not yet appeared, and the script should not wait indefinitely for elements that load slowly. To control this, Selenium uses explicit waits. For example, the following tells the script to wait up to three seconds for a button element to be found before clicking it:
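from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
el = WebDriverWait(driver, timeout=3).until(lambda d: d.find_element(By.TAG_NAME, "button"))
el.click()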
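To make the idea behind an explicit wait concrete, here is a simplified sketch of the polling principle in plain Python. It is not Selenium's implementation: `wait_until` and `WaitTimeout` are hypothetical names, and `LookupError` stands in for Selenium's "element not found" exception, which `WebDriverWait` ignores by default while it keeps polling.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=3.0, poll_interval=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": keep polling.
            pass
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)
```

The condition is always checked at least once, and the value it returns is handed back to the caller, which is why the Selenium example can call `el.click()` directly on the result of `until(...)`.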
Playwright is more thorough here. Before performing any action on an element, it runs a series of actionability checks. For example, you cannot click an element that is not yet visible; Playwright waits for it to become visible first:
page.get_by_role("button").click()
In addition to the main functions, Playwright and Selenium offer several extras. For example, Playwright Inspector lets you step through a script and see where it goes wrong, so you do not have to re-run it several times in a row.
Playwright also offers a code generator that lets you write scripts without searching for selectors in the HTML. The generator records the sequence of actions you perform and immediately writes the corresponding code. For beginners, this is a quick way to become familiar with Playwright's functionality. Experienced developers can use the generated code to script the steps that have to happen before scraping, for example logging in to an account.
Selenium has a record-and-playback tool called Selenium IDE, available as a browser extension for Chrome and Firefox. Selenium IDE combines the capabilities of the Playwright Inspector with those of a code generator.
Flexibility and performance
Multilingual support has already been mentioned among the advantages of both Playwright and Selenium. Playwright supports JavaScript/TypeScript, Java, Python and C#. Selenium supports Java, C#, Python, JavaScript, Ruby and Kotlin. These are only the officially supported languages; you can also use unofficial binding libraries. Here Selenium is ahead of Playwright: unofficial bindings for it exist in most programming languages.
Playwright is generally considered faster than Selenium. Its developers have put serious work into optimization, so Playwright executes scripts quickly and makes parallelization simpler.
Both Playwright and Selenium support contexts, which work on the same principle as incognito mode in the browser. Through them, you can launch several independent sessions in Chrome or any other browser. In Playwright, however, you can run multiple contexts in parallel, which makes scraping faster than with Selenium.
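Running several contexts in parallel might look like the sketch below, which uses Playwright's asynchronous Python API together with `asyncio.gather`. The function names and the placeholder URLs are assumptions for illustration; actual speed-ups depend on the pages and the machine.

```python
import asyncio


async def title_in_fresh_context(browser, url):
    """Load `url` in its own browser context and return the page title."""
    context = await browser.new_context()  # isolated cookies and storage
    try:
        page = await context.new_page()
        await page.goto(url)
        return await page.title()
    finally:
        await context.close()


async def scrape_titles(browser, urls):
    """Use one context per URL, concurrently; results keep the order of `urls`."""
    tasks = (title_in_fresh_context(browser, url) for url in urls)
    return await asyncio.gather(*tasks)


async def main():
    # Imported here so the helpers above can be read without Playwright installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            # Placeholder URLs.
            urls = ["https://example.com", "https://example.org"]
            print(await scrape_titles(browser, urls))
        finally:
            await browser.close()
```

Running `asyncio.run(main())` requires Playwright and its browsers to be installed. All contexts share one browser process, so each extra session costs far less than launching another browser.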
Conclusion
The comparison shows that Playwright is the better choice for beginners: it is easy to learn and helps you write your first scripts. In other respects, Playwright and Selenium are almost equally effective. For example, both libraries integrate easily with Bright Data proxies, which are used for web scraping.
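As a rough illustration of proxy integration, the helpers below build the settings each tool expects. The function names, the host `proxy.example.com`, the port and the credentials are all placeholders, not values from any particular provider. Playwright's `launch()` accepts a `proxy` dictionary; for Selenium with Chrome, the sketch assumes an unauthenticated proxy passed through the `--proxy-server` flag.

```python
def playwright_proxy_settings(host, port, username=None, password=None):
    """Build the dict accepted by the `proxy` option of Playwright's launch()."""
    settings = {"server": f"http://{host}:{port}"}
    if username is not None:
        settings["username"] = username
        settings["password"] = password or ""
    return settings


def chrome_proxy_argument(host, port):
    """Build the Chrome command-line flag for an unauthenticated proxy."""
    return f"--proxy-server=http://{host}:{port}"


# Usage (not executed here; requires the libraries and a real proxy):
#   browser = p.chromium.launch(
#       proxy=playwright_proxy_settings("proxy.example.com", 8080, "user", "secret"))
#   options = selenium.webdriver.ChromeOptions()
#   options.add_argument(chrome_proxy_argument("proxy.example.com", 8080))
```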