Running Puppeteer Scenarios via the REST API for Web Scraping
This feature is intended for advanced web scraping: it lets you interact with the page you want to scrape using the Puppeteer browser automation library.
It runs full-featured JavaScript scenarios with Puppeteer, wrapped into a single POST request that you can send from any programming language.
Scenario Builder: a builder that helps you create Puppeteer scripts and convert them into a POST request.
What you can do with Puppeteer:
- Automated form submission
- Keyboard input
- Authorization / Login
- Mouse clicks
- Custom JavaScript execution
- Waiting for elements matching a CSS selector to appear
- Extracting data by CSS selector
- Page scrolling
- Intercepting XHR/AJAX requests
The /scenario endpoint runs the script from the code field of the request body and passes the object from the context field to that script as its context argument.
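To make that contract concrete, here is a minimal sketch of what the service is assumed to do with the two values: call the script's function with an object holding a Puppeteer page and your context, then hand back whatever the function returns. It runs in plain Node.js with a hypothetical stub page (createFakePage) instead of a real browser, and runScenarioLocally is a hypothetical name. It is only a way to check scenario logic locally, not BrowserCloud's actual runner. The scenario mirrors the Puppeteer example on this page, minus the export keyword so plain Node can run it.

```javascript
// Hypothetical local harness: mimics how /scenario is assumed to invoke a script.
// It is NOT BrowserCloud's runner; a stub object replaces the real Puppeteer page.

// Scenario logic, written without `export default` so plain Node can run it.
async function scenario({ page, context }) {
  const { url } = context; // value supplied through the "context" field
  await page.goto(url, { waitUntil: 'domcontentloaded' });
  const data = await page.content();
  return { data, type: 'application/html' };
}

// Stub page that records calls instead of driving a browser (assumption).
function createFakePage(html) {
  const calls = [];
  return {
    calls,
    async goto(url, options) {
      calls.push({ method: 'goto', url, options });
    },
    async content() {
      return html;
    },
  };
}

// Runs a scenario the way the endpoint is assumed to: fn({ page, context }).
async function runScenarioLocally(fn, context, page) {
  const result = await fn({ page, context });
  if (result === null || typeof result !== 'object' || !('data' in result) || !('type' in result)) {
    throw new Error('A scenario should return an object with "data" and "type"');
  }
  return result;
}

// Usage
const page = createFakePage('<html><body>stub</body></html>');
runScenarioLocally(scenario, { url: 'https://en.wikipedia.org' }, page)
  .then((result) => console.log(result.type, page.calls[0].url));
```

Keeping the page behind a parameter is what makes this possible: the same function body can be pasted into the code field unchanged once `export default` is put back in front of it.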
Puppeteer Example
Script:
export default async function ({ page, context }) {
  const { url } = context; // Read the `url` from context
  await page.goto( // Docs: https://pptr.dev/api/puppeteer.page.goto
    url,
    { waitUntil: 'domcontentloaded' }
  );
  const data = await page.content();
  return {
    data,
    type: 'application/html',
  };
}
Context (variables passed to the script):
{
"url": "https://en.wikipedia.org"
}
JSON strings cannot contain literal line breaks, so the script above has to be collapsed into a single line (for example with the Babel REPL or jscompress.com) and its double quotes escaped as \" before it can go into the code field of the following curl call:
Final POST request
curl -X POST \
'https://chrome-v2.browsercloud.io/scenario?token=API_TOKEN' \
-H 'Content-Type: application/json' \
-d '{
"code": "export default async function ({ page, context }) { const { url } = context; await page.goto( url, {waitUntil: \"domcontentloaded\"} ); const data = await page.content(); return { data, type: \"application\/html\",};};",
"context": {
"url": "https://en.wikipedia.org/"
}
}'
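If you call the endpoint from Node.js instead of curl, you can skip the manual collapsing and escaping: JSON.stringify escapes quotes and line breaks in the script source for you. The sketch below assumes Node.js 18 or later (for the global fetch) and that the service accepts the same JSON body as the curl call above; buildScenarioBody and runScenario are hypothetical helper names, not part of a BrowserCloud SDK.

```javascript
// Sketch: build the /scenario request body from readable, multi-line source.
// Assumes Node.js 18+ (global fetch); helper names are hypothetical.

const SCENARIO_URL = 'https://chrome-v2.browsercloud.io/scenario';

// The script is kept as a normal multi-line template literal.
const scenarioSource = `
export default async function ({ page, context }) {
  const { url } = context;
  await page.goto(url, { waitUntil: "domcontentloaded" });
  const data = await page.content();
  return { data, type: "application/html" };
}
`.trim();

// JSON.stringify escapes quotes and newlines, so no minification is required.
function buildScenarioBody(code, context = {}) {
  return JSON.stringify({ code, context });
}

// Sends the request; not called here because it needs a real API token.
async function runScenario(token, code, context) {
  const url = `${SCENARIO_URL}?token=${encodeURIComponent(token)}`;
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: buildScenarioBody(code, context),
  });
  if (!response.ok) {
    throw new Error(`Scenario request failed with HTTP ${response.status}`);
  }
  return response.text(); // assumption: the format follows the "type" your script returns
}

console.log(buildScenarioBody(scenarioSource, { url: 'https://en.wikipedia.org/' }));
```

With a real token, `await runScenario('API_TOKEN', scenarioSource, { url: 'https://en.wikipedia.org/' })` should send a request equivalent to the curl call, while the script stays readable in your own codebase.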
Example: Logging in and getting data as JSON
As an example, we use the bw-bank.de online banking demo page to show how to log in to an account and extract data. The context variable is an empty object here, because the URL is hard-coded in the script.
Puppeteer scenario:
export default async function ({ page, context }) {
  await page.goto(
    'https://www.bw-bank.de/en/home/login-online-banking/demo-online-banking-pushtan.html',
    { waitUntil: 'domcontentloaded' }
  );
  // Fill in the login form & click "Log in"
  await page.type('input[autocomplete="username"]', 'pushDEMO');
  await page.type('input[type=password]', '12345');
  await page.click('[title="Log in"]');
  // Wait until the account overview has loaded
  await page.waitForSelector('.mkp-card-group');
  // Extract data by running JavaScript inside the page
  const payments = await page.evaluate(() => {
    const result = [];
    const elements = document.querySelectorAll('span.offscreen'); // get elements by selector
    for (let i = 0; i < elements.length; i++) { // iterate over elements
      result.push(elements[i].innerText);
    }
    return result; // returned to the 'payments' variable
  });
  return {
    data: payments,
    type: 'application/json',
  };
}
Scenario wrapped into a request (curl):
curl --request POST 'https://chrome-v2.browsercloud.io/scenario?token=API_TOKEN' \
--header 'Content-Type: application/json' \
--data-raw '{"code":"export default async function({page,context}){await page.goto(\"https://www.bw-bank.de/en/home/login-online-banking/demo-online-banking-pushtan.html\",{waitUntil:\"domcontentloaded\"});await page.type('\''input[autocomplete=\"username\"]'\'',\"pushDEMO\");await page.type(\"input[type=password]\",\"12345\");await page.click('\''[title=\"Log in\"]'\'');await page.waitForSelector(\".mkp-card-group\");let payments=await page.evaluate(()=>{let result=[];let elements=document.querySelectorAll(\"span.offscreen\");for(let i=0;i<elements.length;i++){result.push(elements[i].innerText)}return result});return{data:payments,type:\"application/json\"}}","context":{}}'
Result
{
"data": [
"23.825,53 EUR",
"1.000,00 EUR",
"-125,50 EUR",
"18.235,00 EUR",
"1.000,00 USD",
"-880,00 EUR",
"1.897,45 EUR",
"2.378,90 EUR",
"97.458,32 USD",
"558,91 EUR",
"26,12 EUR",
"23.825,53 EUR",
"52.000,00 EUR",
"2.000,00 EUR",
"15.000,00 EUR",
"35.000,00 EUR",
"52.000,00 EUR",
"145.550,80 EUR",
"47.473,85 EUR",
"98.076,95 EUR",
"145.550,80 EUR",
"-36.613,93 EUR",
"-36.613,93 EUR",
"-9.922,44 EUR",
"3.172,56 EUR",
"-13.095,00 EUR",
"-9.922,44 EUR",
"510.000,00 EUR",
"450.000,00 EUR",
"60.000,00 EUR",
"510.000,00 EUR",
"684.839,96 EUR"
],
"type": "application/json"
}
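The scenario returns the amounts exactly as the page displays them, in German number format (dot as thousands separator, comma as decimal separator, currency code at the end). If you need numbers on your side, a small post-processing step like the following sketch can convert them. parseAmount is a hypothetical helper, and it assumes every string follows the "<amount> <currency>" pattern shown in the result above.

```javascript
// Hypothetical helper: converts strings like "23.825,53 EUR" into numbers.
// Assumes German formatting: "." groups thousands, "," marks the decimals.
function parseAmount(text) {
  const match = /^(-?[\d.]+,\d{2})\s+([A-Z]{3})$/.exec(text.trim());
  if (!match) {
    throw new Error(`Unexpected amount format: ${text}`);
  }
  const [, amount, currency] = match;
  // Drop thousands separators, then turn the decimal comma into a dot.
  const value = Number(amount.replace(/\./g, '').replace(',', '.'));
  return { value, currency };
}

// Usage with a few values from the result above
const amounts = ['23.825,53 EUR', '-125,50 EUR', '97.458,32 USD'].map(parseAmount);
console.log(amounts);
```

Throwing on anything that does not match keeps unexpected page changes visible instead of silently producing NaN.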
Session Timeout
Puppeteer / Playwright
By default, the session timeout is 30 seconds. If your script needs more time, set your own value with the timeout query parameter (&timeout=<VALUE>, in milliseconds):
// 60-second limit:
https://chrome-v2.browsercloud.io/scenario?token=API_TOKEN&timeout=60000
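When you build the request URL in code, letting URLSearchParams assemble the query string is less error-prone than concatenating it by hand. A minimal sketch, assuming only the token and timeout parameters described on this page; scenarioUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: builds the /scenario URL with an optional session timeout.
function scenarioUrl(token, timeoutMs) {
  const url = new URL('https://chrome-v2.browsercloud.io/scenario');
  url.searchParams.set('token', token);
  if (timeoutMs !== undefined) {
    // The timeout is given in milliseconds; without it the 30-second default applies.
    if (!Number.isInteger(timeoutMs) || timeoutMs <= 0) {
      throw new RangeError('timeoutMs must be a positive integer (milliseconds)');
    }
    url.searchParams.set('timeout', String(timeoutMs));
  }
  return url.toString();
}

// 60-second limit
console.log(scenarioUrl('API_TOKEN', 60 * 1000));
// https://chrome-v2.browsercloud.io/scenario?token=API_TOKEN&timeout=60000
```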