
Web scraping with JavaScript rendering

Fetch the full content of a page with a single request. JavaScript rendering is enabled by default, which makes this useful for web scraping where real browser behavior is needed.

Capture the content of https://browsercloud.io/doc-examples/content.html

curl -X POST \
  'https://chrome.browsercloud.io/content?token=API_TOKEN' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://browsercloud.io/doc-examples/content.html"
  }'
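For repeated use, the basic call above can be wrapped in a small shell function. This is a minimal sketch, not part of the BrowserCloud documentation: it assumes the token is stored in a BROWSERCLOUD_TOKEN environment variable (a name chosen here for illustration), and it refuses URLs containing double quotes or backslashes because it does not JSON-escape them.

```shell
#!/usr/bin/env bash
# Sketch: reusable wrapper around the /content endpoint.
# Assumptions (not from the BrowserCloud docs): the token is read from the
# hypothetical BROWSERCLOUD_TOKEN environment variable, and URLs must not
# contain double quotes or backslashes (they are not JSON-escaped here).

# Print the JSON request body for a given page URL.
content_payload() {
  local url="$1"
  case "$url" in
    *'"'* | *'\'*)
      echo "content_payload: URL contains characters this sketch does not escape" >&2
      return 1
      ;;
  esac
  printf '{"url": "%s"}\n' "$url"
}

# POST the payload and write the response body to stdout.
fetch_content() {
  local payload
  if [ -z "${BROWSERCLOUD_TOKEN:-}" ]; then
    echo "fetch_content: set BROWSERCLOUD_TOKEN first" >&2
    return 1
  fi
  payload="$(content_payload "$1")" || return 1
  # --fail makes curl exit non-zero on HTTP error responses.
  curl -sS --fail -X POST \
    "https://chrome.browsercloud.io/content?token=${BROWSERCLOUD_TOKEN}" \
    -H 'Content-Type: application/json' \
    -d "$payload"
}
```

Usage: run export BROWSERCLOUD_TOKEN=API_TOKEN once, then fetch_content https://browsercloud.io/doc-examples/content.html > content.html saves the response body to a file.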

Example with all available JSON options

curl -X POST \
  'https://chrome.browsercloud.io/content?token=API_TOKEN' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://browsercloud.io/doc-examples/content.html",
    "addScriptTag": [
      {
        "content": "let node = document.querySelector(\"#header-2\"); node.textContent = \"My custom JS did it!\""
      }
    ],
    "setJavaScriptEnabled": true,
    "waitFor": "#delayed",
    "userAgent": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.2 Mobile/15E148 Safari/604.1",
    "rejectResourceTypes": ["image"],
    "authenticate": {
      "username": "test",
      "password": "test"
    },
    "cookies": [
      {
        "name": "session",
        "value": "session-value",
        "domain": "browsercloud.io"
      }
    ]
  }'
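A payload this long is easier to maintain in its own file. curl's --data-binary @FILE option reads the request body from a file and sends it unmodified. The sketch below is illustrative only: the function names, the file path, and the subset of options it writes are assumptions, and the token is passed as a plain argument.

```shell
#!/usr/bin/env bash
# Sketch: keep a long request body in a file and let curl read it.
# Function names and the chosen subset of options are illustrative only.

# Write a request body using a few of the documented options to file $1.
write_options() {
  cat > "$1" <<'EOF'
{
  "url": "https://browsercloud.io/doc-examples/content.html",
  "waitFor": "#delayed",
  "rejectResourceTypes": ["image"]
}
EOF
}

# Send file $1 as the request body, using API token $2.
send_options() {
  local file="$1" token="$2"
  curl -sS --fail -X POST \
    "https://chrome.browsercloud.io/content?token=${token}" \
    -H 'Content-Type: application/json' \
    --data-binary "@${file}"
}
```

Usage: write_options options.json, then send_options options.json API_TOKEN > content.html.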
| Parameter | Available values | Description |
|---|---|---|
| url | string | URL of the page to scrape |
| setJavaScriptEnabled | true (default), false | Enables or disables JavaScript rendering |
| waitFor | string (CSS selector, e.g. "#delayed") | Waits until the matching DOM element has been rendered |
| addScriptTag | array of objects with content: string (JavaScript code) | Adds a custom <script> tag to the page |
| userAgent | string | Sets a custom User-Agent string for the scraper |
| rejectResourceTypes | array of strings: "document", "stylesheet", "image", "media", "font", "script", "texttrack", "xhr", "fetch", "eventsource", "websocket", "manifest", "other" | Blocks the listed resource types to speed up page loading |
| authenticate | object with username, password: string | HTTP Basic authentication credentials |
| cookies | array of objects with name, value, domain | Custom cookies (for example, an auth session) |
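Because rejectResourceTypes only accepts the values listed in the table, it can help to check them before sending a request. The hypothetical helper below is a sketch, not part of the BrowserCloud API: it turns its arguments into a JSON array and rejects any value that is not in the documented list.

```shell
#!/usr/bin/env bash
# Sketch: build the rejectResourceTypes JSON array from shell arguments,
# refusing any value not in the documented list.

allowed_types="document stylesheet image media font script texttrack xhr fetch eventsource websocket manifest other"

reject_types_json() {
  local out="" t
  for t in "$@"; do
    case " $allowed_types " in
      *" $t "*) ;;  # documented value, keep it
      *)
        echo "reject_types_json: unknown resource type '$t'" >&2
        return 1
        ;;
    esac
    # Append "t", prefixed with a comma when out is already non-empty.
    out="${out:+$out,}\"$t\""
  done
  printf '[%s]\n' "$out"
}
```

For example, reject_types_json image font prints ["image","font"], which can be spliced into the request body as the value of rejectResourceTypes.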