Web scraping with JavaScript rendering

Getting started

Send a GET request to https://chrome.browsercloud.io/content with two query string parameters, token and url, and the API will return the page's HTML:

curl 'https://chrome.browsercloud.io/content?token=API_TOKEN&url=https://site.com'

| Parameter | Available values | Description |
|-----------|------------------|-------------|
| url | string, required. Example: https://browsercloud.io | Site URL you would like to scrape |
| token | string, required | Your API token |
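
The same request can be assembled programmatically. The following is a minimal sketch in Python using only the standard library. The helper name build_content_url is hypothetical, and the sketch assumes the API decodes a percent-encoded url value like any ordinary query string, which matters once the target URL carries query parameters of its own.

```python
from urllib.parse import urlencode

# Endpoint taken from the curl example above.
API_ENDPOINT = "https://chrome.browsercloud.io/content"


def build_content_url(token: str, target_url: str) -> str:
    """Return the GET URL for scraping target_url (hypothetical helper)."""
    # urlencode percent-encodes the target URL so that its own "?" and "&"
    # are not mistaken for parameters of the API call itself.
    query = urlencode({"token": token, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


print(build_content_url("API_TOKEN", "https://site.com/search?q=shoes&page=2"))
```

The resulting string can be passed to any HTTP client, for example urllib.request.urlopen.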

JavaScript rendering

JavaScript rendering is enabled by default. To use the static web crawler without JS rendering, set the render parameter to false:

curl 'https://chrome.browsercloud.io/content?token=API_TOKEN&url=https://site.com&render=false'

| Parameter | Available values | Description |
|-----------|------------------|-------------|
| render | false / true (enabled by default, you can omit this param) | Disable / enable JS rendering by a real browser |

Device

Our web crawler sends a unique User-Agent string with each request to avoid getting blocked. We rotate through more than 1000 of the most popular desktop and mobile devices.

curl 'https://chrome.browsercloud.io/content?token=API_TOKEN&url=https://site.com&device=mobile'

| Parameter | Available values | Description |
|-----------|------------------|-------------|
| device | mobile / desktop (enabled by default, you can omit this param) | Choose the device type for User-Agent rotation in the request |
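
The render and device options can be added to a request in the same way. The sketch below is illustrative only: build_url is a hypothetical helper, and the lowercase spelling of false follows the table above (whether the API accepts other spellings is not documented here).

```python
from typing import Optional
from urllib.parse import urlencode

API_ENDPOINT = "https://chrome.browsercloud.io/content"
DEVICES = {"mobile", "desktop"}  # values listed in the device table


def build_url(token: str, target_url: str, render: bool = True,
              device: Optional[str] = None) -> str:
    """Hypothetical helper: add the render and device options to a request URL."""
    params = {"token": token, "url": target_url}
    if not render:
        # Spell the value out: str(False) would give "False", not "false".
        params["render"] = "false"
    if device is not None:
        if device not in DEVICES:
            raise ValueError(f"device must be one of {sorted(DEVICES)}")
        params["device"] = device
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(build_url("API_TOKEN", "https://site.com", render=False, device="mobile"))
```

Both options are left out of the URL when they keep their defaults, matching the note in the tables that the parameters can be omitted.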

Proxies & GEO targeting

Our standard proxy pools include millions of proxies from over a dozen ISPs and should be sufficient for the vast majority of scraping jobs.

You can also geo-target requests by setting the country parameter, or omit it for global proxy rotation.

curl 'https://chrome.browsercloud.io/content?token=API_TOKEN&url=https://site.com&country=US'

| Parameter | Available values | Description |
|-----------|------------------|-------------|
| proxy | 1) omit parameter to use the common pool with 1.5+ million proxies; 2) false - disable proxies (for example, when you need just JS rendering); 3) premium - premium proxy pool for a few particularly difficult-to-scrape sites | Proxy pool to use |
| country | Works with the common proxy pool. 1) omit parameter or set ALL for global rotation; 2) two-letter ISO country code, for example US, CA, GB, DE and more; 3) EU - proxy rotation over EU countries | Proxy geo targeting |
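
The rules for proxy and country can be checked on the client before a request is sent. This is a rough sketch, not part of the API: proxy_params is a hypothetical helper, the country check only tests the shape of the code rather than looking it up in the ISO list, and rejecting country together with an explicit proxy value is an interpretation of the note that country works with the common proxy pool.

```python
import re
from typing import Dict, Optional

PROXY_VALUES = {"false", "premium"}     # explicit values from the table
COUNTRY_CODE = re.compile(r"[A-Z]{2}")  # shape check only, not an ISO lookup


def proxy_params(proxy: Optional[str] = None,
                 country: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: validate proxy/country and return query parameters."""
    params: Dict[str, str] = {}
    if proxy is not None:
        if proxy not in PROXY_VALUES:
            raise ValueError("proxy must be omitted, 'false' or 'premium'")
        params["proxy"] = proxy
    if country is not None:
        if proxy is not None:
            # Assumption: geo-targeting applies to the common pool only.
            raise ValueError("country works with the common proxy pool only")
        code = country.upper()
        if code != "ALL" and not COUNTRY_CODE.fullmatch(code):
            raise ValueError("country must be ALL, EU or a two-letter code")
        params["country"] = code
    return params


print(proxy_params(country="us"))
```

The returned dictionary can be merged into the other query parameters before the URL is encoded.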

Residential & Mobile Proxies

Our standard proxy pools include millions of proxies from over a dozen ISPs and should be sufficient for the vast majority of scraping jobs. However, for a few particularly difficult-to-scrape sites, we also maintain a private internal pool of residential and mobile IPs. This pool is available by request only.

Wait for Element when rendering

If a request made by the page is slow and the page looks stable before that request completes, the API can be fooled into thinking the page has finished rendering.

To cope with this, you can tell the API to wait for a DOM element (selector) to appear on the page when rendering. Send the wait-for parameter with a URL-encoded jQuery selector.

The API then waits for this element to appear on the page before returning results.

curl 'https://chrome.browsercloud.io/content?token=API_TOKEN&url=https://site.com&wait-for=%23ajax-content'

| Parameter | Available values | Description |
|-----------|------------------|-------------|
| wait-for | string. Example: %23ajax-content (%23 is the # symbol) | URL-encoded selector. Requires JS rendering |
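
Encoding the selector by hand is error-prone once it contains more than a single #. A small sketch of doing it with Python's standard library follows; encode_selector is a hypothetical helper name.

```python
from urllib.parse import quote


def encode_selector(selector: str) -> str:
    """Percent-encode a selector for use as the wait-for value."""
    # safe="" also encodes "/", so nothing in the selector is left
    # looking like URL syntax.
    return quote(selector, safe="")


print(encode_selector("#ajax-content"))     # %23ajax-content
print(encode_selector("div.results > li"))  # div.results%20%3E%20li
```

The encoded value is appended to the request as wait-for=<encoded selector>, as in the curl example above.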

JavaScript execution

You can run custom JavaScript in the browser context by passing it in the addScriptTag option of a POST request. It is executed after the page finishes loading.

Custom JavaScript can be used to interact with a page, for example to scroll or click a button.

curl -X POST \
  'https://chrome.browsercloud.io/content?token=API_TOKEN' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '
{
  "url": "https://browsercloud.io/doc-examples/content.html",
  "addScriptTag": [
    {
      "content": "let node = document.querySelector(\"#header-2\"); node.textContent = \"My custom JS did it!\""
    }
  ]
}'
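
Escaping the quotes inside the script by hand, as in the curl example, gets awkward for longer snippets. The sketch below shows one way to let a JSON serializer do the escaping. It uses only Python's standard library, build_script_request is a hypothetical helper, and the request is prepared but not sent.

```python
import json
import urllib.request

API_ENDPOINT = "https://chrome.browsercloud.io/content"


def build_script_request(token: str, target_url: str,
                         script: str) -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) the POST request."""
    body = {"url": target_url, "addScriptTag": [{"content": script}]}
    return urllib.request.Request(
        f"{API_ENDPOINT}?token={token}",
        # json.dumps escapes the double quotes inside the script.
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Cache-Control": "no-cache"},
        method="POST",
    )


SCRIPT = ('let node = document.querySelector("#header-2"); '
          'node.textContent = "My custom JS did it!"')
request = build_script_request(
    "API_TOKEN", "https://browsercloud.io/doc-examples/content.html", SCRIPT)
print(request.data.decode("utf-8"))
```

Passing the prepared request to urllib.request.urlopen would send it and return the response.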

Additional parameters

Example with all available JSON options

curl -X POST \
  'https://chrome.browsercloud.io/content?token=API_TOKEN' \
  -H 'Cache-Control: no-cache' \
  -H 'Content-Type: application/json' \
  -d '
{
  "url": "https://browsercloud.io/doc-examples/content.html",
  "addScriptTag": [
    {
      "content": "let node = document.querySelector(\"#header-2\"); node.textContent = \"My custom JS did it!\""
    }
  ],
  "setJavaScriptEnabled": true,
  "waitFor": "#delayed",
  "userAgent": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.2 Mobile/15E148 Safari/604.1",
  "rejectResourceTypes": ["image"],
  "authenticate": {
    "username": "test",
    "password": "test"
  },
  "cookies": [
    {
      "name": "session",
      "value": "session-value",
      "domain": "browsercloud.io"
    }
  ]
}'

| Parameter | Available values | Description |
|-----------|------------------|-------------|
| url | string | URL for web scraping |
| setJavaScriptEnabled | true (default), false | JavaScript rendering |
| waitFor | string | Waits for a certain DOM element to be rendered |
| addScriptTag.content | string - JS code | Adds a custom <script> tag to the page |
| userAgent | string | Sets a custom User-Agent for the web scraper |
| rejectResourceTypes | array of strings: 'document', 'stylesheet', 'image', 'media', 'font', 'script', 'texttrack', 'xhr', 'fetch', 'eventsource', 'websocket', 'manifest', 'other' | Blocks unnecessary resource types to speed up page load |
| authenticate | username, password: string | Basic auth |
| cookies | array of objects with name, value, domain: string | Custom cookies (for example: auth session) |
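
A typo in a resource type is easy to miss, so it can help to validate the options before sending them. The sketch below is an illustration under stated assumptions: build_options is a hypothetical helper, it covers only a subset of the options, and the allowed resource types are copied from the table above.

```python
import json
from typing import Iterable, Optional

# Resource types listed in the parameter table.
RESOURCE_TYPES = {
    "document", "stylesheet", "image", "media", "font", "script", "texttrack",
    "xhr", "fetch", "eventsource", "websocket", "manifest", "other",
}


def build_options(url: str, wait_for: Optional[str] = None,
                  reject: Iterable[str] = (), javascript: bool = True) -> str:
    """Hypothetical helper: return a JSON body using a subset of the options."""
    reject = list(reject)
    unknown = sorted(set(reject) - RESOURCE_TYPES)
    if unknown:
        raise ValueError(f"unknown resource types: {unknown}")
    options = {"url": url, "setJavaScriptEnabled": javascript}
    if wait_for is not None:
        options["waitFor"] = wait_for
    if reject:
        options["rejectResourceTypes"] = reject
    return json.dumps(options, indent=2)


print(build_options("https://browsercloud.io/doc-examples/content.html",
                    wait_for="#delayed", reject=["image", "font"]))
```

The returned string is the body for the POST request. Note that waitFor takes the selector as-is here, since JSON bodies need no URL encoding.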