Documentation
Getting Started
You can start scraping any website in about a minute. Our Web Scraping API gives you a seamless data pipeline built for roughly 99.99% reliability.
All REST API calls return results as JSON or HTML.
Our API endpoint is: https://api.scrapingdog.com/scrape
Our proxy endpoint is: http://scrapingdog:YOUR-API-KEY@proxy.scrapingdog.com:8081
Built for Developers
- Each request will be retried until it succeeds, for up to 60 seconds, so set your client timeout to at least 60 seconds to ensure this process completes smoothly. If every attempt fails within 60 seconds we return a 500 error; you may retry the request and you will not be charged for the unsuccessful attempt (you are only charged for successful requests, i.e. 200 and 404 status codes). Make sure to catch these errors! They occur on roughly 1-2% of requests for hard-to-scrape websites (see the error-handling sketch after this list).
- There is no overage allowed on the free plan; if you exceed 1,000 requests per month on the free plan, you will receive a 403 error.
- Each request returns a string containing the raw HTML of the requested page, along with any headers and cookies.
- If you exceed your plan's concurrent connection limit, the API responds with a 429 status code; this can be resolved by slowing down your request rate.
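A minimal error-handling sketch in Python, assuming the status-code behaviour described above (200 and 404 are billed as successes, 500 means all internal retries failed and can be retried for free, 429 signals the concurrency limit, 403 signals a plan limit); the endpoint and parameter names follow the examples later in this documentation.
import time
import requests

API_KEY = 'APIKEY'  # replace with your own key

def scrape(url, max_retries=3):
    # The API retries internally for up to 60 seconds, so allow at least that long.
    payload = {'api_key': API_KEY, 'url': url, 'dynamic': 'false'}
    for attempt in range(max_retries):
        resp = requests.get('https://api.scrapingdog.com/scrape', params=payload, timeout=60)
        if resp.status_code in (200, 404):   # billed; treat as final
            return resp.text
        if resp.status_code == 429:          # concurrency limit: back off and retry
            time.sleep(2 ** attempt)
            continue
        if resp.status_code == 500:          # not charged; safe to retry
            continue
        resp.raise_for_status()              # 403 (plan limit) and other errors
    raise RuntimeError('Failed to scrape ' + url + ' after ' + str(max_retries) + ' attempts')

print(scrape('http://httpbin.org/ip'))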
Overview
Parameters [type] (default) | Description |
---|---|
api_key [string] (required) | Your API key |
url [string] (required) | The URL of the page you want to scrape |
country [string] ("") | Premium residential proxy location |
premium [string] ("") | Use premium proxies to bypass difficult-to-scrape websites (10-25 credits/request) |
dynamic [boolean] (true) | Render the JavaScript on the page with a headless browser (5-25 credits/request) |
wait [integer] (0) | Wait 0 to 35 seconds for a heavy website to load |
custom_headers [boolean] (false) | Forward particular headers to the webpage, along with other headers generated by Scrapingdog |
api_key [string] (required) | Your API key for the Google Search API |
query [string] (required) | Query string for the Google Search API |
results [string] (required) | Number of Google search results you need (1-100) |
country [string] ("") | Location for Google search results |
api_key [string] (required) | Your API key for the LinkedIn API |
type [string] (required) | Either profile or company |
listing [boolean] (required) | Either true or false |
linkId [string] (required) | The id of the profile/company |
api_key [string] (required) | Your API key for the Screenshot API |
url [string] (required) | The URL of the page you want to take a screenshot of |
field [string] (required) | The type of job you want to scrape |
geoid [string] (required) | The unique id of any location on LinkedIn |
page [string] (required) | The page number of LinkedIn jobs |
parsed [boolean] (required) | Tell the scraper to return data as JSON |
Authentication
You can authenticate with our API by providing your API key, which you can obtain in the member area.
All requests must be made to our endpoint over HTTPS, and every API request must be authenticated. The major rules are discussed below.
Basic Usage
The Scrapingdog API exposes a single endpoint. Simply send a GET request to https://api.scrapingdog.com/scrape with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.
curl "https://api.scrapingdog.com/scrape?api_key=5e5a97e5b1ca5b194f42da86&url=http://httpbin.org/ip&dynamic=false"
- (string) url
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Javascript Rendering (default=true)
If you are crawling a page that requires you to render the javascript on the page, we can fetch these pages using a headless browser. This feature is only available on the Premium plans. To render javascript, simply set dynamic=true and we will use a headless Google Chrome instance to fetch the page. Each request with normal rotating proxies will cost 5 credits and 25 credits with premium proxies.
To fetch the URL without using a headless browser, use the dynamic=false parameter in the GET request.
curl "https://api.scrapingdog.com/scrape?api_key=5e750530f030026c843fbefc646&url=http://httpbin.org/ip"
- (string) url
- (boolean) dynamic
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"192.15.81.132"}
</pre>
</body>
</html>
Wait for a fixed amount of time (default=0)
If you are crawling a heavy website then you might need this parameter. This will help you to load complete HTML before returning the results. Our browsers will wait for that particular time before returning the complete HTML.
Use the wait parameter with a value in milliseconds between 0 and 35000.
curl "https://api.scrapingdog.com/scrape?api_key=5e750530f030026c843fbefc646&url=http://httpbin.org/ip&wait=5000"
- (string) url
- (integer) wait
Passing Custom Headers
If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set custom_headers=true. Only use this feature to get customized results; do not use it to avoid blocks, as we handle that internally.
curl --header "X-MyHeader: 123" \
"https://api.scrapingdog.com/scrape?api_key=5e5a97e5b1ca5b194f42da86fr444356&url=http://httpbin.org/anything&custom_headers=true"
- (string) url
- (boolean) custom_headers
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{
"args":{},
"data":"",
"files":{},
"form":{},
"headers": {
"Accept":"*/*",
"Accept-Encoding":"gzip, deflate",
"Cache-Control":"max-age=259200",
"Connection":"close",
"Host":"httpbin.org",
"Referer":"http://httpbin.org",
"Timeout":"10000",
"User-Agent":"curl/7.54.0",
"X-Myheader":"123"
},
"json":null,
"method":"GET",
"origin":"45.72.0.249",
"url":"http://httpbin.org/anything"
}
</pre>
</body>
</html>
Sessions
To reuse the same proxy for multiple requests, simply use the &session_number= parameter (e.g. session_number=666). The value of session_number can be any integer; simply send a new integer to create a new session (this allows you to keep using the same proxy for each request with that session number). Sessions expire 60 seconds after the last use.
curl "https://api.scrapingdog.com/scrape?api_key=5e3a0e5a97e5b1ca5b194f42da86&url=http://httpbin.org/ip&session_number=666"
curl "https://api.scrapingdog.com/scrape?api_key=5e3a0e5a97e5b1ca5b194f42da86&url=http://httpbin.org/ip&session_number=666"
- (string) url
- (integer) session_number
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Geographic Location
To ensure your requests come from a particular country, please use the ISO code of the country (e.g. country=us). United States (us) geotargeting is available on the Startup plan and higher. PRO plan customers also have access to Canada (ca), United Kingdom (uk), Russia (ru), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Italy (it), China (cn), and Australia (au). Other countries are available to PRO customers upon request.
curl "https://api.scrapingdog.com/scrape?api_key=3e3a09b6ecde9f83856906c5e27dd646&url=http://httpbin.org/ip&country=us"
- (string) url
- (string) country
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Premium Residential Proxies
For a few particularly difficult-to-scrape sites, we also maintain a private internal pool of residential and mobile IPs. This service is only available to users on the PRO plan or higher. Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request counts as 10 API calls against your monthly limit), and requests that use both JavaScript rendering and the premium pool are charged at 25 times the normal rate. To send a request through our premium proxy service, use the premium=true query parameter.
curl "https://api.scrapingdog.com/scrape?api_key=5e5a97e5b1ca5b194f42da86d646&url=http://httpbin.org/ip&premium=true"
- (string) url
- (boolean) premium
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"25.16.48.78"}
</pre>
</body>
</html>
POST/PUT Requests
You can also send a POST/PUT request through the Scrapingdog API. The return value is stringified; if you want to use it as JSON, parse it into a JSON object.
# To send PUT request just replace POST with PUT
curl -d 'foo=bar' \
-X POST \
"https://api.scrapingdog.com/scrape?api_key=5e5a97e5b1ca5b194f42da86c5e27dd646&url=http://httpbin.org/anything"
# For form data
curl -H 'Content-Type: application/x-www-form-urlencoded' \
--data 'foo=bar' \
-X POST \
"https://api.scrapingdog.com/scrape?api_key=5e5a97e5b1ca5b194f42da86e27dd646&url=http://httpbin.org/anything"
- None
{
"args": {},
"data": "{\"foo\":\"bar\"}",
"files": {},
"form": {},
"headers": {
"Accept": "application/json",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "13",
"Content-Type": "application/json; charset=utf-8",
"Host": "httpbin.org",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
},
"json": {
"foo": "bar"
},
"method": "POST",
"origin": "25.16.48.78, 25.16.48.78",
"url": "https://httpbin.org/anything"
}
Google Search API
Using the Google Search API you can scrape Google search results without worrying about proxy rotation and data parsing. Our API is fast, reliable, and inexpensive. Each successful request will cost you 20 API credits.
#Through google search query
curl "https://api.scrapingdog.com/google/?api_key=5eaa61a6e562fc52fe763tr516e4653&query=football&results=10&country=us&page=0"
#Through google search URL
curl "https://api.scrapingdog.com/google/?api_key=5eaa61a6e562fc52fe763tr516e4653&query=https://www.google.com/search?q=pizza"
- (string) query - Google search query string or a google page url.
- (string) results - Number of results you want to scrape. It goes from 1 to 100.
- (string) country - Country code in two-letter ISO format. Right now we support 15 countries.
- (string) page - It could be any number, starting from 0.
country | Country Name |
---|---|
us | United States |
cn | China |
au | Australia |
de | Germany |
fr | France |
ca | Canada |
it | Italy |
in | India |
ru | Russia |
mx | Mexico |
gb | United Kingdom |
sg | Singapore |
ch | Chile |
nl | Netherlands |
be | Belgium |
{
"Data": [
{
"link": "https://es.wikipedia.org/wiki/F%C3%BAtbol",
"title": "Fútbol - Wikipedia, la enciclopedia libre",,
"description": "El fútbol o futbol (del inglés británico football, traducido como balompié) es un deporte de equipo jugado entre dos conjuntos de once jugadores cada uno y ..."
"position": 1
},
{
"link": "https://en.wikipedia.org/wiki/Football",
"title": "Football - Wikipedia",
"description": "Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal. Unqualified, the word football normally means the form of ...",
"position": 2
},
...
]
}
Datacenter and Residential Proxies
Scrapingdog also provides a proxy server. We have a pool of more than 7M residential proxies and 40,000 datacenter proxies, with no limit on proxy usage. You can scrape, go anonymous, track ads, and more.
Setting | Value |
---|---|
Proxy Server | proxy.scrapingdog.com |
Proxy Port | 8081 |
Username | scrapingdog |
Password | Your API key |
- The username for the proxy is "scrapingdog" and the password is your API key. You can pass parameters to the proxy by appending them to the API key, separated by a hyphen. For example, to use a US residential proxy, the API key would be "5e36726387872864823-country=us". You can geotarget any country. To use a datacenter proxy you don't have to pass any extra parameter with the API key, and you can use a random residential proxy by passing "country=random" with your API key.
- Your code should be configured not to verify SSL certificates.
- Each residential proxy request costs 5 request credits and each datacenter proxy request costs 1 request credit.
{"origin":"25.16.48.78"}
Scrape Linkedin Jobs💼.
With our dedicated LinkedIn Jobs Scraper API you can scrape jobs at scale without parsing raw HTML. You just have to pass four query parameters, i.e. api_key, geoid, field, and page. One API call will cost 5 request credits. Learn more about the LinkedIn Jobs API.
curl "https://api.scrapingdog.com/linkedinjobs/?api_key=5eaa61a6e562fc52fe763tr516e4653&field=python&geoid=100293800&page=1"
- (string) field is the type of job you want to scrape.
- (string) geoid is the unique location id issued by linkedin itself. You can find it inside the linkedin jobs url.
- (string) page is the page number of linkedin jobs page. It should be greater than 0. For each page you will get 25 jobs or less.
Scrape Linkedin User Profile.
Scrapingdog also provides an API to scrape LinkedIn. You just have to pass three query parameters, i.e. api_key, type, and the LinkedIn id of the user, linkId. One API call will cost 300 request credits.
curl "https://api.scrapingdog.com/linkedin/?api_key=5eaa61a6e562fc52fe763tr516e4653&type=profile&linkId=rbranson"
- (string) type "profile"
- (string) linkId of the User Profile. You can find it in linkedin URL.
Scrape Linkedin Company Page.
Scrapingdog also provides an API to scrape LinkedIn company pages. You just have to pass three query parameters, i.e. api_key, type, and the LinkedIn id of the company, linkId. One API call will cost 300 request credits.
curl "https://api.scrapingdog.com/linkedin/?api_key=5eaa61a6e562fc52fe763tr516e4653&type=company&linkId=scrapingdog"
- (string) type "company"
- (string) linkId of the Company Page. You can find it in linkedin URL.
🏠 Scrape Zillow Properties.
With this dedicated Zillow scraper you will get parsed data from any Zillow property page. To access this GET API you have to pass three query parameters: api_key, url, and listing. The "listing" parameter tells the system whether the page contains a list of properties.
curl "https://api.scrapingdog.com/zillow/?api_key=5eaa61a6e562fc52fe763tr516e4653&url=https://www.zillow.com/homes/for_sale/&listing=true"
- (string) api_key which is your API Key.
- (string) url of the zillow page you want to scrape.
- (boolean) listing will be true if the page has multiple property listed on it and false if the page is a dedicated page for a particular property.
Twitter Scraping API
With this dedicated Twitter scraper you will get parsed JSON data from any tweet. To access this GET API you have to pass three query parameters: api_key, parsed, and url.
curl "https://api.scrapingdog.com/twitter?api_key=5eaa61a6e562fc52fe763tr516e4653&url=https://twitter.com/elonmusk/status/1655608985058267139&parsed=true"
- (string) api_key which is your API Key.
- (boolean) parsed will be true if you need data in JSON form. If it is false then you will get raw HTML from twitter.
- (string) url will be the url of the tweet.
📸 Screenshot API
You can take a screenshot of any page using this API. If you want a full page screenshot then just add &fullPage=true to your api url.
curl "https://api.scrapingdog.com/screenshot?api_key=6103077e467766765f5803ed2df7bc8&url=https://www.scrapingdog.com"
- (string) url - Target URL
- (boolean) fullPage - true/false according to your requirement.
Account Information
If you want to monitor your API usage you can use this API. You just need to pass your API key as a query parameter. You will get your total request limit and the number of requests you have already used.
curl "https://api.scrapingdog.com/account?api_key=678432hgtded7f4d8cc9d1244d48068af"
- (string) api key
{"requestLimit":1208653,"requestUsed":2341}
Basic Usage
The Scrapingdog API exposes a single endpoint. Simply send a GET request to https://api.scrapingdog.com/scrape with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.
import requests
payload = {'api_key': 'APIKEY', 'url': 'https://httpbin.org/ip', 'dynamic':'false'}
resp = requests.get('https://api.scrapingdog.com/scrape', params=payload)
print (resp.text)
- (string) api_key
- (string) url
- (boolean) dynamic
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Javascript Rendering (default=true)
If you are crawling a page that requires you to render the javascript on the page, we can fetch these pages using a headless browser. This feature is only available on the Premium plans. To render javascript, simply set dynamic=true and we will use a headless Google Chrome instance to fetch the page. Each request with normal rotating proxies will cost 5 credits and 25 credits with premium proxies.
To fetch the URL without using a headless browser, use the dynamic=false parameter in the GET request.
import requests
payload = {'api_key': 'APIKEY', 'url': 'https://httpbin.org/ip'}
resp = requests.get('https://api.scrapingdog.com/scrape', params=payload)
print (resp.text)
- (string) url
- (string) api_key
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"192.15.81.132"}
</pre>
</body>
</html>
Wait for a fixed amount of time (default=0)
If you are crawling a heavy website then you might need this parameter. This will help you to load complete HTML before returning the results. Our browsers will wait for that particular time before returning the complete HTML.
Use the wait parameter with a value in milliseconds between 0 and 35000.
import requests
payload = {'api_key': 'APIKEY', 'url': 'https://httpbin.org/ip', 'wait':'5000'}
resp = requests.get('https://api.scrapingdog.com/scrape', params=payload)
print (resp.text)
- (string) url
- (integer) wait
Passing Custom Headers
If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set custom_headers=true. Only use this feature to get customized results; do not use it to avoid blocks, as we handle that internally.
import requests
headers = {
    'Accept': 'application/json',
    'X-MyHeader': '123',
}
payload = {'api_key': 'APIKEY', 'url': 'https://httpbin.org/anything', 'custom_headers': 'true'}
resp = requests.get('https://api.scrapingdog.com/scrape', params=payload, headers=headers)
print (resp.text)
- (string) url
- (boolean) custom_headers
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{
"args":{},
"data":"",
"files":{},
"form":{},
"headers": {
"Accept":"*/*",
"Accept-Encoding":"gzip, deflate",
"Cache-Control":"max-age=259200",
"Connection":"close",
"Host":"httpbin.org",
"Referer":"http://httpbin.org",
"Timeout":"10000",
"User-Agent":"curl/7.54.0",
"X-Myheader":"123"
},
"json":null,
"method":"GET",
"origin":"45.72.0.249",
"url":"http://httpbin.org/anything"
}
</pre>
</body>
</html>
Sessions
To reuse the same proxy for multiple requests, simply use the &session_number= parameter (e.g. session_number=666). The value of session_number can be any integer; simply send a new integer to create a new session (this allows you to keep using the same proxy for each request with that session number). Sessions expire 60 seconds after the last use.
import requests
payload = {'api_key': 'APIKEY', 'url': 'https://httpbin.org/ip', 'session_number':'666'}
resp = requests.get('https://api.scrapingdog.com/scrape', params=payload)
print (resp.text)
- (string) url
- (integer) session_number
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Geographic Location
To ensure your requests come from a particular country, please use the ISO code of the country (e.g. country=us). United States (us) geotargeting is available on the Startup plan and higher. PRO plan customers also have access to Canada (ca), United Kingdom (uk), Russia (ru), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Italy (it), China (cn), and Australia (au). Other countries are available to PRO customers upon request.
import requests
payload = {'api_key': 'APIKEY', 'url': 'https://httpbin.org/ip', 'country':'gb'}
resp = requests.get('https://api.scrapingdog.com/scrape', params=payload)
print (resp.text)
- (string) url
- (string) country
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Premium Residential Proxies
For a few particularly difficult-to-scrape sites, we also maintain a private internal pool of residential and mobile IPs. This service is only available to users on the PRO plan or higher. Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request counts as 10 API calls against your monthly limit), and requests that use both JavaScript rendering and the premium pool are charged at 25 times the normal rate. To send a request through our premium proxy service, use the premium=true query parameter.
import requests
payload = {'api_key': 'APIKEY', 'url': 'https://httpbin.org/ip', 'premium':'true'}
resp = requests.get('https://api.scrapingdog.com/scrape', params=payload)
print (resp.text)
- (string) url
- (boolean) premium
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"25.16.48.78"}
</pre>
</body>
</html>
POST/PUT Requests
You can also send a POST/PUT request through the Scrapingdog API. The return value is stringified; if you want to use it as JSON, parse it into a JSON object.
import requests
payload = {'api_key': 'APIKEY', 'url': 'http://httpbin.org/post'}
resp = requests.post('https://api.scrapingdog.com/scrape', params=payload, data={'foo': 'bar'})
print (resp.text)
- None
{
"args": {},
"data": "{\"foo\":\"bar\"}",
"files": {},
"form": {},
"headers": {
"Accept": "application/json",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "13",
"Content-Type": "application/json; charset=utf-8",
"Host": "httpbin.org",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
},
"json": {
"foo": "bar"
},
"method": "POST",
"origin": "25.16.48.78, 25.16.48.78",
"url": "https://httpbin.org/anything"
}
Google Search API
Using the Google Search API you can scrape Google search results without worrying about proxy rotation and data parsing. Our API is fast, reliable, and inexpensive. Each successful request will cost you 20 API credits.
#Through google search query
import requests
payload = {'api_key': 'APIKEY', 'query':'football', 'results':'10', 'country':'gb', 'page':'0'}
resp = requests.get('https://api.scrapingdog.com/google', params=payload)
print (resp.json())
#Through google search URL
import requests
payload = {'api_key': 'APIKEY', 'query':'https://www.google.com/search?q=pizza'}
resp = requests.get('https://api.scrapingdog.com/google', params=payload)
print (resp.text)
- (string) query - Google search query string or a google page url.
- (string) results - Number of results you want to scrape. It goes from 1 to 100.
- (string) country - Country code in two-letter ISO format. Right now we support 15 countries.
- (string) page - It could be any number, starting from 0.
country | Country Name |
---|---|
us | United States |
cn | China |
au | Australia |
de | Germany |
fr | France |
ca | Canada |
it | Italy |
in | India |
ru | Russia |
mx | Mexico |
gb | United Kingdom |
sg | Singapore |
ch | Chile |
nl | Netherlands |
be | Belgium |
{
"Data": [
{
"link": "https://es.wikipedia.org/wiki/F%C3%BAtbol",
"title": "Fútbol - Wikipedia, la enciclopedia libre",,
"description": "El fútbol o futbol (del inglés británico football, traducido como balompié) es un deporte de equipo jugado entre dos conjuntos de once jugadores cada uno y ..."
"position": 1
},
{
"link": "https://en.wikipedia.org/wiki/Football",
"title": "Football - Wikipedia",
"description": "Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal. Unqualified, the word football normally means the form of ...",
"position": 2
},
...
]
}
Datacenter and Residential Proxies
Scrapingdog also provides a proxy server. We have a pool of more than 7M residential proxies and 40,000 datacenter proxies, with no limit on proxy usage. You can scrape, go anonymous, track ads, and more.
#Datacenter proxies
import requests
proxies = {
"http": "http://scrapingdog:[email protected]:8081"
}
resp = requests.get('http://httpbin.org/ip', proxies=proxies, verify=False)
print (resp.text)
#Residential proxies
import requests
proxies = {
"http": "http://scrapingdog:[email protected]:8081"
}
resp = requests.get('http://httpbin.org/ip', proxies=proxies, verify=False)
print (resp.text)
- The username for the proxy is "scrapingdog" and the password is your API key. You can pass parameters to the proxy by appending them to the API key, separated by a hyphen. For example, to use a US residential proxy, the API key would be "5e36726387872864823-country=us". You can geotarget any country. To use a datacenter proxy you don't have to pass any extra parameter with the API key, and you can use a random residential proxy by passing "country=random" with your API key (a country-targeting sketch follows the example output below).
- Your code should be configured not to verify SSL certificates.
- Each residential proxy request costs 5 request credits and each datacenter proxy request costs 1 request credit.
{"origin":"25.16.48.78"}
Scrape Linkedin Jobs💼.
With our dedicated LinkedIn Jobs Scraper API you can scrape jobs at scale without parsing raw HTML. You just have to pass four query parameters, i.e. api_key, geoid, field, and page. One API call will cost 5 request credits. Learn more about the LinkedIn Jobs API.
import requests
payload = {'api_key': 'APIKEY', 'field':'Python', 'geoid':'100293800', 'page':'1'}
resp = requests.get('https://api.scrapingdog.com/linkedinjobs', params=payload)
print (resp.json())
- (string) field is the type of job you want to scrape.
- (string) geoid is the unique location id issued by linkedin itself. You can find it inside the linkedin jobs url.
- (string) page is the page number of the LinkedIn jobs page. It should be greater than 0. Each page returns 25 jobs or fewer (see the pagination sketch below).
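A small pagination sketch, assuming each page returns at most 25 jobs as noted above and that the endpoint returns a JSON array of job objects; the field and geoid values are placeholders taken from the example.
import requests

API_KEY = 'APIKEY'  # placeholder

def all_jobs(field, geoid):
    # Collect jobs page by page; a short page signals the last one.
    jobs, page = [], 1
    while True:
        payload = {'api_key': API_KEY, 'field': field, 'geoid': geoid, 'page': str(page)}
        resp = requests.get('https://api.scrapingdog.com/linkedinjobs', params=payload)
        batch = resp.json()
        jobs.extend(batch)
        if len(batch) < 25:
            break
        page += 1
    return jobs

print (len(all_jobs('Python', '100293800')))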
Scrape Linkedin User Profile.
Scrapingdog also provides an API to scrape LinkedIn. You just have to pass three query parameters, i.e. api_key, type, and the LinkedIn id of the user, linkId. One API call will cost 300 request credits.
import requests
payload = {'api_key': 'APIKEY', 'type':'profile', 'linkId':'rbranson'}
resp = requests.get('https://api.scrapingdog.com/linkedin', params=payload)
print (resp.json())
- (string) type "profile"
- (string) linkId of the User Profile. You can find it in linkedin URL.
Scrape Linkedin Company Page.
Scrapingdog also provides an API to scrape LinkedIn company pages. You just have to pass three query parameters, i.e. api_key, type, and the LinkedIn id of the company, linkId. One API call will cost 300 request credits.
import requests
payload = {'api_key': 'APIKEY', 'type':'company', 'linkId':'amazon'}
resp = requests.get('https://api.scrapingdog.com/linkedin', params=payload)
print (resp.json())
- (string) type "company"
- (string) linkId of the Company Page. You can find it in linkedin URL.
🏠 Scrape Zillow Properties.
With this dedicated Zillow scraper you will get parsed data from any Zillow property page. To access this GET API you have to pass three query parameters: api_key, url, and listing. The "listing" parameter tells the system whether the page contains a list of properties.
import requests
payload = {'api_key': 'APIKEY', 'url':'https://www.zillow.com/homes/for_sale/', 'listing':'true'}
resp = requests.get('https://api.scrapingdog.com/zillow', params=payload)
print (resp.json())
- (string) api_key which is your API Key.
- (string) url of the zillow page you want to scrape.
- (boolean) listing will be true if the page has multiple property listed on it and false if the page is a dedicated page for a particular property.
Twitter Scraping API
With this dedicated Twitter scraper you will get parsed JSON data from any tweet. To access this GET API you have to pass three query parameters: api_key, parsed, and url.
import requests
payload = {'api_key': 'APIKEY', 'url':'https://twitter.com/elonmusk/status/1655608985058267139', 'parsed':'true'}
resp = requests.get('https://api.scrapingdog.com/twitter', params=payload)
print (resp.json())
- (string) api_key which is your API Key.
- (boolean) parsed will be true if you need data in JSON form. If it is false then you will get raw HTML from twitter.
- (string) url will be the url of the tweet.
📸 Screenshot API
You can take a screenshot of any page using this API. If you want a full page screenshot then just add &fullPage=true to your api url.
import requests
payload = {'api_key': 'APIKEY', 'url':'https://www.scrapingdog.com', 'fullPage':'true'}
resp = requests.get('https://api.scrapingdog.com/screenshot', params=payload)
print (resp.text)
- (string) url - Target URL
- (boolean) fullPage - true/false according to your requirement.
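If the endpoint returns the image bytes directly (the examples above simply print the response), you will usually want to save them to a file rather than print them. A minimal sketch under that assumption:
import requests

payload = {'api_key': 'APIKEY', 'url': 'https://www.scrapingdog.com', 'fullPage': 'true'}
resp = requests.get('https://api.scrapingdog.com/screenshot', params=payload)

# Write the raw response bytes to disk (assumes the body is the image itself).
with open('screenshot.png', 'wb') as f:
    f.write(resp.content)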
Account Information
If you want to monitor your API usage you can use this API. You just need to pass your API key as a query parameter. You will get your total request limit and the number of requests you have already used.
import requests
payload = {'api_key': 'APIKEY'}
resp = requests.get('https://api.scrapingdog.com/account', params=payload)
print (resp.text)
- (string) api key
{"requestLimit":1208653,"requestUsed":2341}
Basic Usage
The Scrapingdog API exposes a single endpoint. Simply send a GET request to https://api.scrapingdog.com/scrape with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/scrape?api_key=APIKEY&url=https://httpbin.org/ip&dynamic=false')
console.log(resp.body)
- (string) api_key
- (string) url
- (boolean) dynamic
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Javascript Rendering (default=true)
If you are crawling a page that requires you to render the javascript on the page, we can fetch these pages using a headless browser. This feature is only available on the Premium plans. To render javascript, simply set dynamic=true and we will use a headless Google Chrome instance to fetch the page. Each request with normal rotating proxies will cost 5 credits and 25 credits with premium proxies.
To fetch the URL without using a headless browser, use the dynamic=false parameter in the GET request.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/scrape?api_key=APIKEY&url=https://httpbin.org/ip')
console.log(resp.body)
- (string) url
- (boolean) dynamic
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"192.15.81.132"}
</pre>
</body>
</html>
Wait for a fixed amount of time (default=0)
If you are crawling a heavy website then you might need this parameter. This will help you to load complete HTML before returning the results. Our browsers will wait for that particular time before returning the complete HTML.
Use the wait parameter with a value in milliseconds between 0 and 35000.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/scrape?api_key=APIKEY&url=https://httpbin.org/ip&wait=5000')
console.log(resp.body)
- (string) url
- (integer) wait
Passing Custom Headers
If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set custom_headers=true. Only use this feature to get customized results; do not use it to avoid blocks, as we handle that internally.
const unirest = require('unirest')
var headers={
'Accept': 'application/json',
'X-MyHeader': '123'
}
var resp = await unirest.get('https://api.scrapingdog.com/scrape?api_key=APIKEY&url=https://httpbin.org/anything&dynamic=false&custom_headers=true').headers(headers)
console.log(resp.body)
- (string) url
- (boolean) custom_headers
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{
"args":{},
"data":"",
"files":{},
"form":{},
"headers": {
"Accept":"*/*",
"Accept-Encoding":"gzip, deflate",
"Cache-Control":"max-age=259200",
"Connection":"close",
"Host":"httpbin.org",
"Referer":"http://httpbin.org",
"Timeout":"10000",
"User-Agent":"curl/7.54.0",
"X-Myheader":"123"
},
"json":null,
"method":"GET",
"origin":"45.72.0.249",
"url":"http://httpbin.org/anything"
}
</pre>
</body>
</html>
Sessions
To reuse the same proxy for multiple requests, simply use the &session_number= parameter (e.g. session_number=666). The value of session_number can be any integer; simply send a new integer to create a new session (this allows you to keep using the same proxy for each request with that session number). Sessions expire 60 seconds after the last use.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/scrape?api_key=APIKEY&url=https://httpbin.org/ip&session_number=123')
console.log(resp.body)
- (string) url
- (integer) session_number
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Geographic Location
To ensure your requests come from a particular country, please use the ISO code of the country (e.g. country=us). United States (us) geotargeting is available on the Startup plan and higher. PRO plan customers also have access to Canada (ca), United Kingdom (uk), Russia (ru), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Italy (it), China (cn), and Australia (au). Other countries are available to PRO customers upon request.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/scrape?api_key=APIKEY&url=https://httpbin.org/ip&country=gb')
console.log(resp.body)
- (string) url
- (string) country
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Premium Residential Proxies
For a few particularly difficult-to-scrape sites, we also maintain a private internal pool of residential and mobile IPs. This service is only available to users on the PRO plan or higher. Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request counts as 10 API calls against your monthly limit), and requests that use both JavaScript rendering and the premium pool are charged at 25 times the normal rate. To send a request through our premium proxy service, use the premium=true query parameter.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/scrape?api_key=APIKEY&url=https://httpbin.org/ip&premium=true')
console.log(resp.body)
- (string) url
- (boolean) premium
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"25.16.48.78"}
</pre>
</body>
</html>
POST/PUT Requests
You can also send a POST/PUT request through the Scrapingdog API. The return value is stringified; if you want to use it as JSON, parse it into a JSON object.
const unirest = require('unirest')
var headers = {
'Accept': 'application/json',
'X-MyHeader': '123'
}
var data = { "parameter": 23, "foo": "bar" }
var resp = await unirest.post('https://api.scrapingdog.com/scrape?api_key=APIKEY&url=http://httpbin.org/post').headers(headers).send(data)
console.log(resp.body)
- (string) url
{
"args": {},
"data": "{\"foo\":\"bar\"}",
"files": {},
"form": {},
"headers": {
"Accept": "application/json",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "13",
"Content-Type": "application/json; charset=utf-8",
"Host": "httpbin.org",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
},
"json": {
"foo": "bar"
},
"method": "POST",
"origin": "25.16.48.78, 25.16.48.78",
"url": "https://httpbin.org/anything"
}
Google Search API
Using the Google Search API you can scrape Google search results without worrying about proxy rotation and data parsing. Our API is fast, reliable, and inexpensive. Each successful request will cost you 20 API credits.
#Through google search query
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/google?api_key=APIKEY&query=football&results=10&country=us&page=0')
console.log(resp.body)
#Through google search URL
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/google?api_key=APIKEY&query=https://www.google.com/search?q=pizza')
console.log(resp.body)
- (string) api_key
- (string) query - Google search query string or a google page url.
- (string) results - Number of results you want to scrape. It goes from 1 to 100.
- (string) country - Country code in two-letter ISO format. Right now we support 15 countries.
- (string) page - It could be any number, starting from 0.
country | Country Name |
---|---|
us | United States |
cn | China |
au | Australia |
de | Germany |
fr | France |
ca | Canada |
it | Italy |
in | India |
ru | Russia |
mx | Mexico |
gb | United Kingdom |
sg | Singapore |
ch | Chile |
nl | Netherlands |
be | Belgium |
{
"Data": [
{
"link": "https://es.wikipedia.org/wiki/F%C3%BAtbol",
"title": "Fútbol - Wikipedia, la enciclopedia libre",,
"description": "El fútbol o futbol (del inglés británico football, traducido como balompié) es un deporte de equipo jugado entre dos conjuntos de once jugadores cada uno y ..."
"position": 1
},
{
"link": "https://en.wikipedia.org/wiki/Football",
"title": "Football - Wikipedia",
"description": "Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal. Unqualified, the word football normally means the form of ...",
"position": 2
},
...
]
}
Datacenter and Residential Proxies
Scrapingdog also provides a proxy server. We have a pool of more than 7M residential proxies and 40,000 datacenter proxies, with no limit on proxy usage. You can scrape, go anonymous, track ads, and more.
#Datacenter proxies
const unirest = require('unirest')
var resp = await unirest.get('https://httpbin.org/ip').proxy('http://scrapingdog:YOUR-API-KEY@proxy.scrapingdog.com:8081')
console.log(resp.body)
#Residential proxies
const unirest = require('unirest')
var resp = await unirest.get('https://httpbin.org/ip').proxy('http://scrapingdog:YOUR-API-KEY-country=random@proxy.scrapingdog.com:8081')
console.log(resp.body)
- The username for the proxy is "scrapingdog" and the password is your API key. You can pass parameters to the proxy by appending them to the API key, separated by a hyphen. For example, to use a US residential proxy, the API key would be "5e36726387872864823-country=us". You can geotarget any country. To use a datacenter proxy you don't have to pass any extra parameter with the API key, and you can use a random residential proxy by passing "country=random" with your API key.
- Your code should be configured not to verify SSL certificates.
- Each residential proxy request costs 5 request credits and each datacenter proxy request costs 1 request credit.
{"origin":"25.16.48.78"}
Scrape Linkedin Jobs💼.
With our dedicated LinkedIn Jobs Scraper API you can scrape jobs at scale without parsing raw HTML. You just have to pass four query parameters, i.e. api_key, geoid, field, and page. One API call will cost 5 request credits. Learn more about the LinkedIn Jobs API.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/linkedinjobs?api_key=APIKEY&field=Python&geoid=100293800&page=1')
console.log(resp.body)
- (string) field is the type of job you want to scrape.
- (string) geoid is the unique location id issued by linkedin itself. You can find it inside the linkedin jobs url.
- (string) page is the page number of linkedin jobs page. It should be greater than 0. For each page you will get 25 jobs or less.
Scrape Linkedin User Profile.
Scrapingdog also provides an API to scrape LinkedIn. You just have to pass three query parameters, i.e. api_key, type, and the LinkedIn id of the user, linkId. One API call will cost 300 request credits.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/linkedin?api_key=APIKEY&type=profile&linkId=rbranson')
console.log(resp.body)
- (string) api_key
- (string) type "profile"
- (string) linkId of the User Profile. You can find it in linkedin URL.
Scrape Linkedin Company Page.
Scrapingdog also provides an API to scrape LinkedIn company pages. You just have to pass three query parameters, i.e. api_key, type, and the LinkedIn id of the company, linkId. One API call will cost 300 request credits.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/linkedin?api_key=APIKEY&type=company&linkId=scrapingdog')
console.log(resp.body)
- (string) api_key
- (string) type "company"
- (string) linkId of the Company Page. You can find it in linkedin URL.
🏠 Scrape Zillow Properties.
With this dedicated Zillow scraper you will get parsed data from any Zillow property page. To access this GET API you have to pass three query parameters: api_key, url, and listing. The "listing" parameter tells the system whether the page contains a list of properties.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/zillow?api_key=APIKEY&url=https://www.zillow.com/homes/for_sale/&listing=true')
console.log(resp.body)
- (string) api_key which is your API Key.
- (string) url of the zillow page you want to scrape.
- (boolean) listing will be true if the page has multiple property listed on it and false if the page is a dedicated page for a particular property.
Twitter Scraping API
With this dedicated Twitter scraper you will get parsed JSON data from any tweet. To access this GET API you have to pass three query parameters: api_key, parsed, and url.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/twitter?api_key=APIKEY&url=https://twitter.com/elonmusk/status/1655608985058267139&parsed=true')
console.log(resp.body)
- (string) api_key which is your API Key.
- (boolean) parsed will be true if you need data in JSON form. If it is false then you will get raw HTML from twitter.
- (string) url will be the url of the tweet.
📸 Screenshot API
You can take a screenshot of any page using this API. If you want a full page screenshot then just add &fullPage=true to your api url.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/screenshot?api_key=APIKEY&url=https://www.scrapingdog.com&fullPage=true')
console.log(resp.body)
- (string) api_key
- (string) url - Target URL
- (boolean) fullPage - true/false according to your requirement.
Account Information
If you want to monitor your API usage you can use this API. You just need to pass your API key as a query parameter. You will get your total request limit and the number of requests you have already used.
const unirest = require('unirest')
var resp = await unirest.get('https://api.scrapingdog.com/account?api_key=APIKEY')
console.log(resp.body)
- (string) api_key
{"requestLimit":1208653,"requestUsed":2341}
Basic Usage
The Scrapingdog API exposes a single endpoint. Simply send a GET request to https://api.scrapingdog.com/scrape with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.
"https://api.scrapingdog.com/scrape?api_key=APIKEY&url=http://httpbin.org/ip&dynamic=false";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
$url =
- (string) api_key
- (string) url
- (boolean) dynamic
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Javascript Rendering (default=true)
If you are crawling a page that requires you to render the javascript on the page, we can fetch these pages using a headless browser. This feature is only available on the Premium plans. To render javascript, simply set dynamic=true and we will use a headless Google Chrome instance to fetch the page. Each request with normal rotating proxies will cost 5 credits and 25 credits with premium proxies.
To fetch the URL without using a headless browser, use the dynamic=false parameter in the GET request.
"https://api.scrapingdog.com/scrape?api_key=APIKEY&url=http://httpbin.org/ip";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
$url =
- (string) api_key
- (string) url
- (boolean) dynamic
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"192.15.81.132"}
</pre>
</body>
</html>
Wait for a fixed amount of time (default=0)
If you are crawling a heavy website then you might need this parameter. This will help you to load complete HTML before returning the results. Our browsers will wait for that particular time before returning the complete HTML.
Use the wait parameter with a value in milliseconds between 0 and 35000.
"https://api.scrapingdog.com/scrape?api_key=APIKEY&url=http://httpbin.org/ip&wait=5000";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
$url =
- (string) api_key
- (string) url
- (integer) wait
Passing Custom Headers
If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set custom_headers=true. Only use this feature to get customized results; do not use it to avoid blocks, as we handle that internally.
"https://api.scrapingdog.com/scrape?api_key=APIKEY&url=http://httpbin.org/ip&custom_headers=true";
$headerArray = array(
"Content-Type: application/json",
"X-MyHeader: 123"
);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headerArray);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) url
- (boolean) custom_headers
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{
"args":{},
"data":"",
"files":{},
"form":{},
"headers": {
"Accept":"*/*",
"Accept-Encoding":"gzip, deflate",
"Cache-Control":"max-age=259200",
"Connection":"close",
"Host":"httpbin.org",
"Referer":"http://httpbin.org",
"Timeout":"10000",
"User-Agent":"curl/7.54.0",
"X-Myheader":"123"
},
"json":null,
"method":"GET",
"origin":"45.72.0.249",
"url":"http://httpbin.org/anything"
}
</pre>
</body>
</html>
Sessions
To reuse the same proxy for multiple requests, simply use the session_number parameter (e.g. session_number=666). The value of the session can be any integer; simply send a new integer to create a new session (this will allow you to continue using the same proxy for each request with that session number). Sessions expire 60 seconds after the last usage.
"https://api.scrapingdog.com/scrape?api_key=APIKEY&url=http://httpbin.org/ip&session_number=123";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) url
- (integer) session_number
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
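To see the session in action, you can send the same request twice with an identical session_number and compare the origin IPs returned by httpbin.org; a minimal sketch:
$url = "https://api.scrapingdog.com/scrape?api_key=APIKEY&url=http://httpbin.org/ip&session_number=123";
$first = file_get_contents($url);   // first request creates session 123
$second = file_get_contents($url);  // second request reuses the same proxy while the session is alive
// Both responses should report the same origin IP
echo $first . "\n" . $second . "\n";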
Geographic Location
To ensure your requests come from a particular country, please use the ISO code of the country (e.g. country=us). United States (us) geotargeting is available on the Startup plan and higher. PRO plan customers also have access to Canada (ca), United Kingdom (uk), Russia (ru), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Italy (it), China (cn), and Australia (au). Other countries are available to PRO customers upon request.
"https://api.scrapingdog.com/scrape?api_key=APIKEY&url=http://httpbin.org/ip&country=gb";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) url
- (string) country
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Premium Residential Proxies
For a few particularly difficult to scrape sites, we also maintain a private internal pool of residential and mobile IPs. This service is only available to users on the PRO plan or higher. Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request counts as 10 API calls against your monthly limit), and each request that uses both JavaScript rendering and the premium pool is charged at 25 times the normal rate (every successful request counts as 25 API calls against your monthly limit). To send a request through our premium proxy service, add the premium=true query parameter.
"https://api.scrapingdog.com/scrape?api_key=APIKEY&url=http://httpbin.org/ip&premium=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) url
- (boolean) premium
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"25.16.48.78"}
</pre>
</body>
</html>
POST/PUT Requests
You can also send POST/PUT requests through the Scrapingdog API. The return value is stringified; if you want to use it as JSON, parse it into a JSON object.
$url = "https://api.scrapingdog.com/scrape?api_key=APIKEY&url=http://httpbin.org/anything";
# POST/PUT Requests
$postData = ["foo" => "bar"];
$postData = json_encode($postData);
$headers = [
"Content-Type: application/json"
];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData); //Post Fields
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
#Form POST Request
$postData = ["foo" => "bar"];
$postData = json_encode($postData);
$headers = [
'Content-Type: application/x-www-form-urlencoded; charset=utf-8',
];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
$url = "
- (string) api_key
- (string) url
{
"args": {},
"data": "{\"foo\":\"bar\"}",
"files": {},
"form": {},
"headers": {
"Accept": "application/json",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "13",
"Content-Type": "application/json; charset=utf-8",
"Host": "httpbin.org",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
},
"json": {
"foo": "bar"
},
"method": "POST",
"origin": "25.16.48.78, 25.16.48.78",
"url": "https://httpbin.org/anything"
}
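Since the response body is returned as a string, here is a minimal sketch of parsing it with PHP's built-in json_decode (assuming the target echoes JSON, as httpbin.org/anything does above):
// $response is the string returned by curl_exec() above
$data = json_decode($response, true);
if ($data !== null) {
    // httpbin.org echoes the parsed request body under the "json" key
    print_r($data["json"]); // ["foo" => "bar"]
}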
Google Search API
Using the Google Search API, you can scrape Google search results without worrying about proxy rotation or data parsing. Our API is fast, reliable, and affordable. Each successful request will cost you 20 API credits.
#Through google search query
$url = "https://api.scrapingdog.com/google?api_key=APIKEY&query=footbal&results=10&country=us&page=0";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
#Through google search URL
$url = "https://api.scrapingdog.com/google?api_key=APIKEY&query=https://www.google.com/search?q=pizza";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) api_key
- (string) query - Google search query string or a google page url.
- (string) results - Number of results you want to scrape. It goes from 1 to 100.
- (string) country - Name of the country. The name should be in ISO format. Right now we support 15 countries.
- (string) page - It could be any number, starting from 0.
country | Country Name |
---|---|
us | United States |
cn | China |
au | Australia |
de | Germany |
fr | France |
ca | Canada |
it | Italy |
in | India |
ru | Russia |
mx | Mexico |
gb | United Kingdom |
sg | Singapore |
ch | Chile |
nl | Netherlands |
be | Belgium |
{
"Data": [
{
"link": "https://es.wikipedia.org/wiki/F%C3%BAtbol",
"title": "Fútbol - Wikipedia, la enciclopedia libre",,
"description": "El fútbol o futbol (del inglés británico football, traducido como balompié) es un deporte de equipo jugado entre dos conjuntos de once jugadores cada uno y ..."
"position": 1
},
{
"link": "https://en.wikipedia.org/wiki/Football",
"title": "Football - Wikipedia",
"description": "Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal. Unqualified, the word football normally means the form of ...",
"position": 2
},
...
]
}
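For reference, a short sketch of iterating over the parsed results, assuming the response matches the sample structure above (a top-level "Data" array with link, title, and position fields):
// $response is the JSON string returned by the Google Search API call above
$results = json_decode($response, true);
foreach ($results["Data"] as $result) {
    // Print the position and link of each organic result
    echo $result["position"] . " - " . $result["link"] . "\n";
}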
Datacenter and Residential Proxies
Scrapingdog also provides a proxy server. We have a pool of more than 7M residential proxies and 40,000 datacenter proxies. There is no limit on proxy usage. You can scrape, track ads, go anonymous, and more.
#Datacenter proxies
"http://httpbin.org/ip");
curl_setopt($ch, CURLOPT_PROXY, "http://scrapingdog:[email protected]:8081");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
var_dump($response);
#Residential proxies
"http://httpbin.org/ip");
curl_setopt($ch, CURLOPT_PROXY, "http://scrapingdog:[email protected]:8081");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
var_dump($response);
- The username for the proxy is "scrapingdog" and the password is your API key. You can pass parameters to the proxy by adding them to the API key, separated by a hyphen. For example, if you want to use a US residential proxy, the API key would be "5e36726387872864823-country=us". You can geotarget any country. If you want to use a datacenter proxy, you don't have to pass any extra parameter with the API key. You can also use a random residential proxy by passing "country=random" with your API key in the proxy.
- Your code should be configured to not verify SSL.
- Each residential proxy request will cost you 5 request credits and each datacenter proxy request will cost 1 request credit.
{"origin":"25.16.48.78"}
Scrape Linkedin Jobs💼.
With our dedicated Linkedin Jobs Scraper API you can scrape jobs at scale without parsing raw HTML. You just have to pass four query parameters, i.e. api_key, geoid, field, and page. One API call will cost 5 request credits. Learn more about Linkedin Jobs API.
"https://api.scrapingdog.com/linkedinjobs?api_key=APIKEY&field=Python&geoid=100293800&page=1";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) field is the type of job you want to scrape.
- (string) geoid is the unique location id issued by linkedin itself. You can find it inside the linkedin jobs url.
- (string) page is the page number of linkedin jobs page. It should be greater than 0. For each page you will get 25 jobs or less.
Scrape Linkedin User Profile.
Scrapingdog also provides an API to scrape Linkedin. You just have to pass three query parameters, i.e. api_key, type, and the Linkedin id of the user, linkId. One API call will cost 300 request credits.
"https://api.scrapingdog.com/linkedin?api_key=APIKEY&type=profile&linkId=rbranson";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) api_key
- (string) type "profile"
- (string) linkId of the User Profile. You can find it in linkedin URL.
Scrape Linkedin Company Page.
Scrapingdog also provides an API to scrape a Linkedin company page. You just have to pass three query parameters, i.e. api_key, type, and the Linkedin id of the company, linkId. One API call will cost 300 request credits.
"https://api.scrapingdog.com/linkedin?api_key=APIKEY&type=company&linkId=scrapingdog";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) api_key
- (string) type "company"
- (string) linkId of the Company Page. You can find it in linkedin URL.
🏠 Scrape Zillow Properties.
With this dedicated scraper for Zillow you will get parsed data from any property page of Zillow. To access this GET API you have to pass three queries: api_key, url, and listing. The "listing" parameter tells the system whether the page contains a list of properties or a single property.
"https://api.scrapingdog.com/zillow?api_key=APIKEY&url=https://www.zillow.com/homes/for_sale/&listing=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) api_key which is your API Key.
- (string) url of the zillow page you want to scrape.
- (boolean) listing will be true if the page has multiple property listed on it and false if the page is a dedicated page for a particular property.
Twitter Scraping API
With this dedicated scraper for Twitter you will get parsed JSON data from any tweet. To access this GET API you have to pass three queries: api_key, parsed, and url.
"https://api.scrapingdog.com/twitter?api_key=APIKEY&url=https://twitter.com/elonmusk/status/1655608985058267139&parsed=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) api_key which is your API Key.
- (boolean) parsed will be true if you need data in JSON form. If it is false then you will get raw HTML from twitter.
- (string) url will be the url of the tweet.
📸 Screenshot API
You can take a screenshot of any page using this API. If you want a full-page screenshot, just add &fullPage=true to your API URL.
"https://api.scrapingdog.com/screenshot?api_key=APIKEY&url=https://www.scrapingdog.com&fullPage=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) api_key
- (string) url - Target URL
- (boolean) fullPage - true/false according to your requirement.
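If the endpoint returns the raw image bytes in the response body (an assumption here, not confirmed above), you could write the screenshot straight to disk:
$url = "https://api.scrapingdog.com/screenshot?api_key=APIKEY&url=https://www.scrapingdog.com&fullPage=true";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$response = curl_exec($ch);
curl_close($ch);
// Save the returned bytes as a PNG (assumes the API responds with image data)
file_put_contents("screenshot.png", $response);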
Account Information
If you want to monitor your API usage, you can use this API. You just need to pass your API key as a query parameter. You will get your total request limit and the requests you have already used.
"https://api.scrapingdog.com/account?api_key=APIKEY";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
curl_close($ch);
print_r($response);
- (string) api_key
{"requestLimit":1208653,"requestUsed":2341}
Basic Usage
Scrapingdog API exposes a single API endpoint. Simply send a GET request to https://api.scrapingdog.com/scrape with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :url => "http://httpbin.org/ip",
  :dynamic => false
}
uri = URI('https://api.scrapingdog.com/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key
- (string) url
- (boolean) dynamic
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Javascript Rendering (default=true)
If you are crawling a page that requires you to render the javascript on the page, we can fetch these pages using a headless browser. This feature is only available on the Premium plans. To render javascript, simply set dynamic=true and we will use a headless Google Chrome instance to fetch the page. Each request with normal rotating proxies will cost 5 credits and 25 credits with premium proxies.
To fetch the URL without using a headless browser, use the dynamic=false parameter in the GET request.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :url => "http://httpbin.org/ip"
}
uri = URI('https://api.scrapingdog.com/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key
- (string) url
- (boolean) dynamic
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"192.15.81.132"}
</pre>
</body>
</html>
Wait for a fixed amount of time (default=0)
If you are crawling a heavy website then you might need this parameter. This will help you to load complete HTML before returning the results. Our browsers will wait for that particular time before returning the complete HTML.
Use the wait parameter with a value in milliseconds between 0 and 35000.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :url => "http://httpbin.org/ip",
  :wait => 5000
}
uri = URI('https://api.scrapingdog.com/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key
- (string) url
- (integer) wait
Passing Custom Headers
If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set custom_headers=true. Only use this feature to get customized results; do not use it to avoid blocks, as we handle that internally.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :url => "http://httpbin.org/anything",
  :custom_headers => true
}
uri = URI('https://api.scrapingdog.com/scrape')
uri.query = URI.encode_www_form(params)
req = Net::HTTP::Get.new(uri)
req['Accept'] = 'application/json'
req['X-MyHeader'] = '123'
website_content = Net::HTTP.start(uri.hostname, uri.port) {|http|
http.request(req)
}
print(website_content.body)
- (string) url
- (boolean) custom_headers
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{
"args":{},
"data":"",
"files":{},
"form":{},
"headers": {
"Accept":"*/*",
"Accept-Encoding":"gzip, deflate",
"Cache-Control":"max-age=259200",
"Connection":"close",
"Host":"httpbin.org",
"Referer":"http://httpbin.org",
"Timeout":"10000",
"User-Agent":"curl/7.54.0",
"X-Myheader":"123"
},
"json":null,
"method":"GET",
"origin":"45.72.0.249",
"url":"http://httpbin.org/anything"
}
</pre>
</body>
</html>
Sessions
To reuse the same proxy for multiple requests, simply use the session_number parameter (e.g. session_number=666). The value of the session can be any integer; simply send a new integer to create a new session (this will allow you to continue using the same proxy for each request with that session number). Sessions expire 60 seconds after the last usage.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :url => "http://httpbin.org/ip",
  :session_number => 123
}
uri = URI('https://api.scrapingdog.com/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) url
- (string) api_key
- (integer) session_number
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Geographic Location
To ensure your requests come from a particular country, please use the ISO code of the country (e.g. country=us). United States (us) geotargeting is available on the Startup plan and higher. PRO plan customers also have access to Canada (ca), United Kingdom (uk), Russia (ru), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Italy (it), China (cn), and Australia (au). Other countries are available to PRO customers upon request.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :url => "http://httpbin.org/ip",
  :country => "us"
}
uri = URI('https://api.scrapingdog.com/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key
- (string) url
- (string) country
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Premium Residential Proxies
For a few particularly difficult to scrape sites, we also maintain a private internal pool of residential and mobile IPs. This service is only available to users on the PRO plan or higher. Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request counts as 10 API calls against your monthly limit), and each request that uses both JavaScript rendering and the premium pool is charged at 25 times the normal rate (every successful request counts as 25 API calls against your monthly limit). To send a request through our premium proxy service, add the premium=true query parameter.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :url => "http://httpbin.org/ip",
  :premium => true
}
uri = URI('https://api.scrapingdog.com/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) url
- (boolean) premium
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"25.16.48.78"}
</pre>
</body>
</html>
POST/PUT Requests
You can also send POST/PUT requests through the Scrapingdog API. The return value is stringified; if you want to use it as JSON, parse it into a JSON object.
require 'net/http'
require 'json'
## Replace POST with PUT to send a PUT request instead
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/anything"
}
uri = URI('https://api.scrapingdog.com/scrape')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.post(uri, { "foo" => "bar"}.to_json, "Content-Type" => "application/json")
print(website_content.body)
## For form data
params = {
:api_key => "APIKEY",
:url => "http://httpbin.org/anything"
}
uri = URI('https://api.scrapingdog.com/scrape')
uri.query = URI.encode_www_form(params)
req = Net::HTTP::Post.new(uri)
req.set_form_data('foo' => 'bar')
website_content = Net::HTTP.start(uri.hostname, uri.port) {|http|
http.request(req)
}
print(website_content.body)
- (string) api_key
- (string) url
{
"args": {},
"data": "{\"foo\":\"bar\"}",
"files": {},
"form": {},
"headers": {
"Accept": "application/json",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "13",
"Content-Type": "application/json; charset=utf-8",
"Host": "httpbin.org",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
},
"json": {
"foo": "bar"
},
"method": "POST",
"origin": "25.16.48.78, 25.16.48.78",
"url": "https://httpbin.org/anything"
}
Google Search API
Using the Google Search API, you can scrape Google search results without worrying about proxy rotation or data parsing. Our API is fast, reliable, and affordable. Each successful request will cost you 20 API credits.
#Through google search query
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :query => "football",
  :results => 10,
  :country => "us",
  :page => 0
}
uri = URI('https://api.scrapingdog.com/google')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
#Through google search URL
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :query => "https://www.google.com/search?q=pizza"
}
uri = URI('https://api.scrapingdog.com/google')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key
- (string) query - Google search query string or a google page url.
- (string) results - Number of results you want to scrape. It goes from 1 to 100.
- (string) country - Name of the country. The name should be in ISO format. Right now we support 15 countries.
- (string) page - It could be any number, starting from 0.
country | Country Name |
---|---|
us | United States |
cn | China |
au | Australia |
de | Germany |
fr | France |
ca | Canada |
it | Italy |
in | India |
ru | Russia |
mx | Mexico |
gb | United Kingdom |
sg | Singapore |
ch | Chile |
nl | Netherlands |
be | Belgium |
{
"Data": [
{
"link": "https://es.wikipedia.org/wiki/F%C3%BAtbol",
"title": "Fútbol - Wikipedia, la enciclopedia libre",,
"description": "El fútbol o futbol (del inglés británico football, traducido como balompié) es un deporte de equipo jugado entre dos conjuntos de once jugadores cada uno y ..."
"position": 1
},
{
"link": "https://en.wikipedia.org/wiki/Football",
"title": "Football - Wikipedia",
"description": "Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal. Unqualified, the word football normally means the form of ...",
"position": 2
},
...
]
}
Datacenter and Residential Proxies
Scrapingdog also provides a proxy server. We have a pool of more than 7M residential proxies and 40,000 datacenter proxies. There is no limit on proxy usage. You can scrape, track ads, go anonymous, and more.
#Datacenter proxies
require 'httparty'
HTTParty::Basement.default_options.update(verify: false)
response = HTTParty.get('http://httpbin.org/ip', {
http_proxyaddr: "proxy.scrapingdog.com",
http_proxyport: "8081",
http_proxyuser: "scrapingdog",
http_proxypass: "APIKEY"
})
results = response.body
puts results
#Residential proxies
require 'httparty'
HTTParty::Basement.default_options.update(verify: false)
response = HTTParty.get('http://httpbin.org/ip', {
http_proxyaddr: "proxy.scrapingdog.com",
http_proxyport: "8081",
http_proxyuser: "scrapingdog",
http_proxypass: "APIKEY-country=random"
})
results = response.body
puts results
- The username for the proxy is "scrapingdog" and the password is your API key. You can pass parameters to the proxy by adding them to the API key, separated by a hyphen. For example, if you want to use a US residential proxy, the API key would be "5e36726387872864823-country=us". You can geotarget any country. If you want to use a datacenter proxy, you don't have to pass any extra parameter with the API key. You can also use a random residential proxy by passing "country=random" with your API key in the proxy.
- Your code should be configured to not verify SSL.
- Each residential proxy request will cost you 5 request credits and each datacenter proxy request will cost 1 request credit.
{"origin":"25.16.48.78"}
Scrape Linkedin Jobs💼.
With our dedicated Linkedin Jobs Scraper API you can scrape jobs at scale without parsing raw HTML. You just have to pass four query parameters, i.e. api_key, geoid, field, and page. One API call will cost 5 request credits. Learn more about Linkedin Jobs API.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :field => "Python",
  :geoid => "100293800",
  :page => "1"
}
uri = URI('https://api.scrapingdog.com/linkedinjobs')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) field is the type of job you want to scrape.
- (string) geoid is the unique location id issued by linkedin itself. You can find it inside the linkedin jobs url.
- (string) page is the page number of linkedin jobs page. It should be greater than 0. For each page you will get 25 jobs or less.
Scrape Linkedin User Profile.
Scrapingdog also provides an API to scrape Linkedin. You just have to pass three query parameters, i.e. api_key, type, and the Linkedin id of the user, linkId. One API call will cost 300 request credits.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :type => "profile",
  :linkId => "rbranson"
}
uri = URI('https://api.scrapingdog.com/linkedin')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key
- (string) type "profile"
- (string) linkId of the User Profile. You can find it in linkedin URL.
Scrape Linkedin Company Page.
Scrapingdog also provides an API to scrape a Linkedin company page. You just have to pass three query parameters, i.e. api_key, type, and the Linkedin id of the company, linkId. One API call will cost 300 request credits.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :type => "company",
  :linkId => "amazon"
}
uri = URI('https://api.scrapingdog.com/linkedin')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key
- (string) type "company"
- (string) linkId of the Company Page. You can find it in linkedin URL.
🏠 Scrape Zillow Properties.
With this dedicated scraper for Zillow you will get parsed data from any property page of Zillow. To access this GET API you have to pass three queries: api_key, url, and listing. The "listing" parameter tells the system whether the page contains a list of properties or a single property.
require 'net/http'
require 'json'
params = {
  :api_key => "5eaa61a6e562fc52fe763tr516e4653",
  :url => "https://www.zillow.com/homes/for_sale/",
  :listing => true
}
uri = URI('https://api.scrapingdog.com/zillow')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key which is your API Key.
- (string) url of the zillow page you want to scrape.
- (boolean) listing will be true if the page has multiple property listed on it and false if the page is a dedicated page for a particular property.
Twitter Scraping API
With this dedicated scraper for Twitter you will get parsed JSON data from any tweet. To access this GET API you have to pass three queries: api_key, parsed, and url.
require 'net/http'
require 'json'
params = {
  :api_key => "5eaa61a6e562fc52fe763tr516e4653",
  :url => "https://twitter.com/elonmusk/status/1655608985058267139",
  :parsed => true
}
uri = URI('https://api.scrapingdog.com/twitter')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key which is your API Key.
- (boolean) parsed will be true if you need data in JSON form. If it is false then you will get raw HTML from twitter.
- (string) url will be the url of the tweet.
📸 Screenshot API
You can take a screenshot of any page using this API. If you want a full-page screenshot, just add &fullPage=true to your API URL.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY",
  :url => "https://www.scrapingdog.com",
  :fullPage => true
}
uri = URI('https://api.scrapingdog.com/screenshot')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key
- (string) url - Target URL
- (boolean) fullPage - true/false according to your requirement.
Account Information
If you want to monitor your API usage, you can use this API. You just need to pass your API key as a query parameter. You will get your total request limit and the requests you have already used.
require 'net/http'
require 'json'
params = {
  :api_key => "APIKEY"
}
uri = URI('https://api.scrapingdog.com/account')
uri.query = URI.encode_www_form(params)
website_content = Net::HTTP.get(uri)
print(website_content)
- (string) api_key
{"requestLimit":1208653,"requestUsed":2341}
Basic Usage
Scrapingdog API exposes a single API endpoint. Simply send a GET request to https://api.scrapingdog.com/scrape with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/scrape?api_key=" + apiKey + "&url=http://httpbin.org/ip&dynamic=false";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key
- (string) url
- (boolean) dynamic
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Javascript Rendering (default=true)
If you are crawling a page that requires you to render the javascript on the page, we can fetch these pages using a headless browser. This feature is only available on the Premium plans. To render javascript, simply set dynamic=true and we will use a headless Google Chrome instance to fetch the page. Each request with normal rotating proxies will cost 5 credits and 25 credits with premium proxies.
To fetch the URL without using a headless browser, use the dynamic=false parameter in the GET request.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/scrape?api_key=" + apiKey + "&url=http://httpbin.org/ip";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key
- (string) url
- (boolean) dynamic
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"192.15.81.132"}
</pre>
</body>
</html>
Wait for a fixed amount of time (default=0)
If you are crawling a heavy website then you might need this parameter. This will help you to load complete HTML before returning the results. Our browsers will wait for that particular time before returning the complete HTML.
Use the wait parameter with a value in milliseconds between 0 and 35000.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/scrape?api_key=" + apiKey + "&url=http://httpbin.org/ip&wait=5000";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key
- (string) url
- (integer) wait
Passing Custom Headers
If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set custom_headers=true. Only use this feature to get customized results; do not use it to avoid blocks, as we handle that internally.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/scrape?api_key=" + apiKey + "&url=http://httpbin.org/anything&custom_headers=true";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection httpURLConnection = (HttpURLConnection) urlForGetRequest.openConnection();
httpURLConnection.setRequestProperty("Content-Type", "application/json");
httpURLConnection.setRequestProperty("X-MyHeader", "123");
httpURLConnection.setRequestMethod("GET");
int responseCode = httpURLConnection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(httpURLConnection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) url
- (boolean) custom_headers
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{
"args":{},
"data":"",
"files":{},
"form":{},
"headers": {
"Accept":"*/*",
"Accept-Encoding":"gzip, deflate",
"Cache-Control":"max-age=259200",
"Connection":"close",
"Host":"httpbin.org",
"Referer":"http://httpbin.org",
"Timeout":"10000",
"User-Agent":"curl/7.54.0",
"X-Myheader":"123"
},
"json":null,
"method":"GET",
"origin":"45.72.0.249",
"url":"http://httpbin.org/anything"
}
</pre>
</body>
</html>
Sessions
To reuse the same proxy for multiple requests, simply use the session_number parameter (e.g. session_number=666). The value of the session can be any integer; simply send a new integer to create a new session (this will allow you to continue using the same proxy for each request with that session number). Sessions expire 60 seconds after the last usage.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/scrape?api_key=" + apiKey + "&url=http://httpbin.org/ip&session_number=123";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) url
- (string) api_key
- (integer) session_number
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Geographic Location
To ensure your requests come from a particular country, please use the ISO code of the country (e.g. country=us). United States (us) geotargeting is available on the Startup plan and higher. PRO plan customers also have access to Canada (ca), United Kingdom (uk), Russia (ru), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Italy (it), China (cn), and Australia (au). Other countries are available to PRO customers upon request.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/scrape?api_key=" + apiKey + "&url=http://httpbin.org/ip&country=gb";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key
- (string) url
- (string) country
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"27.63.83.45"}
</pre>
</body>
</html>
Premium Residential Proxies
For a few particularly difficult to scrape sites, we also maintain a private internal pool of residential and mobile IPs. This service is only available to users on the PRO plan or higher. Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request counts as 10 API calls against your monthly limit), and each request that uses both JavaScript rendering and the premium pool is charged at 25 times the normal rate (every successful request counts as 25 API calls against your monthly limit). To send a request through our premium proxy service, add the premium=true query parameter.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/scrape?api_key=" + apiKey + "&url=http://httpbin.org/ip&premium=true";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) url
- (boolean) premium
<html>
<head>
</head>
<body>
<pre style="word-wrap: break-word; white-space: pre-wrap;">
{"origin":"25.16.48.78"}
</pre>
</body>
</html>
POST/PUT Requests
You can also send POST/PUT requests through the Scrapingdog API. The return value is stringified; if you want to use it as JSON, parse it into a JSON object.
try {
String apiKey = "YOURAPIKEY";
String url = "https://api.scrapingdog.com/scrape?api_key=" + apiKey + "&url=http://httpbin.org/ip";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key
- (string) url
{
"args": {},
"data": "{\"foo\":\"bar\"}",
"files": {},
"form": {},
"headers": {
"Accept": "application/json",
"Accept-Encoding": "gzip, deflate",
"Content-Length": "13",
"Content-Type": "application/json; charset=utf-8",
"Host": "httpbin.org",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
},
"json": {
"foo": "bar"
},
"method": "POST",
"origin": "25.16.48.78, 25.16.48.78",
"url": "https://httpbin.org/anything"
}
Google Search API
Using the Google Search API, you can scrape Google search results without worrying about proxy rotation or data parsing. Our API is fast, reliable, and affordable. Each successful request will cost you 20 API credits.
#Through google search query
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/google?api_key=" + apiKey + "&query=football&page=0&results=10&country=us";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
#Through google search URL
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/google?api_key=" + apiKey + "&query=https://www.google.com/search?q=pizza";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key
- (string) query - Google search query string or a google page url.
- (string) results - Number of results you want to scrape. It goes from 1 to 100.
- (string) country - Name of the country. The name should be in ISO format. Right now we support 15 countries.
- (string) page - It could be any number, starting from 0.
country | Country Name |
---|---|
us | United States |
cn | China |
au | Australia |
de | Germany |
fr | France |
ca | Canada |
it | Italy |
in | India |
ru | Russia |
mx | Mexico |
gb | United Kingdom |
sg | Singapore |
ch | Chile |
nl | Netherlands |
be | Belgium |
{
"Data": [
{
"link": "https://es.wikipedia.org/wiki/F%C3%BAtbol",
"title": "Fútbol - Wikipedia, la enciclopedia libre",,
"description": "El fútbol o futbol (del inglés británico football, traducido como balompié) es un deporte de equipo jugado entre dos conjuntos de once jugadores cada uno y ..."
"position": 1
},
{
"link": "https://en.wikipedia.org/wiki/Football",
"title": "Football - Wikipedia",
"description": "Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal. Unqualified, the word football normally means the form of ...",
"position": 2
},
...
]
}
Datacenter and Residential Proxies
Scrapingdog also provides a proxy server. We have a pool of more than 7M residential proxies and 40,000 datacenter proxies. There is no limit on proxy usage. You can scrape, track ads, go anonymous, and more.
#Datacenter proxies
try {
String apiKey = "APIKEY";
String proxy = "https://scrapingdog:" + apiKey + "@proxy.scrapingdog.com";
URL server = new URL("https://httpbin.org/ip");
Properties systemProperties = System.getProperties();
systemProperties.setProperty("http.proxyHost", proxy);
systemProperties.setProperty("http.proxyPort", "8001");
HttpURLConnection httpURLConnection = (HttpURLConnection) server.openConnection();
httpURLConnection.connect();
String readLine = null;
int responseCode = httpURLConnection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(httpURLConnection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
#Residential proxies
try {
String apiKey = "APIKEY";
String proxy = "https://scrapingdog:" + apiKey + "[email protected]";
URL server = new URL("https://httpbin.org/ip");
Properties systemProperties = System.getProperties();
systemProperties.setProperty("http.proxyHost", proxy);
systemProperties.setProperty("http.proxyPort", "8001");
HttpURLConnection httpURLConnection = (HttpURLConnection) server.openConnection();
httpURLConnection.connect();
String readLine = null;
int responseCode = httpURLConnection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(httpURLConnection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- The username for the proxy is "scrapingdog" and the password is your API key. You can pass parameters to the proxy by adding them to the API key, separated by a hyphen. For example, if you want to use a US residential proxy, the API key would be "5e36726387872864823-country=us". You can geotarget any country. If you want to use a datacenter proxy, you don't have to pass any extra parameter with the API key. You can also use a random residential proxy by passing "country=random" with your API key in the proxy.
- Your code should be configured to not verify SSL.
- Each residential proxy request will cost you 5 request credits and each datacenter proxy request will cost 1 request credit.
{"origin":"25.16.48.78"}
Scrape Linkedin Jobs💼.
With our dedicated Linkedin Jobs Scraper API you can scrape jobs at scale without parsing raw HTML. You just have to pass four query parameters, i.e. api_key, geoid, field, and page. One API call will cost 5 request credits. Learn more about Linkedin Jobs API.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/linkedinjobs?api_key=" + apiKey + "&field=Python&geoid=100293800&page=1";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) field is the type of job you want to scrape.
- (string) geoid is the unique location id issued by linkedin itself. You can find it inside the linkedin jobs url.
- (string) page is the page number of linkedin jobs page. It should be greater than 0. For each page you will get 25 jobs or less.
Scrape Linkedin User Profile.
Scrapingdog also provides an API to scrape Linkedin. You just have to pass three query parameters, i.e. api_key, type, and the Linkedin id of the user, linkId. One API call will cost 300 request credits.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/linkedin?api_key=" + apiKey + "&type=profile&linkId=rbranson";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key
- (string) type "profile"
- (string) linkId of the User Profile. You can find it in linkedin URL.
Scrape Linkedin Company Page.
Scrapingdog also provides an API to scrape a Linkedin company page. You just have to pass three query parameters, i.e. api_key, type, and the Linkedin id of the company, linkId. One API call will cost 300 request credits.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/linkedin?api_key=" + apiKey + "&type=company&linkId=scrapingdog";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key
- (string) type "company"
- (string) linkId of the Company Page. You can find it in linkedin URL.
🏠 Scrape Zillow Properties.
With this dedicated scraper for Zillow you will get parsed data from any property page of Zillow. To access this GET API you have to pass three queries: api_key, url, and listing. The "listing" parameter tells the system whether the page contains a list of properties or a single property.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/zillow?api_key=" + apiKey + "&url=https://www.zillow.com/homes/for_sale/&listing=true";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key which is your API Key.
- (string) url of the zillow page you want to scrape.
- (boolean) listing will be true if the page has multiple property listed on it and false if the page is a dedicated page for a particular property.
Twitter Scraping API
With this dedicated scraper for Twitter you will get parsed JSON data from any tweet. To access this GET API you have to pass three queries: api_key, parsed, and url.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/twitter?api_key=" + apiKey + "&url=https://twitter.com/elonmusk/status/1655608985058267139&parsed=true";
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key which is your API Key.
- (boolean) parsed will be true if you need data in JSON form. If it is false then you will get raw HTML from twitter.
- (string) url will be the url of the tweet.
📸 Screenshot API
You can take a screenshot of any page using this API. If you want a full-page screenshot, just add &fullPage=true to your API URL.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/screenshot?api_key=" + apiKey + "&url=https://www.scrapingdog.com&fullPage=true;
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key
- (string) url - Target URL
- (boolean) fullPage - true/false according to your requirement.
Account Information
If you want to monitor your API usage, you can use this API. You just need to pass your API key as a query parameter. You will get your total request limit and the requests you have already used.
try {
String apiKey = "APIKEY";
String url = "https://api.scrapingdog.com/account?api_key=" + apiKey;
URL urlForGetRequest = new URL(url);
String readLine = null;
HttpURLConnection conection = (HttpURLConnection) urlForGetRequest.openConnection();
conection.setRequestMethod("GET");
int responseCode = conection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
BufferedReader in = new BufferedReader(new InputStreamReader(conection.getInputStream()));
StringBuffer response = new StringBuffer();
while ((readLine = in.readLine()) != null) {
response.append(readLine);
}
in.close();
System.out.println(response.toString());
} else {
throw new Exception("Error in API Call");
}
} catch (Exception ex) {
ex.printStackTrace();
}
- (string) api_key
{"requestLimit":1208653,"requestUsed":2341}