Scrapingdog API Documentation

Our REST API provides access to all our data and services in a secure (HTTPS) way.
All REST API calls will return JSON or HTML results.


Our API endpoint is: https://api.scrapingdog.com/scrape

Built for Developers


Scrapingdog is quite easy to use and is designed to simplify Web Scraping.

A few things to consider before we get started:
  • Each request will be retried until it can be successfully completed (up to 60 seconds). Remember to set your timeout to 60 seconds to ensure this process goes smoothly. If every attempt fails within those 60 seconds, we will return a 500 error. You may retry the request, and you will not be charged for the unsuccessful attempt (you are only charged for successful requests, meaning 200 and 404 status codes). Make sure to catch these errors! They will occur on roughly 1-2% of requests for hard-to-scrape websites.
  • There is no overage allowed on the free plan. If you exceed 1000 requests per month on the free plan, you will receive a 403 error.
  • Each request will return a string containing the raw HTML from the page requested, along with any headers and cookies.
  • If you exceed your plan's concurrent connection limit, the API will respond with a 429 status code. This can be solved by slowing down your request rate.
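
The status-code rules above can be wrapped in a small retry helper. What follows is a minimal Python sketch, not an official client: the names fetch_with_retry and fetch, the attempt count, and the backoff values are all assumptions made for illustration. It retries on 500 and 429 and treats every other status (including the 403 returned when the free plan is exhausted) as final.

```python
import time
from typing import Callable, Tuple

# Statuses the notes above describe as worth retrying:
# 500 (no attempt succeeded within 60 seconds) and 429 (too many concurrent connections).
RETRYABLE_STATUSES = {500, 429}


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 3,
    backoff_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() until it returns a non-retryable status or attempts run out.

    fetch is a caller-supplied (hypothetical) function that performs one API
    request with a 60-second timeout and returns (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = 0, ""
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts:
            # Linear backoff; waiting also lowers the request rate after a 429.
            sleep(backoff_seconds * attempt)
    return status, body
```

The sleep argument is injectable only so the helper can be exercised without real delays; in normal use the default time.sleep applies.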

Authentication

You can authenticate to our API by providing your API key, which you can obtain in the member area.

All requests must be made to our endpoint via HTTPS. You must authenticate for all API requests. Major rules are discussed here

Basic Usage

The Scrapingdog API exposes a single API endpoint. Simply send a GET request to https://api.scrapingdog.com/scrape with two query string parameters: api_key, which contains your API key, and url, which contains the URL you would like to scrape.


curl "https://api.scrapingdog.com/scrape?api_key=5e5a97e5b1ca5b194f42da86&url=http://httpbin.org/ip"
Arguments:
  • (string) url

Response:
<html>
  <head>
  </head>
  <body>
    <pre style="word-wrap: break-word; white-space: pre-wrap;">
      {"origin":"27.63.83.45"}
    </pre>
  </body>
</html>
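
The same request can be made from Python with only the standard library. This is a sketch under stated assumptions: build_scrape_url and scrape are hypothetical helper names, YOUR_API_KEY is a placeholder, and the sketch assumes the API decodes a percent-encoded url parameter in the usual way.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

API_ENDPOINT = "https://api.scrapingdog.com/scrape"


def build_scrape_url(api_key: str, target_url: str) -> str:
    """Return the GET URL that asks the API to scrape target_url."""
    # urlencode percent-encodes the target URL, so a query string inside it
    # is not mistaken for parameters of the API call itself.
    return f"{API_ENDPOINT}?{urlencode({'api_key': api_key, 'url': target_url})}"


def scrape(api_key: str, target_url: str) -> str:
    """Fetch the page; the 60-second timeout follows the notes at the top.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses, so callers
    should be prepared to catch it.
    """
    with urlopen(build_scrape_url(api_key, target_url), timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")


print(build_scrape_url("YOUR_API_KEY", "http://httpbin.org/ip"))
# https://api.scrapingdog.com/scrape?api_key=YOUR_API_KEY&url=http%3A%2F%2Fhttpbin.org%2Fip
```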

JavaScript Rendering

If you are crawling a page that requires JavaScript to be rendered, we can fetch it using a headless browser. This feature is only available on the Premium plans. To render JavaScript, simply set dynamic=true and we will use a headless Google Chrome instance to fetch the page:


curl "https://api.scrapingdog.com/scrape?api_key=5e750530f030026c843fbefc646&url=http://httpbin.org/ip&dynamic=true"
queries:
  • (string) url
  • (boolean) dynamic

Response:
<html>
  <head>
  </head>
  <body>
    <pre style="word-wrap: break-word; white-space: pre-wrap;">
      {"origin":"192.15.81.132"}
    </pre>
  </body>
</html>
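
One detail worth watching when setting dynamic=true from Python: the standard library serializes the boolean True as "True", not the lowercase "true" shown in the example above. The sketch below sidesteps that by writing the literal string. scrape_query is a hypothetical helper name, and whether the API would also accept a capitalized value is not stated here, so the sketch simply matches the documented form.

```python
from urllib.parse import urlencode


def scrape_query(api_key: str, target_url: str, dynamic: bool = False) -> str:
    """Build the query string, adding dynamic=true only when rendering is wanted."""
    params = {"api_key": api_key, "url": target_url}
    if dynamic:
        # Write the lowercase literal used in the documented example.
        params["dynamic"] = "true"
    return urlencode(params)


# Passing a Python bool straight through gives a capitalized value instead.
print(urlencode({"dynamic": True}))
# dynamic=True
```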

Passing Custom Headers

If you would like to keep the original request headers in order to pass through custom headers (user agents, cookies, etc.), simply set custom_headers=true. Only use this feature to get customized results. Do not use it to avoid blocks; we handle that internally.


curl --header "X-MyHeader: 123" \
"https://api.scrapingdog.com/scrape?api_key=5e5a97e5b1ca5b194f42da86fr444356&url=http://httpbin.org/anything&custom_headers=true"
queries:
  • (string) url
  • (boolean) custom_headers

Response:
<html>
  <head>
  </head>
  <body>
    <pre style="word-wrap: break-word; white-space: pre-wrap;">
    {
      "args":{},
      "data":"",
      "files":{},
      "form":{},
      "headers": {
        "Accept":"*/*",
        "Accept-Encoding":"gzip, deflate",
        "Cache-Control":"max-age=259200",
        "Connection":"close",
        "Host":"httpbin.org",
        "Referer":"http://httpbin.org",
        "Timeout":"10000",
        "User-Agent":"curl/7.54.0",
        "X-Myheader":"123"
      },
      "json":null,
      "method":"GET",
      "origin":"45.72.0.249",
      "url":"http://httpbin.org/anything"
    }
    </pre>
  </body>
</html>
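
A Python equivalent can attach the headers to a request object before sending it. This is a minimal sketch using the standard library; build_header_request is a hypothetical name, and the header and key values are placeholders. The function only constructs the request; pass the result to urllib.request.urlopen with a 60-second timeout to send it.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request

API_ENDPOINT = "https://api.scrapingdog.com/scrape"


def build_header_request(api_key: str, target_url: str, headers: Dict[str, str]) -> Request:
    """Build a GET request that forwards the given headers to target_url."""
    query = urlencode({
        "api_key": api_key,
        "url": target_url,
        "custom_headers": "true",  # ask the API to keep our headers
    })
    return Request(f"{API_ENDPOINT}?{query}", headers=headers, method="GET")


request = build_header_request(
    "YOUR_API_KEY", "http://httpbin.org/anything", {"X-MyHeader": "123"}
)
# urllib stores header names capitalized, hence the lowercase "m" here.
print(request.get_header("X-myheader"))
# 123
```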

Sessions

To reuse the same proxy for multiple requests, simply use the session_number parameter (e.g. session_number=666). The value can be any integer. Every request sent with the same session number will continue to use the same proxy; send a new integer to create a new session. Sessions expire 60 seconds after the last usage.


curl "https://api.scrapingdog.com/scrape?api_key=5e3a0e5a97e5b1ca5b194f42da86&url=http://httpbin.org/ip&session_number=666"
curl "https://api.scrapingdog.com/scrape?api_key=5e3a0e5a97e5b1ca5b194f42da86&url=http://httpbin.org/ip&session_number=666"
queries:
  • (string) url
  • (integer) session_number

Response:
<html>
  <head>
  </head>
  <body>
    <pre style="word-wrap: break-word; white-space: pre-wrap;">
      {"origin":"27.63.83.45"}
    </pre>
  </body>
</html>
<html>
  <head>
  </head>
  <body>
    <pre style="word-wrap: break-word; white-space: pre-wrap;">
      {"origin":"27.63.83.45"}
    </pre>
  </body>
</html>
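
Because a session lapses 60 seconds after its last use, a client may want to track when it last used a session number and switch to a fresh one once that window has passed. The class below is one possible client-side sketch: SessionTracker and its counter-based numbering are assumptions for illustration, and the tracking is only an approximation of what the server sees.

```python
import itertools
import time
from typing import Callable, Optional

SESSION_TTL_SECONDS = 60  # sessions expire 60 seconds after the last usage


class SessionTracker:
    """Hand out a session_number, renewing it once the previous one has lapsed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._numbers = itertools.count(1)  # any integer works as a session number
        self._current: Optional[int] = None
        self._last_used: Optional[float] = None

    def session_number(self) -> int:
        """Return the number to send with the next request and record the use."""
        now = self._clock()
        lapsed = (
            self._last_used is None
            or now - self._last_used >= SESSION_TTL_SECONDS
        )
        if lapsed:
            self._current = next(self._numbers)
        self._last_used = now
        return self._current
```

Call session_number() immediately before each request and pass the result as the session_number query parameter.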

Geographic Location

To ensure your requests come from a particular country, please use the ISO code of the country (e.g. country=us). United States (us) geotargeting is available on the Startup plan and higher. PRO plan customers also have access to Canada (ca), United Kingdom (uk), Russia (ru), Germany (de), France (fr), Spain (es), Brazil (br), Mexico (mx), India (in), Italy (it), China (cn), and Australia (au). Other countries are available to PRO customers upon request.


curl "https://api.scrapingdog.com/scrape?api_key=3e3a09b6ecde9f83856906c5e27dd646&url=http://httpbin.org/ip&country=us"
queries:
  • (string) url
  • (string) country

Response:
<html>
  <head>
  </head>
  <body>
    <pre style="word-wrap: break-word; white-space: pre-wrap;">
      {"origin":"27.63.83.45"}
    </pre>
  </body>
</html>
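
A client can check the country code before spending a request on it. The sketch below hard-codes only the codes listed above; country_param and the pro_plan flag are hypothetical, and the check deliberately ignores the extra countries that PRO customers can obtain on request.

```python
from typing import Dict

# Codes listed in this section. "us" is available from the Startup plan up;
# the rest are listed for PRO customers.
STARTUP_COUNTRIES = {"us"}
PRO_COUNTRIES = STARTUP_COUNTRIES | {
    "ca", "uk", "ru", "de", "fr", "es", "br", "mx", "in", "it", "cn", "au",
}


def country_param(code: str, pro_plan: bool) -> Dict[str, str]:
    """Return {'country': code} if the code is listed for the plan, else raise."""
    normalized = code.strip().lower()
    allowed = PRO_COUNTRIES if pro_plan else STARTUP_COUNTRIES
    if normalized not in allowed:
        raise ValueError(f"country {normalized!r} is not listed for this plan")
    return {"country": normalized}
```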

Premium Residential Proxies

For a few particularly difficult-to-scrape sites, we also maintain a private internal pool of residential and mobile IPs. This service is only available to users on the PRO plan or higher. Requests through our premium residential and mobile pool are charged at 10 times the normal rate (every successful request will count as 10 API calls against your monthly limit). Requests that use both JavaScript rendering and our premium pool are charged at 25 times the normal rate (every successful request will count as 25 API calls against your monthly limit). To send a request through our premium proxy service, set the premium=true query parameter.


curl "https://api.scrapingdog.com/scrape?api_key=5e5a97e5b1ca5b194f42da86d646&url=http://httpbin.org/ip&premium=true"
queries:
  • (string) url
  • (boolean) premium

Response:
<html>
  <head>
  </head>
  <body>
    <pre style="word-wrap: break-word; white-space: pre-wrap;">
      {"origin":"25.16.48.78"}
    </pre>
  </body>
</html>
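
The pricing above is simple arithmetic, and it can help to estimate it before launching a large job. The function below is an illustrative sketch (premium_credit_cost is a hypothetical name) covering only the two premium multipliers stated in this section.

```python
PREMIUM_MULTIPLIER = 10             # premium pool only
PREMIUM_DYNAMIC_MULTIPLIER = 25     # premium pool plus JavaScript rendering


def premium_credit_cost(successful_requests: int, dynamic: bool = False) -> int:
    """API calls counted against the monthly limit for premium requests."""
    if successful_requests < 0:
        raise ValueError("successful_requests cannot be negative")
    multiplier = PREMIUM_DYNAMIC_MULTIPLIER if dynamic else PREMIUM_MULTIPLIER
    return successful_requests * multiplier


print(premium_credit_cost(100))                # 1000
print(premium_credit_cost(100, dynamic=True))  # 2500
```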

POST/PUT Requests

You can also send a POST/PUT request through the Scrapingdog API. The return value will be stringified. If you want to use it as JSON, you will need to parse it into a JSON object.


# To send PUT request just replace POST with PUT
curl -d 'foo=bar' \
-X POST \
"https://api.scrapingdog.com/scrape?api_key=5e5a97e5b1ca5b194f42da86c5e27dd646&url=http://httpbin.org/anything"


# For form data (-d sends the body URL-encoded, matching the Content-Type header)
curl -H 'Content-Type: application/x-www-form-urlencoded' \
-d 'foo=bar' \
-X POST \
"https://api.scrapingdog.com/scrape?api_key=5e5a97e5b1ca5b194f42da86e27dd646&url=http://httpbin.org/anything"
queries:
  • None

Response:
{
  "args": {},
  "data": "{\"foo\":\"bar\"}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "application/json",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "13",
    "Content-Type": "application/json; charset=utf-8",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
  },
  "json": {
    "foo": "bar"
  },
  "method": "POST",
  "origin": "25.16.48.78, 25.16.48.78",
  "url": "https://httpbin.org/anything"
}
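
In Python, the stringified return value can be turned back into a dictionary with json.loads. The sketch below builds the POST or PUT request and parses a response body; build_body_request and parse_response_body are hypothetical names, and the request is only constructed, not sent.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request

API_ENDPOINT = "https://api.scrapingdog.com/scrape"


def build_body_request(
    api_key: str, target_url: str, form: Dict[str, str], method: str = "POST"
) -> Request:
    """Build a POST or PUT request whose body is the URL-encoded form."""
    if method not in ("POST", "PUT"):
        raise ValueError("method must be POST or PUT")
    query = urlencode({"api_key": api_key, "url": target_url})
    body = urlencode(form).encode("ascii")
    return Request(
        f"{API_ENDPOINT}?{query}",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method=method,
    )


def parse_response_body(raw: str) -> dict:
    """Parse the stringified response into a Python dictionary."""
    return json.loads(raw)


parsed = parse_response_body('{"method": "POST", "json": {"foo": "bar"}}')
print(parsed["json"]["foo"])
# bar
```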

Get JSON response directly from your scraped HTML data

Scrapingdog also provides a feature to get a JSON response directly from HTML data. Just pass the attribute type, the name of the attribute, and the tag as queries in the API call.


curl "https://api.scrapingdog.com/scrape?api_key=3e3a09b6ecde9f83856906c5e27dd646&url=http://httpbin.org/ip&attribute=class&name=highlight&tag=div"
queries:
  • (string) attribute HTML attribute type, e.g. class, id, etc.
  • (string) name Name of the attribute
  • (string) tag Tag associated with that attribute, e.g. div, a, span, etc.

Response:
  {"origin":"25.16.48.78"}
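
The three extraction queries sit alongside api_key and url in the same query string. As a minimal sketch (build_extract_url is a hypothetical helper, not part of any official client), the URL from the curl example can be assembled like this:

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.scrapingdog.com/scrape"


def build_extract_url(
    api_key: str, target_url: str, attribute: str, name: str, tag: str
) -> str:
    """Build a scrape URL that also carries the attribute, name and tag queries."""
    params = {
        "api_key": api_key,
        "url": target_url,
        "attribute": attribute,  # e.g. "class" or "id"
        "name": name,            # e.g. "highlight"
        "tag": tag,              # e.g. "div"
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"
```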

Scrape LinkedIn User Profile

Scrapingdog also provides an API to scrape LinkedIn. You just have to pass three queries: api_key, type, and linkId (the LinkedIn ID of the user). One API call will cost 250 request credits.


curl "https://api.scrapingdog.com/linkedin/?api_key=5eaa61a6e562fc52fe763tr516e4653&type=profile&linkId=rbranson"
queries:
  • (string) type "profile"
  • (string) linkId LinkedIn ID of the user profile. You can find it in the LinkedIn URL.

Scrape LinkedIn Company Page

Scrapingdog also provides an API to scrape a LinkedIn company page. You just have to pass three queries: api_key, type, and linkId (the LinkedIn ID of the company). One API call will cost 250 request credits.


curl "https://api.scrapingdog.com/linkedin/?api_key=5eaa61a6e562fc52fe763tr516e4653&type=company&linkId=scrapingdog"
queries:
  • (string) type "company"
  • (string) linkId LinkedIn ID of the company page. You can find it in the LinkedIn URL.
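
Both LinkedIn calls share the same endpoint and differ only in the type query, so one helper can cover them. The following is a sketch under stated assumptions: build_linkedin_url and linkedin_credit_cost are hypothetical names, and the credit estimate just multiplies the 250-credit price quoted above.

```python
from urllib.parse import urlencode

LINKEDIN_ENDPOINT = "https://api.scrapingdog.com/linkedin/"
CREDITS_PER_CALL = 250                # cost of one LinkedIn API call
VALID_TYPES = ("profile", "company")  # the two type values documented above


def build_linkedin_url(api_key: str, page_type: str, link_id: str) -> str:
    """Build the URL for scraping a LinkedIn profile or company page."""
    if page_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {VALID_TYPES}")
    params = {"api_key": api_key, "type": page_type, "linkId": link_id}
    return f"{LINKEDIN_ENDPOINT}?{urlencode(params)}"


def linkedin_credit_cost(calls: int) -> int:
    """Request credits consumed by the given number of LinkedIn calls."""
    return calls * CREDITS_PER_CALL


print(build_linkedin_url("YOUR_API_KEY", "company", "scrapingdog"))
# https://api.scrapingdog.com/linkedin/?api_key=YOUR_API_KEY&type=company&linkId=scrapingdog
```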