Node.js offers multiple HTTP client options, and Axios is one of the most popular. Compared to alternatives like Fetch, Unirest, etc., Axios has by far the best community support. It provides a stable, promise-based API for making HTTP requests, which makes asynchronous code and response handling easy to work with. It supports GET, POST, DELETE, PUT, and other methods, and through its interceptors (middleware-like functions) you can customize requests and responses.
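For instance, interceptors are that middleware-like layer: you register functions that run before every request goes out and after every response comes back. A minimal sketch (the X-Request-Time header is just an illustrative example, not something Axios adds itself):

const axios = require('axios');

// Request interceptor: runs before every request leaves the client
axios.interceptors.request.use((config) => {
  config.headers['X-Request-Time'] = Date.now().toString(); // illustrative custom header
  return config;
});

// Response interceptor: runs before the response reaches your calling code
axios.interceptors.response.use((response) => {
  console.log(`Got ${response.status} from ${response.config.url}`);
  return response;
});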
Axios is also a very powerful tool when it comes to web scraping. In this article, we will look at how to scrape a website using Axios combined with a proxy.
Requirements
For this tutorial, I assume you have already installed the latest version of Node.js on your machine. If not, you can download it from here.
Now, let’s set up our coding playground. First, we will create a folder to keep our Node.js script.
mkdir axiostut
cd axiostut
npm init
npm i axios
Once this is done, we can create a JS file in which we will learn how Axios works with a proxy. I am naming the file axiostut.js.
How does Axios work?
In this section, I will explain step by step how Axios works.
How to make a basic GET request
Let’s first start with a basic HTTP GET request. For this entire tutorial, we are going to scrape https://httpbin.org/ip, an endpoint that returns the requester’s IP address.
const axios = require('axios');

async function scraping() {
  try {
    // Make a GET request to https://httpbin.org/ip
    const response = await axios.get('https://httpbin.org/ip');

    // Check if the request was successful (status code 200)
    if (response.status === 200) {
      // Parse JSON response
      const ipAddress = response.data.origin;

      // Log or use the IP address
      console.log('Your IP address is:', ipAddress);
    } else {
      console.error(`Error: ${response.status} - ${response.statusText}`);
    }
  } catch (error) {
    // Handle errors (Axios throws for network failures and,
    // by default, for non-2xx status codes)
    console.error('Error:', error.message);
  }
}

// Call the function to get the IP address
scraping();
Let me explain the code in brief.
- We use the axios.get method to make a GET request to https://httpbin.org/ip.
- The response is checked for a successful status code (200).
- If the request is successful, the IP address is extracted from the JSON response and logged.
- If there is an error during the request, it is caught and logged.
How to use a proxy with Axios?
For this example, we can use any public proxy. Pick any free proxy from this list.
const axios = require('axios');

let config = {
  method: 'get',
  url: 'https://httpbin.org/ip',
  proxy: {
    host: '20.157.194.61',
    port: 80
  }
};

async function getIpAddress() {
  try {
    // Make a GET request to https://httpbin.org/ip through the proxy
    const response = await axios(config);

    // Check if the request was successful (status code 200)
    if (response.status === 200) {
      // Parse JSON response
      const ipAddress = response.data.origin;

      // Log or use the IP address
      console.log('Your IP address is:', ipAddress);
    } else {
      console.error(`Error: ${response.status} - ${response.statusText}`);
    }
  } catch (error) {
    // Handle errors
    console.error('Error:', error.message);
  }
}

// Call the function to get the IP address
getIpAddress();
In the above code, config is an object that contains configuration options for making an HTTP request with the Axios library. Here’s a breakdown of the properties in the config object.
- method: 'get' specifies the HTTP method for the request. In this case, it is set to 'get', indicating an HTTP GET request.
- url: 'https://httpbin.org/ip' specifies the target URL for the request. This endpoint returns information about the requester’s IP address.
- proxy: { host: '20.157.194.61', port: 80 } specifies the proxy configuration for the request. The proxy property is an object that includes the host (IP address) and port of the proxy server. It is optional and is used here to route the request through a proxy.
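Incidentally, the same request can be written without a separate config object, since axios.get accepts the options as a second argument; a quick equivalent sketch:

const axios = require('axios');

// Same request, with the proxy options passed directly to axios.get
axios.get('https://httpbin.org/ip', {
  proxy: {
    host: '20.157.194.61',
    port: 80
  }
})
  .then((response) => console.log('Your IP address is:', response.data.origin))
  .catch((error) => console.error('Error:', error.message));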
How to use a password-protected proxy with Axios?
To use a proxy that is protected by a password, you can simply pass the username and password in an auth object inside the proxy configuration.
const axios = require('axios');

let config = {
  method: 'get',
  url: 'https://httpbin.org/ip',
  proxy: {
    host: '94.103.159.29',
    port: 8080,
    auth: {
      username: 'Your-Username',
      password: 'Your-Password',
    }
  }
};

async function getIpAddress() {
  try {
    // Make a GET request to https://httpbin.org/ip through the authenticated proxy
    const response = await axios(config);

    // Check if the request was successful (status code 200)
    if (response.status === 200) {
      // Parse JSON response
      const ipAddress = response.data.origin;

      // Log or use the IP address
      console.log('Your IP address is:', ipAddress);
    } else {
      console.error(`Error: ${response.status} - ${response.statusText}`);
    }
  } catch (error) {
    // Handle errors
    console.error('Error:', error.message);
  }
}

// Call the function to get the IP address
getIpAddress();
We have just added an auth object with the properties username and password. Once you run this code, you will get this output:
Your IP address is: 94.103.159.29
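Many providers hand out their proxies as single URLs in the user:pass@host:port format. If yours does, you can convert such a URL into the config shape above with Node’s built-in URL class; here is a small sketch (proxyFromUrl is a hypothetical helper, not part of Axios):

// Hypothetical helper: build an Axios proxy config from a proxy URL string
// (assumes the common http://user:pass@host:port format)
function proxyFromUrl(proxyUrl) {
  const { hostname, port, username, password } = new URL(proxyUrl);
  return {
    host: hostname,
    port: Number(port),
    auth: {
      username: decodeURIComponent(username),
      password: decodeURIComponent(password),
    },
  };
}

// Usage: { ...config, proxy: proxyFromUrl('http://Your-Username:Your-Password@94.103.159.29:8080') }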
How to rotate proxies with Axios?
Many crawler-sensitive websites such as Amazon, Walmart, and LinkedIn will block you if you keep scraping them from a single IP. Headers are important too, but changing the IP on every request is just as critical (see the sketch at the end of this section, which rotates both).
const axios = require('axios');

let proxy_arr = [
  {
    host: '69.51.19.191',
    port: 8080,
    auth: {
      username: 'Your-Username',
      password: 'Your-Password',
    }
  },
  {
    host: '69.51.19.193',
    port: 8080,
    auth: {
      username: 'Your-Username',
      password: 'Your-Password',
    }
  },
  {
    host: '69.51.19.195',
    port: 8080,
    auth: {
      username: 'Your-Username',
      password: 'Your-Password',
    }
  },
  {
    host: '69.51.19.207',
    port: 8080,
    auth: {
      username: 'Your-Username',
      password: 'Your-Password',
    }
  },
  {
    host: '69.51.19.220',
    port: 8080,
    auth: {
      username: 'Your-Username',
      password: 'Your-Password',
    }
  }
];

async function getIpAddress() {
  // Pick a random proxy from the pool for every request
  let config = {
    method: 'get',
    url: 'https://httpbin.org/ip',
    proxy: proxy_arr[Math.floor(Math.random() * proxy_arr.length)]
  };

  try {
    // Make a GET request to https://httpbin.org/ip through the chosen proxy
    const response = await axios(config);

    // Check if the request was successful (status code 200)
    if (response.status === 200) {
      // Parse JSON response
      const ipAddress = response.data.origin;

      // Log or use the IP address
      console.log('Your IP address is:', ipAddress);
    } else {
      console.error(`Error: ${response.status} - ${response.statusText}`);
    }
  } catch (error) {
    // Handle errors
    console.error('Error:', error.message);
  }
}

// Call the function to get the IP address
getIpAddress();
In the above code, I have created a proxy array that contains five proxy objects. Using the Math.random function, we pick one of these proxies at random for every request and pass it to the config object.
Now each request goes through a different proxy, and the chances of your scraper getting blocked are much lower.
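And since headers matter as much as IPs, you can rotate both in one place. Here is a minimal sketch that picks a random proxy and a random User-Agent per request; the User-Agent strings are shortened, illustrative examples, and proxy_arr is the array defined above:

const axios = require('axios');

// Shortened, illustrative User-Agent strings; use real, current ones in practice
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
];

function randomItem(arr) {
  return arr[Math.floor(Math.random() * arr.length)];
}

async function fetchWithRotation(url, proxies) {
  // Rotate both the proxy and the User-Agent on every request
  const response = await axios.get(url, {
    proxy: randomItem(proxies),
    headers: { 'User-Agent': randomItem(userAgents) },
  });
  return response.data;
}

// Usage: fetchWithRotation('https://httpbin.org/ip', proxy_arr).then(console.log);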
How to use Scrapingdog Proxy with Axios?
For small-scale scraping, the above methods are fine and will do the job. But if you want to scrape millions of pages, you will have to use a premium web scraping API that puts proxy management on autopilot. You simply send a GET request, and the API handles all of these headaches for you.
In this section, I will show you how the Scrapingdog proxy can be used for scraping. First of all, you have to sign up for the free pack from here.
The free pack provides a generous 1,000 credits, which is enough to test the service before moving to a paid plan. Once you sign up, you will find an API key on your dashboard.
You have to pass this API key in the code below as your proxy password. You can read more about the proxies in the documentation.
const axios = require('axios');

let config = {
  method: 'get',
  url: 'http://httpbin.org/ip',
  proxy: {
    host: 'proxy.scrapingdog.com',
    port: 8081,
    auth: {
      username: 'scrapingdog',
      password: 'Your-API-key'
    }
  }
};

async function getIpAddress() {
  try {
    // Make a GET request to http://httpbin.org/ip through the Scrapingdog proxy
    const response = await axios(config);

    // Check if the request was successful (status code 200)
    if (response.status === 200) {
      // Parse JSON response
      const ipAddress = response.data.origin;

      // Log or use the IP address
      console.log('Your IP address is:', ipAddress);
    } else {
      console.error(`Error: ${response.status} - ${response.statusText}`);
    }
  } catch (error) {
    // Handle errors
    console.error('Error:', error.message);
  }
}

// Call the function to get the IP address
getIpAddress();
Scrapingdog has a proxy pool of more than 10M proxies, which makes large-scale scraping seamless. Once you run this code with your API key in place, every run will print a new IP on the console.
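To see the rotation in action, you can reuse the getIpAddress function above and fire a few requests in a row; assuming the pool rotates per request as described, each call should print a different IP:

// Fire several requests through the Scrapingdog proxy;
// each one should print a different IP from the pool
async function demoRotation() {
  for (let i = 0; i < 3; i++) {
    await getIpAddress();
  }
}

demoRotation();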
It is a very economical solution for large-scale scraping. You just have to focus on data collection, and the rest will be managed by Scrapingdog.
Conclusion
Axios is a very popular choice for web scraping, and in this article we saw how it can be used with proxies in different scenarios. We also covered the importance of proxy rotation and how APIs like Scrapingdog can put proxy rotation on autopilot.
I hope you liked this little tutorial. If you did, please share it with your friends and on social media.
Additional Resources
At this point, you should feel comfortable writing your first web scraper to gather data from any website. Here are a few additional resources that you may find helpful during your web scraping journey: