Email scraping has become a popular and efficient way to collect contact information from the web. By learning how to scrape emails, businesses and individuals can expand their networks, gather leads, and conduct market research more effectively.
Email scraping can help businesses generate leads more consistently, and the collected addresses can enrich your CRM. However, collecting emails in bulk requires a strategic approach.
In this article, we will use Python and Google Sheets to collect emails from the web. First, we will use a web scraping API to collect emails of prospects who can be targeted for product sales. Later, we will use Google Sheets to extract all the emails from a web page. So, even if you are a non-coder, you can use the second method to collect emails.
Getting Started with the Essentials
In this article, we will use the requests and beautifulsoup Python libraries for collecting leads. requests will be used for making the HTTP connection with the target website, and BeautifulSoup will be used for parsing emails from the raw HTML downloaded through requests.
Create a folder and install these two libraries using pip.
mkdir emails
pip install requests
pip install beautifulsoup4
Once done, create a Python file inside this folder and name it whatever you like. I am naming it emails.py.
How to scrape emails from any website
Before starting, you have to sign up for the free pack of Scrapingdog. Once you have created the account, Scrapingdog will add 1,000 free credits to your account for scraping any website.
Your personal API key is available in the dashboard. You’ll need to use this API key in the Python script we’re about to code.
Now, let’s say I am a sales guy at an IT company, responsible for bringing in more business through cold emailing. My first thought would be, “How do I find email contacts for companies seeking IT services?”
In this case, I have to extract emails from the web. Of course, many companies need IT services. But here we will just target marketing agencies working in New York. We will create a web scraper using Scrapingdog’s Google Search scraping API.
If we want to search for emails of digital marketing companies in New York then this Google query would work for us.
import requests

api_key = "Your-API-Key"
url = "https://api.scrapingdog.com/google"

params = {
    "api_key": api_key,
    # Single quotes on the outside so the quoted search phrases can stay inside
    "query": '"digital marketing agency" "New York" "email" "@gmail.com" OR "@yahoo.com"',
    "results": 10,
    "country": "us",
    "page": 0,
    "advance_search": "false"
}

response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code: {response.status_code}")
When you run this code, you’ll get a JSON response that looks like this (formatted here for readability).
{
"menu_items": [
{
"title": "Books",
"link": "https://www.google.com/search?q=%22digital+marketing+agency%22+%22New+York%22+%22email%22+%22%40gmail.com%22+OR+%22%40yahoo.com%22&sca_esv=f9749d82eb8de094&gl=us&hl=en&tbm=bks&source=lnms&sa=X&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQ_AUIBigB",
"position": 1
},
{
"title": "News",
"link": "https://www.google.com/search?q=%22digital+marketing+agency%22+%22New+York%22+%22email%22+%22%40gmail.com%22+OR+%22%40yahoo.com%22&sca_esv=f9749d82eb8de094&gl=us&hl=en&tbm=nws&source=lnms&sa=X&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQ_AUIBygC",
"position": 2
},
{
"title": "Videos",
"link": "https://www.google.com/search?q=%22digital+marketing+agency%22+%22New+York%22+%22email%22+%22%40gmail.com%22+OR+%22%40yahoo.com%22&sca_esv=f9749d82eb8de094&gl=us&hl=en&tbm=vid&source=lnms&sa=X&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQ_AUICCgD",
"position": 3
},
{
"title": "Images",
"link": "https://www.google.com/search?q=%22digital+marketing+agency%22+%22New+York%22+%22email%22+%22%40gmail.com%22+OR+%22%40yahoo.com%22&sca_esv=f9749d82eb8de094&gl=us&hl=en&tbm=isch&source=lnms&sa=X&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQ_AUICSgE",
"position": 4
},
{
"title": "Maps",
"link": "https://www.google.com/url?url=https://maps.google.com/maps%3Fq%3D%2522digital%2Bmarketing%2Bagency%2522%2B%2522New%2BYork%2522%2B%2522email%2522%2B%2522%2540gmail.com%2522%2BOR%2B%2522%2540yahoo.com%2522%26gl%3Dus%26hl%3Den%26um%3D1%26ie%3DUTF-8%26ved%3D1t:200713%26ictx%3D111&rct=j&q=&esrc=s&opi=89978449&sa=U&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQiaAMCAooBQ&usg=AOvVaw1WZ-lPjfwusxNjwkcT6sRe",
"position": 5
},
{
"title": "Shopping",
"link": "https://www.google.com/url?url=/search%3Fq%3D%2522digital%2Bmarketing%2Bagency%2522%2B%2522New%2BYork%2522%2B%2522email%2522%2B%2522%2540gmail.com%2522%2BOR%2B%2522%2540yahoo.com%2522%26sca_esv%3Df9749d82eb8de094%26gl%3Dus%26hl%3Den%26tbm%3Dshop%26source%3Dlnms%26ved%3D1t:200713%26ictx%3D111&rct=j&q=&esrc=s&opi=89978449&sa=U&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQiaAMCAsoBg&usg=AOvVaw2qgQahYBJeDI3OL-yXJZaF",
"position": 6
}
],
"organic_results": [
{
"title": "10+ Digital Marketing Companies in New York- 2024",
"displayed_link": "https://www.dmthriveagency.com › Blogs",
"snippet": "WebFX provides the best Digital Marketing services in New York. WebFX offers SEO, SEM, email marketing, website development, etc. They have a dedicated digital ...",
"link": "https://www.dmthriveagency.com/digital-marketing-companies-in-new-york/",
"rank": 1
},
{
"title": "Fututodo: local Digital Marketing Agency from New York",
"displayed_link": "https://fututodo.com",
"snippet": "thisismyemail@gmail.com. Logo. We create end-to-end. Full-service, local New York digital marketing agency that's all-in on results. We help businesses ...",
"link": "https://fututodo.com/",
"rank": 2
},
{
"title": "Limelight Digital Agency - Digital Marketing Agency - New York",
"displayed_link": "https://reportgarden.com › Agencies › New York › Limelight Digital Agency",
"snippet": "limelightdevs@gmail.com · 347-853-6977. Founded in ... Digital Marketing Agency ... Content Marketing Email Marketing Social Media Management Web Design Development ...",
"link": "https://reportgarden.com/agencies/limelight-digital-agency/",
"rank": 3
},
{
"title": "Digital Marketing in New york - - Social Theka",
"displayed_link": "https://socialtheka.com › digital-marketing-in-new-york",
"snippet": "socialtheka@gmail.com · +91 78887-35337 · +91 6280-614 518 · LOGO · Theka Story · SEO ... Digital Marketing Agency in New York. overlay. NEW YORK. Top-Notch ...",
"link": "https://socialtheka.com/digital-marketing-in-new-york/",
"rank": 4
},
{
"title": "Top Digital Marketing Agency in New York City | Expert Services",
"displayed_link": "https://www.adaptracorp.com",
"snippet": "13477843553 · adaptracorp@gmail.com. Adaptra Corp. Adaptra Corp. Premier Digital Marketing Agency in New York City. At our digital marketing agency in New York ...",
"link": "https://www.adaptracorp.com/",
"rank": 5
},
{
"title": "M&M Social Media: SEO, Web Design | Long Island, NY",
"displayed_link": "https://www.mnmsocialmedia.com",
"snippet": "Sign up with your email address to receive social media tips, the latest updates and more! ... New York. Contact M&M Social Media's online marketing ...",
"link": "https://www.mnmsocialmedia.com/",
"rank": 6
},
{
"title": "New York Digital Marketing Agency",
"displayed_link": "https://www.matthewleedma.com › ... › Search Engine Optimization (SEO)",
"snippet": "Jun 22, 2024 · New York Digital Marketing Agency: Elevate Your Business to New Heights with MLDMA ... Contact us via our form, email, or phone. We'll ...",
"link": "https://www.matthewleedma.com/post/new-york-digital-marketing-agency",
"rank": 7
},
{
"title": "ADMA - A Digital Marketing Agency - Nextdoor",
"displayed_link": "https://nextdoor.com › pages › adma-a-digital-marketing-agency-new-york-city-ny",
"snippet": "We are ADMA, your local Digital Marketing Agency right here in Staten Island, New York. ... email. Our Lead Generation services are designed to bring you ...",
"link": "https://nextdoor.com/pages/adma-a-digital-marketing-agency-new-york-city-ny/",
"rank": 8
},
{
"title": "Contact us - Chauncey Agency",
"displayed_link": "https://chauncey.agency › contact-us",
"snippet": "Sunday 9AM-5PM. Digital Marketing Agency Chauncey Agency Brooklyn New York ... Email: 385chaunceyagency@gmail.com. 929-302-0020 and 973-666-0694; 1703 ...",
"link": "https://chauncey.agency/contact-us/",
"rank": 9
},
{
"title": "Award Winning Digital Marketing Agency in NYC - Geek in NY",
"displayed_link": "https://www.geekinny.com › award-winning-digital-marketing-agency-in-nyc",
"snippet": "Henceforth, choose our award-winning digital marketing agency in NYC. New York City, New York: January, 2019 ... Email marketing is digital marketing, whereas ...",
"link": "https://www.geekinny.com/award-winning-digital-marketing-agency-in-nyc/",
"rank": 10
}
],
"pagination": {
"page_no": {}
}
}
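The part of this response that matters here is organic_results: a list of objects, each with title, snippet, link, and rank keys. A minimal sketch of walking that list is shown below. The sample dictionary is a trimmed copy of two entries from the response above (one with its snippet removed on purpose), and the helper name iter_snippets is an assumption for illustration, not part of any API.

```python
# Trimmed copy of two entries from the response above.
# The second entry has its snippet removed on purpose to show the missing-key case.
data = {
    "organic_results": [
        {
            "title": "Fututodo: local Digital Marketing Agency from New York",
            "snippet": "thisismyemail@gmail.com. Logo. We create end-to-end.",
            "link": "https://fututodo.com/",
            "rank": 2,
        },
        {
            "title": "New York Digital Marketing Agency",
            "link": "https://www.matthewleedma.com/post/new-york-digital-marketing-agency",
            "rank": 7,
        },
    ]
}


def iter_snippets(payload):
    """Yield (rank, snippet) pairs, skipping results that have no snippet."""
    for result in payload.get("organic_results", []):
        snippet = result.get("snippet")
        if snippet:
            yield result.get("rank"), snippet


for rank, snippet in iter_snippets(data):
    print(rank, snippet)
```

Reading keys with .get keeps the loop from failing if a result comes back without a snippet.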
If you check some of the snippet properties, you will see emails in them. Now, we will use regular expressions to extract the emails from snippet.
Using Regular Expression to extract the emails
We have to prepare a regular expression that can extract the emails from the snippet text.
r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
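As a quick check of what this pattern matches, here is a small sketch that runs it against two snippet strings copied from the response above. The variable names are only illustrative.

```python
import re

email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# Two snippets from the response above: one with an email, one without.
snippets = [
    "limelightdevs@gmail.com · 347-853-6977. Founded in ...",
    "Sign up with your email address to receive social media tips",
]

# findall returns every match in a string, or an empty list if there is none.
matches = [re.findall(email_pattern, s) for s in snippets]
print(matches)  # [['limelightdevs@gmail.com'], []]
```

The pattern reads as: one or more allowed characters before the @, a domain made of letters, digits, dots, and hyphens, then a dot and a top-level domain of at least two letters. Because the match must end in letters, a sentence period right after an address (as in "adaptracorp@gmail.com. Adaptra Corp.") is left out of the result.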
Using this email pattern we can extract the emails from text. Let’s apply it to the JSON results.
import requests
import re

api_key = "Your-api-key"
url = "https://api.scrapingdog.com/google"
email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
allemails = []

params = {
    "api_key": api_key,
    "query": '"digital marketing agency" "New York" "email" "@gmail.com" OR "@yahoo.com"',
    "results": 10,
    "country": "us",
    "page": 0,
    "advance_search": 'false'
}

response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    # Run the pattern against the snippet of every organic result
    for result in data['organic_results']:
        # .get() avoids a KeyError if a result has no snippet
        emails = re.findall(email_pattern, result.get('snippet', ''))
        # One list per result; it is empty when the snippet has no email
        allemails.append(emails)
    print(allemails)
else:
    print(f"Request failed with status code: {response.status_code}")
Once you run this code you will get this list of emails.
[[], ['thisismyemail@gmail.com'], ['socialtheka@gmail.com'], ['limelightdevs@gmail.com'], ['adaptracorp@gmail.com'], [], ['dmthriveagency@gmail.com'], [], [], ['jettercy@gmail.com']]
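Because each findall call returns its own list, the result is a list of lists, with empty entries for snippets that had no match. One way to turn it into a flat list of unique addresses is sketched below, using the output shown above as input. The names flat_emails and unique_emails are illustrative, and lowercasing the addresses is an assumption that suits most providers.

```python
from itertools import chain

allemails = [[], ['thisismyemail@gmail.com'], ['socialtheka@gmail.com'],
             ['limelightdevs@gmail.com'], ['adaptracorp@gmail.com'], [],
             ['dmthriveagency@gmail.com'], [], [], ['jettercy@gmail.com']]

# Flatten the nested lists; empty lists simply contribute nothing.
flat_emails = list(chain.from_iterable(allemails))

# Lowercase (an assumption: most providers ignore case) and de-duplicate
# while keeping first-seen order, since dict keys preserve insertion order.
unique_emails = list(dict.fromkeys(e.lower() for e in flat_emails))

print(unique_emails)
```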
Before sending any emails, it’s important to verify all of these addresses.
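Real verification means confirming that a mailbox exists and accepts mail, which usually calls for a dedicated verification service. As a first filter only, a syntax-level check can weed out obviously malformed strings before that step. The sketch below is an assumption about what such a pre-check could look like, and the function name looks_like_email is hypothetical; it says nothing about deliverability.

```python
import re

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')


def looks_like_email(address):
    """Return True if the whole string is shaped like an email address."""
    address = address.strip()
    # Consecutive dots are not allowed in unquoted addresses.
    if ".." in address:
        return False
    local, _, domain = address.partition("@")
    # The local part may not start or end with a dot,
    # and the domain may not start with a dot or hyphen.
    if local.startswith(".") or local.endswith(".") or domain.startswith((".", "-")):
        return False
    # fullmatch requires the entire string to fit the pattern.
    return EMAIL_RE.fullmatch(address) is not None


candidates = ["adaptracorp@gmail.com", "not-an-email", "bad..dots@gmail.com"]
print([c for c in candidates if looks_like_email(c)])  # ['adaptracorp@gmail.com']
```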
Scraping emails through Google Sheets
If you are a non-coder, scraping with Python may be difficult, but you can do the same job with Google Sheets. For a single page, this can even be quicker than writing a script, and you can push the scraped data directly to your CRM engine. This method is useful when learning how to build email lists for marketing, as it offers a streamlined and organized way to gather contact information.
In this example, I will scrape the emails of faculty members of Princeton University. We will use built-in Google Sheets functions like IMPORTHTML and IMPORTXML to scrape and parse the results. I would recommend reading up on web scraping with Google Sheets before proceeding with this section.
In the above image, you can see the XPath query for the email of a single user.
//*[@id='block-system-main']/div/div[2]/div[1]/div[2]/div[4]/span[1]
The above XPath selects only the first email, but I need the emails of all the faculty members. For that, I have to find an XPath that selects all the emails at once.
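The effect of an index in a path step can be tried out on a small fragment. Below is a minimal sketch using Python's built-in xml.etree.ElementTree, which supports only a limited subset of XPath (so the path starts with .// rather than //). The markup, the id, and the addresses are invented for illustration and are not taken from the Princeton page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one container div with one child div per person.
markup = """
<root>
  <div id="people">
    <div class="person"><span>alice (@example.edu)</span></div>
    <div class="person"><span>bob (@example.edu)</span></div>
    <div class="person"><span>carol (@example.edu)</span></div>
  </div>
</root>
"""

root = ET.fromstring(markup)

# div[1] restricts the step to the first child div only.
first_only = [s.text for s in root.findall(".//*[@id='people']/div[1]/span")]

# A bare div step matches every child div, so every row is selected.
all_rows = [s.text for s in root.findall(".//*[@id='people']/div/span")]

print(first_only)
print(all_rows)
```

The indexed step div[1] matches only the first row, while the bare div step matches every row. The same change, applied to the faculty page, gives the following XPath.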
//*[@id='block-system-main']/div/div[2]/div/div[2]/div[4]/span[1]
With the index dropped from div[1], the above XPath selects all the emails. Now, we can apply the formula to the Google Sheet.
=IMPORTXML("https://www.cs.princeton.edu/people/faculty", "//*[@id='block-system-main']/div/div[2]/div/div[2]/div[4]/span[1]")
Once you apply this formula you get this result.
We got the username and the domain name, but we have to join them and remove the brackets to create a valid email. We need a formula for this too.
=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2:B, " ", ""), "(", ""), ")", ""))
ARRAYFORMULA applies multiple SUBSTITUTE functions to every cell in column B, starting from cell B2 downwards. Here’s what it does step-by-step:
1. ARRAYFORMULA allows a single formula to operate on a range of cells, so you don’t have to copy the formula to each cell manually.
2. The SUBSTITUTE function replaces a specific character with nothing (i.e., removes it).
- The innermost SUBSTITUTE(B2:B, " ", "") removes all spaces from each cell in the B2:B range.
- The second SUBSTITUTE(..., "(", "") removes all opening parentheses (.
- The outermost SUBSTITUTE(..., ")", "") removes all closing parentheses ).
3. After processing each cell in the range B2:B, the formula removes spaces and parentheses, leaving you with a cleaned string in each cell where the formula is applied.
This formula is useful for standardizing or “cleaning” data where email addresses or other text strings might have unwanted characters like spaces and parentheses.
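For readers who also work in Python, the same cleanup can be written as a chain of str.replace calls. This is only a sketch of the equivalent logic. The raw cell values below are hypothetical stand-ins shaped like the description above (a username followed by a bracketed domain), not the actual sheet contents, and clean_cell is an illustrative name.

```python
def clean_cell(value):
    """Mirror the nested SUBSTITUTE calls: drop spaces, then '(' and ')'."""
    return value.replace(" ", "").replace("(", "").replace(")", "")


# Hypothetical raw values, not copied from the real sheet.
raw_cells = ["jdoe  (@cs.example.edu)", "asmith (@cs.example.edu)"]

print([clean_cell(c) for c in raw_cells])
# ['jdoe@cs.example.edu', 'asmith@cs.example.edu']
```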
I got all the emails with this formula, and now I can push this data to any lead generation pipeline or CRM engine and email everyone on the list. Of course, every website will need a different approach to collecting emails. It’s also important to verify all the email addresses before proceeding.
Similarly, you can collect the names of all the faculty members. I am leaving that task for you.
Conclusion
We saw how both technical and non-technical methods can be used to extract emails from a website. First, using Python, we scraped Google search results to collect emails. Then, in the next section, we used Google Sheets. We also learned how to clean up the raw strings to get usable data.
You will have to adjust the email extraction strategy for every new website, but the overall approach will remain more or less the same.