
How To Scrape Email Addresses From Any Website


Email scraping has become a popular and efficient way to obtain contact information online. By learning how to scrape emails, businesses and individuals can expand their networks, gather leads, and conduct market research more effectively.

Email scraping helps businesses generate leads at scale, and the collected addresses can enrich your CRM. However, collecting emails in bulk requires a strategic approach.

In this article, we will use Python and Google Sheets to collect emails from the web. First, we will use a web scraping API to collect the emails of prospects who can be targeted for product sales. Then, we will use Google Sheets to extract all the emails from a web page. So, even if you are a non-coder, you can use the second method to collect emails.

Getting Started with the Essentials

In this article, we will use the requests and BeautifulSoup Python libraries for collecting leads. requests makes the HTTP connection to the target website, and BeautifulSoup parses emails out of the raw HTML downloaded through requests.

Create a folder and install these two libraries using pip.

				
mkdir emails
cd emails
pip install requests
pip install beautifulsoup4
				
			

Once done, create a Python file inside this folder and name it whatever you like. I am naming it emails.py.

How to scrape emails from any website

Before starting, you have to sign up for the free pack of Scrapingdog. Once you have created the account, Scrapingdog will add 1000 free credits to your account for scraping any website.


Your personal API key is available in the dashboard. You’ll need to use this API key in the Python script we’re about to code.

Now, let’s say I am a salesperson at an IT company, responsible for bringing in more business through cold emailing. So, my first thought would be: “How do I find email contacts for companies seeking IT services?”

In this case, I have to extract emails from the web. Of course, many companies need IT services. But here we will just target marketing agencies working in New York. We will create a web scraper using Scrapingdog’s Google Search scraping API.

If we want to search for emails of digital marketing companies in New York, then the Google query in this script would work for us.
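The query in the script that follows is a set of exact-match phrases in double quotes plus email provider domains joined with OR. As a rough sketch, a small helper can assemble such a query for other niches or cities. The function name build_query is hypothetical and is not part of Scrapingdog's API.

```python
def build_query(phrases, domains):
    """Quote each phrase for exact matching and OR-join the quoted email domains."""
    quoted_phrases = " ".join(f'"{phrase}"' for phrase in phrases)
    quoted_domains = " OR ".join(f'"{domain}"' for domain in domains)
    return f"{quoted_phrases} {quoted_domains}"


# Rebuilds the same query string that the script uses
query = build_query(
    ["digital marketing agency", "New York", "email"],
    ["@gmail.com", "@yahoo.com"],
)
print(query)
```

Swapping the phrases or the domain list is then enough to target a different audience with the same script.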

				
import requests

api_key = "Your-API-Key"
url = "https://api.scrapingdog.com/google"

params = {
    "api_key": api_key,
    # Single quotes on the outside let the search phrases keep their double quotes
    "query": '"digital marketing agency" "New York" "email" "@gmail.com" OR "@yahoo.com"',
    "results": 10,
    "country": "us",
    "page": 0,
    "advance_search": "false",
}

response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    print(data)
else:
    print(f"Request failed with status code: {response.status_code}")
				
			

When you run this code, the API returns a JSON response that looks like this.

				
					{
  "menu_items": [
    {
      "title": "Books",
      "link": "https://www.google.com/search?q=%22digital+marketing+agency%22+%22New+York%22+%22email%22+%22%40gmail.com%22+OR+%22%40yahoo.com%22&sca_esv=f9749d82eb8de094&gl=us&hl=en&tbm=bks&source=lnms&sa=X&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQ_AUIBigB",
      "position": 1
    },
    {
      "title": "News",
      "link": "https://www.google.com/search?q=%22digital+marketing+agency%22+%22New+York%22+%22email%22+%22%40gmail.com%22+OR+%22%40yahoo.com%22&sca_esv=f9749d82eb8de094&gl=us&hl=en&tbm=nws&source=lnms&sa=X&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQ_AUIBygC",
      "position": 2
    },
    {
      "title": "Videos",
      "link": "https://www.google.com/search?q=%22digital+marketing+agency%22+%22New+York%22+%22email%22+%22%40gmail.com%22+OR+%22%40yahoo.com%22&sca_esv=f9749d82eb8de094&gl=us&hl=en&tbm=vid&source=lnms&sa=X&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQ_AUICCgD",
      "position": 3
    },
    {
      "title": "Images",
      "link": "https://www.google.com/search?q=%22digital+marketing+agency%22+%22New+York%22+%22email%22+%22%40gmail.com%22+OR+%22%40yahoo.com%22&sca_esv=f9749d82eb8de094&gl=us&hl=en&tbm=isch&source=lnms&sa=X&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQ_AUICSgE",
      "position": 4
    },
    {
      "title": "Maps",
      "link": "https://www.google.com/url?url=https://maps.google.com/maps%3Fq%3D%2522digital%2Bmarketing%2Bagency%2522%2B%2522New%2BYork%2522%2B%2522email%2522%2B%2522%2540gmail.com%2522%2BOR%2B%2522%2540yahoo.com%2522%26gl%3Dus%26hl%3Den%26um%3D1%26ie%3DUTF-8%26ved%3D1t:200713%26ictx%3D111&rct=j&q=&esrc=s&opi=89978449&sa=U&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQiaAMCAooBQ&usg=AOvVaw1WZ-lPjfwusxNjwkcT6sRe",
      "position": 5
    },
    {
      "title": "Shopping",
      "link": "https://www.google.com/url?url=/search%3Fq%3D%2522digital%2Bmarketing%2Bagency%2522%2B%2522New%2BYork%2522%2B%2522email%2522%2B%2522%2540gmail.com%2522%2BOR%2B%2522%2540yahoo.com%2522%26sca_esv%3Df9749d82eb8de094%26gl%3Dus%26hl%3Den%26tbm%3Dshop%26source%3Dlnms%26ved%3D1t:200713%26ictx%3D111&rct=j&q=&esrc=s&opi=89978449&sa=U&ved=0ahUKEwiW34nP1LOJAxVJlYkEHUd0D4kQiaAMCAsoBg&usg=AOvVaw2qgQahYBJeDI3OL-yXJZaF",
      "position": 6
    }
  ],
  "organic_results": [
    {
      "title": "10+ Digital Marketing Companies in New York- 2024",
      "displayed_link": "https://www.dmthriveagency.com › Blogs",
      "snippet": "WebFX provides the best Digital Marketing services in New York. WebFX offers SEO, SEM, email marketing, website development, etc. They have a dedicated digital ...",
      "link": "https://www.dmthriveagency.com/digital-marketing-companies-in-new-york/",
      "rank": 1
    },
    {
      "title": "Fututodo: local Digital Marketing Agency from New York",
      "displayed_link": "https://fututodo.com",
      "snippet": "thisismyemail@gmail.com. Logo. We create end-to-end. Full-service, local New York digital marketing agency that's all-in on results. We help businesses ...",
      "link": "https://fututodo.com/",
      "rank": 2
    },
    {
      "title": "Limelight Digital Agency - Digital Marketing Agency - New York",
      "displayed_link": "https://reportgarden.com › Agencies › New York › Limelight Digital Agency",
      "snippet": "limelightdevs@gmail.com · 347-853-6977. Founded in ... Digital Marketing Agency ... Content Marketing Email Marketing Social Media Management Web Design Development ...",
      "link": "https://reportgarden.com/agencies/limelight-digital-agency/",
      "rank": 3
    },
    {
      "title": "Digital Marketing in New york - - Social Theka",
      "displayed_link": "https://socialtheka.com › digital-marketing-in-new-york",
      "snippet": "socialtheka@gmail.com · +91 78887-35337 · +91 6280-614 518 · LOGO · Theka Story · SEO ... Digital Marketing Agency in New York. overlay. NEW YORK. Top-Notch ...",
      "link": "https://socialtheka.com/digital-marketing-in-new-york/",
      "rank": 4
    },
    {
      "title": "Top Digital Marketing Agency in New York City | Expert Services",
      "displayed_link": "https://www.adaptracorp.com",
      "snippet": "13477843553 · adaptracorp@gmail.com. Adaptra Corp. Adaptra Corp. Premier Digital Marketing Agency in New York City. At our digital marketing agency in New York ...",
      "link": "https://www.adaptracorp.com/",
      "rank": 5
    },
    {
      "title": "M&M Social Media: SEO, Web Design | Long Island, NY",
      "displayed_link": "https://www.mnmsocialmedia.com",
      "snippet": "Sign up with your email address to receive social media tips, the latest updates and more! ... New York. Contact M&M Social Media's online marketing ...",
      "link": "https://www.mnmsocialmedia.com/",
      "rank": 6
    },
    {
      "title": "New York Digital Marketing Agency",
      "displayed_link": "https://www.matthewleedma.com › ... › Search Engine Optimization (SEO)",
      "snippet": "Jun 22, 2024 · New York Digital Marketing Agency: Elevate Your Business to New Heights with MLDMA ... Contact us via our form, email, or phone. We'll ...",
      "link": "https://www.matthewleedma.com/post/new-york-digital-marketing-agency",
      "rank": 7
    },
    {
      "title": "ADMA - A Digital Marketing Agency - Nextdoor",
      "displayed_link": "https://nextdoor.com › pages › adma-a-digital-marketing-agency-new-york-city-ny",
      "snippet": "We are ADMA, your local Digital Marketing Agency right here in Staten Island, New York. ... email. Our Lead Generation services are designed to bring you ...",
      "link": "https://nextdoor.com/pages/adma-a-digital-marketing-agency-new-york-city-ny/",
      "rank": 8
    },
    {
      "title": "Contact us - Chauncey Agency",
      "displayed_link": "https://chauncey.agency › contact-us",
      "snippet": "Sunday 9AM-5PM. Digital Marketing Agency Chauncey Agency Brooklyn New York ... Email: 385chaunceyagency@gmail.com. 929-302-0020 and 973-666-0694; 1703 ...",
      "link": "https://chauncey.agency/contact-us/",
      "rank": 9
    },
    {
      "title": "Award Winning Digital Marketing Agency in NYC - Geek in NY",
      "displayed_link": "https://www.geekinny.com › award-winning-digital-marketing-agency-in-nyc",
      "snippet": "Henceforth, choose our award-winning digital marketing agency in NYC. New York City, New York: January, 2019 ... Email marketing is digital marketing, whereas ...",
      "link": "https://www.geekinny.com/award-winning-digital-marketing-agency-in-nyc/",
      "rank": 10
    }
  ],
  "pagination": {
    "page_no": {}
  }
}
				
			

If you check some of the snippet properties, you will see email addresses in them. Now, we will use a regular expression to extract the emails from each snippet.
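Before extracting anything, it can help to see how the response is shaped. The sketch below walks organic_results and keeps only the results whose snippet contains an "@" sign. The data dictionary is an abbreviated copy of two entries from the response above, not a live API call.

```python
# Abbreviated copy of two entries from the response shown above
data = {
    "organic_results": [
        {
            "title": "Fututodo: local Digital Marketing Agency from New York",
            "snippet": "thisismyemail@gmail.com. Logo. We create end-to-end.",
            "link": "https://fututodo.com/",
            "rank": 2,
        },
        {
            "title": "M&M Social Media: SEO, Web Design | Long Island, NY",
            "snippet": "Sign up with your email address to receive social media tips",
            "link": "https://www.mnmsocialmedia.com/",
            "rank": 6,
        },
    ]
}

# .get() with defaults avoids a KeyError when a key is missing from a result
candidates = [
    result
    for result in data.get("organic_results", [])
    if "@" in result.get("snippet", "")
]

for result in candidates:
    print(result["rank"], result["link"])
```

Keeping the link next to each candidate makes it easier to trace an address back to the page it came from.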

Using Regular Expression to extract the emails

We have to prepare a regular expression that can extract the emails from snippet text.

				
					r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
				
			

Using this email pattern we can extract the emails from text. Let’s apply it to the JSON results.
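As a quick sanity check first, the pattern can be tried on a few snippets copied from the response above. This is only a sketch using Python's built-in re module. The snippets are shortened, and the variable names are chosen for illustration.

```python
import re

email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# Shortened snippets taken from the response above
snippets = [
    "limelightdevs@gmail.com · 347-853-6977. Founded in ... Digital Marketing Agency ...",
    "thisismyemail@gmail.com. Logo. We create end-to-end.",
    "Sign up with your email address to receive social media tips",
]

for snippet in snippets:
    # findall returns every non-overlapping match, or an empty list
    print(re.findall(email_pattern, snippet))

first_match = re.findall(email_pattern, snippets[0])
second_match = re.findall(email_pattern, snippets[1])
third_match = re.findall(email_pattern, snippets[2])
```

In the second snippet the sentence-ending period is left out of the match, because the pattern requires at least two letters after the final dot. The full script follows.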

				
import requests
import re

api_key = "Your-api-key"
url = "https://api.scrapingdog.com/google"

# Matches common email address shapes inside free text
email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
allemails = []

params = {
    "api_key": api_key,
    "query": '"digital marketing agency" "New York" "email" "@gmail.com" OR "@yahoo.com"',
    "results": 10,
    "country": "us",
    "page": 0,
    "advance_search": 'false'
}

response = requests.get(url, params=params)

if response.status_code == 200:
    data = response.json()
    # Each organic result carries a text snippet that may contain an email
    for result in data['organic_results']:
        emails = re.findall(email_pattern, result['snippet'])
        # One sub-list per result; it stays empty when the snippet has no email
        allemails.append(emails)

    print(allemails)
else:
    print(f"Request failed with status code: {response.status_code}")
				
			

Once you run this code, you will get a list of emails like this, with one sub-list per search result.

				
					[[], ['thisismyemail@gmail.com'], ['socialtheka@gmail.com'], ['limelightdevs@gmail.com'], ['adaptracorp@gmail.com'], [], ['dmthriveagency@gmail.com'], [], [], ['jettercy@gmail.com']]
				
			

Before sending any emails, it’s important to verify all these addresses.
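A first clean-up pass might look like the sketch below: it flattens the nested list, lowercases each address, drops duplicates, and re-checks the syntax. This is only a hygiene step, not deliverability verification, which needs a dedicated verification service. The list is copied from the output above.

```python
import re

# Output of the script above: one sub-list per search result
allemails = [[], ['thisismyemail@gmail.com'], ['socialtheka@gmail.com'],
             ['limelightdevs@gmail.com'], ['adaptracorp@gmail.com'], [],
             ['dmthriveagency@gmail.com'], [], [], ['jettercy@gmail.com']]

email_regex = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

unique_emails = []
seen = set()
for group in allemails:
    for email in group:
        normalized = email.strip().lower()
        # Keep the address only if it is new and the whole string is an email
        if normalized not in seen and email_regex.fullmatch(normalized):
            seen.add(normalized)
            unique_emails.append(normalized)

print(unique_emails)
```

The result is a flat list in the original order, which is easier to feed into a verification tool than the nested one.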

Scraping emails through Google Sheets

If you are a non-coder, scraping with Python can be difficult, but we can do the same job with Google Sheets. For this kind of information, Google Sheets can be the more productive option: you can scrape the data and push it directly to your CRM. This method is helpful when learning how to build email lists for marketing, as it offers a streamlined and organized way to gather contact information.

In this example, I will scrape the emails of faculty members of Princeton University. We will use built-in Google Sheets functions like IMPORTHTML and IMPORTXML to scrape and parse the results. I recommend reading up on web scraping with Google Sheets before proceeding with this section.

The XPath query for the email of a single faculty member looks like this.

				
					//*[@id='block-system-main']/div/div[2]/div[1]/div[2]/div[4]/span[1]

				
			

The above XPath selects the first email only. I need the emails of all the faculty members, so I have to find an XPath that selects all the emails at once.
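The usual trick is to drop the positional index on the element that repeats once per person. The sketch below shows the effect on a tiny made-up document, using the limited XPath support in Python's xml.etree.ElementTree. The markup, the id "people" and the addresses are assumptions for illustration, not the structure of the Princeton page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one <div> per person, email in the first <span>
SAMPLE = """
<html>
  <div id="people">
    <div><span>alice@example.edu</span><span>Room 101</span></div>
    <div><span>bob@example.edu</span><span>Room 102</span></div>
    <div><span>carol@example.edu</span><span>Room 103</span></div>
  </div>
</html>
"""

root = ET.fromstring(SAMPLE)

# div[1] pins the path to the first person only
first_only = [el.text for el in root.findall(".//*[@id='people']/div[1]/span[1]")]

# Plain div matches every person, so span[1] is taken from each of them
everyone = [el.text for el in root.findall(".//*[@id='people']/div/span[1]")]

print(first_only)
print(everyone)
```

Applied to the faculty page, the same change to the earlier query gives the XPath that follows.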

				
					//*[@id='block-system-main']/div/div[2]/div/div[2]/div[4]/span[1]

				
			

I got all the emails selected with the above XPath. Now, we can apply the formula in Google Sheets.

				
					=IMPORTXML("https://www.cs.princeton.edu/people/faculty", "//*[@id='block-system-main']/div/div[2]/div/div[2]/div[4]/span[1]")

				
			

Once you apply this formula, the column fills with the results. We got the username and the domain name, but they are separated by spaces and wrapped in parentheses, so we have to join them and strip the parentheses to get a valid email address. We have to create a formula for this one too.

				
					=ARRAYFORMULA(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2:B, " ", ""), "(", ""), ")", ""))

				
			

ARRAYFORMULA applies multiple SUBSTITUTE functions to every cell in column B starting from cell B2 downwards. Here’s what it does step-by-step:

  1. ARRAYFORMULA allows a single formula to operate on a range of cells, so you don’t have to copy the formula to each cell manually.
  2. SUBSTITUTE replaces a specific character with nothing (i.e., removes it):
  • The innermost SUBSTITUTE(B2:B, " ", "") removes all spaces from each cell in the B2:B range.
  • The second SUBSTITUTE(..., "(", "") removes all opening parentheses (.
  • The outermost SUBSTITUTE(..., ")", "") removes all closing parentheses ).
  3. After processing each cell in the range B2:B, the formula has removed the spaces and parentheses, leaving a cleaned string in each cell where the formula is applied.

This formula is useful for standardizing or “cleaning” data where email addresses or other text strings might have unwanted characters like spaces and parentheses.
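For readers who later move this step into a script, the same clean-up can be sketched in Python with chained str.replace calls. The sample rows below are hypothetical and only mimic the "username (domain)" shape described above. The function name clean_cell is also made up for this example.

```python
def clean_cell(value):
    """Mirror the nested SUBSTITUTE calls: drop spaces, then "(", then ")"."""
    return value.replace(" ", "").replace("(", "").replace(")", "")


# Hypothetical raw cells in the "username (domain)" shape
rows = ["jdoe  (@cs.example.edu)", "asmith (@cs.example.edu)"]

# The list comprehension plays the role of ARRAYFORMULA over B2:B
cleaned = [clean_cell(row) for row in rows]
print(cleaned)  # ['jdoe@cs.example.edu', 'asmith@cs.example.edu']
```

Each replace call corresponds to one SUBSTITUTE layer, applied from the innermost outwards.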

I got all the emails with this formula, and now I can send emails to all of them. Of course, every website will need a different approach to collecting emails. It’s also important to verify all the email addresses before proceeding. Now, I can push this data to any lead generation pipeline or any CRM engine.
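Most CRM tools accept a CSV import, so one possible hand-off is a small CSV export. The sketch below uses Python's csv module. The column names "email" and "source" and the sample addresses are assumptions, so adjust them to whatever your CRM expects.

```python
import csv
import io

# Hypothetical cleaned addresses; replace with your own list
emails = ["jdoe@cs.example.edu", "asmith@cs.example.edu"]
source_url = "https://www.cs.princeton.edu/people/faculty"

# Written to an in-memory buffer here; use open("leads.csv", "w", newline="")
# instead to write a real file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["email", "source"])
for email in emails:
    writer.writerow([email, source_url])

csv_text = buffer.getvalue()
print(csv_text)
```

Recording the source page next to each address keeps the list auditable once it lands in the CRM.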

Similarly, you can collect the names of all the faculty members. I am leaving that task for you.

Conclusion

We saw how both technical and non-technical methods can extract emails from a website. First, using Python, we scraped Google search results to collect emails. Then, in the next section, we used Google Sheets, and we also learned how to clean up the raw strings to get usable data.

You will have to adapt the email extraction strategy to every new website, but the overall approach remains more or less the same.

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.