
Web Scraping with C# using Selenium (Build Your C# Scraper)

18-08-2023

C# can be a go-to choice for web scraping since the language offers a wide range of tools. In this tutorial, we will pair it with Selenium. In Part I we will build a simple Google search scraper, and in Part II we will scrape a dynamic website.

Web Scraping With C#

Web Scraping with Selenium C# (Part-I)

I am going to explain everything with a very simple step-by-step approach. So, first of all, I am going to create a new project.

I have created a demo project named webscraping. Next, we will install the Selenium library, which we can do by clicking the Tools tab and then the NuGet Package Manager.

There, you can search for Selenium, select the package for the solution, and install it.
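If you prefer the Package Manager Console, the same installation can be done with commands along these lines (Selenium.WebDriver is the core package; Selenium.WebDriver.ChromeDriver is a commonly used package that also drops a matching chromedriver binary into your build output):

Install-Package Selenium.WebDriver
Install-Package Selenium.WebDriver.ChromeDriver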

Now, we also need to download ChromeDriver, which you can get from the official ChromeDriver downloads page.

Now, I am going to put it inside the bin folder so that I don't have to specify a path within the code. I am going to open the folder in File Explorer:

bin > debug

And then you can just drag the chromedriver file into the debug folder. We are done with the installation.
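Alternatively, if you would rather keep chromedriver somewhere else, the ChromeDriver constructor also accepts the directory containing the binary. A small sketch, with an example path of my own choosing:

// Point Selenium at the folder that holds chromedriver.exe (example path)
IWebDriver driver = new ChromeDriver(@"C:\tools\chromedriver");

With that, we can start coding the scraper.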

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static void Main(string[] args)
        {
            // Launch a new Chrome browser controlled by Selenium
            IWebDriver driver = new ChromeDriver();
        }
    }
}

Basically, when you want a controllable browser, you just create an instance of a web driver. Since I am using Chrome, I create a ChromeDriver and import the required packages (OpenQA.Selenium and OpenQA.Selenium.Chrome).

If I run this code it will basically open a new browser. For this post, we are going to make a Google search scraper.
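As a side note, if you don't want a visible browser window popping up on every run, Chrome can be started headless. A minimal sketch using ChromeOptions:

var options = new ChromeOptions();
options.AddArgument("--headless"); // run Chrome without opening a window
IWebDriver driver = new ChromeDriver(options);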

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static void Main(string[] args)
        {
            IWebDriver driver = new ChromeDriver();
            // Navigate to the Google home page
            driver.Navigate().GoToUrl("https://www.google.com");
        }
    }
}

Here, we navigate the browser to the Google home page.

Our scraper will extract titles, descriptions, links, and so on from the results. To get any results, we first have to perform a search, which means locating the input field.

There are many ways by which you can detect this input field like class name, id, or even XPath. Here we are going to use the XPath.
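For comparison, an attribute-based locator is often shorter. Google's search box has historically carried the name attribute q, though such attributes are not guaranteed to stay stable:

// Alternative to the XPath used below; name="q" is an observed (and changeable) attribute
var element = driver.FindElement(By.Name("q"));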

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static void Main(string[] args)
        {
            IWebDriver driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.google.com");
            // Locate the search input field by its copied XPath
            var element = driver.FindElement(By.XPath("/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/input"));
        }
    }
}

Then we have to type a keyword into that input field to perform a Google search. We will use the SendKeys function for that and then submit the query with the Submit function.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static void Main(string[] args)
        {
            IWebDriver driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.google.com");
            var element = driver.FindElement(By.XPath("/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/input"));
            element.SendKeys("webshop"); // type the keyword into the search box
            element.Submit();            // submit the search form
        }
    }
}

So, our C# web scraper can now search for the keyword "webshop". Next, we will expand it so that it can scrape titles and other data from the search results.

On the results page, we can do the same thing we did with the input field: pick an element and copy its XPath. The problem is that a copied XPath targets one very specific element and will not match the others.

However, if you compare the XPaths of two random titles, you will notice that they follow a sequence, which means we can match the whole set at once and loop over it.

XPath1: //*[@id="rso"]/div[9]/div/div[1]/div[1]/a/h3
XPath2: //*[@id="rso"]/div[10]/div/div[1]/div[1]/a/h3

So, if we drop the index on that first div, one XPath can match every title, and we can find the elements like this.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static void Main(string[] args)
        {
            IWebDriver driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.google.com");
            var element = driver.FindElement(By.XPath("/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/input"));
            element.SendKeys("webshop");
            element.Submit();
            // Dropping the index on the first div lets one XPath match every result title
            var titles = driver.FindElements(By.XPath("//*[@id='rso']/div/div/div[1]/div[1]/a/h3"));
        }
    }
}

Now, we can loop through this collection and print the text of each element.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static void Main(string[] args)
        {
            IWebDriver driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.google.com");
            var element = driver.FindElement(By.XPath("/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/input"));
            element.SendKeys("webshop");
            element.Submit();
            var titles = driver.FindElements(By.XPath("//*[@id='rso']/div/div/div[1]/div[1]/a/h3"));
            // Print the text of every scraped title
            foreach (var title in titles)
            {
                Console.WriteLine(title.Text);
            }
        }
    }
}

When we run this, the scraped titles are printed to the console.

As you can see, we have managed to scrape all the titles, and the ads were ignored. That is one way to do it. I used XPath, but you can also use class names, so let's try that as well.

When you inspect any element you will find the class name.

Our target class name is LC20lb, as you can see in the image above. We will pass this class name to Selenium to scrape the titles.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static void Main(string[] args)
        {
            IWebDriver driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.google.com");
            var element = driver.FindElement(By.XPath("/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/input"));
            element.SendKeys("webshop");
            element.Submit();
            // Same results as before, but located by class name instead of XPath
            var titles = driver.FindElements(By.ClassName("LC20lb"));
            foreach (var title in titles)
            {
                Console.WriteLine(title.Text);
            }
        }
    }
}

When we run this code, we get the same results.
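As an aside, if you genuinely wanted to pull the titles through JavaScript, Selenium exposes IJavaScriptExecutor. A rough sketch using the same (volatile) LC20lb class name:

// Hedged sketch: fetching the same titles by executing JavaScript in the page.
var js = (IJavaScriptExecutor)driver;
// A JavaScript array comes back as a ReadOnlyCollection<object> in the .NET bindings
var texts = (IReadOnlyCollection<object>)js.ExecuteScript(
    "return Array.from(document.getElementsByClassName('LC20lb')).map(e => e.textContent);");
foreach (var text in texts)
    Console.WriteLine(text);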

Web Scraping with Selenium C# (Part-II)

In this section, we are going to build a very simple C# web scraper that can handle content loaded dynamically. We are going to scrape this page.

The purple area is dynamic content, which means that if I click one of the menu items, the purple area changes to some other content while the navbar stays the same.

We are going to scrape these collections, so let's get started.

First, we will scrape the content names, which are stored under the class card__contents.

There are 19 elements with class card__contents. The first step is to grab all those elements using the FindElements function and create a list.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        // Shared driver instance so helper methods can reach it later
        static IWebDriver driver;

        static void Main(string[] args)
        {
            driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.reverbnation.com");
            var collections = driver.FindElements(By.ClassName("card__contents"));
        }
    }
}

Now, we will loop through that list to get the text out of those elements.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static IWebDriver driver;

        static void Main(string[] args)
        {
            driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.reverbnation.com");
            var collections = driver.FindElements(By.ClassName("card__contents"));
            // Print the text of each collection card
            foreach (var collection in collections)
                Console.WriteLine(collection.Text);
        }
    }
}

We are going to open the home page and then we will click on the discover button on the top of the page before finding any card contents.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static IWebDriver driver;

        static void Main(string[] args)
        {
            driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.reverbnation.com");
            // Click the Discover menu item before looking for card contents
            var discoverButton = driver.FindElement(By.Id("menu-item-discover"));
            discoverButton.Click();
            var collections = driver.FindElements(By.ClassName("card__contents"));
            foreach (var collection in collections)
                Console.WriteLine(collection.Text);
        }
    }
}

Now we technically have a C# web scraper, but since the content is loaded dynamically, we will not actually get any elements or collection names. I am going to run it and show you.

You can see nothing gets printed out.

Now, we will make a new method. Obviously, if you were building a real web scraper, you would probably want a separate class, and instead of a static method like I used here, you would have an instance method on that scraper class, as sketched below.
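A rough sketch of what that refactoring could look like; the WebScraper name and its members are illustrative, not part of this tutorial's code:

// Illustrative refactoring sketch, not the tutorial's actual code
class WebScraper : IDisposable
{
    private readonly IWebDriver driver = new ChromeDriver();

    public void GoTo(string url) => driver.Navigate().GoToUrl(url);

    public IReadOnlyCollection<IWebElement> FindElements(By by) => driver.FindElements(by);

    public void Dispose() => driver.Quit(); // close the browser when done
}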

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static IWebDriver driver;

        static void Main(string[] args)
        {
            driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.reverbnation.com");
            var discoverButton = driver.FindElement(By.Id("menu-item-discover"));
            discoverButton.Click();
            var collections = driver.FindElements(By.ClassName("card__contents"));
            foreach (var collection in collections)
                Console.WriteLine(collection.Text);
        }

        // Thin wrapper around the driver; we will add retry logic here next
        static IReadOnlyCollection<IWebElement> FindElements(By by)
        {
            return driver.FindElements(by);
        }
    }
}

Now, if I run this function, it will work the same as the above code (collections variable).

What we want is to run a loop where each iteration tries to find the elements; if the collection contains any elements, we return them, otherwise we try again and again.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static IWebDriver driver;

        static void Main(string[] args)
        {
            driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.reverbnation.com");
            var discoverButton = driver.FindElement(By.Id("menu-item-discover"));
            discoverButton.Click();
            var collections = FindElements(By.ClassName("card__contents"));
            foreach (var collection in collections)
                Console.WriteLine(collection.Text);
        }

        // Keep polling until at least one matching element appears
        static IReadOnlyCollection<IWebElement> FindElements(By by)
        {
            while (true)
            {
                var elements = driver.FindElements(by);
                if (elements.Count > 0)
                    return elements;
                Thread.Sleep(10); // brief pause between attempts
            }
        }
    }
}

You can see we only return once the collection contains elements; otherwise we keep looping. We also put the thread to sleep for 10 milliseconds between attempts so the loop does not hammer the browser.

Now, we can test our script once again.

So, it works. It also prints out some other text, but that does not really matter. The remaining problem is the while(true): it might be acceptable in some cases, but not in others.

Say, for example, you are running through a proxy and it suddenly stops working, or the document simply never contains these elements; the scraper would then spin in this loop forever.

To avoid that, we have to put a limit on the loop, so I am going to use a Stopwatch.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static IWebDriver driver;

        static void Main(string[] args)
        {
            driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.reverbnation.com");
            var discoverButton = driver.FindElement(By.Id("menu-item-discover"));
            discoverButton.Click();
            var collections = FindElements(By.ClassName("card__contents"));
            foreach (var collection in collections)
                Console.WriteLine(collection.Text);
        }

        // Poll for matches, but give up after 10 seconds
        static IReadOnlyCollection<IWebElement> FindElements(By by)
        {
            Stopwatch w = Stopwatch.StartNew();
            while (w.ElapsedMilliseconds < 10 * 1000)
            {
                var elements = driver.FindElements(by);
                if (elements.Count > 0)
                    return elements;
                Thread.Sleep(10);
            }
            return null; // nothing found within the time limit
        }
    }
}

So, if the elapsed time exceeds 10 seconds, our scraper returns null. Let's test it.
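It is worth knowing that Selenium ships a built-in version of this wait-and-retry pattern, WebDriverWait (in the Selenium.Support NuGet package), which you might prefer over a hand-rolled loop; note it throws WebDriverTimeoutException on timeout instead of returning null:

using OpenQA.Selenium.Support.UI;

// Poll for up to 10 seconds until at least one matching element exists
var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(10));
var collections = wait.Until(d =>
{
    var found = d.FindElements(By.ClassName("card__contents"));
    return found.Count > 0 ? found : null; // null tells Until to keep polling
});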

Now, what if we have a very slow internet connection? We can simulate one with ChromeNetworkConditions.

Slow Internet Connection

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static IWebDriver driver;

        static void Main(string[] args)
        {
            // Simulated slow connection: ~25 KB/s down, ~10 KB/s up, 1 ms latency
            ChromeNetworkConditions conditions = new ChromeNetworkConditions();
            conditions.DownloadThroughput = 25 * 1000;
            conditions.UploadThroughput = 10 * 1000;
            conditions.Latency = TimeSpan.FromMilliseconds(1);

            driver = new ChromeDriver();
            driver.Navigate().GoToUrl("https://www.reverbnation.com");
            var discoverButton = driver.FindElement(By.Id("menu-item-discover"));
            discoverButton.Click();
            var collections = FindElements(By.ClassName("card__contents"));
            foreach (var collection in collections)
                Console.WriteLine(collection.Text);
        }

        static IReadOnlyCollection<IWebElement> FindElements(By by)
        {
            Stopwatch w = Stopwatch.StartNew();
            while (w.ElapsedMilliseconds < 10 * 1000)
            {
                var elements = driver.FindElements(by);
                if (elements.Count > 0)
                    return elements;
                Thread.Sleep(10);
            }
            return null;
        }
    }
}

Since our driver field is typed as IWebDriver, we need to cast it back to ChromeDriver to apply the network conditions.

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

namespace webscraping
{
    class Program
    {
        static IWebDriver driver;

        static void Main(string[] args)
        {
            ChromeNetworkConditions conditions = new ChromeNetworkConditions();
            conditions.DownloadThroughput = 25 * 1000;
            conditions.UploadThroughput = 10 * 1000;
            conditions.Latency = TimeSpan.FromMilliseconds(1);

            driver = new ChromeDriver();
            // Apply the simulated slow connection; NetworkConditions lives on ChromeDriver
            (driver as ChromeDriver).NetworkConditions = conditions;
            driver.Navigate().GoToUrl("https://www.reverbnation.com");
            var discoverButton = driver.FindElement(By.Id("menu-item-discover"));
            discoverButton.Click();
            var collections = FindElements(By.ClassName("card__contents"));
            foreach (var collection in collections)
                Console.WriteLine(collection.Text);
            Console.WriteLine("done");
        }

        static IReadOnlyCollection<IWebElement> FindElements(By by)
        {
            Stopwatch w = Stopwatch.StartNew();
            while (w.ElapsedMilliseconds < 10 * 1000)
            {
                var elements = driver.FindElements(by);
                if (elements.Count > 0)
                    return elements;
                Thread.Sleep(10);
            }
            // Return an empty collection instead of null so callers can loop safely
            return new ReadOnlyCollection<IWebElement>(new List<IWebElement>());
        }
    }
}

So, now it should actually simulate a slow connection. Handling slow connections is an important part of building a web scraper.

You often want to have many proxies running, and some of those proxies may well be slow.
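On that note, here is a rough sketch of how a proxy could be wired into ChromeDriver via a command-line switch; the address below is a placeholder, not a working endpoint:

var options = new ChromeOptions();
// Placeholder proxy address; substitute your own endpoint
options.AddArgument("--proxy-server=http://127.0.0.1:8080");
driver = new ChromeDriver(options);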

Conclusion

In this tutorial, we created a very simple web scraper using Selenium and C#. First, we built a scraper that performs a Google search and scrapes the results. In the second part, we focused on scraping dynamic content and handling slow connections. Now you are equipped to create commercial scrapers for your own business.

Although a C# web scraper can get most of your work done, websites with anti-scraping measures may still block your IP some time after you deploy your scrapers.

To scale up your web scraping and manage proxy/IP rotation, I would recommend using a dedicated web scraping API.

Feel free to message us if you need clarification on anything about web scraping with C#.


Frequently Asked Questions

Is Python better than C# for web scraping?

Python is often considered the go-to language for web scraping, but C# is also used in many instances.

Manthan Koolwal

My name is Manthan Koolwal and I am the founder of scrapingdog.com. I love creating scrapers and seamless data pipelines.

