How to build a B2B prospect list with web scraping: define ICP, translate it into observable data points, pick sources (LinkedIn first), and extract via API — without gluing 14 tools together.
Building a B2B prospect list with web scraping is really two problems. One is knowing who you want to sell to — sharp enough that the list is useful, not just big. The other is getting that list out of the places your prospects actually live — at scale, and without a pipeline that breaks every time a site changes.
This guide walks through both: define an ICP, translate it into data points you can actually observe, pick the sources those data points live on, and extract them cleanly. LinkedIn is the anchor source for most B2B motions — we'll talk about how to get data out of it via API rather than a brittle scraper.
The most common prospecting mistake is starting with "let's scrape X site" before defining who you're selling to. The scraper is the easy part. The list is only useful if the people on it are the right people.
An Ideal Customer Profile (ICP) is a description of the company most likely to need your product. Specific beats comprehensive: industry, company size, geography, tech stack, team composition, stage. "Mid-market SaaS with 50–200 employees in North America that has a dedicated RevOps team" is actionable. "Tech companies" is not.
Tip: the ICP isn't permanent. It tightens as you learn which segments close and which don't. Treat it as a living filter, not a one-time exercise.
An ICP describes the company. Data points are the signals that let you recognize one without asking.
Take a company selling cloud-based phone services. Their ICP is "companies that use the phone heavily." There's no database of "companies that use the phone a lot," so you work backwards into signals that correlate:

- a support or sales line displayed prominently on the website
- job postings for call-center, support, or inside-sales roles
- a large customer-facing team visible in LinkedIn headcount by function
None of these signals is the thing itself, but together they're a strong proxy. The exercise: for every field in your ICP, list two or three observable data points that stand in for it. That's the feature set your prospect list needs to capture.
Tip: pick data points you can actually collect at scale. "Sales team growth rate" is only useful if you can observe it for every company on your list — not just your best guesses.
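The ICP-to-signals exercise above can be sketched as a plain data structure. The field and signal names here are illustrative, not a schema from any particular tool — a minimal sketch of mapping each ICP field to observable proxies and measuring how many fields you can actually see for a given company:

```python
# Map each ICP field to two or three observable proxy signals.
# Field and signal names are illustrative, not a fixed schema.
ICP_SIGNALS = {
    "uses_phone_heavily": [
        "support_phone_on_website",
        "call_center_job_postings",
        "large_support_team_on_linkedin",
    ],
    "mid_market_headcount": ["linkedin_headcount_50_200"],
    "has_revops_team": ["revops_titles_in_people_search"],
}

def coverage(company: dict) -> float:
    """Fraction of ICP fields with at least one observed signal."""
    hit = sum(
        1 for signals in ICP_SIGNALS.values()
        if any(company.get(s) for s in signals)
    )
    return hit / len(ICP_SIGNALS)

example = {
    "support_phone_on_website": True,
    "linkedin_headcount_50_200": True,
}
print(coverage(example))  # 2 of 3 ICP fields observed
```

A company you can only score on one of three fields probably shouldn't make the list yet — it means your sources don't cover your ICP.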
Once you know what to observe, the source selection mostly writes itself. Different sources cover different parts of the B2B graph.
For anything that touches companies, people, roles, tenure, team composition, or hiring velocity, LinkedIn is the richest public source. Company pages give you headcount and industry; Sales Navigator filters let you slice by seniority, function, tenure, recent posting activity, and employee growth. Recruiter Lite adds deeper talent graph queries.
What LinkedIn gives you that nothing else does: which companies are hiring which roles right now, which teams are growing, who changed jobs in the last 90 days, and which decision-makers are active in the feed. This is the raw material for intent-shaped prospecting.
To pull LinkedIn data programmatically, you have three options: a browser extension (fragile, tied to one session), a bespoke scraper (maintenance tax forever), or an API. Edges is the API route — one key, structured JSON, covering LinkedIn core, Sales Navigator, and Recruiter Lite. See the "Extract" section below.
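The practical payoff of the API route is that you parse structured JSON instead of scraped HTML. Below is a minimal sketch of filtering an API-style response against an ICP headcount band — the response shape and field names are assumptions for illustration, not the actual Edges schema (check their docs for real paths and fields):

```python
import json

# Hypothetical response shape for a company-search call.
# Real field names may differ; this only illustrates working
# with structured JSON instead of scraped HTML.
raw = json.dumps({
    "results": [
        {"company": "Acme SaaS", "headcount": 120, "industry": "Software"},
        {"company": "Globex", "headcount": 800, "industry": "Logistics"},
    ]
})

def matching_companies(payload: str, min_hc: int, max_hc: int) -> list:
    """Keep companies whose headcount falls inside the ICP band."""
    return [
        r["company"]
        for r in json.loads(payload)["results"]
        if min_hc <= r["headcount"] <= max_hc
    ]

print(matching_companies(raw, 50, 200))  # ['Acme SaaS']
```

The filtering logic stays the same whether the payload comes from one endpoint or fifty — which is the point of having one consistent JSON layer.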
The company's own site is often the cleanest source for a handful of fields: contact information displayed publicly, support lines, product pages, pricing tiers, and — if they publish them — job listings. Scraping websites is mechanically simple, but every site is different, so you're writing per-site parsers or using a general-purpose HTML scraping library.
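A per-site extractor can be very small. This sketch pulls a publicly displayed phone number and job-listing links out of raw HTML with regular expressions — the HTML and URL patterns are made up for illustration, and every real site needs its own variant:

```python
import re

# Sample markup standing in for a company site's footer.
SAMPLE_HTML = """
<footer>
  <p>Support: <a href="tel:+1-555-010-2030">+1-555-010-2030</a></p>
  <a href="/careers/sales-engineer">Sales Engineer</a>
</footer>
"""

# Site-specific patterns: tel: links and careers-page links.
PHONE_RE = re.compile(r'href="tel:([^"]+)"')
JOB_RE = re.compile(r'href="(/careers/[^"]+)"')

def extract_contacts(html: str) -> dict:
    """Pull phone numbers and job links from one page's HTML."""
    return {
        "phones": PHONE_RE.findall(html),
        "job_links": JOB_RE.findall(html),
    }

print(extract_contacts(SAMPLE_HTML))
```

For anything beyond a footer, an HTML parsing library beats regexes — but the per-site shape of the extractor stays the same.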
For local businesses, tradespeople, and service providers, LinkedIn coverage is thin. Google Maps / Google Business profiles, Yellow Pages equivalents, and public business registers (Companies House in the UK, Sirene in France, OpenCorporates elsewhere) are better fits. These are not Edges' territory — use a dedicated scraper or public dataset.
Vertical directories can be gold: WeddingWire for wedding vendors, Autotrader for dealerships, G2 / Capterra for software buyers. Coverage varies wildly, so spot-check a handful of records before committing to a full pull.
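Spot-checking can be mechanical: pull a small sample of records and measure how often each field is actually populated before committing to a full pull. A minimal sketch, with made-up records:

```python
# A small sample of scraped directory records (illustrative data).
records = [
    {"name": "Venue A", "phone": "555-0100", "city": "Leeds"},
    {"name": "Venue B", "phone": None, "city": "York"},
    {"name": "Venue C", "phone": "", "city": None},
]

def fill_rates(sample: list) -> dict:
    """Fraction of records where each field is non-empty."""
    fields = {k for r in sample for k in r}
    return {
        f: sum(1 for r in sample if r.get(f)) / len(sample)
        for f in sorted(fields)
    }

print(fill_rates(records))
# phone populated in only 1 of 3 records -- a red flag for this source
```

If a field you depend on fills in a third of the sample, it will fill in roughly a third of the full pull — better to learn that from ten records than ten thousand.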
This is where a lot of prospect-list projects go sideways. You end up with one script for LinkedIn, another for websites, a third for directories, a contact resolver bolted on, and a spreadsheet nobody trusts.
The reframe: separate the sources by shape, and use the right tool for each shape.
For LinkedIn — the biggest source for most B2B lists — Edges is a LinkedIn automation API. You call four surfaces:

- companies — headcount, industry, and growth from company pages
- people — role, seniority, function, and tenure from profile search
- jobs — which companies are hiring which roles right now
- activity — posts and job changes from decision-makers in the feed
What Edges does not do: email finding, phone lookup, website scraping, directory scraping, waterfall enrichment across providers, or workflow orchestration. It's the LinkedIn layer — the pipe, not the dashboard. Pair it with whatever contact resolver and orchestrator you already use.
For websites and directories, a general-purpose scraping tool or a custom script is the right shape. For contact resolution (business emails, direct-dials), use a dedicated provider — there are good ones, and they're not what Edges is.
For orchestration, your own code, a job runner, or an internal orchestrator ties the pieces together and writes rows to your warehouse, CRM, or prospect list.
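The orchestration layer can be as plain as a script that merges rows from each source and upserts them into a table. A sketch using an in-memory SQLite table standing in for your warehouse or CRM — table and field names are illustrative, as is the idea of joining the contact resolver's output on company name:

```python
import sqlite3

# Rows from the LinkedIn layer and a contact resolver (illustrative).
linkedin_rows = [{"company": "Acme SaaS", "headcount": 120}]
contacts = {"Acme SaaS": "sales@acme.example"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prospects (company TEXT PRIMARY KEY,"
    " headcount INTEGER, email TEXT)"
)

# Upsert so re-runs refresh rows instead of duplicating them.
for row in linkedin_rows:
    conn.execute(
        "INSERT INTO prospects VALUES (?, ?, ?)"
        " ON CONFLICT(company) DO UPDATE SET"
        " headcount=excluded.headcount, email=excluded.email",
        (row["company"], row["headcount"], contacts.get(row["company"])),
    )
conn.commit()
print(conn.execute("SELECT * FROM prospects").fetchall())
```

The upsert is the important design choice: it lets the same job run daily and refresh stale rows instead of piling up duplicates.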
A list that isn't routed anywhere is a spreadsheet, not a pipeline. Three things make the difference:

- routing — rows land in your CRM or warehouse with an owner, not in a one-off export
- refresh — hiring and headcount signals go stale fast, so re-pull on a schedule
- scoring — rank prospects by how many ICP signals they actually hit
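Refresh and scoring can both be simple functions over a prospect row. A sketch — the signal weights and the 30-day staleness window are assumptions to tune against your own close rates, not recommendations:

```python
from datetime import date, timedelta

# Illustrative signal weights; tune against which segments close.
WEIGHTS = {
    "icp_headcount_match": 3,
    "hiring_support_reps": 2,
    "decision_maker_active": 1,
}

def score(prospect: dict) -> int:
    """Sum the weights of ICP signals the prospect actually hits."""
    return sum(w for sig, w in WEIGHTS.items() if prospect.get(sig))

def is_stale(prospect: dict, today: date, max_age_days: int = 30) -> bool:
    """Flag rows whose last refresh is older than the window."""
    return (today - prospect["last_refreshed"]) > timedelta(days=max_age_days)

row = {
    "icp_headcount_match": True,
    "hiring_support_reps": True,
    "last_refreshed": date(2024, 1, 1),
}
print(score(row), is_stale(row, date(2024, 3, 1)))
```

Running the staleness check on a schedule and re-scoring after each refresh is the loop that keeps the list a database rather than a snapshot.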
That's what separates "we exported some LinkedIn data" from "we have a working prospect database." The scraping part is table stakes; the routing, refresh, and scoring are where the work is.
If you're building the LinkedIn layer of this motion and want an API instead of a scraper, Edges is the shape. One key, documented actions, consistent JSON across LinkedIn, Sales Navigator, and Recruiter Lite.
Book a demo to walk through the API on your ICP and your specific data points.