How to build a B2B prospect list with web scraping: define ICP, translate it into observable data points, pick sources (LinkedIn first), and extract via API — without gluing 14 tools together.
Building a B2B prospect list with web scraping is really two problems. One is knowing who you want to sell to — sharp enough that the list is useful, not just big. The other is getting that list out of the places your prospects actually live — at scale, and without a pipeline that breaks every time a site changes.
This guide walks through both: define an ICP, translate it into data points you can actually observe, pick the sources those data points live on, and extract them cleanly. LinkedIn is the anchor source for most B2B motions — we'll talk about how to get data out of it via API rather than a brittle scraper.
The most common prospecting mistake is starting with "let's scrape X site" before defining who you're selling to. The scraper is the easy part. The list is only useful if the people on it are the right people.
An Ideal Customer Profile (ICP) is a description of the company most likely to need your product. Specific beats comprehensive: industry, company size, geography, tech stack, team composition, stage. "Mid-market SaaS with 50–200 employees in North America that has a dedicated RevOps team" is actionable. "Tech companies" is not.
Tip: the ICP isn't permanent. It tightens as you learn which segments close and which don't. Treat it as a living filter, not a one-time exercise.
An ICP describes the company. Data points are the signals that let you recognize one without asking.
Take a company selling cloud-based phone services. Their ICP is "companies that use the phone heavily." There's no database of "companies that use the phone a lot," so you work backwards into signals that correlate:

- a support or sales line displayed prominently on the website
- job postings for call-center, support, or inside-sales roles
- a large customer-facing team visible in LinkedIn headcount by function
None of these signals is the thing itself, but together they're a strong proxy. The exercise: for every field in your ICP, list two or three observable data points that stand in for it. That's the feature set your prospect list needs to capture.
Tip: pick data points you can actually collect at scale. "Sales team growth rate" is only useful if you can observe it for every company on your list — not just your best guesses.
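The ICP-to-signals exercise above can be sketched as a plain data structure. The field and signal names here are illustrative, not a schema from any particular tool — a minimal sketch of mapping each ICP field to observable proxies and measuring how many fields you can actually see for a given company:

```python
# Map each ICP field to two or three observable proxy signals.
# Field and signal names are illustrative, not a fixed schema.
ICP_SIGNALS = {
    "uses_phone_heavily": [
        "support_phone_on_website",
        "call_center_job_postings",
        "large_support_team_on_linkedin",
    ],
    "mid_market_headcount": ["linkedin_headcount_50_200"],
    "has_revops_team": ["revops_titles_in_people_search"],
}

def coverage(company: dict) -> float:
    """Fraction of ICP fields with at least one observed signal."""
    hit = sum(
        1 for signals in ICP_SIGNALS.values()
        if any(company.get(s) for s in signals)
    )
    return hit / len(ICP_SIGNALS)

example = {
    "support_phone_on_website": True,
    "linkedin_headcount_50_200": True,
}
print(coverage(example))  # 2 of 3 ICP fields observed
```

A company you can only score on one of three fields probably shouldn't make the list yet — it means your sources don't cover your ICP.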
Once you know what to observe, the source selection mostly writes itself. Different sources cover different parts of the B2B graph.
For anything that touches companies, people, roles, tenure, team composition, or hiring velocity, LinkedIn is the richest public source. Company pages give you headcount and industry; Sales Navigator filters let you slice by seniority, function, tenure, recent posting activity, and employee growth. Recruiter Lite adds deeper talent graph queries.
What LinkedIn gives you that nothing else does: which companies are hiring which roles right now, which teams are growing, who changed jobs in the last 90 days, and which decision-makers are active in the feed. This is the raw material for intent-shaped prospecting.
To pull LinkedIn data programmatically, you have three options: a browser extension (fragile, tied to one session), a bespoke scraper (maintenance tax forever), or an API. Edges is the API route — one key, structured JSON, covering LinkedIn core, Sales Navigator, and Recruiter Lite. See the "Extract" section below.
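The practical payoff of the API route is that you parse structured JSON instead of scraped HTML. Below is a minimal sketch of filtering an API-style response against an ICP headcount band — the response shape and field names are assumptions for illustration, not the actual Edges schema (check their docs for real paths and fields):

```python
import json

# Hypothetical response shape for a company-search call.
# Real field names may differ; this only illustrates working
# with structured JSON instead of scraped HTML.
raw = json.dumps({
    "results": [
        {"company": "Acme SaaS", "headcount": 120, "industry": "Software"},
        {"company": "Globex", "headcount": 800, "industry": "Logistics"},
    ]
})

def matching_companies(payload: str, min_hc: int, max_hc: int) -> list:
    """Keep companies whose headcount falls inside the ICP band."""
    return [
        r["company"]
        for r in json.loads(payload)["results"]
        if min_hc <= r["headcount"] <= max_hc
    ]

print(matching_companies(raw, 50, 200))  # ['Acme SaaS']
```

The filtering logic stays the same whether the payload comes from one endpoint or fifty — which is the point of having one consistent JSON layer.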
The company's own site is often the cleanest source for a handful of fields: contact information displayed publicly, support lines, product pages, pricing tiers, and — if they publish them — job listings. Scraping websites is mechanically simple, but every site is different, so you're writing per-site parsers or using a general-purpose HTML scraping library.
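A per-site extractor can be very small. This sketch pulls a publicly displayed phone number and job-listing links out of raw HTML with regular expressions — the HTML and URL patterns are made up for illustration, and every real site needs its own variant:

```python
import re

# Sample markup standing in for a company site's footer.
SAMPLE_HTML = """
<footer>
  <p>Support: <a href="tel:+1-555-010-2030">+1-555-010-2030</a></p>
  <a href="/careers/sales-engineer">Sales Engineer</a>
</footer>
"""

# Site-specific patterns: tel: links and careers-page links.
PHONE_RE = re.compile(r'href="tel:([^"]+)"')
JOB_RE = re.compile(r'href="(/careers/[^"]+)"')

def extract_contacts(html: str) -> dict:
    """Pull phone numbers and job links from one page's HTML."""
    return {
        "phones": PHONE_RE.findall(html),
        "job_links": JOB_RE.findall(html),
    }

print(extract_contacts(SAMPLE_HTML))
```

For anything beyond a footer, an HTML parsing library beats regexes — but the per-site shape of the extractor stays the same.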
For local businesses, tradespeople, and service providers, LinkedIn coverage is thin. Google Maps / Google Business profiles, Yellow Pages equivalents, and public business registers (Companies House in the UK, Sirene in France, OpenCorporates elsewhere) are better fits. These are not Edges' territory — use a dedicated scraper or public dataset.
Vertical directories can be gold: WeddingWire for wedding vendors, Autotrader for dealerships, G2 / Capterra for software buyers. Coverage varies wildly, so spot-check a handful of records before committing to a full pull.
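Spot-checking can be mechanical: pull a small sample of records and measure how often each field is actually populated before committing to a full pull. A minimal sketch, with made-up records:

```python
# A small sample of scraped directory records (illustrative data).
records = [
    {"name": "Venue A", "phone": "555-0100", "city": "Leeds"},
    {"name": "Venue B", "phone": None, "city": "York"},
    {"name": "Venue C", "phone": "", "city": None},
]

def fill_rates(sample: list) -> dict:
    """Fraction of records where each field is non-empty."""
    fields = {k for r in sample for k in r}
    return {
        f: sum(1 for r in sample if r.get(f)) / len(sample)
        for f in sorted(fields)
    }

print(fill_rates(records))
# phone populated in only 1 of 3 records -- a red flag for this source
```

If a field you depend on fills in a third of the sample, it will fill in roughly a third of the full pull — better to learn that from ten records than ten thousand.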
This is where a lot of prospect-list projects go sideways. You end up with one script for LinkedIn, another for websites, a third for directories, a contact resolver bolted on, and a spreadsheet nobody trusts.
The reframe: separate the sources by shape, and use the right tool for each shape.
For LinkedIn — the biggest source for most B2B lists — Edges is a LinkedIn automation API. You call four surfaces:

- companies — headcount, industry, and growth from company pages
- people — role, seniority, function, and tenure from profile search
- jobs — which companies are hiring which roles right now
- activity — posts and job changes from decision-makers in the feed
What Edges does not do: email finding, phone lookup, website scraping, directory scraping, waterfall enrichment across providers, or workflow orchestration. It's the LinkedIn layer — the pipe, not the dashboard. Pair it with whatever contact resolver and orchestrator you already use.
For websites and directories, a general-purpose scraping tool or a custom script is the right shape. For contact resolution (business emails, direct-dials), use a dedicated provider — there are good ones, and they're not what Edges is.
For orchestration, your own code, a job runner, or an internal orchestrator ties the pieces together and writes rows to your warehouse, CRM, or prospect list.
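The orchestration layer can be as plain as a script that merges rows from each source and upserts them into a table. A sketch using an in-memory SQLite table standing in for your warehouse or CRM — table and field names are illustrative, as is the idea of joining the contact resolver's output on company name:

```python
import sqlite3

# Rows from the LinkedIn layer and a contact resolver (illustrative).
linkedin_rows = [{"company": "Acme SaaS", "headcount": 120}]
contacts = {"Acme SaaS": "sales@acme.example"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prospects (company TEXT PRIMARY KEY,"
    " headcount INTEGER, email TEXT)"
)

# Upsert so re-runs refresh rows instead of duplicating them.
for row in linkedin_rows:
    conn.execute(
        "INSERT INTO prospects VALUES (?, ?, ?)"
        " ON CONFLICT(company) DO UPDATE SET"
        " headcount=excluded.headcount, email=excluded.email",
        (row["company"], row["headcount"], contacts.get(row["company"])),
    )
conn.commit()
print(conn.execute("SELECT * FROM prospects").fetchall())
```

The upsert is the important design choice: it lets the same job run daily and refresh stale rows instead of piling up duplicates.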
A list that isn't routed anywhere is a spreadsheet, not a pipeline. Three things make the difference:

- routing — rows land in your CRM or warehouse with an owner, not in a one-off export
- refresh — hiring and headcount signals go stale fast, so re-pull on a schedule
- scoring — rank prospects by how many ICP signals they actually hit
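Refresh and scoring can both be simple functions over a prospect row. A sketch — the signal weights and the 30-day staleness window are assumptions to tune against your own close rates, not recommendations:

```python
from datetime import date, timedelta

# Illustrative signal weights; tune against which segments close.
WEIGHTS = {
    "icp_headcount_match": 3,
    "hiring_support_reps": 2,
    "decision_maker_active": 1,
}

def score(prospect: dict) -> int:
    """Sum the weights of ICP signals the prospect actually hits."""
    return sum(w for sig, w in WEIGHTS.items() if prospect.get(sig))

def is_stale(prospect: dict, today: date, max_age_days: int = 30) -> bool:
    """Flag rows whose last refresh is older than the window."""
    return (today - prospect["last_refreshed"]) > timedelta(days=max_age_days)

row = {
    "icp_headcount_match": True,
    "hiring_support_reps": True,
    "last_refreshed": date(2024, 1, 1),
}
print(score(row), is_stale(row, date(2024, 3, 1)))
```

Running the staleness check on a schedule and re-scoring after each refresh is the loop that keeps the list a database rather than a snapshot.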
That's what separates "we exported some LinkedIn data" from "we have a working prospect database." The scraping part is table stakes; the routing, refresh, and scoring are where the work is.
If you're building the LinkedIn layer of this motion and want an API instead of a scraper, Edges is the shape. One key, documented actions, consistent JSON across LinkedIn, Sales Navigator, and Recruiter Lite.
Book a demo to walk through the API on your ICP and your specific data points.