awyugan / crunchbase-scraper

This project forked from codercurious/crunchbase-scraper

Scrape Crunchbase companies, people, investors, and acquisitions data, including website URLs, social URLs, emails, phone numbers, employee count, and funding information.

Home Page: https://apify.com/curious_coder/crunchbase-scraper?fpr=ve081&fp_sid=github_crunchbase-scraper

crunchbase-scraper's Introduction

Crunchbase Scraper

Interested in using this scraper? Get it here: Crunchbase Scraper

Demo video

Crunchbase is a platform where you can discover innovative companies, connect with the people behind them, and uncover new opportunities. It has become a prime source of business information for millions of users around the world.

Features

  • Scrape Crunchbase company pages
  • Scrape Crunchbase profile URLs
  • Find organizations by website in bulk
  • Scrape Crunchbase company search results
  • Scrape Crunchbase funding search results
  • Scrape Crunchbase contacts search results
  • Scrape Crunchbase investors search results
  • Scrape Crunchbase acquisitions search results
  • Scrape Crunchbase events search results
  • Scrape Crunchbase schools search results
  • Scrape Crunchbase hubs search results
  • Scrape Crunchbase people search results

You can extract the following data from Crunchbase using this scraper:

Company data fields

Identifier | Num Employees Enum | Categories
Location Identifiers | Short Description | Rank Org Company
Website | Twitter | Facebook
LinkedIn | Contact Email | Phone Number
Stock Symbol | Num Articles | Hub Tags
Description | Job Posting Link Source | Location Group Identifiers
Diversity Spotlights | Revenue Range | Operating Status
Exited On | Founded On | Closed On
Company Type | Investor Type | Investor Stage
Num Portfolio Organizations | Num Investments/Funding Rounds | Num Lead Investments
Num Diversity Spotlight Investments | Num Exits | Num Exits IPO
Program Type | Program Application Deadline | Program Duration
School Type | School Program | Num Enrollments
Num Founder Alumni | School Method | Num Alumni
Category Groups | Num Founders | Founder Identifiers
Num Funding Rounds | Funding Stage | Last Funding At
Last Funding Total | Last Funding Type | Last Equity Funding Total
Last Equity Funding Type | Equity Funding Total | Funding Total
Num Lead Investors | Num Investors | Investor Identifiers
Num Acquisitions | Acquisition Status | Acquisition Identifier
Acquisition Price | Acquisition Announced On | Acquirer Identifier
Acquisition Type | Acquisition Terms | IPO Status
Went Public On | Delisted On | IPO Valuation
IPO Amount Raised | Stock Exchange Symbol | Last Layoff Date
Last Key Employee Change Date | Num Event Appearances | Rank Org
Rank Org School | Rank Delta D7 | Rank Delta D30
Num Org Similarities | Contact Job Departments | Num Contacts
Num Private Contacts | SEMrush Visits Latest Month | SEMrush Visits MoM %
SEMrush Visits Latest 6 Months Avg | SEMrush Visit Duration | SEMrush Visit Pageviews
SEMrush Visit Duration MoM % | SEMrush Visit Pageview MoM % | SEMrush Bounce Rate
SEMrush Bounce Rate MoM % | SEMrush Global Rank | SEMrush Global Rank MoM
SEMrush Global Rank MoM % | Builtwith Num Technologies Used | Apptopia Total Apps
Apptopia Total Downloads | Siftery Num Products | IPqwery Num Patent Granted
IPqwery Num Trademark Registered | IPqwery Popular Patent Category | IPqwery Popular Trademark Class
Aberdeen Site IT Spend | PrivCo Valuation Range | PrivCo Valuation Timestamp
Num Private Notes | Private Tags

Person data fields

Identifier | Primary Job Title | Primary Organization
Location Identifiers | Rank Person | Facebook
LinkedIn | Twitter | Gender
First Name | Last Name | Description
Location Group Identifiers | Num Articles | Attended Schools
Num Founded Organizations | Current Organizations | Num Portfolio Organizations
Num Investments/Funding Rounds | Num Partner Investments | Num Lead Investments
Num Exits | Num Diversity Spotlight Investments | Num Exits IPO
Num Event Appearances | Rank Delta D7 | Rank Delta D30
Rank Delta D90

Crunchbase data API

The actor stores results in a dataset. You can export the data in formats such as CSV, JSON, and XLS, and you can run the scraper and access its data on demand through the API. For more information, see the Crunchbase scraper API integration page.
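As a rough illustration of on-demand access, the sketch below builds a download URL for a dataset's items and fetches them as JSON. It assumes the general Apify dataset-items endpoint (`/v2/datasets/{datasetId}/items` with `format`, `clean`, and `token` query parameters) as I understand it from Apify's public API; the function names `dataset_items_url` and `fetch_items` are hypothetical helpers, and the dataset ID and token are placeholders you would take from your own run and account.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the Apify public API (assumption: version 2 endpoints).
APIFY_API_BASE = "https://api.apify.com/v2"

# Export formats assumed to be accepted by the dataset-items endpoint.
ALLOWED_FORMATS = {"json", "jsonl", "csv", "xlsx", "xml", "html", "rss"}


def dataset_items_url(dataset_id: str, token: str, fmt: str = "json") -> str:
    """Build the URL that returns a dataset's items in the given format."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported export format: {fmt!r}")
    query = urlencode({"format": fmt, "clean": "true", "token": token})
    return f"{APIFY_API_BASE}/datasets/{dataset_id}/items?{query}"


def fetch_items(dataset_id: str, token: str) -> list:
    """Download all items of a dataset as a list of dicts (network call)."""
    with urlopen(dataset_items_url(dataset_id, token, "json")) as response:
        return json.load(response)


# Usage (placeholders, not real credentials):
# items = fetch_items("YOUR_DATASET_ID", "YOUR_APIFY_TOKEN")
```

Keeping the URL construction separate from the network call makes it easy to reuse the same link for a CSV or XLSX download in a browser or spreadsheet tool.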

Importance of Crunchbase Data

Data from Crunchbase is highly sought after. It can provide invaluable insights about startups, their funding rounds, key individuals involved, and much more. Therefore, scraping this data can equip businesses with information necessary for decision-making and strategy development.

Why Apify for Crunchbase Data Scraping?

Apify is a web scraping and automation platform. It allows you to extract data, automate workflows, and integrate with your existing software. It's flexible, easy to use, and scalable, making it a top choice for many businesses.

Crunchbase Data Scraper: The Apify Actor

Crunchbase Data Scraper is a specific Apify actor that focuses on retrieving data from Crunchbase.

How Does It Work?

This actor is programmed to navigate Crunchbase's complex website structure, find the relevant data, and extract it in a structured, usable format.

Features of the Crunchbase Data Scraper

The Crunchbase Data Scraper actor can extract company profiles, key person profiles, funding rounds, acquisitions, and more. It lets you specify which data you want.

Use Cases

Crunchbase Data Scraper is highly beneficial for market researchers, sales teams, data analysts, and more. It helps streamline various processes, from lead generation to industry analysis.

Sample output data for company search results

{
	"uuid": "e36f580e-6c0e-47de-accf-15de75f62cc9",
	"name": "Stability AI",
	"type": "organization",
	"imageUrl": "https://res.cloudinary.com/crunchbase-production/image/upload/c_lpad,h_25,w_25,f_auto,b_white,q_auto:eco,dpr_1/yngvetlwqatjdqwmxg9g",
	"link": "https://www.crunchbase.com/organization/stability-ai",
	"numberOfEmployees": [
		51,
		100
	],
	"website": {
		"value": "https://stability.ai"
	},
	"linkedin": {
		"value": "https://www.linkedin.com/company/stability-ai"
	},
	"short_description": "Stability AI is an artificial intelligence-driven visual art startup that designs and implements open AI tools.",
	"categories": [
		{
			"entity_def_id": "category",
			"permalink": "artificial-intelligence",
			"uuid": "c4d8caf3-5fe7-359b-f9f2-2d708378e4ee",
			"value": "Artificial Intelligence"
		},
		{
			"entity_def_id": "category",
			"permalink": "image-recognition",
			"uuid": "af9307c9-6413-72ae-aac7-4391df240dd2",
			"value": "Image Recognition"
		},
		{
			"entity_def_id": "category",
			"permalink": "information-technology-dbca",
			"uuid": "dbca89fa-f083-5438-b4ad-d3fdeceb78e7",
			"value": "Information Technology"
		},
		{
			"entity_def_id": "category",
			"permalink": "software",
			"uuid": "c08b5441-a05b-9777-b7a6-012728caddd9",
			"value": "Software"
		}
	],
	"location_identifiers": [
		{
			"permalink": "london-england",
			"uuid": "aad17950-576b-8c44-8fd4-f44dbeb59220",
			"location_type": "city",
			"entity_def_id": "location",
			"value": "London"
		},
		{
			"permalink": "england-united-kingdom",
			"uuid": "79eb923b-9e93-e0db-2fe0-75f0c430c2cb",
			"location_type": "region",
			"entity_def_id": "location",
			"value": "England"
		},
		{
			"permalink": "united-kingdom",
			"uuid": "a30e342c-1742-6b1c-66e9-461de680e54b",
			"location_type": "country",
			"entity_def_id": "location",
			"value": "United Kingdom"
		},
		{
			"permalink": "europe",
			"uuid": "6106f5dc-823e-5da8-40d7-51612c0b2c4e",
			"location_type": "continent",
			"entity_def_id": "location",
			"value": "Europe"
		}
	],
	"twitter": {
		"value": "https://www.twitter.com/stabilityai"
	},
	"contact_email": "[email protected]",
	"rank_org_company": 40
}

Documentation

This JSON object represents a single company returned by a company search on Crunchbase.

Here are the descriptions for each field in the JSON data:

  • uuid: The unique identifier for the company in the database. Each uuid is a string following the standard UUID format.
  • name: The official name of the company.
  • type: This field indicates the type of the entry. For this particular entry, the type is 'organization'.
  • imageUrl: URL of the company's logo or relevant image.
  • link: The direct link to the company's profile on Crunchbase.
  • numberOfEmployees: An array indicating the range of the company's employee count ([51, 100] in the sample above).
  • website: An object that contains the value field which provides the company's official website URL.
  • linkedin: An object that contains the value field which provides the LinkedIn profile URL of the company.
  • short_description: A brief description of the company and its primary functions or industry.
  • categories: An array of category objects that the company falls under. Each object in the array has the following fields:
    • entity_def_id: The identifier of the category entity.
    • permalink: A URL-friendly version of the category name.
    • uuid: The unique identifier for the category.
    • value: The actual name of the category.
  • location_identifiers: An array of location objects that correspond to the company's location. Each object in the array has the following fields:
    • permalink: A URL-friendly version of the location name.
    • uuid: The unique identifier for the location.
    • location_type: The type of the location. It could be 'city', 'region', 'country', or 'continent'.
    • entity_def_id: The identifier of the location entity.
    • value: The actual name of the location.
  • twitter: An object that contains the value field which provides the Twitter profile URL of the company.
  • contact_email: The contact email address for the company.
  • rank_org_company: The company's rank among other companies in the Crunchbase database.

The structure of this JSON data makes it easy to parse and use in various applications, such as website scrapers, data analysis tools, and so on. Remember, however, to respect the data usage terms and conditions of Crunchbase when using their data.
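To make the parsing concrete, here is a minimal sketch that flattens one company record of the shape shown above into a single row and writes rows as CSV. The helper names `flatten_company` and `to_csv` are hypothetical, and the code assumes `numberOfEmployees` is a `[minimum, maximum]` pair, which matches the sample but is not stated explicitly. Missing fields become `None` rather than raising.

```python
import csv
import io


def flatten_company(record: dict) -> dict:
    """Turn one nested company record into a flat dict of simple values."""
    employees = record.get("numberOfEmployees") or []
    # Index locations by their type, e.g. {"city": "London", "country": ...}.
    locations = {
        loc.get("location_type"): loc.get("value")
        for loc in record.get("location_identifiers") or []
    }
    categories = [c.get("value", "") for c in record.get("categories") or []]
    return {
        "uuid": record.get("uuid"),
        "name": record.get("name"),
        "website": (record.get("website") or {}).get("value"),
        # Assumption: the array holds the lower and upper bound of the range.
        "employees_min": employees[0] if len(employees) > 0 else None,
        "employees_max": employees[1] if len(employees) > 1 else None,
        "categories": "; ".join(categories),
        "city": locations.get("city"),
        "country": locations.get("country"),
        "rank_org_company": record.get("rank_org_company"),
    }


def to_csv(records: list) -> str:
    """Flatten every record and return the rows as CSV text with a header."""
    rows = [flatten_company(r) for r in records]
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Joining categories into one string keeps each company on a single CSV row; if you need one row per category instead, iterate over `record["categories"]` and emit a row for each.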
