
Diffbot
The easiest way to integrate external web data at scale.
Diffbot is a knowledge-as-a-service provider that uses artificial intelligence to build a comprehensive, autonomously maintained map of the web. The company was founded in 2008 by Mike Tung, who drew on his background in AI research at Stanford and his experience as a software engineer to pioneer a method for structuring web data based on how humans visually interpret pages, rather than on the pages' underlying code. This approach uses computer vision and machine learning algorithms to convert the unstructured public web into a structured, queryable database.
The core of Diffbot's offering is its Knowledge Graph, a vast, self-updating database of interlinked entities such as people, organizations, products, and articles. The Knowledge Graph is built by continuously crawling the web, automatically extracting information from pages, and using natural language processing to understand and link facts. As of 2019, it contained over two billion entities and ten trillion facts.
The company's business model is subscription-based, with tiered plans (Startup, Plus, and Enterprise) that provide access to its products through a credit system. Clients range from startups to large companies such as Microsoft, DuckDuckGo, and Adobe, which use Diffbot for applications such as market intelligence, data mining, and content aggregation.
Diffbot's product suite is delivered via APIs and includes several key tools. The automatic Extraction APIs can identify and pull structured data from any URL, categorizing pages into types like articles, products, or discussions. Crawlbot allows users to perform custom crawls of specific websites, creating structured databases from the content. The Natural Language API enables users to build their own knowledge graphs from unstructured text. Additionally, the Enhance product enriches existing datasets with information from the Knowledge Graph. These services are accessible programmatically and through integrations with platforms like Excel, Google Sheets, and Tableau.
The company has secured $12.5 million in funding over four rounds, with a notable $10 million Series A in 2016 led by Tencent and Felicis Ventures.
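As a rough illustration of how the Extraction APIs mentioned above might be called programmatically, here is a minimal Python sketch. It assumes a v3 REST endpoint of the form `https://api.diffbot.com/v3/<api>` that takes `token` and `url` query parameters and returns JSON. These details are not stated in this profile and should be checked against Diffbot's current API documentation. The helper names `build_extract_url` and `extract` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL for the v3 Extraction APIs; verify against the official docs.
DIFFBOT_BASE = "https://api.diffbot.com/v3"


def build_extract_url(page_url: str, token: str, api: str = "analyze") -> str:
    """Build the request URL for an extraction endpoint (hypothetical helper).

    `api` is the assumed endpoint name, e.g. "analyze" to let the service
    classify the page type, or "article" / "product" for a specific type.
    """
    # urlencode percent-escapes the target page URL so it is safe as a parameter.
    query = urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_BASE}/{api}?{query}"


def extract(page_url: str, token: str, api: str = "analyze", timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    with urlopen(build_extract_url(page_url, token, api), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `extract("https://example.com/some-article", token="YOUR_TOKEN")` would, under these assumptions, return a dictionary describing the page. The request builder is kept separate from the network call so the URL can be inspected or logged without spending credits.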
Keywords: knowledge as a service, web data extraction, automatic web scraping, Knowledge Graph, AI-powered data extraction, structured data API, natural language processing, computer vision for data, market intelligence, firmographic data, data enrichment, web crawling, entity recognition, sentiment analysis, lead generation, competitive analysis, data mining, Diffbot Query Language (DQL), knowledge base construction, enterprise search