
datahub-project/datahub

Built by datahub-project • 11,729 stars

What is datahub-project/datahub?

The Metadata Platform for your Data and AI Stack

How to use datahub-project/datahub?

1. Install a compatible MCP client (such as Claude Desktop).
2. Open your configuration settings.
3. Add datahub-project/datahub using the following command: npx @modelcontextprotocol/datahub-project-datahub
4. Restart the client and verify that the new tools are active.

🛡️ Scoped (Restricted)
npx @modelcontextprotocol/datahub-project-datahub --scope restricted
🔓 Unrestricted Access
npx @modelcontextprotocol/datahub-project-datahub

Key Features

Native MCP Protocol Support
Real-time Tool Activation & Execution
Verified High-performance Implementation
Secure Resource & Context Handling

Optimized Use Cases

Extending AI models with custom local capabilities
Automating system workflows via natural language
Connecting external data sources to LLM context windows

datahub-project/datahub FAQ

Q

Is datahub-project/datahub safe?

Yes, datahub-project/datahub follows the standardized Model Context Protocol security patterns and only executes tools with explicit, user-granted permissions.

Q

Is datahub-project/datahub up to date?

datahub-project/datahub is currently active in the registry, and its 11,729 GitHub stars reflect broad community adoption and support.

Q

Are there any limits for datahub-project/datahub?

Usage limits depend on the specific MCP server implementation and on your system resources. Refer to the official documentation below for technical details.

Official Documentation

View on GitHub
<!--HOSTED_DOCS_ONLY import useBaseUrl from '@docusaurus/useBaseUrl'; export const Logo = (props) => { return ( <div style={{ display: "flex", justifyContent: "center", padding: "20px", height: "190px" }}> <img alt="DataHub Logo" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/datahub-logo-color-mark.svg" {...props} /> </div> ); }; <Logo /> <!-- HOSTED_DOCS_ONLY--> <p align="center"> <a href="https://datahub.com"> <img alt="DataHub" src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/datahub-logo-color-mark.svg" height="150" /> </a> </p> <!-- -->

The #1 Open Source AI Data Catalog

Enterprise-grade metadata platform enabling discovery, governance, and observability across your entire data ecosystem

<p align="center"> <a href="https://github.com/datahub-project/datahub/actions/workflows/build-and-test.yml"> <img src="https://github.com/datahub-project/datahub/actions/workflows/build-and-test.yml/badge.svg" alt="Build Status" /> </a> <a href="https://pypi.org/project/acryl-datahub/"> <img src="https://img.shields.io/pypi/v/acryl-datahub.svg" alt="PyPI Version" /> </a> <a href="https://pypi.org/project/acryl-datahub/"> <img src="https://img.shields.io/pypi/dm/acryl-datahub.svg" alt="PyPI Downloads" /> </a> <a href="https://hub.docker.com/r/linkedin/datahub-gms"> <img src="https://img.shields.io/docker/pulls/linkedin/datahub-gms.svg" alt="Docker Pulls" /> </a> <br /> <a href="https://datahub.com/slack?utm_source=github&utm_medium=readme&utm_campaign=github_readme"> <img src="https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social" alt="Join Slack" /> </a> <a href="https://www.youtube.com/channel/UC3qFQC5IiwR5fvWEqi_tJ5w"> <img src="https://img.shields.io/youtube/channel/subscribers/UC3qFQC5IiwR5fvWEqi_tJ5w?style=social&logo=youtube&label=Subscribe" alt="YouTube Subscribers" /> </a> <a href="https://datahub.com/blog/"> <img src="https://img.shields.io/badge/blog-read-red.svg?style=social&logo=medium" alt="DataHub Blog" /> </a> <a href="https://github.com/datahub-project/datahub/graphs/contributors"> <img src="https://img.shields.io/github/contributors/datahub-project/datahub.svg" alt="Contributors" /> </a> <a href="https://github.com/datahub-project/datahub/stargazers"> <img src="https://img.shields.io/github/stars/datahub-project/datahub.svg?style=social&label=Star" alt="GitHub Stars" /> </a> <a href="https://github.com/datahub-project/datahub/blob/master/LICENSE"> <img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License" /> </a> </p> <p align="center"> <a href="https://datahub.com/free-trial/"><b>Free Cloud Trial</b></a> • <a href="https://docs.datahub.com/docs/quickstart"><b>Quick Start</b></a> • <a href="https://demo.datahub.com"><b>Live Demo</b></a> • <a href="https://docs.datahub.com"><b>Documentation</b></a> • <a href="https://datahub.com/slack"><b>Slack Community</b></a> • <a href="https://www.youtube.com/@datahubproject"><b>YouTube</b></a> </p> <p align="center"> <i>Built with ❤️ by <a href="https://datahub.com">DataHub</a> and <a href="https://engineering.linkedin.com">LinkedIn</a></i> </p>
<p align="center"> <a href="https://demo.datahub.com"> <img width="90%" src="https://raw.githubusercontent.com/datahub-project/static-assets/refs/heads/main/imgs/demos/datahub-tour.gif" alt="DataHub Product Tour" /> </a> </p> <p align="center"> <i>Search, discover, and understand your data with DataHub's unified metadata platform</i> </p>

🤖 NEW: Connect AI Agents to DataHub via Model Context Protocol (MCP)

<p align="center"> <a href="https://youtu.be/aVWJsw7RJ8c?t=568"> <img width="600" src="https://raw.githubusercontent.com/datahub-project/static-assets/refs/heads/main/imgs/demos/mcp-demo.gif" alt="DataHub MCP Demo - Query metadata with AI agents" /> </a> <br/> <i>▶️ Click to watch the full demo on YouTube</i> </p>

Connect your AI coding assistants (Cursor, Claude Desktop, Cline) directly to DataHub. Query metadata with natural language: "What datasets contain PII?" or "Show me lineage for this table".

Quick setup:

npx -y @acryldata/mcp-server-datahub init

Learn more →


What is DataHub?

๐Ÿ” Finding the right DataHub? This is the open-source metadata platform at datahub.com (GitHub: datahub-project/datahub). It was previously hosted at datahubproject.io, which now redirects to datahub.com. This project is not related to datahub.io, which is a separate public dataset hosting service. See the FAQ below.

DataHub is the #1 open-source AI data catalog that enables discovery, governance, and observability across your entire data ecosystem. Originally built at LinkedIn, DataHub now powers data discovery at thousands of organizations worldwide, managing millions of data assets.

The Challenge: Modern data stacks are fragmented across dozens of tools: warehouses, lakes, BI platforms, ML systems, AI agents, and orchestration engines. Finding the right data, understanding its lineage, and ensuring governance can feel like searching a maze blindfolded.

The DataHub Solution: DataHub acts as the central nervous system for your data stack, connecting all your tools through real-time streaming or batch ingestion to create a unified metadata graph. Unlike static catalogs, DataHub keeps your metadata fresh and actionable, powering both human teams and AI agents.

DataHub for Humans and AI

Why DataHub?

  • ๐Ÿš€ Battle-Tested at Scale: Born at LinkedIn to handle hyperscale data, now proven at thousands of organizations worldwide managing millions of data assets
  • โšก Real-Time Streaming: Metadata updates in seconds, not hours or days
  • ๐Ÿค– AI-Ready: Native support for AI agents via MCP, LLM integrations, and context management
  • ๐Ÿ”Œ Pioneering Ingestion Architecture: Flexible push/pull framework (widely adopted by other catalogs) with 80+ production-grade connectors extracting deep metadataโ€”column lineage, usage stats, profiling, and quality metrics
  • ๐Ÿ‘จโ€๐Ÿ’ป Developer-First: Rich APIs (GraphQL, OpenAPI), Python + Java SDKs, CLI tools
  • ๐Ÿข Enterprise Ready: Battle-tested security, authentication, authorization, and audit trails
  • ๐ŸŒ Open Source: Apache 2.0 licensed, vendor-neutral, community-driven

🧠 The Context Foundation

Essential for modern data teams and reliable AI agents:




โ“ Frequently Asked Questions

<details> <summary><b>Is this the same project as datahub.io?</b></summary>

No. datahub.io is a completely separate project, a public dataset hosting service with no affiliation to this project. DataHub (this project) is an open-source metadata platform for data discovery, governance, and observability, hosted at datahub.com and developed at github.com/datahub-project/datahub.

</details> <details> <summary><b>What happened to datahubproject.io?</b></summary>

DataHub was previously hosted at datahubproject.io. That domain now redirects to datahub.com. All documentation has moved to docs.datahub.com. If you find references to datahubproject.io in blog posts or tutorials, they refer to this same project under its former domain.

</details> <details> <summary><b>Is DataHub related to LinkedIn's internal DataHub?</b></summary>

Yes. DataHub was originally built at LinkedIn to manage metadata at scale across their data ecosystem. LinkedIn open-sourced DataHub in 2020. It has since grown into an independent community project under the datahub-project GitHub organization, now hosted at datahub.com.

</details> <details> <summary><b>How do I install the DataHub metadata platform?</b></summary>

pip install acryl-datahub
datahub docker quickstart

See the Quick Start section below for full instructions. The PyPI package is acryl-datahub.

</details>

🎨 See DataHub in Action

<table> <tr> <td width="50%"> <img src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/search/search-results-page.png" alt="Universal Search" width="100%"/> <p align="center"><b>🔍 Universal Search</b><br/>Find any data asset instantly across your entire stack</p> </td> <td width="50%"> <img src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/lineage/column-level-lineage-v3.png" alt="Column-Level Lineage" width="100%"/> <p align="center"><b>📊 Column-Level Lineage</b><br/>Trace data flow from source to consumption</p> </td> </tr> <tr> <td width="50%"> <img src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-dataset-stats.png" alt="Rich Dataset Profiles" width="100%"/> <p align="center"><b>📋 Rich Dataset Profiles</b><br/>Schema, statistics, documentation, and ownership</p> </td> <td width="50%"> <img src="https://raw.githubusercontent.com/datahub-project/static-assets/main/imgs/feature-tags-terms-domains.png" alt="Governance Dashboard" width="100%"/> <p align="center"><b>🏛️ Governance Dashboard</b><br/>Manage policies, tags, and compliance</p> </td> </tr> </table>

โ–ถ๏ธ Watch DataHub in Action:


🚀 Quick Start

Option 1: Try the Hosted Demo (Fastest)

No installation required. Explore a fully-loaded DataHub instance with sample data instantly:

๐ŸŒ Launch Live Demo: demo.datahub.com

Option 2: Run Locally with Python (Recommended)

Get DataHub running on your machine in under 2 minutes:

# Prerequisites: Docker Desktop with 8GB+ RAM allocated

# Upgrade pip and install DataHub CLI
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub

# Launch DataHub locally via Docker
datahub docker quickstart

# Access DataHub at http://localhost:9002
# Default credentials: datahub / datahub

Note: You can also use uv or other Python package managers instead of pip.

What's included:

  • โœ… Full Stack: GMS backend, React UI, Elasticsearch, MySQL, and Kafka.
  • โœ… Sample Data: Pre-loaded datasets, lineage, and owners for exploration.
  • โœ… Ingestion Ready: Fully prepared to connect your own local or cloud data sources.

Option 3: Run from Source (For Contributors)

Best for advanced users who want to modify the core codebase or run directly from the repository:

# Clone the repository
git clone https://github.com/datahub-project/datahub.git
cd datahub

# Start all services with docker-compose
./docker/quickstart.sh

# Access DataHub at http://localhost:9002
# Default credentials: datahub / datahub

Next Steps


📦 Installation Options

DataHub supports three deployment models:

→ See all deployment guides (AWS, Azure, GCP, environment variables)


๐Ÿ—๏ธ Architecture Overview

  • โœ… Streaming-First: Real-time metadata updates via Kafka
  • โœ… API-First: All features accessible via APIs
  • โœ… Extensible: Plugin architecture for custom entity types
  • โœ… Scalable: Proven to 10M+ assets and O(1B) relationships at LinkedIn and other companies in production
  • โœ… Cloud-Native: Designed for Kubernetes deployment

→ Full architecture breakdown: components, storage layer, APIs, and design decisions


💻 Use Cases & Examples

<details> <summary><b>Example 1: Ingest Metadata from Snowflake</b></summary>

Use Case: Extract table metadata, column schemas, and usage statistics from Snowflake data warehouse.

Prerequisites:

  • DataHub instance running (local or remote)
  • Snowflake account with read permissions
  • DataHub CLI installed (pip install 'acryl-datahub[snowflake]')
# snowflake_recipe.yml
source:
  type: snowflake
  config:
    # Connection details
    account_id: "xy12345.us-east-1"
    warehouse: "COMPUTE_WH"
    username: "${SNOWFLAKE_USER}"
    password: "${SNOWFLAKE_PASSWORD}"

    # Optional: Filter specific databases
    database_pattern:
      allow:
        - "ANALYTICS_DB"
        - "MARKETING_DB"

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"

# Run ingestion
datahub ingest -c snowflake_recipe.yml

# Expected output:
# ✓ Connecting to Snowflake...
# ✓ Discovered 150 tables in ANALYTICS_DB
# ✓ Discovered 75 tables in MARKETING_DB
# ✓ Ingesting metadata...
# ✓ Successfully ingested 225 datasets to DataHub
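The `${SNOWFLAKE_USER}` and `${SNOWFLAKE_PASSWORD}` placeholders in the recipe are environment-variable references resolved at run time. Conceptually, the expansion behaves like the sketch below (the `expand_env` helper is illustrative only, not DataHub's actual implementation):

```python
import os
import re

def expand_env(value: str) -> str:
    # Replace ${VAR} references with values from the environment;
    # unset variables expand to an empty string in this sketch.
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)

os.environ["SNOWFLAKE_USER"] = "ingest_bot"
print(expand_env("username: ${SNOWFLAKE_USER}"))  # username: ingest_bot
```

Keeping credentials in the environment (or a secrets manager) keeps them out of version-controlled recipe files.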

What gets ingested:

  • Table and view schemas (columns, data types, descriptions)
  • Table statistics (row counts, size, last modified)
  • Lineage information (upstream/downstream tables)
  • Usage statistics (query frequency, top users)
</details>
<details> <summary><b>Example 2: Search for Datasets via Python SDK</b></summary>

Use Case: Programmatically search DataHub catalog and retrieve dataset metadata.

Prerequisites:

  • DataHub instance accessible
  • Python 3.8+ installed
  • DataHub Python package installed (pip install 'acryl-datahub[datahub-rest]')
from datahub.ingestion.graph.client import DatahubClientConfig, DataHubGraph

# Initialize DataHub client
config = DatahubClientConfig(server="http://localhost:8080")
graph = DataHubGraph(config)

# Search for datasets containing "customer"
# Returns up to 10 most relevant results
results = graph.search(
    entity="dataset",
    query="customer",
    count=10
)

# Process and display results
for result in results:
    print(f"Found: {result.entity.urn}")
    print(f"  Name: {result.entity.name}")
    print(f"  Platform: {result.entity.platform}")
    print(f"  Description: {result.entity.properties.description}")
    print("---")

# Example output:
# Found: urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.customer_profiles,PROD)
#   Name: customer_profiles
#   Platform: snowflake
#   Description: Aggregated customer data from CRM and transactions
# ---

Response format: Each result contains:

  • urn: Unique resource identifier for the dataset
  • name: Human-readable dataset name
  • platform: Source platform (snowflake, bigquery, etc.)
  • properties: Additional metadata (description, tags, owners, etc.)
</details>
<details> <summary><b>Example 3: Query Lineage via GraphQL</b></summary>

Use Case: Retrieve upstream and downstream dependencies for a specific dataset.

Prerequisites:

  • DataHub GMS endpoint accessible
  • Dataset URN available from search or ingestion

GraphQL Query:

query GetLineage {
  dataset(
    urn: "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.customer_profiles,PROD)"
  ) {
    # Get upstream dependencies (source tables)
    upstream: lineage(input: { direction: UPSTREAM }) {
      entities {
        urn
        ... on Dataset {
          name
          platform {
            name
          }
        }
      }
    }

    # Get downstream dependencies (consuming tables/dashboards)
    downstream: lineage(input: { direction: DOWNSTREAM }) {
      entities {
        urn
        type
        ... on Dataset {
          name
          platform {
            name
          }
        }
        ... on Dashboard {
          dashboardId
          tool
        }
      }
    }
  }
}

Execute via cURL:

curl -X POST http://localhost:8080/api/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "query GetLineage { ... }"}'

Response structure:

  • upstream: Array of datasets that feed into this dataset
  • downstream: Array of datasets, dashboards, or ML models that consume this dataset
  • Each entity includes URN, type, and basic metadata
</details>
<details> <summary><b>Example 4: Add Documentation via Python API</b></summary>

Use Case: Programmatically add or update dataset documentation and custom properties.

Prerequisites:

  • DataHub Python SDK installed
  • Write permissions to DataHub instance
  • Dataset already exists in DataHub (from ingestion)
from datahub.metadata.schema_classes import DatasetPropertiesClass
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Create emitter to send metadata to DataHub
emitter = DatahubRestEmitter("http://localhost:8080")

# Create dataset URN (unique identifier)
dataset_urn = make_dataset_urn(
    platform="snowflake",
    name="analytics.customer_profiles",
    env="PROD"
)

# Define dataset properties
properties = DatasetPropertiesClass(
    description="""
    Customer profiles aggregated from CRM and transaction data.

    **Update Schedule:** Updated nightly via Airflow DAG `customer_profile_etl`
    **Data Retention:** 7 years for compliance
    **Owner:** Data Platform Team
    """,
    customProperties={
        "owner_team": "data-platform",
        "update_frequency": "daily",
        "data_sensitivity": "PII",
        "upstream_dag": "customer_profile_etl",
        "business_domain": "customer_analytics"
    }
)

# Emit metadata to DataHub
emitter.emit_mcp(
    entityUrn=dataset_urn,
    aspectName="datasetProperties",
    aspect=properties
)

print(f"โœ“ Successfully updated documentation for {dataset_urn}")

What this does:

  1. Adds rich markdown documentation visible in DataHub UI
  2. Sets custom properties for governance and discovery
  3. Makes dataset searchable by custom property values
  4. Enables filtered searches (e.g., "show me all PII datasets")
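The filtered searches mentioned in step 4 can also be done client-side once properties have been fetched. A quick sketch over invented property dicts (none of this data comes from a real instance):

```python
datasets = [
    {"name": "customer_profiles", "customProperties": {"data_sensitivity": "PII"}},
    {"name": "web_sessions", "customProperties": {"data_sensitivity": "internal"}},
    {"name": "payment_events", "customProperties": {"data_sensitivity": "PII"}},
]

def with_property(items, key, value):
    # Keep dataset names whose customProperties match the given key/value pair.
    return [d["name"] for d in items if d.get("customProperties", {}).get(key) == value]

print(with_property(datasets, "data_sensitivity", "PII"))
# ['customer_profiles', 'payment_events']
```

For server-side filtering at scale, use DataHub's search APIs rather than fetching everything first.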
</details>
<details> <summary><b>Example 5: Connect AI Coding Assistants via Model Context Protocol</b></summary>

Use Case: Enable AI agents (Cursor, Claude Desktop, Cline) to query DataHub metadata directly from your IDE or development environment.

Prerequisites:

  • DataHub instance running and accessible
  • MCP-compatible AI tool installed (Cursor, Claude Desktop, Cline, etc.)
  • Node.js 18+ installed

Quick Setup:

# Initialize MCP server for DataHub
npx -y @acryldata/mcp-server-datahub init

# Follow the interactive prompts to configure:
# - DataHub GMS endpoint (e.g., http://localhost:8080)
# - Authentication token (if required)
# - MCP server settings

Configure your AI tool:

For Claude Desktop, add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "datahub": {
      "command": "npx",
      "args": ["-y", "@acryldata/mcp-server-datahub"]
    }
  }
}

For Cursor, configure in Settings → Features → MCP Servers.
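Cursor accepts the same `mcpServers` schema in a JSON file; a sketch assuming the commonly used `~/.cursor/mcp.json` location (check Cursor's documentation for the exact path in your version):

```json
{
  "mcpServers": {
    "datahub": {
      "command": "npx",
      "args": ["-y", "@acryldata/mcp-server-datahub"]
    }
  }
}
```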

What you can ask your AI:

  • "What datasets contain customer PII in production?"
  • "Show me the lineage for analytics.revenue_table"
  • "Who owns the 'Revenue Dashboard' in Looker?"
  • "Find all datasets in the marketing domain"
  • "What's the schema for user_events table?"
  • "List datasets tagged as 'critical' or 'sensitive'"

Example conversation:

You: "What datasets are owned by the data-platform team?"

AI: Based on DataHub metadata, here are the datasets owned by data-platform:
- urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.customer_profiles,PROD)
  Name: customer_profiles
  Platform: Snowflake
  Description: Aggregated customer data from CRM and transactions

- urn:li:dataset:(urn:li:dataPlatform:bigquery,marketing.campaign_performance,PROD)
  Name: campaign_performance
  Platform: BigQuery
  Description: Marketing campaign metrics and ROI tracking

[... more results]

Benefits:

  • โœ… Query metadata without leaving your IDE
  • โœ… Natural language interface (no SQL/GraphQL needed)
  • โœ… Real-time access to DataHub's metadata graph
  • โœ… Understand data context while coding
  • โœ… Discover relevant datasets for your task

๐Ÿ“– Full Documentation: MCP Server for DataHub

</details>

Common Use Cases

| Use Case | Description | Learn More |
|---|---|---|
| 🔍 Data Discovery | Help users find the right data for analytics and ML | Guide |
| 📊 Impact Analysis | Understand downstream impact before making changes | Lineage Docs |
| 🏛️ Data Governance | Enforce policies, classify PII, manage access | Governance Guide |
| 🔔 Data Quality | Monitor freshness, volumes, schema changes | Quality Checks |
| 📚 Documentation | Centralize data documentation and knowledge | Docs Features |
| 👥 Collaboration | Foster data culture with discussions and ownership | Collaboration |

๐Ÿ“ DataHub in Action

Learn from teams using DataHub in production and get practical guidance:

<table> <tr> <td width="33%"> <h3><a href="https://datahub.com/blog/metadata-in-action-tips-and-tricks-from-the-field/">🏆 Best Practices from the Field</a></h3> <p>Real-world metadata strategies from teams at Grab, Slack, and Checkout.com who manage data at scale.</p> <sub><i>Case Studies</i></sub> </td> <td width="33%"> <h3><a href="https://datahub.com/blog/the-what-why-and-how-of-data-contracts/">📋 Data Contracts: How to Use Them</a></h3> <p>Practical guide to implementing data contracts between producers and consumers for quality and accountability.</p> <sub><i>Implementation Guide</i></sub> </td> <td width="33%"> <h3><a href="https://datahub.com/blog/datahub-mcp-server-block-ai-agents-use-case/">🤖 How Block Powers AI Agents with DataHub</a></h3> <p>Real-world case study: scaling data governance and AI operations across 50+ platforms using MCP.</p> <sub><i>AI Case Study</i></sub> </td> </tr> </table> <p align="center"> <a href="https://datahub.com/blog/"><b>→ Explore all posts on our blog</b></a> </p>

๐Ÿข Trusted by Industry Leaders

3,000+ organizations run DataHub in production worldwide โ€” across both open-source deployments and DataHub Cloud โ€” from hyperscale tech companies to regulated financial institutions and healthcare providers.

By Industry

🛒 E-Commerce & Retail: Etsy • Experius • Klarna • LinkedIn • MediaMarkt Saturn • Uphold • Wealthsimple • Wolt

🏥 Healthcare & Life Sciences: CVS Health • IOMED • Optum

✈️ Travel & Transportation: Cabify • DFDS • Expedia Group • Hurb • Peloton • Viasat

📚 Education & EdTech: ClassDojo • Coursera • Udemy

💰 Financial Services: Banksalad • Block • Chime • FIS • Funding Circle • GEICO • Inter&Co • N26 • Santander • Shanghai HuaRui Bank • Stash • Visa

🎮 Gaming, Entertainment & Streaming: Netflix • Razer • Showroomprive • TypeForm • UKEN Games • Zynga

🚀 Technology & SaaS: Adevinta • Apple • Digital Turbine • DPG Media • Foursquare • Geotab • HashiCorp • hipages • inovex • KPN • Miro • MYOB • Notion • Okta • Rippling • Saxo Bank • Slack • ThoughtWorks • Twilio • Wikimedia • WP Engine

📊 Data & Analytics: ABLY • DefinedCrowd • Grofers • Haibo Technology • Moloco • PITS Global Data Recovery Services • SpotHero

And thousands more across DataHub Core and DataHub Cloud.

Featured Case Studies

Using DataHub? Please feel free to add your organization to the list if we missed it: open a PR or let us know on Slack.


๐ŸŒ DataHub Ecosystem

DataHub is part of a rich ecosystem of tools and integrations.

Official Repositories

| Repository | Description | Links |
|---|---|---|
| datahub | Core platform: metadata model, services, connectors, and web UI | Docs |
| datahub-actions | Framework for responding to metadata changes in real time | Guide |
| datahub-helm | Production-ready Helm charts for Kubernetes deployment | Charts |
| static-assets | Logos, images, and brand assets for DataHub | - |

Community Plugins & Integrations

| Project | Description | Maintainer |
|---|---|---|
| datahub-tools | Python tools for GraphQL endpoint interaction | Notion |
| dbt-impact-action | GitHub Action for dbt change impact analysis | Acryl Data |
| business-glossary-sync-action | Sync business glossary via GitHub PRs | Acryl Data |
| mcp-server-datahub | Model Context Protocol server for AI integration | Acryl Data |
| meta-world | Recipes, custom sources, and transformations | Community |

Integrations by Category

📊 BI & Analytics: Tableau • Looker • Power BI • Superset • Metabase • Mode • Redash

🗄️ Data Warehouses: Snowflake • BigQuery • Redshift • Databricks • Synapse • ClickHouse

🔄 Data Orchestration: Airflow • dbt • Dagster • Prefect • Luigi

🤖 ML Platforms: SageMaker • MLflow • Feast • Kubeflow • Weights & Biases

🔗 Data Integration: Fivetran • Airbyte • Stitch • Matillion

View all 80+ integrations →


💬 Community & Support

Join thousands of data practitioners building with DataHub!

🗓️ Town Halls

Next Town Hall:

Last Town Hall:

→ View all past recordings

💬 Get Help & Connect

| Channel | Purpose | Link |
|---|---|---|
| Slack Community | Real-time chat, questions, announcements | Join 14,000+ members |
| GitHub Discussions | Technical discussions, feature requests | Start a Discussion |
| GitHub Issues | Bug reports, feature requests | Open an Issue |
| Stack Overflow | Technical Q&A (tag: datahub) | Ask a Question |
| YouTube | Tutorials, demos, talks | Subscribe |
| LinkedIn | Company updates, blogs | Follow Us |
| Twitter/X | Quick updates, community highlights | Follow @datahubproject |

📧 Stay Updated

🎓 Learning Resources


🤝 Contributing

We ❤️ contributions from the community! See CONTRIBUTING.md for setup, guidelines, and ways to get involved.

Browse Good First Issues to get started!


📚 Resources & Learning

📰 Featured Content

Blog Posts & Articles:

Conference Talks:

Podcasts:

🔗 Important Links

| Resource | URL |
|---|---|
| 📖 Official Documentation | https://docs.datahub.com |
| 🏠 Project Website | https://datahub.com |
| 🌐 Live Demo | https://demo.datahub.com |
| 📊 Roadmap | https://feature-requests.datahubproject.io/roadmap |
| 🗓️ Town Hall Schedule | https://docs.datahub.com/docs/townhalls |
| 💬 Slack Community | https://datahub.com/slack |
| 📺 YouTube Channel | https://youtube.com/@datahubproject |
| 📝 Blog | https://datahub.com/blog/ |
| 🔗 LinkedIn | https://www.linkedin.com/company/72009941 |
| 🐦 Twitter/X | https://twitter.com/datahubproject |
| 🔒 Security | https://docs.datahub.com/docs/security |

📄 License

DataHub is open source software released under the Apache License 2.0.

Copyright 2015-2025 LinkedIn Corporation
Copyright 2025-Present DataHub Project Contributors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.

What this means:

  • โœ… Commercial use allowed
  • โœ… Modification allowed
  • โœ… Distribution allowed
  • โœ… Patent use allowed
  • โœ… Private use allowed

Learn more: Choose a License - Apache 2.0


<p align="center"> <b>⭐ If you find DataHub useful, please star the repository! ⭐</b> </p> <p align="center"> Made with ❤️ by the DataHub community </p>

Global Ranking

8.5
Trust Score (MCPHub Index)

Based on codebase health & activity.

Manual Config

{
  "mcpServers": {
    "datahub-project-datahub": {
      "command": "npx",
      "args": ["datahub-project-datahub"]
    }
  }
}