The community for discovering and evaluating data sources

An open framework for documenting, evaluating, and reusing data sources, tools, skills, and use cases.

Sources

Open data sources — portals (L1), datasets (L2), and fragmented sources (L3) — with publisher, license, access, and join-key metadata.

Evaluations

Quality evaluations of sources using the W3C Data Quality Vocabulary: dimensions, metrics, ratings, and recommended actions.

Use cases

Documented reuse scenarios that combine sources and tools, with workflows, join strategies, and outcomes.

Tools

Tools and MCP servers for accessing, validating, cleaning, reconciling, and publishing data, with legal-risk guidance.

Skills

Reusable methods and capability packages documented in YAML and Markdown, with inputs, outputs, and references.

Glossary

Definitions of the framework's core concepts: source levels, DCAT, DQV, join keys, MCP, and provenance.

A common layer for reusable source metadata

The Source Commons Framework is a public, versioned catalogue. Sources, evaluations, use cases, tools, and skills are described with one interoperable schema, so data engineers, AI engineers, researchers, and public-interest organizations can find, trust, and reuse them, and supply them to agents through standards such as the Model Context Protocol.

Records are stored as open CSV and Markdown files in a public GitHub repository and mirrored to a live database, keeping provenance and history transparent.

Frequently asked questions

What is the Source Commons Framework?

The Source Commons Framework (SCF) is an open framework and directory for documenting, evaluating, and reusing data sources, tools, skills, and use cases. It provides a shared metadata schema, aligned with DCAT, DQV, schema.org, and Wikidata, so practitioners can find sources, judge their quality, and reuse them reliably.

What do the source levels L1, L2, and L3 mean?

L1 is a portal: a catalogue or institutional access point hosting many datasets. L2 is a dataset: a specific dataset, file collection, API, or stable access path. L3 is a fragmented source: a useful source that needs extraction, parsing, reconciliation, or cross-referencing before reuse.
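As a rough illustration, the three levels could be modelled as an enumeration. This is a minimal sketch, not the framework's actual schema: the names `SourceLevel` and `needs_preprocessing` are hypothetical and exist only for this example.

```python
from enum import Enum


class SourceLevel(Enum):
    """The three source levels described above (hypothetical representation)."""

    L1 = "portal"             # catalogue or institutional access point
    L2 = "dataset"            # specific dataset, file collection, API, or stable access path
    L3 = "fragmented source"  # needs work before it can be reused


def needs_preprocessing(level: SourceLevel) -> bool:
    # Per the definitions above, only L3 sources need extraction, parsing,
    # reconciliation, or cross-referencing before reuse.
    return level is SourceLevel.L3
```

A consumer could use a check like this to route L3 sources through a cleaning step before joining them with other data.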

How are data sources evaluated?

Sources are assessed with quality evaluations based on the W3C Data Quality Vocabulary (DQV). Each evaluation records a quality dimension and metric, a rating from 1 to 5, the evaluator and a confidence level, a quality annotation, and a recommended action. Ratings are aggregated into an average shown on the source page.
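The shape of an evaluation record and the averaging step might look like the sketch below. The field names, the `Evaluation` class, and the sample values are assumptions made for illustration. The aggregation is assumed to be a plain arithmetic mean, which the framework may or may not use.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Evaluation:
    """One quality evaluation; field names are hypothetical."""

    dimension: str
    metric: str
    rating: int              # 1 to 5, as described above
    evaluator: str
    confidence: str
    annotation: str
    recommended_action: str

    def __post_init__(self) -> None:
        if not 1 <= self.rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {self.rating}")


def average_rating(evaluations: list) -> Optional[float]:
    # Assumption: the displayed average is an unweighted arithmetic mean.
    if not evaluations:
        return None
    return mean(e.rating for e in evaluations)


# Illustrative records only; not real evaluations.
sample = [
    Evaluation("completeness", "share of non-empty cells", 4,
               "reviewer-a", "high", "Few gaps found", "Reuse as is"),
    Evaluation("timeliness", "days since last update", 3,
               "reviewer-b", "medium", "Updated irregularly", "Check freshness before reuse"),
]
print(average_rating(sample))  # 3.5
```

Validating the rating range at construction time keeps out-of-range values from skewing the average.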

What is an MCP server in the framework?

An MCP server is a Model Context Protocol connection that lets agents or users access, inspect, or transform sources and tools. MCP servers are recorded as tools with the category 'MCP server', documenting transport, endpoint, authentication, and the tools, resources, and prompts they expose.
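A tool record for an MCP server could be sketched as follows. The `McpServerRecord` class, its field names, and the example values (including the `example.org` endpoint) are placeholders invented for this illustration, not the framework's published schema.

```python
from dataclasses import dataclass, field


@dataclass
class McpServerRecord:
    """A tool record with the category 'MCP server' (hypothetical fields)."""

    name: str
    transport: str
    endpoint: str
    authentication: str
    tools: list[str] = field(default_factory=list)
    resources: list[str] = field(default_factory=list)
    prompts: list[str] = field(default_factory=list)
    category: str = "MCP server"

    def missing_fields(self) -> list[str]:
        # Report which connection details are still blank.
        required = {
            "transport": self.transport,
            "endpoint": self.endpoint,
            "authentication": self.authentication,
        }
        return [key for key, value in required.items() if not value.strip()]


# Placeholder values for illustration only.
example = McpServerRecord(
    name="Example catalogue server",
    transport="http",
    endpoint="https://example.org/mcp",
    authentication="none",
    tools=["search_datasets"],
)
print(example.missing_fields())  # []
```

A completeness check like `missing_fields` is one way a reviewer could spot records that document the category but omit connection details.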

How can I add a source, tool, or skill?

Use the guided contribution flow to draft a record, link it to related items, and open a pull request to the public GitHub repository. Contributions are reviewed and merged into the framework's CSV and Markdown files.

Read the glossary of framework terms