The community for discovering and evaluating data sources
An open framework for documenting, evaluating, and reusing data sources, tools, skills, and use cases.
Open data sources — portals (L1), datasets (L2), and fragmented sources (L3) — with publisher, license, access, and join-key metadata.
Evaluations: Quality evaluations of sources using the W3C Data Quality Vocabulary: dimensions, metrics, ratings, and recommended actions.
Use cases: Documented reuse scenarios that combine sources and tools, with workflows, join strategies, and outcomes.
Tools: Tools and MCP servers for accessing, validating, cleaning, reconciling, and publishing data, with legal-risk guidance.
Skills: Reusable methods and capability packages documented in YAML and Markdown, with inputs, outputs, and references.
Glossary: Definitions of the framework's core concepts: source levels, DCAT, DQV, join keys, MCP, and provenance.
A common layer for reusable source metadata
Source Commons Framework is a public, versioned catalogue. Sources, evaluations, use cases, tools, and skills are described with one interoperable schema so that data engineers, AI engineers, researchers, and public-interest organizations can find, trust, and reuse them, and feed them to agents through standards like the Model Context Protocol.
Records are stored as open CSV and Markdown files in a public GitHub repository and are mirrored to a live database, which keeps provenance and history transparent.
- Provenance first: Every record links to its publisher, landing page, and stable identifiers.
- Interoperable by design: Metadata aligns with DCAT, DQV, schema.org, and Wikidata.
- Reuse-oriented: Join keys, access methods, and legal-risk notes make sources actually reusable.
- Open and versioned: Records live in public CSVs and Markdown on GitHub, open to contribution.
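Because records live in plain CSV, they can be read with standard tooling. The following is a minimal sketch of loading source records and finding those that share a join key. The column names (`id`, `title`, `level`, `publisher`, `license`, `landing_page`, `join_keys`), the semicolon separator, and the sample rows are all assumptions made for illustration; the framework's actual CSV layout may differ.

```python
import csv
import io

# Hypothetical sample data: column names, separator, and values are
# illustrative only and are not taken from the real repository.
SAMPLE_CSV = """id,title,level,publisher,license,landing_page,join_keys
example-portal,Example Open Data Portal,L1,Example Agency,CC-BY-4.0,https://example.org/portal,
example-dataset,Example Business Register,L2,Example Agency,CC0-1.0,https://example.org/register,company_id;postal_code
"""


def load_sources(csv_text):
    """Parse CSV text into a list of dicts, splitting join keys into a list."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # An empty join_keys cell becomes an empty list.
        row["join_keys"] = [k for k in row["join_keys"].split(";") if k]
        records.append(row)
    return records


def sources_sharing_key(records, key):
    """Return the ids of records that declare the given join key."""
    return [r["id"] for r in records if key in r["join_keys"]]


sources = load_sources(SAMPLE_CSV)
```

Calling `sources_sharing_key(sources, "postal_code")` on this sample would list only the records that could be joined on a postal code column.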
Frequently asked questions
What is the Source Commons Framework?
The Source Commons Framework (SCF) is an open framework and directory for documenting, evaluating, and reusing data sources, tools, skills, and use cases. It provides a shared metadata schema, aligned with DCAT, DQV, schema.org, and Wikidata, so practitioners can find sources, judge their quality, and reuse them reliably.
What do the source levels L1, L2, and L3 mean?
L1 is a portal: a catalogue or institutional access point hosting many datasets. L2 is a dataset: a specific dataset, file collection, API, or stable access path. L3 is a fragmented source: a useful source that needs extraction, parsing, reconciliation, or cross-referencing before reuse.
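The three levels can be modelled as a small enumeration. This is only a sketch: the `SourceLevel` class and the `parse_level` and `needs_preparation` helpers are hypothetical names chosen here, not part of the framework.

```python
from enum import Enum


class SourceLevel(Enum):
    """Source levels as described above (hypothetical representation)."""
    L1 = "portal"             # catalogue or institutional access point
    L2 = "dataset"            # specific dataset, file collection, or API
    L3 = "fragmented source"  # needs extraction or reconciliation first


def parse_level(code: str) -> SourceLevel:
    """Turn a code such as 'L2' or ' l2 ' into a SourceLevel member."""
    return SourceLevel[code.strip().upper()]


def needs_preparation(level: SourceLevel) -> bool:
    """Only fragmented sources (L3) need processing before reuse."""
    return level is SourceLevel.L3
```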
How are data sources evaluated?
Sources are assessed with quality evaluations based on the W3C Data Quality Vocabulary (DQV). Each evaluation records a quality dimension and metric, a 1 to 5 rating, evaluator and confidence, a quality annotation, and a recommended action. Ratings are aggregated into an average shown on the source page.
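As a rough illustration of how such ratings might be aggregated, the sketch below models an evaluation and computes a per-source average. The field names, the `average_rating` helper, and the use of a plain arithmetic mean rounded to two decimals are assumptions; the framework may weight or round ratings differently.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Evaluation:
    """One quality evaluation of a source (hypothetical field names)."""
    source_id: str
    dimension: str
    metric: str
    rating: int               # 1 (worst) to 5 (best)
    evaluator: str
    confidence: str
    annotation: str
    recommended_action: str

    def __post_init__(self):
        # Reject ratings outside the documented 1 to 5 scale.
        if not 1 <= self.rating <= 5:
            raise ValueError(f"rating must be between 1 and 5, got {self.rating}")


def average_rating(evaluations, source_id) -> Optional[float]:
    """Arithmetic mean of all ratings for one source, or None if unrated."""
    ratings = [e.rating for e in evaluations if e.source_id == source_id]
    return round(mean(ratings), 2) if ratings else None
```

A source with no evaluations yields `None` rather than zero, so an unrated source is not mistaken for a poorly rated one.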
What is an MCP server in the framework?
An MCP server is a Model Context Protocol connection that lets agents or users access, inspect, or transform sources and tools. MCP servers are recorded as tools with the category 'MCP server', documenting transport, endpoint, authentication, and the tools, resources, and prompts they expose.
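The attributes listed above could be captured in a record like the following. This is a sketch under assumptions: the `McpServerRecord` class, its field names, the `missing_fields` check, and the example values (including the `example.org` endpoint) are hypothetical and do not reflect the framework's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class McpServerRecord:
    """A tool record for an MCP server (hypothetical field names)."""
    name: str
    transport: str
    endpoint: str
    authentication: str
    tools: list[str] = field(default_factory=list)
    resources: list[str] = field(default_factory=list)
    prompts: list[str] = field(default_factory=list)
    category: str = "MCP server"  # category named in the text above

    def missing_fields(self) -> list[str]:
        """Names of required text fields that are empty or whitespace."""
        required = ("name", "transport", "endpoint", "authentication")
        return [f for f in required if not getattr(self, f).strip()]


# Illustrative values only; not a real server.
example = McpServerRecord(
    name="Example catalogue server",
    transport="streamable-http",
    endpoint="https://example.org/mcp",
    authentication="none",
    tools=["search_sources"],
)
```

A check such as `example.missing_fields()` could run during review to flag records that omit the transport, endpoint, or authentication details.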
How can I add a source, tool, or skill?
Use the guided contribution flow to draft a record, link it to related items, and open a pull request to the public GitHub repository. Contributions are reviewed and merged into the framework's CSV and Markdown files.