Docs

A practical reference for the framework CSV files, fields, controlled values, and contribution formats.

This document defines the Source Commons Framework CSV files, required fields, allowed values, formats, and examples.

The framework uses CSV because it is easy to review in pull requests, easy to validate, and compatible with many data tools. Multiple values in one cell must be separated with a semicolon and a space: value one; value two.
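The separator rule can be sketched as a pair of helpers. This is a minimal illustration, not part of the framework tooling; the function names `split_values` and `join_values` are hypothetical.

```python
SEPARATOR = "; "  # a semicolon followed by one space, as the rule above requires


def split_values(cell: str) -> list[str]:
    """Split a multi-value cell on '; ' and drop empty entries."""
    return [part.strip() for part in cell.split(SEPARATOR) if part.strip()]


def join_values(values: list[str]) -> str:
    """Join values back into a single cell using the same separator."""
    return SEPARATOR.join(values)
```

Keeping one shared separator constant makes it harder for a script to write cells that a reviewer or validator would then have to normalize.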

Field Prefixes

| Prefix | Source | Meaning |
| --- | --- | --- |
| dct_ | Dublin Core Terms | General resource metadata such as title, description, publisher, identifier, license, location, and time. |
| dcat_ | DCAT 3 | Catalogue, dataset, distribution, data service, access, and download metadata. |
| dqv_ | Data Quality Vocabulary | Quality dimensions, metrics, measurements, methods, and annotations. |
| wikidata_ | Wikidata | Entity identifiers, topics, related entities, and curated entity-property relationships. |
| github_ | GitHub REST API | Repository, user, organization, source code, maintainer, and reproducibility metadata. |
| hf_ | Hugging Face Hub API | Hub repository, model, dataset, Space, discussion, license, task, and provenance metadata. |
| mcp_ | Model Context Protocol | MCP server metadata, exposed tools/resources/prompts, transport, endpoint, authentication, and security notes. |
| sc_ | Source Commons | Framework extension terms used when existing standards do not cover operational reuse details. |

CSV Files

| File | Purpose |
| --- | --- |
| data/sources.csv | Documents source records, access paths, legal context, join keys, DCAT mapping, Wikidata entities, GitHub links, and Hugging Face links when relevant. |
| data/tools.csv | Documents tools used to discover, extract, clean, validate, publish, or evaluate sources, including Wikidata, GitHub, and Hugging Face links when relevant. |
| data/evaluations.csv | Stores dated evaluations of source quality, access, legal risk, and recommended actions. |
| data/use-cases.csv | Documents practical reuse workflows that connect sources, tools, outputs, audiences, limits, topic identities, and reproducible GitHub references. |
| data/skills.csv | Indexes reusable Source Commons skills and links each CSV row to its canonical Markdown file. |
| skills/*.md | Stores skill documentation with generated YAML frontmatter followed by human-readable Markdown content. |

General Formats

| Format | Rule | Example |
| --- | --- | --- |
| Identifier | Stable ASCII identifier with an uppercase prefix. | SRC-CA-TORONTO-BUILDING-PERMITS |
| URL | Absolute URL beginning with https:// when available. | https://open.toronto.ca/dataset/building-permits-active-permits/ |
| Wikidata item | A Wikidata QID. Store the QID as the stable value; a label may be included for review in list fields. | Q142 or Employment (Q12737077) |
| Wikidata property | A Wikidata PID used only for curated advanced relationships. | P137 |
| Wikidata advanced relation | Compact JSON object with a Wikidata property and item value. Multiple objects may be separated with a semicolon and a space. | {"property":"P137","value":"Q95"} |
| GitHub resource id | Repository full name for repositories, or login for users and organizations. | openrefine/openrefine or github |
| Date | ISO 8601 calendar date. | 2026-05-30 |
| Date time | ISO 8601 date time with timezone when known. | 2026-05-30T14:32:00Z |
| Temporal coverage | ISO 8601 date, year, date interval, or clear text when the source only publishes a human label. | 2020-01-01/2025-12-31 |
| Country | Prefer ISO 3166-1 alpha-2 for country codes; plain country names are allowed when the source uses them. | CA |
| Region | Prefer ISO 3166-2 for subdivisions when available; otherwise use the official source label. | CA-ON |
| Boolean | Lowercase true or false; use Unknown when the answer was not checked. | false |
| Multiple values | Separate values with a semicolon and a space. | CSV; JSON; API |
| Related record list | Separate Source Commons record ids with a semicolon and a space. | SRC-CA-TORONTO-BUILDING-PERMITS; TOOL-FRICTIONLESS |
| Empty value | Leave blank only when not applicable or not yet known. Use N/A, Unknown, or Not checked when that distinction matters. | Not checked |
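A few of these format rules can be checked mechanically. The sketch below is illustrative only: the regular expression is an assumption that covers uppercase-prefixed ids such as SRC-, TOOL-, EVAL-, and UC- (skill ids use a different pattern), and the function names are hypothetical.

```python
import re
from datetime import date

# Assumed pattern: an uppercase prefix followed by one or more
# hyphen-separated uppercase or numeric segments.
IDENTIFIER_RE = re.compile(r"[A-Z]+(?:-[A-Z0-9]+)+")


def is_identifier(value: str) -> bool:
    """True for ids such as SRC-CA-TORONTO-BUILDING-PERMITS or EVAL-001."""
    return IDENTIFIER_RE.fullmatch(value) is not None


def is_iso_date(value: str) -> bool:
    """True when the value parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_boolean_like(value: str) -> bool:
    """True for lowercase true/false or the literal Unknown."""
    return value in {"true", "false", "Unknown"}
```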

Every CSV can link to existing Source Commons items without a fixed limit. Use these optional fields when a relationship is useful but is not already represented by a more specific field such as source_ids, tools_used, datasets, use_cases, related_skills, or crosswalk_source_ids.

| Field | Links to | Cardinality | Notes |
| --- | --- | --- | --- |
| related_source_ids | data/sources.csv | Multiple | Related datasets, portals, crosswalk candidates, or upstream/downstream sources. |
| related_tool_ids | data/tools.csv | Multiple | Tools that support, evaluate, transform, inspect, or publish the record. |
| related_use_case_ids | data/use-cases.csv | Multiple | Reuse cases demonstrated, supported, or affected by the record. |
| related_skill_ids | data/skills.csv | Multiple | Skills that explain, reuse, or complement the record. |
| related_evaluation_ids | data/evaluations.csv | Multiple | Evaluations that qualify, review, or contextualize the record. |

Relationship fields are identifiers, not public contributor names. The website may also mirror these fields into a relational database so the catalogue can show connected records without reading every CSV cell.
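One way to mirror relationship fields into a relational store is to flatten each cell into (record id, field, target id) rows. The following is a sketch under that assumption; the edge layout and the function name `relation_edges` are hypothetical and do not describe how the website actually stores links.

```python
RELATION_FIELDS = (
    "related_source_ids",
    "related_tool_ids",
    "related_use_case_ids",
    "related_skill_ids",
    "related_evaluation_ids",
)


def relation_edges(rows: list[dict], id_field: str) -> list[tuple[str, str, str]]:
    """Flatten relationship cells into (record id, field, target id) edges."""
    edges = []
    for row in rows:
        for field in RELATION_FIELDS:
            # Missing or blank columns simply yield no edges.
            for target in (row.get(field) or "").split("; "):
                target = target.strip()
                if target:
                    edges.append((row[id_field], field, target))
    return edges
```

Rows can come straight from `csv.DictReader`, with `id_field` set to the table's own id column, for example `tool_id` for data/tools.csv.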

Controlled Values

These values are Source Commons controlled values unless a field notes an external standard.

| Field group | Allowed values |
| --- | --- |
| Source level | L1_portal, L2_dataset, L3_fragmented_source |
| Risk level | Low, Medium, High, Unknown |
| Confidence | Low, Medium, High, Unknown |
| Metadata status | Available, Partial, Not found, Not checked, N/A |
| Required boolean | true, false, Unknown |
| Access rights | Public, Open, Restricted, Gated, Private, Unknown |
| Account required | true, false, Unknown |
| Authentication method | None, API key, OAuth, Token, User agent, Login, IP allowlist, Unknown |
| Rate limit scope | IP, Account, Token, Endpoint, Organization, None, Unknown |
| Rate limit period | second, minute, hour, day, month, year, Unknown |
| Access cost type | Free, Paid, Freemium, Restricted, Unknown |
| Pricing model | free_open_access, free_with_fair_use, subscription, per_request, per_seat, bulk_license, institutional_license, custom_quote, unknown |
| Structuredness | Structured, Semi-structured, Unstructured, Mixed, Unknown |
| Fragmentation level | Low, Medium, High, Unknown |
| Extraction difficulty | Low, Medium, High, Unknown |
| Scraping position | No scraping needed, Scraping unnecessary if API is used, Scraping discouraged when API exists, Scraping allowed with limits, Scraping legally unclear, Scraping prohibited, Unknown |
| GitHub resource type | repository, user, organization, N/A |
| Hugging Face repository type | model, dataset, space, N/A |
| DQV expected datatype | boolean, integer, decimal, string, date, dateTime, uri, duration, percentage, count |
| Rating | Integer from 1 to 5 |
| Reuse status | Draft, Tested, Reusable, Published, Archived, Deprecated, Unknown |
| Skill status | draft, experimental, stable, deprecated, archived |
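Controlled values are case-sensitive as listed, so a simple membership check catches most mistakes. The sketch below covers only four fields as an illustration; the `CONTROLLED` mapping and the function name `invalid_values` are hypothetical, and blank cells are treated as acceptable because these fields are optional or checked elsewhere.

```python
# A small subset of the controlled values, keyed by CSV field name.
CONTROLLED = {
    "source_level": {"L1_portal", "L2_dataset", "L3_fragmented_source"},
    "legal_risk_level": {"Low", "Medium", "High", "Unknown"},
    "rate_limit_period": {"second", "minute", "hour", "day", "month", "year", "Unknown"},
    "dcat_prefill_status": {"Available", "Partial", "Not found", "Not checked", "N/A"},
}


def invalid_values(row: dict) -> dict:
    """Return {field: value} for every non-blank value outside its allowed set."""
    problems = {}
    for field, allowed in CONTROLLED.items():
        value = (row.get(field) or "").strip()
        if value and value not in allowed:
            problems[field] = value
    return problems
```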

Source Levels

| Value | Meaning |
| --- | --- |
| L1_portal | A portal, catalogue, registry, or institutional access point that hosts multiple datasets or records. |
| L2_dataset | A specific dataset, file collection, API, repository dataset, or stable data access path. |
| L3_fragmented_source | A source that requires extraction, parsing, reconciliation, cross-referencing, or transformation before reuse. |

DCAT Connections

For structured open data, DCAT fields can be prefilled from open data portal APIs. Examples include CKAN package APIs, data.gov harvest records, data.gouv.fr dataset APIs, and open.canada.ca catalogue APIs.

When a portal exposes DCAT or DCAT-like JSON, contributors should use the official API or catalogue export to populate fields such as title, description, publisher, identifier, landing page, access URL, download URL, media type, license, spatial coverage, temporal coverage, themes, and keywords.
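As an illustration, a prefill from a CKAN `package_show` response might look like the sketch below. It takes the already decoded JSON body rather than calling a portal. The choice of which CKAN key feeds which Source Commons field is an assumption made for this example, not a mapping defined by the framework, and the function name `prefill_from_ckan` is hypothetical.

```python
def prefill_from_ckan(response: dict) -> dict:
    """Map a decoded CKAN package_show response to a few DCAT-style fields."""
    package = response["result"]
    resources = package.get("resources", [])

    # Collect distinct resource formats in their original order.
    formats = []
    for resource in resources:
        fmt = resource.get("format")
        if fmt and fmt not in formats:
            formats.append(fmt)

    return {
        "dct_title": package.get("title", ""),
        "dct_description": package.get("notes", ""),
        "dct_publisher": (package.get("organization") or {}).get("title", ""),
        "dct_identifier": package.get("name", ""),
        "dct_license": package.get("license_title", ""),
        "dcat_download_url": "; ".join(r["url"] for r in resources if r.get("url")),
        "dcat_distribution_format": "; ".join(formats),
        "dcat_keywords": "; ".join(t["name"] for t in package.get("tags", [])),
    }
```

Prefilled values are a starting point; licenses and access URLs should still be checked against the portal's landing page.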

Source Commons does not replace DCAT. It keeps DCAT-compatible fields and adds operational fields for access method, pricing, rate limits, legal risk, extraction difficulty, join keys, crosswalks, and workflow context.

Wikidata Connections

Wikidata fields make Source Commons records easier to reconcile with a shared knowledge graph. They should be treated as identifiers and assisted relationships, not as free-text tags.

Level 1 is the optional wikidata_id field on entity-bearing tables. It stores one primary Wikidata item for the source, tool, or use case when a clear item exists. Examples include Q142 for France, Q95 for Google, Q8908 for Wikimedia Foundation, and Q720467 for Hugging Face.

Evaluation rows do not get a separate wikidata_id in this version because they describe dated observations about another record. They inherit entity context through source_id.

Level 2 adds assisted source relationships with wikidata_main_topics and wikidata_related_entities. Autocomplete should search Wikidata labels and aliases, while the stored value should preserve the QID. For example, a France Travail open data source could use Employment (Q12737077) as a main topic and France Travail (Q124556307); Ministry of Labour (Q3406276) as related entities.
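Because the stored value must preserve the QID, a reader of these cells needs to accept both a bare QID and a label with the QID in parentheses. A minimal sketch, with hypothetical function names:

```python
import re
from typing import Optional

# Either "Label (Q123)" or a bare "Q123".
ITEM_RE = re.compile(r"(?:(?P<label>.+?)\s*\((?P<qid>Q[1-9]\d*)\)|(?P<bare>Q[1-9]\d*))")


def parse_wikidata_item(value: str) -> tuple[str, Optional[str]]:
    """Return (qid, label); label is None when only a QID was stored."""
    match = ITEM_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a Wikidata item value: {value!r}")
    if match.group("bare"):
        return match.group("bare"), None
    return match.group("qid"), match.group("label")


def parse_wikidata_list(cell: str) -> list[tuple[str, Optional[str]]]:
    """Parse a '; '-separated cell of Wikidata item values."""
    return [parse_wikidata_item(part) for part in cell.split("; ") if part.strip()]
```

The label is kept only as a review aid; joins and lookups should use the QID.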

Level 3 uses wikidata_advanced_relations for curated property-value relationships. Store each relation as a compact object such as {"property":"P137","value":"Q95"}, where P137 is a Wikidata property and Q95 is a Wikidata item. Because relation objects contain commas, quote the CSV cell when a relation is present. The UI should offer a small curated property list for common relationships such as owner, publisher, funder, operator, and covered area, with property search available only inside the advanced relation control.
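The quoting requirement is handled automatically by Python's `csv` module, which quotes any cell containing commas or double quotes. The sketch below serializes relations as compact JSON, joins them with the standard separator, and round-trips the cell through CSV. The function names are hypothetical, and the split on `; ` assumes relation values are plain PIDs and QIDs that never contain that sequence.

```python
import csv
import io
import json


def dump_relations(relations: list[dict]) -> str:
    """Serialize relations as compact JSON objects separated by '; '."""
    return "; ".join(json.dumps(r, separators=(",", ":")) for r in relations)


def load_relations(cell: str) -> list[dict]:
    """Parse a wikidata_advanced_relations cell back into dictionaries."""
    return [json.loads(part) for part in cell.split("; ") if part.strip()]


def csv_round_trip(record_id: str, cell: str) -> list[str]:
    """Write one row and read it back; csv.writer quotes the JSON cell."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow([record_id, cell])
    buffer.seek(0)
    return next(csv.reader(buffer))
```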

Contributors should not choose freely from all Wikidata properties in normal editing. Properties such as P31, P279, P361, P527, P749, P1269, and P2578 may be useful in expert curation, but they are too broad or specialized for default contributor workflows.

GitHub Connections

GitHub fields connect a source, tool, or use case to a primary repository, person account, or organization account when that link helps explain provenance, implementation, maintenance, issue tracking, or reproducibility.

Use github_resource_type to choose repository, user, organization, or N/A. Use github_resource_id for the repository full name or account login. Examples include openrefine/openrefine for a repository, github for an organization, and octocat for a user account.

Use github_url for the public GitHub URL and github_api_url for the REST API metadata endpoint when checked. Examples include https://github.com/openrefine/openrefine, https://api.github.com/repos/openrefine/openrefine, https://api.github.com/orgs/github, and https://api.github.com/users/octocat.
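The URL patterns in these examples can be derived from the resource type and id. A minimal sketch, with a hypothetical function name, that returns empty strings for N/A:

```python
# REST API path segment for each github_resource_type value.
API_SEGMENTS = {"repository": "repos", "organization": "orgs", "user": "users"}


def github_urls(resource_type: str, resource_id: str) -> tuple[str, str]:
    """Return (github_url, github_api_url) for a resource type and id."""
    if resource_type == "N/A":
        return "", ""
    if resource_type not in API_SEGMENTS:
        raise ValueError(f"unknown github_resource_type: {resource_type!r}")
    public_url = f"https://github.com/{resource_id}"
    api_url = f"https://api.github.com/{API_SEGMENTS[resource_type]}/{resource_id}"
    return public_url, api_url
```

Deriving the URLs does not replace checking them; github_metadata_status should still record whether the endpoint was verified.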

GitHub links should be factual and relevant. A record should not link to a promotional account when an official repository, maintainer organization, or reproducibility repository is available.

Hugging Face Connections

Hugging Face metadata appears only in fields beginning with hf_.

These fields can point to models, datasets, Spaces, discussions, or workflow dependencies. They should be filled from public Hugging Face Hub API responses or repository metadata. Contributors should still verify licenses, gated access, model cards, dataset cards, and redistribution limits before relying on a repository.
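For reference, the Hub API metadata URL follows from the repository type and id. The sketch below only builds the URL and makes no request; the function name `hf_api_url` is hypothetical.

```python
# Hub API path segment for each hf_repo_type value.
HF_API_SEGMENTS = {"model": "models", "dataset": "datasets", "space": "spaces"}


def hf_api_url(repo_type: str, repo_id: str) -> str:
    """Build the Hub API metadata URL for a namespace/name repository id."""
    if repo_type not in HF_API_SEGMENTS:
        raise ValueError(f"unsupported hf_repo_type: {repo_type!r}")
    return f"https://huggingface.co/api/{HF_API_SEGMENTS[repo_type]}/{repo_id}"
```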

MCP Connections

MCP fields document Model Context Protocol servers that help users or agents access, inspect, transform, or reuse sources, tools, and skills.

MCP servers should usually be recorded in data/tools.csv with tool_category set to MCP server. A server should be added when it provides a concrete operational interface, not merely because a project mentions MCP.

Use mcp_server_id for a stable identifier such as MCP-DATAGOUV. Use mcp_transport to describe how the server is accessed, for example stdio, http, or sse. Use mcp_tools_exposed, mcp_resources_exposed, and mcp_prompts_exposed to summarize what the server makes available.

Skills may reference supporting MCP servers through the existing mcp_servers field. Use cases may reference MCP servers through tools_used when the server is part of the workflow.

MCP records should include security notes when the server requires credentials, accesses private files, calls external APIs, or can trigger write actions.
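Part of this rule can be checked from the CSV alone. The sketch below flags MCP server rows that declare an authentication method but leave mcp_security_notes blank. It is deliberately narrow: private file access, external API calls, and write actions cannot be detected from these fields and still need human review. The function name is hypothetical.

```python
def needs_security_notes(row: dict) -> bool:
    """True when an MCP server row requires credentials but has no security notes."""
    categories = [c.strip() for c in (row.get("tool_category") or "").split("; ")]
    if "MCP server" not in categories:
        return False
    auth = (row.get("mcp_auth_method") or "").strip()
    # Blank, none, and unknown do not establish that credentials are required.
    requires_credentials = auth not in ("", "none", "unknown")
    has_notes = bool((row.get("mcp_security_notes") or "").strip())
    return requires_credentials and not has_notes
```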

Required Fields

Required fields are the minimum fields needed for a pull request to be reviewable. Operational details such as account requirements, authentication, rate limits, pricing, join keys, and legal notes are encouraged when known, but they are optional so public datasets can be submitted without unnecessary friction.

| File | Required fields |
| --- | --- |
| data/sources.csv | source_id, source_level, dct_title, dct_description, dct_publisher, dcat_landing_page, access_method |
| data/tools.csv | tool_id, dct_title, tool_category, tool_homepage, open_source_license, typical_tasks, legal_risk_level |
| data/evaluations.csv | evaluation_id, source_id, dqv_dimension, dqv_metric_uri, dqv_metric, dqv_expected_datatype, rating_1_5, confidence, evaluator_id |
| data/use-cases.csv | use_case_id, dct_title, source_ids, sector, workflow_summary, tools_used, legal_aspects, impact, source_join_keys_used, join_strategy |
| data/skills.csv | skill_id, title, summary, version, status, keywords, solves, skill_url, markdown_path |
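A pre-submission check for required fields could be sketched as follows. The field lists are copied from the table above; the function name `missing_required` is hypothetical, and a cell containing only whitespace is treated as missing.

```python
REQUIRED = {
    "data/sources.csv": ["source_id", "source_level", "dct_title", "dct_description",
                         "dct_publisher", "dcat_landing_page", "access_method"],
    "data/tools.csv": ["tool_id", "dct_title", "tool_category", "tool_homepage",
                       "open_source_license", "typical_tasks", "legal_risk_level"],
    "data/evaluations.csv": ["evaluation_id", "source_id", "dqv_dimension", "dqv_metric_uri",
                             "dqv_metric", "dqv_expected_datatype", "rating_1_5",
                             "confidence", "evaluator_id"],
    "data/use-cases.csv": ["use_case_id", "dct_title", "source_ids", "sector",
                           "workflow_summary", "tools_used", "legal_aspects", "impact",
                           "source_join_keys_used", "join_strategy"],
    "data/skills.csv": ["skill_id", "title", "summary", "version", "status",
                        "keywords", "solves", "skill_url", "markdown_path"],
}


def missing_required(file_name: str, row: dict) -> list[str]:
    """List the required fields that are absent or blank in one CSV row."""
    return [f for f in REQUIRED[file_name] if not (row.get(f) or "").strip()]
```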

data/sources.csv

| Field | Required | Cardinality | Type or format | Values | Example | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| source_id | Yes | Single | Identifier | Prefix SRC- | SRC-CA-TORONTO-BUILDING-PERMITS | Stable Source Commons source id. |
| source_level | Yes | Single select | Controlled value | Source level values | L2_dataset | Use L1_portal for catalogues, L2_dataset for stable datasets, and L3_fragmented_source when extraction or reconciliation is needed. |
| dct_title | Yes | Single | Text | Source title | Building Permits - Active Permits | Maps to Dublin Core title. |
| dct_description | Yes | Single | Text | Source description | Information on currently active building applications and permits in Toronto. | Keep concise and factual. |
| dct_publisher | Yes | Single | Text or URI | Publisher name or identifier | City of Toronto | Maps to Dublin Core publisher. |
| dct_identifier | No | Single | Text or URI | Official identifier | toronto-building-permits-active | Use the source identifier, catalogue id, DOI, accession number, or repository id. |
| wikidata_id | No | Single | Wikidata item | QID | Q142 | Primary Wikidata item for the source when one clear item exists. |
| dcat_landing_page | Yes | Single | URL | Official landing page | https://open.toronto.ca/dataset/building-permits-active-permits/ | Maps to DCAT landing page. |
| dcat_access_url | No | Single or multiple | URL list | Access URL values | https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show?id=building-permits-active-permits | Maps to DCAT access URL. |
| dcat_download_url | No | Single or multiple | URL list | Download URL values | https://example.org/download.csv | Maps to DCAT download URL. |
| dcat_distribution_format | No | Multiple select or text | Format list | IANA media label, file extension, or source format label | CSV; JSON; API | Maps to DCAT distribution format. |
| dcat_media_type | No | Single or multiple | IANA media type | Media type values | text/csv | Maps to DCAT media type. |
| dcat_endpoint_url | No | Single | URL | API endpoint URL | https://example.org/api/search | Maps to DCAT endpoint URL for a data service. |
| dcat_prefill_url | No | Single | URL | Portal metadata URL | https://open.canada.ca/data/en/api/3/action/package_show?id=example | The API or metadata record used to fill DCAT fields. |
| dcat_prefill_status | No | Single select | Controlled value | Metadata status values | Available | Status of the portal metadata prefill. |
| dct_access_rights | No | Single select or source text | Controlled value or official source label | Access rights values | Public | Maps to Dublin Core access rights. Public open data submissions may default to Public. |
| dct_license | No | Single | Text or URL | License name, SPDX id, rights statement, or terms URL | City of Toronto Open Data Licence | Prefer official license URLs or SPDX ids when possible. |
| dct_spatial | No | Single or multiple | Country, region, geometry, URI, or text | ISO 3166 codes preferred for countries and regions | CA-ON | Maps to Dublin Core spatial coverage. |
| dct_temporal | No | Single | Temporal coverage | ISO 8601 date, year, interval, or source label | 2020-01-01/2025-12-31 | Maps to Dublin Core temporal coverage. |
| dcat_theme | No | Single or multiple | Text, URI, or controlled source theme | Theme values from source catalogue when available | Urban planning | Maps to DCAT theme. |
| dcat_keywords | No | Multiple | Text list | Keywords | building permits; development; construction | Maps to DCAT keyword. |
| wikidata_main_topics | No | Multiple | Wikidata item list | QIDs or labels with QIDs | Employment (Q12737077) | Main concepts covered by the source. Store the QID as the stable value. |
| wikidata_related_entities | No | Multiple | Wikidata item list | QIDs or labels with QIDs | France Travail (Q124556307); Ministry of Labour (Q3406276) | Organizations, places, people, products, laws, or other entities related to the source. |
| wikidata_advanced_relations | No | Multiple | Wikidata advanced relation list | Curated property-value objects | {"property":"P137","value":"Q95"} | Advanced relationships such as operator, owner, publisher, funder, or covered area. |
| access_method | Yes | Single or multiple | Text list | API, bulk download, file download, web page, repository files, request process, export | Open data API; file download | Practical access route for users. |
| access_account_required | No | Single select | Boolean-like | true, false, Unknown | false | Whether account creation is required. Optional unless access requires an account. |
| auth_method | No | Single or multiple select | Controlled value list | Authentication method values | None | Use None when no authentication is required. Optional for simple public downloads. |
| rate_limit_declared | No | Single select | Boolean-like | true, false, Unknown | Unknown | Whether the source declares a rate limit. Optional unless the contributor knows the limit. |
| rate_limit_scope | No | Single select | Controlled value | Rate limit scope values | IP | Required when a rate limit is known. |
| rate_limit_value | No | Single | Number | Positive number | 10 | Numeric rate limit value. |
| rate_limit_period | No | Single select | Controlled value | Rate limit period values | second | Period for rate_limit_value. |
| rate_limit_notes | No | Single | Text | Notes | Use conservative polling when no limit is published. | Explain fair use or undocumented limits. |
| access_cost_type | No | Single select | Controlled value | Access cost type values | Free | Broad access cost class. Public open data may default to Free. |
| pricing_model | No | Single select | Controlled value | Pricing model values | free_open_access | Practical pricing model. Public open data may default to free_open_access. |
| pricing_currency | No | Single | ISO 4217 code or N/A | Currency code | USD | Use only when pricing applies. |
| pricing_amount | No | Single | Decimal number or N/A | Price amount | 0 | Use N/A when no price applies. |
| pricing_unit | No | Single select or text | Pricing unit | request, month, year, seat, dataset, download, N/A | month | Unit attached to pricing_amount. |
| pricing_notes | No | Single | Text | Notes | No fee declared for official API or downloads. | Explain uncertain pricing. |
| update_frequency | No | Single select or source text | Frequency | real-time, hourly, daily, weekly, monthly, quarterly, annual, irregular, unknown, or official DCAT frequency URI | daily | Can be prefilled from DCAT accrual periodicity. |
| structuredness | No | Single select | Controlled value | Structuredness values | Structured | Describes machine readability. |
| fragmentation_level | No | Single select | Controlled value | Fragmentation level values | Low | How scattered the source is. |
| extraction_difficulty | No | Single select | Controlled value | Extraction difficulty values | Low | Estimated effort to turn the source into usable data. |
| stable_identifiers | No | Single or multiple | Text list | Identifier names | permit_number; address | Stable ids provided by the source. |
| join_keys | No | Multiple | Text list | Field names | permit_number; address; ward | Concrete fields usable for joins. |
| legal_risk_level | No | Single select | Controlled value | Risk level values | Low | Overall legal and ethical risk. Recommended when legal or ethical reuse constraints are known. |
| legal_risk_notes | No | Single | Text | Notes | Open data license; geocoding and personal data minimization should be checked. | Include license, terms, privacy, copyright, and redistribution issues. |
| scraping_position | No | Single select | Controlled value | Scraping position values | No scraping needed | Prefer official APIs, exports, and bulk downloads. |
| documented_exception_cases | No | Single | Text | Notes | Use API rather than rendered pages. | Known exceptions, limits, or safer access paths. |
| dct_conforms_to | No | Multiple | Text or URI list | Standards and profiles | DCAT-3; DQV; SourceCommons-ODSP-0.1 | Maps to Dublin Core conforms to. |
| join_key_types | No | Multiple select | Controlled value list | entity_id, entity_name, organization, person, geography, address, date, time, document_id, filing_id, topic, category, amount, version, other | address; geography; date | Optional types of join keys when known. |
| join_key_examples | No | Multiple | Text list | Examples | permit_number=24 123456; ward=10 | Show realistic values. |
| join_key_granularity | No | Single | Text | Granularity label | Permit and parcel level | Optional description of the row, entity, observation, document, or event level. |
| join_key_confidence | No | Single select | Controlled value | Confidence values | High | Confidence in joining this source to others. |
| crosswalk_source_ids | No | Multiple | Identifier list | Source ids | SRC-CA-HOC-HANSARD | Other Source Commons sources that can be linked. |
| crosswalk_notes | No | Single | Text | Notes | Address and ward keys allow joins with planning sources. | Include caveats and matching methods. |
| github_resource_type | No | Single select | Controlled value | GitHub resource type values | organization | Primary GitHub resource type linked to this source. |
| github_resource_id | No | Single | GitHub resource id | Repository full name or account login | github | Repository owner/name, organization login, or user login. |
| github_url | No | Single | URL | GitHub URL | https://github.com/github | Public GitHub URL for the linked resource. |
| github_api_url | No | Single | URL | GitHub REST API URL | https://api.github.com/orgs/github | API metadata endpoint for the linked resource. |
| github_metadata_status | No | Single select | Controlled value | Metadata status values | Available | Status of GitHub metadata verification. |
| hf_repo_type | No | Single select | Controlled value | Hugging Face repository type values | dataset | Use only when the source is represented on Hugging Face. |
| hf_repo_id | No | Single | namespace/name | Hugging Face repository id | rabuahmad/climatecheck | Repository id from the Hub. |
| hf_api_url | No | Single | URL | Hub API URL | https://huggingface.co/api/datasets/rabuahmad/climatecheck | API metadata endpoint. |
| hf_metadata_status | No | Single select | Controlled value | Metadata status values | Available | Status of Hub metadata. |

data/tools.csv

| Field | Required | Cardinality | Type or format | Values | Example | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| tool_id | Yes | Single | Identifier | Prefix TOOL- | TOOL-FRICTIONLESS | Stable Source Commons tool id. |
| dct_title | Yes | Single | Text | Tool name | Frictionless Framework | Maps to Dublin Core title. |
| wikidata_id | No | Single | Wikidata item | QID | Q720467 | Primary Wikidata item for the tool or platform when one clear item exists. |
| tool_category | Yes | Single or multiple select | Controlled value list | AI repository metadata, Validation, Cleaning, Reconciliation, Extraction, Browser automation, Analytics, Publishing, Geocoding, Quality evaluation, MCP server, Other | Validation | Tool role in the framework. |
| tool_homepage | Yes | Single | URL | Official URL | https://framework.frictionlessdata.io/ | Prefer official documentation or repository. |
| open_source_license | Yes | Single | SPDX id, license name, access statement, or Unknown | License values | MIT | For closed or hosted tools, document access terms. |
| input_formats | No | Multiple | Text list | Formats | CSV; Excel; JSON | Input formats the tool can process. |
| output_formats | No | Multiple | Text list | Formats | validated CSV; schema reports | Output formats the tool can produce. |
| source_levels_supported | No | Multiple select | Controlled value list | Source level values | L2_dataset; L3_fragmented_source | Source levels where the tool is useful. |
| typical_tasks | Yes | Single | Text | Task summary | Validate CSV structure, infer schemas, create data packages. | Keep operational and concrete. |
| legal_use_guidance | No | Single | Text | Guidance | Legal risk depends on the source being processed. | Explain safe use boundaries. |
| legal_risk_level | Yes | Single select | Controlled value | Risk level values | Low | Risk from typical use of the tool. |
| recommended_controls | No | Single | Text | Controls | Keep original source URL, license, retrieval date, and transformation logs. | Safeguards for responsible use. |
| dct_conforms_to | No | Multiple | Text or URI list | Standards and profiles | Data Package; DCAT-compatible metadata mapping; SourceCommons-ODSP-0.1 | Standards or APIs the tool supports. |
| github_resource_type | No | Single select | Controlled value | GitHub resource type values | repository | Primary GitHub resource type linked to the tool. |
| github_resource_id | No | Single | GitHub resource id | Repository full name or account login | frictionlessdata/framework | Repository owner/name, organization login, or user login. |
| github_url | No | Single | URL | GitHub URL | https://github.com/frictionlessdata/framework | Public GitHub URL for the linked resource. |
| github_api_url | No | Single | URL | GitHub REST API URL | https://api.github.com/repos/frictionlessdata/framework | API metadata endpoint for the linked resource. |
| github_metadata_status | No | Single select | Controlled value | Metadata status values | Available | Status of GitHub metadata verification. |
| hf_repo_type | No | Single select | Controlled value | Hugging Face repository type values | model | Use for models or Spaces represented on the Hub. |
| hf_repo_id | No | Single | namespace/name | Hugging Face repository id | climatebert/distilroberta-base-climate-detector | Repository id from the Hub. |
| hf_api_url | No | Single | URL | Hub API URL | https://huggingface.co/api/models/climatebert/distilroberta-base-climate-detector | API metadata endpoint. |
| hf_task_or_sdk | No | Single or multiple | Text list | Hub task, SDK, or library label | text-classification | From Hub metadata when available. |
| hf_license | No | Single | SPDX id, Hub license tag, or N/A | License value | apache-2.0 | Verify against the model or Space card. |
| hf_last_modified | No | Single | ISO 8601 date time or date | Date time | 2026-05-30T14:32:00Z | From Hub metadata when available. |
| hf_metadata_status | No | Single select | Controlled value | Metadata status values | Available | Status of Hub metadata. |
| mcp_server_id | No | Single | Identifier | Stable MCP server id | MCP-DATAGOUV | Stable identifier for the MCP server. |
| mcp_transport | No | Single select | Controlled value | stdio, http, sse, websocket, unknown | stdio | How the MCP server is accessed. |
| mcp_endpoint_url | No | Single | URL | Endpoint or command | https://mcp.example.org/sse | Connection endpoint for HTTP, SSE, or WebSocket transports. |
| mcp_auth_method | No | Single select | Controlled value | none, token, oauth, api_key, local_credentials, unknown | token | Authentication the server requires. |
| mcp_tools_exposed | No | Multiple | Text list | Tool names | search_datasets; get_dataset | Tools the server exposes. |
| mcp_resources_exposed | No | Multiple | Text list | Resource names | dataset; organization | Resources the server exposes. |
| mcp_prompts_exposed | No | Multiple | Text list | Prompt names | summarize_dataset | Prompts the server exposes. |
| mcp_installation | No | Single | Text | Install or run instructions | npx @example/datagouv-mcp | How to install or run the server. |
| mcp_security_notes | No | Single | Text | Security notes | Requires API token; can trigger write actions. | Notes on credentials, private access, external calls, or write actions. |
| mcp_status | No | Single select | Controlled value | experimental, usable, stable, deprecated, unknown | usable | Maturity of the MCP server record. |

data/evaluations.csv

| Field | Required | Cardinality | Type or format | Values | Example | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| evaluation_id | Yes | Single | Identifier | Prefix EVAL- | EVAL-001 | Stable Source Commons evaluation id. |
| source_id | Yes | Single | Identifier | Existing SRC- id | SRC-US-ED-CRDC-HARASSMENT | Source being evaluated. |
| dqv_dimension | Yes | Single select or URI-like label | DQV or Source Commons dimension | dqv:Availability, dqv:Completeness, dqv:Consistency, dqv:Accuracy, dqv:Timeliness, dqv:Licensing, sc:Exploitability, sc:LegalRisk, sc:Joinability | dqv:Availability | DQV does not require a closed list; Source Commons recommends these starting values. |
| dqv_dimension_uri | No | Single | URI | DQV or Source Commons URI | https://www.w3.org/ns/dqv#Dimension | URI for the quality dimension when available. |
| dqv_metric | Yes | Single | Text | Metric label | DCAT_record_available | Human-readable or machine-friendly metric name. |
| dqv_metric_uri | Yes | Single | URI or compact URI | Metric URI | sc:metric/DCATRecordAvailable | Stable metric identifier. |
| dqv_expected_datatype | Yes | Single select | Controlled value | DQV expected datatype values | boolean | Expected datatype for dqv_value. |
| dqv_value | No | Single | Value matching expected datatype | Metric value | true | Observed metric value. |
| dqv_unit | No | Single | Text or URI | Unit | boolean | Unit for the value when relevant. |
| rating_1_5 | Yes | Single select | Integer | 1, 2, 3, 4, 5 | 5 | Human rating where 5 is strongest. |
| confidence | Yes | Single select | Controlled value | Confidence values | High | Confidence in the evaluation. |
| dqv_computed_on | No | Single | ISO 8601 date or date time | Date or date time | 2026-05-30 | Date the evaluation was performed. |
| dqv_measurement_method | No | Single | Text | Method description | Checked harvest record JSON and first download URL. | Manual check, script, API call, or review method. |
| evaluator_id | Yes | Single | Identifier | Contributor or organization id | CONTRIB-001 | Evaluator id. |
| evaluator_role | No | Single | Text | Role label | Source curator | Expertise or responsibility of evaluator. |
| dqv_quality_annotation | No | Single | Text | Annotation | A DCAT JSON record can prefill core metadata. | Maps to DQV quality annotation. |
| legal_risk_level | No | Single select | Controlled value | Risk level values | Low | Legal or ethical risk observed during evaluation. |
| legal_risk_comment | No | Single | Text | Risk comment | Attribution and context should be preserved. | Explain risk and evidence. |
| recommended_action | No | Single | Text | Action | Add importer mapping for data.gov harvest records. | Suggested fix or follow-up. |

data/use-cases.csv

FieldRequiredCardinalityType or formatValuesExampleNotes
use_case_idYesSingleIdentifierPrefix UC-UC-001Stable Source Commons use case id.
dct_titleYesSingleTextUse case titleUrban development early signal monitorMaps to Dublin Core title.
wikidata_idNoSingleWikidata itemQIDQ12737077Primary Wikidata item for the use case topic when one clear item exists.
source_idsYesMultipleIdentifier listExisting SRC- idsSRC-CA-TORONTO-BUILDING-PERMITS; SRC-CA-HOC-HANSARDSources used by the workflow.
sectorYesSingle or multipleText listDomain labelsUrban planningSector or problem area.
user_org_typeNoSingleTextOrganization typeCity strategy teamType of user or team.
question_answeredNoSingleTextQuestionWhere are construction and policy signals increasing?Main question addressed.
workflow_summaryYesSingleTextWorkflow summaryNormalize permits, geocode addresses, extract policy mentions, aggregate by district and topic.Short operational workflow.
tools_usedYesMultipleIdentifier listExisting TOOL- idsTOOL-OPENREFINE; TOOL-FRICTIONLESSTools used by the workflow.
| output_type | No | Single or multiple select | Controlled value list | Dataset, Dashboard, API, Notebook, Report, Briefing, Model, Knowledge base, Map, Other | Dashboard; Briefing | Output produced by the workflow. |
| result_link | No | Single | URL or status label | URL, not_published_demo, internal_only, N/A | not_published_demo | Link to output when available. |
| legal_aspects | Yes | Single | Text | Legal notes | Use official open data API and cite municipal source. | License, terms, privacy, scraping, or redistribution considerations. |
| impact | Yes | Single | Text | Impact statement | Earlier detection of local development pressure. | Practical value or expected outcome. |
| audience | No | Single or multiple | Text list | Audience labels | Researchers; public teams | Intended users or beneficiaries. |
| reuse_status | No | Single select | Controlled value | Reuse status values | Draft | Maturity of the use case. |
| confidence | No | Single select | Controlled value | Confidence values | Medium | Confidence in the workflow or evidence. |
| source_join_keys_used | Yes | Multiple | Text list | Source id and key names | SRC-CA-TORONTO-BUILDING-PERMITS: ward; address; application_date | Join keys actually used. |
| join_strategy | Yes | Single | Text | Strategy description | Geocode addresses, aggregate by ward and week, then join with policy topics. | How sources are connected. |
| join_confidence | No | Single select | Controlled value | Confidence values | Medium | Confidence in source joins. |
| github_resource_type | No | Single select | Controlled value | GitHub resource type values | repository | Primary GitHub resource type linked to the use case. |
| github_resource_id | No | Single | GitHub resource id | Repository full name or account login | example-org/urban-monitor-demo | Repository owner/name, organization login, or user login. |
| github_url | No | Single | URL | GitHub URL | https://github.com/example-org/urban-monitor-demo | Public GitHub URL for the linked resource. |
| github_api_url | No | Single | URL | GitHub REST API URL | https://api.github.com/repos/example-org/urban-monitor-demo | API metadata endpoint for the linked resource. |
| github_metadata_status | No | Single select | Controlled value | Metadata status values | Not checked | Status of GitHub metadata verification. |
| hf_models_used | No | Multiple | namespace/name list | Hub model ids or N/A | climatebert/distilroberta-base-climate-detector | Hugging Face models used. |
| hf_spaces_used | No | Multiple | namespace/name list | Hub Space ids or N/A | narcis2007/ClimateBERT | Hugging Face Spaces used. |
| hf_datasets_used | No | Multiple | namespace/name list | Hub dataset ids or N/A | rabuahmad/climatecheck | Hugging Face datasets used. |
| hf_discussion_refs | No | Multiple | URL list | Hub discussion URLs or API URLs | https://huggingface.co/api/models/example/model/discussions | Hub discussions, issues, or review context. |
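
In the example values above, github_url and github_api_url both embed the github_resource_id. The following is a minimal Python sketch of a consistency check for that relationship. It assumes that github_resource_type is repository and that the URL patterns follow the example values; the function names are hypothetical and not part of the framework.

```python
def expected_github_urls(resource_id):
    """Build the public and REST API URLs for an owner/name repository id."""
    owner, sep, name = resource_id.partition("/")
    if not (owner and sep and name) or "/" in name:
        raise ValueError(f"expected owner/name, got {resource_id!r}")
    return {
        "github_url": f"https://github.com/{owner}/{name}",
        "github_api_url": f"https://api.github.com/repos/{owner}/{name}",
    }


def github_link_errors(row):
    """Return the GitHub URL fields that disagree with github_resource_id."""
    if row.get("github_resource_type") != "repository":
        return []  # Only repository rows are checked in this sketch.
    expected = expected_github_urls(row["github_resource_id"])
    return [field for field, url in expected.items() if row.get(field) != url]


# Example values copied from the table above.
row = {
    "github_resource_type": "repository",
    "github_resource_id": "example-org/urban-monitor-demo",
    "github_url": "https://github.com/example-org/urban-monitor-demo",
    "github_api_url": "https://api.github.com/repos/example-org/urban-monitor-demo",
}
print(github_link_errors(row))  # []
```

Rows that link an organization or user login are skipped here, because the table does not show example URLs for those resource types.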

data/skills.csv

Skills are reusable capability descriptions: a method, recipe, promptable workflow, context package, or operational procedure that helps someone do a recurring task. A skill can mention tools and datasets, but it is not itself a software tool. A skill can support many use cases.

Every skill is represented twice:

  • one row in data/skills.csv for indexing, search, and review;
  • one Markdown file in skills/{skill_id}.md with generated YAML frontmatter and editable Markdown body.

| Field | Required | Cardinality | Type or format | Values | Example | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| skill_id | Yes | Single | Identifier | Prefix scf-skill- plus UUID | scf-skill-6f4d8e5c-12c2-4b61-9f62-c6b3e713b35d | Generated by scframework.org. Stable public skill id. |
| title | Yes | Single | Text | Skill title | Geocode addresses | Human-readable name. |
| summary | Yes | Single | Text | Short summary | Convert postal addresses into geographic coordinates. | One sentence for cards and search. |
| version | Yes | Single | Semantic-ish version | Version string | 0.1 | Increment when the method meaningfully changes. |
| status | Yes | Single select | Controlled value | Skill status values | stable | Maturity of the skill. |
| keywords | Yes | Multiple | Text list | Keywords | geocoding; addresses; maps | Search terms. |
| solves | Yes | Multiple | Text list | Problems solved | address matching; territorial analysis | Problems or tasks the skill helps solve. |
| description | No | Single | Markdown-capable text | Description | Convert postal addresses into coordinates using open datasets and APIs. | Longer description, also included in frontmatter when provided. |
| authors | No | Multiple | Text list | People or organizations | Source Commons Lab | Authors or maintainers. |
| license | No | Single | SPDX id or license label | License value | CC-BY-4.0 | License for the skill text and method. |
| created | No | Single | ISO 8601 date | Date | 2026-06-05 | Initial creation date. |
| updated | No | Single | ISO 8601 date | Date | 2026-06-05 | Last meaningful update date. |
| categories | No | Multiple | Text list | Category labels | geography; transport | Broad skill category. |
| domain | No | Multiple | Text list | Domain labels | mobility | Domain where the skill is useful. |
| use_cases | No | Multiple | Text list | Use case labels or ids | delivery optimization; territory observatories | Use cases helped by the skill. |
| inputs | No | Multiple | Text list | Inputs | postal address | Expected input material. |
| outputs | No | Multiple | Text list | Outputs | latitude; longitude | Expected outputs. |
| tools | No | Multiple | Text list or ids | Tools | DuckDB; Python | Tools often used by the skill. |
| datasets | No | Multiple | Text list or ids | Datasets | OpenStreetMap; BAN | Useful datasets. |
| mcp_servers | No | Multiple | Text list | MCP server ids | datagouv-mcp | Optional MCP servers supporting the skill. |
| related_skills | No | Multiple | Identifier or slug list | Skill ids or slugs | reverse-geocoding | Related or complementary skills. |
| works_with | No | Multiple | File, format, or context labels | Context references | context.md; GeoJSON | Files or formats the skill can operate with. |
| references | No | Multiple | URL list | Absolute URLs | https://wiki.openstreetmap.org/wiki/Nominatim | References and documentation. |
| canonical_url | No | Single | URL | Absolute URL | https://sourcecommons.org/skills/geocode-addresses | External canonical page when one exists. |
| language | No | Single | BCP 47 language tag | Language code | en | Main language. |
| alternate_languages | No | Multiple | BCP 47 language tags | Language codes | fr | Languages available or expected. |
| skill_url | Yes | Single | URL path or absolute URL | scframework URL | https://scframework.org/skill/scf-skill-... | Public page generated by scframework.org. |
| markdown_path | Yes | Single | Repository path | Markdown file path | skills/scf-skill-....md | Markdown file written in the same pull request. |

The generated YAML frontmatter should preserve the same field names as the CSV where possible. Multi-value fields are written as YAML arrays in the Markdown frontmatter and as lists separated by a semicolon and a space in the CSV.
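
As an illustration of that mapping, here is a minimal Python sketch that renders one data/skills.csv row as frontmatter. It is not the generator used by scframework.org: the function names, the abbreviated set of multi-value fields, and the choice to omit blank optional fields are all assumptions made for the example.

```python
import json

# Assumption: an abbreviated set of the multi-value fields, for illustration only.
MULTI_VALUE_FIELDS = {"keywords", "solves", "inputs", "outputs", "tools", "datasets"}


def split_multi(cell):
    """Split on semicolons and trim whitespace, which accepts the '; ' separator."""
    return [part.strip() for part in cell.split(";") if part.strip()]


def to_frontmatter(row):
    """Render one skills.csv row (a dict of strings) as a YAML frontmatter block."""
    lines = ["---"]
    for field, cell in row.items():
        if not cell:
            continue  # Assumption: blank optional fields are left out.
        if field in MULTI_VALUE_FIELDS:
            lines.append(f"{field}:")
            lines.extend(f"  - {json.dumps(value)}" for value in split_multi(cell))
        else:
            # A JSON string is also a valid double-quoted YAML scalar,
            # which avoids a YAML library dependency in this sketch.
            lines.append(f"{field}: {json.dumps(cell)}")
    lines.append("---")
    return "\n".join(lines)


row = {
    "skill_id": "scf-skill-6f4d8e5c-12c2-4b61-9f62-c6b3e713b35d",
    "title": "Geocode addresses",
    "keywords": "geocoding; addresses; maps",
    "description": "",
}
print(to_frontmatter(row))
```

With the row above, keywords becomes a three-item YAML array and the empty description is skipped, while the field names stay identical to the CSV headers.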

Example Source Row

The CSV files in this repository remain empty except for headers. This example shows the expected style for a future source pull request.

source_id,source_level,dct_title,dct_description,dct_publisher,dct_identifier,wikidata_id,dcat_landing_page,dcat_access_url,dcat_download_url,dcat_distribution_format,dcat_media_type,dcat_endpoint_url,dcat_prefill_url,dcat_prefill_status,dct_access_rights,dct_license,dct_spatial,dct_temporal,dcat_theme,dcat_keywords,wikidata_main_topics,wikidata_related_entities,wikidata_advanced_relations,access_method,access_account_required,auth_method,rate_limit_declared,rate_limit_scope,rate_limit_value,rate_limit_period,rate_limit_notes,access_cost_type,pricing_model,pricing_currency,pricing_amount,pricing_unit,pricing_notes,update_frequency,structuredness,fragmentation_level,extraction_difficulty,stable_identifiers,join_keys,legal_risk_level,legal_risk_notes,scraping_position,documented_exception_cases,dct_conforms_to,join_key_types,join_key_examples,join_key_granularity,join_key_confidence,crosswalk_source_ids,crosswalk_notes,related_source_ids,related_tool_ids,related_use_case_ids,related_skill_ids,related_evaluation_ids,github_resource_type,github_resource_id,github_url,github_api_url,github_metadata_status,hf_repo_type,hf_repo_id,hf_api_url,hf_metadata_status
SRC-CA-TORONTO-BUILDING-PERMITS,L2_dataset,Building Permits - Active Permits,Information on currently active building applications and permits in Toronto.,City of Toronto,toronto-building-permits-active,,https://open.toronto.ca/dataset/building-permits-active-permits/,https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show?id=building-permits-active-permits,,CSV; JSON; API,application/json,https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show?id=building-permits-active-permits,https://open.toronto.ca/dataset/building-permits-active-permits/,Partial,Public,City of Toronto Open Data Licence,CA-ON,Current,Urban planning,building permits; development; construction; addresses,,,,Open data API; file download,false,None,Unknown,IP,,,No public limit found; check API behavior before monitoring.,Free,free_open_access,N/A,N/A,N/A,No fee declared for official API or downloads.,daily,Structured,Low,Low,Permit numbers and addresses,permit_number; address; ward; application_date,Low,Open data license; geocoding and personal data minimization should be checked.,No scraping needed,Use official open data API or downloads.,DCAT-3; DQV; SourceCommons-ODSP-0.1,address; geography; date,permit_number=24 123456; ward=10; application_date=2026-01-12,Permit and parcel level,High,N/A,Address and ward keys allow joins with planning and zoning sources.,,,,,,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A,N/A
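
A source row has many columns and several are usually empty, so one missing or extra comma shifts every later value into the wrong field. The sketch below shows one way to catch that before opening a pull request, using Python's standard csv module. The function name and the tiny sample are made up for the example and are not part of the framework.

```python
import csv
import io


def field_count_errors(csv_text):
    """Return (line_number, expected, found) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)
    return [
        (reader.line_num, len(header), len(record))
        for record in reader
        if len(record) != len(header)
    ]


# Hypothetical four-column sample: the second data row is one field short.
sample = (
    "source_id,dct_title,related_skill_ids,hf_metadata_status\n"
    "SRC-A,Title A,,N/A\n"
    "SRC-B,Title B,N/A\n"
)
print(field_count_errors(sample))  # [(3, 4, 3)]
```

An empty result means every row has exactly as many fields as the header. Cells that contain commas must be quoted for this check, and for the CSV itself, to work.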

Identifier Rules

Identifiers should be stable and human-readable. Suggested prefixes:

  • SRC- for sources.
  • TOOL- for tools.
  • EVAL- for evaluations.
  • UC- for use cases.
  • scf-skill- plus a UUID for skills.

Records should not reuse an identifier for a different entity after publication.
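
These rules can be checked mechanically. The following Python sketch classifies an identifier by its prefix and reports identifiers that appear more than once. It treats the suggested prefixes as strict and assumes skill UUIDs use the lowercase hyphenated form shown in the skill_id example; both choices, and the function names, are assumptions for illustration.

```python
import re
from collections import Counter

# Assumption: lowercase hyphenated UUID, as in the skill_id example.
SKILL_ID_RE = re.compile(r"scf-skill-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}")
PREFIX_KINDS = {"SRC-": "source", "TOOL-": "tool", "EVAL-": "evaluation", "UC-": "use case"}


def identifier_kind(identifier):
    """Return the record kind implied by the identifier, or None if it matches no rule."""
    if SKILL_ID_RE.fullmatch(identifier):
        return "skill"
    for prefix, kind in PREFIX_KINDS.items():
        # Require at least one character after the prefix.
        if identifier.startswith(prefix) and len(identifier) > len(prefix):
            return kind
    return None


def reused_identifiers(identifiers):
    """Return identifiers that appear more than once, sorted alphabetically."""
    counts = Counter(identifiers)
    return sorted(name for name, count in counts.items() if count > 1)


print(identifier_kind("SRC-CA-TORONTO-BUILDING-PERMITS"))  # source
print(identifier_kind("scf-skill-6f4d8e5c-12c2-4b61-9f62-c6b3e713b35d"))  # skill
```

A duplicate check like reused_identifiers only covers one snapshot of the files. Detecting an identifier that was published, removed, and later assigned to a different entity would also require the repository history.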