Hash Functions
Generate cryptographic hashes for data verification, deduplication, and unique identifiers.
Hash Function Reference
SPARQL 1.1 provides cryptographic hash functions that return the digest of a string as a fixed-length, lowercase hexadecimal string. The digest is computed over the UTF-8 encoding of the string.
| Function | Description | Output Length |
|---|---|---|
| MD5(str) | MD5 hash (128-bit) | 32 hex characters |
| SHA1(str) | SHA-1 hash (160-bit) | 40 hex characters |
| SHA256(str) | SHA-256 hash (256-bit) | 64 hex characters |
| SHA384(str) | SHA-384 hash (384-bit) | 96 hex characters |
| SHA512(str) | SHA-512 hash (512-bit) | 128 hex characters |
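Note: Hash functions are one-way transformations. The same input always produces the same hash, and there is no function that turns a hash back into its input. However, short or predictable inputs (labels, years, Wikidata IRIs) can often be recovered by hashing candidate values and comparing the results.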
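To check a digest outside the query service, you can hash the same string locally. The following is a minimal Python sketch; it assumes the endpoint follows SPARQL 1.1 (lowercase hex of the UTF-8 bytes), in which case `hashlib`'s `hexdigest()` should produce the same values. `hex_digest` is a hypothetical helper name, not part of any SPARQL API.

```python
import hashlib

# Algorithm names as accepted by hashlib.new(), in table order.
ALGORITHMS = ["md5", "sha1", "sha256", "sha384", "sha512"]


def hex_digest(algorithm: str, text: str) -> str:
    """Lowercase hex digest of the UTF-8 bytes of `text`."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


# Each hex character encodes 4 bits, so the length is bits / 4.
for name in ALGORITHMS:
    digest = hex_digest(name, "abc")
    print(f"{name:>6}: {len(digest):3d} hex chars  {digest}")
```

Running it twice prints identical digests, which is the determinism the note relies on.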
Basic Hash Generation
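Generate hashes from string values. The SPARQL 1.1 specification defines these functions only for simple literals and xsd:string values, so wrap language-tagged literals such as English rdfs:label values in STR() before hashing; on a strictly conforming endpoint the call otherwise raises an error and BIND leaves the variable unbound.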
MD5 Hash of Labels
Create MD5 hashes from entity labels:
SELECT ?country ?countryLabel ?hash WHERE {
?country wdt:P31 wd:Q6256 . # country
?country rdfs:label ?label .
FILTER(LANG(?label) = "en")
BIND(MD5(STR(?label)) AS ?hash)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?countryLabel
LIMIT 20
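Two details matter when reproducing the ?hash column elsewhere: only the label text is hashed (the @en tag is not part of the string), and that text is encoded as UTF-8 first. A minimal Python sketch, assuming SPARQL 1.1 semantics; the sample string is illustrative and not taken from the query results:

```python
import hashlib


def sparql_style_md5(text: str) -> str:
    # Assumes SPARQL 1.1 semantics: MD5 over the UTF-8 bytes, lowercase hex.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


label = "Côte d'Ivoire"  # illustrative non-ASCII string
utf8_hash = sparql_style_md5(label)

# Encoding the same text differently changes the bytes, and therefore the hash.
latin1_hash = hashlib.md5(label.encode("latin-1")).hexdigest()

print(utf8_hash)
print(latin1_hash)
print(utf8_hash == latin1_hash)  # False: different byte sequences
```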
SHA256 for Stronger Hashing
Use SHA256 when collision resistance matters; practical collision attacks exist against MD5 and SHA-1:
SELECT ?city ?cityLabel ?sha256Hash WHERE {
?city wdt:P31/wdt:P279* wd:Q515 . # city
?city wdt:P17 wd:Q142 . # France
?city rdfs:label ?label .
FILTER(LANG(?label) = "en")
BIND(SHA256(STR(?label)) AS ?sha256Hash)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 20
Comparing Hash Algorithms
Compare the outputs of several hash algorithms for the same input.
Multiple Hash Types
Generate different hash types for the same input:
SELECT ?person ?personLabel ?md5 ?sha1 ?sha256 WHERE {
?person wdt:P106 wd:Q170790 . # mathematician
?person wdt:P166 wd:Q28835 . # Fields Medal winner
?person rdfs:label ?label .
FILTER(LANG(?label) = "en")
BIND(MD5(STR(?label)) AS ?md5)
BIND(SHA1(STR(?label)) AS ?sha1)
BIND(SHA256(STR(?label)) AS ?sha256)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
Creating Unique Identifiers
Use hashes to create unique identifiers from combined data.
Composite Key Hashing
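Create one hash from several fields. The hash is only as unique as the field combination: two writers with the same English name and birth year get the same value.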
SELECT ?person ?personLabel ?birthYear ?compositeHash WHERE {
?person wdt:P106 wd:Q36180 . # writer
?person wdt:P27 wd:Q145 . # UK
?person wdt:P569 ?birthDate .
?person rdfs:label ?label .
FILTER(LANG(?label) = "en")
BIND(YEAR(?birthDate) AS ?birthYear)
BIND(CONCAT(?label, "|", STR(?birthYear)) AS ?composite)
BIND(MD5(?composite) AS ?compositeHash)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?birthYear
LIMIT 20
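The "|" separator in the query keeps field boundaries unambiguous. A small Python illustration with made-up values (`composite_hash` is a hypothetical helper): without a separator, two different (name, year) pairs can concatenate to the same string and therefore get the same hash. The separator only helps if it never appears inside a field.

```python
import hashlib


def composite_hash(*fields: str, sep: str = "|") -> str:
    # Mirrors MD5(CONCAT(field1, sep, field2, ...)).
    return hashlib.md5(sep.join(fields).encode("utf-8")).hexdigest()


# Made-up values: both pairs concatenate to "John Smith1920" without a separator.
no_sep_a = composite_hash("John Smith1", "920", sep="")
no_sep_b = composite_hash("John Smith", "1920", sep="")

with_sep_a = composite_hash("John Smith1", "920")   # "John Smith1|920"
with_sep_b = composite_hash("John Smith", "1920")   # "John Smith|1920"

print(no_sep_a == no_sep_b)      # True: identical input strings
print(with_sep_a == with_sep_b)  # False: the separator keeps the fields apart
```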
Hash as Short Identifier
Use a substring of a hash as a short ID. Hash the painting's IRI rather than its label, since several paintings can share a title and would otherwise get the same ID:
SELECT ?painting ?paintingLabel ?shortId WHERE {
?painting wdt:P31 wd:Q3305213 . # painting
?painting wdt:P170 wd:Q5582 . # by Van Gogh
BIND(SUBSTR(MD5(STR(?painting)), 1, 8) AS ?shortId)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 20
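Truncating to 8 hex characters leaves 32 bits, so collisions become likely as the number of items grows, even when every input is distinct. A quick birthday-bound estimate in Python, assuming the hash behaves like a uniform random function:

```python
import math


def collision_probability(n_items: int, hex_chars: int) -> float:
    """Approximate P(at least two items share an ID) for IDs made of
    `hex_chars` hex digits, using the birthday bound 1 - exp(-n(n-1) / 2N)."""
    space = 16 ** hex_chars  # N: number of distinct IDs (4 bits per hex digit)
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


for n in (1_000, 10_000, 100_000):
    p8 = collision_probability(n, 8)
    p16 = collision_probability(n, 16)
    print(f"{n:>7} items: 8 chars {p8:.4f}, 16 chars {p16:.2e}")
```

For a few hundred paintings the risk is negligible; for tens of thousands of items, keep more characters or the full hash.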
Hashing URIs
Create hashes from Wikidata entity URIs.
Hash Entity URIs
Generate a hash from the entity's IRI:
SELECT ?element ?elementLabel ?symbol ?uriHash WHERE {
?element wdt:P31 wd:Q11344 . # chemical element
?element wdt:P246 ?symbol . # element symbol
BIND(MD5(STR(?element)) AS ?uriHash)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?symbol
LIMIT 30
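Hashing the IRI instead of the label gives each entity its own hash even when labels coincide. A Python sketch with placeholder IRIs; the Q-identifiers below are hypothetical, not real Wikidata items:

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical rows: two different entities that share the label "Mercury".
rows = [
    ("http://www.wikidata.org/entity/Q_EXAMPLE_A", "Mercury"),
    ("http://www.wikidata.org/entity/Q_EXAMPLE_B", "Mercury"),
]

label_hashes = {md5_hex(label) for _, label in rows}
iri_hashes = {md5_hex(iri) for iri, _ in rows}

print(len(label_hashes))  # 1: the label hash cannot tell the entities apart
print(len(iri_hashes))    # 2: the IRI hash can
```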
Data Verification
Use hashes to verify data consistency and detect changes.
Detecting Duplicate Labels
Find US actors whose English labels are identical apart from letter case:
SELECT ?hash (COUNT(?person) AS ?count) (SAMPLE(?label) AS ?sampleName) WHERE {
?person wdt:P106 wd:Q33999 . # actor
?person wdt:P27 wd:Q30 . # USA
?person rdfs:label ?label .
FILTER(LANG(?label) = "en")
BIND(MD5(LCASE(STR(?label))) AS ?hash)
}
GROUP BY ?hash
HAVING (COUNT(?person) > 1)
ORDER BY DESC(?count)
LIMIT 20
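Hash functions are case-sensitive: a single changed letter produces an unrelated digest, which is why the query lowercases labels before hashing. The same grouping in Python with made-up labels; this sketch assumes Python's `str.lower()` behaves like `LCASE` for these inputs:

```python
import hashlib
from collections import defaultdict


def label_key(label: str) -> str:
    # Mirrors MD5(LCASE(...)) in the query: normalise case, then hash.
    return hashlib.md5(label.lower().encode("utf-8")).hexdigest()


def find_duplicates(labels):
    """Group labels by hash key; keep groups with more than one member (HAVING COUNT > 1)."""
    groups = defaultdict(list)
    for label in labels:
        groups[label_key(label)].append(label)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Without lowercasing, the two spellings below hash to unrelated digests.
raw_differs = (
    hashlib.md5("John Smith".encode("utf-8")).hexdigest()
    != hashlib.md5("john smith".encode("utf-8")).hexdigest()
)

sample = ["John Smith", "john smith", "Jane Doe", "Mary Major"]  # made-up labels
dupes = find_duplicates(sample)
print(raw_differs)            # True
print(list(dupes.values()))   # [['John Smith', 'john smith']]
```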
Content Fingerprint
Combine several fields into a single SHA256 fingerprint; if any of the fields changes, the fingerprint changes:
SELECT ?book ?bookLabel ?authorLabel ?year ?fingerprint WHERE {
?book wdt:P31 wd:Q7725634 . # literary work
?book wdt:P50 ?author . # author
?book wdt:P577 ?pubDate . # publication date
?book rdfs:label ?bookName .
?author rdfs:label ?authorName .
FILTER(LANG(?bookName) = "en")
FILTER(LANG(?authorName) = "en")
BIND(YEAR(?pubDate) AS ?year)
BIND(CONCAT(?bookName, "|", ?authorName, "|", STR(?year)) AS ?combined)
BIND(SHA256(?combined) AS ?fingerprint)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 20
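A fingerprint is most useful when compared against an earlier one. A minimal Python sketch of change detection, assuming you saved fingerprints from a previous run keyed by some stable ID; the `book-1` key and the record values are hypothetical:

```python
import hashlib


def fingerprint(title: str, author: str, year: int) -> str:
    # Same layout as CONCAT(?bookName, "|", ?authorName, "|", STR(?year)).
    combined = f"{title}|{author}|{year}"
    return hashlib.sha256(combined.encode("utf-8")).hexdigest()


# Hypothetical fingerprints saved from an earlier run, keyed by a stable ID.
stored = {"book-1": fingerprint("Example Title", "A. Author", 1950)}

# The same record fetched again, with a corrected publication year.
current = {"book-1": fingerprint("Example Title", "A. Author", 1951)}

changed = [key for key, fp in current.items() if stored.get(key) != fp]
print(changed)  # ['book-1']
```

In practice the key would be the ?book IRI, which stays fixed while the fingerprinted fields change.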
Anonymization with Hashing
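Use hashes to create pseudonymous identifiers. This is pseudonymization, not anonymization: anyone can hash public Wikidata IRIs and match the results, and attributes such as birth decade plus country can still narrow down who a record belongs to.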
Pseudonymous IDs
Replace identifiable information with hashes:
SELECT ?pseudoId ?birthDecade ?countryLabel WHERE {
?person wdt:P106 wd:Q901 . # scientist
?person wdt:P569 ?birthDate .
?person wdt:P27 ?country .
BIND(FLOOR(YEAR(?birthDate) / 10) * 10 AS ?birthDecade)
BIND(SUBSTR(SHA256(STR(?person)), 1, 12) AS ?pseudoId)
FILTER(?birthDecade >= 1900)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?birthDecade
LIMIT 30
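Because Wikidata IRIs are public, an unsalted hash of an IRI can be reversed by hashing candidate IRIs and looking up the result. A keyed hash (HMAC) with a secret key prevents that lookup. A Python sketch; the candidate list and the key are illustrative assumptions. The HMAC would be computed client-side on IRIs returned by a query, since a secret embedded in a query sent to a public endpoint is no longer secret.

```python
import hashlib
import hmac


def plain_pseudo_id(iri: str) -> str:
    # Mirrors SUBSTR(SHA256(STR(?person)), 1, 12): unsalted, first 12 hex chars.
    return hashlib.sha256(iri.encode("utf-8")).hexdigest()[:12]


def keyed_pseudo_id(iri: str, secret_key: bytes) -> str:
    # HMAC-SHA256 under a secret key; without the key, candidates cannot be tested.
    return hmac.new(secret_key, iri.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


# Attacker's view: enumerate candidate IRIs and precompute their plain IDs.
# (Synthetic Q-numbers; a real attacker could list every scientist on Wikidata.)
candidates = [f"http://www.wikidata.org/entity/Q{n}" for n in range(1, 1001)]
lookup = {plain_pseudo_id(iri): iri for iri in candidates}

target = "http://www.wikidata.org/entity/Q42"
recovered = lookup.get(plain_pseudo_id(target))
print(recovered)  # the original IRI

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; load from config in practice
print(keyed_pseudo_id(target, SECRET_KEY))
```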
Hash-Based Grouping
Use hash prefixes to distribute data into buckets.
Partition by Hash Prefix
Group items by the first character of their hash:
SELECT ?bucket (COUNT(?city) AS ?count) WHERE {
?city wdt:P31/wdt:P279* wd:Q515 . # city
?city wdt:P17 wd:Q183 . # Germany
BIND(SUBSTR(MD5(STR(?city)), 1, 1) AS ?bucket)
}
GROUP BY ?bucket
ORDER BY ?bucket
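Because hash output is close to uniform, the first hex digit splits items into 16 roughly equal buckets. A Python check on synthetic IRIs (generated Q-numbers, not a real list of German cities):

```python
import hashlib
from collections import Counter


def bucket(iri: str) -> str:
    # Mirrors SUBSTR(MD5(STR(?city)), 1, 1): first hex digit, "0" to "f".
    return hashlib.md5(iri.encode("utf-8")).hexdigest()[0]


# Synthetic IRIs standing in for query results.
iris = [f"http://www.wikidata.org/entity/Q{n}" for n in range(1, 16_001)]
counts = Counter(bucket(iri) for iri in iris)

for b in sorted(counts):
    print(b, counts[b])  # each bucket holds roughly 16000 / 16 = 1000 items
```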
Sample by Hash
Use a hash to select a reproducible pseudo-random sample:
SELECT ?museum ?museumLabel ?countryLabel WHERE {
?museum wdt:P31/wdt:P279* wd:Q33506 . # museum
?museum wdt:P17 ?country .
BIND(SUBSTR(MD5(STR(?museum)), 1, 1) AS ?bucket)
FILTER(?bucket = "a") # keeps about 1/16 (6.25%) of museums
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?museum # fixed order, so LIMIT returns the same rows each run
LIMIT 50
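The sample is reproducible because membership depends only on the IRI, and widening it is a matter of accepting more first digits; each larger sample then contains the smaller one. A Python sketch on synthetic IRIs; `in_sample` and its `buckets` parameter are hypothetical names:

```python
import hashlib


def in_sample(iri: str, buckets: str = "a") -> bool:
    # Mirrors FILTER(SUBSTR(MD5(STR(?museum)), 1, 1) = "a"); more digits, bigger sample.
    return hashlib.md5(iri.encode("utf-8")).hexdigest()[0] in buckets


# Synthetic IRIs standing in for query results.
iris = [f"http://www.wikidata.org/entity/Q{n}" for n in range(1, 32_001)]

first_run = [iri for iri in iris if in_sample(iri)]
second_run = [iri for iri in iris if in_sample(iri)]
wider = [iri for iri in iris if in_sample(iri, buckets="ab")]

print(first_run == second_run)       # True: same rule, same sample
print(len(first_run) / len(iris))    # close to 1/16 = 0.0625
print(set(first_run) <= set(wider))  # True: the wider sample contains the narrower one
```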
Use Cases Summary
| Use Case | Recommended Hash | Why |
|---|---|---|
| Quick lookups | MD5 | Fast, sufficient for non-security uses |
| Unique identifiers | SHA256 | Lower collision probability |
| Data fingerprints | SHA256 | Good balance of security and length |
| Partitioning | MD5 | Fast and evenly distributed |
| Maximum security | SHA512 | Highest cryptographic strength |