Patterns
Deduplication
Handle multi-valued properties cleanly. Use SAMPLE for single values, GROUP_CONCAT for lists, and DISTINCT to remove duplicates.
The Duplicate Problem
Wikidata often has multiple values for the same property (multiple occupations, multiple citizenships, etc.). This causes row multiplication in query results.
Problem: Duplicated Rows
# This produces duplicate rows for multi-paradigm languages
SELECT ?lang ?langLabel ?paradigmLabel
WHERE {
?lang wdt:P31 wd:Q9143 ;
wdt:P3966 ?paradigm .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
LIMIT 30
Solution 1: SAMPLE (Pick One)
Get One Paradigm Per Language
SELECT ?lang ?langLabel
(SAMPLE(?paradigmLabel) AS ?oneParadigm)
WHERE {
?lang wdt:P31 wd:Q9143 ;
wdt:P3966 ?paradigm .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
GROUP BY ?lang ?langLabel
LIMIT 30
Museums with One Country
SELECT ?museum ?museumLabel
(SAMPLE(?countryLabel) AS ?country)
(SAMPLE(?cityLabel) AS ?city)
WHERE {
?museum wdt:P31/wdt:P279* wd:Q33506 .
OPTIONAL { ?museum wdt:P17 ?country . }
OPTIONAL { ?museum wdt:P131 ?city . }
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
GROUP BY ?museum ?museumLabel
LIMIT 30
Solution 2: GROUP_CONCAT (Combine All)
List All Paradigms in One Cell
SELECT ?lang ?langLabel
(GROUP_CONCAT(DISTINCT ?paradigmLabel; SEPARATOR=", ") AS ?paradigms)
WHERE {
?lang wdt:P31 wd:Q9143 ;
wdt:P3966 ?paradigm .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
GROUP BY ?lang ?langLabel
LIMIT 30
Cities with All Alternative Names
SELECT ?city ?cityLabel
(GROUP_CONCAT(DISTINCT ?altName; SEPARATOR=" | ") AS ?aliases)
WHERE {
?city wdt:P31 wd:Q515 ;
wdt:P17 wd:Q142 . # France
OPTIONAL { ?city skos:altLabel ?altName .
FILTER(LANG(?altName) = "en") }
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
GROUP BY ?city ?cityLabel
LIMIT 30
Solution 3: COUNT (How Many?)
Languages by Paradigm Count
SELECT ?lang ?langLabel
(COUNT(DISTINCT ?paradigm) AS ?paradigmCount)
WHERE {
?lang wdt:P31 wd:Q9143 ;
wdt:P3966 ?paradigm .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
GROUP BY ?lang ?langLabel
ORDER BY DESC(?paradigmCount)
LIMIT 20
Solution 4: DISTINCT
Remove Duplicate Rows
SELECT DISTINCT ?lang ?langLabel
WHERE {
?lang wdt:P31 wd:Q9143 ;
wdt:P3966 ?paradigm .
# Even with multiple paradigms, each language appears once
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
LIMIT 30
Combining Techniques
Full Deduplication Pattern
SELECT ?lang ?langLabel
(SAMPLE(?designerLabel) AS ?designer)
(GROUP_CONCAT(DISTINCT ?paradigmLabel; SEPARATOR=", ") AS ?paradigms)
(COUNT(DISTINCT ?paradigm) AS ?paradigmCount)
WHERE {
?lang wdt:P31 wd:Q9143 .
OPTIONAL { ?lang wdt:P287 ?designer . }
OPTIONAL { ?lang wdt:P3966 ?paradigm . }
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
GROUP BY ?lang ?langLabel
ORDER BY DESC(?paradigmCount)
LIMIT 20
Deduplication Summary
| Technique | Use When | Result |
|---|---|---|
SAMPLE(?x) |
Need one value, don't care which | Single arbitrary value |
GROUP_CONCAT |
Want all values in one cell | "a, b, c" |
COUNT(DISTINCT) |
Need the number of values | Integer count |
SELECT DISTINCT |
Remove entire duplicate rows | Unique rows only |
MIN/MAX |
Need smallest/largest value | Boundary value |