Patterns

Deduplication

Handle multi-valued properties cleanly. Use SAMPLE for single values, GROUP_CONCAT for lists, and DISTINCT to remove duplicates.

The Duplicate Problem

Wikidata often has multiple values for the same property (multiple occupations, multiple citizenships, etc.). This causes row multiplication in query results.

Problem: Duplicated Rows
Run ↗
# This produces duplicate rows for multi-paradigm languages
SELECT ?lang ?langLabel ?paradigmLabel
WHERE {
  ?lang wdt:P31 wd:Q9143 ;
        wdt:P3966 ?paradigm .

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
LIMIT 30

Solution 1: SAMPLE (Pick One)

Get One Paradigm Per Language
Run ↗
SELECT ?lang ?langLabel
       (SAMPLE(?paradigmLabel) AS ?oneParadigm)
WHERE {
  ?lang wdt:P31 wd:Q9143 ;
        wdt:P3966 ?paradigm .

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?lang ?langLabel
LIMIT 30
Museums with One Country
Run ↗
SELECT ?museum ?museumLabel
       (SAMPLE(?countryLabel) AS ?country)
       (SAMPLE(?cityLabel) AS ?city)
WHERE {
  ?museum wdt:P31/wdt:P279* wd:Q33506 .
  OPTIONAL { ?museum wdt:P17 ?country . }
  OPTIONAL { ?museum wdt:P131 ?city . }

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?museum ?museumLabel
LIMIT 30

Solution 2: GROUP_CONCAT (Combine All)

List All Paradigms in One Cell
Run ↗
SELECT ?lang ?langLabel
       (GROUP_CONCAT(DISTINCT ?paradigmLabel; SEPARATOR=", ") AS ?paradigms)
WHERE {
  ?lang wdt:P31 wd:Q9143 ;
        wdt:P3966 ?paradigm .

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?lang ?langLabel
LIMIT 30
Cities with All Alternative Names
Run ↗
SELECT ?city ?cityLabel
       (GROUP_CONCAT(DISTINCT ?altName; SEPARATOR=" | ") AS ?aliases)
WHERE {
  ?city wdt:P31 wd:Q515 ;
        wdt:P17 wd:Q142 .  # France
  OPTIONAL { ?city skos:altLabel ?altName .
             FILTER(LANG(?altName) = "en") }

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?city ?cityLabel
LIMIT 30

Solution 3: COUNT (How Many?)

Languages by Paradigm Count
Run ↗
SELECT ?lang ?langLabel
       (COUNT(DISTINCT ?paradigm) AS ?paradigmCount)
WHERE {
  ?lang wdt:P31 wd:Q9143 ;
        wdt:P3966 ?paradigm .

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?lang ?langLabel
ORDER BY DESC(?paradigmCount)
LIMIT 20

Solution 4: DISTINCT

Remove Duplicate Rows
Run ↗
SELECT DISTINCT ?lang ?langLabel
WHERE {
  ?lang wdt:P31 wd:Q9143 ;
        wdt:P3966 ?paradigm .

  # Even with multiple paradigms, each language appears once
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
LIMIT 30

Combining Techniques

Full Deduplication Pattern
Run ↗
SELECT ?lang ?langLabel
       (SAMPLE(?designerLabel) AS ?designer)
       (GROUP_CONCAT(DISTINCT ?paradigmLabel; SEPARATOR=", ") AS ?paradigms)
       (COUNT(DISTINCT ?paradigm) AS ?paradigmCount)
WHERE {
  ?lang wdt:P31 wd:Q9143 .
  OPTIONAL { ?lang wdt:P287 ?designer . }
  OPTIONAL { ?lang wdt:P3966 ?paradigm . }

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?lang ?langLabel
ORDER BY DESC(?paradigmCount)
LIMIT 20

Deduplication Summary

Technique Use When Result
SAMPLE(?x) Need one value, don't care which Single arbitrary value
GROUP_CONCAT Want all values in one cell "a, b, c"
COUNT(DISTINCT) Need the number of values Integer count
SELECT DISTINCT Remove entire duplicate rows Unique rows only
MIN/MAX Need smallest/largest value Boundary value