Documenting and preserving programming languages and software in Wikidata

John Samuel, Katherine Thornton, Kenneth Seals-Nutt

CPE Lyon, EaaSI

SWIB 2018, Bonn, 27th November, 2018

Creative Commons License

Programming Languages

English Wikipedia Infoboxes of Programming Languages

Programming Languages

Programming Languages with the most multilingual labels

Programming Languages

Programming Language Paradigms

Programming Languages

Programming languages with the most paradigms

Programming Languages

Programming languages with the most multilingual Wikipedia articles

Programming Languages

Wikipedia languages with the most articles on programming languages

Programming Languages

Languages with the most programming language labels

Software

English Wikipedia Infoboxes of Software

Software

Software with the most labels on Wikidata

Software

Software with the most articles on Wikipedia

Software

Wikipedia languages with the most articles on software

Software

Languages with the most software labels on Wikidata

Operating Systems

English Wikipedia Infoboxes of Operating Systems

Digital Preservation

  • Digital Preservation
    • OPF
    • Software Heritage
    • EaaSI

Wikidata

  • Wikidata
    • Started in 2012
    • A free, open, linked, structured, collaborative and multilingual knowledge base
    • From multi-(sub)domain multilingual Wikipedia sites to a single-domain multilingual website
    • Collaborative Multilingual Multi-domain Ontology development

Wikipedia to Wikidata

Importing structured data from Wikipedia Infoboxes to Wikidata

Wikidata to Wikipedia

Exporting data from Wikidata to multiple multilingual Wikipedia articles

Wikipedia Infobox Properties

Existing English Wikipedia Infobox Properties of Programming Languages

Wikidata

Wikidata entry of Python Programming Language (labels)

Wikidata Properties

Wikidata entry of Python Programming Language (property values)

Wikidata Properties

Example of Wikidata Property

Wikidata Properties

Property Creation on Wikidata

Wikidata Projects

Example Wikidata WikiProject

Wikidata Projects

Example Wikidata WikiProject and Property Suggestions

Tools: Histropedia

Timeline of Programming Languages

http://histropedia.com/timeline/d98rtpg9bg0t/Programming-languages

Status of software data

  • Wikidata
    • 85,000
    • desktop applications
    • research software
    • FLOSS

Licenses approved by the Free Software Foundation

Licenses approved by the Free Software Foundation by count of software titles available under each

UNIX utilities

Some UNIX utilities have their own identifiers in the LoC Name Authority File or in the GND

Deutsches Forschungsnetz

Software developed by members of Deutsches Forschungsnetz

File format items

File format items that have a LoC FDD identifier, along with all other identifiers

Wikidata for Digital Preservation

  • Wikidata
    • Inspired by WikiGenomes
    • Streamlined interface
    • Property checklists tailored to digital preservation
    • Specialty searches (PUID, mimetype)
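A specialty search of this kind can be sketched as a small query builder. The slides do not show the portal's own search code, so the helper below is hypothetical. It also assumes that the two searches map to the Wikidata properties P2748 (PRONOM file format identifier) and P1163 (MIME type), and the value `fmt/18` is only an example string.

```python
# Hypothetical helper: the portal's real search code is not shown in the slides.
# Assumption: P2748 (PRONOM file format identifier) and P1163 (MIME type)
# are the Wikidata properties behind the two specialty searches.
SEARCH_PROPERTIES = {"puid": "P2748", "mimetype": "P1163"}


def build_identifier_query(kind: str, value: str) -> str:
    """Return a SPARQL query for items whose identifier of `kind` equals `value`."""
    prop = SEARCH_PROPERTIES[kind]  # raises KeyError for unsupported kinds
    # Escape backslashes and double quotes so the value stays one string literal.
    literal = value.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        f'  ?item wdt:{prop} "{literal}".\n'
        "  SERVICE wikibase:label { bd:serviceParam\n"
        '          wikibase:language "[AUTO_LANGUAGE],en". }\n'
        "}"
    )


# Print the query text; it can be pasted into the Wikidata Query Service.
print(build_identifier_query("puid", "fmt/18"))
```

The function only builds the query text, so it can be inspected or pasted into the Wikidata Query Service before any request is sent.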

Wikidata for Digital Preservation

WikiGenomes

wikigenomes.org

Role of Portals

  • About 5,000 properties in Wikidata
  • Data models are not pre-defined
  • Portal has a domain-specific property checklist
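As a rough illustration of what a domain-specific property checklist could do, the sketch below compares an item's statements against a short list of expected properties. The checklist and the item data are made up for this example. The property IDs are the ones used in this deck's queries (P31, P275, P178), and the function name is hypothetical.

```python
# Hypothetical checklist for software items; the property IDs are the ones
# used in this deck's queries (P31 instance of, P275 license, P178 developer).
SOFTWARE_CHECKLIST = {
    "P31": "instance of",
    "P275": "license",
    "P178": "developer",
}


def missing_properties(claims: dict, checklist: dict) -> list:
    """List checklist entries for which the item has no statement."""
    return [
        (pid, name)
        for pid, name in checklist.items()
        if not claims.get(pid)  # absent key or empty statement list
    ]


# Made-up item data shaped like {property ID: [values]}.
example_claims = {"P31": ["Q7397"], "P178": []}
print(missing_properties(example_claims, SOFTWARE_CHECKLIST))
# → [('P275', 'license'), ('P178', 'developer')]
```

Because Wikidata does not impose a data model, a checklist like this is one simple way for a portal to show contributors which statements are still missing.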

Technologies

WikiDP.org

Screenshot of search results in the WikiDP portal

WDProp

  1. WDProp:
    • Collaborative Multilingual Multi-domain Ontology development: is it possible to achieve a truly multilingual experience?
  2. Goals:
    • Understanding Wikidata property proposal, creation and translation
    • Available templates and their usage
    • Providing real-time statistics to (multilingual) contributors
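The translation statistics mentioned in these goals can be illustrated with a small calculation: for each language, the share of properties that already have a label in it. This is a minimal sketch and not WDProp's actual implementation. The function name is hypothetical, and the sample labels are illustrative values typed in by hand, not data fetched from Wikidata.

```python
def translation_coverage(labels_by_property: dict, languages: list) -> dict:
    """Return, per language, the percentage of properties that have a label."""
    total = len(labels_by_property)
    coverage = {}
    for lang in languages:
        # Count the properties whose label map contains this language code.
        translated = sum(
            1 for labels in labels_by_property.values() if lang in labels
        )
        coverage[lang] = round(100 * translated / total, 1) if total else 0.0
    return coverage


# Illustrative sample shaped like {property ID: {language code: label}}.
sample = {
    "P31": {"en": "instance of", "fr": "nature de l'élément", "de": "ist ein(e)"},
    "P275": {"en": "license", "fr": "licence"},
    "P3966": {"en": "programming paradigm"},
}
print(translation_coverage(sample, ["en", "fr", "de"]))
# → {'en': 100.0, 'fr': 66.7, 'de': 33.3}
```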

WDProp

Information on Wikidata Properties

WDProp

  • WDProp
    • Get real-time translation statistics
    • Navigate supported languages, properties, datatypes, classes
    • Compare translation statistics
    • Find available properties for an entity
    • Uses Wikidata SPARQL endpoints and MediaWiki API
  • URL
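To make the two data sources concrete, the sketch below builds request URLs for the public Wikidata SPARQL endpoint and for the MediaWiki API action `wbgetentities`. It is an assumption about how a tool of this kind might call them, not WDProp's code. The function names are hypothetical and no request is sent.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"
MEDIAWIKI_API = "https://www.wikidata.org/w/api.php"


def sparql_url(query: str) -> str:
    """Build a GET URL that asks the SPARQL endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def entity_labels_url(entity_id: str, languages: list) -> str:
    """Build a wbgetentities URL that requests labels in the given languages."""
    params = {
        "action": "wbgetentities",
        "ids": entity_id,
        "props": "labels",
        "languages": "|".join(languages),  # the API separates values with "|"
        "format": "json",
    }
    return MEDIAWIKI_API + "?" + urlencode(params)


# Labels of property P31 in English and French.
print(entity_labels_url("P31", ["en", "fr"]))
# Any SPARQL query string can be passed; this one selects P31 instances.
print(sparql_url("SELECT ?item { ?item wdt:P31 wd:Q9143 } LIMIT 5"))
```

Keeping URL construction separate from the network call makes the request easy to inspect and to reuse with any HTTP client.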

Conclusion

  • Digital Heritage
    • Wikidata: Multilingual, Structured Knowledge Base
    • Need for Digital Preservation
    • Digital Preservation on Wikidata
    • Community participation: Property proposal, translation and item description
    • Tools using SPARQL endpoints and/or MediaWiki API

Tools and Projects

References

  1. Kaffee, L. A., Piscopo, A., Vougiouklis, P., Simperl, E., Carr, L., & Pintscher, L. (2017, August). A glimpse into Babel: an analysis of multilinguality in Wikidata. In Proceedings of the 13th International Symposium on Open Collaboration (p. 14). ACM.
  2. Müller-Birn, C., Karran, B., Lehmann, J., & Luczak-Rösch, M. (2015, August). Peer-production system or collaborative ontology engineering effort: What is Wikidata?. In Proceedings of the 11th International Symposium on Open Collaboration (p. 20). ACM.
  3. Samuel, J. (2017). Collaborative Approach to Developing a Multilingual Ontology: A Case Study of Wikidata. In Research Conference on Metadata and Semantics Research (pp. 167-172). Springer, Cham.
  4. Samuel, J. (2018). Towards Understanding and Improving Multilingual Collaborative Ontology Development in Wikidata. In WikiWorkshop 2018.
  5. Thornton, K., Cochrane, E., & Ledoux, T. (2017). Modeling the Domain of Digital Preservation in Wikidata. In iPRES 2017.

Thank you

Questions?

SPARQL Query

Programming paradigms with the count of programming languages

SELECT ?paradigmLabel (count(?prog) as ?count)
{
  ?prog wdt:P31 wd:Q9143;
        wdt:P3966 ?paradigm.
  SERVICE wikibase:label { bd:serviceParam
          wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP by ?paradigmLabel
HAVING (?count>1)
          

SPARQL Query

Programming languages with the count of programming paradigms

SELECT ?progLabel (count(?paradigm) as ?count)
{
  ?prog wdt:P31 wd:Q9143;
        wdt:P3966 ?paradigm.
  SERVICE wikibase:label { bd:serviceParam
          wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP by ?progLabel
HAVING (?count>2)
          

SPARQL Query

Programming languages with the count of multilingual labels

SELECT ?languageLabel (count(?label) as ?count) {
  {
    SELECT DISTINCT ?languageLabel ?label (lang(?label) as ?langLabel) {
      ?language wdt:P31/wdt:P279* wd:Q9143;
                rdfs:label ?label.
      SERVICE wikibase:label { bd:serviceParam
          wikibase:language "[AUTO_LANGUAGE],en". }
      }
  }

}
GROUP by ?languageLabel
HAVING (?count > 50)
ORDER by DESC(?count)

          

SPARQL Query

Software with the count of multilingual labels

SELECT ?softwareLabel (count(?label) as ?count) {
  {
    SELECT DISTINCT ?softwareLabel ?label (lang(?label) as ?langLabel) {
      ?software wdt:P31/wdt:P279* wd:Q7397;
                rdfs:label ?label.
      SERVICE wikibase:label { bd:serviceParam
          wikibase:language "[AUTO_LANGUAGE],en". }
      }
  }

}
GROUP by ?softwareLabel
HAVING (?count > 40)
ORDER by DESC(?count)

          

SPARQL Query

Languages with the count of programming language labels

SELECT ?langLabel (count(?language) as ?count) {
  {
    SELECT DISTINCT (lang(?label) as ?langLabel) ?language {
      ?language wdt:P31/wdt:P279* wd:Q9143;
                rdfs:label ?label.
      }
  }

}
GROUP by ?langLabel
ORDER by DESC(?count)

          

SPARQL Query

Languages with the count of software labels

SELECT ?langLabel (count(?software) as ?count) {
  {
    SELECT DISTINCT (lang(?label) as ?langLabel) ?software {
      ?software wdt:P31/wdt:P279* wd:Q7397;
                rdfs:label ?label.
      }
  }

}
GROUP by ?langLabel
ORDER by DESC(?count)
          

SPARQL Query

Programming languages with the count of Wikipedia articles (sitelinks)

SELECT DISTINCT ?languageLabel ?sitelinks {
      ?language wdt:P31/wdt:P279* wd:Q9143;
                wikibase:sitelinks ?sitelinks.
       FILTER(?sitelinks > 20)
       SERVICE wikibase:label { bd:serviceParam
          wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER by DESC(?sitelinks)

          

SPARQL Query

Software with the count of Wikipedia articles (sitelinks)

SELECT DISTINCT ?softwareLabel ?sitelinks {
      ?software wdt:P31/wdt:P279* wd:Q7397;
                wikibase:sitelinks ?sitelinks.
       FILTER(?sitelinks > 100)
       SERVICE wikibase:label { bd:serviceParam
          wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER by DESC(?sitelinks)

          

SPARQL Query

Languages with the count of Wikipedia articles on programming languages

SELECT ?lang (count(?progLanguage) as ?count) {
  {
    SELECT DISTINCT ?progLanguage ?lang {
      ?progLanguage wdt:P31/wdt:P279* wd:Q9143.
      [] schema:about ?progLanguage;
         schema:inLanguage ?lang.
     }
  }
}
GROUP BY ?lang
ORDER BY DESC(?count)
          

SPARQL Query

Languages with the count of Wikipedia articles on Software

SELECT ?lang (count(?software) as ?count) {
  {
    SELECT DISTINCT ?software ?lang {
      ?software wdt:P31/wdt:P279* wd:Q7397.
      [] schema:about ?software;
         schema:inLanguage ?lang.
     }
  }
}
GROUP BY ?lang
ORDER BY DESC(?count)
          

SPARQL Query

Licenses approved by the Free Software Foundation by count of software titles available under each

SELECT ?item ?itemLabel (COUNT(DISTINCT ?software) AS ?count) WHERE {
  ?software (wdt:P31/wdt:P279*) wd:Q7397.
  ?software wdt:P275 ?item.
  ?item wdt:P790 wd:Q48413.
  SERVICE wikibase:label { bd:serviceParam
          wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?item ?itemLabel
ORDER BY DESC(?count)
          

SPARQL Query

UNIX utilities with identifiers in the LoC Name Authority File or in the GND

SELECT ?item ?itemLabel ?LCNAF ?GND
WHERE
{
  ?item wdt:P31 wd:Q18343316.
  OPTIONAL {?item wdt:P244 ?LCNAF}.
  OPTIONAL {?item wdt:P227 ?GND}.
  SERVICE wikibase:label { bd:serviceParam
          wikibase:language "[AUTO_LANGUAGE],en". }
}
          

SPARQL Query

Software developed by members of Deutsches Forschungsnetz

SELECT ?member ?memberLabel ?software ?softwareLabel WHERE {
  ?member wdt:P463 wd:Q2514863.
  ?software wdt:P178 ?member.
  SERVICE wikibase:label { bd:serviceParam
          wikibase:language "[AUTO_LANGUAGE],en". }
}
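
Results of queries like the ones above come back from the endpoint in the SPARQL 1.1 JSON results format. The following sketch shows one way to flatten such a response into rows. The function name and the sample payload are hypothetical, and the payload is hand-written rather than real query output.

```python
import json


def rows_from_sparql_json(payload: str, variables: list) -> list:
    """Turn a SPARQL JSON results document into a list of value tuples."""
    data = json.loads(payload)
    rows = []
    for binding in data["results"]["bindings"]:
        # OPTIONAL patterns may leave a variable unbound, so fall back to None.
        rows.append(tuple(binding.get(v, {}).get("value") for v in variables))
    return rows


# Hand-written payload in the SPARQL 1.1 JSON results shape (not real data).
sample_payload = json.dumps({
    "head": {"vars": ["memberLabel", "softwareLabel"]},
    "results": {"bindings": [
        {"memberLabel": {"type": "literal", "value": "Example member"},
         "softwareLabel": {"type": "literal", "value": "Example software"}},
        {"memberLabel": {"type": "literal", "value": "Another member"}},
    ]},
})
print(rows_from_sparql_json(sample_payload, ["memberLabel", "softwareLabel"]))
# → [('Example member', 'Example software'), ('Another member', None)]
```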