Wiki Workshop 2020: John Samuel

I got the opportunity to attend Wiki Workshop 2020¹ on 21st April, two years after my last one. Unlike the previous workshop of 2018², which was held in Lyon, France, this one was held online. As far as I recall, this is the first time that I registered for and attended a virtual workshop. Nevertheless, the excitement was the same for me as for all the participants interested in Wikimedia projects.

The keynote talk was given by Jess Wade. She is very well known to most Wikipedia researchers and community members for her contributions to reducing the gender gap on Wikipedia projects. With multiple examples of how Wikipedia is becoming the first point of reference not only for the general public but also for academics and journalists, she stressed the need to improve the representation of women, LGBTI+ people, and people of color. Improving the visibility of minority communities may play a pivotal role in dealing with societal biases.

Next in line was an interview with Mark Graham of the Internet Archive (Wayback Machine), conducted by Bob West (EPFL). This interview highlighted some of the characteristics that the Internet Archive and Wikipedia have in common. Both are increasingly becoming major destinations for finding information. One may even wonder whether this archiving effort can be referred to as the Library of Alexandria version 2.0. During these tough times, such efforts must indeed be amplified, especially to deal with the growing problem of disinformation associated with the Coronavirus pandemic. Anybody, especially journalists, can contribute to the Internet Archive by saving pages using the Save Page tool³. Researchers can also access this vast data source through its API⁴.
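As a small illustration of the API access mentioned above, here is a minimal sketch in Python of how one might look up the archived snapshot closest to a given date through the Wayback Machine availability endpoint. It was not shown in the interview. The helper names and the sample response are my own, hypothetical choices, and the sample JSON is hand-written to mirror the general shape of a reply rather than captured from a real query. No network request is made here.

```python
import json
from urllib.parse import urlencode

# Wayback Machine availability endpoint, queried with a plain HTTP GET.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL for a page (hypothetical helper, not part of any library).

    timestamp is an optional string such as "20200421" (YYYYMMDD).
    """
    params = {"url": page_url}
    if timestamp is not None:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the URL of the closest archived snapshot, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hand-written sample reply (an assumption for illustration, not real data).
SAMPLE_RESPONSE = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20200421000000/https://www.wikipedia.org/",
            "timestamp": "20200421000000",
            "status": "200",
        }
    }
})

print(build_availability_url("wikipedia.org", "20200421"))
print(closest_snapshot(SAMPLE_RESPONSE))
```

To run this against the live service, the URL returned by `build_availability_url` could be fetched with any HTTP client and the response body passed to `closest_snapshot`.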

Following this interview, there was a panel discussion with Kristina Lerman (USC ISI), Misha Teplitskiy (University of Michigan), Benjamin Mako Hill (University of Washington), and Jérôme Hergueux (CNRS). Kristina Lerman talked about her work on algorithmic bias and collaboration. Misha Teplitskiy discussed cognitive diversity and the role of ideological differences among people in communities, pointing out that the more ideologically diverse a team is, the more diverse the Wikipedia page content is, and that ideological diversity also moves discussions toward newer and broader topics. Jérôme Hergueux discussed oversight and governance structures, especially how these structures help in the successful creation of rules, while pointing out that these rules become major barriers to entry for women and minorities, hence the importance of the study of behavioral sciences. Finally, Benjamin Mako Hill, who studies the lifecycle of community building, talked about the pattern in which communities start out open and eventually become closed, hindering the entry of newcomers. Such situations give the impression of competition instead of cooperation. Tiered architecture and the usefulness of badges were also discussed.

Following this insightful panel discussion, a couple of articles and lightning talks were presented. Wikitrends⁵, which takes into consideration the page views from three languages (English, French, and Russian) to study trends and language biases, was demonstrated. Kai Zhu et al. focused on content growth, i.e., the creation of Wikipedia articles, based on an understanding of clickstream data from both internal and external sources, as well as on the study of attention propagation. Nicholas Vincent et al. presented work on how often Wikipedia pages appear in query results for different search engines and different devices (mobile and desktop). Pablo Beytia et al. suggested that both the number of articles and the position of articles (i.e., the exposure of articles as well as their connectivity with other articles) must be used to understand the geographical bias in the multilingual records of biographies on Wikipedia.

Other works took into consideration the field of performing arts (ballet and opera) to understand the network structure and collaboration patterns between the participants, using Pittsburgh Ballet Theatre data as well as Wikidata (Yessica Herrera-Guzman et al.). Automatic scholar profiling based on Wikipedia was also presented. Chien-Chun Ni et al.⁶ presented work on layered graph embedding, which focused on generating embeddings from the Wikilinks graph, the Link-main graph (main text), and the clickstream graph. Matching Ukrainian redlinks with English Wikipedia was also presented. Another interesting project, Wikigender⁷, discussed using machine learning models to detect gender bias, highlighting the bias in adjectives and in nouns for biographies of different genders. The Citation Detective and Citation Hunt tools were used to quantify citation quality. It is important to note that statements on Wikipedia require citations (hence the famous 'citation needed' tag). Finally, an iterative design process was presented to diagnose the incompleteness of Wikidata. A poster session followed these presentations.

Wiki Workshop was indeed an opportunity to learn about several new topics related to Wikimedia projects, including Wikipedia and Wikidata. Although this was a virtual event, the organizers did a very good job. This was undoubtedly a unique experience not only for the organizers but also for the speakers and participants during these tough times.

References

1. Wiki Workshop 2020
2. Wiki Workshop 2018
3. Save Pages in the Wayback Machine
4. Internet Archive APIs
5. Wikitrends: Graph Visualization of Wikipedia
6. Ni, Chien-Chun, et al. "Layered Graph Embedding for Entity Recommendation Using Wikipedia in the Yahoo! Knowledge Graph." arXiv:2004.06842 [cs], Apr. 2020. arXiv.org, doi:10.1145/3366424.3383570.
7. Wikigender - Exploring gender linguistic bias in the overview of Wikipedia biographies