2017 Social Media Data Stewardship Specialist Meeting: Recap

The 2017 Specialist Meeting on Social Media Data Stewardship brought together 30 experts in social media research to discuss methods, ethics, and policies specifically related to managing social media data by industry, researchers and data librarians.

The event featured presentations from four invited speakers. Their abstracts and slide decks are listed below.

If you are working on a project or paper, or organizing an event, related to the broad theme of Social Media Data Stewardship, we would like to hear from you! Please post a link to your project, paper, or event in the comments section at the end of this blog post.

1. Ethics of Academic Use of Social Media Data
Annette Markham (Aarhus University, Denmark)

Abstract: For many people, whether academics, designers, or policymakers, “ethics” is a term laden with vague philosophical and strict regulatory baggage. Rather than embrace the concept as a guiding force for research or technological design, many see it only as a hoop to jump through. In this talk, Annette Markham presents an impact model of ethics that emphasizes the need to consider the outcomes of actions at various stages, time points, and levels. She explains how this model usefully shifts ethics-related discussions from a priori to ex post facto, which in turn may nurture more future-oriented, socially accountable stances toward tech design, academic research methods, data stewardship practices, and regulatory ambitions.

An impact model of ethics is rooted in a larger consideration of ethics as methods whereby ethics emerge from everyday decisions that turn one’s gaze, design, approach, or project one way versus countless other possible ways. In this way, both ‘ethic’ and ‘method’ are broadened or even deconceptualized to include everyday habitual ways of knowing and making sense of the world around us. This presentation also emphasizes the importance of developing conceptual categories and vocabularies around ethical data stewardship that do not oversimplify what counts as data, social media, and other terms that emerge from everyday decision making by researchers and designers.

In the context of this workshop, this impact model of ethics is offered as a conversation starter for discussing baselines for what might be included in future models of ethical social media data stewardship.

2. Industry Social Data Code of Ethics
Stuart Shulman (Texifter, USA)

Abstract: In 2014, a 501(c)(6) nonprofit organization, The Big Boulder Initiative (BBI), announced a draft “Code of Ethics and Standards for Social Data,” posting it to the blog and thereby consigning it to almost certain obscurity. The document has been dormant ever since, and though @BBI goes on, the initial focus on creating and setting industry standards has given way to more practical matters related to running a yearly Big Boulder conference. This presentation will tell the story of setting up the Big Boulder Initiative and how the Code came about. It will highlight the opportunity to pick up where the founders of the BBI left off in discussing the need for practical shared standards, thereby challenging the specialist meeting to engage with industry leaders in a direct manner.

3. Strategies for Collecting, Processing, Analyzing, and Preserving Tweets from Large Newsworthy Events
Nick Ruest (York University, Canada)

Abstract: #WomensMarch, #Aleppo, #paris, #bataclan, #parisattacks, #porteouverte, #jesuischarlie, #jesuisahmed, #jesuisjuif, #charliehebdo, #panamapapers, and #elxn42 are all different hashtags, but they have several things in common. Each is tied to a large newsworthy event, and each dataset contains over a million tweets. Most importantly, these collections offer insights into collecting, processing, analyzing, and preserving tweets from large newsworthy events. Collecting tweets from these events can be challenging because of timing. Tweets can be collected from the Filter API and the Search API, each with its own caveats. The Filter API only captures the current Twitter stream and is limited to collecting up to 1% of the overall Twitter stream. The Search API allows you to collect more than 1% of the overall Twitter stream, but it returns at most 18,000 tweets every 15 minutes and is limited to a 7-day window. Generally, combining the Filter and Search APIs is the best strategy for capturing a given event.
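To make the Search API limits quoted in the abstract concrete, the following is a minimal back-of-the-envelope sketch in Python. It only does arithmetic on the two figures given above (18,000 tweets per 15-minute window, 7-day lookback); the function names are hypothetical, and actual limits depend on the API version and access tier.

```python
import math
from datetime import timedelta

# Figures quoted in the abstract; real limits vary by API version and access tier.
TWEETS_PER_WINDOW = 18_000
WINDOW = timedelta(minutes=15)
LOOKBACK = timedelta(days=7)


def search_ceiling(collection_time: timedelta) -> int:
    """Upper bound on tweets retrievable while collecting for `collection_time`."""
    full_windows = collection_time // WINDOW  # timedelta // timedelta yields an int
    return full_windows * TWEETS_PER_WINDOW


def time_to_backfill(tweet_count: int) -> timedelta:
    """Minimum wall-clock time needed to retrieve `tweet_count` tweets."""
    windows_needed = math.ceil(tweet_count / TWEETS_PER_WINDOW)
    return windows_needed * WINDOW


if __name__ == "__main__":
    print(search_ceiling(timedelta(hours=1)))   # tweets per hour at the ceiling
    print(time_to_backfill(1_000_000))          # time to backfill a million tweets
    print(LOOKBACK)                             # tweets older than this are out of reach
```

Under these assumptions, backfilling a million-tweet event takes at least 14 hours of continuous searching, which is why starting the Filter stream early and using Search only to fill in the gap matters.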

DocNow’s twarc includes a number of utilities to process a dataset after collection. These tools allow a researcher, librarian, or archivist to filter their dataset(s) down to what is needed for appraisal, and then accession. Noteworthy tools include: deduplication, source, retweets, date/times, users, and hashtags. DocNow’s utilities can be further used to curate related collections. One can extract all the URLs of a dataset, unshorten them, and extract the unique URLs to use as a seed list for a web crawler to capture websites related to a given event. One can also extract all of the image URLs and download all images associated with a dataset, which can then be used for image analysis, presentation, and/or preservation.
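The deduplication and seed-list steps described above can be illustrated with a short standalone Python sketch. This is not twarc's own code or command-line interface; it assumes line-oriented JSON in the Twitter API v1.1 tweet format (with `id_str` and `entities.urls[].expanded_url` fields), and the function names are hypothetical. Unshortening is left out because it requires network requests.

```python
import json
from typing import Dict, Iterable, Iterator, List


def read_tweets(lines: Iterable[str]) -> Iterator[Dict]:
    """Parse line-oriented JSON, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def deduplicate(tweets: Iterable[Dict]) -> Iterator[Dict]:
    """Yield each tweet once, keyed on its id_str (assumed v1.1 field)."""
    seen = set()
    for tweet in tweets:
        tweet_id = tweet.get("id_str")
        if tweet_id is not None:
            if tweet_id in seen:
                continue
            seen.add(tweet_id)
        yield tweet


def unique_urls(tweets: Iterable[Dict]) -> List[str]:
    """Collect unique expanded URLs in first-seen order, e.g. as a crawler seed list."""
    urls: Dict[str, None] = {}  # dict preserves insertion order
    for tweet in tweets:
        for entry in tweet.get("entities", {}).get("urls", []):
            url = entry.get("expanded_url") or entry.get("url")
            if url:
                urls.setdefault(url, None)
    return list(urls)


if __name__ == "__main__":
    sample = [
        '{"id_str": "1", "entities": {"urls": [{"expanded_url": "https://example.org/a"}]}}',
        '{"id_str": "1", "entities": {"urls": [{"expanded_url": "https://example.org/a"}]}}',
        '{"id_str": "2", "entities": {"urls": [{"expanded_url": "https://example.org/b"}]}}',
    ]
    for url in unique_urls(deduplicate(read_tweets(sample))):
        print(url)
```

In practice the `sample` list would be replaced by an open file handle over the collected dataset, and the printed URLs would be written to a seed file for the crawler.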


4. Challenges and Techniques Associated with Social Bot Detection
Dhiraj Murthy (The University of Texas at Austin, USA)

Abstract: Social media data has become increasingly easy to access at a big data scale. Many research projects are oriented around selecting relevant hashtags, geographical areas, or other selection criteria, and the resulting collections are too large for humans to evaluate robustly in terms of integrity and validity. Some automated methods can be deployed. In the case of Twitter, one can, for example, check whether the profile picture is still the default ‘egg’ or whether the tweeting pattern is anomalous, but these can be relatively ‘fuzzy’ methods. Of particular interest to this talk is the question of the prevalence of social bots affecting collected social media data. As discussions around fake Twitter followers during the 2016 US presidential election became heated, social bots were thought to have profoundly affected information dissemination during the election. This question is particularly timely given former FBI Director James Comey’s recent comments that Russian bots targeted particular demographics.
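As an illustration of how ‘fuzzy’ the two checks mentioned above are, here is a minimal Python sketch. It is not the speaker's method: it assumes Twitter API v1.1 user objects (which carry a `default_profile_image` flag), reduces “anomalous tweeting pattern” to one hypothetical signal (suspiciously regular gaps between posts), and uses an arbitrary threshold chosen only for the example.

```python
from statistics import mean, pstdev
from typing import Dict, Optional, Sequence


def has_default_avatar(user: Dict) -> bool:
    """True if the account still uses the default profile image (assumed v1.1 field)."""
    return bool(user.get("default_profile_image", False))


def interval_regularity(timestamps: Sequence[float]) -> Optional[float]:
    """Coefficient of variation of the gaps between consecutive posts.

    Timestamps are seconds since the epoch. Returns None when there are
    fewer than two gaps or the mean gap is zero.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    average = mean(gaps)
    if average == 0:
        return None
    return pstdev(gaps) / average


def fuzzy_flags(user: Dict, timestamps: Sequence[float],
                cv_threshold: float = 0.1) -> Dict[str, bool]:
    """Return both heuristic flags; the threshold is an illustrative assumption."""
    cv = interval_regularity(timestamps)
    return {
        "default_avatar": has_default_avatar(user),
        "regular_posting": cv is not None and cv < cv_threshold,
    }


if __name__ == "__main__":
    clockwork = [0, 60, 120, 180, 240]  # one post exactly every minute
    print(fuzzy_flags({"default_profile_image": True}, clockwork))
```

Neither flag is evidence on its own: new human users keep the default avatar, and scheduled posts from legitimate accounts look regular, which is the sense in which such checks remain fuzzy.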

More than ever, social researchers need to evaluate the roles social bots play in social media data. Specifically, we are generally not doing enough in terms of checking our social media data sets for effects caused by social bots. We tend to research and publish our work with little regard to the effect of social bots on our data sets. As a result, when top tweets, the frequency of particular hashtags, or the number of retweets are used as evidence to answer particular research questions, basic metrics such as frequencies are easily skewed by social bots and other non-human actors. This talk does not seek to provide solutions to increase data integrity and validity. Rather, it serves as an intervention to raise awareness of the issue and to outline some preliminary thoughts on new approaches to detection (computational, qualitative, and mixed) and on the theoretical implications, including being more cognizant of the limitations of social media data.
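One simple way to see how much a frequency metric depends on suspected bots is to compute it twice, with and without the flagged accounts. The Python sketch below is only an illustration of that sensitivity check, not an approach proposed in the talk; it assumes v1.1-style tweet objects (`user.id_str`, `entities.hashtags[].text`), and how the set of suspected accounts is produced is left entirely open.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple


def hashtag_counts(tweets: Iterable[Dict],
                   exclude_users: FrozenSet[str] = frozenset()) -> Counter:
    """Count hashtags (case-insensitively), skipping tweets from excluded user ids."""
    counts: Counter = Counter()
    for tweet in tweets:
        if tweet.get("user", {}).get("id_str") in exclude_users:
            continue
        for tag in tweet.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts


def skew_report(tweets: List[Dict],
                suspected_bots: FrozenSet[str]) -> Dict[str, Tuple[int, int]]:
    """Map each hashtag to (count over all tweets, count without suspected bots)."""
    everything = hashtag_counts(tweets)
    filtered = hashtag_counts(tweets, suspected_bots)
    return {tag: (total, filtered.get(tag, 0)) for tag, total in everything.items()}


if __name__ == "__main__":
    sample = [
        {"user": {"id_str": "10"}, "entities": {"hashtags": [{"text": "Election"}]}},
        {"user": {"id_str": "99"}, "entities": {"hashtags": [{"text": "election"}]}},
        {"user": {"id_str": "99"}, "entities": {"hashtags": [{"text": "election"}]}},
    ]
    for tag, (total, without_bots) in skew_report(sample, frozenset({"99"})).items():
        print(tag, total, without_bots)
```

A large gap between the two counts for a hashtag suggests that conclusions drawn from its raw frequency deserve a closer look.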

Thank you to our organizers who collaborated to make this event possible:

Anatoliy Gruzd, Ryerson University
Jenna Jacobson, Ryerson University
Philip Mai, Ryerson University
Naomi Eichenlaub, Ryerson University
Dhiraj Murthy, University of Texas at Austin
Elizabeth Dubois, University of Ottawa
Priya Kumar, Ryerson University

Thank you to our participants who contributed to the success of the event:

Name, Affiliation
Anabel Quan-Haase, Western University
Anatoliy Gruzd, Ryerson University Social Media Lab
Andrea Zeffiro, McMaster University
Ann Pegoraro, Laurentian University
Annette Markham, Aarhus University
Ava Lew, University of Toronto
Bree McEwan, DePaul University
Brenda Moon, Queensland University of Technology
Dhiraj Murthy, University of Texas at Austin
Donna Smith, Ryerson University
Elizabeth Dubois, University of Ottawa
Emad Khazraee, Kent State University
Hazel Kwon, Arizona State University
Jacquelyn Burkell, Western University
Jaigris Hodson, Royal Roads University
Jeff Hemsley, Syracuse University
Jeffrey Boase, University of Toronto
Jenna Jacobson, Ryerson University Social Media Lab
Jonathan Obar, York University
Lina Gomez-Vasquez, Universidad del Este
Melodie Song, McMaster University
Nadia Conroy, Ryerson University Social Media Lab
Naomi Eichenlaub, Ryerson University
Natalija Vlajic, York University
Nicholas Worby, University of Toronto Libraries
Nick Ruest, York University
Philip Mai, Ryerson University Social Media Lab
Pooria Madani, York University
Priya Kumar, Ryerson University Social Media Lab
