Date: July 27, 2017, 9:00 a.m.-4:30 p.m.

Location: Ryerson University Student Learning Centre (SLC), Room 508 – 341 Yonge St, Toronto, ON M5B 1S1

IMPORTANT: SLC is redesigning the main entrance doors! There is construction on the main doors of the building. Members of the Social Media Lab will be at the steps of the SLC to guide you to the building from 8:45-9:15 a.m. If you arrive after this, then please follow the signs.

Following the first successful event on social media data stewardship at the 2016 iConference in Philadelphia, we are delighted to announce the second annual Specialist Meeting on Social Media Data Stewardship at Ryerson University in Toronto on July 27, 2017.

The goal of the event is to bring together experts in social media research and data librarians to discuss methods, ethics, and policies for managing social media data.

This year’s meeting focuses on the following areas:

  1. Ethics of academic use of social media data
  2. Industry social data code of ethics
  3. Preservation of social media data
  4. Social bot detection


9:00-9:15 Registration & Coffee
9:15-9:45 Welcome & Overview of Social Media Data Stewardship


Anatoliy Gruzd, Social Media Lab, Ryerson University

Madeleine Lefebvre, Ryerson University Library

9:45-10:30 Introductions (“1-Minute Madness”)


Participants will introduce their area of research and what they believe is the most pressing issue with regard to social media data stewardship.

10:45-12:15 Invited Presentations


Topic 1: Ethics of Academic Use of Social Media Data

Annette Markham, Aarhus University, Denmark

For many people, whether academics, designers, or policymakers, “ethics” is a term laden with vague philosophical and strict regulatory baggage. Rather than embrace the concept as a guiding force for research or technological design, many see it only as a hoop to jump through. In this talk, Annette Markham presents an impact model of ethics that emphasizes the need to consider the outcomes of actions at various stages, time points, and levels. She explains how this model usefully shifts ethics-related discussions from a priori to ex post facto, which in turn may nurture more future-oriented, socially accountable stances toward tech design, academic research methods, data stewardship practices, and regulatory ambitions.

An impact model of ethics is rooted in a larger consideration of ethics as methods whereby ethics emerge from everyday decisions that turn one’s gaze, design, approach, or project one way versus countless other possible ways. In this way, both ‘ethic’ and ‘method’ are broadened or even deconceptualized to include everyday habitual ways of knowing and making sense of the world around us. This presentation also emphasizes the importance of developing conceptual categories and vocabularies around ethical data stewardship that do not oversimplify what counts as data, social media, and other terms that emerge from everyday decision making by researchers and designers.

In the context of this workshop, this impact model of ethics is offered as a conversation starter for discussing baselines for what might be included in future models of ethical social media data stewardship.

Bio: Annette N. Markham, Ph.D.: Annette is Professor MSO of Information Studies & Digital Design at the Institute for Communication & Culture, Aarhus University, Denmark, and Affiliate Professor of Digital Ethics at the School of Communication, Loyola University, Chicago. Former chair of the Ethics Committee of the international Association of Internet Researchers, Annette has a long familiarity with regulatory models for ethical research practices in the arts, social, and human sciences. Her current research focuses on how qualitative methods can be more aptly tuned toward making social change, which involves studying the future and taking a more proactive role in shaping rather than simply describing or explaining cultural and social phenomena. She co-directs an interdisciplinary master’s program in Digital Living at Aarhus University. Annette received her PhD in Organizational Communication and Interpretive Methodologies at Purdue University in 1998. She is well-recognized for her work in innovative methods and ethics for studying digitally-saturated social contexts. More information can be found at or

Topic 2: Industry Social Data Code of Ethics

Stuart Shulman, Texifter, USA

In 2014, a 501(c)(6) nonprofit organization, The Big Boulder Initiative (BBI), announced a draft “Code of Ethics and Standards for Social Data”, posting it to the blog and thereby consigning it to almost certain obscurity. The document has been dormant ever since, and though @BBI goes on, the initial focus on creating and setting industry standards has given way to more practical matters related to running a yearly Big Boulder conference. This presentation will tell the story of setting up the Big Boulder Initiative and how the Code came about. It will highlight the opportunity to pick up where the founders of the BBI left off in discussing the need for practical shared standards, thereby challenging the specialist meeting to engage with industry leaders in a direct manner.

Bio: Stuart Shulman is an entrepreneur and US Soccer National C licensed Olympic Development Program coach. He is the founder and CEO of Texifter, as well as the inventor of DiscoverText. From 2014 to 2015, he served as a Board Member and Treasurer for the Big Boulder Initiative.

Topic 3: Strategies for Collecting, Processing, Analyzing, and Preserving Tweets from Large Newsworthy Events

Nick Ruest, York University, Canada


#WomensMarch, #Aleppo, #paris, #bataclan, #parisattacks, #porteouverte, #jesuischarlie, #jesuisahmed, #jesuisjuif, #charliehebdo, #panamapapers, and #elxn42 are all different hashtags, but they share several things in common. They all mark large newsworthy events. Each is a dataset containing over a million tweets. Most importantly, these collections offer insights into collecting, processing, analyzing, and preserving tweets from large newsworthy events. Collecting tweets from these events can be challenging because of timing. Tweets can be collected from the Filter API and the Search API, and both have their own caveats. The Filter API only captures the current Twitter stream, and is limited to collecting up to 1% of the overall Twitter stream. The Search API allows you to collect more than 1% of the overall Twitter stream, but one can only collect up to 18,000 tweets every 15 minutes, and it is limited to a 7-day window. Generally, combining the Filter and Search APIs is the best strategy for capturing a given event.
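As a rough illustration of the limits and the combined strategy described above, the following Python sketch computes an upper bound on Search API throughput from the quoted figures and merges a Filter capture with a Search capture. The function names are hypothetical, the sketch does not call the Twitter API, and it assumes each tweet is a dict carrying the `id_str` field found in Twitter's JSON.

```python
from typing import Dict, Iterable, List

# Limits quoted in the abstract (as of 2017).
SEARCH_TWEETS_PER_WINDOW = 18_000
WINDOW_MINUTES = 15


def search_capacity(hours: float) -> int:
    """Upper bound on tweets the Search API can return in `hours` of collecting."""
    windows = int(hours * 60 // WINDOW_MINUTES)
    return windows * SEARCH_TWEETS_PER_WINDOW


def merge_captures(filter_tweets: Iterable[dict],
                   search_tweets: Iterable[dict]) -> List[dict]:
    """Combine a Filter (live stream) capture with a Search (backfill) capture.

    Hypothetical helper: keeps one copy of each tweet, keyed by `id_str`.
    When both captures hold the same tweet, the Search copy is kept.
    """
    merged: Dict[str, dict] = {}
    for tweet in list(search_tweets) + list(filter_tweets):
        merged.setdefault(tweet["id_str"], tweet)
    # Sort by numeric tweet ID so the merged set has a stable order.
    return sorted(merged.values(), key=lambda t: int(t["id_str"]))
```

Under these figures, one hour of continuous searching yields at most 4 × 18,000 = 72,000 tweets, which is why a fast-moving event usually needs the live stream as well as the backfill.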

DocNow’s twarc includes a number of utilities to process a dataset after collection. These tools allow a researcher, librarian, or archivist to filter their dataset(s) down to what is needed for appraisal, and then accession. Noteworthy tools include deduplication, source, retweets, date/times, users, and hashtags. DocNow’s utilities can be further used to curate related collections. One can extract all the URLs of a dataset, unshorten them, and extract the unique URLs to use as a seed list for a web crawler to capture websites related to a given event. One can also extract all of the image URLs, and download all images associated with a dataset, which then can be used for image analysis, presentation, and/or preservation.
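The deduplication and URL-extraction steps described above can be sketched in plain Python. This is a minimal illustration, not twarc's own code: the function names are hypothetical, the input is assumed to be line-oriented JSON (one tweet per line), and the `entities.urls[].expanded_url` field is assumed to follow Twitter's tweet JSON. Unshortening is left out because it requires network requests.

```python
import json
from typing import Iterable, Iterator, List


def read_tweets(lines: Iterable[str]) -> Iterator[dict]:
    """Parse line-oriented JSON (one tweet per line), skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def deduplicate(tweets: Iterable[dict]) -> Iterator[dict]:
    """Yield each tweet once, keyed by `id_str`, in first-seen order."""
    seen = set()
    for tweet in tweets:
        if tweet["id_str"] not in seen:
            seen.add(tweet["id_str"])
            yield tweet


def unique_urls(tweets: Iterable[dict]) -> List[str]:
    """Collect distinct URLs in first-seen order, e.g. as a crawler seed list."""
    urls = {}  # dict keys preserve insertion order
    for tweet in tweets:
        for entry in tweet.get("entities", {}).get("urls", []):
            # Prefer the expanded form; fall back to the shortened link.
            url = entry.get("expanded_url") or entry.get("url")
            if url:
                urls.setdefault(url, None)
    return list(urls)
```

A typical use would be `unique_urls(deduplicate(read_tweets(open(path))))`, where `path` points at a collected dataset.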

Bio: Nick Ruest is the Digital Assets Librarian at York University, and co-Principal Investigator of the SSHRC grant “A Longitudinal Analysis of the Canadian World Wide Web as a Historical Resource, 1996-2014” and of the Compute Canada Research Platforms and Portals Web Archives for Longitudinal Knowledge application. At York University, he oversees the development of data curation, asset management and preservation initiatives, along with creating and implementing systems that support the capture, description, delivery, and preservation of digital objects having significant content of enduring value. He is also active in the Islandora and Fedora communities, serving as Project Director for the Islandora CLAW project and as a member of the Islandora Foundation’s Roadmap Committee and Board of Directors, and he contributes code to the project. In the past he has served as the Release Manager for Islandora and Fedora, the moderator for the OCUL Digital Curation Community, the President of the Ontario Library and Technology Association, and the President of the McMaster University Academic Librarians’ Association.

Topic 4: Challenges and Techniques associated with Social Bot Detection

Dhiraj Murthy, The University of Texas at Austin, USA


Social media data has become increasingly easy to access at a big data scale. Many research projects are oriented around selecting relevant hashtags, geographical areas, or other selection criteria for the collection, yielding data sets so large that humans cannot robustly evaluate their integrity and validity. Some automated methods can be deployed. In the case of Twitter, one can, for example, check for no change in profile picture from the ‘egg’ or for an anomalous tweeting pattern, but these can be relatively ‘fuzzy’ methods. Of particular interest to this talk is the question of the prevalence of social bots affecting collected social media data. As discussions around fake Twitter followers during the 2016 US presidential election became heated, social bots were thought to have profoundly affected information dissemination during the election. This question is particularly timely given former FBI Director James Comey’s recent comments that Russian bots targeted particular demographics.

More than ever, social researchers need to evaluate the roles social bots play in social media data. Specifically, we are generally not doing enough in terms of checking our social media data sets for effects caused by social bots. We tend to research and publish our work with little regard to the effect of social bots on our data sets. When top tweets, the frequency of particular hashtags, or the number of retweets are used as evidence to answer particular research questions, basic metrics such as frequencies are easily skewed by social bots and other non-human actors. This talk does not seek to provide solutions to increase data integrity and validity, but rather serves as an intervention to raise awareness of the issue and to outline some preliminary thoughts about new approaches to detection (computational, qualitative, and mixed) and about the theoretical implications of this, including being more cognizant of the limitations of social media data.
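The two ‘fuzzy’ checks named in the abstract (an unchanged default profile picture and an anomalous tweeting pattern) could be sketched as follows. This is an illustration only, not the speaker's method: the function names and the rate threshold are hypothetical, and the `default_profile_image`, `statuses_count`, and `created_at` fields are assumed to follow Twitter's user JSON. Signals of this kind can flag accounts for closer review, but they do not classify an account as a bot.

```python
from datetime import datetime, timezone
from typing import List

# Hypothetical threshold chosen for illustration only; not from the talk.
MAX_TWEETS_PER_DAY = 144  # about one tweet every ten minutes, around the clock

# Timestamp layout used in Twitter's JSON, e.g. "Sun Jan 01 00:00:00 +0000 2017".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweets_per_day(user: dict, now: datetime) -> float:
    """Average lifetime tweet rate; `now` must be timezone-aware."""
    created = datetime.strptime(user["created_at"], TWITTER_TIME_FORMAT)
    # Treat accounts younger than one day as one day old to avoid huge rates.
    age_days = max((now - created).total_seconds() / 86400, 1.0)
    return user["statuses_count"] / age_days


def bot_signals(user: dict, now: datetime) -> List[str]:
    """Return the fuzzy signals an account triggers (empty list if none)."""
    signals = []
    if user.get("default_profile_image"):
        signals.append("default profile image")
    if tweets_per_day(user, now) > MAX_TWEETS_PER_DAY:
        signals.append("high tweet rate")
    return signals
```

Counting how many accounts in a dataset trigger each signal gives a first sense of how far bots might skew frequency-based metrics.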

Bio: Dhiraj Murthy is an Associate Professor in the School of Journalism and the Department of Sociology (by courtesy) at the University of Texas at Austin. He earned his Ph.D. in Sociology from the University of Cambridge and was previously a Reader (Professor) of Sociology at Goldsmiths, University of London. His research explores social media, virtual organizations, virtual teams, digital research methods, race/ethnicity, and big data quantitative analysis. This work has been situated in a variety of contexts including health, disasters, journalism, and organizations. He has authored over 40 articles, book chapters, and papers and a book about Twitter, the first on the subject (published by Polity Press). His work on social networking technologies in virtual organization breeding grounds was funded by the National Science Foundation’s Office of CyberInfrastructure and resulted in two edited journal issues and the Collaborative Organizations & Social Media conference. Dhiraj’s work also uniquely explores the potential role of social technologies in diversity and community inclusion. Dhiraj founded and directs the Computational Media Lab. He recently chaired the Social Media, Activism, and Organisations (#SMAO15) conference and co-chaired the 2016 International Conference on Social Media & Society.

12:15-1:30 Lunch & Breakout Sessions


Over lunch, participants will join their pre-assigned discussion groups on one of the four topics. Each group will discuss and complete the Information Sheets, which address the What, When, Why, and How of their discussion topic.

  1. Ethics of academic use of social media data (Lead: Annette Markham, Moderator: Jenna Jacobson)
  2. Industry social data code of ethics (Lead: Stuart Shulman, Moderator: Priya Kumar)
  3. Preservation of social media data (Lead: Anatoliy Gruzd, Moderator: Naomi Eichenlaub)
  4. Social bot detection (Lead: Dhiraj Murthy, Moderator: Elizabeth Dubois)

1:30-2:30 Breakout Sessions Continue


Each group will summarize their discussion and prepare a short presentation to share with the entire group.

2:30-3:00 Coffee Break
3:00-4:30 Group Presentations & Reflection


The Information Sheets prepared will be shared on the website as a reference tool for social media researchers. Each Information Sheet will feature a list of contributors.


Organizing Committee:

  • Anatoliy Gruzd, Ryerson University
  • Jenna Jacobson, University of Toronto
  • Philip Mai, Ryerson University
  • Naomi Eichenlaub, Ryerson University
  • Dhiraj Murthy, University of Texas at Austin
  • Elizabeth Dubois, University of Ottawa
  • Priya Kumar, Ryerson University


The event is supported in part by the Canada Research Chair Award (PI: Anatoliy Gruzd).


| First Name | Last Name | Affiliation |
| --- | --- | --- |
| Anabel | Quan-Haase | Western University |
| Anatoliy | Gruzd | Ryerson University Social Media Lab |
| Andrea | Zeffiro | McMaster University |
| Ann | Pegoraro | Laurentian University |
| Annette | Markham | Aarhus University |
| Ava | Lew | University of Toronto |
| Bree | McEwan | DePaul University |
| Brenda | Moon | Queensland University of Technology |
| Dhiraj | Murthy | University of Texas at Austin |
| Donna | Smith | Ryerson University |
| Elizabeth | Dubois | University of Ottawa |
| Emad | Khazraee | Kent State University |
| Hazel | Kwon | Arizona State Univ. |
| Jacquelyn | Burkell | Western University |
| Jaigris | Hodson | Royal Roads University |
| Jeff | Hemsley | Syracuse University |
| Jeffrey | Boase | University of Toronto |
| Jenna | Jacobson | Ryerson University Social Media Lab |
| Jonathan | Obar | York University |
| Lina | Gomez-Vasquez | Universidad del Este |
| Melodie | Song | McMaster University |
| Nadia | Conroy | Ryerson University Social Media Lab |
| Naomi | Eichenlaub | Ryerson University |
| Natalija | Vlajic | York University |
| Nicholas | Worby | University of Toronto Libraries |
| Nick | Ruest | York University |
| Philip | Mai | Ryerson University Social Media Lab |
| Pooria | Madani | York University |
| Priya | Kumar | Ryerson University Social Media Lab |
2017 Specialist Meeting on Social Media Data Stewardship