Date: July 27, 2017, 9:00 a.m.-4:30 p.m.
Location: Ryerson University Student Learning Centre (SLC), Room 508 – 341 Yonge St, Toronto, ON M5B 1S1
IMPORTANT: SLC is redesigning the main entrance doors! There is construction on the main doors of the building. Members of the Social Media Lab will be at the steps of the SLC to guide you to the building from 8:45-9:15 a.m. If you arrive after this, then please follow the signs.
Following the first successful event on social media data stewardship at the 2016 iConference in Philadelphia, we are delighted to announce the second annual Specialist Meeting on Social Media Data Stewardship at Ryerson University in Toronto on July 27, 2017.
The goal of the event is to bring together experts in social media research and data librarians to discuss methods, ethics, and policies specifically related to how researchers and data librarians manage social media data.
This year’s meeting focuses on the following areas:
- Ethics of academic use of social media data
- Industry social data code of ethics
- Preservation of social media data
- Social bot detection
|9:00-9:15||Registration & Coffee|
|9:15-9:45||Welcome & Overview of Social Media Data Stewardship |
Anatoliy Gruzd, Social Media Lab, Ryerson University
Madeleine Lefebvre, Ryerson University Library
|9:45-10:30||Introductions (“1-Minute Madness”) |
Participants will introduce their area of research and what they believe is the most pressing issue with regard to social media data stewardship.
Topic 1: Ethics of Academic Use of Social Media Data
Annette Markham, Aarhus University, Denmark
An impact model of ethics is rooted in a larger consideration of ethics as methods whereby ethics emerge from everyday decisions that turn one’s gaze, design, approach, or project one way versus countless other possible ways. In this way, both ‘ethic’ and ‘method’ are broadened or even deconceptualized to include everyday habitual ways of knowing and making sense of the world around us. This presentation also emphasizes the importance of developing conceptual categories and vocabularies around ethical data stewardship that do not oversimplify what counts as data, social media, and other terms that emerge from everyday decision making by researchers and designers.
In the context of this workshop, this impact model of ethics is offered as a conversation starter for discussing baselines for what might be included in future models of ethical social media data stewardship.
Bio: Annette N. Markham, Ph.D.: Annette is Professor MSO of Information Studies & Digital Design at the Institute for Communication & Culture, Aarhus University, Denmark, and Affiliate Professor of Digital Ethics at the School of Communication, Loyola University, Chicago. Former chair of the Ethics Committee of the international Association of Internet Researchers, Annette has a long familiarity with regulatory models for ethical research practices in the arts, social, and human sciences. Her current research focuses on how qualitative methods can be more aptly tuned toward making social change, which involves studying the future and taking a more proactive role in shaping rather than simply describing or explaining cultural and social phenomena. She co-directs an interdisciplinary master's program in Digital Living at Aarhus University. Annette received her PhD in Organizational Communication and Interpretive Methodologies at Purdue University in 1998. She is well-recognized for her work in innovative methods and ethics for studying digitally-saturated social contexts. More information can be found at http://annettemarkham.com or http://futuremaking.space
Topic 2: Industry Social Data Code of Ethics
Stuart Shulman, Texifter, USA
Bio: Stuart Shulman is an entrepreneur and US Soccer National C licensed Olympic Development Program coach. He is the founder and CEO of Texifter, as well as the inventor of DiscoverText. From 2014-2015, he served as a Board Member and Treasurer for the Big Boulder Initiative.
Topic 3: Strategies for Collecting, Processing, Analyzing, and Preserving Tweets from Large Newsworthy Events
Nick Ruest, York University, Canada
#WomensMarch, #Aleppo, #paris, #bataclan, #parisattacks, #porteouverte, #jesuischarlie, #jesuisahmed, #jesuisjuif, #charliehebdo, #panamapapers, and #elxn42 are all different hashtags, but they have several things in common. They all mark large newsworthy events, and each dataset contains over a million tweets. Most importantly, these collections offer insights into collecting, processing, analyzing, and preserving tweets from large newsworthy events. Collecting tweets from these events can be challenging because of timing. Tweets can be collected from the Filter API and the Search API, each with its own caveats. The Filter API only captures the current Twitter stream, and is limited to collecting up to 1% of the overall Twitter stream. The Search API allows you to collect more than 1% of the overall Twitter stream, but one can only collect up to 18,000 tweets every 15 minutes, and it is limited to a 7-day window. Generally, combining the Filter and Search APIs is the best strategy for capturing a given event.
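The Search API limits quoted above translate into simple arithmetic. The following is a rough sketch in Python, not part of the talk: the function names are hypothetical, and it assumes the 18,000-tweets-per-15-minutes ceiling and the 7-day window exactly as stated, with every rate-limit window fully used.

```python
import math

# Figures taken from the abstract above; real limits may differ.
TWEETS_PER_WINDOW = 18_000
WINDOW_MINUTES = 15
LOOKBACK_HOURS = 7 * 24


def backfill_minutes(tweet_count):
    """Minutes of rate-limit windows needed to page through tweet_count tweets."""
    windows = math.ceil(tweet_count / TWEETS_PER_WINDOW)
    return windows * WINDOW_MINUTES


def search_still_covers(event_age_hours, tweet_count):
    """Rough check (an assumption, not an API guarantee): can the backfill
    finish before the oldest tweets fall out of the 7-day search window?"""
    hours_needed = backfill_minutes(tweet_count) / 60
    return event_age_hours + hours_needed <= LOOKBACK_HOURS


# A one-million-tweet event needs 56 windows of 15 minutes each.
print(backfill_minutes(1_000_000) / 60)  # 14.0 (hours)
```

Under these assumptions, a million-tweet event takes about 14 hours to backfill, which is why starting the search soon after the event matters.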
DocNow’s twarc includes a number of utilities to process a dataset after collection. These tools allow a researcher, librarian, or archivist to filter their dataset(s) down to what is needed for appraisal, and then accession. Noteworthy tools include: deduplication, source, retweets, date/times, users, and hashtags. DocNow’s utilities can be further used to curate related collections. One can extract all the URLs of a dataset, unshorten them, and extract the unique URLs to use as a seed list for a web crawler to capture websites related to a given event. One can also extract all of the image URLs, and download all images associated with a dataset, which then can be used for image analysis, presentation, and/or preservation.
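To make the deduplication and URL-extraction steps concrete, here is a minimal sketch in plain Python. It is not twarc's own code. It assumes line-oriented JSON in the Twitter v1.1 tweet format (`id_str`, `entities.urls[].expanded_url`); the function names and sample data are hypothetical, and the unshortening step is omitted because it needs network access.

```python
import json


def dedupe(lines):
    """Yield each tweet once, keyed on its id_str, from JSON lines."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if tweet["id_str"] in seen:
            continue
        seen.add(tweet["id_str"])
        yield tweet


def unique_urls(tweets):
    """Return the distinct URLs in first-seen order, usable as a crawler seed list."""
    seen, urls = set(), []
    for tweet in tweets:
        for entry in tweet.get("entities", {}).get("urls", []):
            url = entry.get("expanded_url") or entry.get("url")
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Hypothetical sample: the first tweet appears twice in the raw collection.
sample = [
    '{"id_str": "1", "entities": {"urls": [{"expanded_url": "https://example.org/a"}]}}',
    '{"id_str": "1", "entities": {"urls": [{"expanded_url": "https://example.org/a"}]}}',
    '{"id_str": "2", "entities": {"urls": [{"expanded_url": "https://example.org/a"}, {"expanded_url": "https://example.org/b"}]}}',
]
tweets = list(dedupe(sample))
seeds = unique_urls(tweets)
```

Here `tweets` holds two records and `seeds` holds the two distinct URLs.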
Bio: Nick Ruest is the Digital Assets Librarian at York University, co-Principal Investigator of the SSHRC grant “A Longitudinal Analysis of the Canadian World Wide Web as a Historical Resource, 1996-2014”, and co-Principal Investigator of the Compute Canada Research Platforms and Portals Web Archives for Longitudinal Knowledge application. At York University, he oversees the development of data curation, asset management and preservation initiatives, along with creating and implementing systems that support the capture, description, delivery, and preservation of digital objects having significant content of enduring value. He is also active in the Islandora and Fedora communities, serving as Project Director for the Islandora CLAW project and as a member of the Islandora Foundation’s Roadmap Committee and Board of Directors, and contributing code to the project. In the past he has served as the Release Manager for Islandora and Fedora, the moderator for the OCUL Digital Curation Community, the President of the Ontario Library and Technology Association, and President of McMaster University Academic Librarians’ Association.
Topic 4: Challenges and Techniques associated with Social Bot Detection
Dhiraj Murthy, The University of Texas at Austin, USA
Social media data has become increasingly easy to access at a big data scale. Many research projects are oriented around selecting relevant hashtags, geographical areas, or other selection criteria for the collection that yield data sets too large for humans to evaluate robustly in terms of integrity and validity. Some automated methods can be deployed. In the case of Twitter, one can, for example, check for no change in profile picture from the ‘egg’ or an anomalous tweeting pattern, but these can be relatively ‘fuzzy’ methods. Of particular interest to this talk is the question of the prevalence of social bots affecting collected social media data. As discussions around fake Twitter followers during the 2016 US presidential election became heated, social bots were thought to have profoundly affected information dissemination during the election. This question is particularly timely given former FBI Director James Comey’s recent comments that Russian bots targeted particular demographics.
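The two ‘fuzzy’ checks mentioned above can be sketched as follows. This is only an illustration, not the speaker's method: it assumes the `default_profile_image` flag from the Twitter v1.1 user object, treats very regular gaps between tweets as one possible meaning of "anomalous tweeting pattern", and uses an arbitrary threshold. The function names are hypothetical.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between posts (seconds).
    Returns None when there are too few posts to judge."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    average = mean(gaps)
    if average == 0:
        return 0.0
    return pstdev(gaps) / average


def bot_signals(user, timestamps, cv_threshold=0.1):
    """List the heuristic signals an account triggers. The threshold is an
    arbitrary assumption; a signal is a hint, not proof of automation."""
    signals = []
    if user.get("default_profile_image"):
        signals.append("default_profile_image")
    cv = interval_cv(timestamps)
    if cv is not None and cv < cv_threshold:
        signals.append("regular_posting_interval")
    return signals


# Hypothetical accounts: one posts exactly every 60 seconds with the default image.
suspect = bot_signals({"default_profile_image": True}, [0, 60, 120, 180, 240])
ordinary = bot_signals({"default_profile_image": False}, [0, 50, 400, 410, 3000])
```

Both signals fire for the first account and none for the second, but real accounts rarely separate this cleanly, which is the sense in which such checks are ‘fuzzy’.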
More than ever, social researchers need to evaluate the roles social bots play in social media data. Specifically, we are generally not doing enough in terms of checking our social media data sets for effects caused by social bots. We tend to research and publish our work with little regard to the effect of social bots on our data sets. Therefore, when top tweets, the frequency of particular hashtags, or the number of retweets are used as evidence to answer particular research questions, basic metrics such as frequencies are easily skewed by social bots and other non-human actors. This talk does not seek to provide solutions to increase data integrity and validity, but rather serves as an intervention to raise awareness of the issue and offer some preliminary thoughts about how we might begin to develop new approaches towards detection (computational, qualitative, and mixed), as well as the theoretical implications of this, including being more cognizant of the limitations of social media data.
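A small sketch shows how a frequency metric can shift once suspected bot accounts are set aside. The data, field names, and the list of flagged accounts are all hypothetical; how accounts get flagged is exactly the open question the talk raises.

```python
from collections import Counter


def hashtag_counts(tweets, exclude_users=frozenset()):
    """Count hashtags (case-insensitive), skipping tweets from excluded accounts."""
    counts = Counter()
    for tweet in tweets:
        if tweet["user"] in exclude_users:
            continue
        counts.update(tag.lower() for tag in tweet["hashtags"])
    return counts


# Hypothetical data: one automated account repeats the same hashtag five times.
tweets = [{"user": "bot_1", "hashtags": ["TopicA"]} for _ in range(5)]
tweets += [
    {"user": "alice", "hashtags": ["TopicB"]},
    {"user": "bob", "hashtags": ["TopicB"]},
    {"user": "carol", "hashtags": ["TopicA"]},
]

raw_top = hashtag_counts(tweets).most_common(1)
filtered_top = hashtag_counts(tweets, exclude_users={"bot_1"}).most_common(1)
```

In this toy data the top hashtag changes from `topica` to `topicb` once the single flagged account is excluded, so reporting both counts side by side is one way to make the sensitivity visible.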
Bio: Dhiraj Murthy is an Associate Professor in the School of Journalism and the Department of Sociology (by courtesy) at the University of Texas at Austin. He earned his Ph.D. in Sociology from the University of Cambridge and was previously a Reader (Professor) of Sociology at Goldsmiths, University of London. His research explores social media, virtual organizations, virtual teams, digital research methods, race/ethnicity, and big data quantitative analysis. This work has been situated in a variety of contexts including health, disasters, journalism, and organizations. He has authored over 40 articles, book chapters, and papers and a book about Twitter, the first on the subject (published by Polity Press). His work on social networking technologies in virtual organization breeding grounds was funded by the National Science Foundation’s Office of CyberInfrastructure and resulted in two edited journal issues and the Collaborative Organizations & Social Media conference. Dhiraj’s work also uniquely explores the potential role of social technologies in diversity and community inclusion. Dhiraj founded and directs the Computational Media Lab. He recently chaired the Social Media, Activism, and Organisations (#SMAO15) conference and co-chaired the 2016 International Conference on Social Media & Society.
|12:15-1:30||Lunch & Breakout Sessions|
Over lunch, participants will join their pre-assigned discussion groups on one of the four topics. Each group will discuss and complete the Information Sheets that will address the What, When, Why, and How of their discussion topic.
|1:30-2:30||Breakout Sessions Continue |
Each group will summarize their discussion and prepare a short presentation to share with the entire group.
|3:00-4:30||Group Presentations & Reflection|
The Information Sheets prepared will be shared on the https://socialmediadata.org website as a reference tool for social media researchers. Each Information Sheet will feature a list of contributors.
- Anatoliy Gruzd, Ryerson University
- Jenna Jacobson, University of Toronto
- Philip Mai, Ryerson University
- Naomi Eichenlaub, Ryerson University
- Dhiraj Murthy, University of Texas at Austin
- Elizabeth Dubois, University of Ottawa
- Priya Kumar, Ryerson University
The event is supported in part by the Canada Research Chair Award (PI: Anatoliy Gruzd).
|First Name||Last Name||Affiliation|
|Anatoliy||Gruzd||Ryerson University Social Media Lab|
|Ava||Lew||University of Toronto|
|Brenda||Moon||Queensland University of Technology|
|Dhiraj||Murthy||University of Texas at Austin|
|Elizabeth||Dubois||University of Ottawa|
|Emad||Khazraee||Kent State University|
|Hazel||Kwon||Arizona State University|
|Jaigris||Hodson||Royal Roads University|
|Jeffrey||Boase||University of Toronto|
|Jenna||Jacobson||Ryerson University Social Media Lab|
|Lina||Gomez-Vasquez||Universidad del Este|
|Nadia||Conroy||Ryerson University Social Media Lab|
|Nicholas||Worby||University of Toronto Libraries|
|Philip||Mai||Ryerson University Social Media Lab|
|Priya||Kumar||Ryerson University Social Media Lab|