A course developed and taught by Indira Sen, Maximilian Kreutner, and Georg Ahnert
This course focuses on the collection and analysis of social media data for two types of societally relevant applications: studying the impact of social media on society, and using social media data to learn about society. The course introduces technical details on the development of social media platforms, programmatic and large-scale collection of platform data, and the analysis of this data using computational methods. Students will be introduced to data collection from a variety of platforms including Wikipedia, YouTube, and TikTok. The course will also help students critically reflect on the epistemology of social media data and the validity of its analysis.
| Week | Lecture | Readings | Tutorial |
|---|---|---|---|
| 1 | Course Intro + Potentials and Pitfalls of Social Media Data | | Infrastructure setup |
| 2 | Data Collection 1: Web Scraping | 1. Lazer, David MJ, et al. "Computational social science: Obstacles and opportunities." Science 369.6507 (2020): 1060-1062. 2. Gerard, Patrick, Nicholas Botzer, and Tim Weninger. "Truth social dataset." Proceedings of the International AAAI Conference on Web and Social Media. Vol. 17. 2023. | Python recap and web scraping |
| 3 | Data Collection 2: APIs | 1. David Garcia, "Background on APIs" 2. Murtfeldt, Ryan, et al. "RIP Twitter API: A eulogy to its vast research contributions." arXiv preprint arXiv:2404.07340 (2024). | Dynamic webpage scraping, API intro |
| 4 | Data Collection 3: Platform Affordances | 1. Hase, Valerie, Karin Boczek, and Michael Scharkow. "Adapting to affordances and audiences? A cross-platform, multi-modal analysis of the platformization of news on Facebook, Instagram, TikTok, and Twitter." Digital Journalism 11.8 (2023): 1499-1520. 2. Guinaudeau, Benjamin, Kevin Munger, and Fabio Votta. "Fifteen seconds of fame: TikTok and the supply side of social video." Computational Communication Research 4.2 (2022): 463-485. | Bluesky API |
| 5 | Data Collection 4: Sampling Social Media Data + Project Information | 1. Bozarth, Lia, and Ceren Budak. "Keyword expansion techniques for mining social movement data on social media." EPJ Data Science 11.1 (2022): 30. 2. Chapter 5.4, "Components of a Research Project," in Principles of Sociological Inquiry: Qualitative and Quantitative Methods, Saylor Academy, 2012. | YouTube, TikTok |
| 6 | Data Processing 1: Data Cleaning | 1. Sen, Indira, "Social Media Data Cleaning" 2. Bakhshi, Saeideh, David A. Shamma, and Eric Gilbert. "Faces engage us: Photos with faces attract more likes and comments on Instagram." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2014. | Data cleaning |
| 7 | Data Processing 2: Data Exploration | 1. Gallagher, Ryan J., Morgan R. Frank, Lewis Mitchell, Aaron J. Schwartz, Andrew J. Reagan, Christopher M. Danforth, and Peter Sheridan Dodds. “Generalized Word Shift Graphs: A Method for Visualizing and Explaining Pairwise Comparisons Between Texts.” EPJ Data Science 10, no. 4 (2021). 2. Nizzoli, Leonardo, et al. "Coordinated behavior on social media in 2019 UK general election." Proceedings of the International AAAI Conference on Web and Social Media. Vol. 15. 2021. | Data visualization and exploratory data analysis |
| 8 | Data Analysis 1: Text Analysis I | 1. Kevin Markham’s Introduction to Linear Regression 2. Mitra, Tanushree, and Eric Gilbert. "The language that gets people to give: Phrases that predict success on Kickstarter." Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing. 2014. | Infrastructure and project consultation |
| 9 | Project Pitches + background | | Stats revision + NLP basics |
| 10 | Data Analysis 2: Text Analysis II | 1. Maria Antoniak’s “Topic Modeling for the People” 2. Dietz, Laura, et al. "Principles and Guidelines for the Use of LLM Judges." Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR). 2025. | NLP intermediate (topic modeling, transformers, LLMs) |
| 11 | Data Analysis 3: Causal Inference | 1. Munger, Kevin. "Tweetment effects on the tweeted: Experimentally reducing racist harassment." Political Behavior 39.3 (2017): 629-649. 2. Gligorić, Kristina, Ashton Anderson, and Robert West. "How constraints affect content: The case of Twitter’s switch from 140 to 280 characters." Proceedings of the International AAAI Conference on Web and Social Media. Vol. 12. No. 1. 2018. | Causal inference |
| 12 | Ethics, Reproducibility, and Documentation | 1. Kramer, Adam DI, Jamie E. Guillory, and Jeffrey T. Hancock. "Experimental evidence of massive-scale emotional contagion through social networks." Proceedings of the National Academy of Sciences 111.24 (2014): 8788-8790. 2. Hudson, James M., and Amy Bruckman. "“Go away”: Participant objections to being studied and the ethics of chatroom research." The Information Society 20.2 (2004): 127-139. | Documentation, project work + consultation |
| 13 | Project Discussion | | Project work + consultation |
| 14 | Summary and Outlook | 1. Lazer, David MJ, et al. "Computational social science: Obstacles and opportunities." Science 369.6507 (2020): 1060-1062. 2. Sen, Indira, et al. "A total error framework for digital traces of human behavior on online platforms." Public Opinion Quarterly 85.S1 (2021): 399-422. | Mid-term Project Presentations |
- A tutorial on collecting and analyzing Wikipedia data by Martin Gerlach, Lucie-Aimée Kaffee, and Tiziano Piccardi
- GESIS guides on digital trace data (which includes social media data)
- A tutorial on the TikTok API by Lion Wedel
Some of the materials in this course are based on a series of Social Media Workshops that Indira conducted with Prof. Katrin Weller. We're grateful to her for working on these materials!