Spring 2018 Annenberg School for Communication, University of Pennsylvania
Professor: Dr. Matt O'Donnell Email: [email protected]
In this hands-on course students will learn how to manage large textual datasets (e.g. Twitter, YouTube, news stories) to investigate research questions. They will work through a series of steps to collect, organize, analyze and present textual data by using automated tools toward a final project of relevant interest. The course will cover linguistic theory and techniques that can be applied to textual data (particularly from the fields of corpus linguistics and natural language processing).
No prior programming experience is required. Through this course students will gain skills in writing Python programs to handle large amounts of textual data and become familiar with one of the key techniques used by data scientists, a role that is currently among the most in demand.
-
This course will provide an introduction to Python programming for collecting, preparing and analyzing text data from various sources, including social media (e.g. Twitter), weblogs, online news media and various publicly available archives (e.g. presidential speech archives, congressional sessions).
-
Each week, one session will take lecture and seminar form and provide background on the theory and techniques; the second will be a lab session in which students work through programming exercises using Jupyter notebooks (a web-based programming environment well suited to data science and class-based assignments).
-
By completing this course students will:
- gain an understanding of relevant linguistic concepts for the analysis of text
- understand the field of corpus linguistics and how its concepts and tools can be applied to text analysis questions of relevance to Communication
- be exposed to a range of techniques from natural language processing and understand how they can be used to improve content analyses
- gain a basic level of programming proficiency in the Python programming language and have completed a number of programming exercises to build, clean and analyze corpora of text
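To give a flavor of the "build, clean and analyze" exercises mentioned above, here is a minimal sketch in Python using only the standard library. It is illustrative only and is not taken from the course materials: the sample texts, the function names (`clean`, `build_corpus`, `word_frequencies`) and the cleaning rules are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical sample documents standing in for collected text
# (e.g. tweets or news headlines); not real course data.
RAW_DOCS = [
    "Breaking: The Senate votes TODAY! http://example.com/a",
    "The senate vote is delayed, say sources.",
]


def clean(text):
    """Lowercase the text, drop URLs, and return a list of word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def build_corpus(docs):
    """Build a corpus as a list of token lists, one per document."""
    return [clean(doc) for doc in docs]


def word_frequencies(corpus):
    """Count how often each token occurs across the whole corpus."""
    counts = Counter()
    for tokens in corpus:
        counts.update(tokens)
    return counts


corpus = build_corpus(RAW_DOCS)
freqs = word_frequencies(corpus)
print(freqs.most_common(3))
```

In a lab session a sketch like this would typically be run cell by cell in a Jupyter notebook, with the sample list replaced by text collected from a real source.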