# Introduction

R is a powerful programming language for statistical analysis and visualization that can be used for many applications in the digital humanities. As with any programming language, getting started with R involves a steep initial learning curve before it produces useful results. In its current form, this blog contains the notes from a hands-on workshop that I initially ran at the University of Kansas's Digital Humanities Forum/THATCamp *Representing Knowledge in the Digital Humanities* in September of 2011 and expanded with a more literary focus at the *University of Kansas 2012 Digital Humanities Forum*. It was further revised for an additional workshop at the University of Iowa Obermann Center for Advanced Studies in the fall of 2014. The purpose of these workshops was to introduce the R environment, describe data structures in R, show ways to format data about literary texts for statistical analysis, and provide practical examples of ways to use R to answer questions about literature.

The examples are based on four different data sets. The first table contains data about ancient Greek tragedy, with columns for the author, the title, and the year in which each drama was written. A second table contains the lemmatized form of every Greek word in books 9–12 of Homer’s *Odyssey*, a third contains the lemmatized form of every English word in the first chapter of Mary Shelley’s *Frankenstein*, and a final table contains the lemmatized form of every Greek word in Herodotus’ *History*. The examples and case studies I hope to explore will be drawn from Anthony Kenny's *The Computation of Style: An Introduction to Statistics for Students of Literature and the Humanities*, J.F. Burrows' *Computation into Criticism: A Study of Jane Austen's Novels*, Douglas Biber's *Corpus Linguistics: Investigating Language Structure and Use*, R.H. Baayen's *Analyzing Linguistic Data: A Practical Introduction to Statistics Using R*, Matt Jockers' *Text Analysis with R for Students of Literature*, and Stefan Gries' *Statistics for Linguistics with R* and *Quantitative Corpus Linguistics with R*.