Using Data Frames

Now that the data has been loaded into the R tables called trag.length and frank.ch1, along with a vector called hdt.bk1, you can begin to manipulate them and perform operations on them. With tables such as our samples, this is done by operating on complete columns or on subsets of the columns in your table. We have already seen how the contents of a table can be inspected by typing the name of the variable, such as trag.length, and pressing return. This causes the entire table to be displayed in the R environment.

Subsets of these tables can be accessed in several ways. In the table, each individual cell can be referenced by its row and column number. If, for example, you type trag.length[1, 1] and press return, the R environment will display the word "Tragedy" because this is the data found in row one column one of the table. Entire rows and columns can be displayed by omitting either the row or column designator within the square brackets. Entering trag.length[1, ], for example, outputs the entire first row of our sample table.

    Genre    Author    Play Year Word.Count
1 Tragedy Euripides Cyclops  438       4104

Similarly, typing the command frank.words[3,] gives us the first word of Chapter 1 in Frankenstein.

  Word Spelling   POS Standard Lemma wn
3    I        I pns11        I     i  0
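
The cell and row lookups described above can be tried without the original data file. The following is a minimal sketch that rebuilds only the five tragedy rows printed in this section; the name trag.sample is a hypothetical stand-in for trag.length, not a variable from the earlier steps.

```r
# Hypothetical stand-in for trag.length, built from the five rows
# printed in this section (the full table has more rows).
trag.sample <- data.frame(
  Genre      = rep("Tragedy", 5),
  Author     = c("Euripides", "Aeschylus", "Aeschylus", "Aeschylus", "Aeschylus"),
  Play       = c("Cyclops", "The Suppliants", "The Seven Against Thebes",
                 "The Persians", "Eumenides"),
  Year       = c(438, 463, 467, 472, 458),
  Word.Count = c(4104, 4939, 5115, 5189, 5297),
  stringsAsFactors = FALSE
)

first.cell <- trag.sample[1, 1]  # row 1, column 1
first.row  <- trag.sample[1, ]   # row 1, every column

print(first.cell)
print(first.row)
```

Leaving the position after the comma empty asks for every column, which is why the second lookup returns a one-row table rather than a single value.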

Entering the command trag.length[,5] outputs the entire fifth column of our table, namely the word count of each tragedy.

 [1]  4104  4939  5115  5189  5297  5426  5447  6240  6603  7077  7177  7279  7363  7398  7597  7672  7902  7914  8032  8157  8187  8396  8702  8830
[25]  9240  9280  9430  9879  9927 10030 10385

When we created our sample table in Excel, however, we also added a header row that labels the information found in each column. These headers can be used to access a column by name. Rather than needing to remember that the word count for each tragedy is stored in column 5 of our table, we can enter trag.length[ ,'Word.Count'] and see the same output.

If you have forgotten the names you gave the columns in your table, you can list them using the command colnames(trag.length).
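
As a quick check that the column number and the column name really select the same data, here is a small sketch. It assumes a hypothetical stand-in table, trag.sample, rebuilt from the five rows printed in this section rather than loaded from the original file.

```r
# Hypothetical stand-in for trag.length (five rows only).
trag.sample <- data.frame(
  Genre      = rep("Tragedy", 5),
  Author     = c("Euripides", "Aeschylus", "Aeschylus", "Aeschylus", "Aeschylus"),
  Play       = c("Cyclops", "The Suppliants", "The Seven Against Thebes",
                 "The Persians", "Eumenides"),
  Year       = c(438, 463, 467, 472, 458),
  Word.Count = c(4104, 4939, 5115, 5189, 5297),
  stringsAsFactors = FALSE
)

by.number <- trag.sample[, 5]             # fifth column, by position
by.name   <- trag.sample[, "Word.Count"]  # same column, by header
same      <- identical(by.number, by.name)
headers   <- colnames(trag.sample)        # all column names, in order

print(same)
print(headers)
```

Selecting by name tends to be the safer habit, since the command keeps working if columns are later reordered in the spreadsheet.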

It is also possible to access subsets of the data contained in a table by ranges of row or column numbers, and also according to search criteria that you specify. If, for example, you would like to inspect the data in the first five rows of your table (perhaps to make sure it loaded correctly), you can enter the command trag.length[1:5, ] and get the output:

    Genre    Author                     Play Year Word.Count
1 Tragedy Euripides                  Cyclops  438       4104
2 Tragedy Aeschylus           The Suppliants  463       4939
3 Tragedy Aeschylus The Seven Against Thebes  467       5115
4 Tragedy Aeschylus             The Persians  472       5189
5 Tragedy Aeschylus                Eumenides  458       5297

Likewise, if you want to look at just the plays and their approximate years of publication, you can do so with the command trag.length[,3:4].
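
Row ranges and column ranges use the same colon notation. The sketch below illustrates both on a hypothetical stand-in table, trag.sample, rebuilt from the five rows printed above; it is not the full trag.length table.

```r
# Hypothetical stand-in for trag.length (five rows only).
trag.sample <- data.frame(
  Genre      = rep("Tragedy", 5),
  Author     = c("Euripides", "Aeschylus", "Aeschylus", "Aeschylus", "Aeschylus"),
  Play       = c("Cyclops", "The Suppliants", "The Seven Against Thebes",
                 "The Persians", "Eumenides"),
  Year       = c(438, 463, 467, 472, 458),
  Word.Count = c(4104, 4939, 5115, 5189, 5297),
  stringsAsFactors = FALSE
)

first.three <- trag.sample[1:3, ]   # rows 1 to 3, every column
plays.years <- trag.sample[, 3:4]   # every row, columns 3 to 4

print(first.three)
print(plays.years)
```

For the common case of peeking at the first few rows, base R's head() function offers a shortcut that does much the same job.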

Finally, you can access data in your table based on search criteria that you specify. For example, you can inspect the data for all the plays with Aeschylus as the author with the command trag.length[trag.length$Author == "Aeschylus", ].
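
It may help to see what the search criterion produces on its own. In the sketch below, which again uses a hypothetical five-row stand-in called trag.sample, the comparison on the Author column yields one TRUE or FALSE per row, and only the TRUE rows are kept.

```r
# Hypothetical stand-in for trag.length (five rows only).
trag.sample <- data.frame(
  Genre      = rep("Tragedy", 5),
  Author     = c("Euripides", "Aeschylus", "Aeschylus", "Aeschylus", "Aeschylus"),
  Play       = c("Cyclops", "The Suppliants", "The Seven Against Thebes",
                 "The Persians", "Eumenides"),
  Year       = c(438, 463, 467, 472, 458),
  Word.Count = c(4104, 4939, 5115, 5189, 5297),
  stringsAsFactors = FALSE
)

# One logical value per row: does the Author column match?
is.aeschylus <- trag.sample$Author == "Aeschylus"

# Keep only the rows where the test is TRUE, with every column.
aeschylus.plays <- trag.sample[is.aeschylus, ]

print(is.aeschylus)
print(aeschylus.plays)
```

The comma after the condition matters: without it, R treats the condition as a column selector instead of a row selector.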

As we will see in the next section, these slices of your data can be either assigned to a variable using the <- operator or used in conjunction with the mathematical functions in R to perform calculations on your data.
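
As a brief preview, a slice can be stored in a variable and handed to a function such as mean(). This sketch uses a hypothetical five-row stand-in, trag.sample, and the variable names are illustrative only, so the result describes those five plays rather than the full table.

```r
# Hypothetical stand-in for trag.length (five rows only).
trag.sample <- data.frame(
  Author     = c("Euripides", "Aeschylus", "Aeschylus", "Aeschylus", "Aeschylus"),
  Play       = c("Cyclops", "The Suppliants", "The Seven Against Thebes",
                 "The Persians", "Eumenides"),
  Word.Count = c(4104, 4939, 5115, 5189, 5297),
  stringsAsFactors = FALSE
)

# Assign a column slice to a variable with the <- operator.
word.counts <- trag.sample[, "Word.Count"]

# Use the stored slice in a calculation.
mean.count <- mean(word.counts)
print(mean.count)  # 4928.8 for these five rows
```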
