Generate A Word List
Another task we may want to perform is creating a list of the words that appear in a text. To list the words in the first chapter of Frankenstein or the first book of Herodotus, use the command frank.words[,5] or hdt.words, respectively. Since many lemmas are repeated in any text, you can reduce either list to its unique lemmas with the unique() function, for example unique(hdt.words).
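As a minimal sketch of what unique() does, the example below uses a small made-up vector, toy.words, standing in for a real lemma vector such as hdt.words. The data and the names toy.words and toy.unique are assumptions for illustration only.

```r
# Hypothetical stand-in for a lemma vector such as hdt.words
toy.words <- c("the", "monster", "see", "the", "creator", "see", "the")

# Every token, repeats included, in order of appearance
toy.words

# Each lemma once, in order of first appearance
toy.unique <- unique(toy.words)
toy.unique

# Number of distinct lemmas in the toy text
length(toy.unique)
```

Because unique() keeps the order of first appearance, the result is not alphabetized; wrapping it in sort() should give an alphabetical list if that is what you need.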
If you want a frequency list for these lemmas, you can build one with the table() function, just as we did when counting the words in each segment of Books 9-12 of the Odyssey. Following the recipe from http://johnvictoranderson.org/?p=115, the command hdt.frq <- table(hdt.words) generates a table showing each word and its frequency. This table can then be sorted with the sort() function: hdt.frq <- sort(hdt.frq, decreasing=TRUE). After this, the command hdt.frq[1:20] shows the 20 most frequent lemmata in the first book of Herodotus.
| ὁ | ἵημι | εἰμί | ὅς | δέω2 |
| 2712 | 1770 | 1561 | 1527 | 1453 |
| δέω | δεῖ | δέομαι | εἰς | εἰ |
| 1451 | 1438 | 1437 | 1330 | 1299 |
| εἰ2 | αἴ | αἴ2 | δέ | καίω |
| 1299 | 1221 | 1221 | 1185 | 1178 |
| ἀκή | ἀκή2 | ἀκή3 | καί | καί2 |
| 1175 | 1175 | 1175 | 1175 | 1175 |
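The same three steps (count, sort, slice) can be tried on a small scale. This is only a sketch on a made-up vector, toy.words, which stands in for hdt.words; the names toy.words and toy.frq are assumptions, not objects from the tutorial.

```r
# Hypothetical stand-in for a lemma vector such as hdt.words
toy.words <- c("the", "monster", "see", "the", "creator", "see", "the")

# Count how often each lemma occurs
toy.frq <- table(toy.words)

# Put the most frequent lemmas first
toy.frq <- sort(toy.frq, decreasing = TRUE)

# Show the two most frequent lemmas with their counts
toy.frq[1:2]

# The lemmas themselves are stored as the names of the table
names(toy.frq)[1:2]
```

Here "the" (3 occurrences) and "see" (2 occurrences) come out on top. With real data you would replace 1:2 with 1:20, as in the Herodotus example.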
 
We can also combine these commands into a single line: frank.frq <- sort(table(frank.words[,5]), decreasing=TRUE) generates a frequency list of the lemmata in the first chapter of Frankenstein, sorted from most to least frequent.
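To see how the combined command behaves, here is a sketch on a tiny made-up matrix. It assumes, as the index [,5] suggests, that frank.words is a matrix or data frame with one row per token and the lemma in the fifth column; toy.tokens and its contents are hypothetical.

```r
# Hypothetical stand-in for frank.words: one row per token,
# with the lemma assumed to sit in column 5
toy.tokens <- matrix(
  c("1", "1", "1", "saw",      "see",
    "1", "1", "2", "the",      "the",
    "1", "1", "3", "monsters", "monster",
    "1", "2", "1", "sees",     "see"),
  ncol = 5, byrow = TRUE
)

# Extract the lemma column, count each lemma, and sort in one step
toy.frq <- sort(table(toy.tokens[, 5]), decreasing = TRUE)
toy.frq
```

The lemma "see" appears twice and so heads the list, even though the surface forms "saw" and "sees" differ. That is the reason for counting the lemma column rather than the raw word forms.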