This is a study that demands some attention, in my opinion. Dr. Jockers, a Consulting Assistant Professor in the English Dept. at Stanford, has excellent experience in applying digital tools to the humanities. Daniela M. Witten is a Ph.D. student in statistics at Stanford. Craig S. Criddle, the primary inspiration for the study, is a former Mormon who has long advocated the theory that Sidney Rigdon and Solomon Spalding were the real authors of the Book of Mormon. He has maintained this in spite of extensive problems with the theory that have led some critics of the Book of Mormon to abandon it as hopeless. However, he is openly aware of his bias, and I believe he has sincerely tried to ensure that the number crunching and analysis were done fairly by others. That's healthy.
Mr. Criddle has put the Rigdon/Spalding theory to the test -- at least in his view -- by comparing word usage in Book of Mormon chapters to that of a handful of modern authors to see which modern author's style is closest to the style of each chapter. This is the critical gap that seems to have confused a few people about this study. It is not a statistically valid test measuring the probability that either Solomon Spalding or Sidney Rigdon is the author of a particular chapter, but a test that simply determines which of a tiny handful of authors (Joseph Smith excluded!) comes closest in style to each chapter of the Book of Mormon.
Two somewhat related techniques are used to compare word frequency characteristics: the "delta" method and the "nearest shrunken centroids" method (sometimes affectionately abbreviated as the "shrunken 'roids" model). These methods look at the frequency of simple non-contextual words (words like "of", "the", and "and" whose usage typically doesn't vary strongly as a function of the topic or context of writing) and compare stats from chapters in the Book of Mormon to characteristics obtained for text from each of the candidate authors. At first glance, I was astonished to see individual chapters of the Book of Mormon being assigned authorship with high probabilities. You can't do that with such a small sample -- but it was my mistake, for again, they are simply ranking each of the 7 candidates relative to that passage and seeing who comes closest. You can do that for a single verse or sentence, if you want, and always have a winner -- but not necessarily a meaningful result.
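To make the "always a winner" point concrete, here is a minimal sketch of a delta-style ranking in Python. It follows the general idea of Burrows's Delta (z-score the function-word frequencies, then average the absolute differences), but the word list, the toy texts, and the names `rel_freqs` and `delta_ranking` are my own assumptions for illustration, not anything taken from the study.

```python
from statistics import mean, pstdev

# Hypothetical, tiny function-word list; real studies use far more words.
FUNCTION_WORDS = ["the", "and", "of", "that", "to"]

def rel_freqs(text, words=FUNCTION_WORDS):
    """Relative frequency of each function word in a text."""
    tokens = text.lower().split()
    total = len(tokens) or 1
    return [tokens.count(w) / total for w in words]

def delta_ranking(passage, candidates):
    """Rank candidate authors by a delta-style distance to the passage."""
    profiles = {name: rel_freqs(text) for name, text in candidates.items()}
    columns = list(zip(*profiles.values()))
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 when a word has no spread across candidates.
    sds = [pstdev(col) or 1.0 for col in columns]

    def z(freqs):
        return [(f - m) / s for f, m, s in zip(freqs, mus, sds)]

    zp = z(rel_freqs(passage))
    scores = {
        name: mean([abs(a - b) for a, b in zip(zp, z(profile))])
        for name, profile in profiles.items()
    }
    # Smallest distance first: the "winner" is simply the least-bad fit.
    return sorted(scores.items(), key=lambda kv: kv[1])

# Made-up candidate texts, purely for demonstration.
candidates = {
    "A": "the man and the dog went to the river and the hill",
    "B": "of all that is known of the world that is to come",
    "C": "to be or not to be that is the question of the day",
}

# A passage that resembles none of the candidates still gets a winner.
ranking = delta_ranking("zzz qqq xxx yyy", candidates)
winner, distance = ranking[0]
```

The ranking has no notion of "none of the above": whoever is least distant comes out on top, however large that distance is.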
The candidates in question DO NOT include Joseph Smith. The authors argue that he relied too much on scribes and that we don't have a large amount of text that we know came from him. In my opinion, it seems that they aren't even trying. Why not at least use some of his handwritten letters, and some of the early versions of his revelations in the Doctrine & Covenants? Why not try? I have to wonder if it is because they have already ruled out Joseph based on their preconceived notions, which are key to the Spalding theory.
While wordprint studies have sometimes been likened to fingerprints, with the ability to detect unique aspects of a writer's style, the reality of statistical analysis of human text makes it difficult to have sufficient power to uniquely identify authorship, especially for small chunks of text like an individual chapter in the Book of Mormon. Larger blocks are typically required, and great care must be taken to account for complicating factors and the high variability that occurs in writing. For her master's degree thesis in statistics, my wife, Kendra Lindsay, used computational wordprint tools to assess authorship of several Pauline epistles using Greek texts. It was difficult work that left several areas unresolved, but it raised the possibility that large differences in style among the Pauline works might be due to multiple authorship in some cases, or to other influences.
She has not yet had time to review the new Book of Mormon study, and I'm not sure she is all that interested (she'd want to redo analyses, etc.), so for now you're just getting my superficial non-expert views.
Being the "winner" among a pool of seven is a far cry from being proven to be the actual culprit. In the case at hand, the authors have a pool of 7 where victory can be proclaimed if the results point to either of two authors (well, four if we include Oliver Cowdery and Isaiah/Malachi) gaining the first or second place slot. For the 239 chapters of the Book of Mormon, the NSC ("shrunken 'roids") method assigns 1st place to Cowdery 20 times, Pratt 9 times, Rigdon 93 times, Spalding 52 times, Isaiah/Malachi 63 times, Barlow 0 times, and Longfellow 2 times. The delta method gives Cowdery 5 chapters, Pratt 7, Rigdon 63, Spalding 47, Isaiah/Malachi 112, Barlow 0, and Longfellow 5. In fact, Isaiah and Malachi are assigned to far more chapters than I would expect, but this is downplayed. 21 of 22 chapters taken from Isaiah or Malachi are properly assigned to Isaiah and Malachi, which is nice (Isaiah 53, quoted by Abinadi, was assigned to Longfellow). But I believe the Isaiah/Malachi control text included the chapters that were being compared to it; if so, it's not terribly impressive.
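For anyone who wants to check the tallies, here is a quick Python sketch that totals the first-place counts quoted above and converts them to shares of the 239 chapters. The counts are the ones reported in this post; the function name `shares` is just my own label.

```python
TOTAL_CHAPTERS = 239

# First-place assignments per candidate, as quoted above.
nsc = {"Cowdery": 20, "Pratt": 9, "Rigdon": 93, "Spalding": 52,
       "Isaiah/Malachi": 63, "Barlow": 0, "Longfellow": 2}
delta = {"Cowdery": 5, "Pratt": 7, "Rigdon": 63, "Spalding": 47,
         "Isaiah/Malachi": 112, "Barlow": 0, "Longfellow": 5}

def shares(counts):
    """Percentage of chapters won by each candidate."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

for label, counts in (("NSC", nsc), ("delta", delta)):
    top = max(counts, key=counts.get)
    print(f"{label}: {sum(counts.values())} chapters, most wins: {top} "
          f"({shares(counts)[top]:.1f}%)")
```

Both methods account for all 239 chapters, but they disagree on the top winner: Rigdon under NSC, Isaiah/Malachi under delta.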
What would happen if we applied a similar approach to solving a crime? Someone has been shot. Evidence from witnesses and crime scene data suggests that the killer may have been male, with dark hair, over 5 feet tall, had a scar on his cheek, and drove a red Honda. The police round up 7 people from the area who could be suspects, including two suspects they don't like and think could have done it. The two suspects are males with dark hair who were in the vicinity of the crime. The prosecuting attorney brings in a college professor who did a study comparing the evidence to the seven people, a pool that includes the two suspects plus three short female blondes, one short male blonde, and a short female redhead. None drive a Honda; none have scars on their cheek. The professor explains that careful testing and analysis has confirmed that the two suspects match the evidence far better than any of the other suspects. He goes through the Digital Hair Color Test, complete with detailed image analysis and computational color assessment. The two suspects are pegged with 99% probability. Then there is the Laser Height Test. A laser altimeter is used to determine that the two suspects have been selected as the best fits in terms of height with 99% probability. Then we have the Biomolecular Gender Test. Advanced DNA testing is used, and three of the candidates are selected with over 85% probability (some uncertainty arises from an extra chromosome), and sure enough, both of the main suspects are in this group of three males. Putting everything together, there is an overwhelming probability that the two suspects are the murderers, the jury is told. Should they convict?
In the Criddle study (and it deserves to be called the Criddle study, for he is the driving force for this work, for its assumptions, and for the hypothesis being tested), every chapter is going to have a winner, using this technique. Any randomly selected group of candidates can result in claims that at least one of them is the "guilty party" with this methodology. No matter how dissimilar the styles, how far apart the two texts may be, the method forces each chapter to be assigned to one of the 7 candidates as the best fit. It is entirely possible that Jeff Lindsay could be assigned as the author of numerous chapters of the Book of Mormon if my works were thrown into the mix, using the approach in the Criddle study. My own works could be assigned to Sidney Rigdon as well if they were compared to him and several other more dissimilar authors, with me being left out of the mix. What would that prove?
Surely Mr. Criddle and his friends must understand that this work is guaranteed to pick a winner every time from one of the seven candidates, regardless of how close their style actually is to the Book of Mormon, and that great caution must thus be exercised in drawing conclusions about actual authorship just because a dominant winner is found. Rigdon, Spalding, and Cowdery have styles that are closer to the Book of Mormon than, say, the poetry of Joel Barlow or Longfellow -- but this says little about the true authorship of the Book of Mormon (though perhaps it helps us rule out Barlow and Longfellow).
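Here is a small Python sketch of why a high "probability" within a closed pool says nothing about absolute closeness. It assumes, purely for illustration, that probabilities are obtained by normalizing scores over the candidate pool (a softmax of negative distances); I have not confirmed that this is the exact formula used in the study, and the distances below are invented.

```python
import math

def closed_set_probs(distances):
    """Turn distances into probabilities that sum to 1 over the pool.

    Assumption: weight = exp(-distance), normalized across candidates.
    """
    weights = {name: math.exp(-d) for name, d in distances.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Made-up distances: candidate A is fairly close to the passage.
near = {"A": 1.0, "B": 6.0, "C": 7.0}
# Same pool, but every candidate is 20 units farther from the passage.
far = {name: d + 20.0 for name, d in near.items()}

p_near = closed_set_probs(near)
p_far = closed_set_probs(far)
print(round(p_near["A"], 3), round(p_far["A"], 3))
```

Because only the differences between candidates matter after normalization, candidate A gets the same high probability in both cases, even though in the second case nobody in the pool is anywhere near the passage.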
Unfortunately, awareness of the limitations of this study is not keenly evident in the confident conclusions found in the article. Consider the abstract:
Mormon prophet Joseph Smith (1805--44) claimed that more than two-dozen ancient individuals (Nephi, Mormon, Alma, etc.) living from around 2200 BC to 421 AD authored the Book of Mormon (1830), and that he translated their inscriptions into English. Later researchers who analyzed selections from the Book of Mormon concluded that differences between selections supported Smith's claim of multiple authorship and ancient origins.
We offer a new approach that employs two classification techniques: 'delta' commonly used to determine probable authorship and 'nearest shrunken centroid' (NSC), a more generally applicable classifier. We use both methods to determine, on a chapter-by-chapter basis, the probability that each of seven potential authors wrote or contributed to the Book of Mormon. Five of the seven have known or alleged connections to the Book of Mormon, two do not, and were added as controls based on their thematic, linguistic, and historical similarity to the Book of Mormon.
Our results indicate that likely nineteenth century contributors were Solomon Spalding, a writer of historical fantasies; Sidney Rigdon, an eloquent but perhaps unstable preacher; and Oliver Cowdery, a schoolteacher with editing experience. Our findings support the hypothesis that Rigdon was the main architect of the Book of Mormon and are consistent with historical evidence suggesting that he fabricated the book by adding theology to the unpublished writings of Spalding (then deceased).
Wow. I want to be tactful here, for I appreciate the efforts put forth to understand the Book of Mormon in this study. But if I understand what has been done, this study does not determine the probability that any of the potential candidates had anything to do with the Book of Mormon. It determines the probability that one candidate is closer to some metrics of Book of Mormon style than another candidate from an extremely limited pool that excludes the most likely modern candidate, Joseph Smith (though adding him might not have made any difference). But saying that Sidney Rigdon is closer to the style of, say, 2 Nephi 10, than Orson Pratt or Henry Longfellow tells us nothing about who wrote 2 Nephi 10. Unwittingly, the nature of this study may make it, in retrospect, inherently rigged for Rigdon/Spalding/Cowdery. Maybe Rigdon + Spalding would have been the best fit even if hundreds of other possibilities had been tested, but that remains to be seen (actually, the wordprint work of Hilton et al. has already raised serious and highly credible questions challenging Spalding as a potential author of the Book of Mormon).
One positive aspect of this study: the results are consistent with the concept of multiple authorship. Some chapters are assigned to Spalding, some to Rigdon, and some to Cowdery. It's possible, though, that none of these authors has a style close enough to the Book of Mormon to be a genuine candidate for authorship with more confidence than the hypothesis of multiple ancient authors with different styles, translated by a single modern author in a way that allowed some subtle non-contextual stylistic differences to persist. If we can learn anything from the work, it may be that one source or one author alone may not reasonably account for the differences in style. Of course, that's something some Mormons -- including authors of previous wordprint studies -- have been saying for quite a while.
A couple of minor nitpicks. I opened up the 40+MB text file with data from all the authors and the Book of Mormon. Did a couple of quick checks to see if there were any major problems -- didn't see any. I picked a few chapters of the Book of Mormon and looked to make sure some unique words were represented. Out of about 15 tries, I found only 1 problem. The word "rehearsed" in 2 Nephi 1:1 has been missed. The column for "rehearsed" (and for all related forms of the word) shows 0 counts in 2 Nephi 1. I think it's worthwhile to do some further checking of data integrity.
I would also suggest that the computerized text needs to be cleaned up. Some of the words have been "fouled" by punctuation. For example, "--wherefore" shows up as a distinct word, in addition to "wherefore". Only a couple hundred words in the entries for the Book of Mormon fall into these "fouled" categories in the initial columns of the spreadsheet, but it suggests that the computerized data perhaps weren't scrutinized and corrected. It's possible that they might make a difference somewhere, but I would expect it to be minor.
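As a rough idea of the kind of cleanup I have in mind, here is a minimal Python sketch that separates double dashes and strips leading and trailing punctuation before counting, so that "--wherefore" and "wherefore" land in the same column. The function names and the sample sentence are hypothetical; I don't know how the study's text was actually tokenized.

```python
import string
from collections import Counter

def normalize(token):
    """Lowercase a token and strip punctuation from both ends."""
    return token.strip(string.punctuation).lower()

def count_words(text):
    """Count words after separating double dashes and cleaning tokens."""
    # Treat "--" as a word break so "God--wherefore" becomes two tokens.
    raw_tokens = text.replace("--", " ").split()
    cleaned = (normalize(tok) for tok in raw_tokens)
    return Counter(tok for tok in cleaned if tok)

# Made-up sample containing "fouled" variants of the same word.
sample = 'And he said --wherefore, I go; "Wherefore" they went--wherefore?'
counts = count_words(sample)
print(counts["wherefore"])
```

This only handles punctuation at the edges of tokens; apostrophes and hyphens inside words are left alone, which is probably what you want for words like "brother's".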
This is my first response to this work and will require updating as I get more time. I may be wrong on several counts and will strive to correct any mistakes as I learn more. As always, do your own homework and don't rely on error-prone amateurs like me. Fortunately, errors in these matters aren't unique to Mormon apologists, and sometimes creep into the works of our critics, no matter how sincere and careful they have tried to be in advancing alternate theories for Book of Mormon authorship.
Update: Blair Dee Hodges provides an excellent discussion of the wordprint study on his blog, Life on Gold Plates. I've just added this blog to my blogroll to the right. Blair offers detailed, well researched and carefully written posts on meaningful topics. He has done a tremendous service with his notes from the 2008 FAIR Conference, for example. I was also pleased and enlightened with his treatment of the rumor about Joseph Smith allegedly saying the telestial kingdom was so wonderful that people would be tempted to commit suicide to get there if they saw it. Thanks, Blair!