On the complexity of scientific literature
Just how complex is scientific literature compared with other writing? Quantifying the difficulty of a text is itself difficult, but an article in tomorrow's Nature tackles the question.
Perhaps the best-known measure of reading difficulty is the Flesch Reading Ease scale, because it is built into Microsoft Word. However, it rates a piece mainly on the lengths of its words and sentences. Some writing contains plenty of long words yet is still relatively simple, and that is where the Flesch scale breaks down.
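To make the length-based nature of Flesch concrete, here is a minimal Python sketch of the standard Reading Ease formula, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The syllable counter is a crude vowel-group heuristic chosen for illustration; it is an assumption, not how Microsoft Word counts syllables. Higher scores mean easier text.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula from raw counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): each run of vowels counts as one syllable."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def score_text(text: str) -> float:
    """Score a passage using naive sentence and word splitting."""
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        raise ValueError("text contains no words")
    # Each run of terminal punctuation ends one sentence; assume at least one.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), sentences, syllables)


if __name__ == "__main__":
    print(score_text("The cat sat on the mat."))
```

Only word and sentence lengths enter the score, so a passage of short but obscure words would still be rated easy, which is exactly the weakness described above.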
An alternative is Donald Hayes' LEX scale, which essentially rates a text by how commonly used its words are. The text is checked against a list of the 87,000 most common words in English. For each level of commonality (a position in that ranked list), the proportion of the text's words that are at least that common is plotted against the level. The result is a line that rises as we move towards less common words, and its slope is a basic indicator of the text's complexity: the steeper the slope, the easier the text, because a greater proportion of its words are common.
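The curve described above is a cumulative coverage curve. The Python sketch below computes it from a word-to-rank dictionary. The function name, the toy rank list and the chosen thresholds are all hypothetical, and this illustrates only the idea described here, not Hayes' actual LEX procedure.

```python
def coverage_curve(tokens, rank, thresholds):
    """For each threshold r, return the fraction of tokens whose frequency
    rank is <= r (rank 1 = most common word).

    Words missing from the rank list are treated as rarer than any
    threshold, so they never count towards coverage.
    """
    n = len(tokens)
    ranks = [rank.get(t.lower()) for t in tokens]
    return [sum(1 for x in ranks if x is not None and x <= r) / n
            for r in thresholds]


if __name__ == "__main__":
    # Hypothetical ranks; a real analysis would use a large frequency list.
    toy_rank = {"the": 1, "on": 5, "cat": 500, "sat": 800, "mat": 3000}
    tokens = "the cat sat on the mat".split()
    print(coverage_curve(tokens, toy_rank, [10, 1000, 5000]))
```

A text built from everyday words reaches high coverage at low thresholds, so its curve climbs steeply; jargon-heavy text climbs slowly.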
Taking the language of a newspaper as the baseline, we can make comparisons, and even quantify them by comparing the area under one text's graph with the area under another's. With newspapers set as zero on this scale, fiction for nine-year-olds rates -32 (that is, easier than a newspaper) and a transcript of a farmer talking to his cows rates -59. At the complex end, papers in Science and Nature average 30.
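One way to turn "comparing areas" into a signed score is sketched below in Python, using the trapezoid rule on two coverage curves sampled at the same thresholds. The function names and the `scale` factor are hypothetical; Hayes' real normalisation is not reproduced here, so the numbers it produces are not comparable with the ratings quoted above.

```python
def area_under(xs, ys):
    """Trapezoid-rule area under the points (xs[i], ys[i])."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))


def relative_rating(xs, text_curve, baseline_curve, scale=100.0):
    """Signed difficulty relative to a baseline such as newspaper text.

    A harder text covers fewer words at each threshold, so it has less
    area under its curve and gets a positive rating; an easier text gets
    a negative one. `scale` is an arbitrary, hypothetical normalisation.
    """
    return scale * (area_under(xs, baseline_curve) - area_under(xs, text_curve))


if __name__ == "__main__":
    xs = [0, 1, 2]  # e.g. log10 of rank thresholds (assumption)
    newspaper = [0.5, 0.8, 1.0]
    journal = [0.3, 0.6, 1.0]
    print(relative_rating(xs, journal, newspaper))  # positive: harder
```

By construction the baseline scores exactly zero against itself, mirroring the choice of newspapers as the zero point.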
Most startling of all, in 1900 papers in Science and Nature had a rating of 0, about the same level of complexity as the New York Times.
As science has split into subdisciplines, each inventing its own vocabulary, the proportion of common words in papers has dropped dramatically. Interestingly, the most difficult papers seem to come from subdisciplines of biology, where entirely new sets of jargon have been needed; Cell, for example, rates around 40. Some of the easier-to-read papers appear in physical science journals: Physical Review D, devoted to particle physics and gravitation (among the more esoteric and complex areas of physics), rates 22.
Because scientific writing keeps growing more complex, there are moves afoot to simplify its language. Suggested techniques include more appropriate grammatical structures, better editing and more practice. Some journals, Nature among them, place great emphasis on crafting a strong, simple first paragraph, and the journal's editors tend to have a significant influence on it.
Whether scientific writing can be simplified back to a rating of 0 is unclear. Bringing ratings down close to those of newspapers looks very difficult, and I would guess that even popular science writing is rising in rating as the number of science-literate readers grows and the reading market specialises further.
Still, clear writing is a noble goal and something that more journals should aim for.
Read the technical details of Hayes' LEX analysis at his webpage.
[David Harris: Science news]
12:47:47 AM