Scientific texts are commonly known to be hard to read. This may be for a number of reasons: they contain a large number of nominalizations, they often use complicated jargon, and they sometimes assume background knowledge on the part of the reader. In order to find ways to make scientific texts clearer, it may be worthwhile to better understand the characteristics of this register. In this thesis, I focus on one aspect of scientific texts: their frequent use of a structure I refer to as complex nominal compounds. Nominal compounds (structures such as "research paper" or "linguistic analysis paper") are composed of a head noun ("paper") and one or more modifiers ("research", "linguistic", "analysis"), which can be either nouns or adjectives. Complex nominal compounds are nominal compounds made up of three or more words. Despite decades of research on nominal compounds (complex and otherwise), little is known about how complex nominal compounds are processed or about how they are used in scientific papers. In this thesis, I take steps towards filling this gap. To do so, I draw on two frameworks of language processing and use, from which I derive predictions for the experiments reported here: the Entropy Rate Constancy (ERC) Principle and the Uniform Information Density (UID) Hypothesis. Of particular relevance, little attention has been given to the UID Hypothesis from the perspective of comprehension. These experiments therefore not only contribute to the understanding of nominal compounds, but also test the validity of these frameworks in general, and of the UID Hypothesis from the perspective of comprehension in particular. On the processing front, the thesis analyzes the L1 (first-language) and L2 (second-language) processing of complex nominal compounds, comparing them with an alternative structure that was predicted to be easier to process: nouns followed by prepositional phrases (e.g., "paper on the analysis of language").
The results do show that compounds are harder to process than the alternative structure, although the effects were not as clear as predicted. On the usage front, the thesis analyzes the distribution of compounds in a corpus of 182 scientific papers from Biology, Linguistics, and Economics. Compounds appear with roughly the same frequency throughout the different regions of the papers (i.e., they do not cluster in specific areas), and they are rarely reused after their first occurrence. They are also typically set up by their context, and this contextual setup does affect the difficulty readers experience when encountering them (at least in the case of unfamiliar compounds). The results corroborate the recommendations of writing guides that compounds should be used sparingly, but they also suggest that familiar compounds need not be avoided as strictly, and that providing contextual support may mitigate some of the difficulties readers experience with these structures. It is my hope that future writing guides will take these findings into consideration. In addition, the results partially support the ERC Principle and the UID Hypothesis, but they were not as clear as predicted, and they raise questions about the validity of the ERC Principle in texts and of the UID Hypothesis for comprehension.
John Cristian Borges Gamboa
DOI: https://doi.org/10.26204/kluedo/13127