Divided By A Common Language Flashcards
CORPUS LINGUISTICS
To analyse language we can use a method called corpus linguistics, which involves analysing a selection of texts. A collection of texts to be analysed is called a corpus.
A corpus can contain billions of words, so computer software is used to count linguistic phenomena, carry out statistical tests, sort the data and present them visually so that humans can interpret them more easily. However, it is only with human input and interpretation that the patterns identified by computers can be explained.
Many of the texts in corpora contain additional levels of information that have been added to them, either by humans, by computer software or by a combination of both. If all of the words in a corpus are assigned codes which indicate this information, we can make more sophisticated calculations on the data.
There are two types of research: corpus-based and corpus-driven.
Corpus-based studies involve forming and testing hypotheses about language. These hypotheses may arise in a number of ways. For example, they may be based on a claim or finding made by someone else.
Corpus-based research can be serendipitous, involving a ‘noticing’ of a particular phenomenon in language as a result of our everyday encounters.
Whatever the origin of the hypothesis, the researcher will know what he or she wants to look for and will usually have a particular question in mind, such as ‘Are nouns more common than verbs in recent American English?’ A potential limitation of this kind of research is that it requires humans to form hypotheses. Unfortunately, such an approach can be problematic, as we are burdened with numerous cognitive biases.
In contrast, computer software, unhampered by such biases, is useful for objectively identifying the main trends and patterns. This kind of approach is termed a corpus-driven analysis: we begin the analysis with no initial hypotheses. Instead, we may ask open questions, such as ‘What characterises the language in this corpus?’
One corpus-driven technique is referred to as keyword analysis. For our purposes, this involves comparing the frequencies of all of the words in two corpora and running statistical tests to identify which words are much more frequent in one of the corpora than in the other.
There are potential issues around corpus-driven approaches.
The first is that they often give too many results. As such approaches consider every word (or linguistic feature) in a corpus, the analysis will present information about each one, running into hundreds or thousands of rows of data. This means imposing cut-offs, which can still leave hundreds of ‘statistically significant’ results.
A second issue with corpus-driven analyses is that they can often tell us what we already know or would expect to find. So we need to concentrate on those results which are less expected. However, even obvious differences can sometimes inspire interesting questions. With spelling, for example, while it is obvious that there are differences between British and American English, what may not be so apparent is whether the differences are being steadily maintained over time or whether one variety is moving closer towards the other.
In this book, I analyse a matched set of eight corpora encompassing texts of written standard published English. The chapters focus on: orthography, affixation/letter sequences, words and word sequences, parts of speech, semantics/culture and identity/discourse markers.
BROWN CORPUS
The Brown Corpus consists of 1 million words of written standard English that was published in 1961.
The 500 text samples were taken from four main categories of writing, which were further split into 15 subcategories or genres, labelled with the letters A–R. The texts were taken from the library at Brown University as well as the Providence Athenaeum and the New York Public Library. The number of texts in each genre is not equal but reflects what the linguists felt would be the most representative coverage of English writing.
In the early 1970s, a second corpus was created. This corpus was created by collaborators at the University of Lancaster, the University of Oslo and the Norwegian Computing Centre for the Humanities at Bergen and so was known as the Lancaster-Oslo/Bergen, or LOB, corpus. Since the publication of these first two corpora, six others have joined them. I collected the texts that made up the British English 2006 Corpus while Amanda Potts led a team to create the American English 2006 Corpus. Due to the wealth of available data now online, texts were sampled from online sources, with the proviso that they needed to have first been published in ‘paper’ format so that comparisons with the earlier forms of published writing in the 1960s and 1990s corpora would be valid.
A possible solution to the fact that different cultures and time periods reflect interests in different genres is to try to use different categories.
Another point worth considering relates to the fact that all the samples are taken from published texts. They represent a somewhat ‘conservative’ form of English. However, a lot of the innovation in English happens in much more informal contexts, especially where young people or people from different backgrounds mix together.
So the Brown family is unlikely to be able to tell us about what is happening at the forefront of linguistic change.
Gathering a collection of 1 million words of language data was impressive in 1961 but by recent standards, the Brown family are now ‘small’ corpora. The British National Corpus, collected in the early 1990s, is 100 times larger than the Brown corpus, and so on. There are clear advantages to having corpora consisting of larger sample sizes; we can be more certain that our findings can be generalised to a population of language users.
For the aims of this book, I argue that corpora consisting of 1 million words are large enough to focus on the phenomena that I am most interested in. The aim is to provide coverage of the most noticeable and oft-encountered differences and changes in English.
COMPARING CORPORA
Mair provides a ‘typology of contrasts’ which is helpful in considering the different ways that two language varieties can be compared.
- Regionally specific change: There is a significant change in one variety but not in the other.
- Convergent change: The comparative frequencies show greater similarity after the change than before.
- Parallel change: There is a significant change in the same direction for both American and British English.
- Different rates of change: Even if the significant changes are in the same direction, the rate of change can be considerably higher in one variety than the other.
- Different start/ending points: Significant differences show up at the starting point and/or the ending point of the period of time under consideration.
- The follow-my-leader pattern is a subtype of parallel change. Both varieties show a move in the same direction as the other, but one variety is already further advanced in that direction in 1931, and appears to be ahead at the other time periods examined.
Keywords (and Key Clusters, Letter Sequences and Tags)
A keyword test uses the frequencies of a word in each corpus as well as taking into account the total number of words in both. Keywords are one of the most popularly used corpus methods. Ideas about the best statistical way of calculating keywords have changed over the years. In this book I employ the more widely known log-likelihood measure as the main way of identifying keywords. It is a hypothesis-testing measure, which tells us the likelihood that a word actually is a keyword. To illustrate how the keywords measure works, let’s look at the case of the word today. Table 1.3 shows the frequency of this word in all eight of the corpora. In order to do a keyword comparison we also need to know the total size of each corpus (1 million words). If we enter all these numbers into a log-likelihood calculator, it produces a log-likelihood score of 353.31. The higher the score, the greater the confidence that a word is a keyword.
So I have used the keywords technique to compare which words are especially frequent in one variety compared to another at different points in time. For each word this was achieved by carrying out four sets of keywords comparisons, first comparing the two 1930s corpora against each other, then the 1960s corpora, then the 1990s corpora, and finally the 2000s corpora. This procedure allows me to focus on long-standing differences between the two varieties. However, the keywords technique alone is not enough, so I have employed a second measure called the Coefficient of Variation.
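The notes do not give the formula behind the log-likelihood calculator, so the following is only a sketch using the formulation commonly applied to corpus frequency comparison: twice the sum of each observed frequency multiplied by the log of observed over expected. The function name and the example counts are hypothetical. Because Table 1.3 is not reproduced here, the sketch does not attempt to recreate the score of 353.31.

```python
import math

def log_likelihood(freqs, sizes):
    """Log-likelihood score for one word across two or more corpora.

    freqs: observed frequency of the word in each corpus
    sizes: total number of words in each corpus
    """
    total_freq = sum(freqs)
    total_size = sum(sizes)
    score = 0.0
    for observed, size in zip(freqs, sizes):
        # Expected frequency if the word were spread evenly by corpus size
        expected = size * total_freq / total_size
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score

# Hypothetical counts of one word in two corpora of 1 million words each
print(round(log_likelihood([100, 50], [1_000_000, 1_000_000]), 2))  # → 16.99
```

A word with identical frequencies in equally sized corpora scores 0, and the score grows as the frequencies diverge.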
The Coefficient of Variation
In order to examine change over time across a single variety, the Coefficient of Variation works well.
The CV is calculated by taking the standard deviation of a set of values and then dividing it by the mean of that set of values. The CV can thus be calculated on all words in the four American (or four British) corpora, and the words with the highest CVs will be those which show the most change over time. While the keywords comparison involved taking one row at a time, the CV involves taking one column at a time. So first we take the four frequencies for British English. The CV is the standard deviation of these four numbers divided by the mean which works out as 59.74 divided by 406.2 giving 0.147. We then multiply this by 100 to get 14.7. Normally, the CV is between 0 and 100 – the higher the number, the greater the difference between the frequencies.
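The calculation above can be sketched in a few lines of Python. The notes do not say whether the sample or the population standard deviation is used, so the function below (a hypothetical helper) offers both, and the four frequencies in the last line are invented for illustration.

```python
import statistics

def coefficient_of_variation(values, sample=True):
    """Standard deviation divided by the mean, multiplied by 100."""
    # Assumption: the sample standard deviation by default; the source
    # does not specify which version was used.
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return 100 * sd / statistics.mean(values)

# The arithmetic quoted in the text: 59.74 divided by 406.2, times 100
print(round(100 * 59.74 / 406.2, 1))  # → 14.7

# Hypothetical frequencies of one word in the four corpora of one variety
print(round(coefficient_of_variation([350, 380, 420, 475]), 1))
```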
Correlation
A keyword comparison takes two corpora at a time, the CV takes four at a time, while the final measure I have used is correlation, which takes all eight corpora into account. A correlation measure takes two lines drawn on a graph and produces a number based on the extent to which the lines are moving in the same direction. A number close to +1 indicates that the lines are parallel to each other, whereas one close to −1 indicates that the lines are moving in opposite directions.
The frequencies in Table 1.5 show that for British English, who has slowly but consistently decreased across the four time periods, whereas for American English, who has increased. When these eight frequencies are entered into two columns in an Excel spreadsheet and the CORREL measure is applied to them, we get a correlation score of −0.87.
Importantly, the correlation statistic does not tell us what direction the lines are moving in. But if we combine the correlation measure with the keywords and CV analyses, we can build a better profile of a feature’s behaviour.
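Excel's CORREL function returns the Pearson correlation coefficient, which can be sketched by hand as follows. The frequencies are hypothetical stand-ins for Table 1.5 (which is not reproduced in these notes), so the result will not match the −0.87 reported above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical frequencies of one word at the four time periods
british = [2600, 2400, 2300, 2100]   # falling over time
american = [2000, 2150, 2300, 2500]  # rising over time
print(round(pearson(british, american), 2))  # negative: opposite directions
```

Swapping the two lists gives the same score, which illustrates why the statistic alone cannot say which variety is rising and which is falling.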
Interpreting and Explaining: Going from the Quantitative to the Qualitative
Simply presenting our statistical findings in tables only counts as an early stage in our analysis. We then need to go on to interpret and explain our results. So we should ask the following questions: what is a particular word, sequence or tag used to achieve in the eight corpora, what contexts does it occur in and is this the same or different when we compare the corpora together?
To answer these questions we need to explore the language in the corpora in more detail. For example, we might want to take into account whether an item is distributed evenly across an entire corpus, or whether it only occurs in a small number of texts or text types. If an item appears in numerous files across a small number of registers, then that is worth taking into account.
A second way that we can consider context is to examine how an item is embedded within the language of a particular text. For example, we could take into account collocates. A word’s collocates help us to understand its meaning. For example, in one corpus the word bank may collocate with river, reeds, water, vole and otter, while in another the same word could collocate with money, lend, city, loan and mortgage. The different sets of collocates help us to understand that the word has a different meaning in the two corpora.
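As a rough illustration of the idea, collocates can be gathered by counting the words that fall within a fixed window either side of the search word. This is only a sketch: the function name, the window size and the sample sentence are invented, and corpus tools usually rank collocates with a statistical measure rather than raw counts.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    """Count the words occurring within `span` tokens of each `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts

# Invented sentence containing both senses of "bank"
text = ("the otter slid down the river bank into the water "
        "while the bank in the city agreed to lend money")
counts = collocates(text.split(), "bank", span=3)
print(counts.most_common(5))
```

In practice very frequent grammatical words such as "the" dominate raw counts, which is one reason statistical ranking is normally preferred.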
Another approach involves reading the texts in detail via a technique called concordancing. A concordance is a table which shows all of the citations of a particular word with several words of context either side. Table 1.7 shows a concordance of the word bank for the 1931 American English corpus.
From reading this concordance table we can start to see the two senses of bank described above. Lines 1, 3 and 7 relate to the ‘river bank’ meaning while 2, 4, 5, 6, 8, 9 and 10 relate to the ‘repository of money’ meaning. We might want to state that this indicates that in 1931 American English, bank was more likely to refer to money than rivers.
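A concordance of this kind can be sketched as a keyword-in-context listing. The helper below and its sample sentence are hypothetical and do not reproduce Table 1.7; they only show how each citation is paired with a few words of context on either side.

```python
def concordance(tokens, node, width=4):
    """Return (left context, word, right context) for each hit of `node`."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Invented sentence containing both senses of "bank"
sample = ("He walked along the bank of the river before going "
          "to the bank to ask for a loan").split()
for left, word, right in concordance(sample, "bank"):
    # Right-align the left context so the search word lines up in a column
    print(f"{left:>30}  {word}  {right}")
```

Reading down the aligned column is what lets an analyst sort the lines into senses, as done for lines 1 to 10 above.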
Collocates and concordances are useful ways of interpreting the quantitative patterns in corpus data, but they may not help to explain why a particular pattern exists. To do this we often need to take into account different types of relevant data such as lifestyle and consideration of political, economic and social movements and the impact of important events.
For example, I found that in all the American corpora, at every time period there are significantly higher references to words relating to law and order than in their respective British corpora. In order to try to explain this finding I carried out some contextual research looking at how the two cultures relate to the concept of law and order.
Changing Word and Sentence Lengths
An initial way of thinking about the eight corpora involves looking at the length and variety of words and sentences. Figures 1.4–1.6 present three calculations for the corpora. First, the mean word length is based on the number of letters in each word (Figure 1.4). Second, the mean sentence length is based on the average number of words per sentence in each corpus (Figure 1.5). Third, the standardised type/token ratio measures lexical diversity (Figure 1.6).
Each corpus is made of types and tokens. A token is simply any word, while types are unique words.
The type/token ratio is simply the number of types in a corpus divided by the number of tokens. A low type/token ratio (close to zero) indicates that a corpus has few types of words in it. On the other hand, a high type/token ratio indicates a much more lexically diverse use of language with many different words in a corpus.
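The two word-level measures can be sketched as follows. The helper names and the sample sentence are invented, and lower-casing before counting types is an assumption rather than something the notes specify.

```python
def type_token_ratio(tokens):
    """Number of unique words (types) divided by number of words (tokens)."""
    # Assumption: "The" and "the" count as the same type
    lowered = [token.lower() for token in tokens]
    return len(set(lowered)) / len(lowered)

def mean_word_length(tokens):
    """Average number of letters per token."""
    return sum(len(token) for token in tokens) / len(tokens)

tokens = "the cat sat on the mat and the dog sat too".split()
print(round(type_token_ratio(tokens), 2))  # 8 types / 11 tokens
print(round(mean_word_length(tokens), 2))  # 32 letters / 11 tokens
```

A raw ratio like this tends to fall as a text gets longer, because common words keep repeating, which is why a standardised version is reported for the corpora.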
Figure 1.4 indicates that generally, mean word length has increased between 1931 and 2006. The lines between 1991/2 and 2006 are more horizontal, which may indicate that the trend is beginning to level off.
The reverse pattern is the case for mean sentence length, with sentences appearing to contain fewer words over time (Figure 1.5).
As for the standardised type/token ratio (Figure 1.6), which measures lexical diversity, neither line is straight, but there does appear to be an increase over time in both varieties.
SPELLING DIFFERENCES
One of the most inescapable differences between the two language varieties is to do with spelling. British English orthography reflects the fact that for much of its long history many of the inhabitants of Britain had either poor or no literacy.
American English, on the other hand, was developed under very different conditions, with a largely literate and more linguistically cohesive population. By the start of the nineteenth century, Americans had confidently established their nation and spelling reform was seen as a way of marking difference from the British, potentially even signalling superiority. The spelling differences are not huge but they are one of the main ways that the two nations demonstrate their identities through language use.
Old English had spellings and words that were derived from competing sources: Latin, which was used by educated people, and Old Norse, whose words were introduced during the Viking invasions. The Norman invasion of 1066 introduced French into the nobility and educated classes, leaving a mark on the whole of English.
Denham and Lobeck note that the Great Vowel Shift (a term coined by Jespersen 1909), which took place between 1350 and 1700, was responsible for many of the peculiarities of British spelling. As Middle English gave way to Modern English, people altered the ways that they pronounced many words, with the result being that spellings became outdated.
By the late fifteenth century, the printing press helped to bring about a higher level of consistency in spelling choices.
For example, sixteenth-century scholars introduced numerous Latin and Greek spellings into English, feeling that this would make their language appear more learned. Notable examples include the introduction of b in debt, the p in receipt, the s in island and the l in salmon.
Perhaps the history of adaptation and change in English explains why it continues to absorb words from many other languages, e.g. tsar, guerrilla, karate, kibbutz, llama, and kitsch.
In contrast, the publication in 1806 of Noah Webster’s first dictionary, A Compendious Dictionary of the English Language, was a key point in attempts to create a standardised and separate American English orthography. Webster disapproved of some of the inconsistencies in English spellings, noting that they made the language unnecessarily difficult to learn. He used a combination of logic and aesthetics to recommend changes. Some of these involved shortening word lengths in American English.
Webster suggested that words should become shorter, e.g. through the removal of some doubled l consonants in British English inflections (e.g. cancelled becomes canceled), while words ending in an unstressed -our in British English should be spelled with –or in American English.
Many of his spellings were widely taken up by Americans, helping to result in a spelling system that feels slightly less cryptic than British English.
Tottie points to American enthusiasm in adopting the new spellings, and notes how American passion for spelling is manifested in organised spelling contests known as spelling bees – the US National Spelling Bee has been running since 1925.
FINDING SPELLING DIFFERENCES
In order to identify a list of potential spelling differences, a triangulatory method was used, first by carrying out keyword comparisons of the paired sets of corpora, which identified some of the most salient sets of spelling differences.
Using this procedure when comparing the two corpora from 1931, the top 100 American keywords included color, center, labor, honor, behaviors, fibers, program, gray, favor, organization, theater, defense, favorite and colored while the top 100 British keywords included colour, labour, centre, organisation, realised, favour, towards, whilst, programme, coloured, organised, behaviour, honour, neighbourhood and favourable.
A second method involved referring to external sources like Howard (1984), Peters and Finegan who have outlined spelling differences between American and British English. This produced a few additional potential differences like analyze/analyse, foetal/fetal and canceled/cancelled. Taken together, this produced a list of potential differences shown in Table 2.1. In order to obtain as full a picture of change as possible, we need to consider as many forms of individual words as we can. To catch as many relevant forms as possible, initial searches were carried out on the corpora using the tool CQPweb. Rather than searching for words, wildcards were used in combination with the spelling differences to produce lists of potential differences.
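The wildcard step can be illustrated with Python's shell-style pattern matching. This is a stand-in only: it does not reproduce CQPweb's query syntax, and the word list and helper name are invented.

```python
from fnmatch import fnmatchcase

def wildcard_matches(vocabulary, pattern):
    """Return the words that match a shell-style wildcard pattern."""
    return sorted(word for word in vocabulary if fnmatchcase(word, pattern))

# Invented vocabulary; a real search would run over the corpus word list
vocabulary = {"colour", "color", "honour", "hour", "labor", "doctor", "favourite"}

print(wildcard_matches(vocabulary, "*our"))  # → ['colour', 'honour', 'hour']
print(wildcard_matches(vocabulary, "*or"))   # → ['color', 'doctor', 'labor']
```

The hits include words such as hour and doctor that have no alternative spelling, so the output is only a list of potential differences that still needs checking by hand. Related forms such as favourite need a wider pattern (for example `*our*`) to be caught.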
Additionally, British and American dictionaries were consulted as a third form of triangulation. Baker discusses individual words for at least three reasons:
(1) if they were particularly frequent and thus contributed disproportionately to the overall frequency of a spelling difference;
(2) if there was uncertainty or disagreement over whether they actually were affected by a spelling difference;
(3) if their frequency pattern was somehow unusual, compared to other words with the same spelling variant.
Examples of spelling differences
-or or –our
Baker begins with a discussion of the –or/-our difference as it is the one most frequently encountered in written texts. British English uses -our in words like colour and behaviour while American English uses the shorter form -or. The British spelling has been attributed to borrowings from French. However, after the seventeenth century, British scholars advocated the use of Latin endings for some words which ended in -our, so words like chancellour and governour became chancellor and governor in British English. However, there wasn’t always agreement about which words were loans from Latin and which came from French. Webster’s American dictionary gives a clear preference for the shorter –or spelling.
Figure 2.1 shows a visual representation of the spelling choices for British and American writers across the four time periods.
The dark grey bars represent usage in American English, while the lighter ones represent usage in British English. The top half of the figure always shows the extent of the preference for what is traditionally deemed the American spelling (in this case -or), while the bottom half shows the preference for the so-called British spelling (-our).
The first (dark-coloured) bar on the top part of the chart reaches the 96.80% point, showing how often American English favoured the –or spelling, while British English favoured the –our spelling 98.50% of the time.
The main pattern shown in Figure 2.1 is of a strong difference between American and British English which appears to be holding across all four time periods.
The three most commonly used words (and their plural forms) which contributed towards the frequencies in Figure 2.1 were labor/labour, color/colour and behavior/behaviour.
However, in 2006, there is a notable shift towards behavior in British English.
Some British texts probably show this shift because of editorial practices in international journals.
International publishing practices may result in more fluid conceptualisations of national varieties, with proof-editing changes imposed on authors at a late stage.
-ize or –ise AND RELATED FORMS
American writers strongly favour the -ize spelling in every time period. However, British writers do not appear to be as consistent in their use of –ise. The British –ise appears to have rallied in popularity by 2006, but even here, 23% of the words affected by this spelling variant occur with –ize.
American English favours realize (and related forms). British English has a weaker preference for realise, which is weakest in the 1961 corpus, but is stronger in 1991 (69%) and gains further ground in 2006 (80%). There is a similar pattern for recognize/recognise and related forms, with American English faithfully sticking to recognize in all four time periods and British English being most strongly in favour of recognise in 2006. There is thus evidence that British English has been moving towards a more confident use of -ise since the 1960s, while American English has always remained firmly in favour of -ize.
How could we explain the comparatively lower uses of –ise in British English, especially in the 1960s and 1990s? One possible reason is Oxford spelling.
Oxford spelling was first adopted in the first edition of the Oxford English Dictionary.
Tieken-Boon van Ostade analysed the use of -ize and -ise in the 100-million-word British National Corpus. She found that while some words (generalise, characterise) were spelled about equally with -ise and -ize, others (criticise, recognise, realise) preferred the -ise spelling. She concludes that ‘there are therefore factors at play today other than a straightforward preference for either the British or the American spelling system’.
Baker calculates the overall trends in spelling and carries out two calculations for British English. The first includes –ize/-ise while the second removes those data.
Why has British English started to favour -ise more often in the later time periods examined? A likely answer is that word processing software makes the choice for us.
What about the related forms -ization/-isation and -yze/-yse? Oxford English spelling rules would apply for the former (globalization) but not the latter (analyse).
The word organization is responsible for about a quarter to a third of cases where British people use -ization, although civilization is also relatively common in British English. When I talked to British friends, some of them expressed surprise that civilisation was a valid spelling, although the Oxford English Dictionary actually offers both spellings.
Almost 90% of cases of -yze/-yse occur with analyze/analyse and its related word forms. Care must be taken with the plural noun form analyses because it is spelled the same in British and American English.
Overall, there is a trend for polarisation, with American English strongly favouring –yze and British English using –yse almost as much.
-er or –re
Next, Baker considers the -er/-re difference, which is commonly found in words like center/centre. The spelling difference tends mainly to affect words which have a b or t before the er/re, such as meter or fibre. Words which contain this spelling variant are frequent.
Figure 2.4 indicates that the UK is somewhat more in favour of the British –re spelling choice than the US is of the American –er choice.
The most frequent forms are metre/meter and sombre/somber. American writers strongly favour center over centre, but this is not the case for theatre.
-og or –ogue
There are also less frequent spelling differences. Words containing the –og/-ogue difference are relatively rare in English.
A problematic case which emerged during the initial searches on ogue was synagogue, which appears similar to words like catalogue and dialogue. However, the ‘American’ spelling, synagog did not appear in any of the corpora, and American English dictionaries indicated that it was expected to be spelt synagogue even in American English.
Figure 2.5 shows that the British –ogue spelling dominates both the American and British corpora. The most popular cases of this spelling difference are the words dialogue/dialog and catalogue/catalog.
Dialogue tends to be common in American English.
Catalog and catalogue are equal variants. The word analog(s) appears in the 1991 and 2006 American corpora but we are dealing with low-frequency data.
-e- or –ae-/-oe-
When compiling a list of terms for this category of spelling, the final result included the words aesthetic, anaemia, archaeology, encyclopaedia, faeces, gynaecologist, haemorrhage, leukaemia, mediaeval, orthopaedic, paedophile and related forms. These are not particularly frequent words.
Figure 2.6 shows that British English mostly retains the –ae- spelling, while American English appears to have switched between 1961 and 1991 to also start favouring the –ae- spelling. Of the 113 uses of -ae- words in the American 2006 corpus, 54 of them involve forms of aesthetic which occur in one file, a journal article published in the Journal of Aesthetics and Art Criticism.
It is also aesthetic which is largely responsible for around 80% of the American 1991 uses of –ae-.
Additional evidence indicates that for the last 200 years aesthetic has always been more popular than esthetic in American English.
However, there is evidence that the preference for aesthetic has grown even stronger in recent decades.
For the related -oe-/-e- variant I examined forms of amoeba, diarrhoea, foetus, homoeopathy, oestrogen and oesophagus.
However, American English showed a preference for the -e- form, while British English generally showed a similar preference for -oe-.
-ce or –se
For some words which can be nouns or verbs, a distinction is made between noun forms which use -ce, and verb forms which have -se. For example, I can give advice or I can advise you. British English traditionally tends to use this rule for more words than American English. However, two word forms are meant to indicate American-British differences. Americans are supposed to use practice as both a noun and a verb, whereas British people use practice as a noun and practise as a verb.
Let us consider the verb cases of practice/practise first. In the corpora, the term is much more likely to be a noun than a verb.
Figure 2.7 shows that in British English, the preference for practise as a verb has declined somewhat from 100% adherence in 1931 to around four in five cases in 2006.
American English shows a growing preference for the American practice as a verb over time. How about the noun forms of licence/license? The preferences are shown in Figure 2.8.
Here, the American preference for the traditionally American spelling (license as a noun) is almost always at 100%, whereas in British English, the preference for the traditionally British licence as a noun is also reasonably high.
Another ce/se distinction is made for the British spellings defence and offence, which are usually spelled with -se in American English.
Figure 2.9 shows that this distinction has held across all four time periods, although a few defences and offences have crept into American English.
The use of defence in American English appears in a text written in 1789, well before the first edition of Webster’s dictionary in 1828.
-l/-m or –ll/-mme
The doubling of consonant forms is another aspect of British English that Webster attempted to excise. Generally speaking, in both varieties, when a suffix which begins with a vowel is added to a word that ends in a vowel then a consonant, then that consonant is doubled. So for example, strip + -ed becomes stripped.
However, British English tends to double the l even when the final syllable is not stressed, and this doubling is used for inflections and noun suffixes.
Figure 2.10 includes the related forms of the words bevel, cancel, channel, chisel, counsel, cruel, dishevel, label, level, marvel, model, panel, tunnel, etc. In 1931, British English is much more confident in its use of -ll, with American English only using the single -l form in two out of three cases. However, as time passes, it appears that Americans become more firmly wedded to the single -l, and the British preference for -ll flounders slightly, so that by 2006, both varieties have an equal 94% preference for the national distinction. Therefore, this seems to be a situation where British English is moving (slightly) towards American English.
A related word discussed in this section is program/programme. It follows a similar rule to l/ll, except that it involves m/mme and really only involves one word.
There is practically 100% adherence to the national variety in 1931, although by 2006 this has decreased on both sides to 90% for British English and 95% for American English. The reason for the change appears to be due to a newer meaning of program to refer to computer software which did not exist in 1931 but is reasonably common in 2006. British English users appear to have decided to use American program to refer to this meaning, but have retained programme for almost all other cases. This shows an interesting case where American technological innovation has influenced British spelling practices.
Dropped e
For some words that end in e, when the suffix –ing or –able is added, American English usually drops the e, although this is not always the case for British English. A few words like dye, singe and swinge always keep the e in these situations. For -able, American English tends to keep the e when the root word has more than one syllable, or when the e is needed to keep a soft s, c, ch or g.
Both American and British English favour keeping the e in 1931. The other three time periods show a preference for dropping the e in the American data, while British English always prefers to keep the e.
However, by 2006 only three in five cases involve British English keeping the e. This appears mainly to be due to the word queuing (as opposed to queueing).
-st
Two relatively frequent word pairs, amongst/among and whilst/while, could also appear to reflect national choices, with the -st forms associated with older British English.
In this case, the spelling difference also reflects one of pronunciation. American English shows a very strong preference for among, whereas British English chooses amongst in about 1 in 5 cases.
How about while/whilst?
Here, we need to take into account that the noun usage of this term is always while.
So noun uses have been excluded from counts. For non-noun uses, there is disagreement over whether the conjunction uses of while and whilst actually have distinct meanings in British English. Carter indicates that they are the same. Peters suggests that the choice ‘is a matter of regional dialect and style’, although online sources claim that some grammarians have suggested that whilst indicates a short period of time (having the same meaning as when), and while indicates a longer period of time.
The pattern for whilst/while is somewhat similar to that of among/amongst, with full adherence in American English to while, and a similar proportion of uses of whilst in British English.
As a final note, a related choice, between amid or amidst, is much less frequent. American English has a few more uses of amidst than perhaps expected.
-ction or –xion
The more commonly spelt connection could be spelt connexion in British English.
Figure 2.15 shows that for all eight corpora, connection is always more popular than the -x spelling. While adherence has been at almost 100% for American English, British English used connexion in almost one in five cases in 1931, but this spelling is almost non-existent by 2006. Two other spellings, reflexion(s) and inflexion(s), are only present in the British corpora.
Two other words, complexion and crucifixion, are meant to retain the –xion form, even in American English.
Toward or Towards
British English tends to favour towards while American English uses the shortened form toward.
This word form occurs around 300-400 times in each corpus.
Gray or Grey
Peters notes that Johnson made grey the standard spelling in British English, although that spelling is rare in American usage. As both Gray and Grey can be proper names, frequencies have only been counted for cases which refer to the colour. Figure 2.17
American English in 1931 is less confident of gray than British English is of grey, although the following two time periods seem to indicate a strengthening towards national varieties, reaching 100% in British English and 95% in American English.
An interesting case which emerged for the American English data was greyhound.
The well-known American intercity bus service which was founded in 1914 is referred to as Greyhound Lines Inc, not Grayhound.
DENSIFICATION
The increase of words containing apostrophes is evident across both varieties in Table 4.3 (it’s, didn’t, don’t) as well as that’s and I’m in AE and BE respectively.
In addition, the word cannot is decreasing in British English, while an alternative expression can’t has almost double the frequency in 2006 compared to 1931.
If we count all cases of words containing apostrophes for negatives, forms of be and have and modal verbs, there are 3289 and 3685 in the British and American 1931 corpora and 7672 and 7943 in the equivalent 2006 corpora. This is a densification trend.
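As a rough illustration of how such a count could be made, the sketch below tallies contracted negatives and contracted forms of be, have and modal verbs with a regular expression. The pattern and function name are assumptions made for this example, not the procedure used in the source. In particular, a plain pattern like this cannot tell a contracted 's (it's) from a possessive 's (John's), which a real study would need to separate, for instance with part-of-speech tags.

```python
import re

# Hypothetical pattern: negatives (n't) plus contracted be, have and modal forms.
# Caveat: 's also matches possessives, so counts on real text would be inflated.
CONTRACTION = re.compile(r"\b\w+(?:n't|'s|'m|'re|'ve|'ll|'d)\b", re.IGNORECASE)


def count_contractions(text):
    """Count contracted forms in a text, treating curly and straight apostrophes alike."""
    normalised = text.replace("\u2019", "'")
    return len(CONTRACTION.findall(normalised))


sample = "It's late and I didn't go; we cannot stay, but they'll try."
print(count_contractions(sample))  # 3 (It's, didn't, they'll; cannot is not contracted)

# Growth between 1931 and 2006, using the totals reported in the text above.
british_growth = 7672 / 3289
american_growth = 7943 / 3685
```

With the reported totals, both ratios come out a little above 2, which is another way of saying that contracted forms have more than doubled in each variety.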
Another declining word which provides further evidence for this trend is upon. It shows a remarkable pattern of decrease.
Upon tends to appear after forms of the verbs call, base, depend, agree, rely, look and impose but also notably occurs in the phrase once upon a time. I would hypothesise that a shorter word, on, has simply replaced upon over time, as part of densification.
Figure 4.3: the network indicates that on has more collocates than upon.
Figure 4.4: on now has more collocates and upon only has one.
Even the verb rest, which was only associated with upon in 1931, is now associated with on.
The figure indicates that as upon has fallen, on has risen.
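To make the idea of comparing the collocates of on and upon more concrete, here is a minimal sketch that counts the words occurring within a small window around a node word. It is a simplification resting on stated assumptions: the function name, window size and sample sentence are invented, and it counts raw co-occurrences only, whereas identifying collocates properly normally also involves a statistical association measure and a frequency threshold.

```python
import re
from collections import Counter


def cooccurrences(text, node, span=3):
    """Count words found within `span` tokens to the left or right of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


# Invented sentence, for illustration only.
sample = "they depend upon him and rely upon her but we depend on them"

print(cooccurrences(sample, "upon", span=1))
print(cooccurrences(sample, "on", span=1))
```

Running the same function over corpora from different periods and comparing the resulting word lists would give a crude picture of a verb such as rest moving from upon to on.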
If current trends continue then upon will become restricted to the idiomatic once upon a time. The opposite pattern to upon is seen with around.
In 1931 round has more collocates than around. By 2006, around has more collocates. The case of around replacing round goes against the trend towards densification.
However, the most commonly used degree adverb that is used to approximate quantities is about. Cases of about as a degree marker have decreased over time, which indicates an additional way that around is fulfilling the function of another word.
DEMOCRATISATION
In Table 4.3 we can note a decline in certain titles – Mr and Mrs in both varieties, as well as Sir in British English. Such trends point to informalisation but also denote democratisation.
The gendered title for married women (Mrs) could be viewed as particularly problematic, as its male equivalent Mr can be used for married or unmarried men and so does not signify that one has been ‘taken’, as Mrs does. Note also the decline of men in American English.
Democratisation can also be evidenced through patterns around modal verbs.
For example, in Table 4.3 British English shows a marked decline in the ‘strong’ modal verbs shall and must.
Should is also constantly decreasing in both varieties.
A complementary trend is the rise of the weaker need in both varieties. Two other words are want and wanted, strongly increasing over time in American English.
Not all modals follow this trend though. May, which is a relatively weak modal, is declining in both varieties.
May is likely to have been supplanted by other forms, such as could. Will is also decreasing in both varieties.
American English appears to be ahead in the decline of will. A final point about modals involves negation. Four modals (can, could, would and will) have a negation pattern that becomes more informal over time.
Cases of mightn’t and mayn’t are completely absent from the 1990s and 2000s corpora, while shan’t and mustn’t are very rare.
In terms of words which are more about content than style, there are rises in several general words relating to people and society (human, social, children, child, family, people), reflecting that in various ways, writers are increasingly referring to these ‘human’ concepts.
Baker noted that British writers seemed to be spending more time considering children over time, particularly in relation to the concept of danger or risk. The word children tends to be mentioned most often in the news sections of the corpora, especially in the British English 2006 corpus, which contains references to children in relation to obesity, paedophiles, poverty, knives, AIDS, adoption, etc. Less obviously, the increase in people is also related to increased focus on children. The top collocate of people in British English 2006 is young, almost always occurring in the phrase young people.
The growth of social over time reflects changes (in both cultures) in terms of provision for disadvantaged or needy people. For example, collocates in the 2006 corpora include housing, security, medicare, justice, movements and worker.
Such collocates are not found in the 1931 corpora, where social instead collocates with words like institutions, political, economic and relations.
There are also some interesting differences between British and American English in terms of the use of social, indicating that the term perhaps has had wider penetration into British English. In British English 2006, social collocates strongly with housing, class and workers, while in the equivalent American corpus, the top collocates are security and medicare.
Health is also rising strongly in British English. The top collocate of health in 2006 British English is mental.
Part of the reason for the increase in this word in American English is the issue of health care, with the top collocates of health in 2006 American English being care, insurance, costs and spending. President Obama signed the Affordable Care Act (commonly known as Obamacare). The Act aimed to increase the affordability and quality of health insurance, but was met with resistance from conservative advocacy groups, which perhaps explains why, in the 2006 American corpus, the word health appears relatively most often in the news registers.
Help appears to have seen the most increase in category H. In American English, help is most common in 1931 in category L (Mystery and Detective fiction). This is the same as in the 2006 corpus. The pattern is rather different for British English, where in 1931 help is associated most commonly with category B (Press Editorial), but in 2006 it is much more common in H (Miscellaneous Non-Fiction).
British English also has more cases of help overall in 2006 compared to American English.
Help appears to be part of a growing discourse, especially prevalent in British English institutional writing to emphasise democratisation, participation and inclusivity.
INFORMALISATION
A related trend to democratisation is informalisation, which is characterised by Goodman as involving changes to terms of address (shortened Christian names), contractions of negatives and auxiliary verbs, uses of active rather than passive verbs, and more use of slang and colloquial terms. It also includes increased use of apostrophes in written language. In the literature on language change, the term colloquialisation is often used to refer to trends that are very similar to informalisation. The two terms largely overlap.
A salient aspect of informalisation involves changes to pronoun usage, particularly increases in first and second person pronouns, which help to cement the appearance of relationship between a text producer and those who receive it.
American English shows a constant increase in the first person pronouns I and my, as well as the second person pronoun your.
It is also worth considering two other first person pronouns, me and my. Me is key in 1961 British English only. Me actually dips over time in British English between 1961 and 2006, whereas American English shows the opposite pattern. My is increasing in both varieties.
In general then, the trend towards greater use of first person pronouns is more pronounced in American English.
Across the corpora the J category (Learned) has the lowest relative frequency of I and my overall, indicating a more formal writing style which aims towards (the appearance of) objectivity. I is indicative of active sentences, and it is notable that two of the declining words in Table 4.1 are verbs which suggest passive sentences: given and taken, whereas a third declining American verb, made, is also a potential indicator of passives.
In the American corpora which show the greatest increases in I then, when comparing 1931 and 2006, we find that the increase in I is not equal across all registers. In fact, I is lower in three registers in 2006 (Mystery and Detective Fiction, Press Editorial and Miscellaneous: Government documents). The largest increases are in three of the General Prose categories (Religion, Skills, Trades and Hobbies and Popular Lore). In Academic English in 2006, I is regularly used when authors try to give an overview of the structure of their paper.
We might wonder then whether the trend towards I and my, along with the decline of taken and given, could be due to grammatical advice given by word processing software. For example, Microsoft Word allows authors to check their spelling, grammar and style, so a range of phenomena could be flagged as potentially problematic if the grammar checking facility is used.
Two increasing words which also reflect informalisation, in different ways, are love and like. Americans seem to have increased the use of love, although the pattern is less clear for the UK. Love tends to be used about twice as often as a noun as a verb, although in American English the verb uses are double in 2006 what they were in 1931. British English does not show the same increase. In the 2006 British corpus people describe loving cakes, Paris, Venice, credit cards, music, etc.
The equivalent American corpus has references to loving fishing, steamrooms, workouts, Park Avenue, classical music. This love of concepts, hobbies or non-human material things is largely absent from the 1931 corpora where love almost always refers to strong feelings between people or involves religious deities. This expansion of love into many other contexts could be viewed as a way that some writers have incorporated hyperbole into their texts. The rise of really in American English could also perhaps be viewed as contributing to this process.
Like, which has also shown a strong trend of growth in American English, can be a verb (I like you), noun (the like of which we won’t see again), preposition (she looked like a very feisty lady), adjective (in a like category) or adverb.
In written English it mainly occurs as a preposition. Its rise in US English is due to an increase in the prepositional use, where it tends to be found after verb forms such as was, look, sound, feel and seem. The analysis points to a rising use of like as a simile in American English.
CONTRARY PATTERNS: WHO AND SAY
Some words show dramatic changes in one direction, while others show contrary change, that is, an increase in one variety but a decrease in the other. To identify these changes, each word received a correlation coefficient between 1 and -1. For example, the word death shows a decrease for British English and a rise for American English. There are also cases of words showing a continuous rise in one variety and a continuous fall in the other, for example say and who, which show rises in American English and falls in British English. The decrease of who in British English might be due to its position in a sentence.
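One way to picture a coefficient between 1 and -1 is as a Pearson correlation between the order of the sampling points and a word's frequency at each point. This is only a sketch under that assumption: the notes do not say which coefficient was used, and the function names and frequencies below are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def is_contrary(r_first, r_second):
    """True when a word rises in one variety and falls in the other."""
    return r_first * r_second < 0


# Four sampling points in chronological order, with invented frequencies.
periods = [1, 2, 3, 4]
american = [100, 120, 150, 170]  # steadily rising
british = [170, 150, 120, 100]   # steadily falling

r_american = pearson(periods, american)
r_british = pearson(periods, british)
print(is_contrary(r_american, r_british))  # True
```

A value close to 1 indicates a steady rise, a value close to -1 a steady fall, and a pair of coefficients with opposite signs marks a contrary pattern of the kind described for say and who.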
Another change involves who followed by a form of the verb to have, which shows a decrease in British English compared to American English. Who followed by a form of the verb to be also shows a decrease in British English compared to American English.
The change in British who appears to be due to its decrease in relative clauses.
On the other hand, the rise of who in American English could be due to decline in whom.
This is illustrated in Figure 4.12
Figure 4.13 shows the frequency of say. In American English the use of say has increased in the construction: noun followed by say.
In British English Say has decreased due to less usage of expressions like that is to say, it is to say, etc.
It is predicted that British English may begin to catch up with the “Expert Say” quotation styles of the American press.
The word says is more frequent in American English, above all in the press genre.
TWO AND THREE-WORD CLUSTERS
Clusters = sequences of words
There are 126 two-word clusters that have a combined frequency of 1000 or more in the 4 American corpora, and 143 such clusters in the British corpora. When we consider clusters of more than two words, their frequency drops off quite dramatically. There is only one three-word cluster that occurs more than 1000 times in both varieties: one of the.
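Counting clusters amounts to sliding a window of fixed length over the running words of a text and tallying each sequence. The sketch below shows this for two- and three-word clusters; the tokenisation rule, function name and sample sentence are assumptions made for the example, and real corpus tools may treat punctuation and sentence boundaries differently.

```python
import re
from collections import Counter


def clusters(text, n):
    """Count every sequence of n consecutive words in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Invented sentence, for illustration only.
sample = "one of the best and one of the worst of the lot"

two_word = clusters(sample, 2)
three_word = clusters(sample, 3)

print(two_word["of the"])        # 3
print(three_word["one of the"])  # 2
```

Even in this tiny example the three-word cluster is less frequent than the two-word cluster it contains, which mirrors the sharp drop in frequency noted above as clusters get longer.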