Tag Archives: statistics

Out For The Count

In order to arrive at a reasonably definitive figure for the population of a country, county or state, governments have long conducted a census at regular intervals, typically every 10 years. When the detailed records are eventually made available to the public, they provide an unrivalled method of gathering information on individual ancestors. They are also an unrivalled way of examining the total numbers and geographical distribution of people sharing the same surname. This article considers what the census can tell us about this geographical distribution.

How Rare is our Surname?

The first question that people ask when told that someone is studying their family history is almost invariably ‘how far back have you gone?’. The second question, when one has explained what a one-name study actually involves, is usually along the lines of ‘how many people do you have in the database?’, closely followed by ‘how many people of your name are alive in the world today?’

The last of these is a question that also exercises those of us who are engaged in the organisation of one-name societies, not least because it helps to define the number of potential members. It also helps us to determine when all the various branches have been identified. This article examines some of the statistics for the Linfield and similar names and shows several methods by which the present, and past, population may be estimated.

Ideally, from a family historian’s point of view at least, one would simply consult the latest census results in order to determine the number of Lin(d)fields in the country today. One would repeat that process for each country to arrive at the total number in the world and to show the distribution between countries. In the real world, however, considerations of privacy and human rights dictate that census information is not released into the public domain until after an agreed interval. In most cases this is defined in such a way as to ensure that all of the data subjects have died. In the United Kingdom, for example, the time is set at 100 years from the end of the census year, so that the 1901 census became available for public inspection on 2nd January 2002. Some countries set a shorter time, typically 70 or 75 years, while some have agreed that the data will be destroyed and never released to the public.

In the absence of a definitive survey such as the census, there are several less accurate methods that may be used to estimate the current population of a given surname:

  • Birth and death registrations
  • Birth rate calculations
  • Death rate calculation
  • Census data projections
  • Electoral rolls and phone books

This article will explore how each of these techniques works and will indicate the numbers for the Linfield name variants that may be estimated in each case.

Birth and death registrations

The requirement to register a birth, marriage or death came into effect on 1 July 1837 in England and Wales and on 1 January 1855 in Scotland. The indexes of registrations are standard sources of information for family historians, and as such are often the first to be transcribed by groups researching a particular surname. In the case of the Linfield variants, we have collected all of the birth, marriage and death registrations from 1837 up to 1994, and these are available for members and others who need the details for their research or in order to purchase a certificate.

During the early years of civil registration many births were not registered, the extent of under-registration having been estimated to be at least 10%. This was particularly true for illegitimate children. From 1875, there was a penalty for failing to register the birth of a child within 42 days. This often led to parents giving a later date than was actually the case in order to avoid having to pay the fine. A birth could not be registered at all after 6 months had elapsed.

On the face of it, we could arrive at a figure for the present Lin(d)field population simply by listing all of the births in the last 120 years and then deleting from that list all those whose deaths had also been registered. The remaining number should be those people born with the name who are still alive today. As with all the best simple plans, however, there are some flaws in this scheme.

The first difficulty is one of definition. In the census, voters' lists and phone books, the count obviously includes women who have taken the name on marriage. Indeed, among these sources only the census records actually show marital status, and this is the one source that is denied to us for current or very recent data. It is therefore impossible to distinguish unmarried from married females in the electoral roll. A male and a female at the same address are probably man and wife, but they could also be mother and son, father and daughter, or even a man and his daughter-in-law. Three females of the same name with a male shown at the same address could be a wife and two daughters, but could also be three unmarried daughters.

It is convenient, therefore, to adopt the convention that we count all those currently bearing the name, regardless of whether they were born with that name. Given the present situation where the average number of children is about 2, it may be argued that since males and females are born in almost equal numbers, the number of wives who have taken the name on marriage will be balanced out by the number of daughters who relinquish the name on marriage. This balance is skewed slightly in one direction by the modern trend for women to retain their maiden surname, and in the other by women who revert to their former names or gain another new name on divorce and remarriage. More seriously, the balance is skewed by daughters who do not marry and retain the name for life.

Further errors arise from migration, adoption, and changes of name for other reasons. It is estimated that between 1815 and 1931, more than 20 million people emigrated from Britain to countries outside Europe. Even allowing for a substantial proportion of Scots and Irish in that number, this probably represents an annual rate of at least 1 in every 400 of the population of England and Wales. If the Lin(d)field families were typical of the general population, this would suggest that as many as 100 members of the Lin(d)field families might have been expected to emigrate during the 100-year period.

These difficulties having been recognised, the calculation is still valid as a crude estimating technique, provided that we accept that the margin of error may be as high as, say, plus or minus 20%.

The total number of births registered in the years 1895-1994 inclusive is 1884. We then subtract the 655 deaths recorded in the same period where the age at death indicates a birth after 1894. This leaves 1229 people who would still have been alive at the end of 1994. This assumes, as noted above, that for each death of a Linfield widow there is a death of a Linfield daughter under her married name.
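The births-minus-deaths arithmetic is simple enough to do by hand. A short Python sketch makes the steps and the ±20% margin explicit. The function names are illustrative only. The inputs are the registration totals quoted above.

```python
# Births-minus-deaths estimate of the surviving Lin(d)field population.
# Inputs are the England and Wales registration totals for 1895-1994 quoted above.

def survivors(births: int, deaths_of_those_born: int) -> int:
    """Births in the window minus registered deaths of people born in that window."""
    return births - deaths_of_those_born


def error_range(estimate: float, minus_pct: float, plus_pct: float) -> tuple[float, float]:
    """Lower and upper bounds for an estimate with (possibly asymmetric) % margins."""
    return estimate * (1 - minus_pct / 100), estimate * (1 + plus_pct / 100)


births_1895_1994 = 1884
deaths_born_after_1894 = 655

estimate = survivors(births_1895_1994, deaths_born_after_1894)
low, high = error_range(estimate, 20, 20)
print(estimate)                      # 1229
print(f"{low:.1f} to {high:.1f}")    # 983.2 to 1474.8
```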

Birth Rate

The number of births registered in each decade gives an indication of the trend in birth rates, particularly if the figures are expressed in proportion to the total population at the time. I have written previously about the trend observed during the first 100 years of registration, using the population figures for the census years, and the number of Lin(d)field births for each decade.1

We can also use the national birth rate to make a crude estimate of the Lin(d)field population, provided that we recognise the possible errors. For example, if the numbers of Lin(d)field births are expressed as a fraction of the total population, the rate can be seen to be fairly consistent up to the turn of the century. Birth rates generally declined from about 1900, and this trend is also evident in the Lin(d)field figures.

For much of the nineteenth century the annual birth rate was about 35 per thousand of the population. The Lin(d)field figure in that period works out to about 9 per decade, or an annual rate of 0.9, per million of total population. From this, it may be estimated that the Lin(d)field population increased from about 400 to 700 during the 19th century.

The national birth rate declined during the 20th century from 28.2 per thousand in 1900 to 13.2 in 1980, with peaks following each of the two world wars. It then rose again slightly. In 1990, the latest year for which we have a reasonable sample of birth registration data, the rate was 13.9 per thousand. The average annual number of Lin(d)field births in the 1980s and early 1990s was 16.4. On this basis we can estimate the Lin(d)field population as about 1180 (16.4 divided by 13.9 per thousand). However, the national rates were lower in 1980 and 1997 (13.2 and 12.3 respectively). It is therefore possible that using the 1990 figure understates the result by between 5 and 12%.
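The birth-rate method amounts to dividing the annual number of Lin(d)field births by the national crude birth rate. The following is a minimal Python sketch. Like the text, it assumes that the families' birth rate matches the national rate. It shows how sensitive the result is to the choice of national figure. The function name is hypothetical.

```python
# Estimate a surname population from its annual number of births and the
# national crude birth rate. Assumes the surname group shares the national rate.

def population_from_rate(annual_events: float, rate_per_thousand: float) -> float:
    """Population implied by `annual_events` at `rate_per_thousand` events per 1,000 people."""
    return annual_events / (rate_per_thousand / 1000)


lin_births_per_year = 16.4  # average annual Lin(d)field births, 1980s and early 1990s

for year, national_rate in [(1990, 13.9), (1980, 13.2), (1997, 12.3)]:
    estimate = population_from_rate(lin_births_per_year, national_rate)
    print(year, round(estimate))
# 1990 1180
# 1980 1242
# 1997 1333
```

The 1980 and 1997 rates raise the estimate to roughly 1240 and 1330. This is the upward uncertainty that the paragraph describes.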

This estimate obviously assumes that the birth rate among the Lin(d)field families was similar to that in the population at large. It also assumes that family sizes and mortality rates were typical of the general population. In the early years of registration, there is no obvious reason why the Linfield birth rate should have differed significantly from the national figure. However, as populations become more diverse in terms of ethnicity and culture, there is a greater possibility of error. We find, though, that in 1991 only 2.91 million, or 5.05% of the population, described themselves as being in one of the non-white categories listed. This was the first census in which respondents were asked to categorise themselves in terms of colour. Previous census returns asked for the place of birth of the head of household, but this data was not very informative, since so many non-white people in Britain were second- or third-generation. Such a small proportion is unlikely to affect the overall national rates very significantly. Any errors are likely to be masked by differences between the Lin(d)field population and the white population as a whole.

Census data

The first available "census", and perhaps the most famous, is of course the Domesday Book of 1086. However, similar surveys were carried out on a more limited basis, such as the Leicestershire Survey (about 1124-1129) and the Lindsey Survey (1124-1128). Various ecclesiastical censuses have also been taken over the 900 years since Domesday.

UK pop (millions)  Year  Entered  Estimated
11.94              1801  0        306
13.36              1811  0        343
15.47              1821  0        397
17.83              1831  0        458
20.18              1841  43       518
22.26              1851  156      571
24.52              1861  53       629
27.43              1871  39       704
31.01              1881  796      796
34.26              1891  35       879
38.24              1901  0        982
42.08              1911  0        1080
44.02              1921  0        1130
46.03              1931  0        1182
50.22              1951  0        1289
52.70              1961  0        1353
53.79              1966  0        1381
55.50              1971  0        1425
56.30              1981  0        1445
57.99              1991  0        1489

The modern census series started in 1801. (An earlier attempt in 1753 failed in its goal of measuring population growth using a questionnaire to parish clergy.) Since 1801 there has been a census every 10 years, except for 1941, which was omitted on account of the war. Unfortunately, this gap in the records was compounded when the 1931 returns were destroyed by fire during the Second World War. Two additional enumerations, however, owed their existence to war. These were the 1915 (Aliens Act) census and the National Registration of 29 September 1939. Both were prompted by the need to determine how many aliens lived in the UK and where.

As noted already, full access is available to all census data from 1801 to 1901. However, prior to 1841 the census contained little useful data for most family historians, and not many of the earlier returns survive in any case.

Access to processed data from the more recent census surveys is also available, but this simply takes the form of statistical material such as the numbers of males in a district, social trends, and so on. It does, however, give us figures for the total population. These can be used to estimate the numbers of Linfield variants if we are prepared to make certain assumptions.

Until recently, one of the difficulties was that no complete record of any of the census years was available in a form that could easily be searched for a particular surname. Some indexes had been produced, such as the excellent series published by June Barnes for parts of Sussex. Even so, the only way of finding the total number of people bearing any surname was to check every volume of the census. This task would probably take someone several months of full-time working in record offices, and was clearly beyond the resources of a group such as ours.

How then can we work out how many people of our surnames were alive in a particular year? The answer came with the publication by the Church of Jesus Christ of Latter-day Saints (the Mormons) of an index to the 1881 census. This became available on microfiche a few years ago and is now available on CD-ROM, which allows searches to be made quickly by computer. It includes an alphabetical index by surname. For the first time, this allowed us to search quickly for all Linfield surname variants and to establish the total numbers enumerated in that year. The total number of Linfield, Lindfield, Linville, Linkfield and Lingfield entries in England and Wales in 1881 was 787. (There were a further 9 in Scotland, but these were fairly recent migrants from Sussex. For the purposes of the estimation that follows, these may be ignored.)

In order to estimate the numbers in any other year, we can use the total population figures for England, Wales and Scotland. A reasonable estimate can then be obtained by assuming that our names represent a constant fraction of that total.

The table above sets out the figures. The first column shows the total population (in millions) of England, Wales and Scotland for each of the census years starting with the first census in 1801.

The fourth column shows the numbers of Lin(d)fields, estimated in proportion to those totals from the figure for 1881. It will be seen that for 1991, the last year for which census population figures are available, this process gives an estimated 1489 individuals using the Linfield variant surnames. The population of the UK has been relatively static over the last 10 years, with a significant part of any increase due to immigration from countries where the surname is not found. The figure for 2001 is therefore likely to be substantially unchanged.

The third column shows the number of individuals entered into the spreadsheet so far. Thus, for 1851, we have entered 156 of the estimated 571 Lin(d)fields in that census.
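The proportional scaling behind the table can be sketched in a few lines of Python. Like the text, it assumes that the Linfield variants form a constant fraction of the total population. It uses the 1881 figures from the table as the fixed point. The function name is illustrative only.

```python
# Scale the 1881 Lin(d)field count in proportion to the total population.
# Assumes the surname group is a constant fraction of the population.

COUNT_1881 = 796     # Lin(d)field variants in the 1881 index (table figure)
POP_1881 = 31.01     # total population in 1881, millions (table figure)


def estimate_count(pop_millions: float) -> int:
    """Estimated Lin(d)fields for a year with `pop_millions` total population."""
    return round(COUNT_1881 * pop_millions / POP_1881)


for year, pop in [(1801, 11.94), (1851, 22.26), (1901, 38.24), (1991, 57.99)]:
    print(year, estimate_count(pop))
# 1801 306
# 1851 571
# 1901 982
# 1991 1489
```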

Prior to 1841, census enumerators were not required to list the names of individuals in each household, although many did actually do so. We cannot therefore expect to confirm the estimated totals from the records, however much time we spend searching. It is interesting, though, to apply the same process of estimation to work backwards from 1801, using such estimates as are available for the total population.

E&W pop (millions)  UK pop (millions)  Year  Linfield est. E&W  Linfield est. UK
2.00                -                  1100  51                 -
4.16                4.47               1570  107                115
4.81                5.17               1600  123                133
5.60                6.02               1630  144                155
5.77                6.20               1670  148                159
6.04                6.50               1700  155                167
6.51                7.00               1750  167                180
Some UK population figures were calculated from the 1700 ratio of UK to England & Wales population.
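The same scaling can be run backwards. This sketch is a minimal illustration rather than the author's spreadsheet. It first derives a UK population from the England and Wales figure using the 1700 UK/E&W ratio. It then applies the 1881 share of about 25.7 Lin(d)fields per million. The constant and function names are hypothetical.

```python
# Work backwards from 1801 using earlier population estimates.
PER_MILLION_1881 = 796 / 31.01     # Lin(d)fields per million people in 1881
UK_PER_EW_1700 = 6.5 / 6.04        # UK population / England & Wales population, 1700


def linfield_estimate(pop_millions: float) -> int:
    """Apply the 1881 share to a total population given in millions."""
    return round(pop_millions * PER_MILLION_1881)


for year, ew_pop in [(1570, 4.16), (1600, 4.81), (1630, 5.6), (1670, 5.77)]:
    uk_pop = ew_pop * UK_PER_EW_1700   # UK figure derived from the 1700 ratio
    print(year, linfield_estimate(ew_pop), linfield_estimate(uk_pop))
# 1570 107 115
# 1600 123 133
# 1630 144 155
# 1670 148 159
```

For these years the sketch reproduces the estimates in the table.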

Death Rate

A similar technique may be applied to death rates. Between 1984 and 1994, the average annual number of deaths registered for the Linfield variant names was 14.9. The national death rate in 1990 was 11.1 per thousand, suggesting a Lin(d)field population of around 1343. Again, this assumes that our surname population was statistically representative of the whole.
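The death-rate estimate uses the same division as the birth-rate one, with the average annual number of Lin(d)field deaths in place of births. This is a minimal sketch. The -5%/+10% margins are the ones given for this method in the summary table, and the function name is hypothetical.

```python
# Death-rate estimate: same division as for births, using deaths instead.

def population_from_rate(annual_events: float, rate_per_thousand: float) -> float:
    """Population implied by `annual_events` at `rate_per_thousand` per 1,000 people."""
    return annual_events / (rate_per_thousand / 1000)


lin_deaths_per_year = 14.9   # average annual Lin(d)field deaths, 1984-1994
national_rate_1990 = 11.1    # deaths per 1,000 population, 1990

estimate = population_from_rate(lin_deaths_per_year, national_rate_1990)
low, high = estimate * 0.95, estimate * 1.10   # -5% / +10% margins
print(round(estimate), round(low), round(high))
```

The result, a little over 1340, agrees with the figure quoted to within rounding.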

Electoral roll and phone books

The history of democracy in the United Kingdom has resulted in an electoral system that many regard as outdated and in need of radical reform. On the credit side, however, the evolution of our voting system has led us to expect the list of voters to be available for public scrutiny. The practice of vote-rigging in the early 18th century, of which the attorney John Linfield was famously found guilty, might perhaps have played a part in shaping our traditional insistence on openness in the conduct of elections. It would be nice to think that a Linfield had done his bit for genealogy, albeit unwittingly and for all the wrong reasons!

Residents of other countries are not so fortunate. Canada, for example, has very strong privacy laws at both the provincial and federal levels. Canadians do not therefore enjoy the same unfettered access to voting registers. The same culture of privacy applies to births, marriages and deaths, so the techniques I have discussed for using the birth and death registrations cannot be applied in Canada either. These registrations are a provincial responsibility, and there is no single central register for the whole of Canada. Each province has its own electoral register for elections to the provincial legislatures, and a national register is maintained by Elections Canada for elections at the federal level. British Columbia has a permanent electoral register, whereas some provincial administrations still carry out an enumeration before each election.

Whatever the reasons and history, it is a great convenience to family historians that the Register of Electors may be found in every public library and we should consider ourselves fortunate that it is so accessible. It has been possible for several years to obtain the data on CD-ROM, together with software allowing it to be searched very easily. The product that I used is marketed as UK InfoDisk, and is available in several versions, with a range of search and data handling facilities. The data also includes phone book entries.

The latest disc owned by the Group is UK Info Disk 2000, which contains the 1999 Register of Electors. This has a total of 881 individuals with one of the Linfield variant names, in 535 households. Since only those aged 18 or over are included, we must then add the births registered after 1981 to arrive at the year 2000 estimate. (A quick check of the birth registrations for 1981/2 confirms that the 1981 children are listed on the 1999 roll while those born in 1982 are not.)

Starting at the beginning of 1982, there are 115 birth registrations up to the end of 1994, or 8.8 per year. This figure excludes any births in Scotland, but these are so few as to be negligible. We have not yet collected the birth registrations for the 5 years up to the start of 2000, so we add a further 44 to allow for them. This brings the figure to 159 children below voting age. Our records show that 2 of these died before 2000, leaving 157. Adding these to the 881 adults on the register gives a final estimate of 1038.

If we summarise the various estimates, and postulate the error margins for each method, the results are as follows:

Method                          Error margin   Weight  Estimated population in year 2000
Birth and death registrations   -20% / +20%    3       1229 (984 – 1474)
Birth rate calculation          -5% / +12%     4       1180 (1121 – 1321) (on 1990 data)
Death rate calculation          -5% / +10%     4       1343 (1275 – 1477)
Census extrapolation            -40% / +20%    2       1489 (893 – 1786)
Electoral roll and phone books  -5% / +5%      10      1038 (986 – 1090)
Weighted average estimate                              1180

The estimates of error margin are my own, and I would not claim that they are arrived at very scientifically. I have attempted to suggest a figure that includes both the systematic errors in each estimating method and the errors in the base data on which each calculation is made. The weightings reflect the confidence that might be expected of the figure actually being in the middle of the error range. Here again, the estimates are entirely my own. (As a matter of interest, a simple average of the 5 estimates produces a figure of 1256, so it can be seen that the relative weightings do not make a major difference.)
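The electoral-roll figure and the combined estimate can be recomputed from their stated components. Like the text, this Python sketch assumes 13 years of registrations (1982 to 1994 inclusive) and rounds the five-year allowance to a whole number. The variable names and dictionary layout are purely illustrative.

```python
# Electoral-roll estimate for 2000, then the weighted and simple averages.

voters_1999 = 881                 # adults on the 1999 Register of Electors
births_1982_1994 = 115            # registrations, 1982 to 1994 inclusive
per_year = births_1982_1994 / 13  # 13 registration years, about 8.8 a year
allowance_1995_1999 = round(per_year * 5)   # years not yet transcribed
children = births_1982_1994 + allowance_1995_1999
child_deaths = 2
electoral = voters_1999 + children - child_deaths

# (estimate, weight) for each method, as in the summary table
estimates = {
    "birth and death registrations": (1229, 3),
    "birth rate": (1180, 4),
    "death rate": (1343, 4),
    "census extrapolation": (1489, 2),
    "electoral roll": (electoral, 10),
}
total_weight = sum(w for _, w in estimates.values())
weighted = sum(e * w for e, w in estimates.values()) / total_weight
simple = sum(e for e, _ in estimates.values()) / len(estimates)
print(electoral, round(weighted), round(simple))   # 1038 1180 1256
```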

However inaccurate the various methods may be, it is clear that the true figure is around 1000. It is highly unlikely that it exceeds 2000 and fairly unlikely to be greater than 1500.

As regards the original question of just how rare the surname is, several ideas have been suggested for classifying names according to their rarity. One method that has gained considerable acceptance is based on the 1881 population. This was suggested by Geoff Riggs in 1997, and the classifications he proposed were as follows:2

High frequency 0.1% or greater (30,001+ people)
Medium frequency 0.01 to 0.099% (3,000 to 30,000)
Low frequency 0.001 to 0.0099% (300 to 3000)
Rare 0.0001 to 0.0009% (1 to 300)

On this basis, the Linfield variants, with an 1881 total of 796, or 0.00257% of the population, would be classed as a low frequency surname.
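Riggs' bands translate directly into a lookup on the 1881 percentage share. This sketch treats the band edges as contiguous, with lower bounds of 0.1%, 0.01% and 0.001%. It also takes the 1881 total population of 31.01 million from the census table above. Both are assumptions made for illustration, and the function name is hypothetical.

```python
# Riggs' (1997) rarity bands, keyed on the percentage share of the 1881 population.
BANDS = [            # (lower bound in %, label), checked from the most common down
    (0.1, "High frequency"),
    (0.01, "Medium frequency"),
    (0.001, "Low frequency"),
]


def classify(count_1881: int, population_1881: float) -> tuple[float, str]:
    """Return the percentage share and the rarity band for a surname count."""
    share_pct = 100 * count_1881 / population_1881
    for lower_bound, label in BANDS:
        if share_pct >= lower_bound:
            return share_pct, label
    return share_pct, "Rare"


share, band = classify(796, 31.01e6)
print(f"{share:.5f}% {band}")   # 0.00257% Low frequency
```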

Trevor Ogden wrote an interesting article about this in the Guild Journal a couple of years ago.3 More recently, he has commented that there are a great many very uncommon surnames. He quotes data from the UK Info disc, which was also the source of the Electoral Roll data quoted above. According to these figures, about 42% of names occur once, 16% occur twice and 7% occur three times, with ever-decreasing proportions after that. Phone books are not such a good sample of the population, but they show a similar pattern.

Trevor goes on to point out that if just one male holds a name, probability theory and computer modelling show that there is about an 89% chance (i.e. a probability of 0.89) that the name will become extinct. This figure depends on the assumptions made about the number of sons people are likely to have. If there are n men with a name, the chance of its becoming extinct is 0.89 raised to the power n. For example, if there were five holders of a name, the chance of none of them eventually having any male-line descendants is 0.89 raised to the power 5. This equals about 0.56, so in this case there is still a better than even chance that the name will become extinct. However, if there are 50 holders of a name in the first generation, the probability that the name will become extinct is only 0.3%. That is a 99.7% chance that the name will survive.
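Trevor's rule of thumb treats each male holder's line as dying out independently, so the probability that all n lines fail is 0.89 to the power n. This sketch uses hypothetical function names. It evaluates that formula and also inverts it to find how many holders give a chosen chance of survival. The 0.89 figure is the one quoted above and itself depends on assumed family sizes.

```python
import math

Q_SINGLE = 0.89   # chance that a name held by one man dies out (figure quoted above)


def extinction_probability(n_holders: int, q: float = Q_SINGLE) -> float:
    """Chance that all n independent male lines die out: q ** n."""
    return q ** n_holders


def holders_for_survival(target: float, q: float = Q_SINGLE) -> int:
    """Smallest n for which the survival chance 1 - q ** n reaches `target`."""
    return math.ceil(math.log(1 - target) / math.log(q))


for n in (1, 5, 50):
    print(n, round(extinction_probability(n), 4))
# 1 0.89
# 5 0.5584
# 50 0.0029
print(holders_for_survival(0.99))   # 40
```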

Readers with access to the Internet might want to look at the press release ‘How common is your name?’ at http://www.ons.gov.uk/regist_f.htm. It has been compiled by the Office for National Statistics and is based on registrations with GPs since 1991. The commonest male name in England and Wales is not John Smith, as one might think, but David Jones, with 15763 occurrences. John Smith is actually third, with 12793. However, Smith is slightly more common than Jones as a surname. Robert Davies is 100th, with 3366. The most common female name is Margaret Smith, with 7640, and the 100th is Elizabeth Evans, with 1974. The release also gives the 100 most popular surnames, the 50 most popular male forenames and the 50 most popular female forenames. The figures quoted are for England and Wales only.

Of one thing we may be certain – genealogy would be much more difficult if we were all called Smith!

1 Lindfield and Linfield Births Registered in England and Wales 1837-1937; edited by A Lindfield; 1993 Lin(d)field One Name Group ISBN 0 9522738 0 2

2 The 1881 Project – British Surname Distribution; Geoff Riggs, Journal of One-Name Studies, vol 6 no 3, July 1997.

3 How rare are surnames?; Trevor Ogden, Journal of One-Name Studies, April 1998.

Names in Database 1992

Relative Numbers

Mark Twain popularised the assertion, which he attributed to Disraeli, that there are three kinds of lies: lies, damned lies and statistics. I have no doubt that a lot of people would agree. Certainly, for many people the mention of statistics seems to provoke something between cynical disbelief and uncomprehending boredom. This is unfortunate, for statistics provide a useful dimension to many subjects, and family history is one of them. By looking at the numbers in our database, we can examine trends such as birth rates and lifespans. We can also show the movement of the various families around the country, and indeed the world. This article sets out some of the statistics taken from the database as it existed at the end of September 1992.