Adding Earnings Variables to the SARs: Accounting for the
Ethnicity Effect
Stephen Drinkwater and Derek Leslie1
Department of Economics, Manchester Metropolitan University
1. Introduction
Dale et al (1995) have attempted to overcome the lack of income information in the 1991 Census by imputing an income variable using New Earnings Survey (NES) data and attaching it to the SARs. It is possible to expand upon this useful idea by using other data sources to impute alternative income variables, which may be more appropriate for other purposes. More specifically, earnings information has recently become available in the Labour Force Survey (LFS) which offers some advantages over the NES.
The LFS contains particularly rich information
with respect to personal and household characteristics, and it uses virtually the
same definition of ethnic group as the Census. It is well known that ethnic minority
earnings (particularly for males) are lower than white earnings, even after other major
characteristic differences have been controlled for. This arises either from
direct labour market discrimination or because some ethnic groups, being
isolated from the majority, have a correspondingly more limited range of
economic opportunities. The LFS gives information about this, whereas the
NES is unable to do so. Given that one of the most popular uses of the SARs is
the exploration of ethnicity issues, an earnings variable imputed from LFS data
is more appropriate than the existing one for users interested in ethnic
matters.
Notes: Gross hourly earnings (in pence) of full and part-time employees only. Earnings relate to individuals in the 2 percent individual SAR.
Before describing the method used to impute our alternative earnings variable (LFSSCORE), we compare it in Table 1 with the existing Dale et al earnings variable (NESSCORE) for full- and part-time employees in the ten ethnic groups, separated into males and females.2 Both scales are relative measures and are not really intended to be interpreted in an absolute sense. To aid comparability of the two series, we have scaled LFSSCORE so that its average value is the same as that of NESSCORE: in Table 1, the average NESSCORE is 747 for males and 537 for females, and LFSSCORE is scaled to have the same averages. These averages take account of variation by individual characteristics, which are controlled for in the estimation procedure.
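The rescaling step can be illustrated with a minimal sketch. It assumes the scaling is a single multiplicative constant applied separately to males and females; the text does not say how the scaling was done, and the function name and the unscaled values below are hypothetical.

```python
from statistics import mean

def rescale_to_match_mean(scores, target_mean):
    """Multiply every score by one constant so the series mean equals target_mean."""
    factor = target_mean / mean(scores)
    return [s * factor for s in scores]

# Hypothetical unscaled predictions for four male employees (made-up numbers).
raw_lfs_male = [410.0, 655.0, 820.0, 1015.0]

# 747 is the male NESSCORE average reported in the text.
lfsscore_male = rescale_to_match_mean(raw_lfs_male, 747.0)
```

A multiplicative rescaling of this kind leaves every ratio between individuals unchanged, which is what matters for a relative measure.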
As can be seen, the new variable generally
produces lower values for the ethnic minority groups. For example, with LFSSCORE
non-white earnings are 88 percent of white earnings for males and 102 percent of
white earnings for females. The corresponding percentages using NESSCORE are 100
percent and 107 percent. So one point to emerge from the new variable is the
different patterns for males and females. Men from the ethnic minorities do
worse than white males, but females from the ethnic minorities do not seem to do
any worse than white females. It is women as a group that do worse relative to
men and slightly worse relative to non-white men. The second general point to
emerge from Table 1 is the wide variation among the various ethnic groups.
Pakistani and Bangladeshi males do particularly badly and this reflects their
special disadvantages. They were the latest group to arrive, at a time when
labour demand was declining, and a greater than average problem with
English as a first language and stricter religious observance both contribute to
their greater economic isolation. Pakistani and Bangladeshi females also do the
worst among women.
Table 2 highlights the ethnicity effect on these rankings (its derivation is described in section 3). Table 1 shows averages (controlling for individual variation), whilst Table 2 concentrates on the ethnicity effect alone and calculates the differential on the assumption that the ethnic minorities and the white group have the same individual characteristics. The table is split by gender and assumes that individuals from the ethnic minorities are foreign born. Notice that ethnic minority women now appear to have a significant disadvantage, in contrast to Table 1. So, unlike men, ethnic minority women seem to be endowed with better earnings-enhancing characteristics, which compensate for the earnings-reducing ethnicity effect seen in Table 2. This table shows the importance of delving more deeply into the underlying causes of earnings differences across groups. For UK born individuals the differential would be smaller by around 15 percentage points for males and 9 percentage points for females, so the disadvantage appears to be declining somewhat among the UK born.
2. Details of the LFS Data
The use of LFS data to impute an earnings variable has a number of advantages over using NES information. The LFS contains a finer classification of characteristics, which enables us to control for characteristic differences across individuals far more effectively. The LFS also contains better information on low earners. The NES misses around 30 percent of part-time men and 20 percent of part-time women (Orchard and Sefton, 1996), because it only records the earnings of workers who earn in excess of the Lower Earnings Limit (£52 in 1991). This is important in the present context given that a higher proportion of ethnic minorities are in low paid jobs and around 40 percent of white females and 25 percent of non-whites are part-time.
A disadvantage of the LFS earnings series is that no information is available for April 1991, since earnings data were only collected for the first time in the fourth quarter of 1992. The NES also has more observations. The sample size of the NES was around 160,000 in 1991, while on average each quarter of the LFS contains around 8,000 individuals with earnings information. The problem of small sample sizes in the LFS was overcome by pooling the eleven quarters from the fourth quarter of 1992 to the second quarter of 1995. This gives 38,878 male observations and 39,226 female observations, more than enough for present purposes. The fact that the LFS data do not match 1991 exactly is not such a problem, given our interest in establishing a ranking rather than an absolute scale. It is unlikely that rankings would have changed much in the short time gap between 1991 and our estimation period.
3. Description of Imputation Method
The method used to attach earnings information to the SARs entailed estimating separate earnings functions for males and females using LFS data. An earnings function is based on the economic principle of human capital, and the idea is that earnings (expressed in log terms) are explained by a set of explanatory variables. The principal variables are education (better educated individuals earn more) and age, used as a proxy for work experience (older workers up to about 50 tend to earn more). These variables can be supplemented by a whole set of further controls which also help to explain the differences across individuals. We selected variables with compatibility with the SAR variables in mind. The additional explanatory variables included marital status, the presence of children, number of higher qualifications, region of residence, industry, occupation and a part-time dummy. Most important of all, we also included separate ethnic dummies as well as an additional dummy to distinguish the ethnic minorities into UK and foreign born. These latter variables provide crucial information not available in NESSCORE. We also included some variables not contained in the SARs, such as job tenure, plant size and number of years of schooling. Obviously, these variables could not be used to explain individual variation when matching up to the SAR information. In effect, these variables are set at the same value for all individuals in the SAR data set. Altogether each earnings function contained 53 explanatory variables, of which 47 were used in predicting the value of LFSSCORE. Finally, we included a set of time dummies to track any secular changes in earnings over the estimation period.
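In generic notation, an earnings function of this kind can be written as below. The symbols and the purely additive form are illustrative assumptions, since the exact specification is not reproduced here: w_i is gross hourly earnings of individual i, x_i collects the human capital and other control variables, E_gi are the ethnic group dummies, B_i is the dummy distinguishing UK born from foreign born members of the ethnic minorities, Q_ti are the time dummies and ε_i is the error term. One such equation is estimated for males and one for females.

```latex
\ln w_i = \alpha + \mathbf{x}_i'\boldsymbol{\beta}
        + \sum_{g} \gamma_g E_{gi}
        + \delta B_i
        + \sum_{t} \tau_t Q_{ti}
        + \varepsilon_i
```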
The next stage was to impose the coefficients produced by
the earnings equations onto the individual data contained in the SARs and
predict the individual LFSSCOREs. Table 1 is derived from this. Table 2 is
derived directly from the values obtained for the ethnic dummy variables in the
male and female equations. The foreign born dummy is used to give the separate
estimate for UK born individuals. Because the proportion of ethnic minority
earners who are UK born is small (27.9 percent in this sample), there are not
enough observations to calculate a separate UK differential for each ethnic
group; the decline of 15 percentage points for males and 9 percentage points for
females is an average figure calculated from the foreign born dummy.
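The two steps described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: the variable names, coefficient values and the intercept are all hypothetical, and the conversion of a dummy coefficient into a percentage differential uses the standard exp(c) - 1 formula for a log-linear equation, which the text does not confirm was the one applied.

```python
import math

def predict_hourly_earnings(coefs, intercept, person, held_fixed):
    """Predicted earnings from a log-linear earnings function.

    coefs      : coefficient for each explanatory variable
    person     : the individual's values for variables present in the SARs
    held_fixed : common values for variables absent from the SARs
    """
    values = {**held_fixed, **person}
    log_w = intercept + sum(b * values.get(name, 0.0) for name, b in coefs.items())
    return math.exp(log_w)

def dummy_to_percent(coef):
    """Percentage earnings differential implied by a dummy coefficient in a log equation."""
    return 100.0 * (math.exp(coef) - 1.0)

# Hypothetical coefficients (made up for illustration only).
coefs = {"degree": 0.40, "part_time": -0.10, "ethnic_minority": -0.15, "job_tenure": 0.01}
intercept = 6.0
held_fixed = {"job_tenure": 5.0}  # not in the SARs, so the same value for everyone

white = predict_hourly_earnings(coefs, intercept, {"degree": 1.0}, held_fixed)
minority = predict_hourly_earnings(
    coefs, intercept, {"degree": 1.0, "ethnic_minority": 1.0}, held_fixed
)
gap_percent = dummy_to_percent(coefs["ethnic_minority"])
```

Because the two hypothetical individuals share every other characteristic, the ratio of their predicted earnings depends only on the ethnic dummy coefficient, which mirrors how a pure ethnicity effect of the Table 2 kind differs from the averages in Table 1.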
As in Dale et al, we can impute a score for other groups and we have
constructed additional series to give as comprehensive a picture as possible for
the working age population (16-59 for women and 16-64 for men). The first is
LFSNOTWK, for the unemployed and inactive who reported occupations; this
variable corresponds to the existing NESNOTWK variable in the SARs.3 These
series display similar patterns to those observed for employees in employment,
namely that the NES information produces higher predicted earnings for out of
work non-whites compared with the LFS. For example, when the earnings of
non-whites not in work are expressed as a percentage of white earnings, the LFS
predictions are 8 percentage points lower for males and 3 percentage points
lower for females than the NES predictions. Another series was created for the
remainder of the sample, namely the self-employed and those on a government
scheme who had a stated occupation (LFSSEG). Self-employed males and males on a
government scheme from the ethnic minorities do better than their counterparts
in employment as their earnings were predicted to be 91 percent of white
earnings, whereas the reverse was true for non-white females who now earn 3
percentage points less than whites. Unfortunately, hourly earnings cannot
be estimated for those individuals who did not report an industry or an
occupation.
4. Some Caveats
Researchers who wish to make use of the Dale et al series and our own developments should be aware of some important limitations. Despite the relative sophistication of these methods, there is no substitute for actual hard information, which, hopefully, the 2001 Census will rectify. Since these are predicted values, the SCORES should not be used as a dependent variable. What they represent is a mapping from a disparate set of characteristics into one convenient continuous scale of likely economic position. Remember also that actual earnings of the individuals concerned will vary greatly around the predicted values. Thus the SCORES should not be used to make any statements about earnings variability. The SCORES will always show distributions which are much tighter than any real world earnings data. The LFS earnings functions explain 45.8 percent of the variance of log earnings for males and 46.3 percent for females; the variability of the predicted series will be correspondingly compressed.
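The degree of compression can be made concrete with a standard least squares identity, assuming the earnings functions were estimated by ordinary least squares with an intercept (the text does not state the estimator). Within the estimation sample, the share of variance explained equals the ratio of the variance of the fitted values to the variance of actual log earnings, so the standard deviation of the predictions is roughly two thirds of that of the actual series. In the SARs, where some explanatory variables are held at a common value, the ratio will differ from this in-sample figure.

```latex
R^2 = \frac{\operatorname{Var}(\widehat{\ln w})}{\operatorname{Var}(\ln w)}
\quad\Longrightarrow\quad
\frac{\operatorname{sd}(\widehat{\ln w})}{\operatorname{sd}(\ln w)} = \sqrt{R^2},
\qquad
\sqrt{0.458} \approx 0.68 \ \text{(males)}, \quad
\sqrt{0.463} \approx 0.68 \ \text{(females)}
```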
The final data series attempts to rectify this latter
problem by introducing some artificial variability into the LFSSCORE series. We
construct a new series called LFSRAND. The first stage is a series of random
drawings from a normal distribution with a standard error derived from the
earnings function estimate. Adding this to LFSSCORE gives LFSRAND. This then
mimics the earnings variability that is found in the raw LFS earnings data.
Again it is important to note that the SCORE produced on this basis is
entirely artificial for any particular individual; all that is being attempted
here is to achieve a better measure of variability within groups.
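A minimal sketch of this kind of procedure is given below. It assumes the random draws are added to predicted log earnings before converting back to pence; the text does not say whether the addition was made in logs or in levels, and the predicted values and residual standard deviation used here are hypothetical.

```python
import math
import random
from statistics import pstdev

def add_residual_noise(log_scores, residual_sd, seed=0):
    """Add an independent normal draw (mean 0, sd residual_sd) to each log score."""
    rng = random.Random(seed)  # seeded so the illustration is reproducible
    return [s + rng.gauss(0.0, residual_sd) for s in log_scores]

# Hypothetical predicted log hourly earnings (made-up values).
log_lfsscore = [6.0, 6.2, 6.4, 6.6] * 500
# Hypothetical residual standard deviation from the earnings function.
residual_sd = 0.4

log_lfsrand = add_residual_noise(log_lfsscore, residual_sd)
lfsrand = [math.exp(v) for v in log_lfsrand]  # back to pence per hour

spread_before = pstdev(log_lfsscore)  # spread of the predictions alone
spread_after = pstdev(log_lfsrand)    # spread once artificial noise is added
```

The noisy series has a wider spread than the predictions, but the value attached to any one individual is no more informative than before.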
Table 3 gives some summary information about this procedure. It reports the Gini coefficient for LFSSCORE and for LFSRAND. The Gini coefficient is a frequently used measure of inequality, ranging in value from zero (complete equality) to one (complete inequality). Table 3 certainly underlines the lesson that the true variation in earnings is considerably greater than that which would be found using the predicted series. The same would also be true of the NESSCORE variable. LFSRAND is reasonably close to the numbers obtained using the raw LFS earnings data, in which white males have a Gini coefficient of 0.311, white females 0.308, non-white males 0.301 and non-white females 0.270. We would expect the raw LFS numbers to be slightly higher given that we combine several years of data.
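For readers who wish to compute a Gini coefficient on their own extract, the following sketch uses the standard formula based on ranked values. It is a generic implementation, not the routine used for Table 3, and the input values are made up. Note that in a finite sample of n individuals the maximum attainable value is (n - 1)/n rather than exactly one.

```python
def gini(values):
    """Gini coefficient of a list of non-negative values (0 = complete equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    # Sum of rank-weighted values, with ranks 1..n over the ascending sort.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

equal_pay = gini([5, 5, 5, 5])      # everyone earns the same
one_earner = gini([0, 0, 0, 10])    # one person earns everything
mixed = gini([1, 2, 3, 4])          # intermediate case
```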
Notes
1 Financial support from the ESRC is
gratefully acknowledged. The LFS estimates were provided by N. O’Leary
with thanks to the ONS and Essex Data Archive. A technical
note and the raw data series are available from the
authors at the e-mail addresses s.drinkwater@mmu.ac.uk or d.leslie@mmu.ac.uk.
2 To aid comparability we select only cases where we have both a
NESSCORE and an LFSSCORE observation for British residents of working age.
LFSSCORE has 390,230 observations and NESSCORE 393,062.
3 For a similar and more technical application of
this type of procedure, see Blackaby et al (1995).
References
Blackaby, D., Clark, K., Leslie, D. and Murphy, P. (1995),
‘The Changing Distribution of Black and White Earnings and the Ethnic Wage Gap:
Evidence for Britain’, Department of Economics Discussion Paper No. 95-07,
University of Wales Swansea.
Dale, A., Middleton, E. and Schofield, T.
(1995), ‘New Earnings Survey variables added to the SARs’, SARs Newsletter, No.
6.
Orchard, T. and Sefton, R. (1996), ‘Earnings from the Labour Force Survey
and the New Earnings Survey’, Labour Market Trends, April.