ILO Modelled Estimates 2019

The document provides an overview of the methodology used by the ILO to model estimates of key labor market indicators for countries that lack reported data. Some key points: - Econometric models are used to impute missing data and allow for consistent global and regional estimates each year. - Both reported and imputed data are combined into an internationally comparable dataset. - Models relate labor indicators to explanatory variables to impute missing observations and project indicators. Cross-validation is used to select the best statistical relationships. - Methodology has been adapted due to COVID-19 disruptions by introducing explanatory variables specific to the pandemic.


ILO modelled estimates

Methodological overview

January 2023

Department of Statistics
The ILO has designed and actively maintains a series of econometric models that are used to
produce estimates of labour market indicators in the countries and years for which country-
reported data are unavailable. The purpose of estimating labour market indicators for countries
with missing data is to obtain a balanced panel data set so that, every year, regional and global
aggregates with consistent country coverage can be computed. These allow the ILO to analyse
global and regional estimates of key labour market indicators and related trends. Moreover, the
resulting country-level data, combining both reported and imputed observations, constitute a
unique, internationally comparable data set of labour market indicators.

Relevant references
• International Conference of Labour Statisticians (ICLS). The ILO modelled estimates apply, and
promote the use of, the recommendations of the ICLS.¹ For instance, the labour
underutilization model estimates the indicators introduced by the 19th ICLS.
• Detailed documentation is available for the ILO modelled estimates of working-hours
(annexes of ILO Monitor: COVID-19 and the world of work), employment by economic
class (working poverty), and the labour income share and distribution.
• For a better understanding of the underlying data used in the ILO modelled estimates,
please refer to the quick guides on sources and uses of labour statistics, ILOSTAT
microdata processing, and interpretation of the unemployment rate.
• The ILO modelled estimates use third-party databases as input data, including the World
Economic Outlook from the International Monetary Fund, the World Development
Indicators and PovcalNet from the World Bank, UIS.Stat from UNESCO, and the World
Population Prospects and National Accounts Data from the United Nations.

Data collection and evaluation


The ILO modelled estimates are generally derived for 189 countries and territories,
disaggregated by sex and age as appropriate. For selected indicators an additional
disaggregation by geographical area (urban and rural) is presented. Before running the models
to obtain the estimates, labour market information specialists from the ILO Department of
Statistics, in cooperation with the Research Department, evaluate existing country-reported data
and select only those observations deemed sufficiently comparable across countries. The recent
efforts by the ILO to produce harmonized indicators from country-reported microdata have
greatly increased the comparability of the observations. Nonetheless, it is still necessary to select
the data based on the following four criteria: (a) type of data source; (b) geographical coverage;
(c) age group coverage; and (d) presence of methodological breaks or outliers.

1 The employment definition following the 19th ICLS is not yet implemented in the ILO modelled estimates for
countries in which it would generate a methodological break, as there are not enough data points based on the new
standard to produce reliable global and regional estimates.

With regard to the first criterion, in order for labour market data to be included in a particular
model, they must be derived from a labour force survey, a household survey or, more rarely, a
population census. National labour force surveys are generally similar across countries and
present the highest data quality. Hence, the data derived from such surveys are more readily
comparable than data obtained from other sources. Strict preference is therefore given to
labour-force-survey-based data in the selection process. However, many developing countries, which
lack the resources to carry out a labour force survey, do report labour market information based
on other types of household survey or population census. Consequently, because of the need to
balance the competing goals of data comparability and data coverage, some (non-labour-force-survey)
household survey data and, more rarely, population-census-based data are included in
the models.

The second criterion is that only nationally representative (that is, not geographically limited)
labour market indicators are included. Observations corresponding to only urban or only rural
areas are not included, because large differences typically exist between rural and urban labour
markets, and the use of only rural or only urban data would not be consistent with benchmark
data such as gross domestic product (GDP). Nonetheless, for estimates broken down by urban and
rural location, geographically limited data covering the area of interest are included.

The third criterion is that the age groups covered by the observed data must be sufficiently
comparable across countries. Countries report labour market information for a variety of age
groups, and the age group selected can influence the observed value of a given labour market
indicator.

The last criterion for excluding data from a given model is whether a methodological break is
present or a particular data point is clearly an outlier. In both cases, a balance is struck between
using as much data as possible and omitting observations likely to distort the results. During this
process, particular attention is paid to the existing metadata and the underlying methodology for
obtaining the data point under consideration.

Historical estimates can be revised in cases where previously used input data are discarded
because a source has become available that is more accurate according to the above-mentioned
criteria.

General methodology used to estimate labour market indicators

Labour market indicators are estimated using a series of models that establish statistical
relationships between observed labour market indicators and explanatory variables. These
relationships are used to impute missing observations and to make projections for the indicators.

There are many potential statistical relationships, also called “model specifications”, that could be
used to predict labour market indicators. The key to obtaining accurate and unbiased estimates
is to select the best model specification in each case. The ILO modelled estimates generally rely
on a procedure called “cross-validation”, which is used to identify those models that minimize the
expected error and variance of the estimation. This procedure involves repeatedly estimating a
number of candidate model specifications on random subsets of the data: the held-out
observations are predicted, and the prediction error is calculated for each iteration. Each
candidate model is assessed on the basis of the pseudo-out-of-sample root mean square error,
although other metrics such as result stability are also assessed depending on the model. This
makes it possible to identify the statistical relationship that provides the best estimate of a given
labour market indicator. It is worth noting that the most appropriate statistical relationship for
this purpose may differ according to the country.
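As a rough illustration of the cross-validation procedure described above, the Python sketch below scores candidate specifications by pseudo-out-of-sample root mean square error over repeated random train/hold-out splits and keeps the lowest-scoring one. It is a minimal sketch under stated assumptions, not the ILO code: the two candidate models (`fit_mean`, `fit_linear`), the 20 per cent hold-out share, the 200 repetitions and the toy data are all hypothetical, and the result-stability criterion is not modelled.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, float]  # (explanatory variable x, observed indicator y)
Fitter = Callable[[List[Row]], Callable[[float], float]]


def fit_mean(train: List[Row]) -> Callable[[float], float]:
    """Hypothetical candidate 1: always predict the training mean."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_linear(train: List[Row]) -> Callable[[float], float]:
    """Hypothetical candidate 2: ordinary least squares line y = a + b * x."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cv_rmse(fitter: Fitter, data: List[Row], reps: int = 200,
            holdout: float = 0.2, seed: int = 0) -> float:
    """Pseudo-out-of-sample RMSE over repeated random train/hold-out splits."""
    rng = random.Random(seed)  # same seed => same splits for every candidate
    sq_errors: List[float] = []
    for _ in range(reps):
        shuffled = data[:]
        rng.shuffle(shuffled)
        k = max(1, int(len(shuffled) * holdout))
        test, train = shuffled[:k], shuffled[k:]
        predict = fitter(train)  # fit on the training subset only
        sq_errors.extend((predict(x) - y) ** 2 for x, y in test)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def select_model(candidates: Dict[str, Fitter], data: List[Row]) -> str:
    """Name of the candidate with the lowest cross-validated RMSE."""
    scores = {name: cv_rmse(f, data) for name, f in candidates.items()}
    return min(scores, key=scores.get)
```

Because every candidate is scored with the same seed, all candidates face identical splits, so differences in RMSE reflect the specifications rather than the random draw.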

The extraordinary disruptions of the global labour market caused by the COVID-19 pandemic
have rendered the series of models underlying the ILO modelled estimates less suitable for
estimating and projecting the evolution of labour market indicators. For this reason, the
methodology has been adapted, and explanatory variables that are specific to the COVID-19
pandemic have been introduced into the modelling process.

The benchmark for the ILO modelled estimates is the 2022 Revision of the United Nations World
Population Prospects, which provides estimates and projections of the total population broken
down into five-year age groups. The working-age population comprises everyone who is at least
15 years of age.

Although the same basic approach is followed in the models used to estimate all the indicators,
there are differences between the various models because of specific features of the underlying
data. Further details are provided below for each model.

Estimates of the labour force


Methodological changes are introduced in the current version of the labour force participation
rate (LFPR) model in order to produce more granular age breakdowns. The basic data used as
input for the LFPR model are single-year LFPRs disaggregated by sex and age groups, the latter
comprising four intervals (15–24, 25–54, 55–64 and 65+). Compared with earlier years when only
two intervals were available (15–24 and 25+), the additional age groups significantly increase the
amount of input data. Moreover, estimates for the 25+ age group can still be recovered with the
new methodology. The underlying methodology has been extensively assessed in terms of
pseudo-out-of-sample performance. However, for certain types of missing data patterns, the
LFPR and the unemployment rate models are the only two models described in this appendix
that do not carry out automated model selection.

Linear interpolation is used to fill in the missing data for countries for which such a procedure is
possible. This procedure produces accurate estimates of low variance, which is not surprising,
given that the LFPR is a very persistent variable. In all other cases, weighted multivariate
estimation is carried out. Countries are divided into nine estimation groups, chosen on the
combined basis of broad economic similarity and geographical proximity. Based on the data
structure and the heterogeneity among the countries covered by the input data, the model was
specified using panel data with country fixed effects. The regressions are weighted by the inverse
of the likelihood of a labour force survey’s availability. The explanatory variables used include
economic and demographic variables. To produce estimates for 2020, a cross-validation
approach is used to select the model that minimizes prediction error in that specific year. The
tested models include annual averages of high-frequency indicators related to the evolution of
the COVID-19 pandemic. An additional module is used to produce estimates for the year 2021. In
addition to the use of a cross-validation procedure for model selection, macroeconomic and labour
market indicators are utilized to estimate short-run dynamics while accounting for the pre-2020
trend. The global figures are calculated using the benchmark population from the United Nations
World Population Prospects and the LFPRs.
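Two of these steps can be sketched in pure Python. `interpolate_interior` fills only years that lie between two observed values, leaving leading and trailing gaps for the regression stage; `weighted_fe_slope` estimates a one-regressor panel model with country fixed effects by weighted least squares, weighting each observation by the inverse of an assumed probability that a labour force survey is available. Both function names, the single regressor and the probabilities are hypothetical; the actual model uses several economic and demographic regressors and its own weighting inputs.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def interpolate_interior(series: List[Optional[float]]) -> List[Optional[float]]:
    """Linearly fill missing years that lie between two observed years.

    Leading and trailing gaps stay None; in this sketch they are left to
    the regression-based estimation.
    """
    filled = list(series)
    observed = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(observed, observed[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


# One observation: (country, regressor x, LFPR y, assumed probability that a
# labour force survey is available for that country-year). All hypothetical.
Obs = Tuple[str, float, float, float]


def weighted_fe_slope(obs: List[Obs]) -> float:
    """WLS slope of y on x with country fixed effects, weights = 1 / p.

    Demeaning x and y by their weighted country means is equivalent to
    including one dummy per country in the weighted regression.
    """
    sums: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for country, x, y, p in obs:
        w = 1.0 / p
        s = sums[country]
        s[0] += w
        s[1] += w * x
        s[2] += w * y
    num = den = 0.0
    for country, x, y, p in obs:
        w = 1.0 / p
        sw, swx, swy = sums[country]
        x_dev = x - swx / sw
        y_dev = y - swy / sw
        num += w * x_dev * y_dev
        den += w * x_dev * x_dev
    return num / den
```

For example, `interpolate_interior([None, 60.0, None, None, 63.0, None])` fills the two interior years with 61.0 and 62.0 and leaves both ends as `None`.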

Rebalancing the estimates ensures that the implied total rate obtained by aggregating the
demographic breakdowns matches the total rate, whether reported in labour force surveys or
estimated.
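The text does not spell out the rebalancing rule. A minimal sketch, assuming a single proportional scaling factor, is shown below: the implied total rate is the population-weighted average of the group rates (the same weighting that turns country rates into aggregates), and every group rate is multiplied by target/implied so that the breakdown reproduces the target total. Group labels and figures are hypothetical.

```python
from typing import Dict


def implied_total_rate(rates: Dict[str, float], pops: Dict[str, float]) -> float:
    """Population-weighted average of group rates (the rate implied by the breakdown)."""
    return sum(rates[g] * pops[g] for g in rates) / sum(pops[g] for g in rates)


def rebalance(rates: Dict[str, float], pops: Dict[str, float],
              target_total: float) -> Dict[str, float]:
    """Scale all group rates by one common factor so they aggregate to target_total."""
    factor = target_total / implied_total_rate(rates, pops)
    return {g: r * factor for g, r in rates.items()}
```

With group rates of 40 and 80 per cent and populations of 100 and 300, the implied total is 70 per cent; rebalancing to a target of 63 per cent scales both rates by 0.9, to 36 and 72 per cent. Proportional scaling preserves the relative pattern across groups but does not by itself cap rates at 100 per cent, so a production version would need a further check.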

In previous editions of the ILO modelled estimates, detailed age information for the labour force
was available.² The model that produced it has since been discontinued, and the associated dataset
(except for the indicator of the median age of the labour force) is no longer published, to avoid
inconsistencies with the LFPR model described above.

Estimates of unemployment
This model estimates a complete panel data set of unemployment rates disaggregated by sex
and age (15–24, 25+). For countries for which at least one observation is reported,³ regressions
involving country fixed effects are used. Three models are combined with equal weighting to
impute missing values. The models have been chosen on the basis of pseudo-out-of-sample root
mean square error and stability of results (the two components are weighted using expert
judgement). For countries with no reported observations, models are selected based on cross-
validation. The evolution of the average unemployment rate of a particular demographic group
in a particular region is highly predictive of the evolution of the unemployment rate of that
particular group in a country in that region. A separate cross-validation approach is used to select
the model that minimizes prediction error in the year 2020. The candidate models include annual
averages of high-frequency indicators related to the evolution of the COVID-19 pandemic. An
additional procedure, which also uses cross-validation to select models, is used to produce
estimates for 2021. These models account for the historical trend and utilize
macroeconomic indicators, including the dynamics of the unemployment rate in 2020. According
to the resulting estimates, unemployment recovered towards its historical trend in
2021. Rebalancing the estimates ensures that the implied total rate obtained by aggregating the
demographic breakdowns matches the total rate, whether reported in labour force surveys or
estimated.

2 Five-year age intervals (15–19, 20–24, and so on until 60–64) and a last age group of 65 years and above.
3 For ease of exposition, we abstract here from the case in which reported observations exist for some demographic
groups but not for others in a given country and year.
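Combining three models with equal weighting means that each missing country-year receives the simple mean of the three predictions. The sketch below shows only that averaging step; the models are stand-in callables, and `regional_trend_model` is a hypothetical candidate loosely inspired by the observation that regional group averages track country trends, not one of the ILO's selected specifications.

```python
from statistics import fmean
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, int]           # (country code, year)
Model = Callable[[Key], float]  # returns a predicted unemployment rate (%)


def equal_weight_impute(models: List[Model], missing: List[Key]) -> Dict[Key, float]:
    """Impute each missing country-year as the simple mean of the model predictions."""
    return {key: fmean(model(key) for model in models) for key in missing}


def regional_trend_model(anchor_rate: float, anchor_year: int,
                         regional_avg: Dict[int, float]) -> Model:
    """Hypothetical candidate: scale a country's observed rate by the change
    in its region's average rate for the same demographic group."""
    base = regional_avg[anchor_year]
    return lambda key: anchor_rate * regional_avg[key[1]] / base
```

For instance, a country observed at 10 per cent in 2018, in a region whose average rises from 5 to 6 per cent, gets 12 per cent from the regional-trend candidate; averaged with two other predictions of 11 and 13 per cent, the imputed value is 12 per cent.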

Estimates of the jobs gap


The aim of the model is to provide aggregate estimates of the jobs gap rate by sex for the
population aged 15 or older. The jobs gap rate is the target variable estimated for countries with
missing data and is computed as follows:

$$
\text{Jobs gap rate} = \frac{\text{Unemployed} + \text{Potential labour force} + \text{Willing non-jobseekers}}{\text{Labour force} + \text{Potential labour force} + \text{Willing non-jobseekers}}
$$

where the potential labour force and willing non-jobseekers include persons who were seeking
employment and were not available but would become available in a short time (unavailable
jobseekers), persons who were not seeking work but were currently available (available potential
jobseekers) and persons who were not seeking work and were not available but were willing to
work (willing non-jobseekers).
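As a worked illustration of the definition, the function below computes the jobs gap rate from component counts. The counts in the check are invented purely for the arithmetic: 5 unemployed, a labour force of 90, a potential labour force of 6 and 4 willing non-jobseekers give (5 + 6 + 4) / (90 + 6 + 4) = 15 per cent.

```python
def jobs_gap_rate(unemployed: float, labour_force: float,
                  potential_lf: float, willing_non_jobseekers: float) -> float:
    """Jobs gap rate in per cent.

    potential_lf covers unavailable jobseekers and available potential
    jobseekers; willing non-jobseekers are added separately, as in the text.
    """
    unmet_need = unemployed + potential_lf + willing_non_jobseekers
    extended_lf = labour_force + potential_lf + willing_non_jobseekers
    return 100.0 * unmet_need / extended_lf
```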

The imputations for missing data are produced through four separate econometric models. First,
a model produces estimates from 2004 to 2019 for countries with at least one yearly data point
for the jobs gap rate by sex. Second, a model produces estimates from 2004 to 2019 for those
countries with no data on the jobs gap rate during the entire period. The third and fourth models
produce estimates for, respectively, 2020 and the period of 2021–22.

The four distinct models were chosen from an array of candidate models on the basis of cross-
validation, which selects the models with the highest accuracy in predicting the jobs gap rates in
pseudo-out-of-sample simulations. The predictions from the models are used to estimate the
missing observations of the jobs gap rate by sex. Interpolation procedures are applied to the
predictions to ensure that the model estimate coincides with the real observations and that
imputed data are consistent with real observations that are close in time. Since the models
estimate the jobs gap rates for the total population and for women and men separately, the
aggregated estimates for women and men may be incompatible with the total-population
estimates. The subcomponents for women and men are adjusted proportionally to match the
total-population estimates.
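The interpolation procedure itself is not specified. One common approach, sketched here purely as an assumption, is to compute the gap between observation and model prediction in each observed year, interpolate that gap linearly across the unobserved years in between, and hold it constant before the first and after the last observation. The adjusted series then passes exactly through the observed points and moves smoothly between them.

```python
from typing import List, Optional


def anchor_to_observations(predicted: List[float],
                           observed: List[Optional[float]]) -> List[float]:
    """Shift model predictions so they coincide with observations where those exist."""
    obs_idx = [i for i, v in enumerate(observed) if v is not None]
    if not obs_idx:
        return list(predicted)  # no-data countries keep the pure model prediction
    gap = {i: observed[i] - predicted[i] for i in obs_idx}
    result: List[float] = []
    for t, p in enumerate(predicted):
        if t <= obs_idx[0]:
            g = gap[obs_idx[0]]  # hold the first gap constant backwards
        elif t >= obs_idx[-1]:
            g = gap[obs_idx[-1]]  # hold the last gap constant forwards
        else:
            left = max(i for i in obs_idx if i <= t)
            right = min(i for i in obs_idx if i >= t)
            if left == right:
                g = gap[left]  # t itself is observed
            else:
                w = (t - left) / (right - left)
                g = (1 - w) * gap[left] + w * gap[right]
        result.append(p + g)
    return result
```

For predictions `[10, 11, 12, 13, 14]` and observations of 12 and 15 in the second and fourth years, the result is `[11, 12, 13.5, 15, 16]`.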

Estimates of informal employment
The target variable of the model is the informality rate disaggregated by sex for the population
aged 15 and older. The informality estimates include both nationally reported observations and
imputed data for countries with missing data. The gender-specific country-level data used for the
models include self-employment and part-time employment rates. Other country-level data include
the percentage of people below various poverty lines, the share of employment in agriculture
and industry, the urbanization rate, the logarithm of GDP per capita, and categorical variables for
geographical regions and levels of economic development.

The imputations for missing data are produced through five separate econometric models. First,
a model produces estimates from 2004 to 2019 for countries with at least one yearly data point
for the share of informal employment by sex. Second, a model produces estimates from 2004 to
2019 for those countries with no data on the share of informality during the entire period. The
third and fourth models are used to produce estimates for, respectively, 2020 and 2021. The final
model estimates the projections for 2022. The five distinct models were chosen from an array of
candidate models based on cross-validation, which selects the models with the highest accuracy
in predicting informality rates in pseudo-out-of-sample simulations. The predictions from the
models are used to estimate the missing observations of the share of informal employment by
sex. Since the models estimate the informality rates for the total population and for women and
men separately, the aggregated estimates for women and men may be incompatible with the
total-population estimates. The subcomponents for women and men are adjusted proportionally
to match the total-population estimates.

Estimates of youth not in employment, education or training


The target variable of the model is the share of youth (aged 15 to 24) not in education,
employment or training (NEET):

$$
\text{NEET share} = \frac{\text{Youth not in education, employment or training}}{\text{Youth population}}
$$

It is worth noting that, by definition, 100 minus the NEET share gives the share of young people
who are either in employment or enrolled in some educational or training programme. The NEET
share is included as one of the indicators used to measure progress towards the achievement of
the Sustainable Development Goals (SDGs), specifically Goal 8 (“Promote sustained, inclusive and sustainable economic growth,
full and productive employment and decent work for all”).
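The definition reduces to simple arithmetic, shown below with the share expressed in per cent. The counts in the check (12 NEET out of 60 young people) are invented; they give a NEET share of 20 per cent and therefore 80 per cent of youth in employment, education or training.

```python
def neet_share(youth_neet: float, youth_population: float) -> float:
    """Share of youth aged 15-24 not in education, employment or training, in per cent."""
    return 100.0 * youth_neet / youth_population


def engaged_share(youth_neet: float, youth_population: float) -> float:
    """Share of youth in employment or in education or training: 100 minus the NEET share."""
    return 100.0 - neet_share(youth_neet, youth_population)
```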

The model uses the principles of cross-validation and uncertainty estimation to select the
regression models with the best pseudo-out-of-sample performance, not unlike the
unemployment rate model. The NEET model estimates all demographic groups jointly, using the
appropriate categorical variable as a control in the regression, because the groups are
interdependent and data availability is roughly uniform across breakdowns. The model
incorporates the information on unemployment, labour force and enrolment rates into the
regressions (using it alongside other variables to reflect economic and demographic factors). The
resulting estimates include the NEET share and the number of youth NEET.

Estimates of hours worked


The ratio of weekly hours worked to the population aged 15–64 is the target variable that is
estimated for countries with missing data. Total weekly working hours are derived by multiplying
this ratio by the estimate of the population aged 15–64.

For estimates up to and including 2019, the regression approach uses the share of the population
aged 15–64 in the total population, the employment-to-population ratio and the rate of time-
related underemployment to predict missing values. For countries with no observations of this
indicator, the country intercept is estimated by combining the regional mean and the income
group mean.
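The text says the intercept for countries without observations combines the regional mean and the income group mean but does not give the weights. The sketch below assumes an equal-weight average of the two means of estimated intercepts; the function name, the 50/50 weighting and the group labels are hypothetical.

```python
from statistics import fmean
from typing import Dict


def fallback_intercept(country: str,
                       intercepts: Dict[str, float],
                       region_of: Dict[str, str],
                       income_of: Dict[str, str]) -> float:
    """Intercept for a country without observations.

    `intercepts` holds the estimated intercepts of countries that do have
    data. Assumes at least one such country shares the target's region and
    at least one shares its income group; the 50/50 weighting is hypothetical.
    """
    regional_mean = fmean(v for c, v in intercepts.items()
                          if region_of[c] == region_of[country])
    income_mean = fmean(v for c, v in intercepts.items()
                        if income_of[c] == income_of[country])
    return 0.5 * regional_mean + 0.5 * income_mean
```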

Working hours up to and including the third quarter of 2022 are estimated using the ILO
nowcasting model. This is a data-driven statistical prediction model that draws on the values of
high-frequency indicators in real time or with a very short publication lag in order to predict the
current value of the target variable. The specific target variable of the ILO nowcasting model is
the change in hours worked adjusted for population aged 15–64 relative to the fourth quarter of
2019 (seasonally adjusted). For an in-depth methodological description, consult Gomis et al.
(2022). The model produces an estimate of the change in hours worked adjusted for the
population aged 15–64 relative to this baseline. In addition, a benchmark of weekly hours worked
in the fourth quarter of 2019 is used to compute the full-time equivalent jobs represented by the
changes in working hours adjusted for population aged 15–64. This benchmark is also used to
compute the time series of average hours worked adjusted for population aged 15–64.
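The arithmetic linking population-adjusted hours to a percentage change and to full-time equivalent (FTE) jobs can be sketched as follows. The change is expressed here in per cent (the population-adjusted ratio to Q4 2019 minus one), and the 48-hour full-time week used for the FTE conversion is an assumption of this sketch; the text only states that a Q4 2019 benchmark of weekly hours is used.

```python
def change_vs_q4_2019(hours_t: float, pop_t: float,
                      hours_base: float, pop_base: float) -> float:
    """Per cent change in weekly hours per person aged 15-64 versus Q4 2019."""
    ratio = (hours_t / pop_t) / (hours_base / pop_base)
    return 100.0 * (ratio - 1.0)


def fte_jobs_equivalent(pct_change: float, baseline_weekly_hours: float,
                        fte_week_hours: float = 48.0) -> float:
    """FTE jobs represented by a per cent change in population-adjusted hours.

    baseline_weekly_hours is total weekly hours in Q4 2019, so the change
    is applied at the baseline population; the 48-hour week is assumed.
    """
    hours_change = pct_change / 100.0 * baseline_weekly_hours
    return hours_change / fte_week_hours
```

For example, a 10 per cent drop from a baseline of 96,000 weekly hours corresponds to 9,600 hours, or 200 FTE jobs at 48 hours each.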

The ILO nowcasting model draws from multiple sources: labour force survey data up to the third
quarter of 2022 and up-to-date high-frequency economic data such as retail sales, administrative
labour market data and confidence survey data. Up-to-date mobile phone data from Google
Community Mobility Reports and the most recent values of the COVID-19 Government Response
Stringency Index (hereafter “Oxford Stringency Index”) are also used in the estimates.

Drawing on available real-time data, the model estimates the historical statistical relationship
between these indicators and hours worked per person aged 15–64 and uses the resulting
coefficients to predict how hours worked adjusted for population aged 15–64 change in response
to the most recent observed values of the nowcasting indicators. Multiple candidate relationships
were evaluated on the basis of their prediction accuracy and performance around turning points
to construct a weighted average nowcast. For countries for which high-frequency data on
economic activity were available, but either data on the target variable were not available or the
above methodology did not work well, the estimated coefficients and data from the panel of
countries were used to produce an estimate. The resulting estimates are referred to as being
produced by a “direct nowcast”.
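How the candidate relationships are weighted is not stated beyond prediction accuracy and turning-point performance. The sketch below assumes inverse mean-squared-error weights, a standard forecast-combination choice that ignores the turning-point criterion; the candidate names and numbers are hypothetical.

```python
from typing import Dict


def combine_nowcasts(nowcasts: Dict[str, float], mse: Dict[str, float]) -> float:
    """Weighted average of candidate nowcasts, weights proportional to 1 / MSE."""
    inverse = {name: 1.0 / mse[name] for name in nowcasts}
    total = sum(inverse.values())
    return sum(nowcasts[name] * inverse[name] / total for name in nowcasts)
```

With nowcasts of -4 and -8 per cent and mean squared errors of 1 and 3, the weights are 0.75 and 0.25, giving a combined nowcast of -5 per cent.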

An indirect approach is applied for the remaining countries: this involves extrapolating the
observed or estimated (using the direct nowcast) change in hours adjusted for population aged
15–64. The extrapolation is based on the observed decline in mobility derived from the Google
Community Mobility Reports and the Oxford Stringency Index, since countries with comparable
drops in mobility and similarly stringent restrictions are likely to have experienced a similar
decline in hours worked adjusted for population aged 15–64. From the Google Community
Mobility Reports, an average of the workplace and “retail and recreation” indices is used. The
stringency and mobility indices are combined into a single variable using principal component
analysis.⁴ For countries without data on restrictions, mobility data (if available) and up-to-date
data on the incidence of COVID-19 were used to extrapolate the impact on hours worked adjusted
for population aged 15–64. Because countries follow different practices in counting COVID-19
infections, the number of COVID-19 deaths, a more homogeneous measure, is used as a proxy for the
local intensity of the pandemic. The variable was averaged for each month, but the data were updated
daily on the basis of the Our World in Data online repository.⁵ Finally, for a small number of
countries with no data readily available at the time of estimation, the regional average was used
to impute the target variable. In 2022, the model was modified to include GDP growth estimates
and regional trends data and to take into account time series properties of hours worked.
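For two indicators, the principal component step has a closed form: after standardizing both series, the leading eigenvector of their 2×2 correlation matrix is (1, 1)/√2 when the correlation is positive and (1, −1)/√2 when it is negative. The sketch below uses that fact instead of a general eigen-solver. The inputs are invented, and, as with any principal component, the sign of the scores is arbitrary.

```python
import math
from statistics import fmean, pstdev
from typing import List


def first_pc_two_vars(a: List[float], b: List[float]) -> List[float]:
    """Scores on the first principal component of two standardized series."""
    mean_a, sd_a = fmean(a), pstdev(a)
    mean_b, sd_b = fmean(b), pstdev(b)
    za = [(x - mean_a) / sd_a for x in a]
    zb = [(x - mean_b) / sd_b for x in b]
    r = fmean(x * y for x, y in zip(za, zb))  # Pearson correlation
    sign = 1.0 if r >= 0 else -1.0  # picks the leading eigenvector
    return [(x + sign * y) / math.sqrt(2.0) for x, y in zip(za, zb)]
```

With stringency rising from 20 to 80 while mobility falls from -5 to -35, the two series are perfectly negatively correlated, so the component is proportional to the standardized stringency index and its variance is 1 + |r| = 2.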

With the ILO nowcasting model estimates completed, the ratio of weekly hours worked relative
to the fourth quarter of 2019 is estimated for men and women separately. These estimates for
female and male changes in hours worked adjusted by the corresponding population aged 15–64
relative to the fourth quarter of 2019 (seasonally adjusted) are produced using the ILO
nowcasting-by-gender model. The change in hours worked for country i, sex s and quarter t is
computed as follows:

$$
\text{Change in hours worked relative to Q4 2019}_{i,s,t} = \frac{\text{Hours worked}_{i,s,t} \,/\, \text{Population aged 15--64}_{i,s,t}}{\text{Hours worked}_{i,s,\text{Q4 2019}} \,/\, \text{Population aged 15--64}_{i,s,\text{Q4 2019}}}
$$

4 Additionally, for the first three quarters of 2021, a dummy variable is used dividing countries into two clusters
to account for differential impacts of those variables on working-hours. The clusters are based on a k-means split
by geographical region, demographic characteristics and national income per capita. In addition, a de-trending
procedure for Google Mobility Reports data is used.
5 Hannah Ritchie, Edouard Mathieu, Lucas Rodés-Guirao, Cameron Appel, Charlie Giattino, Esteban Ortiz-
Ospina, Joe Hasell, Bobbie Macdonald, Diana Beltekian and Max Roser (2020) - "Coronavirus Pandemic
(COVID-19)". Published online at OurWorldInData.org. Retrieved from: 'https://ourworldindata.org/coronavirus'
[Online Resource]
The data used in the model include estimates of the country’s sex-aggregated ratio of weekly
hours worked (see the ILO nowcasting model above), country demographic and economic
characteristics and a regional dummy variable. The gender decomposition model is composed of
four separate models. First, a model produces estimates from the first quarter of 2020 to the
fourth quarter of 2021 for countries with data on hours worked for at least one quarter. Second,
a model produces estimates from the first quarter of 2020 to the fourth quarter of 2021 for
countries with no hours worked data during that period. Third, a model produces estimates for
the first quarter of 2022. Finally, a model produces the projections for the second and third
quarters of 2022.⁶ These models that make up the nowcast by gender were chosen from an array
of models based on their accuracy in predicting changes in female and male hours worked. Next,
the predictions from the selected models are used to estimate the missing observations of hours
worked.⁷ Given that the models estimate the change in hours worked for women and men
separately, the aggregated estimates for women and men may be incompatible with the total
population estimates of the nowcasting model. To produce compatible estimates, the
subcomponents for women and men are adjusted proportionally to match the total loss in
worked hours adjusted for population aged 15–64 estimated by the nowcasting model.

For analytical purposes, the gender gap in hours worked can be estimated using
the change in weekly hours worked relative to the fourth quarter of 2019 by sex. A change in the
gender gap can be computed as the change in working hours of men minus the change in
working hours of women at the country level. Finally, to obtain a weighted global aggregate,
countries’ changes in the gender gap relative to the fourth quarter of 2019 are aggregated, the
weights being given by each country’s female total hours worked in the relevant quarter. Thus,
the global aggregate estimate for the gender gap can be computed as follows:

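$$
\begin{aligned}
&\text{Global change in the gender gap in hours worked relative to Q4 2019}_t \\
&\quad = \sum_{i=1}^{189} \left[ \left( \text{Male change in hours worked relative to Q4 2019}_{i,t} - \text{Female change in hours worked relative to Q4 2019}_{i,t} \right) \times \frac{\text{Female hours worked}_{i,t}}{\sum_{j=1}^{189} \text{Female hours worked}_{j,t}} \right]
\end{aligned}
$$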

This weighting scheme avoids compositional effects that arise from the size of each country’s
initial gender gap.
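The change formula and the female-hours-weighted aggregate can be sketched together. Because the aggregate takes the difference between male and female changes, it is the same whether changes are kept as ratios to Q4 2019 or as ratios minus one. Country codes and figures below are hypothetical.

```python
from typing import Dict


def change_ratio(hours_t: float, pop_t: float,
                 hours_q4_2019: float, pop_q4_2019: float) -> float:
    """Population-adjusted hours in quarter t relative to Q4 2019 (a ratio)."""
    return (hours_t / pop_t) / (hours_q4_2019 / pop_q4_2019)


def global_gender_gap_change(male_change: Dict[str, float],
                             female_change: Dict[str, float],
                             female_hours: Dict[str, float]) -> float:
    """Sum over countries of (male - female change), weighted by each
    country's share of total female hours worked in the quarter."""
    total_female = sum(female_hours.values())
    return sum((male_change[c] - female_change[c]) * female_hours[c] / total_female
               for c in female_hours)
```

With two countries holding 75 and 25 per cent of female hours and country gaps of +0.04 and -0.02, the global change in the gender gap is 0.75 × 0.04 + 0.25 × (-0.02) = 0.025.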

6 The different periods were selected owing to the differing availability of reported observations of hours worked.
7 The sex-disaggregated estimates of hours worked in India were obtained using urban employment levels as a
proxy for hours worked, since recent data were available from the Periodic Labour Force Survey.

Estimates of labour underutilization (LU2, LU3 and LU4 rates)


The target variables of the model are the measures of labour underutilization defined in the
resolution concerning statistics of work, employment and labour underutilization adopted by the
19th International Conference of Labour Statisticians (ICLS) in October 2013. These measures
include the combined rate of time-related underemployment and unemployment (LU2), the
combined rate of unemployment and the potential labour force (LU3), and the composite
measure of labour underutilization (LU4). The measures are defined as:

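$$
\text{LU2} = \frac{\text{Unemployed} + \text{Time-related underemployed}}{\text{Labour force}}
$$

$$
\text{LU3} = \frac{\text{Unemployed} + \text{Potential labour force}}{\text{Labour force} + \text{Potential labour force}}
$$

$$
\text{LU4} = \frac{\text{Unemployed} + \text{Potential labour force} + \text{Time-related underemployed}}{\text{Labour force} + \text{Potential labour force}}
$$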

Persons in time-related underemployment are defined as all persons in employment who, during
a short reference period, wanted to work additional hours, whose working time in all their jobs
was below a specified threshold of hours, and who were available to work additional hours if they
had been given the opportunity to do so. The potential labour force consists of people of working-
age who were actively seeking employment, were not available to start work in the reference
week but would become available within a short subsequent period (unavailable jobseekers), or
who were not actively seeking employment but wanted to work and were available in the
reference week (available potential jobseekers).

The model uses the principles of cross-validation and uncertainty estimation to select the
regression models with the best pseudo-out-of-sample performance, not unlike the
unemployment rate model. The labour underutilization model, however, has three very specific
features. First, all demographic groups are jointly estimated, using the appropriate categorical
variable as a control in the regression, because the groups are interdependent and data
availability is roughly uniform across breakdowns. Second, the model incorporates the
information on unemployment and labour force into the regressions (used alongside other
variables to reflect economic and demographic factors). Finally, given the unemployment and labour
force estimates, the LU4 rate is uniquely pinned down by the LU2 and LU3 rates, since it is a composite
measure combining the components of the two indicators.

The resulting estimates include the LU2, LU3 and LU4 rates and the level of time-related
underemployment and of the potential labour force.
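The definitions translate directly into code, which also makes the point about LU4 concrete: with unemployment and the labour force known, inverting LU2 gives the time-related underemployed and inverting LU3 gives the potential labour force, after which LU4 follows. Rates are fractions rather than per cent, and the counts in the check are invented.

```python
from typing import Tuple


def lu_rates(u: float, lf: float, tru: float, plf: float) -> Tuple[float, float, float]:
    """(LU2, LU3, LU4) as fractions from unemployed u, labour force lf,
    time-related underemployed tru and potential labour force plf."""
    lu2 = (u + tru) / lf
    lu3 = (u + plf) / (lf + plf)
    lu4 = (u + plf + tru) / (lf + plf)
    return lu2, lu3, lu4


def lu4_from_lu2_lu3(lu2: float, lu3: float, u: float, lf: float) -> float:
    """Recover LU4 from LU2, LU3, unemployment and labour force."""
    tru = lu2 * lf - u                  # from LU2 = (U + TRU) / LF
    plf = (lu3 * lf - u) / (1.0 - lu3)  # from LU3 = (U + PLF) / (LF + PLF)
    return (u + plf + tru) / (lf + plf)
```

For 5 unemployed, a labour force of 100, 4 time-related underemployed and a potential labour force of 10, LU2 = 0.09, LU3 = 15/110 and LU4 = 19/110, and `lu4_from_lu2_lu3` returns the same LU4.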

Estimates of the distribution of employment by status, occupation, and economic activity
The distribution of employment by status, occupation, and economic activity (sector) is estimated
for total employment and is also disaggregated by sex. In the first step, a cross-country
regression is performed to identify the share of each of the employment-related categories in
countries for which no data are available. This step uses information on demography, per capita
income, economic structure, and a model-specific indicator with high predictive power for the
estimated distribution. The indicators for each category are as follows:

• for status, the index called “work for an employer” from the Gallup World Poll;
• for occupation, the share of value added of a sector in which people with a given
occupation are most likely to work;
• for sector, the share of value added of the sector.

The next step estimates the evolution of the shares of each category, using information on the
economic cycle and on economic structure and demographics. The third step estimates the
change in the shares of each category in the years 2020 and 2021. Lastly, the estimates are
rebalanced to ensure that the individual shares add up to 100 per cent.
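The source does not specify how the rebalancing is carried out. The minimal sketch below assumes the simplest rule: clip negative estimates to zero, then rescale all shares proportionally so that they sum to 100 per cent. The function name and the example shares are hypothetical.

```python
def rebalance(shares: dict[str, float]) -> dict[str, float]:
    """Rescale estimated category shares so that they sum to 100 per cent.

    Assumption: negative estimates are clipped to zero and the remaining
    shares are scaled proportionally (one of several possible rules).
    """
    clipped = {k: max(v, 0.0) for k, v in shares.items()}
    total = sum(clipped.values())
    if total == 0:
        raise ValueError("all shares are zero after clipping")
    return {k: 100.0 * v / total for k, v in clipped.items()}


# Hypothetical unbalanced status shares (per cent) from separate estimations
raw = {"employees": 52.0, "employers": 3.0, "own-account": 36.0, "contributing family": 11.0}
balanced = rebalance(raw)
print({k: round(v, 2) for k, v in balanced.items()})
```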

The estimated sectors are based on an ILO-specific classification that ensures maximum
consistency between the third and fourth revisions of the United Nations International Standard
Industrial Classification of All Economic Activities (ISIC). The sectors A, B, C, F, G, I, K, O, P and Q
correspond to the ISIC Rev.4 classification. Furthermore, the following composite sectors are
defined:

• “Utilities” is composed of sectors D and E.
• “Transport, storage and communication” is composed of sectors H and J.
• “Real estate, business and administrative activities” is composed of sectors L, M and N.
• “Other services” is composed of sectors R, S, T and U.
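The correspondence between ISIC Rev.4 sections and the estimated sectors can be written as a lookup table. The sketch below assumes employment counts keyed by ISIC Rev.4 section letter and sums them into the sectors listed above. It does not attempt the Rev.3 to Rev.4 harmonization that the ILO classification also provides. The names are illustrative.

```python
# Mapping from ISIC Rev.4 sections to the estimated sectors (illustrative names).
ILO_SECTOR = {
    **{s: s for s in "ABCFGIKOPQ"},  # sections kept as they are
    "D": "Utilities", "E": "Utilities",
    "H": "Transport, storage and communication",
    "J": "Transport, storage and communication",
    "L": "Real estate, business and administrative activities",
    "M": "Real estate, business and administrative activities",
    "N": "Real estate, business and administrative activities",
    "R": "Other services", "S": "Other services",
    "T": "Other services", "U": "Other services",
}


def aggregate_sectors(employment_by_section: dict[str, float]) -> dict[str, float]:
    """Sum employment by ISIC Rev.4 section into the estimated sectors."""
    out: dict[str, float] = {}
    for section, value in employment_by_section.items():
        sector = ILO_SECTOR[section]  # a KeyError flags an unknown section code
        out[sector] = out.get(sector, 0.0) + value
    return out


# Hypothetical employment counts (thousands)
sectors = aggregate_sectors({"A": 120.0, "D": 4.0, "E": 6.0, "H": 30.0, "J": 10.0})
print(sectors)
```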

The estimated occupations correspond in principle to the major categories of the 1988 and 2008
iterations of the ILO International Standard Classification of Occupations (ISCO-88 and ISCO-08).
However, subsistence farming occupations are classified inconsistently across countries, and
sometimes even within one country across years. According to ISCO-08, subsistence farmers
should be classified in ISCO category 6, namely as skilled agricultural workers. However, a
number of countries with a high incidence of subsistence farming reported a low share of workers
in category 6, but a high share in category 9 (elementary occupations). This means that the shares
of occupational categories 6 and 9 can differ widely between countries that have a very similar
economic structure. It is not feasible to determine the extent of misclassification between
categories 6 and 9. Consequently, to obtain a consistent and internationally comparable
classification, categories 6 and 9 are merged and estimated jointly.
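Merging the two categories means that only their sum enters the estimation, whatever split a country reports between groups 6 and 9. A minimal illustration follows, using hypothetical shares for two countries that code subsistence farmers differently. The function name is also hypothetical.

```python
def merge_isco_6_9(shares: dict[int, float]) -> dict[str, float]:
    """Combine ISCO major groups 6 and 9 into a single category.

    Keys are ISCO major group codes (0-9); values are employment shares or levels.
    """
    merged = {str(code): v for code, v in shares.items() if code not in (6, 9)}
    merged["6+9"] = shares.get(6, 0.0) + shares.get(9, 0.0)
    return merged


# Similar economic structure, different coding of subsistence farmers (hypothetical)
country_a = {5: 50.0, 6: 40.0, 9: 10.0}
country_b = {5: 50.0, 6: 8.0, 9: 42.0}
print(merge_isco_6_9(country_a), merge_isco_6_9(country_b))
```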

Estimates of employment by economic class


The estimates of employment by economic class are produced for a subset of countries. The
model uses the data derived from the unemployment, status, and economic activity models as
inputs in addition to other demographic, social and economic variables.

The methodology involves two steps. In the first step, the various economic classes of workers
are estimated using the economic classes of the working plus non-working population (among
other explanatory variables). This procedure is based on the fact that the distribution of economic
class in the overall population and the distribution in the working population are closely related.
The economic class of the overall population is derived from the World Bank’s PovcalNet
database.8 In general, economic class is defined in terms of consumption, but in particular cases
for which no other data exist, income data are used instead.

Once the estimates from this first step have been obtained, a second step estimates data for
those observations for which neither data on the economic class of the working population nor
estimates from step 1 are available. This second step relies on cross-validation and subsequent
selection of the best-performing model to ensure a satisfactory performance.
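The sketch below gives a generic illustration of selecting a model by pseudo-out-of-sample performance. It compares candidate regressors using k-fold cross-validated root mean squared error and keeps the best one. The ILO models use richer specifications and uncertainty estimation. The one-regressor OLS candidates, the fold assignment and the data here are assumptions made purely for illustration.

```python
import math


def fit_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Intercept and slope of a one-regressor OLS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return my - slope * mx, slope


def cv_rmse(x: list[float], y: list[float], k: int = 5) -> float:
    """k-fold cross-validated RMSE: each fold is predicted by a model fitted without it."""
    n = len(y)
    sq_err = 0.0
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
        sq_err += sum((y[i] - (a + b * x[i])) ** 2 for i in test)
    return math.sqrt(sq_err / n)


def select_model(candidates: dict[str, list[float]], y: list[float], k: int = 5) -> str:
    """Name of the candidate regressor with the lowest pseudo-out-of-sample RMSE."""
    scores = {name: cv_rmse(x, y, k) for name, x in candidates.items()}
    return min(scores, key=scores.get)


# Hypothetical data: the target depends on "income" but not on "noise"
income = [float(i) for i in range(20)]
noise = [float((7 * i) % 5) for i in range(20)]
target = [2.0 + 0.5 * v for v in income]
print(select_model({"income": income, "noise": noise}, target))
```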

In the present edition of the model, employment is subdivided into four different economic
classes: workers living on less than US$1.90 per day, on more than US$1.90 and less than US$3.20
per day, on more than US$3.20 and less than US$5.50 per day and above US$5.50 per day, in PPP
terms.
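The class boundaries can be applied to per capita daily consumption, or to income where consumption data do not exist, expressed in PPP US dollars. In this sketch, a value exactly at a threshold is assigned to the higher class. The text above leaves boundary values unspecified, so that choice is an assumption, as are the function names and the example data.

```python
# Upper bounds of the daily thresholds in US$ (PPP), as listed above.
CLASS_BOUNDS = [
    (1.90, "less than US$1.90"),
    (3.20, "US$1.90 to US$3.20"),
    (5.50, "US$3.20 to US$5.50"),
]


def economic_class(daily_value_ppp: float) -> str:
    """Assign a worker to an economic class from per capita daily consumption (or income)."""
    for upper, label in CLASS_BOUNDS:
        if daily_value_ppp < upper:
            return label
    return "above US$5.50"


def class_shares(values: list[float], weights: list[float]) -> dict[str, float]:
    """Weighted share of workers in each economic class."""
    total = sum(weights)
    shares: dict[str, float] = {}
    for v, w in zip(values, weights):
        label = economic_class(v)
        shares[label] = shares.get(label, 0.0) + w / total
    return shares


# Hypothetical workers, one in each class, with equal survey weights
shares = class_shares([1.0, 2.5, 4.0, 12.0], [1.0, 1.0, 1.0, 1.0])
print(shares)
```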

Estimates of the labour income share and the labour income distribution
The model estimates a complete panel dataset of the labour income share and the labour income
distribution. To this end, national accounts data from the United Nations Statistics Division and
labour income data from the ILO Harmonized Microdata collection are combined. When national
accounts data or microdata are not available, the estimates rely on a regression analysis to
impute the necessary data. The imputation is based on countries that are similar in terms of key
economic and labour market variables.

The methodology involves two steps. The first step is to compute the labour income share,
adjusted for the labour income of the self-employed. Taking into account the labour income of
the self-employed has been recognized in the economic literature as a crucial element for
international comparability. In order to achieve this, detailed data on status in employment are
used (from the model of employment by status outlined above), which subdivides self-employment into
three different groups: own-account workers, contributing family workers and employers.
Furthermore, the labour income of each group of the self-employed relative to the income of
employees is estimated on the basis of a regression analysis of the microdata. The resulting
estimate corresponds to the share of total income that accrues to labour:

$$\text{Labour income share} = \frac{\text{Labour income}}{\text{Gross domestic product}}$$

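A minimal sketch of the self-employment adjustment follows. It assumes that each self-employed worker in a group is imputed a labour income equal to a fixed multiple of average employee compensation, that multiple being the relative income estimated from microdata. The function name, the input layout and all numbers are hypothetical.

```python
def adjusted_labour_income_share(
    employee_compensation: float,
    gdp: float,
    n_employees: float,
    self_employed: dict[str, tuple[float, float]],
) -> float:
    """Labour income share adjusted for the labour income of the self-employed.

    self_employed maps each group (own-account workers, contributing family
    workers, employers) to (number of workers, income relative to the average
    employee). Assumption: each self-employed worker earns that multiple of
    average employee compensation.
    """
    avg_comp = employee_compensation / n_employees
    self_income = sum(n * rel * avg_comp for n, rel in self_employed.values())
    return (employee_compensation + self_income) / gdp


# Hypothetical national accounts and microdata-based relative incomes
share = adjusted_labour_income_share(
    employee_compensation=400.0,
    gdp=1000.0,
    n_employees=80.0,
    self_employed={
        "own-account workers": (15.0, 0.6),
        "contributing family workers": (3.0, 0.4),
        "employers": (2.0, 1.5),
    },
)
print(round(share, 3))
```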
The second step, drawing on the level of labour income estimated in the first step and on the
microdata, produces a detailed distribution, at the percentile level, of the labour income for each
country and year. It is thus possible to determine the percentage of aggregate labour income
that accrues to the bottom (first) percentile, to the second percentile, and so on. Importantly,
given that the definition of employment follows the ICLS recommendations, labour income is
estimated on a per-worker basis, not on a full-time equivalent basis. Additionally, the distribution
of labour income at the global and regional levels is computed at the decile level. Because of
cross-country differences in prices, the global and regional labour income deciles are computed
in purchasing power parity terms.

8 The 2020-2021 poverty data are from the World Bank, Macro and Poverty Outlook: Country-by-country
analysis and projections for the developing world, World Bank, Washington, DC, 2021, combined with World
Bank estimates (June 2021 edition) of the impact of COVID-19 on poverty. For a discussion of the methodology
to estimate the impact, see Gerszon Mahler, Daniel, et al., ‘Updated Estimates of the Impact of COVID-19 on
Global Poverty: Turning the corner on the pandemic in 2021?’, World Bank Blogs, 24 June 2021.

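The percentile (or decile) shares can be computed from worker-level microdata by ranking workers by labour income and summing weighted income within quantile groups. The sketch below assigns each observation to the group containing the midpoint of its cumulative survey weight. This assignment rule and the function name are assumptions. For global and regional deciles, incomes would first be converted to PPP terms, which is not shown.

```python
def income_shares_by_quantile(
    incomes: list[float], weights: list[float], n_groups: int = 100
) -> list[float]:
    """Share of aggregate labour income accruing to each quantile group of workers.

    Workers are ranked by labour income; each observation goes to the group
    containing the midpoint of its cumulative survey weight.
    n_groups=100 gives percentiles, n_groups=10 gives deciles.
    """
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    total_w = sum(weights)
    total_inc = sum(y * w for y, w in zip(incomes, weights))
    shares = [0.0] * n_groups
    cum_w = 0.0
    for i in order:
        midpoint = (cum_w + weights[i] / 2) / total_w
        group = min(int(midpoint * n_groups), n_groups - 1)
        shares[group] += incomes[i] * weights[i] / total_inc
        cum_w += weights[i]
    return shares


# Ten hypothetical workers with incomes 1..10 and equal weights, grouped into deciles
deciles = income_shares_by_quantile([float(v) for v in range(1, 11)], [1.0] * 10, n_groups=10)
print([round(s, 4) for s in deciles])
```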
Estimates of key indicators by geographical area: Urban and rural labour market indicators
Separate estimates for urban and rural areas are produced for the following indicators: labour
force, unemployment, LU2, LU3, LU4, youth NEET share and the employment distribution by
status, economic activity, and occupation.

To produce the estimates, the models decompose the variable of interest into two components.
The procedure described here is for the labour force model; an analogous procedure is used for
the other models. The labour force participation rate (LFPR) by geographical area that the model
estimates can be expressed as:

$$\text{Labour force participation rate}_{ij} = \frac{\text{Labour force}_{ij}}{\text{Population}_{ij}}, \qquad i \in \{\text{urban}, \text{rural}\}, \quad j \in \{\text{gender} \times \text{age}\}$$

One relationship of particular importance links the urban and rural rates to the national rate:
because the national rate is the population-weighted average of the urban and rural rates, the
distance of each area rate from the national rate is tied to the respective shares of the urban and
rural population (the denominator of the LFPR expression). The modelling strategy therefore
targets two variables that jointly determine the rural and urban LFPRs. The main variable is the
spread between the urban and rural LFPRs:

$$\text{Spread}_{\text{urban}} = \frac{\text{Urban LFPR}}{\text{Rural LFPR}} = \frac{1}{\text{Spread}_{\text{rural}}}$$

This variable alone does not pin down both the urban and rural LFPRs. Another variable is
necessary to complete the system of equations that can be used to produce the two rates. The
other variable is the share of the denominator of the LFPR expression by type of area, which is
simply the population:

$$\text{Share}_{\text{urban}} = \frac{\text{Urban labour force}/\text{Urban LFPR}}{\text{Rural labour force}/\text{Rural LFPR} + \text{Urban labour force}/\text{Urban LFPR}} = 1 - \text{Share}_{\text{rural}}$$

Decomposing the two rates into the spread and share variables has two main advantages. First,
it makes it possible to model explicitly the dependence between the distances of the two rates to
the total rate and the share of the population in urban and rural areas. Second, the framework
extends easily to the other variables of interest. Once these two auxiliary variables have been
estimated using regression methods, the results can easily be used to compute the urban and
rural rates of interest:

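$$\text{Urban LFPR} = \frac{\text{LFPR}}{\text{Share}_{\text{urban}} + \dfrac{\text{Share}_{\text{rural}}}{\text{Spread}_{\text{urban}}}}$$

$$\text{Rural LFPR} = \frac{\text{LFPR} - \text{Share}_{\text{urban}} \cdot \text{Urban LFPR}}{\text{Share}_{\text{rural}}}$$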

As mentioned above, the unemployment, labour underutilization, NEET and employment
distribution models follow the same procedure.
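The spread and share decomposition can be checked with a round trip. First compute the two auxiliary variables from urban and rural data, then recover the urban and rural LFPRs from the national LFPR using the two formulas above. In the actual models, the spread and share are predicted by regression rather than computed from data. The function names and values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AreaRates:
    urban: float
    rural: float


def spread_and_share(urban_lfpr: float, rural_lfpr: float,
                     urban_lf: float, rural_lf: float) -> tuple[float, float]:
    """The two auxiliary targets: Spread_urban and Share_urban (population share)."""
    urban_pop = urban_lf / urban_lfpr  # labour force / LFPR = population
    rural_pop = rural_lf / rural_lfpr
    return urban_lfpr / rural_lfpr, urban_pop / (urban_pop + rural_pop)


def area_rates(national_lfpr: float, spread_urban: float, share_urban: float) -> AreaRates:
    """Recover the urban and rural LFPRs from the national rate, spread and share."""
    share_rural = 1.0 - share_urban
    urban = national_lfpr / (share_urban + share_rural / spread_urban)
    rural = (national_lfpr - share_urban * urban) / share_rural
    return AreaRates(urban=urban, rural=rural)


# Hypothetical area: 60 urban and 40 rural inhabitants, LFPRs of 0.65 and 0.70
urban_lf, rural_lf = 39.0, 28.0
spread, share = spread_and_share(0.65, 0.70, urban_lf, rural_lf)
national = (urban_lf + rural_lf) / 100.0
rates = area_rates(national, spread, share)
print(rates)
```

Because the national rate is a population-weighted average, the recovered rates are exact whenever the spread and share are consistent with the national benchmark.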

To estimate the spread and share for all the variables, the models of key indicators by
geographical area use the principles of cross-validation and uncertainty estimation to select the
regression models with the best pseudo-out-of-sample performance, not unlike the
unemployment rate model. However, the targets of the estimation are the spread and share
variables instead of the variable of interest directly. In the geographical models, all demographic
groups are jointly estimated, using the appropriate categorical variable as a control in the
regression, because the groups are interdependent and data availability is roughly uniform
across breakdowns. The models use various indicators to reflect economic and social factors as
explanatory variables for the imputation. In addition, the modelling procedure ensures the
consistency of interdependent variables. For this purpose, labour force estimates are used as a
basis for the models of the distribution of unemployment and labour underutilization by
geographical area. The population benchmark, derived from the labour force model, is used in
the model of the NEET distribution by geographical area. Similarly, estimates of unemployment
by rural and urban area are used as the basis for the estimates of labour underutilization by
geographical area. Finally, the employment estimates derived jointly from the models of the
distribution of the labour force and unemployment by geographical area are used as a basis for
estimating the distributions of employment with respect to status, economic activity, and
occupation by geographical area.
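The chain of dependencies described above fixes the order in which the geographical models have to be run. The sketch below uses Python's standard graphlib module with hypothetical model names. Employment is listed as a derived step because it is obtained from the labour force and unemployment estimates.

```python
from graphlib import TopologicalSorter

# Each model lists the models whose estimates it uses as inputs (names hypothetical).
DEPENDENCIES = {
    "labour_force": set(),
    "unemployment": {"labour_force"},
    "neet": {"labour_force"},  # uses the population benchmark from the labour force model
    "labour_underutilization": {"labour_force", "unemployment"},
    "employment": {"labour_force", "unemployment"},  # derived, not estimated separately
    "status": {"employment"},
    "economic_activity": {"employment"},
    "occupation": {"employment"},
}

order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(order)
```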

The resulting estimates are of the shares or rates and the corresponding levels. The following
estimates are available by rural and urban breakdown: LFPR, number of people in the labour
force, unemployment rate, unemployment level, LU2 rate, time-related underemployment, LU3
rate, potential labour force, LU4 rate, composite labour underutilization measure, and the
distribution of employment by status, economic activity, and occupation.

Models used to project labour market indicators

The ILO has developed projection models to estimate and forecast hours worked,9 employment,
unemployment and the labour force for the years 2022 to 2024. In a first step, projections are
made at quarterly frequency up to the fourth quarter of 2023 for around 50 countries where
labour market indicators are available at quarterly frequency for at least part of 2022. In a second
step, annual projections are made up to 2024 for all countries – taking as given the annual
averages of the projections from the first step for those countries where these are available.
Projections based on the first step have the advantage of taking into account the latest labour
market information and latest high-frequency data, which greatly enhances the accuracy of
estimates of labour market indicators for the year 2022 and also improves the short-term
forecasting performance.
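The simplified sketch below shows how the two steps fit together for one country and indicator. Where four quarterly projections exist for a year, their average replaces the annual-model value. This only substitutes values. In the actual procedure, the annual model takes those averages as given when projecting later years, which is not reproduced here. Names and figures are hypothetical.

```python
from typing import Optional


def combine_annual(
    annual_model: dict[int, float],
    quarterly: Optional[dict[int, list[float]]] = None,
) -> dict[int, float]:
    """Annual series for one country and indicator.

    Where a year has four quarterly projections (step 1), their average is
    taken as given; otherwise the annual-model projection (step 2) is kept.
    """
    result = dict(annual_model)
    for year, quarters in (quarterly or {}).items():
        if len(quarters) == 4:
            result[year] = sum(quarters) / 4
    return result


# Hypothetical unemployment rates (per cent)
annual = {2022: 6.0, 2023: 5.9, 2024: 5.7}
quarterly = {2022: [6.4, 6.2, 6.0, 5.8], 2023: [5.7, 5.6, 5.6, 5.5]}
print(combine_annual(annual, quarterly))
print(combine_annual(annual))  # country without quarterly data
```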

Step 1. Projections at quarterly frequency


The quarterly projections for the unemployment rate, the employment-to-population ratio, the
labour force participation rate, and the ratio of hours worked to population aged 15–64 use high-
frequency data such as confidence indices in addition to economic growth forecasts to test a
series of models. The approach is in line with the direct nowcasting approach used to estimate
hours worked (Gomis et al. 2022). These models are evaluated using the model search routines
described above, including splitting the data into training and evaluation samples. Models are
combined using a “jackknife model-averaging” technique described by Hansen and Racine (2012),
which essentially finds the linear combination of models that minimizes the variance of the
prediction error. The hours worked per person aged 15–64 are only projected for the fourth
quarter of 2022 (nowcasts exist until the third quarter), and all other indicators are projected up
to the fourth quarter of 2023 – including the breakdowns by sex and age.
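Hansen and Racine (2012) choose model weights on the unit simplex (non-negative and summing to one) that minimize the leave-one-out, or jackknife, estimate of the squared prediction error. The sketch below takes a matrix of leave-one-out residuals as given and solves that constrained quadratic problem with Frank–Wolfe iterations. The solver is a swapped-in choice for illustration; any quadratic programming routine would do. The function name and the residuals are hypothetical.

```python
def jma_weights(loo_residuals: list[list[float]], iterations: int = 200) -> list[float]:
    """Jackknife model-averaging weights.

    loo_residuals[i][m] is the leave-one-out prediction error of model m for
    observation i. The weights minimize sum_i (sum_m w_m * e_im)^2 over the
    unit simplex, solved here with Frank-Wolfe iterations and exact line search.
    """
    n, m = len(loo_residuals), len(loo_residuals[0])
    cols = [[row[j] for row in loo_residuals] for j in range(m)]  # residual vector of each model
    w = [1.0 / m] * m
    for _ in range(iterations):
        combined = [sum(w[j] * cols[j][i] for j in range(m)) for i in range(n)]
        # Gradient of the criterion with respect to each weight (up to a factor of 2)
        grad = [sum(c * e for c, e in zip(combined, cols[j])) for j in range(m)]
        best = min(range(m), key=lambda j: grad[j])
        # Move towards the simplex vertex with the smallest gradient
        direction = [(1.0 if j == best else 0.0) - w[j] for j in range(m)]
        d_comb = [sum(direction[j] * cols[j][i] for j in range(m)) for i in range(n)]
        denom = sum(d * d for d in d_comb)
        if denom == 0.0:
            break  # criterion is flat along the direction: current weights are optimal
        step = -sum(c * d for c, d in zip(combined, d_comb)) / denom
        step = min(max(step, 0.0), 1.0)
        if step == 0.0:
            break  # no descent possible: optimum reached
        w = [wj + step * dj for wj, dj in zip(w, direction)]
    return w


# Hypothetical leave-one-out errors of two models (rows = observations)
weights = jma_weights([[2.0, 0.0], [0.0, 1.0]])
print([round(v, 3) for v in weights])
```

The averaged prediction is then the weighted sum of the candidate models' predictions.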

The ratios of employment and labour force to the population have been strongly affected by the
COVID-19 crisis. The projection model is based on the assumption that these ratios will have a
tendency to return to their long-term trend. Intuitively, people come back into the labour market
and try to find employment. In technical terms, the projection is based on an error correction
model, the correction parameter being estimated using an econometric specification that
includes the gap between the actual historical series and the long-term trend.10
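The stylized sketch below illustrates the error correction idea. It estimates the trend with a Hodrick–Prescott filter (smoothing parameter 3,200, as stated in footnote 10), estimates the speed at which the gap to trend closes, and lets the projected gap shrink at that speed. The estimating equation (a no-intercept regression of the change in the gap on the lagged gap), the linear extension of the trend and the dense linear solver are assumptions made for illustration, not the ILO specification. The data are hypothetical.

```python
def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting (adequate for short series)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hp_trend(y: list[float], lam: float = 3200.0) -> list[float]:
    """Hodrick-Prescott trend: solve (I + lam * D'D) tau = y, D = second differences."""
    n = len(y)
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        stencil = ((k, 1.0), (k + 1, -2.0), (k + 2, 1.0))  # row k of D
        for i, ci in stencil:
            for j, cj in stencil:
                a[i][j] += lam * ci * cj
    return solve_linear(a, list(y))


def correction_speed(x: list[float], trend: list[float]) -> float:
    """No-intercept OLS of the change in the gap on the lagged gap: dgap_t = -alpha * gap_{t-1}."""
    gap = [xi - ti for xi, ti in zip(x, trend)]
    num = sum((gap[t] - gap[t - 1]) * gap[t - 1] for t in range(1, len(gap)))
    den = sum(gap[t - 1] ** 2 for t in range(1, len(gap)))
    return -num / den if den else 0.0


def project_ratio(x: list[float], trend: list[float], alpha: float, horizon: int) -> list[float]:
    """Close the gap to trend at rate alpha per quarter; the trend is extended linearly."""
    slope = trend[-1] - trend[-2]  # assumption: last trend slope carried forward
    gap = x[-1] - trend[-1]
    path = []
    for h in range(1, horizon + 1):
        gap *= 1.0 - alpha
        path.append(trend[-1] + slope * h + gap)
    return path


# Hypothetical quarterly employment-to-population ratio (per cent) with a crisis dip
ratio = [58.0, 58.2, 58.3, 58.1, 53.9, 55.6, 56.4, 57.0, 57.3, 57.6, 57.7, 57.9]
trend = hp_trend(ratio)
alpha = correction_speed(ratio, trend)
print(round(alpha, 3), [round(v, 2) for v in project_ratio(ratio, trend, alpha, 4)])
```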

Step 2. Projections at the annual frequency

The annual projection pools countries and uses vector error correction models. Five different
indicators are projected: the employment-to-population ratio, the labour force participation rate,
the unemployment rate, the ratio of weekly hours worked to population aged 15–64, and the
weekly hours worked per person employed. This estimation strategy over-identifies the target
variables: hours worked are projected twice, and the labour force can also be computed as the
sum of unemployment and employment. The redundant estimates are averaged, which reduces the
reliance on a single specification.

9 The projection in the case of hours worked starts at the fourth quarter of 2022, as the first three quarters of 2022
are produced by the nowcasting model.
10 The long-term trend is estimated using a Hodrick–Prescott filter with a smoothing parameter of 3,200, which
is larger than the parameter of 1,600 usually used in filtering time series at quarterly frequency and hence results
in less variability in the trend.

Three different approaches are used to derive projections, which are then combined into a
weighted average. In all three approaches the forecast variable of interest is the annual change
in the above-mentioned indicators. The first approach contains elements of error correction,
while the second and third approaches do not. The first and second approaches pool countries
globally, while the third approach pools countries according to similarity.
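The averaging of redundant projections and the combination of approaches can be sketched as follows. The labour force is obtained directly from the participation rate and also implicitly from employment and the unemployment rate (LF = E/(1 − UR), since LF = E + U and U = UR × LF). Hours worked are obtained from the hours-to-population ratio and also from hours per employed person. The source does not state the weights, so equal weights for the redundant estimates and the weights across the three approaches are hypothetical, as are the function names.

```python
def reconcile(epr: float, lfpr: float, ur: float,
              hours_per_pop1564: float, hours_per_employed: float,
              pop15plus: float, pop1564: float) -> dict[str, float]:
    """Average the redundant (over-identified) projections of labour force and hours.

    Rates are fractions; hours are weekly. Equal weights are an assumption.
    """
    employment = epr * pop15plus
    lf_direct = lfpr * pop15plus
    lf_implied = employment / (1.0 - ur)  # LF = E + U with U = ur * LF
    hours_direct = hours_per_pop1564 * pop1564
    hours_implied = hours_per_employed * employment
    return {
        "employment": employment,
        "labour_force": (lf_direct + lf_implied) / 2,
        "weekly_hours": (hours_direct + hours_implied) / 2,
    }


def combine_approaches(changes: list[float], weights: list[float]) -> float:
    """Weighted average of the annual changes projected by the different approaches."""
    return sum(c * w for c, w in zip(changes, weights)) / sum(weights)


# Hypothetical projected levels for one country and year
projection = reconcile(
    epr=0.57, lfpr=0.60, ur=0.05,
    hours_per_pop1564=20.0, hours_per_employed=28.0,
    pop15plus=1000.0, pop1564=800.0,
)
print({k: round(v, 1) for k, v in projection.items()})

# Annual change in the EPR (percentage points) from three approaches, hypothetical weights
print(combine_approaches([0.2, 0.4, 0.3], weights=[0.5, 0.25, 0.25]))
```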

References
Gomis, Roger, Paloma Carrillo, Steven Kapsos, Stefan Kühn, and Avichal Mahajan. 2022. “The ILO
Nowcasting Model: Using High-Frequency Data to Track the Impact of the COVID-19 Pandemic
on Labour Markets”. Statistical Journal of the IAOS 38 (3): 815–830.

Hansen, Bruce, and Jeffrey Racine. 2012. “Jackknife Model Averaging”. Journal of Econometrics 167
(1): 38–46.

Mahler, Daniel Gerszon, Nishant Yonzan, Ruth Hill, Christoph Lakner, Haoyu Wu, and Nobuo
Yoshida. 2022. “Pandemic, Prices, and Poverty”. World Bank Blogs, 13 April 2022.
