How statistical analyses help determine where and when students are most vulnerable to education disruption

Expert Contributions 2 Jun 2021 - 17:32 SAST

By the Malala Fund and SAS Data for Good

To identify the countries where girls are most at risk of educational interruptions, and to predict the decline in girls’ primary and secondary completion rates due to climate change, the Malala Fund has partnered with SAS to build the Girls’ Education and Climate Challenges Index.

This is the first time the annual Girls’ Education Challenges Index (GECI), which identifies the countries where it is most challenging for girls to access education, has been combined with the ND-GAIN Index, which summarises a country’s vulnerability and resilience to climate change and other global challenges. The resulting composite index indicates where girls face the greatest threats to their education and are most vulnerable to climate change. It also shows how, in each country, climate change compounds existing education challenges for girls.

The effect of climate change on access to education can be less evident than its more visible impacts. Yet when natural disasters occur, young female students are often more at risk of educational disruption than their male counterparts. When water is scarce, girls are most often the ones who travel long distances to collect it, which keeps them out of the classroom. When temperatures rise and income-producing agriculture is lost, girls most often leave their schooling behind because families can no longer afford to pay educational fees.

Our report confirms that girls’ education is one of the most powerful strategies to mitigate the impact of climate change. But as this data project with SAS shows, climate-related events are keeping millions of girls from learning.

Naomi Nyamweya, Malala Fund researcher

Data and methodology

The composite index uses public data on access, completion, learning outcomes and gender disparities in education, in addition to wider contextual risks, to analyse the relative challenges that girls in low- and lower-middle-income countries face in accessing 12 years of quality education. It draws on research backed by UNESCO’s UIS database, the INFORM Risk Index, and Notre Dame.

Through the creation of deciles, the Girls’ Education Challenges Index (GECI) assigns countries a score from 1 to 10 in each of these areas (with 1 being the least challenged), after which the scores are aggregated and ranked.
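The decile scoring described above can be sketched as follows. This is a minimal illustration, not the GECI implementation: the function names, the tie handling and the choice to aggregate by summing the area scores are all assumptions, since the source does not say how scores are aggregated.

```python
def decile_scores(values, higher_is_worse=True):
    """Map each value to a decile score from 1 (least challenged) to 10.

    Assumption: ties are broken by input order, so tied values can land
    in neighbouring deciles.
    """
    n = len(values)
    # Order country positions so the least challenged value comes first.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_worse)
    scores = [0] * n
    for rank, i in enumerate(order):
        # rank runs 0..n-1; integer division maps it onto deciles 1..10.
        scores[i] = rank * 10 // n + 1
    return scores


def rank_countries(countries, area_values):
    """Sum the decile scores across areas (an assumed aggregation rule)
    and return the countries ordered from most to least challenged."""
    totals = {c: 0 for c in countries}
    for values in area_values.values():
        for country, score in zip(countries, decile_scores(values)):
            totals[country] += score
    ranking = sorted(countries, key=lambda c: totals[c], reverse=True)
    return ranking, totals


if __name__ == "__main__":
    # Hypothetical out-of-school rates (%) for ten placeholder countries.
    rates = [5, 12, 30, 8, 45, 22, 60, 15, 38, 51]
    print(decile_scores(rates))
```

For an indicator where a higher value is better, such as a completion rate, passing higher_is_worse=False reverses the ordering so that the highest value receives a score of 1.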

This data is then combined with the ND-GAIN Country Index, which summarises a country’s vulnerability to climate change and other global challenges in combination with its readiness to leverage private and public sector investment for adaptive actions. It uses two decades of data across 45 indicators, and assesses vulnerability by considering:

  • The impact of the future changing climate conditions (exposure) on a country’s society and its supporting sectors;
  • How climate-related disturbances (sensitivity) impact people and the sectors they depend on; and
  • The ability of society and supporting sectors to adjust to and reduce potential damage and respond to the negative consequences of climate events (adaptive capacity).

The ND-GAIN Country Index considers three components to assess overall readiness: economic readiness, governance readiness and social readiness. Economic readiness refers to the investment climate that facilitates mobilising capital from the private sector. Governance readiness pertains to the stability of the society and institutional arrangements that contribute to investment risks. Social readiness refers to the social conditions that help society make efficient and equitable use of investment and yield more benefit from the investment.

The climate index variable uses gradient boosting to predict the impact of climate-associated events on designated GECI indicators. Pre-processing of the climate data included segmentation and sorting of climate and natural factors using k-means clustering. The resulting segments provided further insight into the relationships between scores and metrics that increased the accuracy of the overall climate model.
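To illustrate the segmentation step only, here is a small k-means sketch (Lloyd's algorithm) in plain Python. The two risk features, their values and the fixed starting centroids are made up for the example; the source does not describe the features, the number of clusters or the software used, and the gradient boosting model itself is not shown.

```python
import math


def kmeans(points, initial_centroids, max_iterations=20):
    """Cluster points with Lloyd's algorithm.

    Returns (labels, centroids). Starting centroids are passed in
    explicitly so the result is deterministic.
    """
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical (flood_risk, drought_risk) pairs for six countries.
    risks = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
             (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
    labels, centroids = kmeans(risks, [risks[0], risks[3]])
    print(labels)
```

In a pipeline like the one described, the resulting segment label could be added as an extra input feature to the predictive model, which is one plausible way the segments might improve accuracy.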

Once each country had a predicted climate impact score, the countries were ranked according to socioeconomic status, and that ranking was used as an input for the final rankings. The climate score and GECI score were weighted equally because the impact of climate events or any other disturbance on education is highly dependent on the resilience, strength of infrastructure and stability of the country.

The final Girls’ Education and Climate Challenges Index score was created by taking the square root of the product of the two scores (their geometric mean), so that the climate and girls’ education factors found to be significant in segmentation and clustering contribute equally.
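As a worked illustration of that formula, the sketch below computes the square root of the product of two scores. The function name and the example values are hypothetical; only the formula comes from the text.

```python
import math


def combined_index(geci_score, climate_score):
    """Square root of the product of the two scores (geometric mean)."""
    if geci_score < 0 or climate_score < 0:
        raise ValueError("scores must be non-negative")
    return math.sqrt(geci_score * climate_score)


if __name__ == "__main__":
    print(combined_index(4, 9))  # 6.0
    print(combined_index(2, 8))  # 4.0
```

Unlike a simple average, this form rewards balance: scores of 2 and 8 combine to 4.0 rather than 5.0, so a country needs to score high on both dimensions to reach a high combined score.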

Estimating the completion rate 

K-means clustering was used to determine organic trends within the historical data before imputing missing completion rates with logistic regression. Once completion rates were established, the data was run through four separate forecasting methodologies: 

  • Hierarchical
  • AutoARIMA
  • Seasonal
  • Regression for time series

Running the model

Initially, the team ran the data through the model without any climate variables included and assumed consistent impacts of climate change in historical data. The resulting forecast became the baseline, or null hypothesis, against which later assumptions about education rates were tested.

Next, they incorporated the climate factors into the algorithm with a weighting scale to account for changing and more intense climate events. The difference between the two forecasts (the reduction in completion rate) was used as the assumed impact of climate change on girls’ completion rates at a global scale, with a 95% confidence interval.

Using these rates, the researchers estimated the number of girls affected by applying the derived difference to official projections of the total population of girls in that specific cohort (that is, those in the age range three to five years above the last intended age for that level of education) within each country. 
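The arithmetic in the two steps above can be sketched as follows. The rates and the cohort size are invented for illustration, rates are expressed as fractions between 0 and 1, and the function names are hypothetical.

```python
def completion_rate_reduction(baseline_rate, climate_rate):
    """Difference between the baseline forecast and the climate-adjusted one."""
    return baseline_rate - climate_rate


def girls_affected(baseline_rate, climate_rate, cohort_population):
    """Apply the reduction in completion rate to the projected cohort size."""
    reduction = completion_rate_reduction(baseline_rate, climate_rate)
    return round(reduction * cohort_population)


if __name__ == "__main__":
    # Hypothetical country: 62% baseline completion, 60% once climate
    # factors are included, and a projected cohort of 1,000,000 girls.
    print(girls_affected(0.62, 0.60, 1_000_000))  # 20000
```

Summing this figure over countries and education levels would give an aggregate estimate of the kind reported in the findings.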

The differences derived from comparing the two models were also used to sense-check estimates, to verify consistency of magnitude and direction.

The algorithms operate on a forecast horizon of 11 years. The hierarchical levels are the economic level and the country level, and the best algorithm was selected using the Weighted Mean Absolute Percentage Error (WMAPE).
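Model selection by WMAPE can be sketched as below, assuming the common definition of WMAPE as the sum of absolute errors divided by the sum of absolute actual values. The held-out values and the candidate forecasts are made up, and the model names are labels only; the source does not say which variant of WMAPE was used.

```python
def wmape(actual, forecast):
    """Sum of absolute errors divided by the sum of absolute actuals."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WMAPE is undefined when all actual values are zero")
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / total_actual


def select_best(actual, candidates):
    """Return the name of the lowest-WMAPE forecast and all the scores."""
    scores = {name: wmape(actual, fc) for name, fc in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical held-out completion rates (%) and three candidate forecasts.
    actual = [50, 52, 54, 56]
    candidates = {
        "auto_arima": [51, 52, 53, 57],
        "seasonal": [48, 50, 57, 60],
        "regression": [49, 54, 55, 58],
    }
    best, scores = select_best(actual, candidates)
    print(best)  # auto_arima
```

Because the errors are weighted by the size of the actual values, series with larger values count for more in the score than they would under a plain mean of percentage errors.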

The analysis includes breakdowns by education level – primary, lower secondary and upper secondary – as well as by country, focusing on low- and lower-middle-income countries.

Findings and recommendations

The analyses predict, by year, which countries are most at risk of disruption to girls’ education. They consider educational information such as grade-level completion rates, and environmental factors including the likelihood of flooding, tsunamis and earthquakes in each country.

Based on the combined indices, the region most affected is sub-Saharan Africa, though this region contributes the least to climate change. Countries in other regions are also affected, including the Philippines, Mongolia and Kiribati.

This information should help the development sector target technical and financial support both for climate adaptation and better girls’ education outcomes. Unless progress is made, Malala Fund’s report estimates that in 2021, climate-related events will prevent at least four million girls in low- and lower-middle-income countries from completing their education. If current trends continue, by 2025 climate change will play a part in preventing at least 12.5 million girls from completing their education each year.

Drawing on these findings, the report presents the connections between girls’ education, gender equality and climate change, and recommends how governments can improve girls’ access to education, including:

  • Declaring 2022 a year of action on climate education and making climate change a core curriculum subject.
  • Updating Nationally Determined Contributions (NDCs) to recognise the part that girls’ education plays in climate adaptation and mitigation strategies, and planning for emergency education provisions in ways that do not disadvantage girls.
  • Having all governments submit national climate change learning strategies at COP28 in 2023.
  • Prioritising public investment in the green economy.

Learn more

The data analytics team included Naomi Nyamweya of the Malala Fund, and SAS Data for Good volunteers Ayana Littlejohn, Jenny Clay, Miriam Ramírez, Rochelle Fisher, Sarah Hiser, Selena Mau and Tammy Baird-Andrews.
The Malala Fund advocates for resources and policy changes needed to give all girls a secondary education, invests in local education leaders and amplifies the voices of girls fighting for change.