January 10, 2026 4 min read

Open-Sourcing the Causal Climate Dataset Scripts for Kenya’s ASALs

Leah Njuguna (MSc.)

PhD Researcher

A GreenScope Analytics Platform Test

Why We Open-Sourced This



At GreenScope Analytics, we are building a platform for causal climate intelligence, not just climate prediction.

As part of stress-testing the platform, we made a deliberate decision to open-source the code used to construct the core causal dataset for Kenya’s Arid and Semi-Arid Lands (ASALs).

This is not the entire platform.
It is the foundation layer — what we internally refer to as Module 0.

Why open-source this layer?

Because if the data itself is not:
  • transparent,

  • reproducible,

  • and causally coherent,


then no amount of modeling on top of it can be trusted.

    ---

    What Exactly Is Open-Sourced?



    The open-sourced repository contains the code used to build a causal-ready climate dataset, starting from raw global and local observations and ending with a unified ASAL data cube.

    Specifically, the repository includes:

  • Data ingestion and validation scripts

  • Spatial and temporal harmonization logic

  • Memory-efficient handling of large climate files

  • Explicit causal role assignment across variables


    This code produces a dataset designed for causal analysis, not just correlation-based modeling.
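To make the temporal harmonization step concrete, here is a minimal sketch of the general idea: daily observations are aggregated to monthly values and then joined with a monthly climate index so that every row is complete. This is not the repository's code. The function names, the sample values, and the choice of a monthly mean are illustrative assumptions (a real pipeline might sum precipitation instead).

```python
from collections import defaultdict
from datetime import date


def to_monthly_mean(daily):
    """Aggregate {date: value} daily observations into {(year, month): mean}."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[(day.year, day.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


def align_monthly(local_monthly, index_monthly):
    """Keep only months present in both series so every row is complete."""
    shared = sorted(set(local_monthly) & set(index_monthly))
    return [(key, index_monthly[key], local_monthly[key]) for key in shared]


# Hypothetical sample values, for illustration only.
daily_rain_mm = {
    date(2020, 1, 1): 2.0,
    date(2020, 1, 2): 4.0,
    date(2020, 2, 1): 1.0,
}
nino34_index = {(2020, 1): 0.5, (2020, 2): 0.4, (2020, 3): 0.3}

# Each row is ((year, month), index value, local monthly mean).
rows = align_monthly(to_monthly_mean(daily_rain_mm), nino34_index)
```

Months that exist in only one series (March 2020 in the sample) are dropped rather than filled, which keeps gaps explicit instead of hiding them behind interpolated values.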

    👉 The repository focuses on how the dataset is constructed, not on proprietary modeling, analytics, or downstream decision layers.

    ---

    The Datasets Involved



    The open-sourced pipeline integrates:

    Global Climate Drivers (Exogenous)



  • ENSO (El Niño–Southern Oscillation) — Niño 3.4 Index (NOAA)

  • Indian Ocean Dipole (IOD) — Dipole Mode Index (NOAA)


    These are treated as external forcing mechanisms, capable of triggering large-scale climate regime shifts.

    Local Climate & Ecological Systems



  • CHIRPS – Precipitation (water input)

  • ERA5-Land – Potential Evaporation & Soil Moisture (atmospheric demand and hydrological memory)

  • MODIS NDVI – Vegetation response


    Each variable is assigned a causal role (driver, stressor, mediator, or response) at the dataset level, before any modeling begins.
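One way to picture role assignment at the dataset level is a small, explicit lookup that travels with the data and is checked before anything downstream runs. The sketch below is a hypothetical illustration, not the repository's implementation: the variable names, the helper functions, and the specific role given to each variable are assumptions made for the example.

```python
from enum import Enum


class CausalRole(Enum):
    DRIVER = "driver"
    STRESSOR = "stressor"
    MEDIATOR = "mediator"
    RESPONSE = "response"


# Assumed mapping for illustration only; neither the variable names nor
# the role chosen for each variable are taken from the repository.
CAUSAL_ROLES = {
    "nino34": CausalRole.DRIVER,                   # ENSO index (exogenous)
    "dmi": CausalRole.DRIVER,                      # IOD index (exogenous)
    "precipitation": CausalRole.DRIVER,            # assumed: local water input
    "potential_evaporation": CausalRole.STRESSOR,  # atmospheric demand
    "soil_moisture": CausalRole.MEDIATOR,          # hydrological memory
    "ndvi": CausalRole.RESPONSE,                   # vegetation response
}


def variables_with_role(role):
    """Return the sorted variable names declared with the given role."""
    return sorted(name for name, r in CAUSAL_ROLES.items() if r is role)


def validate_roles(columns):
    """Raise ValueError if any dataset column lacks a declared causal role."""
    missing = sorted(set(columns) - set(CAUSAL_ROLES))
    if missing:
        raise ValueError(f"No causal role declared for: {missing}")
```

Calling `validate_roles` on the columns of a freshly built dataset makes an undeclared variable fail loudly at construction time rather than slipping silently into a model.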

    ---

    Why This Is Not “Just Research Code”



    This work is part of a platform test, not a standalone academic exercise.

    The objective was to answer a very practical question:

    Can GreenScope’s infrastructure reliably turn heterogeneous climate data into a causally structured dataset at scale?


    By open-sourcing this step, we are:

  • Pressure-testing our data engineering assumptions

  • Inviting scrutiny of our causal design choices

  • Demonstrating how we handle real-world climate data constraints

  • Providing a reference implementation for others working on climate causality


    In short: this is production-grade thinking, even though the platform itself remains private.

    ---

    What We Did Not Open-Source (Intentionally)



    To be clear, this repository does not include:

  • Proprietary causal discovery algorithms

  • Intervention or counterfactual engines

  • Decision-support logic

  • Platform orchestration layers


    Those remain part of the GreenScope Analytics platform.

    The goal here is trust at the data foundation, not exposure of the full stack.

    ---

    Why This Matters



    Climate risk in regions like Kenya’s ASALs is driven by:

  • non-stationary relationships,

  • interacting global and local systems,

  • and delayed, cascading effects.


    If we want climate intelligence systems that are:

  • explainable,

  • trustworthy,

  • and actionable,


    then causal datasets must be treated as first-class infrastructure, not as an afterthought.

    This open-source release is our contribution in that direction.

    ---

    Where to Find the Code



    The repository is publicly available on GitHub and documents:

  • How raw climate datasets are ingested

  • How they are aligned spatially and temporally

  • How causal structure is preserved at the data level

  • How large files are handled safely and reproducibly


    GitHub repo: https://github.com/LeahN67/greenscope-asal-casuality-dataset

    📌 This repository represents the dataset construction layer used to test the GreenScope Analytics platform for ASAL climate intelligence.
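As an illustration of what handling large files "safely and reproducibly" can look like in practice, one common technique is to stream a file in fixed-size chunks and record a checksum, so memory use stays flat and a later run can confirm it is reading the same bytes. The sketch below shows that general technique with Python's standard library. It is not taken from the repository, and the function name and default chunk size are assumptions.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes,
        # so only one chunk is held in memory at a time.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the digest next to each downloaded file lets a rerun detect a changed or truncated input before it reaches the harmonization step.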

    ---

    Closing



    Open-sourcing this work is not about giving everything away.
    It is about earning trust where it matters most — at the data layer.

    We’re deep in the trenches building this platform, and this release reflects how seriously we take that responsibility.
