Do Brooks’ (2020) Identifying Assumptions Hold in Uganda?
Empirical Stress-Test at River Crossing Points — Mt. Elgon, Uganda
1 Study Area
The study covers river crossing points in the Mt. Elgon region of eastern Uganda (Mbale and Sironko districts). The Mt. Elgon massif rises to 4,321 m on the Uganda–Kenya border and is drained by a dense network of rivers that descend through agricultural land to the plains below. Communities on either side of these rivers depend on seasonal fords or informal crossings. This is one of the programme areas where Fika (formerly Bridges to Prosperity) has assessed and built pedestrian suspension bridges.
The 124 crossings were identified by intersecting the OSM road network with OSM mapped rivers within the Mt. Elgon watershed. Crossings span elevations from 889 m to 2,459 m. Of the 193 OSM river segments in the study area, 35 are classified as river (likely Strahler order ≥ 4) and 144 as stream, reflecting the range from major valleys to upland tributaries.
2 Data
2.1 River crossing points
Each crossing point is the geometric intersection of an OSM road and an OSM river. This captures every road crossing of a river that is mapped in OSM within the study area, including paved bridges, culverts, and seasonal fords. Attributes retained from OSM: road type, pedestrian-priority flag, river name, OSM IDs.
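The intersection step can be illustrated with a minimal sketch. It assumes roads and rivers are polylines given as lists of projected (x, y) vertices; the function names and the toy coordinates are hypothetical, and the actual pipeline presumably relies on a GIS library rather than hand-written geometry.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]

def segment_intersection(p1: Point, p2: Point, q1: Point, q2: Point) -> Optional[Point]:
    """Return the point where segment p1-p2 crosses segment q1-q2, or None."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    sx, sy = q2[0] - q1[0], q2[1] - q1[1]
    denom = rx * sy - ry * sx
    if denom == 0:
        # Parallel or collinear segments: no single crossing point.
        return None
    dx, dy = q1[0] - p1[0], q1[1] - p1[1]
    t = (dx * sy - dy * sx) / denom  # position along the road segment
    u = (dx * ry - dy * rx) / denom  # position along the river segment
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (p1[0] + t * rx, p1[1] + t * ry)
    return None

def crossings(road: List[Point], river: List[Point]) -> List[Point]:
    """All intersection points between two polylines."""
    hits = []
    for a, b in zip(road, road[1:]):
        for c, d in zip(river, river[1:]):
            pt = segment_intersection(a, b, c, d)
            if pt is not None:
                hits.append(pt)
    return hits

# Hypothetical toy geometry: an east-west road crossed by a north-south river.
road = [(0.0, 0.0), (10.0, 0.0)]
river = [(5.0, -5.0), (5.0, 5.0)]
print(crossings(road, river))  # [(5.0, 0.0)]
```

A real implementation would also need to handle shared vertices and duplicate hits where a road follows a river, which this sketch ignores.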
2.2 Engineering difficulty proxies
Four continuous proxies, plus OSM waterway type as a categorical check, capture the engineering constraints that determine whether Bridges to Prosperity constructs a bridge:
| Variable | Description | Source | N (of 124) | Median (IQR) |
|---|---|---|---|---|
| Slope (°) | Mean terrain slope within 100 m buffer | 19 m DEM (elevatr/AWS) | 118 | 2.2 (0.9–5.0) |
| Catchment area (km²) | Upstream contributing area; max within 500 m buffer | WhiteboxTools D8 | 124 | 0.7 (0.2–5.0) |
| Flood Q50 (m³/s) | 50-yr return flood via FEHSSA regression on catchment | FEHSSA / WorldClim | 124 | 17.1 (8.0–48.3) |
| Waterway type (OSM) | river / stream / canal / drain (OSM classification) | OSM | 124 | modal class: stream (144 of 193 OSM segments) |
| Stream order (Strahler) | Strahler order from DEM; max within 500 m buffer | WhiteboxTools | 124 | 4 (2–5) |
Note on stream order coverage. The upgrade to a 19 m DEM (elevatr/AWS z=12) combined with a 500 m snap buffer now yields Strahler stream order for all 124 crossings. Stream order is therefore included as a fourth variable in the composite PC1 difficulty index alongside slope, catchment area, and flood Q50. OSM waterway_type is retained as an independent categorical check.
2.3 Baseline socioeconomic characteristics
These are not outcomes of the bridge program. They are satellite- and OSM-derived proxies for the economic environment that existed at each crossing before any bridge was built. The empirical test asks whether a site’s engineering difficulty predicts these pre-existing conditions. If it does not, then difficult-to-bridge and easy-to-bridge crossings sit in comparable economic environments — and treatment assignment via the engineering criterion is as-good-as-random. If it does, then the comparison group is structurally advantaged or disadvantaged before the intervention.
The bridge program’s true causal outcomes (household wage income, consumption, flood-shock resilience) are measured in the Brooks household survey and are not available to us. Our baseline characteristics serve as the closest observable analogue.
| Baseline characteristic | Description | Source | Year | N (of 124) |
|---|---|---|---|---|
| Nighttime lights | Mean VIIRS radiance within 2 km — proxy for local economic activity | VIIRS/NPP | 2014 | 122 |
| Population density | WorldPop persons/km² within 2 km | WorldPop | 2015 | 122 |
| Road density | OSM road length (km) per km² within 2 km — infrastructure proxy | OSM | 2024 | 124 |
| Travel time to city | Minutes to nearest city ≥ 50,000 — market access proxy | Weiss et al. | 2019 | 112 |
| Agricultural suitability | Temp × precip suitability index (0–100) — livelihood proxy | WorldClim 2.1 | 1970–2000 | 122 |
| Relative Wealth Index | Meta/HDX asset-based wealth score (~2.4 km tiles); household wealth proxy | Meta AI Research | 2015–2020 | 124 |
| Child dependency ratio | Under-15 population / total population within 2 km; demographic structure proxy | WorldPop age-sex structure | 2020 | 122 |
2.4 Geographic controls
Two variables capture the geographic gradient that could simultaneously drive both engineering difficulty and economic outcomes:
- Elevation (SRTM 90 m, standardised): separates valley-floor crossings from upland crossings.
- Distance to Kampala (km, standardised): a proxy for market integration and state capacity.
These parallel the controls used in Brooks’ baseline balance tests (distance to town; flood intensity).
3 Empirical Strategy
3.1 The Brooks (2020) design
Brooks (2020, Econometrica) exploits two-stage programme selection by Bridges to Prosperity. All candidate villages pass a needs assessment (population size, market proximity, expected use). Among these, some pass and some fail an engineering feasibility check based on riverbank geometry:
| Criterion | Threshold |
|---|---|
| Maximum span | ≤ 100 m |
| Crest height differential | ≤ 3 m |
| High-water mark clearance | ≥ 2 m below deck |
| Soil stability & erosion | Pass/fail |
Villages that pass both stages receive a bridge. Villages that pass needs but fail engineering serve as the comparison group. Because failure is caused by river geometry — not by village economic conditions — treatment assignment is argued to be as-good-as-random conditional on comparable need.
Brooks estimates two main specifications. For annual outcomes:
\[y_{ivt} = \alpha + \beta B_{vt} + \eta_t + \delta_v + \varepsilon_{ivt}\]
For high-frequency outcomes with flood interactions:
\[y_{ivt} = \eta_t + \delta_i + \beta B_{vt} + \gamma F_{vt} + \theta (B_{vt} \times F_{vt}) + \varepsilon_{ivt}\]
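where \(B_{vt} = 1\) if village \(v\) has a bridge at time \(t\), \(F_{vt}\) is a flood shock, \(\eta_t\) are time fixed effects, and the village fixed effects \(\delta_v\) (replaced by unit-level fixed effects \(\delta_i\) in the high-frequency specification) absorb time-invariant selection.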
3.2 Our three empirical tests
Test A — Bivariate balance test: For each combination of difficulty proxy \(D\) and baseline characteristic \(Y\), we estimate:
\[Y_i = \alpha + \beta \cdot D_i + \varepsilon_i\]
This is a balance test. \(\hat\beta \approx 0\) means the engineering difficulty of a crossing does not predict the pre-existing economic conditions at that site — the core of Brooks’ orthogonality claim.
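A minimal sketch of the Test A regression follows, using only the Python standard library. The function name and the toy data are hypothetical, and the standard error is the classical homoskedastic one; the report does not state which variance estimator it uses.

```python
import math

def bivariate_ols(d, y):
    """OLS of y on a single regressor d with an intercept.

    Returns (alpha, beta, se_beta, t_stat) using the classical
    (homoskedastic) standard error.
    """
    n = len(d)
    d_bar = sum(d) / n
    y_bar = sum(y) / n
    sxx = sum((di - d_bar) ** 2 for di in d)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    beta = sxy / sxx
    alpha = y_bar - beta * d_bar
    rss = sum((yi - alpha - beta * di) ** 2 for di, yi in zip(d, y))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    return alpha, beta, se_beta, beta / se_beta

# Hypothetical toy data: the baseline characteristic varies,
# but not systematically with difficulty, so beta is zero.
difficulty = [-2.0, -1.0, 0.0, 1.0, 2.0]
baseline = [1.0, 3.0, 2.0, 3.0, 1.0]
alpha, beta, se_beta, t_stat = bivariate_ols(difficulty, baseline)
print(alpha, beta, se_beta, t_stat)
```

In this toy case the slope is exactly zero, which is what a perfectly balanced proxy would look like; in real data the question is whether the estimate is distinguishable from zero given its standard error.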
Test B — Geography-conditional balance test: We add elevation and distance to Kampala to isolate any residual confounding after the main geographic gradient is removed:
\[Y_i = \alpha + \beta \cdot D_i + \gamma_1 \cdot \text{elev}_i + \gamma_2 \cdot \text{dist\_kampala}_i + \varepsilon_i\]
If \(\hat\beta\) becomes significant here but was not in Test A, engineering difficulty is correlated with baseline conditions through geography: large rivers run through valleys where economic activity concentrates. This means geographic controls must be included in any analysis using this design.
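To show how adding the two geographic controls can change the difficulty coefficient, here is a small sketch that solves the normal equations directly. The `ols` helper and the toy numbers are hypothetical. The toy outcome is built from elevation and distance only, so the bivariate slope on difficulty is non-zero purely through geography and vanishes once the controls enter.

```python
def ols(X, y):
    """Least-squares coefficients for y = X b via the normal equations.

    X is a list of rows; put a leading 1.0 in each row for the intercept.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical toy data: difficulty is correlated with elevation,
# and the outcome depends on elevation and distance only.
D = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
elev = [0.0, 1.0, 1.0, 2.0, 3.0, 3.0]
dist = [1.0, 0.0, 2.0, 1.0, 0.0, 1.0]
Y = [1.0 + 2.0 * e - k for e, k in zip(elev, dist)]

b_bivariate = ols([[1.0, d] for d in D], Y)
b_conditional = ols([[1.0, d, e, k] for d, e, k in zip(D, elev, dist)], Y)
print("bivariate beta:", b_bivariate[1])
print("conditional beta:", b_conditional[1])
```

The same comparison run in the other direction (a coefficient that appears only after conditioning) is the case the text above describes; the mechanics of the two regressions are identical.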
Test C — Spatial autocorrelation (Moran’s I): We test whether the balance-test residuals cluster geographically using \(k = 5\) nearest-neighbour spatial weights. A significant Moran’s I means that nearby crossings share similar economic environments beyond what the difficulty proxy alone explains — signalling that omitted geographic confounders are at work.
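The statistic in Test C can be sketched as follows, assuming row-standardised \(k\)-nearest-neighbour weights and a permutation-based p-value. The inference method is an assumption, since the report does not say whether analytical or permutation inference was used, and all names and the toy grid are hypothetical.

```python
import math
import random

def knn_neighbours(coords, k=5):
    """Indices of the k nearest neighbours of every point."""
    nbrs = {}
    for i, (xi, yi) in enumerate(coords):
        dists = sorted((math.hypot(xi - xj, yi - yj), j)
                       for j, (xj, yj) in enumerate(coords) if j != i)
        nbrs[i] = [j for _, j in dists[:k]]
    return nbrs

def morans_i(values, nbrs):
    """Moran's I with row-standardised weights (each neighbour weighs 1/k)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    num = sum(z[i] * sum(z[j] for j in js) / len(js) for i, js in nbrs.items())
    den = sum(zi * zi for zi in z)
    # With row-standardised weights the factor n / S0 equals 1.
    return num / den

def permutation_p(values, nbrs, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, nbrs)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, nbrs) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy residuals on a 6 x 6 grid with a smooth east-west gradient.
coords = [(float(x), float(y)) for x in range(6) for y in range(6)]
residuals = [x for x, _ in coords]
nbrs = knn_neighbours(coords, k=5)
moran, p_value = permutation_p(residuals, nbrs)
print(moran, p_value)
```

A smooth gradient like this produces a large positive statistic; residuals with no spatial structure would give a value near \(-1/(n-1)\).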
3.3 Composite difficulty index (PCA)
Individual proxies each capture one facet of engineering difficulty. We combine them into a single index using principal components analysis (PCA) on all four continuous variables:
\[\text{PC1} = f(\text{slope},\ \log(\text{catchment area}),\ \text{stream order},\ \log(\text{flood Q50}))\]
PC1 is interpreted as a large-watershed / high-flood difficulty axis. Sites with large upstream catchments, high Strahler order, and high flood magnitudes (but typically flat terrain) score high; steep upland sites with small catchments score low. The index is estimated on \(n\) = 119 crossings with complete data on all four inputs. OSM waterway_type is tested separately as a categorical stream-class check.
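As an illustration of how such an index can be built, the sketch below standardises the four inputs and extracts the leading eigenvector of their correlation matrix by power iteration. This is a stand-in for whatever PCA routine the analysis actually uses; the function names and the six toy sites are hypothetical, and the toy catchment and flood values are strictly positive so that the logs are defined.

```python
import math

def standardise(col):
    """Centre a column and scale it to unit sample standard deviation."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
    return [(v - m) / sd for v in col]

def first_component(columns, n_iter=500):
    """Leading eigenvector (loadings) and variance share of the correlation matrix."""
    Z = [standardise(c) for c in columns]
    n, p = len(Z[0]), len(Z)
    R = [[sum(a * b for a, b in zip(Z[i], Z[j])) / (n - 1) for j in range(p)]
         for i in range(p)]
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(R[i][j] * v[j] for j in range(p)) for i in range(p))
    # The trace of a correlation matrix equals the number of variables.
    return v, eigval / p

# Hypothetical toy sites ordered from steep upland to flat valley.
slope = [8.0, 6.0, 5.0, 3.0, 2.0, 1.0]
catchment = [0.2, 0.5, 1.0, 5.0, 40.0, 300.0]
order = [1.0, 1.0, 2.0, 3.0, 4.0, 6.0]
q50 = [2.0, 4.0, 8.0, 30.0, 150.0, 900.0]

loadings, share = first_component([
    slope,
    [math.log(a) for a in catchment],
    order,
    [math.log(q) for q in q50],
])
print(loadings, share)
```

In the toy data the slope loading is negative and the other three are positive, the same sign pattern the text describes. The sign of a principal component is arbitrary in general, so a real pipeline should fix the orientation explicitly.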
4 Results
4.1 Descriptive statistics
| Variable | N | Mean | SD | Median | Min–Max |
|---|---|---|---|---|---|
| Slope (°) | 122 | 4.9 | 4.7 | 3.4 | 0 – 27 |
| Catchment area (km²) | 124 | 157.6 | 789.2 | 2.3 | 0 – 5664 |
| Flood Q50 (m³/s) | 124 | 510.5 | 2069.3 | 40.9 | 1 – 14298 |
| Stream order (Strahler) | 119 | 2.7 | 1.6 | 2.0 | 1 – 8 |
| *Baseline characteristics* | | | | | |
| Nighttime lights | 122 | 0.0 | 0.1 | 0.0 | 0 – 1 |
| Population density (p/km²) | 122 | 312.3 | 305.0 | 221.5 | 15 – 1555 |
| Road density (km/km²) | 124 | 1.5 | 1.1 | 1.0 | 0 – 4 |
| Travel time (min) | 112 | 60.6 | 51.8 | 58.0 | 0 – 242 |
| Agri. suitability (0–100) | 122 | 90.7 | 7.5 | 93.2 | 63 – 100 |
| Relative Wealth Index | 124 | -0.4 | 0.3 | -0.5 | -1 – 0 |
| Child dep. ratio | 122 | 0.1 | 0.2 | 0.0 | 0 – 1 |
| *Geographic controls* | | | | | |
| Elevation (m) | 121 | 1649.4 | 449.8 | 1797.3 | 886 – 2464 |
| Distance to Kampala (km) | 124 | 273.9 | 55.7 | 285.2 | 135 – 367 |
4.2 The difficulty composite index
Figure: PCA of the four continuous difficulty proxies (119 crossings with complete data). Left: variance explained; PC1 captures most of the shared variation. Right: PC1 loadings; catchment area, stream order, and flood magnitude load positively (larger = harder to bridge), while slope loads negatively because steep upland sites have small catchments.
PC1 explains 73.1% of the shared variance. It is best understood as a watershed size axis: sites on large, low-gradient rivers score high; steep upland crossings score low. The negative slope loading reflects the landscape structure — flat valley rivers have large catchments; steep upland streams have small ones.
4.3 Test A — Bivariate balance tests
For each of the 5 difficulty proxies and each of the 7 baseline characteristics, we ask: does difficulty predict this pre-existing condition? The figure below shows the OLS coefficient as a colour-coded tile; a star marks associations that survive FDR correction (Benjamini-Hochberg q < 0.05) applied across all 35 bivariate tests. The key proxy is the composite PC1 index (rightmost column); the individual component proxies are shown for transparency.
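The Benjamini-Hochberg adjustment mentioned above can be sketched in a few lines. The function name and the four p-values are hypothetical; in the report the adjustment is applied across the full family of 35 tests, so q-values computed on a smaller family will differ from those reported.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    q = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        rank = m - pos  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q

# Hypothetical raw p-values from four tests.
p_raw = [0.01, 0.04, 0.03, 0.20]
q_bh = bh_adjust(p_raw)
print(q_bh)
```

An association is starred when its q-value falls below 0.05; here only the first test would qualify.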
4.4 Test B — Balance tests conditional on geography
We add elevation and distance to Kampala as controls and repeat every balance test. This answers: even within the same geographic zone, does difficulty still predict baseline conditions? The coefficient plot overlays the bivariate (orange) and conditional (blue) estimates for the composite difficulty index.
| Outcome | β | SE | 95% CI | p (raw) | q (BH) | N |
|---|---|---|---|---|---|---|
| Nighttime lights (log) | -0.0029 | 0.0034 | [-0.0097, 0.0039] | 0.399 | 0.578 | 118 |
| Population density (log) | 0.0423 | 0.0393 | [-0.0355, 0.1201] | 0.284 | 0.530 | 118 |
| Road density (log) | 0.0359 | 0.0236 | [-0.0108, 0.0826] | 0.130 | 0.415 | 118 |
| Travel time to city (log) | 0.0124 | 0.0598 | [-0.1062, 0.1311] | 0.836 | 0.860 | 110 |
| Agricultural suitability | 0.2416 | 0.3425 | [-0.4369, 0.9200] | 0.482 | 0.582 | 118 |
| Relative Wealth Index (Meta) | -0.0151 | 0.0136 | [-0.0420, 0.0118] | 0.270 | 0.530 | 118 |
| Child dependency ratio (logit) | -0.0170 | 0.0999 | [-0.2149, 0.1808] | 0.865 | 0.865 | 118 |
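
As a worked check on how the interval column relates to the point estimates, each 95% CI is approximately \(\hat\beta \pm t \cdot \text{SE}\). The critical value is not stated in the source; assuming roughly 114 to 116 residual degrees of freedom it is about 1.98, which reproduces the population density row:

\[\hat\beta \pm t_{0.975} \cdot \text{SE} = 0.0423 \pm 1.98 \times 0.0393 \approx [-0.0355,\ 0.1201]\]

Every interval in the table contains zero, which agrees with all raw p-values exceeding 0.05.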
4.5 Test C — Spatial autocorrelation
If crossing-level baseline characteristics cluster spatially beyond what the difficulty index explains, omitted geographic confounders are likely present, and the balance test may understate the true correlation between difficulty and pre-existing conditions. We compute Moran’s I on the residuals of the nighttime lights (NTL) balance regression under both the bivariate and conditional specifications.
| Model | Moran's I | p-value | Interpretation |
|---|---|---|---|
| Bivariate (difficulty only) | 0.219 | < 0.0001 | Significant clustering; geographic confounders likely present |
| Conditional (+ elevation + dist. Kampala) | 0.211 | < 0.0001 | Significant clustering remains after geographic controls |
The bivariate residuals show significant spatial clustering (I = 0.219, p < 0.001): the nighttime-lights environment at nearby crossings is more similar than expected by chance. Once elevation and distance to Kampala are included, Moran’s I falls only marginally to 0.211 (p < 0.001) and remains significant at conventional levels. The controls therefore absorb little of the spatial structure, suggesting additional geographic confounders may be present.
5 Assessment of Brooks’ Assumptions
We now map each identifying assumption from Brooks (2020) to our empirical evidence:
| # | Assumption | Our evidence | Verdict | Implication for the Brooks design |
|---|---|---|---|---|
| 1 | Engineering exogeneity | Balance test (bivariate): 0/7 baseline characteristics predicted by PC1 difficulty index | ✅ Holds | River geometry generates genuine variation in engineering feasibility that is unrelated to economic activity in the raw data. |
| 2 | No systematic baseline differences | Balance test (conditional): 0/7 characteristics predicted after geographic controls | ✅ Holds | Brooks' Table I conditions on distance to town and flood intensity, the main geographic confounders we consider. Their balance tests directly address this concern. |
| 3 | No spillovers across villages | Moran's I on bivariate residuals: I = 0.219, p < 0.001 | ⚠️ Does not fully hold; residual clustering remains | Brooks states that study villages are geographically separated, which limits the physical spillover channel we detect at the crossing level. Standard errors should still be clustered at the village level to be safe. |
| 4 | Exogenous flooding shocks | Not testable — requires time-series rainfall + outcomes | — Not tested | Brooks uses satellite precipitation shocks. The cross-sectional exogeneity of catchment area and flood Q50 we test here is a necessary (not sufficient) condition. |
| 5 | Parallel pre-treatment trends | Not testable — requires pre-treatment panel | — Not tested | Brooks provides village fixed effects and pre-treatment balance (Table I). Our cross-sectional tests are consistent with the balance they report. |
5.1 Assumption 1 — Engineering exogeneity ✅
Holds. After FDR correction (Benjamini-Hochberg), the PC1 difficulty index shows 0 statistically significant associations with 7 baseline characteristics in bivariate balance tests — consistent with full orthogonality. River geometry — the source of variation in engineering feasibility — does not meaningfully predict pre-existing economic conditions at the crossing-point level. Crossings that are harder to bridge are not located in systematically richer or poorer areas.
5.2 Assumption 2 — No systematic baseline differences ✅
Holds. Once elevation and urban distance (distance to Kampala) are included, 0 of 7 baseline characteristics are significantly associated with engineering difficulty.
The critical question is whether Brooks’ own controls are sufficient. Their Table I balance tests condition on (i) distance to the nearest town and (ii) flood intensity, which correspond most closely to our dist_kampala_km control and our flood_Q50 proxy, respectively. Our Moran’s I falls only from 0.219 to 0.211 (p < 0.001) once elevation and distance to Kampala are included, so these controls do not eliminate spatial clustering. This suggests residual geographic confounders beyond elevation and urban distance.
One important caveat: the regressions on the composite index use at most 119 of the 124 crossings (those with complete difficulty index data). The 5 crossings missing one or more index inputs are disproportionately small upland streams, the low-difficulty end of the distribution. The balance results therefore rest on a sample that slightly under-represents the easiest-to-bridge sites.
5.3 Assumption 3 — No spillovers ⚠️
Does not fully hold. At the crossing level, the balance-test residuals are spatially correlated (bivariate Moran’s I = 0.219, p < 0.001). This could reflect market access spillovers, road-network effects, or common geographic shocks. However, Brooks’ design operates at the village level with spatially separated villages, which substantially reduces cross-unit spillover. The spatial autocorrelation we detect is likely absorbed, at least in part, by Brooks’ village fixed effects and subcounty-level controls. Standard errors should be clustered at the village or subcounty level to account for any residual spatial correlation.
6 Discussion
6.1 Overall verdict
The Brooks (2020) identification strategy is defensible, with important caveats, in the Mt. Elgon context. The PC1 difficulty index does not predict pre-existing economic conditions in any of the 7 bivariate balance tests, and no baseline characteristic is significantly associated with it after geographic controls. Brooks’ own design controls for distance to town and flood intensity, which addresses the main geographic channels directly. The principal caveat is the residual spatial clustering documented in Test C.
6.2 What the results require in practice
Geographic controls are not optional. In our data the design is clean both without elevation and urban distance controls (Test A) and with them (Test B), so balance does not depend on the controls here. Residual spatial clustering (Test C) nevertheless points to unmodelled geographic structure, so any replication should still include geographic covariates in the first-stage specification and the balance table.
Cluster standard errors spatially. The Moran’s I result indicates that crossing-level (and likely village-level) outcomes are spatially correlated. Point estimates are unaffected, but standard errors must account for this — at minimum by clustering at the subcounty level.
The composite PC1 is a useful validity check that can be pre-registered. The bivariate 0/7 result (see heatmap) is the natural single-number summary for a methods note or appendix. If any future application of this design shows significant bivariate associations between engineering difficulty and baseline characteristics, the exogeneity assumption should be re-examined.
7 Limitations
Analysis: R 4.4.1 · GDAL 3.10.1 · PROJ 9.5.1 · GEOS 3.13.0. DEM: 19 m (elevatr/AWS z=12). Hydrology: WhiteboxTools 2.4. Crossings: OSM road × river intersections, Mt. Elgon region, Uganda. Code repository: mt_elgon_crossings/.