Do Brooks’ (2020) Identifying Assumptions Hold in Uganda?

Empirical Stress-Test at River Crossing Points — Mt. Elgon, Uganda

Author

Lucas Sempé · 3ie

Published

February 27, 2026

Summary of findings: In bivariate balance tests, the composite difficulty index (PC1, now incorporating stream order from the 19 m DEM) shows 0/7 statistically significant associations with baseline economic characteristics, consistent with the null of orthogonality given 7 simultaneous tests at α = 0.05. After conditioning on elevation and distance to Kampala, 0/7 associations are significant, so orthogonality also holds after geographic controls. Two additional proxies (Meta RWI and child dependency ratio) extend the balance tests to household wealth and demographic structure. Note: spatial autocorrelation in balance-test residuals persists after geographic controls, indicating residual geographic clustering beyond elevation and distance to Kampala. Brooks (2020) already controls for distance to town and flood intensity in the balance tests, so the design is empirically defensible, subject to that caveat.

1 Study Area

The study covers river crossing points in the Mt. Elgon region of eastern Uganda (Mbale and Sironko districts). The Mt. Elgon massif rises to 4,321 m on the Uganda–Kenya border, draining a dense network of rivers that descend through agricultural land to the plains below. Communities on either side of these rivers depend on seasonal fords or informal crossings. This is one of the programme areas where Fika (formerly Bridges to Prosperity) has assessed and built pedestrian suspension bridges.

Figure 1: **Regional overview.** The study area (dashed box) sits in eastern Uganda on the Mt. Elgon massif. Neighbouring countries shown for reference.

Figure 2: Interactive map of the 124 river crossing points. Markers are coloured by waterway type (blue = river, light blue = stream, grey = drain/canal). Click a marker for details. Zoom in to see individual crossing locations.

The 124 crossings were identified by intersecting the OSM road network with OSM mapped rivers within the Mt. Elgon watershed. Crossings span elevations from 889 m to 2,459 m. Of the 193 OSM river segments in the study area, 35 are classified as river (likely Strahler order ≥ 4) and 144 as stream, reflecting the range from major valleys to upland tributaries.

2 Data

2.1 River crossing points

Each crossing point is the geometric intersection of an OSM road and an OSM river. This captures all road-accessible river crossings in the study area, including paved bridges, culverts, and seasonal fords. Attributes retained from OSM: road type, pedestrian-priority flag, river name, OSM IDs.
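
As a rough illustration of what "geometric intersection" means here, the sketch below finds the points where two polylines cross. It is not the pipeline used for this note: the function names are hypothetical, coordinates are assumed to be in a projected (planar) system, and a crossing that falls exactly on a shared vertex may be reported twice.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def segment_intersection(a1: Point, a2: Point, b1: Point, b2: Point) -> Optional[Point]:
    """Return the point where segment a1-a2 crosses segment b1-b2, or None.

    Parallel and collinear segments return None (no single crossing point).
    """
    rx, ry = a2[0] - a1[0], a2[1] - a1[1]  # direction of the road segment
    sx, sy = b2[0] - b1[0], b2[1] - b1[1]  # direction of the river segment
    denom = rx * sy - ry * sx              # 2-D cross product of the directions
    if denom == 0:
        return None
    qpx, qpy = b1[0] - a1[0], b1[1] - a1[1]
    t = (qpx * sy - qpy * sx) / denom      # relative position along the road segment
    u = (qpx * ry - qpy * rx) / denom      # relative position along the river segment
    if 0.0 <= t <= 1.0 and 0.0 <= u <= 1.0:
        return (a1[0] + t * rx, a1[1] + t * ry)
    return None


def crossings(road: List[Point], river: List[Point]) -> List[Point]:
    """All intersection points between a road polyline and a river polyline."""
    points = []
    for i in range(len(road) - 1):
        for j in range(len(river) - 1):
            p = segment_intersection(road[i], road[i + 1], river[j], river[j + 1])
            if p is not None:
                points.append(p)
    return points
```

In practice a GIS library would do this on the full OSM layers; the sketch only shows the underlying geometry.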

2.2 Engineering difficulty proxies

Four continuous proxies, plus one categorical check (OSM waterway type), capture the engineering constraints that determine whether Bridges to Prosperity constructs a bridge:

Table 1: Engineering difficulty proxies. Flood Q50 is derived from catchment area and mean annual precipitation via the FEHSSA regression approach. All four continuous proxies are included in the composite PC1 difficulty index.

Variable	Description	Source	N (of 124)	Median (IQR)
Slope (°)	Mean terrain slope within 100 m buffer	19 m DEM (elevatr/AWS)	118	2.2 (0.9–5.0)
Catchment area (km²)	Upstream contributing area; max within 500 m buffer	WhiteboxTools D8	124	0.7 (0.2–5.0)
Flood Q50 (m³/s)	50-yr return flood via FEHSSA regression on catchment	FEHSSA / WorldClim	124	17.1 (8.0–48.3)
Waterway type (OSM)	river / stream / canal / drain (OSM classification)	OSM	124	stream (modal class; n=144 counts OSM segments)
Stream order (Strahler)	Strahler order from DEM; max within 500 m buffer	WhiteboxTools	124	4 (2–5)

Note on stream order coverage. The upgrade to a 19 m DEM (elevatr/AWS z=12) combined with a 500 m snap buffer now yields Strahler stream order for all 124 crossings. Stream order is therefore included as a fourth variable in the composite PC1 difficulty index alongside slope, catchment area, and flood Q50. OSM waterway_type is retained as an independent categorical check.

2.3 Baseline socioeconomic characteristics

These are not outcomes of the bridge program. They are satellite- and OSM-derived proxies for the economic environment that existed at each crossing before any bridge was built. The empirical test asks whether a site’s engineering difficulty predicts these pre-existing conditions. If it does not, then difficult-to-bridge and easy-to-bridge crossings sit in comparable economic environments — and treatment assignment via the engineering criterion is as-good-as-random. If it does, then the comparison group is structurally advantaged or disadvantaged before the intervention.

The bridge program’s true causal outcomes (household wage income, consumption, flood-shock resilience) are measured in the Brooks household survey and are not available to us. Our baseline characteristics serve as the closest observable analogue.

Table 2: Baseline socioeconomic characteristics used in balance tests. These proxy the economic environment at each crossing site before any bridge is built. They are not the household-level outcomes of the bridge programme (income, consumption, etc.). RWI and dependency ratio (bottom two rows) require P1E to be run first.

Baseline characteristic	Description	Source	Year	N (of 124)
Nighttime lights	Mean VIIRS radiance within 2 km — proxy for local economic activity	VIIRS/NPP	2014	122
Population density	WorldPop persons/km² within 2 km	WorldPop	2015	122
Road density	OSM road length (km) per km² within 2 km — infrastructure proxy	OSM	2024	124
Travel time to city	Minutes to nearest city ≥ 50,000 — market access proxy	Weiss et al.	2019	112
Agricultural suitability	Temp × precip suitability index (0–100) — livelihood proxy	WorldClim 2.1	1970–2000	122
Relative Wealth Index	Meta/HDX asset-based wealth score (~2.4 km tiles) — household wealth proxy	Meta AI Research	2015–2020	~124
Child dependency ratio	Under-15 population / total population within 2 km — demographic structure proxy	WorldPop age-sex structure	2020	~124

2.4 Geographic controls

Two variables capture the geographic gradient that could simultaneously drive both engineering difficulty and economic outcomes:

Elevation (SRTM 90 m, standardised): separates valley-floor crossings from upland crossings.
Distance to Kampala (km, standardised): a proxy for market integration and state capacity.

These parallel the controls used in Brooks’ baseline balance tests (distance to town; flood intensity).

3 Empirical Strategy

3.1 The Brooks (2020) design

Brooks (2020, Econometrica) exploits two-stage programme selection by Bridges to Prosperity. All candidate villages pass a needs assessment (population size, market proximity, expected use). Among these, some pass and some fail an engineering feasibility check based on riverbank geometry:

Criterion	Threshold
Maximum span	≤ 100 m
Crest height differential	≤ 3 m
High-water mark clearance	≥ 2 m below deck
Soil stability & erosion	Pass/fail

Villages that pass both stages receive a bridge. Villages that pass needs but fail engineering serve as the comparison group. Because failure is caused by river geometry — not by village economic conditions — treatment assignment is argued to be as-good-as-random conditional on comparable need.
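
The two-stage selection rule can be written down as a small sketch. The thresholds come from the table above; the class, field, and function names are hypothetical, and the real assessment is an on-site engineering judgement rather than a four-line rule.

```python
from dataclasses import dataclass


@dataclass
class SiteSurvey:
    """Hypothetical record of the riverbank measurements at one candidate site."""
    span_m: float                  # required bridge span
    crest_height_diff_m: float     # height difference between the two banks
    clearance_below_deck_m: float  # gap between high-water mark and deck
    soil_stable: bool              # soil stability and erosion check


def passes_engineering_check(site: SiteSurvey) -> bool:
    """Apply the four feasibility thresholds listed in the table above."""
    return (
        site.span_m <= 100
        and site.crest_height_diff_m <= 3
        and site.clearance_below_deck_m >= 2
        and site.soil_stable
    )


def assign_group(passes_needs: bool, site: SiteSurvey) -> str:
    """Treatment = passes both stages; comparison = passes needs, fails engineering."""
    if not passes_needs:
        return "excluded"
    return "treatment" if passes_engineering_check(site) else "comparison"
```

The identifying argument rests on the second stage only: among villages with comparable need, group membership is decided by `passes_engineering_check`, which takes no economic inputs.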

Brooks estimates two main specifications. For annual outcomes:

\[y_{ivt} = \alpha + \beta B_{vt} + \eta_t + \delta_v + \varepsilon_{ivt}\]

For high-frequency outcomes with flood interactions:

\[y_{ivt} = \eta_t + \delta_i + \beta B_{vt} + \gamma F_{vt} + \theta (B_{vt} \times F_{vt}) + \varepsilon_{ivt}\]

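where \(B_{vt} = 1\) if village \(v\) has a bridge at time \(t\), \(F_{vt}\) is a flood shock, and \(\eta_t\) are time fixed effects. The unit fixed effects (\(\delta_v\) for villages in the first specification, \(\delta_i\) for the lower-level unit \(i\) in the second) absorb time-invariant selection.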

3.2 Our three empirical tests

Test A — Bivariate balance test: For each combination of difficulty proxy \(D\) and baseline characteristic \(Y\), we estimate:

\[Y_i = \alpha + \beta \cdot D_i + \varepsilon_i\]

This is a balance test. \(\hat\beta \approx 0\) means the engineering difficulty of a crossing does not predict the pre-existing economic conditions at that site — the core of Brooks’ orthogonality claim.
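
A minimal sketch of one such bivariate balance regression, using only the Python standard library. The function name is hypothetical, and the p-value uses a normal approximation to the t distribution, which is an assumption of this sketch rather than the method behind the reported results.

```python
from math import sqrt
from statistics import NormalDist, mean


def bivariate_balance_test(d, y):
    """Regress baseline characteristic y on difficulty proxy d.

    Returns (beta_hat, standard_error, two_sided_p). The p-value uses a
    normal approximation, so it is slightly too small in small samples.
    """
    n = len(d)
    d_bar, y_bar = mean(d), mean(y)
    sxx = sum((di - d_bar) ** 2 for di in d)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    beta = sxy / sxx
    alpha = y_bar - beta * d_bar
    rss = sum((yi - alpha - beta * di) ** 2 for di, yi in zip(d, y))
    se = sqrt(rss / (n - 2) / sxx)  # classical (homoskedastic) standard error
    p = 2 * (1 - NormalDist().cdf(abs(beta / se)))
    return beta, se, p
```

A large p-value here is what the orthogonality claim predicts; it is evidence of balance only to the extent that the test has power to detect an imbalance.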

Test B — Geography-conditional balance test: We add elevation and distance to Kampala to isolate any residual confounding after the main geographic gradient is removed:

\[Y_i = \alpha + \beta \cdot D_i + \gamma_1 \cdot \text{elev}_i + \gamma_2 \cdot \text{dist\_kampala}_i + \varepsilon_i\]

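If \(\hat\beta\) is significant in Test A but not here, engineering difficulty is correlated with baseline conditions only through geography: large rivers run through valleys where economic activity concentrates. If \(\hat\beta\) becomes significant here but was not in Test A, the geographic gradient was masking an association that exists within geographic zones. In either case, geographic controls must be included in any analysis using this design.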
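
The conditional specification is an ordinary multiple regression. The sketch below solves the normal equations directly with the standard library; the function names are hypothetical, the standard errors are the classical homoskedastic ones, and a collinear design will raise a `ZeroDivisionError` rather than being handled gracefully.

```python
from math import sqrt


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def ols(y, regressors):
    """OLS of y on an intercept plus each column in `regressors`.

    Returns (coefficients, standard_errors), with the intercept first.
    """
    n = len(y)
    x = [[1.0] + [col[i] for col in regressors] for i in range(n)]
    k = len(x[0])
    xtx = [[sum(row[a] * row[b] for row in x) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(x, y)) for a in range(k)]
    coef = solve(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(coef, row)) for row, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = []
    for j in range(k):
        # The j-th diagonal element of (X'X)^-1, obtained by solving X'X v = e_j.
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(sqrt(sigma2 * solve(xtx, unit)[j]))
    return coef, se
```

Calling `ols(y, [d])` gives the Test A estimate and `ols(y, [d, elev, dist_kampala])` the Test B estimate, so the two can be compared coefficient by coefficient.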

Test C — Spatial autocorrelation (Moran’s I): We test whether the balance-test residuals cluster geographically using \(k = 5\) nearest-neighbour spatial weights. A significant Moran’s I means that nearby crossings share similar economic environments beyond what the difficulty proxy alone explains — signalling that omitted geographic confounders are at work.
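
To make the statistic concrete, the sketch below computes Moran's I with row-standardised k-nearest-neighbour weights and a permutation p-value. The function names are hypothetical, distances are plain Euclidean (so projected coordinates are assumed), and permutation inference is an assumption of this sketch; the reported p-values may come from an analytical test instead.

```python
import random
from math import hypot
from statistics import mean


def knn_weights(coords, k=5):
    """Row-standardised k-nearest-neighbour weights as {i: {j: w_ij}}."""
    weights = {}
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        weights[i] = {j: 1.0 / len(neighbours) for j in neighbours}
    return weights


def morans_i(values, weights):
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    centre = mean(values)
    z = [v - centre for v in values]
    s0 = sum(w for row in weights.values() for w in row.values())
    num = sum(w * z[i] * z[j] for i, row in weights.items() for j, w in row.items())
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided p-value: share of random relabellings with I >= observed."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Here `values` would be the residuals from a balance regression and `coords` the crossing locations, with `k=5` matching the weights described above.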

3.3 Composite difficulty index (PCA)

Individual proxies each capture one facet of engineering difficulty. We combine them into a single index using principal components analysis (PCA) on all four continuous variables:

\[\text{PC1} = f(\text{slope},\ \log(\text{catchment area}),\ \text{stream order},\ \log(\text{flood Q50}))\]

PC1 is interpreted as a large-watershed / high-flood difficulty axis. Sites with large upstream catchments, high Strahler order, and high flood magnitudes (but typically flat terrain) score high; steep upland sites with small catchments score low. The index is estimated on \(n\) = 119 crossings with complete data on all four inputs. OSM waterway_type is tested separately as a categorical stream-class check.
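
A minimal sketch of how a first principal component can be extracted, assuming the PCA is run on standardised inputs (that is, on the correlation matrix) with catchment area and flood Q50 already log-transformed. The function names are hypothetical, and power iteration is used only to keep the example dependency-free; it stands in for a standard PCA routine.

```python
from math import sqrt
from statistics import mean, pstdev


def standardise(column):
    """Centre to mean 0 and scale to unit (population) standard deviation."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]


def pc1(columns, positive_index=0, iterations=500):
    """First principal component of the correlation matrix via power iteration.

    Returns (loadings, share_of_variance, scores). The sign is fixed so that
    the variable at `positive_index` loads positively.
    """
    z = [standardise(c) for c in columns]
    k, n = len(z), len(z[0])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)] for i in range(k)]
    # Uneven start vector, so it is unlikely to be orthogonal to the leading axis.
    v = [float(j + 1) for j in range(k)]
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * corr[i][j] * v[j] for i in range(k) for j in range(k))
    if v[positive_index] < 0:
        v = [-x for x in v]
    scores = [sum(v[j] * z[j][i] for j in range(k)) for i in range(n)]
    return v, eigenvalue / k, scores
```

With columns ordered as slope, log catchment area, stream order, and log flood Q50, setting `positive_index=1` orients the index so that larger watersheds score higher, matching the interpretation above.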

4 Results

4.1 Descriptive statistics

Table 3: Descriptive statistics. Top panel: engineering difficulty proxies. Middle panel: baseline socioeconomic characteristics (pre-treatment proxies, not programme outcomes). Bottom panel: geographic controls. Separator rows for readability.

Variable	N	Mean	SD	Median	Min–Max
Slope (°)	122	4.9	4.7	3.4	0 – 27
Catchment area (km²)	124	157.6	789.2	2.3	0 – 5664
Flood Q50 (m³/s)	124	510.5	2069.3	40.9	1 – 14298
Stream order (Strahler)	119	2.7	1.6	2.0	1 – 8
— — —
Nighttime lights	122	0.0	0.1	0.0	0 – 1
Population density (p/km²)	122	312.3	305.0	221.5	15 – 1555
Road density (km/km²)	124	1.5	1.1	1.0	0 – 4
Travel time (min)	112	60.6	51.8	58.0	0 – 242
Agri. suitability (0–100)	122	90.7	7.5	93.2	63 – 100
Relative Wealth Index	124	-0.4	0.3	-0.5	-1 – 0
Child dep. ratio	122	0.1	0.2	0.0	0 – 1
— ——
Elevation (m)	121	1649.4	449.8	1797.3	886 – 2464
Distance to Kampala (km)	124	273.9	55.7	285.2	135 – 367

Figure 3: **Crossing points coloured by OSM waterway type** (Carto Light basemap). River crossings (blue) concentrate in the lower valleys; stream crossings (light blue) are scattered across upland areas. Point size proportional to upstream catchment area.

4.2 The difficulty composite index

Figure 4: **PCA of four difficulty proxies** (n = 119 crossings with complete data). Left: variance explained; PC1 captures most of the shared variation. Right: PC1 loadings; catchment area, stream order, and flood magnitude load positively (larger = harder to bridge), while slope loads negatively because steep upland sites have *small* catchments.

PC1 explains 73.1% of the shared variance. It is best understood as a watershed size axis: sites on large, low-gradient rivers score high; steep upland crossings score low. The negative slope loading reflects the landscape structure — flat valley rivers have large catchments; steep upland streams have small ones.

4.3 Test A — Bivariate balance tests

For each of the 5 difficulty proxies and each of the 7 baseline characteristics, we ask: does difficulty predict this pre-existing condition? The figure below shows the OLS coefficient as a colour-coded tile; a star marks associations that survive FDR correction (Benjamini-Hochberg q < 0.05) applied across all 35 bivariate tests. The key proxy is the composite PC1 index (rightmost column); the individual component proxies are shown for transparency.

Figure 5: **Bivariate balance test heatmap.** Each cell shows the OLS coefficient from regressing the row *baseline characteristic* on the column *difficulty proxy* — a balance test, not a causal regression. Stars indicate FDR-adjusted significance (Benjamini-Hochberg, q < 0.05): 3 of 35 cells; the composite PC1 index shows 0 of 7 significant. N for each regression shown in parentheses.

Test A result: 0 of 7 baseline characteristics are significantly associated with the composite PC1 difficulty index in bivariate balance tests. Across all 35 proxy-by-characteristic cells, 3 are FDR-significant, and all 3 involve individual component proxies rather than the composite. The composite index therefore shows no detectable association with any baseline economic characteristic after FDR correction, which is consistent with orthogonality: by this measure, harder-to-bridge crossings are not systematically located in richer or poorer areas.
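
For readers unfamiliar with the correction, the Benjamini-Hochberg adjustment can be sketched in a few lines. The function name is hypothetical and the p-values in the usage note are illustrative, not taken from the balance tests.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values.

    Each p-value is scaled by m / rank, then a running minimum taken from the
    largest p-value downwards keeps the q-values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.005])` gives q-values of approximately 0.02, 0.04, 0.04, and 0.02. The adjusted values depend on the size of the test family, so q-values computed over 7 tests will differ from those computed over all 35.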

4.4 Test B — Balance tests conditional on geography

We add elevation and distance to Kampala as controls and repeat every balance test. This answers: even within the same geographic zone, does difficulty still predict baseline conditions? The coefficient plot overlays the bivariate (orange) and conditional (blue) estimates for the composite difficulty index.

Figure 6: **Balance test: bivariate vs geography-conditional.** Each row is a baseline characteristic; filled symbols = p < 0.05. Orange = bivariate (no controls); blue = conditional (+ elevation + distance to Kampala). Whiskers show 95% confidence intervals.

Table 4: Geography-conditional balance tests: difficulty index (PC1) vs baseline characteristics. Controls: elevation (std.) and distance to Kampala (std.). Stars show FDR-adjusted significance (BH q < 0.05); raw p-values shown for reference. A significant β means that, within the same geographic band, difficulty still predicts pre-existing conditions.

Outcome	β	SE	95% CI	p (raw)	q (BH)	N
Nighttime lights (log)	-0.0029	0.0034	[-0.0097, 0.0039]	0.399	0.578	118
Population density (log)	0.0423	0.0393	[-0.0355, 0.1201]	0.284	0.530	118
Road density (log)	0.0359	0.0236	[-0.0108, 0.0826]	0.130	0.415	118
Travel time to city (log)	0.0124	0.0598	[-0.1062, 0.1311]	0.836	0.860	110
Agricultural suitability	0.2416	0.3425	[-0.4369, 0.9200]	0.482	0.582	118
Relative Wealth Index (Meta)	-0.0151	0.0136	[-0.0420, 0.0118]	0.270	0.530	118
Child dependency ratio (logit)	-0.0170	0.0999	[-0.2149, 0.1808]	0.865	0.865	118

Test B result: 0 of 7 baseline characteristics are significantly associated with difficulty after geographic controls. The smallest raw p-value is 0.130 (road density), and no BH-adjusted q-value falls below 0.415. This includes the two added proxies, the Relative Wealth Index and the child dependency ratio.

4.5 Test C — Spatial autocorrelation

If crossing-level baseline characteristics cluster spatially beyond what the difficulty index explains, there are omitted geographic confounders — meaning the balance test understates the true correlation between difficulty and pre-existing conditions. We compute Moran’s I on the residuals of the NTL balance regression under both the bivariate and conditional specifications.

Table 5: Moran’s I on regression residuals. Moran’s I > 0 with p < 0.05 means the baseline characteristic (NTL) clusters spatially beyond what the model explains — indicating omitted geographic confounders. k = 5 nearest-neighbour spatial weights.

Model	Moran's I	p-value	Interpretation
Bivariate (difficulty only)	0.219	< 0.0001	Significant clustering; geographic confounders likely present
Conditional (+ elevation + dist. Kampala)	0.211	< 0.0001	Significant clustering remains after geographic controls

Figure 7: **Spatial distribution of NTL balance-test residuals.** Each point is one crossing, coloured by its standardised regression residual (red = higher NTL than model predicts; blue = lower). Left: bivariate model (Moran’s I = 0.219, p < 0.001); red and blue points cluster geographically, confirming spatial autocorrelation. Right: conditional model (I = 0.211, p < 0.001); clustering is marginally lower but remains statistically significant.

The bivariate residuals show significant spatial clustering (I = 0.219, p < 0.001): the nighttime-lights environment at nearby crossings is more similar than expected by chance. Once elevation and distance to Kampala are included, Moran’s I falls only marginally, to 0.211 (p < 0.001), and remains statistically significant. The controls therefore absorb little of the spatial clustering, suggesting additional geographic confounders may be present.

5 Assessment of Brooks’ Assumptions

We now map each identifying assumption from Brooks (2020) to our empirical evidence:

Table 6: Verdict on each identifying assumption.

#	Assumption	Our evidence	Verdict	Implication for the Brooks design
1	Engineering exogeneity	Balance test (bivariate): 0/7 baseline characteristics predicted by PC1 difficulty index	✅ Holds	River geometry generates genuine variation in engineering feasibility that is unrelated to economic activity in the raw data.
2	No systematic baseline differences	Balance test (conditional): 0/7 characteristics predicted after geo controls	✅ Holds	Brooks' Table I conditions on distance to town and flood intensity, which are the main confounders we identify. Their balance tests directly address this concern.
3	No spillovers across villages	Moran's I on bivariate residuals: I = 0.219, p < 0.001	⚠️ Does not fully hold	Brooks states that study villages are geographically separated, which breaks the physical spillover channel we detect at the crossing level. Cluster SEs at the village level to be safe.
4	Exogenous flooding shocks	Not testable — requires time-series rainfall + outcomes	— Not tested	Brooks uses satellite precipitation shocks. The cross-sectional exogeneity of catchment area and flood Q50 we test here is a necessary (not sufficient) condition.
5	Parallel pre-treatment trends	Not testable — requires pre-treatment panel	— Not tested	Brooks provides village fixed effects and pre-treatment balance (Table I). Our cross-sectional tests are consistent with the balance they report.

5.1 Assumption 1 — Engineering exogeneity ✅

Holds. After FDR correction (Benjamini-Hochberg), the PC1 difficulty index shows 0 statistically significant associations with 7 baseline characteristics in bivariate balance tests — consistent with full orthogonality. River geometry — the source of variation in engineering feasibility — does not meaningfully predict pre-existing economic conditions at the crossing-point level. Crossings that are harder to bridge are not located in systematically richer or poorer areas.

5.2 Assumption 2 — No systematic baseline differences ✅

Holds. Once elevation and urban distance are included, 0 of 7 baseline characteristics are significantly associated with engineering difficulty.

The critical question is whether Brooks’ own controls are sufficient. Their Table I balance tests condition on (i) distance to the nearest town and (ii) flood intensity, whose closest analogues in our data are dist_kampala_km and flood_Q50 respectively. Our conditional Moran’s I falls only from 0.219 to 0.211 (p < 0.001) once elevation and distance to Kampala are included, so these controls do not eliminate spatial clustering. This suggests residual geographic confounders beyond elevation and urban distance.

One important caveat: our conditional regressions use at most 119 of 124 crossings (those with complete difficulty index data), and Table 4 reports N = 118 for most characteristics. The 5 crossings missing slope or catchment data are disproportionately small upland streams, the low-difficulty end of the distribution. Any conditional association estimated on this subsample may therefore differ slightly from what the full sample would show.

5.3 Assumption 3 — No spillovers ⚠️

Does not fully hold. At the crossing level, baseline characteristics are spatially correlated (bivariate Moran’s I = 0.219, p < 0.001, for nighttime lights). This could reflect market access spillovers, road-network effects, or common geographic shocks. However, Brooks’ design operates at the village level with spatially separated villages, which substantially reduces cross-unit spillover. The spatial autocorrelation we detect is likely absorbed, at least in part, by Brooks’ village fixed effects and subcounty-level controls. Standard errors should be clustered at the village or subcounty level to account for any residual spatial correlation.

6 Discussion

6.1 Overall verdict

The Brooks (2020) identification strategy is defensible in the Mt. Elgon context, with one important caveat. The composite engineering difficulty index does not predict pre-existing economic conditions in any of the 7 bivariate balance tests, and no baseline characteristic is significantly associated with it after geographic controls. The caveat is that balance-test residuals remain spatially clustered after those controls. Brooks’ own design controls for distance to town and flood intensity, which address the main geographic gradient we identify.

6.2 What the results require in practice

Geographic controls are not optional. The composite index is balanced both without (bivariate) and with (conditional) elevation and urban-distance controls, but the residual spatial clustering in Test C shows that geography still structures baseline conditions. With those controls included (as in Brooks), the design remains valid. Any replication must include geographic covariates in the first-stage specification and the balance table.
Cluster standard errors spatially. The Moran’s I result indicates that crossing-level (and likely village-level) outcomes are spatially correlated. Point estimates are unaffected, but standard errors must account for this — at minimum by clustering at the subcounty level.
The composite PC1 is a useful pre-registered validity check. The bivariate 0/7 result (see heatmap) is the natural single-number summary for a methods note or appendix. If any future application of this design shows significant bivariate associations between engineering difficulty and outcomes, the exclusion restriction should be re-examined.

7 Limitations

Unit mismatch: Brooks tests balance at the village level; we test at the crossing level. A crossing can be associated with multiple villages; a village may have multiple crossings. Our test is a necessary condition for village-level orthogonality, not a direct test of it.
Composite index coverage: The PCA uses 119/124 crossings — those with complete slope, catchment, stream order, and flood data. The 5 crossings missing slope are likely small upland edge cases; catchment, stream order, and flood Q50 have near-complete coverage.
Cross-sectional confounding: We cannot distinguish a direct causal pathway (large river → economic activity) from reverse causality or omitted geographic variables. Village fixed effects in the panel DiD address this.
Agricultural suitability proxy: We use a WorldClim temperature × precipitation index rather than GAEZ rainfed suitability. Results for this outcome are illustrative.

Analysis: R 4.4.1 · GDAL 3.10.1 · PROJ 9.5.1 · GEOS 3.13.0. DEM: 19 m (elevatr/AWS z=12). Hydrology: WhiteboxTools 2.4. Crossings: OSM road × river intersections, Mt. Elgon region, Uganda. Code repository: mt_elgon_crossings/.