Code
library(tidyverse)
library(glmmTMB) # to run the beta regression
library(kableExtra) # to make aesthetic tables
library(broom.mixed) # to make aesthetic summary tables
set.seed(12) # so the outputs are the same for data generationDecember 11, 2025
GitHub repository: https://github.com/sofiiir/gdp-influence-on-coastline-protection
This blog post investigates the hypothesis that nations with higher per capita GDP are more likely to protect their coastal ecosystems. This hypothesis is derived from the general knowledge that protecting ecosystems takes a large amount of resources. Policy must be put in place to create these protected areas. Countries with a higher per capita GDP would, in theory, have a greater amount of resources to spend on creating these zones.
The type of ecosystems are also included as predictors in this analysis as some differences may exist in the proportion of the protected area between them. The directed acyclic graph (DAG) shows the assumption that the ecosystem type affects the proportion of area that is protected. Some ecosystems are more valuable for a nation monetarily if they are not protected. For example, locations where fishing is important for the economy may be difficult to create as designated protected areas. Additionally, depending on the type of protection of an ecosystem, officially protecting an area may be implausible as surrounding communities may be reliant on the ecosystem to make a living.
In the DAG, total area of an ecosystem acts as a fork. Total area of an ecosystem is therefore a confounding variable. For proportion protected area, the more area a nation has of a specific ecosystem the more area they would need to protect to achieve a higher proportion of protected area. One example in which I would expect this to largely influence proportion protected area is in island nations where they may have a large area of a specific coastal ecosystems. Even if they protect a greater gross area than another nation their proportion of protected area may be low.
Total area per ecosystem also influences per capita GDP as locations with more of specific ecosystem’s coastal area are more likely to have higher per capita GDP. This higher per capita GDP would likely be due to an increase in resources and/or an increase in eco-tourism.
Total coastal area of a country is also a confounding variable that effects both per capita GDP and proportion protected area per ecosystem. Coastal areas other than the five ecosystems, cold-water coral, mangroves, salt marsh, sea grass, and warm-water coral are not accounted for in the data. Coastal areas such as beaches and ports would likely increase per capita GDP through a potential increase in eco-tourism and/or global trade. These other coastline types could also potentially influence protection statuses. For example, if a port is near a mangrove the mangrove would be less likely to gain a protected status. Alternatively, if wealthy homes are near sea grass the sea grass may be more likely to be protected to maintain its sea surge abilities. Unfortunately, total coastal area data of a country was not found or acquired for this analysis and is not included in the model.
The original data comes from the Ocean+Habitats webpage which works to inform policy about the state of our oceans coastlines, focusing on metrics concerning ocean protection. This analysis focuses on the five ecosystems for which data is available. The five ecosystems are cold-water corals, mangroves, salt marshes, sea grasses, and warm-water corals. The ecosystem data is from 2016.
Per capita GDP data was acquired from the International Monetary Fund. 2016 per capita GDP data was used to correlate with the 2016 ecosystem data.
All of the data is publicly available.
Data cleaning was performed in a separate file with the final product being a clean data frame that contains only the appropriate variables for this analysis. The data cleaning file can be found on the full repository for this project at https://github.com/sofiiir/gdp-influence-on-coastline-protection.
First we’ll load in the necessary libraries.
Let’s read in the clean data.
We’ll create an initial plot visualizing how per capita GDP influences percent protected areas differentiating between ecosystems.
ggplot(data = coastal_ecosystem, aes(x = pc_gdp_2016,
y = percent_protected,
color = ecosystem)) +
geom_point() +
scale_color_manual(name = "Ecosystem",
values = c("#D51977",
"forestgreen",
"#273E47",
"#9BC1BC",
"#FEA82F"),
labels = c("Cold-water coral",
"Mangroves",
"Salt marsh",
"Seagrass",
"Warm-water coral")) +
labs(x= "Per Capita GDP ($USD)",
y = "Percent protected",
title = print("2016 Percent Protected Areas by Per Capita GDP")) +
theme_classic()[1] "2016 Percent Protected Areas by Per Capita GDP"
At first glance the data does not follow any general trends, but we can run a statistical model to better assess any relationship between the predictors of per capita GDP, total area per ecosystem, and ecosystem on the response proportion protected area for each ecosystem. Since our data shows change in proportion we will be using a beta regression model.
Here is our proposed model based on our understanding of the variable interactions. The ecosystems are individual predictors in this model since they are dummy variables. Cold-water corals are the reference level since R defines the reference alphabetically. Therefore, there is no dummy variable for cold-water coral. \[
\begin{align}
\text{ProportionProtected} &\sim Beta(\mu, \phi) \\
logit(\mu) &= \beta_0 + \beta_1 \text{GDP}+ \beta_2 \text{TotalArea}+ \beta_3 \text{Mangroves} + \beta_4 \text{Saltmarshes}+ \beta_5 \text{Seagrass}+ \beta_6 \text{WarmwaterCorals}
\end{align}
\]
We simulate data to make sure we understand the model before using the model on our real world data. To ensure we truly understand the model we want to simulate data, then run the beta regression with the output being similar values to those we gave it when first creating the data parameters.
We only simulate data for the per capita GDP variable as our hypothesis is that per capita GDP is positively influencing proportion protected area. Since per capita GDP is a numeric predictor we can use the runif function to create variable values within a given range.
# assign values to our intercept and slope
beta0 <- 0.6
beta1 <- 1.9
# simulate predictor values
gdp <- runif(10000, 0.433, 0.71)
# link-space equation
logit_mu <- beta0 + beta1 * gdp
# assign value to phi
phi <- 15 # this is shape1+shape2
# calculate mu based on betas and simulated data
mu <- 1 / (1 + exp(-logit_mu)) # this is shape1/(shape1+shape2)To simulate values for the response variable we use rbeta. This requires the above parameters we have already assigned. According to the documentation, shape1 and shape2 are two variables that need to be calculated with mu and phi.
# calculate the parameters for rbeta
shape1 <- mu * phi
shape2 <- phi - shape1
# simulate response data based on parameters assigned above
proportion_protected_sim <- rbeta(10000, shape1 = shape1, shape2 = shape2)
# create a dataframe with the variables
sim_df <- tibble(proportion_protected_sim, gdp )
# run the model based on the simulated data
sim_model <- glmmTMB(proportion_protected_sim ~ gdp,
data = sim_df,
family = beta_family(link = "logit"))Let’s make sure the output values are the same as the ones we assigned to the simulation.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.5882411 | 0.0458635 | 12.82590 | p < 0.001 |
| gdp | 1.9164176 | 0.0803545 | 23.84952 | p < 0.001 |
Running a beta regression model on the simulated data outputs values that are close to those values we originally assigned to the variables. This is a sign we understand our model enough to move onto fitting the model on our own data!
The beta regression model only works on proportion so here we change percent_protected to proportion_protected.
The beta regression model uses the link function logit. Logit does not work on 0’s and 1’s. We need to adjust the data to transform any 0’s and 1’s to slightly above 0 and slightly below 1. Here’s a function created based on the vignette’s description of how to handle these values.
Then we adjust the non-numeric values to numeric and apply the zeros_and_ones function to our proportion data.
Let’s remind ourselves of the statistical notation of our investigation.
\[ \begin{align} \text{ProportionProtected} &\sim Beta(\mu, \phi) \\ logit(\mu) &= \beta_0 + \beta_1 \text{GDP}+ \beta_2 \text{TotalArea}+ \beta_3 \text{Mangroves} + \beta_4 \text{Saltmarshes}+ \beta_5 \text{Seagrass}+ \beta_6 \text{WarmwaterCorals} \end{align} \]
\[ \begin{align} H_0 &: \beta_1 = 0 \\ H_A &: \beta_1 > 0 \end{align} \]
The hypothesis as described in full detail in the introduction is that per capita GDP has a positive effect on the proportion of protected area. The null hypothesis is that per capita GDP does not have an effect on the proportion of protected area.
Again to reiterate, the beta regression model is useful when the response is a proportion. Since the protected areas are between the proportions of 0 and 1 this model will assess whether there is any relationship between the predictors, per capita GDP, total area per ecosystem, and ecosystem type, and the response, proportion protected area per ecosystem.
# run the beta regression model on our data
coastal_ecosystem_model <- glmmTMB(adj_proportion_protected ~ pc_gdp_2016 + total_area + ecosystem,
data = coastal_ecosystem,
family = beta_family(link = "logit"))
# summarize the model results to late call out the values
sum_coastal_ecosystem_mod <- summary(coastal_ecosystem_model)We can use broom.mixed::tidy() to make an aesthetic summary table of our model. When originally running the tidy() function some of the p-values were zero since the table does not round. We use an ifelse() statement to replace anywhere with a p-value less than 0.001 to simply state “p < 0.001.” P-values that are less than 0.001 show statistical significance since there is an extremely low probability of getting those values by chance alone.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -1.6119807 | 0.1378946 | -11.6899441 | p < 0.001 |
| pc_gdp_2016 | 0.0000185 | 0.0000037 | 5.0292800 | p < 0.001 |
| total_area | 0.0000033 | 0.0000119 | 0.2799932 | 0.779482695483335 |
| ecosystemmangroves | 0.9615269 | 0.1863184 | 5.1606665 | p < 0.001 |
| ecosystemsaltmarshes | 1.0816634 | 0.1877536 | 5.7610786 | p < 0.001 |
| ecosystemseagrasses | 0.5545267 | 0.1677406 | 3.3058597 | p < 0.001 |
| ecosystemwarm_coral | 0.7405975 | 0.1931282 | 3.8347457 | p < 0.001 |
We reject the null hypothesis that per capita GDP does not have an effect on the proportion of protected area. The beta value for per capita GDP is low but positive at 1.8455077^{-5}. This value makes sense logically since per capita GDP changes by the thousands. The p-value is less than 0.05 meaning that the likelihood that these findings are due to random chance are extremely low. The effect of per capita GDP on proportion protected area is positive and statistically significant.
Total area per ecosystem is not statistically significant. It is not statistically influencing the proportion of protected area. It is still important to keep total area per ecosystem in the model as it is a confounding variable as presented in the DAG above.
All of the ecosystem’s proportion protected values are statistically significant when comparing proportion protected area to the cold-water coral reference values. Mangroves, salt marshes, sea grasses, and warm-water coral are all more likely to be protected than cold-water corals.
Now that we ran the beta regression with the glmmTMB function let’s create prediction lines based on ecosystem. We create data that shows the spread of the expected values for per capita GDP.
Here we use seq() to create values in the range of our real world data by a specific interval. We also want all of the ecosystems in the prediction grid. Every unique combination between the sequenced along per capita GDP values and different ecosystem type is included in the prediction grid. We keep the total area variable constant by using the mean since the values are numeric.
Let’s create prediction values for expected protected proportion area per ecosystem based on the different combinations of possible per capita GDP and ecosystems holding total area per ecosystem constant. This uses the glmmTMB model we already specified above to calculate what the protected proportion values are based on the prediction grid.
Let’s plot the predictions, differentiating by ecosystem.
ggplot(data = coastal_model_pred, aes(x = pc_gdp_2016, y = predict)) +
geom_line(aes(color = ecosystem))+
scale_color_manual(name = "Ecosystem",
values = c("#D51977",
"forestgreen",
"#273E47",
"#9BC1BC",
"#FEA82F"),
labels = c("Cold-water coral",
"Mangroves",
"Salt marsh",
"Seagrass",
"Warm-water coral")) +
labs(x= "Per Capita GDP ($USD)",
y = "Percent protected",
title = print("2016 Percent Protected Areas by Per Capita GDP")) +
theme_minimal()[1] "2016 Percent Protected Areas by Per Capita GDP"
The prediction lines are broken up by ecosystem. Across the board, all of the ecosystems are more likely to be protected the higher the per capita GDP is in a country. Salt marshes are the most likely to be protected followed by mangroves, warm-water coral, and sea grasses. Cold-water corrals are the least likely to be protected. We’ll look at the confidence intervals below to analyze if these differences in proportion protected area are significant between the ecosystem types.
Key takeaway: Overall, as GDP increases the expected proportion of protected areas increases, regardless of the ecosystem type.
Confidence intervals show where we expect 95% of the time we will find the range for the parameter in question. For beta regression when analyzing the confidence interval for betas we expect the beta’s value to fall within the confidence interval 95% of the time. When analyzing the prediction’s confidence intervals, the intervals show where the mu is expected to fall 95% of the time.
Typically, we can use the predict() function to calculate confidence intervals. Unfortunately, the various models that run beta regressions do not allow us to create confidence intervals using the predict function so we calculate these by hand below and by bootstrapping.
The values from our summary model are utilized to create the confidence intervals for the individual beta values.
First, we’ll access the individual beta values from the summary table as well as the standard error values from the summary table.
beta0 <- sum_coastal_ecosystem_mod$coefficients$cond[1]
beta1 <- sum_coastal_ecosystem_mod$coefficients$cond[2]
beta2 <- sum_coastal_ecosystem_mod$coefficients$cond[3]
beta3_mangroves <- sum_coastal_ecosystem_mod$coefficients$cond[4]
beta3_saltmarsh <- sum_coastal_ecosystem_mod$coefficients$cond[5]
beta3_seagrass <- sum_coastal_ecosystem_mod$coefficients$cond[6]
beta3_warmwater_coral <- sum_coastal_ecosystem_mod$coefficients$cond[7]
se <- sum_coastal_ecosystem_mod$coefficients$cond[, "Std. Error"]Confidence intervals are then calculated by subtracting 1.96 times the standard error for the lower bound and by adding 1.96 times the standard error for the upper bound for each beta value.
# Beta 0 confidence intervals
beta0_CI_lwr <- beta0 - (1.96* se[1])
beta0_CI_upr <- beta0 + (1.96* se[1])
# Beta 1 confidence intervals
beta1_CI_lwr <- beta1 - (1.96* se[2])
beta1_CI_upr <- beta1 + (1.96* se[2])
# Beta 2 confidence intervals
beta2_CI_lwr <- beta2 - (1.96* se[3])
beta2_CI_upr <- beta2 + (1.96* se[3])
# Beta 3 mangrove confidence intervals
beta3_man_CI_lwr <- beta3_mangroves - (1.96* se[4])
beta3_man_CI_upr <- beta3_mangroves + (1.96* se[4])
# Beta 3 saltmarsh confidence intervals
beta3_salt_CI_lwr <- beta3_saltmarsh - (1.96* se[5])
beta3_salt_CI_upr <- beta3_saltmarsh + (1.96* se[5])
# Beta 3 seagrass confidence intervals
beta3_seag_CI_lwr <- beta3_seagrass - (1.96* se[6])
beta3_seag_CI_upr <- beta3_seagrass + (1.96* se[6])
# Beta 3 confidence intervals
beta3_wwc_CI_lwr <- beta3_warmwater_coral - (1.96* se[7])
beta3_wwc_CI_upr <- beta3_warmwater_coral + (1.96* se[7])Finally, we create a data frame with the beta’s confidence interval values.
CI_df <- tibble(
Variable = c("Intercept", "GDP", "Total Area", "Mangroves", "Saltmarsh", "Seagrass", "Warm-water Coral"),
BetaValues = c(beta0, beta1, beta2, beta3_mangroves, beta3_saltmarsh, beta3_seagrass, beta3_warmwater_coral),
LowerCI = c(beta0_CI_lwr, beta1_CI_lwr, beta2_CI_lwr, beta3_man_CI_lwr, beta3_salt_CI_lwr, beta3_seag_CI_lwr, beta3_wwc_CI_lwr),
UpperCI =c(beta0_CI_upr, beta1_CI_upr, beta2_CI_upr, beta3_man_CI_upr, beta3_salt_CI_upr, beta3_seag_CI_upr, beta3_wwc_CI_upr)
)
CI_df |>
kable() |>
kable_styling()| Variable | BetaValues | LowerCI | UpperCI |
|---|---|---|---|
| Intercept | -1.6119807 | -1.8822542 | -1.3417072 |
| GDP | 0.0000185 | 0.0000113 | 0.0000256 |
| Total Area | 0.0000033 | -0.0000200 | 0.0000266 |
| Mangroves | 0.9615269 | 0.5963429 | 1.3267109 |
| Saltmarsh | 1.0816634 | 0.7136663 | 1.4496604 |
| Seagrass | 0.5545267 | 0.2257552 | 0.8832982 |
| Warm-water Coral | 0.7405975 | 0.3620662 | 1.1191287 |
These confidence intervals are ranges specific to the individual predictors and show the confidence interval only based on that one variable.
The per capita GDP confidence interval range is positive showing that proportion protected area is positively correlated with per capita GDP.
The total area per ecosystem confidence interval includes zero in its range. This signifies that total area per ecosystem does not have an effect on proportion protected area. This confidence interval analysis also supports and is supported by the p-value for total area per ecosystem being greater than 0.05 and not statistically significant as denoted in the beta regression summary table above.
The ecosystem’s beta values’ confidence interval ranges are all positive and do not include zero. R uses the first ecosystem alphabetically as the reference. In this case the reference is cold-water coral. The confidence intervals are then constructed for mangroves, salt marshes, sea grasses, and warm-water coral. The confidence interval ranges indicate, similarly to the prediction lines calculated above, that mangroves, salt marshes, sea grasses, and warm-water coral are more likely to be protected than cold-water corals. The confidence intervals for mangroves, salt marshes, sea grasses, and warm-water coral overlap. This signifies the differences between mangroves, salt marshes, sea grasses, and warm-water coral are likely not significant but additional analysis would be necessary to confirm this statement.
We can calculate the confidence intervals for predictions by constructing bootstraps. Bootstraps are run by sampling our original data with replacement. Since we have 446 rows of data in our data frame, we choose to select 446 samples in the boot per bootstrap calculated for consistency. Below we are running a bootstrap 1000 times.
# reiterate through the boot 1000 times
boot <- map_dfr(1:1000, \(i) {
# sample with replacement
boot_sample <- slice_sample(coastal_ecosystem, n=446, replace = TRUE)
# run a beta regression on every single boot sample
boot_mod <- glmmTMB(adj_proportion_protected ~ pc_gdp_2016 + total_area + ecosystem,
family = beta_family(link = "logit"),
data = boot_sample)
# predict the boots values (the outcomes are best fit lines)
mutate(pred_grid,
adj_prop_protected = predict(object = boot_mod,
newdata = pred_grid,
type = "response"),
boot = i)
})Then we create a data frame calculating the confidence intervals from the bootstrapped values.
Let’s plot the confidence intervals for the predictions differentiating by ecosystems.
ggplot(coastal_model_pred, aes(x = pc_gdp_2016)) +
geom_ribbon(data = boot_ci,
aes(ymin = adj_prop_protected_lwr,
ymax = adj_prop_protected_upr,
fill = ecosystem),
alpha = 0.2) +
scale_color_manual(name = "Ecosystem",
values = c(coldwater_coral = "#D51977",
mangroves = "forestgreen",
saltmarshes = "#273E47",
seagrasses = "#9BC1BC",
warm_coral = "#FEA82F"),
labels = c("Cold-water coral",
"Mangroves",
"Salt marsh",
"Seagrass",
"Warm-water coral")) +
scale_fill_manual(name = "Ecosystem",
values = c(coldwater_coral = "#D51977",
mangroves = "forestgreen",
saltmarshes = "#273E47",
seagrasses = "#9BC1BC",
warm_coral = "#FEA82F"),
labels = c("Cold-water coral",
"Mangroves",
"Salt marsh",
"Seagrass",
"Warm-water coral")) +
geom_line(aes(color = ecosystem, y = predict)) +
labs(x= "Per Capita GDP ($USD)",
y = "Percent protected",
title = print("2016 Percent Protected Areas by Per Capita GDP")) +
theme_minimal()[1] "2016 Percent Protected Areas by Per Capita GDP"
The confidence intervals constructed based on the predictions show the variation for the different ecosystems based on all of the predictors confounding effects. The confidence intervals for the mangroves, salt marshes, sea grasses, and warm-water corals are all overlapping which supports the findings that the beta values confidence intervals are also overlapping as stated above. The effect of per capita GDP on the protection of ecosystems do not greatly differ between the ecosystems with the exception of cold-water coral.
Cold-water coral is less likely to be protected than the other ecosystems. At approximately $50,000 USD per capita GDP, the cold-water coral confidence interval overlaps with the sea grass and warm-water coral confidence intervals. The overlap in confidence intervals means that the difference between the expected mu values for warm-water corals, sea grass, and cold-water coral are not significantly different when per capita GDP is 50,000 (USD) and up.
Per capita GDP has a positive effect on proportion protected area regardless of the ecosystem type. Cold-water coral is less likely to be protected than mangroves, salt marshes, sea grass, and warm-water coral. The difference in expected proportion protected area between cold-water coral and the other analyzed ecosystems is reduced at approximately $50,000 USD. The difference in proportion of protected areas not being high between the ecosystems makes sense logically since a nation that protects some of its ecosystems is more likely to protect their other ecosystems.
The repository containing all of the code for this analysis can be found here.
Brooks, M., Kristensen K., Maechler M., Magnusson A., McGillycuddy M., Skaug H., Nielsen A., Berg C., van Bentham K., Sadat N., Lüdecke D., Lenth R., O’Brien J., Geyer C.J., Jagan M., Wiernik B., Stouffer D.B., Agronah M., Akdur H.T.K., Bové D.S., and Krieger N. (Oct. 9, 2025). Package ‘glmmTMB’ (CRAN). Available at: https://cran.r-project.org/web/packages/glmmTMB/glmmTMB.pdf
Czapanskiy, M. and Grimes N. EDS 222: Statistics for Environmental Data Science. Available at: https://eds-222-stats-f25.github.io/
IMF. (2023). World Economic Outlook, October 2023. Washington, DC: International Monetary Fund. ©IMF. https://doi.org/10.5089/9798400235801.081 Available at: https://www.imf.org/external/datamapper/NGDPDPC@WEO/OEMDC/ADVEC/WEOWORLD
UNEP-WCMC. (2025). Ocean+ Habitats [On-line], [Downloaded: October 2025]. Available at: habitats.oceanplus.org
@online{rodas2025,
author = {Rodas, Sofia},
title = {Per {Capita} {GDP} {Influence} on the {Protection} of
{Coastal} {Ecosystems}},
date = {2025-12-11},
url = {https://sofiiir.github.io/blog/gdp-influence-on-coastline-protection-blog/},
langid = {en}
}