This page presents an illustrative R workflow for visualizing county-level diabetes prevalence in New York State (NYS) using 2021 NYS BRFSS data. The examples include an interactive county-level map, a bubble map, and a county ranking plot created using the R packages leaflet, ggplot2, sf, and tigris.
The NYS BRFSS county-level 2021 diabetes data are obtained from the NYS Open Data API in .csv format.
# load packages
library(tidyverse)
library(readr)
library(sf)
library(tigris)
library(scales)
library(forcats)
library(ggplot2)
library(leaflet)
options(tigris_use_cache = TRUE)
# save the brfss csv api link
brfss_url_csv = "https://health.data.ny.gov/resource/jsy7-eb4n.csv?$limit=50000"
tmp_csv = tempfile(fileext = ".csv")
# download brfss data to a temporary csv file
download.file(
url = brfss_url_csv,
destfile = tmp_csv,
mode = "wb",
quiet = TRUE,
method = "libcurl"
)
# read the downloaded brfss csv file
brfss_all = read_csv(
tmp_csv,
show_col_types = FALSE,
progress = FALSE
)
# define a helper function to standardize county names
clean_county = function(x){
x %>%
str_remove(" County$") %>%
str_replace_all("\\.", "") %>%
str_squish() %>%
str_to_title()
}
# filter to 2021 county-level diabetes records and clean fields
brfss_county = brfss_all %>%
filter(
health_indicator_short_name == "Diabetes",
geography == "County",
as.character(years) == "2021"
) %>%
transmute(
county = clean_county(region_county),
sample_n = sample_size_n,
rate_num = unadjusted_rate,
ci_low_num = cl_95_lower_unadjusted,
ci_high_num = cl_95_upper_unadjusted
) %>%
mutate(
prev_rate = rate_num / 100,
ci_low = ci_low_num / 100,
ci_high = ci_high_num / 100
) %>%
filter(!is.na(prev_rate))
# download and prepare nys county boundary data
nys_sf = counties(state = "NY", cb = TRUE, resolution = "5m") %>%
st_transform(crs = 4326) %>%
transmute(
county = clean_county(NAME),
geometry = geometry
)
# join brfss county data to the nys map data
nys_map = nys_sf %>%
left_join(brfss_county, by = "county")
# calculate county centroid coordinates for the bubble map
nys_centroids = nys_map %>%
st_point_on_surface() %>%
mutate(
lon = st_coordinates(.)[, 1],
lat = st_coordinates(.)[, 2]
) %>%
st_drop_geometry()
The data access and preparation code is shown above for transparency. The visualization code used to generate the maps and ranking plot is not displayed in the rendered page, but it is available upon request.
Here, three figures are created:
Questions, suggestions, and collaboration inquiries are welcome. Please feel free to contact me if you are interested in adapting this workflow to other public health datasets or research settings.
Eun Jeong Oh, PhD
Assistant Professor of Biostatistics, Research Intelligence
Northwell Health
GitHub: https://github.com/oheunj
Email: eoh2 [at] northwell [dot] edu