1 Overview

This page presents an illustrative R workflow for visualizing county-level diabetes prevalence in New York State (NYS) using 2021 NYS BRFSS data. The examples include an interactive county-level map, a bubble map, and a county ranking plot created using the R packages leaflet, ggplot2, sf, and tigris.

2 NYS BRFSS county-level 2021 diabetes data

The NYS BRFSS county-level 2021 diabetes data are obtained from the NYS Open Data API in .csv format.

# load packages
library(tidyverse)
library(readr)
library(sf)
library(tigris)
library(scales)
library(forcats)
library(ggplot2)
library(leaflet)
options(tigris_use_cache = TRUE)

# save the brfss csv api link
brfss_url_csv = "https://health.data.ny.gov/resource/jsy7-eb4n.csv?$limit=50000"

tmp_csv = tempfile(fileext = ".csv")

# download brfss data to a temporary csv file
download.file(
  url = brfss_url_csv,
  destfile = tmp_csv,
  mode = "wb",
  quiet = TRUE,
  method = "libcurl"
)

# read the downloaded brfss csv file
brfss_all = read_csv(
  tmp_csv,
  show_col_types = FALSE,
  progress = FALSE
)

# define a helper function to standardize county names
clean_county = function(x){
  x %>%
    str_remove(" County$") %>%
    str_replace_all("\\.", "") %>%
    str_squish() %>%
    str_to_title()
}

# filter to 2021 county-level diabetes records and clean fields
brfss_county = brfss_all %>%
  filter(
    health_indicator_short_name == "Diabetes",
    geography == "County",
    as.character(years) == "2021"
  ) %>%
  transmute(
    county = clean_county(region_county),
    sample_n = sample_size_n,
    rate_num = unadjusted_rate,
    ci_low_num = cl_95_lower_unadjusted,
    ci_high_num = cl_95_upper_unadjusted
  ) %>%
  mutate(
    prev_rate = rate_num / 100,
    ci_low = ci_low_num / 100,
    ci_high = ci_high_num / 100
  ) %>%
  filter(!is.na(prev_rate))

# download and prepare nys county boundary data
nys_sf = counties(state = "NY", cb = TRUE, resolution = "5m") %>%
  st_transform(crs = 4326) %>%
  transmute(
    county = clean_county(NAME),
    geometry = geometry
  )

# join brfss county data to the nys map data
nys_map = nys_sf %>%
  left_join(brfss_county, by = "county")

# calculate county centroid coordinates for the bubble map
nys_centroids = nys_map %>%
  st_point_on_surface() %>%
  mutate(
    lon = st_coordinates(.)[, 1],
    lat = st_coordinates(.)[, 2]
  ) %>%
  st_drop_geometry()

3 Code Availability

The data access and preparation code is shown above for transparency. The visualization code used to generate the maps and ranking plot is not displayed in the rendered page, but it is available upon request.

4 Maps

Here, three figures are created:

Interative map of county-level diabetes prevalence
Bubble map showing prevalence and sample size
Ranking plot by county

4.1 Interative map

4.2 Bubble map

4.3 Ranking plot

5 Contact

Questions, suggestions, and collaboration inquiries are welcome. Please feel free to contact me if you are interested in adapting this workflow to other public health datasets or research settings.

Eun Jeong Oh, PhD
Assistant Professor of Biostatistics, Research Intelligence
Northwell Health
GitHub: https://github.com/oheunj
Email: eoh2 [at] northwell [dot] edu

NYS BRFSS county-level diabetes prevalence map using R

Oh, Eun Jeong

2026-05-21