Project 1 - "Just Breathe"

View Project on Shinyapps

Please note that the edges of this video were cut off during the transcoding process. My slider for selecting the year and the menus for selecting state and county are offscreen to the left.


This is a visualization of air quality data collected by the EPA throughout the United States from 1980 to 2018. For the selected state, county, and year, we show the proportion of that year's days with an air quality index that is good, moderate, unhealthy for sensitive groups, unhealthy, very unhealthy, or hazardous, as well as the absolute number of days that fall into each of these categories. We also show the proportion and absolute number of days on which each of the following pollutants was the most prevalent: carbon monoxide, nitrogen dioxide, ozone, particulate matter with a size of 2.5 microns, particulate matter with a size of 10 microns, and sulfur dioxide. For the selected state and county across all years, we show the maximum, 90th percentile, and median air quality index, as well as the proportion of days on which each of the measured pollutants was the most prevalent. We also show the location of the county on an interactive map.
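As a rough illustration of how the per-category proportions relate to the absolute day counts, here is a minimal sketch. The counts are made up, and the category labels are assumed to follow the EPA's standard AQI categories; none of this is taken from the application's source.

```r
# Hypothetical day counts for one county and year (made-up numbers).
day_counts <- c(
  "Good" = 200,
  "Moderate" = 120,
  "Unhealthy for Sensitive Groups" = 30,
  "Unhealthy" = 10,
  "Very Unhealthy" = 4,
  "Hazardous" = 1
)

# Proportion of measured days that fall into each category.
day_proportions <- day_counts / sum(day_counts)

# Show the absolute counts and the proportions side by side.
print(data.frame(days = day_counts, proportion = round(day_proportions, 3)))
```

Dividing by the number of days with a measurement, rather than by 365, keeps the proportions summing to one for counties that did not report every day.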

Data and Transformations

This visualization presents data from the Environmental Protection Agency's air quality measurements. In particular, it presents information about Air Quality Index (AQI) measurements, as well as on how many days of the given year each of the measured pollutants was the most prevalent. The data is broken down by county, nationwide.

I read in the air quality data from the CSV file for each year and combined the results row-wise into a single data frame.
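The row-wise combination could look roughly like the following sketch. The helper names (`read_year`, `read_all_years`), the `data` directory, and the file naming pattern are assumptions for illustration, not the names used in the actual source.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: read the CSV for a single year.
# The file naming pattern here is an assumption.
read_year <- function(year, dir = "data") {
  path <- file.path(dir, paste0("annual_aqi_by_county_", year, ".csv"))
  read.csv(path, stringsAsFactors = FALSE)
}

# Read every requested year and stack the data frames row-wise.
read_all_years <- function(years, dir = "data") {
  years %>%
    map(read_year, dir = dir) %>%
    bind_rows()
}
```

Under these assumptions, `read_all_years(1980:2018)` would return one data frame with a row per county and year.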

To find the location of each county, I read the locations of EPA sensor stations from the linked CSV. Since the precise locations of these stations are not relevant to the visualization, and since several stations are listed per county, many of which are inactive, I grouped together all stations for each county (using the dplyr library) and calculated the mean latitude and longitude. This provides a coarse location for each county.
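The grouping step can be sketched with dplyr as follows. The toy `stations` data frame and its column names are assumptions standing in for the real station list. Grouping by state as well as county is my choice here, since the same county name can appear in more than one state.

```r
library(dplyr)

# Toy stand-in for the station list; values and column names are illustrative.
stations <- data.frame(
  State     = c("Illinois", "Illinois", "Illinois", "Ohio"),
  County    = c("Cook", "Cook", "Lake", "Lake"),
  Latitude  = c(41.8, 42.0, 42.3, 41.7),
  Longitude = c(-87.6, -87.8, -87.9, -81.2)
)

# Average the coordinates of all stations in each county
# to get one coarse location per county.
county_locations <- stations %>%
  group_by(State, County) %>%
  summarise(
    Latitude  = mean(Latitude, na.rm = TRUE),
    Longitude = mean(Longitude, na.rm = TRUE),
    .groups   = "drop"
  )
```

A mean of station coordinates is not a true geographic centroid, but it is enough to place a marker on a map.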

I then performed transformations on the data on a visualization-by-visualization basis:

Setup Instructions

Source Code

This visualization requires a working R environment. First, install RStudio as appropriate for your platform. Then, from the R console, install the packages "shiny", "ggplot2", "dplyr", "purrr", and "leaflet". It should now be possible to run the Shiny application by clicking the "Run App" button in RStudio. You may wish to run "downloadData.r" to download fresh copies of the CSV data. On Ubuntu, one dependency, st, required me to add the PPA "ubuntugis/ubuntugis-unstable" and install the packages "libudunits2-dev", "libgdal-dev", "libgeos-dev", and "libproj-dev". However, I now suspect that this was unnecessary.
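For reference, the package installation and launch steps can also be run from the R console. This is a minimal sketch that assumes the working directory is the project folder containing the app.

```r
# One-time step: install the packages named above.
install.packages(c("shiny", "ggplot2", "dplyr", "purrr", "leaflet"))

# Optional: refresh the CSV data before launching.
# source("downloadData.r")

# Launch the Shiny app found in the current working directory.
shiny::runApp()
```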



Overall, this picture is hopeful. The proportion of days categorized as "Good Days" increases significantly into the 2010s. Rural areas generally have better air quality than urban areas, but even urban areas show improvement. In rural areas, ozone is the most significant pollutant, while urban areas have more carbon monoxide and nitrogen dioxide. It is difficult to find any days categorized as "Hazardous"; many of those that do occur are in the West and Southwest, presumably due to fire. The highest AQIs of all come from California, the only state to have an AQI above 10,000 during the given period, which makes sense given the state's history of wildfires.