Thetazero Pubs
JoshuaSlocum

Synopsis

Severe weather poses a major concern for both economic and public health outcomes. Different types of events have different consequences, and require different methods of damage prevention, mitigation and recovery. Thus, analyzing the historical outcomes of different types of weather events can help us assess what steps need to be taken to prepare for different severe weather events.


In this analysis we ask two main questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

To conduct this analysis we will use data from the US NOAA storm database. All code to conduct the analysis is provided in this document.

Data Processing

Data Source Background Info

The data used in this analysis come from the US NOAA storm database, which contains data on events from 1950 to November 2011. Records from the earlier years are sparse, while later years can be considered more complete. The database tracks several characteristics of major storms and weather events, including fatalities, injuries, property damage, location, and time of occurrence.

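The data (a 47 MB bzip2-compressed CSV file) can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.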

For more information, please see the source documentation for the NOAA storm database.


Loading the Data

This section will describe how to download the data directly into the working directory and then load it into the workspace as a data frame.

  1. Download the Data and save it as StormData.csv.bz2.
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","StormData.csv.bz2",method="curl")
  2. Load the Data into the workspace as a data frame called StormData.
StormData <- read.csv("StormData.csv.bz2")
  3. Check the Data: How many rows and columns?
paste("There are ", dim(StormData)[1], " rows and ", dim(StormData)[2], " columns in the dataset.")
## [1] "There are  902297  rows and  37  columns in the dataset."
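Because the file is large, it can be convenient to skip the download when a local copy already exists. The following is a minimal sketch of that idea, not part of the original analysis: the helper name fetch_if_missing and its downloader argument are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper (not in the original analysis): download a file only
# when it is not already present in the working directory.
fetch_if_missing <- function(url, dest, downloader = download.file) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed file intact on all platforms
    downloader(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Example call (commented out so the sketch runs without network access):
# fetch_if_missing(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "StormData.csv.bz2")
```

The downloader argument defaults to download.file and exists only so the helper can be exercised without a network connection.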

Rescaling Economic Variables

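The relevant variables for economic damages are PROPDMG, PROPDMGEXP (which gives the magnitude of PROPDMG), CROPDMG, and CROPDMGEXP (which gives the magnitude of CROPDMG). Since observations are recorded at different magnitudes (thousands, millions or billions of dollars), we first rescale them all to thousands of dollars. Values with any other exponent code are left unscaled, that is, they are treated as if they were already in thousands. We store the new values in PDMGK and CDMGK respectively, which are the event damages in thousands of dollars.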

# Initialize as 0's
StormData$PDMGK <- 0
StormData$CDMGK <- 0

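# Function to convert values to thousands of dollars
#   x: damage amounts, y: exponent codes, z: vector that receives the results
convert <- function(x,y,z){
  r <- z
  n <- length(x)
  out <- 0
  for(i in seq_len(n)){
    if(tolower(y[i]) == 'k'){out <- x[i]}               # already in thousands
    else if(tolower(y[i]) == 'm'){out <- x[i]*1000}     # millions to thousands
    else if(tolower(y[i]) == 'b'){out <- x[i]*1000000}  # billions to thousands
    else{out <- x[i]}                                   # any other code: left unscaled
    r[i] <- out
  }
  return(r)
}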

# Convert the values
StormData$PDMGK <- with(StormData, convert(PROPDMG, PROPDMGEXP, PDMGK))
StormData$CDMGK <- with(StormData, convert(CROPDMG, CROPDMGEXP, CDMGK))
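Looping over roughly 900,000 rows in R can be slow. As an alternative, the same rescaling rule can be written with a vectorized lookup. This is only a sketch under the same assumptions as the loop above (k, m and b are scaled; every other code is left unscaled). The function name to_thousands is hypothetical and is not used elsewhere in this document.

```r
# Hypothetical vectorized equivalent of the conversion loop.
to_thousands <- function(value, exp_code) {
  # Multipliers that bring each recognized magnitude to thousands of dollars
  multiplier <- c(k = 1, m = 1e3, b = 1e6)
  m <- multiplier[tolower(as.character(exp_code))]
  # Unrecognized or empty codes yield NA in the lookup; leave those unscaled
  m[is.na(m)] <- 1
  unname(value * m)
}

# Small illustration on made-up values
to_thousands(c(25, 2.5, 1, 10), c("K", "M", "B", ""))
```

With this helper, the two conversion calls could be expressed as to_thousands(PROPDMG, PROPDMGEXP) and to_thousands(CROPDMG, CROPDMGEXP) inside with(StormData, ...), with no need to initialize the result columns first.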

Creating Datasets For Analysis

Now that we have re-scaled the economic variables, we create two subsets of the data. The first will be used to analyze the health impacts (h.data), and the second the economic impacts (e.data).

For the health data, we will use the sum of FATALITIES and INJURIES by EVTYPE to judge impact, so we aggregate the StormData across EVTYPE. This helps us better measure overall impact since some events may result in localized deaths (like extreme cold and heat), but others cause widespread injuries and fatalities (like Tornadoes and Hurricanes).

h.data <- aggregate(FATALITIES+INJURIES~EVTYPE, StormData, FUN=sum)
names(h.data) <- c("EVTYPE", "Total")

For the economic data, we will use the sum of our newly created PDMGK and CDMGK variables. That is, we will judge the economic impact of an event by the sum of its property and crop damages (in thousands of dollars). Since both variables are measured in dollars we can sum them, and we look at their sum because both types of damage have profound effects on the economy, both locally and nationally. Crop damage is especially important because many crops cannot be replaced quickly, and reductions in food supply cause prices to spike.

e.data <- aggregate(PDMGK+CDMGK~EVTYPE, StormData, FUN=sum)
names(e.data) <- c("EVTYPE", "Total")

Since there are many event types, and since we are only interested in the most impactful ones, we sort both newly created datasets by Total impact in decreasing order and keep only the top 10 event types.

h.data <- head(h.data[order(h.data$Total, decreasing = TRUE),], 10)
e.data <- head(e.data[order(e.data$Total, decreasing = TRUE),], 10)
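To make the aggregate, sort and truncate steps concrete, here is a small sketch on made-up data. The toy data frame and its values are invented for illustration and do not come from the NOAA database; only the pattern mirrors the code above.

```r
# Made-up events (not real NOAA records)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 4, 1),
  INJURIES   = c(10, 2, 5, 0, 0)
)

# Sum fatalities plus injuries within each event type
toy.h <- aggregate(FATALITIES + INJURIES ~ EVTYPE, toy, FUN = sum)
names(toy.h) <- c("EVTYPE", "Total")

# Sort by Total, largest first, and keep the top 2 event types
toy.top <- head(toy.h[order(toy.h$Total, decreasing = TRUE), ], 2)
toy.top
```

In this toy example TORNADO (total 18) ranks first and HEAT (total 4) second, while FLOOD (total 3) is dropped.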

Results

Now that we have made some adjustments to the data we can look at the results.

Most Harmful Events to Population Health

The figure below is a bar plot of Total health impact, which is the sum of FATALITIES and INJURIES for each event type across all years and states. We see that the most harmful event type with respect to human health is the tornado, with 96,979 total fatalities and injuries. The second most impactful event type is excessive heat, with 8,428 fatalities and injuries.

library(ggplot2)
ggplot(data=h.data, aes(x=reorder(EVTYPE,Total), y=Total)) +
    labs(title="Impact on Human Health",y="Total Health Impact",x="Event Type") +
    geom_bar(stat="identity", fill="#AD3333") +
    geom_text(data=h.data, aes(x=EVTYPE,y=Total,label=Total),vjust=0) +
    theme(axis.text.x = element_text(angle = 45, hjust=1))

[Figure: bar plot "Impact on Human Health" (chunk health.plot)]

Most Harmful Events to the Economy

The figure below is a bar plot of Total economic impact, which is the sum of property and crop damages in thousands of dollars. We see from the plot that Flood has the greatest economic impact, with about double the impact of Hurricane/Typhoon. This is unsurprising: despite the well-known force of hurricanes, floods occur far more frequently and tend to recur in the same locations (i.e., fertile river valleys). These locations are well suited to farming (hurricanes typically hit the coast, which is less suited to farming) but are also flood-prone. Floods also cause significant property damage by sweeping away houses, cars, roads and other infrastructure.

ggplot(data=e.data, aes(x=reorder(EVTYPE,Total), y=Total)) +
    labs(title="Impact on the Economy",y="Total Economic Impact",x="Event Type") +
    geom_bar(stat="identity", fill="#58A158") +
    theme(axis.text.x = element_text(angle = 45, hjust=1))

[Figure: bar plot "Impact on the Economy" (chunk econ.plot)]

Below is the economic impact data used to generate the chart. Totals are in thousands of dollars.

e.data
##                EVTYPE     Total
## 170             FLOOD 150319685
## 411 HURRICANE/TYPHOON  71913713
## 834           TORNADO  57352572
## 670       STORM SURGE  43323541
## 244              HAIL  18758572
## 153       FLASH FLOOD  17562686
## 95            DROUGHT  15018672
## 402         HURRICANE  14610229
## 590       RIVER FLOOD  10148404
## 427         ICE STORM   8967091
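Since the totals are in thousands of dollars, dividing by one million expresses them in billions of dollars, which is easier to read. The short sketch below applies this to the top two rows of the table; the variable names totals_k, billions and ratio are introduced here for illustration only.

```r
# Top two totals from the table above, in thousands of dollars
totals_k <- c(FLOOD = 150319685, "HURRICANE/TYPHOON" = 71913713)

# Thousands of dollars divided by 1e6 gives billions of dollars
billions <- round(totals_k / 1e6, 1)
billions

# Ratio of flood damage to hurricane/typhoon damage
ratio <- round(totals_k[["FLOOD"]] / totals_k[["HURRICANE/TYPHOON"]], 2)
ratio
```

This gives about 150.3 billion dollars for FLOOD and 71.9 billion dollars for HURRICANE/TYPHOON, a ratio of roughly 2.09.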