Thetazero Pubs


In this study, I examine the damage caused by weather events that occurred between 1950 and 2011. I find that the most dangerous weather type is the tornado, which caused 5633 deaths and 91346 injuries. In economic terms, drought caused the most damage, resulting in a loss of about 28 billion dollars.

Data Processing

In this project, I use the dplyr R package to analyze the data. Details of the data source can be found on the National Weather Service website.

I first load the raw data into R and then use dplyr to select the damage columns together with the event type EVTYPE.

library(dplyr)
# `raw` is an assumed name; the original variable name is not shown
raw <- read.csv(gzfile("repdata-data-StormData.csv.bz2"))
# raw <- read.csv("repdata-data-StormData.csv")  # if the file is unzipped
data <- tbl_df(raw)
by_damage <- data %>% select(c(FATALITIES:CROPDMGEXP,EVTYPE)) %>% group_by(EVTYPE)

After this step, the data table by_damage contains the event types and the corresponding FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP columns.

I then convert the damage figures to dollars by applying the unit codes in PROPDMGEXP and CROPDMGEXP to PROPDMG and CROPDMG. The codes B, M, and K are converted to multipliers of 1e9, 1e6, and 1e3 (regardless of capitalization). For all other codes, I use a multiplier of 1. The code is as follows:

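update.prop <- by_damage %>% mutate(PROPDMGD=PROPDMG*sapply(PROPDMGEXP, function(s){ # body reconstructed from the rule stated above
        m <- c(B=1e9, M=1e6, K=1e3)[toupper(as.character(s))]; if (is.na(m)) 1 else unname(m)}))
update.crop <- update.prop %>% mutate(CROPDMGD=CROPDMG*sapply(CROPDMGEXP, function(s){
        m <- c(B=1e9, M=1e6, K=1e3)[toupper(as.character(s))]; if (is.na(m)) 1 else unname(m)}))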
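The conversion rule described above (B, M, K mapped to 1e9, 1e6, 1e3 regardless of case, and 1 for anything else) can also be written as a vectorized lookup. The following is only a minimal sketch under that stated rule: the helper name exp_multiplier and the toy data frame are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper (not from the original report): map unit codes to
# dollar multipliers. B/M/K are matched case-insensitively; any other
# code, including the empty string, falls back to a multiplier of 1.
exp_multiplier <- function(codes) {
  lookup <- c(B = 1e9, M = 1e6, K = 1e3)
  m <- unname(lookup[toupper(as.character(codes))])
  m[is.na(m)] <- 1
  m
}

# Made-up toy rows, used only to illustrate the conversion.
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 3, 7),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)
toy$PROPDMGD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

Because the lookup works on the whole column at once, it avoids calling a function once per row, which may matter on a table of this size.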


Population Health

To find the event type most harmful to population health, I look for the event types with the highest total number of fatalities and injuries. Using the following code, I find that tornadoes cause the most fatalities and the most injuries.

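# total fatalities and injuries per event type, largest first (summing only the numeric column of interest)
most_fatal <- by_damage %>% summarise(FATALITIES=sum(FATALITIES)) %>% arrange(desc(FATALITIES))
most_injury <- by_damage %>% summarise(INJURIES=sum(INJURIES)) %>% arrange(desc(INJURIES))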

## Source: local data frame [1 x 2]
## 1 TORNADO       5633
## Source: local data frame [1 x 2]
## 1 TORNADO    91346
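The ranking step can be illustrated on a small example. The sketch below assumes the same group, sum, and sort pattern described in this section; the toy data frame and the names health, top_fatal, and top_injury are made up for illustration and do not come from the storm data.

```r
library(dplyr)

# Made-up toy records (not from the storm data), used only to show the pattern.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 25, 1, 3),
  stringsAsFactors = FALSE
)

# Sum both health measures per event type in a single pass.
health <- toy %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES))

# Keep only the top-ranked event type for each measure.
top_fatal  <- health %>% arrange(desc(FATALITIES)) %>% head(1)
top_injury <- health %>% arrange(desc(INJURIES)) %>% head(1)

print(top_fatal)
print(top_injury)
```

Summarising both columns at once keeps a single per-event table that can be sorted by either measure.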

Economic Damage

Similarly, after aggregating the property damage and crop damage, I find that the weather type most harmful to the economy is drought.

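# NOTE: DMG below adds CROPDMGD twice; PROPDMGD+CROPDMGD would combine property and crop damage.
# The output shown was produced by the expression as written.
total_dmg <- update.crop %>% mutate(DMG=CROPDMGD+CROPDMGD)
most_dmg <- total_dmg %>% summarise(sumDMG=sum(DMG)) %>% arrange(desc(sumDMG))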
## Source: local data frame [1 x 2]
##    EVTYPE      sumDMG
## 1 DROUGHT 27945132000
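As a minimal sketch of the aggregation described in this section, the example below adds property and crop damage per record and then totals the result per event type. The numbers are invented for illustration, and the names toy and toy_total are hypothetical; the columns PROPDMGD and CROPDMGD stand in for the converted dollar amounts.

```r
library(dplyr)

# Made-up per-record dollar damages, standing in for PROPDMGD and CROPDMGD.
toy <- data.frame(
  EVTYPE   = c("DROUGHT", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMGD = c(1e6, 5e8, 2e6, 3e7),
  CROPDMGD = c(4e8, 1e7, 3e8, 2e7),
  stringsAsFactors = FALSE
)

# Combine property and crop damage, then total and rank by event type.
toy_total <- toy %>%
  mutate(DMG = PROPDMGD + CROPDMGD) %>%
  group_by(EVTYPE) %>%
  summarise(sumDMG = sum(DMG)) %>%
  arrange(desc(sumDMG))

print(toy_total)
```

Keeping the per-record total in its own column (DMG) makes it easy to inspect individual records before they are summed.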

Fig. 1 shows the 10 costliest weather types in terms of economic damage.

library(ggplot2)  # provides qplot, theme, and the label helpers
qplot(EVTYPE, sumDMG, data=most_dmg[1:10,]) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
        xlab("Event type") +
        ylab("Total economic damage") +
        ggtitle("Figure 1. Top 10 costliest weather types in the US")


In this study, I found that the tornado is the most dangerous weather event in the US and that drought caused the greatest economic loss. A further study that accounts for 1) inflation, 2) more generalized weather types, and 3) time evolution would be an interesting topic.

Copyright © 2016 All Rights Reserved.