Thetazero Pubs


Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.
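As a small illustration of why no separate decompression step is needed: R's `read.csv` accepts a file path, and file connections opened for reading detect bzip2 compression on their own. The sketch below round-trips a toy table through a temporary `.bz2` file; the rows are made up for the example and are not taken from the storm database.

```r
# Write a tiny made-up table to a bzip2-compressed CSV, then read it back.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))
path <- tempfile(fileext = ".csv.bz2")

con <- bzfile(path, open = "w")   # compressing connection
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path as a file connection, which recognises the
# compression from the file header, so no explicit bunzip2 step is needed.
back <- read.csv(path, stringsAsFactors = FALSE)
print(back)
```

The same call works unchanged on an uncompressed CSV, which is why the loading code can treat the compressed file like any other.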

The database is from

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
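One way to check how the number of recorded events changes over time is to count events per year. The sketch below assumes a date column named `BGN_DATE` holding strings such as `4/18/1950 0:00:00`; both the column name and the date format are assumptions about the file, and the rows are made up.

```r
# Made-up rows standing in for the storm table (assumed column BGN_DATE).
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
               "11/28/2011 0:00:00", "3/1/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the date part (the trailing time is ignored) and extract the year.
dates <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")
toy$year <- as.integer(format(dates, "%Y"))

# Number of recorded events per year.
events.per.year <- table(toy$year)
print(events.per.year)
```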


The CSV data are loaded first and subset to the necessary variables: FATALITIES, INJURIES, PROPDMG (property damage) and CROPDMG (crop damage), together with their exponent codes. Next, event groups are formed using keywords in the event type. The damage amounts are then decoded using the exponent codes, and the variables are aggregated by group.
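The keyword grouping can be sketched on a handful of event labels. The labels below are made up for illustration (the last-but-one is a hypothetical label chosen to match two patterns); the patterns and group names follow the ones used in this report.

```r
# Made-up event type labels, including mixed case and an overlapping label.
evtype <- c("Heavy Rain", "FLASH FLOOD", "TORNADO", "Excessive Heat",
            "RAIN AND TORNADO", "HIGH WIND")

edited <- toupper(evtype)              # make matching case-insensitive
group <- rep("OTHER", length(edited))  # default group

# Later assignments overwrite earlier ones, so the order of these lines
# decides the group of a label that matches several patterns.
group[grepl("HEAT|DROUGHT", edited)] <- "DROUGHT..."
group[grepl("FLOOD|RAIN", edited)] <- "FLOOD..."
group[grepl("TORNADO", edited)] <- "TORNADO"

print(data.frame(evtype, group))
```

Because the TORNADO assignment runs last, a label that mentions both rain and a tornado ends up in the TORNADO group.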

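library(data.table)
library(reshape2)  # melt(), used for the plots
library(ggplot2)

# read.csv() reads the bzip2-compressed file directly
storm.csv <- "repdata-data-StormData.csv.bz2"
raw.ds <- read.csv(storm.csv)
# data.table is assumed here; the list(...) column selection below relies on it
storm.table <- data.table(raw.ds)
# Keep only events with recorded harm or damage, and only the columns used later
work.table <- storm.table[storm.table$FATALITIES > 0 | storm.table$INJURIES > 
    0 | storm.table$PROPDMG > 0 | storm.table$CROPDMG > 0, list(EVTYPE, FATALITIES, 
    INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)]
# Lookup table from raw event type to event group; later matches overwrite earlier ones
EVTYPE <- data.table(EVTYPE = sort(unique(storm.table$EVTYPE)))
EVTYPE$edited <- toupper(EVTYPE$EVTYPE)
EVTYPE$group <- "OTHER"
EVTYPE$group[grep("HEAT|DROUGHT", EVTYPE$edited)] <- "DROUGHT..."
EVTYPE$group[grep("FLOOD|RAIN", EVTYPE$edited)] <- "FLOOD..."
EVTYPE$group[grep("TORNADO", EVTYPE$edited)] <- "TORNADO"
# Multipliers for the exponent codes in PROPDMGEXP and CROPDMGEXP
multiplier <- c(1, 0, 0, 0, 1, 10, 100, 1000, 10000, 1e+05, 1e+06, 1e+07, 1e+08, 
    100, 100, 1000, 1000, 1e+06, 1e+06, 1e+09, 1e+09)
names(multiplier) <- c("", "-", "?", "+", "0", "1", "2", "3", "4", "5", "6", 
    "7", "8", "h", "H", "k", "K", "m", "M", "b", "B")
work.table$propertyDamage <- work.table$PROPDMG * multiplier[as.character(work.table$PROPDMGEXP)]
work.table$cropDamage <- work.table$CROPDMG * multiplier[as.character(work.table$CROPDMGEXP)]
# Codes without a match give NA and are counted as 0; note that an empty code ("")
# never matches a name in R, so it also ends up as 0 here
work.table$propertyDamage[is.na(work.table$propertyDamage)] <- 0
work.table$cropDamage[is.na(work.table$cropDamage)] <- 0
# Attach the group to each event and sum the four measures per group
byGroup <- aggregate(cbind(FATALITIES, INJURIES, propertyDamage, cropDamage) ~ 
    group, data = merge(as.data.frame(work.table)[, c("EVTYPE", "FATALITIES", 
    "INJURIES", "propertyDamage", "cropDamage")], as.data.frame(EVTYPE)[, c("EVTYPE", 
    "group")], by = "EVTYPE"), sum)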


Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

# Grouped bar chart; ggplot() is used because current ggplot2 no longer accepts stat/position in qplot()
ggplot(melt(byGroup[, c("group", "FATALITIES", "INJURIES")], id.vars = "group"), 
    aes(x = variable, y = value, fill = group)) + 
    geom_col(position = "dodge") + labs(x = "", y = "") + 
    ggtitle("Population Health Effects")

Figure (chunk Result1): bar chart of fatalities and injuries by event group.

Injuries outnumber fatalities. Tornado events dominate.

Question 2: Across the United States, which types of events have the greatest economic consequences?

# Same chart for damage, scaled to billions of dollars
ggplot(melt(byGroup[, c("group", "propertyDamage", "cropDamage")], id.vars = "group"), 
    aes(x = variable, y = value/10^9, fill = group)) + 
    geom_col(position = "dodge") + labs(x = "", y = "") + 
    ggtitle("Economic Consequences (billions of dollars)")

Figure (chunk Result2): bar chart of property and crop damage by event group, in billions of dollars.

Property damage exceeds crop damage. Floods cause the greatest property damage.
