Thetazero Pubs

We studied the impact of weather events across the United States between 1950 and 2011. We used the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database to find which weather events were most harmful to population health and which had the greatest economic consequences. For population health, the storm data records the number of fatalities and the number of injuries caused by each event, and we took both into consideration. To find the events with the greatest economic consequences, we summed all property damage for each event type. We show the top 10 event types for each measure.


Data processing

For data processing we'll use the dplyr and ggplot2 libraries.

library(dplyr)
library(ggplot2)

The data are stored in a bzip2-compressed CSV file, so we read them through a bzfile connection.

storm.data <- read.csv(bzfile('repdata_data_StormData.csv.bz2'))
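As a quick sanity check of the reading step, the following minimal sketch writes a tiny table to a temporary bzip2-compressed CSV file and reads it back the same way. The three rows are invented for illustration; only the column names are taken from this analysis.

```r
# Toy data: invented rows, column names as used in this analysis.
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
    FATALITIES = c(2, 0, 1),
    INJURIES = c(10, 3, 0),
    stringsAsFactors = FALSE
)

# Write the toy data to a temporary bzip2-compressed CSV file.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through bzfile(), as done for the storm data.
toy.read <- read.csv(bzfile(path), stringsAsFactors = FALSE)
str(toy.read)
```

Running str() on the real storm.data in the same way is a cheap check that EVTYPE, FATALITIES, INJURIES, PROPDMG and PROPDMGEXP were read with the expected types.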

Events which are the most harmful to population health

To measure the most harmful events we need to group them by event type and then summarise them to get the total number of fatalities and injuries.

total.health.issues <- storm.data %>%
    group_by(EVTYPE) %>%
    summarise(total.fatalities = sum(FATALITIES), 
              total.injuries = sum(INJURIES))
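To cross-check the grouping step without dplyr, the same per-event totals can be computed with base R's aggregate(). This is only a sketch on invented toy rows, not the pipeline used in this report; the column names match the storm data.

```r
# Toy rows (invented values) with the same column names as the storm data.
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
    FATALITIES = c(2, 0, 3, 1, 0),
    INJURIES = c(10, 3, 7, 0, 1),
    stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type.
toy.totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                        data = toy, FUN = sum)
toy.totals
```

Note that the formula interface of aggregate() drops rows with missing values by default, whereas sum() inside summarise() returns NA for a group that contains one, so the two approaches can differ when the data have gaps.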

We'll order the summarised events by total number of fatalities and plot the top 10 using a bar chart. We need to reorder the event type factor to get a proper ordering of events in the chart.

top.fatalities <- total.health.issues %>% arrange(desc(total.fatalities))
top.fatalities$EVTYPE <- factor(top.fatalities$EVTYPE, levels = as.character(top.fatalities$EVTYPE))
top.fatalities
## Source: local data frame [985 x 3]
## 
##            EVTYPE total.fatalities total.injuries
## 1         TORNADO             5633          91346
## 2  EXCESSIVE HEAT             1903           6525
## 3     FLASH FLOOD              978           1777
## 4            HEAT              937           2100
## 5       LIGHTNING              816           5230
## 6       TSTM WIND              504           6957
## 7           FLOOD              470           6789
## 8     RIP CURRENT              368            232
## 9       HIGH WIND              248           1137
## 10      AVALANCHE              224            170
## ..            ...              ...            ...
ggplot() +
    geom_bar(aes(EVTYPE, total.fatalities),
             data = head(top.fatalities, 10),
             stat = "identity") +
    labs(title = "Top 10 events by total number of fatalities.",
         x = "Event type",
         y = "Total number of fatalities") +
    coord_flip()
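The factor reordering matters because ggplot2 places bars in the order of the factor levels, and a factor built from character data sorts its levels alphabetically by default. The sketch below uses the first three rows of the table above to show the default levels, the levels in row order, and a reversed variant. The reversed variant (the column EVTYPE.rev is a name introduced here for illustration) is an optional alternative: with coord_flip() the first level is drawn at the bottom, so reversing the levels would put the largest bar at the top.

```r
# First three rows of the fatalities table, already sorted in descending order.
top <- data.frame(
    EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD"),
    total.fatalities = c(5633, 1903, 978),
    stringsAsFactors = FALSE
)

# Default factor levels are sorted alphabetically, not by value.
default.levels <- levels(factor(top$EVTYPE))

# Levels in row order, as done in the analysis.
top$EVTYPE <- factor(top$EVTYPE, levels = top$EVTYPE)

# Optional variant: reversed levels, so the largest value comes last.
top$EVTYPE.rev <- factor(top$EVTYPE, levels = rev(levels(top$EVTYPE)))

default.levels
levels(top$EVTYPE)
levels(top$EVTYPE.rev)
```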




Now we'll order the summarised events by total number of injuries and plot the top 10 using a bar chart.

top.injuries <- total.health.issues %>% arrange(desc(total.injuries))
top.injuries$EVTYPE <- factor(top.injuries$EVTYPE, levels = as.character(top.injuries$EVTYPE))
top.injuries
## Source: local data frame [985 x 3]
## 
##               EVTYPE total.fatalities total.injuries
## 1            TORNADO             5633          91346
## 2          TSTM WIND              504           6957
## 3              FLOOD              470           6789
## 4     EXCESSIVE HEAT             1903           6525
## 5          LIGHTNING              816           5230
## 6               HEAT              937           2100
## 7          ICE STORM               89           1975
## 8        FLASH FLOOD              978           1777
## 9  THUNDERSTORM WIND              133           1488
## 10              HAIL               15           1361
## ..               ...              ...            ...
ggplot() +
    geom_bar(aes(EVTYPE, total.injuries),
             data = head(top.injuries, 10),
             stat = "identity") +
    labs(title = "Top 10 events by total number of injuries.",
         x = "Event type",
         y = "Total number of injuries") +
    coord_flip()

Events with the greatest economic consequences

To get the most harmful event types with respect to economic consequences, we will group them by event type and property damage exponent ("K" for thousands, "M" for millions and "B" for billions). We will obtain the total property damage by summing the property damages in each group.

grouped.storm.data <- storm.data %>% 
    group_by(EVTYPE, PROPDMGEXP) %>%
    summarise(DMG = sum(PROPDMG))

For each event type we have at most four groups: thousands, millions, billions, and the rest. We will multiply the damage in each group by its multiplier (1000 for thousands, 1000000 for millions, 1000000000 for billions) and leave the rest unchanged.

# Treat missing damage totals as zero.
grouped.storm.data$DMG[is.na(grouped.storm.data$DMG)] <- 0
# Flag the rows that belong to each exponent group.
K <- grouped.storm.data$PROPDMGEXP == "K"
M <- grouped.storm.data$PROPDMGEXP == "M"
B <- grouped.storm.data$PROPDMGEXP == "B"
# Convert each group to dollars; rows with any other exponent are left unchanged.
grouped.storm.data$DMG[K] <- grouped.storm.data$DMG[K] * 1e3
grouped.storm.data$DMG[M] <- grouped.storm.data$DMG[M] * 1e6
grouped.storm.data$DMG[B] <- grouped.storm.data$DMG[B] * 1e9
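The same conversion can also be written with a lookup vector instead of one logical mask per exponent. This is a minimal sketch on invented toy rows; the name multiplier is hypothetical, and treating every unlisted exponent as a multiplier of 1 is an assumption that mirrors how the code above leaves those rows unchanged.

```r
# Hypothetical lookup table: exponent code -> multiplier in dollars.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Toy rows (invented values) with the same column names as the storm data.
toy <- data.frame(
    PROPDMG = c(25, 2.5, 1, 10),
    PROPDMGEXP = c("K", "M", "B", ""),
    stringsAsFactors = FALSE
)

# Look up the multiplier for each row; codes not in the table give NA.
m <- unname(multiplier[toy$PROPDMGEXP])
# Assumption: any other exponent means the value is already in dollars.
m[is.na(m)] <- 1

# Property damage in dollars.
toy$DMG <- toy$PROPDMG * m
toy
```

With this form, supporting another code only requires adding one entry to the lookup vector.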

Each group now contains the amount of property damages in dollars so we can add them together, order them by the total damage, reorder the event type factor, and plot the top 10 event types using a bar chart.

top.total.dmg <- grouped.storm.data %>% 
    group_by(EVTYPE) %>%
    summarise(TOTALDMG = sum(DMG)) %>%
    arrange(desc(TOTALDMG))
top.total.dmg$EVTYPE <- factor(top.total.dmg$EVTYPE, levels = as.character(top.total.dmg$EVTYPE))
top.total.dmg
## Source: local data frame [985 x 2]
## 
##               EVTYPE     TOTALDMG
## 1              FLOOD 144657709807
## 2  HURRICANE/TYPHOON  69305840000
## 3            TORNADO  56925660790
## 4        STORM SURGE  43323536000
## 5        FLASH FLOOD  16140812067
## 6               HAIL  15727367053
## 7          HURRICANE  11868319010
## 8     TROPICAL STORM   7703890550
## 9       WINTER STORM   6688497251
## 10         HIGH WIND   5270046295
## ..               ...          ...
ggplot(head(top.total.dmg, 10),
       aes(EVTYPE, TOTALDMG)) +
    labs(title = "Top 10 events by total property damage.",
         y = "Total property damage",
         x = "Event type") +
    geom_bar(stat="identity") +
    coord_flip()

Results

We investigated the most harmful weather events with respect to population health and economic consequences.

Measured by total fatalities and total injuries, tornadoes are the most harmful to population health, with 5633 fatalities and 91346 injuries. Other harmful events are heat, floods and wind. Excessive heat ranks second in fatalities (1903), followed by flash floods (978).

Measured by economic consequences, the most harmful event type is flood, with over 144 billion dollars in property damage. The second most harmful event type is hurricane/typhoon, with almost 70 billion dollars.
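The dollar figures quoted here can be recovered from the TOTALDMG column by dividing by one billion. The sketch below does this for the first three rows of the property damage table; the object names are introduced only for this example.

```r
# First three rows of the property damage table (values in dollars).
top.dmg <- data.frame(
    EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
    TOTALDMG = c(144657709807, 69305840000, 56925660790),
    stringsAsFactors = FALSE
)

# Express the totals in billions of dollars, rounded to one decimal place.
top.dmg$BILLIONS <- round(top.dmg$TOTALDMG / 1e9, 1)
top.dmg
```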
