Thetazero Pubs
Alfiyadata event damage injuries fatalities categories category high storm bygroup

Most harmful weather events for population and economic

Synopsis

In this document the most harmful weather events with respect to population health and econimic are reported.


Data

For analysis U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database was used.

All weather events were divided into 13 groups:

  1. Convection (e.g. tornado, lightning, thunderstorm, hail)
  2. Flood (e.g. flash flood, river flood)
  3. Extreme temperatures (e.g. extreme cold, extreme hot)
  4. Marine (e.g. tsunami, coastal storm, rip current, high waves, high seas)
  5. Winter (e.g. avalanche, snow, blizzard, icy roads, freeze)
  6. Tropical Cyclones (e.g. tropical storm, hurricane)
  7. High Wind (e.g. winds, microburst)
  8. Fire
  9. Rain
  10. Drought/Dust (e.g. drought, dust storm, dust)
  11. Landslide
  12. Fog
  13. Miscellaneous

All events, except Convection, are recorded from 1993 year, so for more fair comparison between events the data starting from 1993 year was used.

Events that did not cause any damage were ignored.

Damage measures

There are 2 types of damage:


Fatalities and injuries were used as a measure of populational damage.

Sum of property and crop damage (in mln $) were used as a measure of economical damage.

Weather events were compared between each other by average annual fatalities, injuries and average annual economical damage.

Event is considered to be most harmful if the damage caused by this event is above the mean damage over all events.

Data Processing

For data processing and analysis R programming language was used.

Load data

The following code downloads the data from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and loads it into a data frame.

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2?accessType=DOWNLOAD", 
    "stormData.bz2", "curl", quiet = TRUE)

con <- bzfile("stormData.bz2", "r")
data <- read.csv(con, stringsAsFactors = FALSE)
close(con)

Columns processing

The following code

# keep BGN_DATE PROPDMG CROPDMG EVTYPE INJURIES FATALITIES
i <- which(colnames(data) %in% c("BGN_DATE", "PROPDMG", "CROPDMG", "EVTYPE", 
    "INJURIES", "FATALITIES"))
data <- data[, i]
data$YEAR <- as.integer(format(as.Date(data$BGN_DATE, "%m/%d/%Y 0:00:00"), "%Y"))
data$EVTYPE <- toupper(data$EVTYPE)
# new column described economical damage
data$ECONOMICDMG <- data$PROPDMG + data$CROPDMG

# remove events with no population or economical damage
data <- subset(data, data$FATALITIES > 0 | data$ECONOMICDMG > 0 | data$INJURIES > 
    0)

Aggregate data

We don't need detailed info about every event during the year, so the data was summarized by YEAR & EVTYPE and recorded to byEventYear data frame. The primary data frame was removed after aggregation.

library(plyr)
# aggregate data by YEAR & EVTYPE. This dataset is used below.
byEventYear <- ddply(data[, -1], .(YEAR, EVTYPE), .fun = function(x) {
    return(c(sum(x$FATALITIES), sum(x$ECONOMICDMG), sum(x$INJURIES)))
})
names(byEventYear) <- c("YEAR", "EVTYPE", "FATALITIES", "ECONOMICDMG", "INJURIES")
rm(data)  #delete primary dataset

Group events

There are a lot of event types in the data, so the events were groupped. And the data was aggregated by YEAR and EVGROUP(Event Group) after that.

# this function calculates the group of the event (13 categories)
evcategory <- function(x) {
    ev <- x$EVTYPE[1]
    if (grepl("LIG(H|N)T(N|)ING|TORNADO|T(H|)U(N|)(DER|ER|DEER|DERE)(STORM|STROM|TORM)|TSTM|HAIL", 
        ev)) {
        category <- "Convection"
    } else if (grepl("WINT(ER|RY)|ICE|AVALANC(H|)E|SNOW|BLIZZARD|FREEZ|ICY|FROST", 
        ev)) {
        category <- "Winter"
    } else if (grepl("COLD|HEAT|HOT|TEMPERATURE|COOL|WARM", ev)) {
        category <- "Extreme Temp"
    } else if (grepl("FLOOD| FLD$", ev)) {
        category <- "Flood"
    } else if (grepl("COASTAL|TSUNAMI|RIP CURRENT|MARINE|WATERSPOUT|SURF|SLEET|SEAS|(HIGH|RISING|HEAVY) (WAVES|SWELLS|WATER)", 
        ev)) {
        category <- "Marine"
    } else if (grepl("TROPICAL|HURRICANE|STORM SURGE|TYPHOON", ev)) {
        category <- "Tropical Cyclones"
    } else if (grepl("WIND|MICROBURST", ev)) {
        category <- "High Wind"
    } else if (grepl("FIRE", ev)) {
        category <- "Fire"
    } else if (grepl("RAIN|PRECIP", ev)) {
        category <- "Rain"
    } else if (grepl("DROUGHT|DUST", ev)) {
        category <- "Drought/Dust"
    } else if (grepl("LANDSLIDE|MUD.*SLIDE", ev)) {
        category <- "Landslide"
    } else if (grepl("FOG|VOG", ev)) {
        category <- "Fog"
    } else {
        category <- "Miscellaneous"
    }

    x$EVGROUP <- rep(category, dim(x)[1])
    return(x)
}
byEventYear <- ddply(byEventYear, .(EVTYPE), .fun = evcategory)

# aggregate data by group & year
byGroupYear <- ddply(byEventYear, .(YEAR, EVGROUP), .fun = function(x) {
    return(c(sum(x$FATALITIES), sum(x$ECONOMICDMG), sum(x$INJURIES)))
})
names(byGroupYear) <- c("YEAR", "EVGROUP", "FATALITIES", "ECONOMICDMG", "INJURIES")

Cut the data by year

The following code calculates the first year, when the event were recorded in data.

# calculate average annual damage by group
first.year.of.event <- ddply(byGroupYear, .(EVGROUP), .fun = function(x) {
    return(c(min(x$YEAR)))
})
names(first.year.of.event) <- c("Weather.Event", "First.Year")
# According to this data all events, exept Convection, were recorded from
# 1993 year
library(xtable)
print(xtable(first.year.of.event), include.rownames = FALSE, type = "html")
Weather.Event First.Year
Convection 1950
Drought/Dust 1993
Extreme Temp 1993
Fire 1993
Flood 1993
Fog 1993
High Wind 1993
Landslide 1993
Marine 1993
Miscellaneous 1993
Rain 1993
Tropical Cyclones 1993
Winter 1993

According to the table above all events, exept Convection, were recorded from 1993. So for more fair comparison of events it's better to analyse data starting from 1993.

## So for more fair comparison of events it's better to analyse data starting
## from 1993
byGroupYear <- subset(byGroupYear, YEAR >= 1993)

Final data frame

Yearly fatalities, injuries and yearly economical damage were averaged.

# calculate average annual damage by group
byGroup <- ddply(byGroupYear, .(EVGROUP), .fun = function(x) {
    return(c(mean(x$FATALITIES), mean(x$ECONOMICDMG), mean(x$INJURIES)))
})
names(byGroup) <- c("EVGROUP", "AVG.FATALITIES", "AVG.ECONOMICDMG", "AVG.INJURIES")

Results

Harmful events for population

The following two histogramms show the annual fatalities and injuries respectfully for each weather event.

# plot histogramms
library(ggplot2)
library(scales)
# average annual populational damage by group of event
byGroup$EVGROUP <- with(byGroup, reorder(EVGROUP, -AVG.FATALITIES))
g <- ggplot(byGroup, aes(x = EVGROUP))
g + geom_histogram(aes(weight = AVG.FATALITIES, fill = AVG.FATALITIES), binwidth = 1, 
    color = "black") + ggtitle("Average annual fatalities") + ylab("Fatalities") + 
    xlab("Weather Event") + theme(axis.text.x = element_text(angle = 40, hjust = 1)) + 
    scale_fill_gradient(guide = FALSE, low = "green", high = "red") + geom_line(aes(y = AVG.FATALITIES, 
    group = FALSE), stat = "hline", yintercept = "mean", color = "red")

plot of chunk PopulationDamageHist



# average annual populational damage by group of event
byGroup$EVGROUP <- with(byGroup, reorder(EVGROUP, -AVG.INJURIES))
g <- ggplot(byGroup, aes(x = EVGROUP))
g + geom_histogram(aes(weight = AVG.INJURIES, fill = AVG.INJURIES), binwidth = 1, 
    color = "black") + ggtitle("Average annual injuries") + ylab("Injuries") + 
    xlab("Weather Event") + theme(axis.text.x = element_text(angle = 40, hjust = 1)) + 
    scale_fill_gradient(guide = FALSE, low = "green", high = "red") + geom_line(aes(y = AVG.INJURIES, 
    group = FALSE), stat = "hline", yintercept = "mean", color = "red")

plot of chunk PopulationDamageHist

According to the histograms above the most harmful events with respect to populational health are

byGroup <- byGroup[order(-byGroup$AVG.FATALITIES), ]
mean.f <- mean(byGroup$AVG.FATALITIES)
mean.i <- mean(byGroup$AVG.INJURIES)
i <- (byGroup$AVG.FATALITIES > mean.f) & (byGroup$AVG.INJURIES > mean.i)
print(xtable(cbind(Weather.Event = as.character(byGroup$EVGROUP[i]), Annual.Fatalities = round(byGroup$AVG.FATALITIES[i]), 
    Annual.Injuries = round(byGroup$AVG.INJURIES[i]))), type = "html")
Weather.Event Annual.Fatalities Annual.Injuries
1 Extreme Temp 191 503
2 Convection 155 1884
3 Flood 82 457
4 Winter 47 333

Harmful events for economic

The following histogramm shows the annual economical damage (property + crop damage, in mln $) for each weather event.

# average annual economical damage by group of event
byGroup$EVGROUP <- with(byGroup, reorder(EVGROUP, -AVG.ECONOMICDMG))
g <- ggplot(byGroup, aes(x = EVGROUP))
g + geom_histogram(aes(weight = AVG.ECONOMICDMG, fill = AVG.ECONOMICDMG), binwidth = 1, 
    color = "black") + ggtitle("Average annual economic damage") + ylab("Economical damage, mln $") + 
    xlab("Weather Event") + theme(axis.text.x = element_text(angle = 40, hjust = 1)) + 
    scale_fill_gradient(guide = FALSE, low = "green", high = "red") + scale_y_continuous(labels = comma) + 
    geom_line(aes(y = AVG.ECONOMICDMG, group = FALSE), stat = "hline", yintercept = "mean", 
        color = "red")

plot of chunk EconomicalDamageHist

According to the histogram above the following events have the greatest economic consequences

byGroup <- byGroup[order(-byGroup$AVG.ECONOMICDMG), ]
mean.e <- mean(byGroup$AVG.ECONOMICDMG)
i <- byGroup$AVG.ECONOMICDMG > mean.e
print(xtable(cbind(Weather.Event = as.character(byGroup$EVGROUP[i]), Annual.Damage = round(byGroup$AVG.ECONOMICDMG[i]))), 
    type = "html")
Weather.Event Annual.Damage
1 Convection 328815
2 Flood 148846

Summary

According to the analysis

the extremely harmful events for population are

events caused high damage for population are

the extremely harmful events for economy are

Copyright © 2016 thetazero.com All Rights Reserved. Privacy Policy