Thetazero Pubs
by Alkim Ozaygen



1- Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

This report uses the NOAA storm database to examine the effects of severe weather events on public health and on the economy of communities and municipalities. The results show that tornadoes caused the most fatalities and injuries. Floods caused the most property damage and droughts the most crop damage; the economic consequences are shown in bar plots at the end of the report.

2- Data Processing

Downloading and unzipping the data

We create a folder named "downloads" and save the bzip2-compressed file there as project.csv.bz2.

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("./downloads")){dir.create("./downloads")}
if (!file.exists("./downloads/project.csv.bz2")){download.file(fileUrl, destfile="./downloads/project.csv.bz2", method="curl")}
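
The same check-then-download logic can be wrapped in a small helper. The sketch below is one possible way to do it, not the method used in this report: the function name download_if_missing is hypothetical, and it assumes that R's default download method is acceptable. Passing mode = "wb" keeps the compressed file intact on Windows.

```r
# Hypothetical helper: download `url` to `destfile` only when the file is not already there.
download_if_missing <- function(url, destfile) {
    dest_dir <- dirname(destfile)
    if (!dir.exists(dest_dir)) {
        dir.create(dest_dir, recursive = TRUE)
    }
    if (!file.exists(destfile)) {
        # mode = "wb" writes the file as binary, which matters for compressed data on Windows
        download.file(url, destfile = destfile, mode = "wb")
    }
    invisible(destfile)
}

# Usage with the URL and path from this report (not run here):
# download_if_missing(fileUrl, "./downloads/project.csv.bz2")
```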

Loading and preprocessing the data

Import the project data into the raw_data data frame.


raw_data <- read.csv("./downloads/project.csv.bz2")
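
No separate decompression step is needed, because read.csv() can read a bzip2-compressed file directly. The sketch below illustrates this with a small round trip on made-up data. The toy data frame and the temporary file are assumptions for illustration only, and stringsAsFactors = FALSE is passed explicitly so that text columns are read as character regardless of the R version.

```r
# Toy data standing in for the storm file (values are made up for illustration)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 0))

# Write a bzip2-compressed CSV to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
str(toy_back)
```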

Select the necessary columns and copy them to the data frame filtered_data

filtered_data <- raw_data[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Change the PROPDMGEXP and CROPDMGEXP columns to character

filtered_data$PROPDMGEXP <- as.character(filtered_data$PROPDMGEXP)
filtered_data$CROPDMGEXP <- as.character(filtered_data$CROPDMGEXP)

Find the characters inside PROPDMGEXP and CROPDMGEXP columns

unique(filtered_data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(filtered_data$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
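
unique() shows which codes occur but not how often. To judge how much the unusual codes matter, table() can count them. The sketch below uses a made-up vector in place of the real column.

```r
# Toy exponent column (made-up values for illustration)
toy_exp <- c("K", "K", "M", "", "K", "?", "m", "")

# Frequency of each code, most common first
counts <- table(toy_exp)
sort(counts, decreasing = TRUE)
```

On the data in this report, the equivalent call would be sort(table(filtered_data$PROPDMGEXP), decreasing = TRUE).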

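Convert the values in the exponent columns (PROPDMGEXP and CROPDMGEXP) to numeric multipliers: a digit n becomes 10^n (for example, "3" becomes 1000), "H" or "h" becomes 100, "K" or "k" becomes 1000, "M" or "m" becomes 1000000, and "B" becomes 1000000000. Empty values and unrecognized characters ("+", "?", "-") become 1, so that they have no effect on the product.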

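# Remap "1" before "0": the replacement for "0" is stored as the string "1" and would otherwise be remapped to 10
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP == "1"] <- 10
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP == "0"] <- 1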
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP %in% c("H", "h", "2")] <- 100
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP %in% c("K", "3")] <- 1000
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP == "4"] <- 10000
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP == "5"] <- 100000
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP %in% c("M", "m", "6")] <- 1000000 
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP %in% c("7")] <- 10000000
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP %in% c("8")] <- 100000000
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP %in% c("B")] <- 1000000000
filtered_data$PROPDMGEXP[filtered_data$PROPDMGEXP %in% c("", "+", "?", "-")] <- 1
filtered_data$CROPDMGEXP[filtered_data$CROPDMGEXP == "0"] <- 1 
filtered_data$CROPDMGEXP[filtered_data$CROPDMGEXP == "2"] <- 100 
filtered_data$CROPDMGEXP[filtered_data$CROPDMGEXP %in% c("K", "k")] <- 1000 
filtered_data$CROPDMGEXP[filtered_data$CROPDMGEXP %in% c("M", "m")] <- 1000000 
filtered_data$CROPDMGEXP[filtered_data$CROPDMGEXP == "B"] <- 1000000000
filtered_data$CROPDMGEXP[filtered_data$CROPDMGEXP %in% c("?", "")] <- 1
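
A named-vector lookup is one alternative to chained replacements. Every code is matched against its original value in a single pass, so the result does not depend on the order of the statements. The sketch below is a minimal version under the mapping described in this section; the function name exp_to_multiplier is hypothetical.

```r
# Hypothetical helper: translate exponent codes into numeric multipliers.
exp_to_multiplier <- function(code) {
    lookup <- c(
        "0" = 1, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
        "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
        "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9
    )
    # Look up the upper-cased code; anything unknown ("", "+", "?", "-") yields NA
    multiplier <- unname(lookup[toupper(code)])
    # Unknown codes get a multiplier of 1 so they have no effect
    multiplier[is.na(multiplier)] <- 1
    multiplier
}

exp_to_multiplier(c("K", "m", "0", "1", "", "?"))
```

With such a helper, the damage in dollars could be computed as filtered_data$PROPDMG * exp_to_multiplier(filtered_data$PROPDMGEXP) without overwriting the exponent column.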

Multiply PROPDMG and PROPDMGEXP and assign the value to the new column PROPDAMAGE

filtered_data$PROPDAMAGE <- as.numeric(filtered_data$PROPDMG) * as.numeric(filtered_data$PROPDMGEXP)

Multiply CROPDMG and CROPDMGEXP and assign the value to the new column CROPDAMAGE

filtered_data$CROPDAMAGE <- as.numeric(filtered_data$CROPDMG) * as.numeric(filtered_data$CROPDMGEXP)

Copy the filtered_data dataframe to cleaned_data dataframe.

cleaned_data <- filtered_data

Remove the columns that are no longer needed (i.e. PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

cleaned_data$PROPDMG <- NULL
cleaned_data$PROPDMGEXP <- NULL
cleaned_data$CROPDMG <- NULL
cleaned_data$CROPDMGEXP <- NULL

3- Results

Effects on Public Health

Display the top ten events resulting in fatalities

health_f <- aggregate(FATALITIES ~ EVTYPE, cleaned_data, sum)
health_f <- health_f[order(health_f$FATALITIES, decreasing = TRUE), ]
head(health_f, 10)
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

Display the top ten events resulting in injuries

health_i <- aggregate(INJURIES ~ EVTYPE, cleaned_data, sum)
health_i <- health_i[order(health_i$INJURIES, decreasing = TRUE), ]
head(health_i, 10)
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
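
The sums are taken over the raw EVTYPE labels, so the injuries table lists TSTM WIND and THUNDERSTORM WIND as separate event types. The sketch below shows one way such spelling variants could be merged before aggregating. It rests on the assumption that TSTM is an abbreviation of THUNDERSTORM; the helper name normalize_evtype and the toy counts are hypothetical and not taken from the NOAA database.

```r
# Hypothetical helper: standardise a few spelling variants of event labels.
normalize_evtype <- function(evtype) {
    # Remove surrounding whitespace and unify the case
    evtype <- toupper(trimws(evtype))
    # Assumption: "TSTM" is an abbreviation of "THUNDERSTORM"
    gsub("\\bTSTM\\b", "THUNDERSTORM", evtype, perl = TRUE)
}

# Toy data (made-up counts for illustration)
toy <- data.frame(
    EVTYPE = c("TSTM WIND", "Thunderstorm Wind", "TORNADO", " tornado"),
    INJURIES = c(3, 2, 10, 1)
)
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)

# Aggregate after normalising, then sort as in the report
toy_sum <- aggregate(INJURIES ~ EVTYPE, toy, sum)
toy_sum[order(toy_sum$INJURIES, decreasing = TRUE), ]
```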

Load the necessary libraries for graphs

library(ggplot2)
library(gridExtra)

Graph showing the five most harmful events with respect to population health.

hf <- ggplot(data = head(health_f, 5), aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
    geom_bar(stat = "identity", fill = "red") +
    geom_text(aes(label = FATALITIES), hjust = 1.1, color="white", size = 3.5) +
    xlab("Event Type") +
    ylab("Total Number of Fatalities") +
    ggtitle("Most Harmful Events Resulting in Fatalities") +
    coord_flip()
    
hi <- ggplot(data = head(health_i, 5), aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
    geom_bar(stat ="identity", fill = "blue") +
    geom_text(aes(label=INJURIES), hjust = 1.1, color = "white", size = 3.5) +
    xlab("Event Type") +
    ylab("Total Number of Injuries") +
    ggtitle("Most Harmful Events Resulting in Injuries") +
    coord_flip()
    
grid.arrange(hf , hi, nrow = 2, ncol = 1)

Economic Consequences

Display the top ten events resulting in property damage

econ_p <- aggregate(PROPDAMAGE ~ EVTYPE, cleaned_data, sum)
econ_p <- econ_p[order(econ_p$PROPDAMAGE, decreasing = TRUE), ]
head(econ_p, 10)
##                EVTYPE   PROPDAMAGE
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56947381878
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822676332
## 244              HAIL  15735269634
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497260
## 359         HIGH WIND   5270046295

Display the top ten events resulting in crop damage

econ_c <- aggregate(CROPDAMAGE ~ EVTYPE, cleaned_data, sum)
econ_c <- econ_c[order(econ_c$CROPDAMAGE, decreasing = TRUE), ]
head(econ_c, 10)
##                EVTYPE  CROPDAMAGE
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954473
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000

Graph showing the five most harmful events with respect to economic consequences.

ep <- ggplot(data = head(econ_p, 5), aes(x = reorder(EVTYPE, PROPDAMAGE), y = PROPDAMAGE)) +
    geom_bar(stat = "identity", fill = "red") +
    geom_text(aes(label = PROPDAMAGE), hjust = 1.1, color="white", size = 3.5) +
    xlab("Event Type") +
    ylab("Cost of Damage (in $)") +
    ggtitle("Most Harmful Events Resulting in Property Damage") +
    coord_flip()
    
ec <- ggplot(data = head(econ_c, 5), aes(x = reorder(EVTYPE, CROPDAMAGE), y = CROPDAMAGE)) +
    geom_bar(stat ="identity", fill = "blue") +
    geom_text(aes(label=CROPDAMAGE), hjust = 1.1, color = "white", size = 3.5) +
    xlab("Event Type") +
    ylab("Cost of Damage (in $)") +
    ggtitle("Most Harmful Events Resulting in Crop Damage") +
    coord_flip()
    
grid.arrange(ep, ec, nrow = 2, ncol = 1)
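
The totals in these plots run into the billions of dollars, so the raw values used as bar labels are long. One possible way to shorten them is to format the labels in billions before plotting. The helper name format_billions below is hypothetical.

```r
# Hypothetical helper: express dollar amounts in billions with one decimal place.
format_billions <- function(x) {
    sprintf("$%.1fB", x / 1e9)
}

# Property damage totals of the top two event types from the table above
format_billions(c(144657709807, 69305840000))
# → "$144.7B" "$69.3B"
```

In the plots, the labels could then be set with geom_text(aes(label = format_billions(PROPDAMAGE)), ...).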

Conclusions

We can conclude that tornadoes are the most harmful event type for population health, causing the largest number of both fatalities and injuries. They are followed by excessive heat for fatalities and by thunderstorm wind (recorded as TSTM WIND) for injuries. On the economic side, floods cause the most property damage, followed by hurricanes and typhoons. Droughts cause the most crop damage, followed by floods.

