Thetazero Pubs

Synopsis

Using data from the NOAA Storm Database, we present the impact of extreme weather events on the health of the population and on the economy. The impact on health is analyzed by considering the number of fatalities and injuries for each type of event. The impact on the economy is analyzed by calculating the total damage to property and crops for each type of event. The aim is to find the top 10 most damaging types of extreme weather. In both cases (population health and economy), we find that tornadoes are the most damaging type of extreme weather event.


Data Processing

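The data set is downloaded from the source as a bzip2-compressed CSV file. It is then read into a data frame called raw_data; read.csv() decompresses the file on the fly.

Next, we issue a str() command to find out what kind of data we have downloaded.

# The source file is bzip2-compressed, not a zip archive, so unzip() cannot open it.
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2", mode="wb");
# read.csv() reads .bz2 files directly; stringsAsFactors=TRUE reproduces the factor columns shown below.
raw_data=read.csv("StormData.csv.bz2", stringsAsFactors=TRUE);
str(raw_data);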
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

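By inspecting the output from str() and by consulting the Storm Data Documentation, we choose the columns FATALITIES, INJURIES, PROPDMG and CROPDMG as the basis for our analysis. The PROPDMGEXP and CROPDMGEXP columns are not used, so the damage figures below are sums of the PROPDMG and CROPDMG values as recorded.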
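A possible refinement, which is not part of this analysis, would be to scale each damage value by its exponent code before summing. The sketch below shows the idea on a few made-up rows. The helper name exp_to_multiplier, the mapping (H, K, M, B) and the decision to treat every other code as a multiplier of 1 are assumptions for illustration, not something taken from this report.

```r
# Hypothetical helper: translate a damage exponent code into a numeric multiplier.
# Assumed mapping: H = 1e2, K = 1e3, M = 1e6, B = 1e9; any other code counts as 1.
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1   # blanks, digits and symbols: leave the value unscaled
  unname(mult)
}

# Toy rows (made up, not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Scale each value, then sum per event type as in the rest of the report
toy$PROPDMG_SCALED <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
scaled_damage <- tapply(toy$PROPDMG_SCALED, toy$EVTYPE, sum)
print(scaled_damage)
```

Applying such a scaling to raw_data would change the damage totals and possibly the ranking, so the figures reported here should be read as unscaled sums.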

We use tapply() to calculate the total number of fatalities and injuries and the sum of property damage and crop damage for each type of event, and store the results in four corresponding variables. We then run str() on one of the variables.

fatalities=tapply(raw_data$FATALITIES, raw_data$EVTYPE, sum);
injuries=tapply(raw_data$INJURIES, raw_data$EVTYPE, sum);
property_damage=tapply(raw_data$PROPDMG, raw_data$EVTYPE, sum);
crop_damage=tapply(raw_data$CROPDMG, raw_data$EVTYPE, sum);
str(fatalities);
##  num [1:985(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "dimnames")=List of 1
##   ..$ : chr [1:985] "   HIGH SURF ADVISORY" " COASTAL FLOOD" " FLASH FLOOD" " LIGHTNING" ...
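To make the grouping step concrete, here is a minimal sketch of the same tapply() pattern on a toy data frame. The values and the name toy_fatalities are made up for illustration. The last call shows one way to count how many records each label contributes, which can help when looking for rarely used event types.

```r
# Toy data (hypothetical values), standing in for raw_data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(3, 1, 2, 0, 4),
  stringsAsFactors = TRUE
)

# One sum per level of EVTYPE; the result is a named one-dimensional array
toy_fatalities <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
print(toy_fatalities)

# Number of distinct event types seen
n_types <- length(toy_fatalities)

# Number of records per event type
records_per_type <- table(toy$EVTYPE)
print(records_per_type)
```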

It turns out that there are 985 different types of extreme weather in the data. This is clearly not correct, since the Storm Data Documentation only lists 48 different types. If we print out the whole content of the variable fatalities (not included in this report), it turns out that nearly all the entries are of the correct type. The unofficial event types only occur a few times each and are clearly entered by mistake. However, since we are only interested in the top 10 types out of the 48 possible types, we can simply ignore the incorrect entries and expect them to drop out of the ranking automatically. This works, except for two cases, namely TSTM WIND and THUNDERSTORM WINDS. Apparently, TSTM WIND is a common abbreviation for THUNDERSTORM WIND. We therefore replace all occurrences of TSTM WIND and THUNDERSTORM WINDS in raw_data with THUNDERSTORM WIND.


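# gsub() matches substrings, so any label that contains "TSTM WIND" is rewritten as well.
raw_data$EVTYPE=gsub("TSTM WIND", "THUNDERSTORM WIND", raw_data$EVTYPE);
# Collapse the plural form; after gsub(), EVTYPE is a character vector rather than a factor.
raw_data$EVTYPE=gsub("THUNDERSTORM WINDS", "THUNDERSTORM WIND", raw_data$EVTYPE);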
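Because gsub() replaces the pattern wherever it occurs inside a label, it behaves differently from a whole-label replacement. The sketch below contrasts the two on a handful of example labels. The anchored pattern and the use of toupper() and trimws() are an assumed alternative, not what this report does, and using it on raw_data would merge a different set of records and could therefore change the totals.

```r
# Example labels (illustrative only)
labels <- c("TSTM WIND", "THUNDERSTORM WINDS", "TSTM WIND/HAIL", " tstm wind ")

# Substring replacement, as in the report: every label containing the pattern changes
substring_fix <- gsub("TSTM WIND", "THUNDERSTORM WIND", labels)

# Assumed alternative: tidy case and whitespace first, then replace whole labels only
tidy <- toupper(trimws(labels))
exact_fix <- sub("^(TSTM WIND|THUNDERSTORM WINDS)$", "THUNDERSTORM WIND", tidy)

print(data.frame(labels, substring_fix, exact_fix))
```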

We then rerun the tapply() calls to update the four variables.

fatalities=tapply(raw_data$FATALITIES, raw_data$EVTYPE, sum);
injuries=tapply(raw_data$INJURIES, raw_data$EVTYPE, sum);
property_damage=tapply(raw_data$PROPDMG, raw_data$EVTYPE, sum);
crop_damage=tapply(raw_data$CROPDMG, raw_data$EVTYPE, sum);

The two variables fatalities and injuries are combined into a new data frame called population_health. This data frame is sorted by fatalities in descending order, with injuries used to break ties. The number of injuries seems to be more or less proportional to the number of fatalities, so we have made no attempt to sort the data by some sort of weighted average of the two, but rather let the number of fatalities determine the ranking among the event types.

Likewise, the two variables property_damage and crop_damage are combined into a new data frame called economic_consequences. We add up the property and the crop damage to yield the Total Damage and sort the data frame by this column.

df_fatalities=as.data.frame(as.matrix(fatalities));
df_injuries=as.data.frame(as.matrix(injuries));

population_health=data.frame(df_fatalities,df_injuries$V1);
colnames(population_health)=c("Fatalities", "Injuries");
population_health=population_health[order(-population_health[ ,1], -population_health[ ,2]), ];


df_property_damage=as.data.frame(as.matrix(property_damage));
df_crop_damage=as.data.frame(as.matrix(crop_damage));

economic_consequences=data.frame(df_property_damage, df_crop_damage$V1, df_property_damage$V1+df_crop_damage$V1 );
colnames(economic_consequences)=c("Property Damage", "Crop Damage", "Total Damage");
economic_consequences=economic_consequences[order(-economic_consequences[ ,3]), ];
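The same kind of table can also be built in fewer steps. The sketch below uses aggregate() with a formula to sum both damage columns per event type in one call, then adds the total and sorts. It runs on made-up rows, and the names toy and toy_damage are hypothetical; it is shown as an alternative formulation, not as the method used in this report.

```r
# Toy rows (made up) with the same column names as raw_data
toy <- data.frame(
  EVTYPE  = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG = c(25, 10, 2.5, 0),
  CROPDMG = c(0, 5, 1, 7),
  stringsAsFactors = FALSE
)

# Sum both damage columns per event type in a single call
toy_damage <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = toy, FUN = sum)

# Add the total and sort by it, largest first
toy_damage$TOTAL <- toy_damage$PROPDMG + toy_damage$CROPDMG
toy_damage <- toy_damage[order(-toy_damage$TOTAL), ]
print(head(toy_damage, 10))
```

Unlike the tapply() approach, the event type ends up in an ordinary column rather than in the row names.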

Finally, we list the top 10 most dangerous types of weather.

head(population_health,10);
##                   Fatalities Injuries
## TORNADO                 5633    91346
## EXCESSIVE HEAT          1903     6525
## FLASH FLOOD              978     1777
## HEAT                     937     2100
## LIGHTNING                816     5230
## THUNDERSTORM WIND        701     9353
## FLOOD                    470     6789
## RIP CURRENT              368      232
## HIGH WIND                248     1137
## AVALANCHE                224      170
head(economic_consequences,10);
##                   Property Damage Crop Damage Total Damage
## TORNADO                   3212258      100019      3312277
## THUNDERSTORM WIND         2659162      194679      2853841
## FLASH FLOOD               1420125      179200      1599325
## HAIL                       688693      579596      1268290
## FLOOD                      899938      168038      1067976
## LIGHTNING                  603352        3581       606932
## HIGH WIND                  324732       17283       342015
## WINTER STORM               132721        1979       134700
## HEAVY SNOW                 122252        2166       124418
## WILDFIRE                    84459        4364        88824

Results

To illustrate the result of the analysis, we make a pie chart of the 10 most dangerous types of extreme weather based on their impact on population health. We see that tornadoes are responsible for by far the most fatalities, followed by excessive heat and flash floods.

pie(head(population_health$Fatalities,10), labels=rownames(head(population_health,10)), main="The 10 types of extreme weather causing most fatalities");
box(which = "plot", lty = "solid");

[Figure: pie chart of the 10 types of extreme weather causing most fatalities]
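Pie slices can be hard to compare by eye, so it may help to put numbers on them. The sketch below recomputes each type's share of the top-10 fatalities from the figures listed in the table above. The names top10_fatalities and share_pct are hypothetical. Note that these are shares of the top-10 total, not of all fatalities in the database.

```r
# Fatalities for the top 10 event types, copied from the table in this report
top10_fatalities <- c(
  "TORNADO" = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
  "HEAT" = 937, "LIGHTNING" = 816, "THUNDERSTORM WIND" = 701,
  "FLOOD" = 470, "RIP CURRENT" = 368, "HIGH WIND" = 248, "AVALANCHE" = 224
)

# Percentage of the top-10 total attributable to each type, rounded to one decimal
share_pct <- round(100 * top10_fatalities / sum(top10_fatalities), 1)
print(share_pct)
```

On these figures, tornadoes account for a little under half of the fatalities among the top 10 types.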

Likewise, we make a pie chart of the 10 most dangerous types of extreme weather based on the total damage to property and crops. Again, we find that tornadoes cause the most damage, followed this time by thunderstorm wind and flash floods.

pie(head(economic_consequences$"Total Damage",10), labels=rownames(head(economic_consequences,10)), main="The 10 types of extreme weather causing most economic loss");
box(which = "plot", lty = "solid");

[Figure: pie chart of the 10 types of extreme weather causing most economic loss]

Copyright © 2016 thetazero.com All Rights Reserved.