Thetazero Pubs
johnwest
library(ggplot2)
library(ggthemes)
library(plyr)
library(dplyr)
library(xlsx)
library(reshape2)
library(GGally)
library(johntools)
library(knitr)
library(gPdtest)

Read, process, and clean the data.

## [1] "Reading ./query_results_Nov15Dec15_2015.csv"
## [1] "Eliminating 0 records with NA for CPU_Usage"
## [1] "Eliminating 0 records with CPU_Usage over the max posible"
## [1] "Eliminating 0 records with NA wayness"
## [1] "Eliminating 0 records with 0 wayness"
## [1] "Eliminating 4838 records with nmbw > 1.0"
## [1] "Eliminating 8 records with normed CPU_Usage > 1.0"
## [1] "Eliminating 56 records with negative average IB BW"
## [1] "Eliminating 1210 records with missing average IB BW"
## [1] "Eliminating 0 records with missing average Lustre network BW"
## [1] "Eliminating 0 records with NA metadatarate"
## [1] "Eliminating 0 records with negative MDCReqs"
## [1] "Summary of job counts by queue"
##           Var1  Freq     segment
## 1  development  7089 Development
## 2          gpu  1486     Offload
## 3       gpudev   334     Offload
## 4        large    92        <NA>
## 5     largemem   340        <NA>
## 6       normal 43422        <NA>
## 7  normal-2mic   107     Offload
## 8   normal-mic   985     Offload
## 9          osu    55 Development
## 10      serial  1334        <NA>
## 11     systest     3 Development
## 12         vis   710         Vis
## 13      visdev   122         Vis
## [1] "Deleted jobs from these queues: "
##  [1] "development" "gpu"         "gpudev"      "vis"         "visdev"     
##  [6] "sysdebug"    "systest"     "osu"         "normal-mic"  "normal-2mic"
##       segment Freq
## 1 Development 7147
## 2     Offload 2912
## 3         Vis  832
## [1] "Deleting a total of  10891  jobs"
## [1] "The cleaned population of jobs over the sample period includes  45,188  jobs"
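
The cleaning code itself is not shown above, only its log. As a rough illustration of the filter-and-report pattern that log implies, here is a minimal sketch in base R. The data frame `jobs`, its values, and the helper `drop_where` are hypothetical stand-ins, not the author's code.

```r
# Hypothetical stand-in for the job table; the column names echo the log
# messages above, but the values are invented for illustration.
jobs <- data.frame(
  CPU_Usage = c(0.42, NA, 0.91, 0.77),
  wayness   = c(16, 16, 0, 16),
  queue     = c("normal", "normal", "gpu", "development"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop rows where `bad` is TRUE and report the count.
drop_where <- function(df, bad, label) {
  bad <- bad %in% TRUE  # treat NA in the condition as "keep the row"
  print(paste("Eliminating", sum(bad), "records with", label))
  df[!bad, , drop = FALSE]
}

jobs <- drop_where(jobs, is.na(jobs$CPU_Usage), "NA for CPU_Usage")
jobs <- drop_where(jobs, jobs$wayness == 0, "0 wayness")
jobs <- drop_where(jobs, jobs$queue %in% c("gpu", "development"),
                   "an excluded queue")
print(paste("Jobs remaining:", nrow(jobs)))
```

Each filter runs on the output of the previous one, so each reported count reflects only the records still present at that step.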

Introduction

Now that the data are clean, we are interested in understanding more about the shape of the InfiniBand data in TACC Stats. We'll look at several metrics in this analysis; the definition of each follows.


rawdata$job.JobIBAveBW <- (rawdata$job.InternodeIBAveBW)*(rawdata$job.nodes) 
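
To make the definition concrete, the sketch below applies the same formula to a three-row toy table. The values are invented. The reading of `job.InternodeIBAveBW` as a per-node average is an assumption, suggested by the multiplication by `job.nodes` but not stated outright in the source.

```r
# Toy data using the same column names as rawdata; the values are made up.
toy <- data.frame(
  job.InternodeIBAveBW = c(381, 2.0e5, 2.0e8),
  job.nodes            = c(1, 4, 64)
)

# Same formula as above: scale the per-node figure by the node count.
toy$job.JobIBAveBW <- toy$job.InternodeIBAveBW * toy$job.nodes
print(toy)
```

For a single-node job the two metrics coincide. For multi-node jobs the job-level figure grows in proportion to the node count.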

Characteristics of job.InternodeIBAveBW

summary(rawdata$job.InternodeIBAveBW)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 3.810e+02 2.157e+04 2.776e+05 1.831e+08 2.167e+08 2.147e+10

Among all jobs, the "job.InternodeIBAveBW" rate spans the range [381.2259553, 2.1468354 × 10^{10}]. The plot below shows the base 10 log transformation of job.InternodeIBAveBW by quantile.

quant_plot(log10(rawdata$job.InternodeIBAveBW),ylabel="log10(InternodeIBAveBW), all jobs",quantiles=c(0,.25,.5,.75,.9,.99,1),color="red")

Figure 1
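
`quant_plot` comes from the author's `johntools` package, whose source is not shown here. The numbers behind such a plot can be approximated with base R's `quantile()`. The sketch below uses synthetic data and only illustrates the idea; it is not a reimplementation of `quant_plot`.

```r
set.seed(1)
# Synthetic, roughly log-normal values standing in for job.InternodeIBAveBW.
bw <- 10^rnorm(1000, mean = 6, sd = 2)

# Same quantile levels as in the quant_plot call above.
probs <- c(0, .25, .5, .75, .9, .99, 1)
q <- quantile(log10(bw), probs = probs)
print(round(q, 2))
```

Plotting `q` against `probs` would give a curve of the same general kind as Figure 1, with the 0% and 100% points at the sample minimum and maximum.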

Figure 2 makes it clearer that the data in this quantile plot are "weakly" bimodal.


# Histogram of the log10-transformed values; inside aes(), refer to the column by name
plt <- ggplot(rawdata, aes(log10(job.InternodeIBAveBW))) + geom_histogram()
print(plt)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Figure 2
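
The `stat_bin` message above says the bin width defaulted to range/30. The sketch below shows what that default amounts to, using base R and a synthetic bimodal sample. The sample and its two cluster centers are invented and are not meant to match the real data.

```r
set.seed(2)
# Synthetic bimodal sample on the log10 scale (two invented clusters).
x <- c(rnorm(600, mean = 4.5, sd = 0.6), rnorm(400, mean = 8.5, sd = 0.6))

# The default named in the message: 30 equal-width bins across the range.
binwidth <- diff(range(x)) / 30
breaks <- seq(min(x), max(x), length.out = 31)

# Compute the counts without drawing, then inspect them.
h <- hist(x, breaks = breaks, plot = FALSE)
print(round(binwidth, 3))
print(h$counts)
```

With well-separated clusters the counts rise, fall, and rise again, which is the pattern a bimodal histogram shows. As the message suggests, an explicit width can be supplied through the `binwidth` argument of `geom_histogram()` if the default hides or exaggerates that shape.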

Characteristics of job.JobIBAveBW

summary(rawdata$job.JobIBAveBW)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 3.810e+02 2.809e+04 6.189e+05 3.024e+09 1.075e+09 4.642e+12

Among all jobs, the "job.JobIBAveBW" rate spans the range [381.2259553, 4.6415525 × 10^{12}]. The plot below shows the base 10 log transformation of job.JobIBAveBW by quantile.

quant_plot(log10(rawdata$job.JobIBAveBW),ylabel="log10(job.JobIBAveBW), all jobs",quantiles=c(0,.25,.5,.75,.9,.99,1),color="red")

Figure 3

Figure 4 shows that these data retain the basic shape seen in Figure 2, although there are some differences (discussed further down).
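
One way to see why the shape would largely carry over follows directly from the definition of job.JobIBAveBW given earlier. Taking base 10 logs of that product gives the identity below. Only the identity comes from the definition; the interpretation after it is an inference.

```latex
\log_{10}(\text{job.JobIBAveBW})
  = \log_{10}(\text{job.InternodeIBAveBW}) + \log_{10}(\text{job.nodes})
```

On the log scale, each job therefore moves to the right by the log of its node count. Single-node jobs do not move at all, and a 10-node job moves by exactly one unit. This would be consistent with the two summaries sharing the same minimum (381.2259553) if the smallest job ran on a single node, although the source does not state that.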

# Histogram of the log10-transformed values; inside aes(), refer to the column by name
plt <- ggplot(rawdata, aes(log10(job.JobIBAveBW))) + geom_histogram()
print(plt)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Figure 4

Copyright © 2016 thetazero.com All Rights Reserved.