```
library(ggplot2)
library(ggthemes)
library(plyr)
library(dplyr)
library(xlsx)
library(reshape2)
library(GGally)
library(johntools)
library(knitr)
library(gPdtest)
```

`## [1] "Reading ./query_results_Nov15Dec15_2015.csv"`

`## [1] "Eliminating 0 records with NA for CPU_Usage"`

`## [1] "Eliminating 0 records with CPU_Usage over the max posible"`

`## [1] "Eliminating 0 records with NA wayness"`

`## [1] "Eliminating 0 records with 0 wayness"`

`## [1] "Eliminating 4838 records with nmbw > 1.0"`

`## [1] "Eliminating 8 records with normed CPU_Usage > 1.0"`

`## [1] "Eliminating 56 records with negative average IB BW"`

`## [1] "Eliminating 1210 records with missing average IB BW"`

`## [1] "Eliminating 0 records with missing average Lustre network BW"`

`## [1] "Eliminating 0 records with NA metadatarate"`

`## [1] "Eliminating 0 records with negative MDCReqs"`

`## [1] "Summary of job counts by queue"`

```
## Var1 Freq segment
## 1 development 7089 Development
## 2 gpu 1486 Offload
## 3 gpudev 334 Offload
## 4 large 92 <NA>
## 5 largemem 340 <NA>
## 6 normal 43422 <NA>
## 7 normal-2mic 107 Offload
## 8 normal-mic 985 Offload
## 9 osu 55 Development
## 10 serial 1334 <NA>
## 11 systest 3 Development
## 12 vis 710 Vis
## 13 visdev 122 Vis
```

`## [1] "Deleted jobs from these queues: "`

```
## [1] "development" "gpu" "gpudev" "vis" "visdev"
## [6] "sysdebug" "systest" "osu" "normal-mic" "normal-2mic"
```

```
## segment Freq
## 1 Development 7147
## 2 Offload 2912
## 3 Vis 832
```

`## [1] "Deleting a total of 10891 jobs"`

`## [1] "The cleaned population of jobs over the sample period includes 45,188 jobs"`
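The log messages above come from a cleaning script that is not shown. As a rough sketch of how one such filter-and-report step might look, the snippet below uses a toy data frame and a hypothetical helper, `drop_records()`; the column names `nmbw` and `ib_bw` and all values are illustrative assumptions, not the author's actual code or data.

```r
# Toy stand-in for the job table; column names and values are illustrative only.
jobs <- data.frame(
  CPU_Usage = c(0.42, NA, 0.91, 0.15, 0.77),
  nmbw      = c(0.30, 0.55, 1.20, 0.10, 0.95),
  ib_bw     = c(2.1e5, 3.3e4, 8.0e6, -12, NA)
)

# Hypothetical helper: drop the rows flagged by `bad` and log how many were removed.
drop_records <- function(df, bad, reason) {
  bad[is.na(bad)] <- FALSE  # a comparison that yields NA is treated as "not flagged"
  print(paste("Eliminating", sum(bad), "records with", reason))
  df[!bad, , drop = FALSE]
}

jobs <- drop_records(jobs, is.na(jobs$CPU_Usage), "NA for CPU_Usage")
jobs <- drop_records(jobs, jobs$nmbw > 1.0,       "nmbw > 1.0")
jobs <- drop_records(jobs, jobs$ib_bw < 0,        "negative average IB BW")
jobs <- drop_records(jobs, is.na(jobs$ib_bw),     "missing average IB BW")
```

Applying the filters one at a time, each with its own message, keeps an audit trail of how many records every rule removes.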

Now that the data are clean, we want to understand more about the shape of the InfiniBand data in TACC Stats. We'll look at several metrics in this analysis; the definition of each follows.

**job.InternodeIBAveBW**

Per-node average bandwidth induced during the job over the IB fabric, excluding Lustre traffic (bytes/s). The job average bandwidth can be obtained by multiplying by the number of nodes.

**job.JobIBAveBW**

This is a derived metric: the job average bandwidth, computed by multiplying job.InternodeIBAveBW (the per-node average bandwidth induced during the job) by the number of nodes used in the job.

`rawdata$job.JobIBAveBW <- (rawdata$job.InternodeIBAveBW)*(rawdata$job.nodes) `
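To make the derivation concrete, here is a minimal sketch on a toy data frame. The column names match those used above, but the values are made up for illustration.

```r
# Toy data; the values are invented for illustration.
toy <- data.frame(
  job.InternodeIBAveBW = c(1.0e6, 2.5e5, 4.0e3),  # bytes/s per node
  job.nodes            = c(16, 1, 4)
)

# Job-level bandwidth = per-node average bandwidth * number of nodes
toy$job.JobIBAveBW <- toy$job.InternodeIBAveBW * toy$job.nodes
print(toy)
```

For a single-node job the two values coincide; for multi-node jobs the job-level value grows in proportion to the node count.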

`summary(rawdata$job.InternodeIBAveBW)`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.810e+02 2.157e+04 2.776e+05 1.831e+08 2.167e+08 2.147e+10
```

Among all jobs, the "job.InternodeIBAveBW" rate spans the range [381.2259553, 2.1468354 × 10^10]. The plot below shows the base-10 log transformation of job.InternodeIBAveBW by quantile.

`quant_plot(log10(rawdata$job.InternodeIBAveBW),ylabel="log10(InternodeIBAveBW), all jobs",quantiles=c(0,.25,.5,.75,.9,.99,1),color="red")`

**Figure 1**
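`quant_plot()` comes from the johntools package, whose internals are not shown here. As a hedged sketch of the underlying idea, the same quantile levels can be computed with base R's `quantile()`; the data below are synthetic and only stand in for the real column.

```r
set.seed(1)
# Synthetic, heavy-tailed stand-in for a bandwidth column (not the real data).
bw <- 10^rnorm(1000, mean = 5, sd = 2)

# Same quantile levels as passed to quant_plot() above.
probs <- c(0, .25, .5, .75, .9, .99, 1)
q <- quantile(log10(bw), probs = probs)
print(q)
```

The 0 and 1 quantiles are the minimum and maximum of the log-transformed values, so the plotted curve spans the full range reported by `summary()`.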

Figure 2 makes it clearer that the data shown in the quantile plot of Figure 1 are "weakly" bimodal.

```
# Refer to the column by name inside aes(); ggplot looks it up in rawdata
plt <- ggplot(rawdata, aes(x = log10(job.InternodeIBAveBW))) + geom_histogram()
print(plt)
```

`## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.`

**Figure 2**
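The `stat_bin` message above suggests setting the bin width explicitly. A minimal sketch of doing so follows, using synthetic data and a hypothetical column name `bw`; the width is computed with the same range/30 rule the message describes, but any width that suits the data could be used instead.

```r
library(ggplot2)

set.seed(1)
# Synthetic stand-in for the real column (not the TACC Stats data).
toy <- data.frame(bw = 10^rnorm(1000, mean = 5, sd = 2))

x <- log10(toy$bw)
bin_width <- diff(range(x)) / 30  # the range/30 rule named in the message

plt <- ggplot(toy, aes(x = log10(bw))) +
  geom_histogram(binwidth = bin_width)
print(plt)
```

Fixing the bin width makes histograms of different metrics directly comparable, which helps when judging whether a second mode is real or a binning artifact.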

`summary(rawdata$job.JobIBAveBW)`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.810e+02 2.809e+04 6.189e+05 3.024e+09 1.075e+09 4.642e+12
```

Among all jobs, the "job.JobIBAveBW" rate spans the range [381.2259553, 4.6415525 × 10^12]. The plot below shows the base-10 log transformation of job.JobIBAveBW by quantile.

`quant_plot(log10(rawdata$job.JobIBAveBW),ylabel="log10(job.JobIBAveBW), all jobs",quantiles=c(0,.25,.5,.75,.9,.99,1),color="red")`

**Figure 3**

Figure 4 shows that these data retain the basic shape seen in Figure 2, although there are some differences (discussed further down).

```
# Refer to the column by name inside aes(); ggplot looks it up in rawdata
plt <- ggplot(rawdata, aes(x = log10(job.JobIBAveBW))) + geom_histogram()
print(plt)
```

`## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.`

**Figure 4**
