Segmentation of Hemorrhagic Stroke in CT data

All code for this document is located at here.

Goal

In this tutorial, we will discuss segmentation of X-ray computed tomography (CT) scans. The data is discussed in Validated automatic brain extraction of head CT images" (http://doi.org/10.1016/j.neuroimage.2015.03.074). The data is located at https://archive.data.jhu.edu/dataset.xhtml?persistentId=doi:10.7281/T1/CZDPSX and was from the MISTIE (https://doi.org/10.1016/S1474-4422(16)30234-4Get) and CLEAR (https://doi.org/10.1111/ijs.12097) studies. The MISTIE study focused on patients with intraparenchymal/intracerebral hemorrhage (ICH) and CLEAR focused on intraventricular hemorrhage (IVH), but also has patients with ICH.

Setting up the Dataverse

The JHU archive is a Dataverse archive. We can use the dataverse package. We will set the DATAVERSE_SERVER variable as this is the default variable that is used in the dataverse package. I have set the environment variable JHU_DATAVERSE_API_TOKEN with the API token for this repository.

library(dataverse)
Sys.setenv("DATAVERSE_SERVER" = "archive.data.jhu.edu")
token = Sys.getenv("JHU_DATAVERSE_API_TOKEN")

With these set up, we can use the dataverse functions, by passing in key = token for all functions. Alternatively, we can set:

Sys.setenv("DATAVERSE_KEY" = Sys.getenv("JHU_DATAVERSE_API_TOKEN"))

and not have to set anything again.

Finding the ID of the Dataset

Although we know the DOI is 10.7281/T1/CZDPSX as we can see this in the URL itself https://archive.data.jhu.edu/dataset.xhtml?persistentId=doi:10.7281/T1/CZDPSX, we will use the dataverse functionality:

x = dataverse_search("muschelli AND head ct")
1 of 1 result retrieved
doi = x$global_id
doi
[1] "doi:10.7281/T1/CZDPSX"

Listing the Data Files

We will get the tiles from the data set so that we can download individual files and show how to segment a specific scan.

files = dataverse::get_dataset(doi)
files
Dataset (304): 
Version: 1.0, RELEASED
Release Date: 2019-09-18T15:58:57Z
License: NONE
21 Files:
                  label version   id               contentType
1          00_README.md       6 1241  application/octet-stream
2         00_README.pdf       5 1239           application/pdf
3             01.tar.xz       2 1311          application/x-xz
4             02.tar.xz       1 1283          application/x-xz
5             03.tar.xz       1 1298          application/x-xz
6             04.tar.xz       1 1289          application/x-xz
7             05.tar.xz       1 1288          application/x-xz
8             06.tar.xz       1 1286          application/x-xz
9             07.tar.xz       1 1295          application/x-xz
10            08.tar.xz       1 1306          application/x-xz
11            09.tar.xz       1 1309          application/x-xz
12            10.tar.xz       1 1296          application/x-xz
13            11.tar.xz       1 1297          application/x-xz
14            12.tar.xz       1 1287          application/x-xz
15            13.tar.xz       1 1299          application/x-xz
16            14.tar.xz       1 1291          application/x-xz
17            15.tar.xz       1 1302          application/x-xz
18            16.tar.xz       1 1310          application/x-xz
19            17.tar.xz       1 1284          application/x-xz
20            18.tar.xz       1 1279          application/x-xz
21            19.tar.xz       1 1303          application/x-xz
22            20.tar.xz       1 1305          application/x-xz
23            21.tar.xz       1 1290          application/x-xz
24            22.tar.xz       1 1300          application/x-xz
25            23.tar.xz       1 1308          application/x-xz
26            24.tar.xz       1 1285          application/x-xz
27            25.tar.xz       1 1312          application/x-xz
28            26.tar.xz       1 1280          application/x-xz
29            27.tar.xz       1 1292          application/x-xz
30            28.tar.xz       1 1278          application/x-xz
31            29.tar.xz       1 1294          application/x-xz
32            30.tar.xz       1 1293          application/x-xz
33            31.tar.xz       1 1304          application/x-xz
34            32.tar.xz       1 1301          application/x-xz
35            33.tar.xz       1 1282          application/x-xz
36            34.tar.xz       1 1307          application/x-xz
37            35.tar.xz       1 1281          application/x-xz
38     Demographics.tab       2 1238 text/tab-separated-values
39 ichseg_0.16.1.tar.gz       5 1240        application/x-gzip

We can download the demographics data from the repository so we can see some information about these patients. We will create a wrapper function as the get_file function always returns a raw vector:

library(readr)
dl_file = function(file, ...) {
  outfile = file.path(tempdir(), basename(file))
  out = get_file(file, ...)
  writeBin(out, outfile)
  return(outfile)
}
fname = grep("Demog", files$files$label, value = TRUE)
demo_file = dl_file(fname, dataset = doi)
demo = readr::read_csv(demo_file)

── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────
cols(
  id = col_character(),
  age = col_double(),
  sex = col_character(),
  race = col_character(),
  hispanic = col_character(),
  dx = col_character(),
  site = col_character()
)
head(demo)
# A tibble: 6 x 7
  id      age sex    race                    hispanic          dx          site 
  <chr> <dbl> <chr>  <chr>                   <chr>             <chr>       <chr>
1 01       50 Female Black/African American  Not Hispanic/Lat… ICH with I… 02   
2 02       66 Female Black/African American  Not Hispanic/Lat… ICH         15   
3 03       43 Male   Black/African American  Not Hispanic/Lat… ICH         10   
4 04       70 Male   White/Caucasian         Not Hispanic/Lat… ICH         14   
5 05       78 Male   Asian or Pacific Islan… Not Hispanic/Lat… ICH with I… 16   
6 06       52 Male   Black/African American  Not Hispanic/Lat… ICH with I… 02   

Here we will grab one patient, download the tarball, and then untar the files:

library(dplyr)
set.seed(20210217)
run_id = demo %>% 
  filter(dx == "ICH") %>% 
  sample_n(1) %>% 
  pull(id)
fname = paste0(run_id, ".tar.xz")
tarball = dl_file(fname, dataset = doi)
xz_files = untar(tarball, list = TRUE)

Here we create a temporary directory and extract the tarball to that directory. We create a vector of the file names and extract specifically the image and the mask:

tdir = tempfile()
dir.create(tdir)
untar(tarball, exdir = tdir)
nii_files = list.files(path = tdir, recursive = TRUE, full.names = TRUE)
nii_file = nii_files[!grepl("Mask", nii_files) & grepl(".nii.gz", nii_files)]
mask_file = nii_files[grepl("_Mask.nii.gz", nii_files)]

Reading in the Data

Here we read the data into R into a nifti object:

library(neurobase)
img = readnii(nii_file)
mask = readnii(mask_file)
ortho2(img)

range(img)
[1] -1024  3068

Here we plot the image and the Winsorized version to see the brain tissue:

ortho2(img, window = c(0, 100))

masked = window_img(mask_img(img, mask))
ortho2(masked)