Segmentation of Hemorrhagic Stroke in CT data
John Muschelli
2021-02-17
All code for this document is located here.
Goal
In this tutorial, we will discuss segmentation of X-ray computed tomography (CT) scans. The data is described in "Validated automatic brain extraction of head CT images" (http://doi.org/10.1016/j.neuroimage.2015.03.074). It is located at https://archive.data.jhu.edu/dataset.xhtml?persistentId=doi:10.7281/T1/CZDPSX and comes from the MISTIE (https://doi.org/10.1016/S1474-4422(16)30234-4) and CLEAR (https://doi.org/10.1111/ijs.12097) studies. The MISTIE study focused on patients with intraparenchymal/intracerebral hemorrhage (ICH), while CLEAR focused on intraventricular hemorrhage (IVH) but also included patients with ICH.
Setting up the Dataverse
The JHU archive is a Dataverse archive, so we can use the dataverse package. We will set the DATAVERSE_SERVER environment variable, as this is the default variable used by the dataverse package. I have set the environment variable JHU_DATAVERSE_API_TOKEN with the API token for this repository.
library(dataverse)
Sys.setenv("DATAVERSE_SERVER" = "archive.data.jhu.edu")
token = Sys.getenv("JHU_DATAVERSE_API_TOKEN")
With these set up, we can use the dataverse functions by passing in key = token for all functions. Alternatively, we can set:
Sys.setenv("DATAVERSE_KEY" = Sys.getenv("JHU_DATAVERSE_API_TOKEN"))
and not have to set anything again.
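Before calling any dataverse functions, it can help to confirm that the environment variables are actually set, since an empty token tends to surface only later as a failed request. The following is a minimal sketch in base R; the helper name check_env is hypothetical and is not part of the dataverse package:

```r
# Hypothetical helper (not from the dataverse package):
# report which environment variables are set to a non-empty value.
check_env = function(vars) {
  values = Sys.getenv(vars, unset = "")
  # nzchar() is TRUE for non-empty strings
  stats::setNames(nzchar(values), vars)
}

# Set the server as in the tutorial so the example is runnable on its own
Sys.setenv("DATAVERSE_SERVER" = "archive.data.jhu.edu")

status = check_env(c("DATAVERSE_SERVER", "DATAVERSE_KEY"))
missing_vars = names(status)[!status]
if (length(missing_vars) > 0) {
  message("Not set: ", paste(missing_vars, collapse = ", "))
}
```

If DATAVERSE_KEY is reported as not set, the key = token argument still needs to be passed explicitly.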
Finding the ID of the Dataset
Although we know the DOI is 10.7281/T1/CZDPSX, as we can see in the URL itself (https://archive.data.jhu.edu/dataset.xhtml?persistentId=doi:10.7281/T1/CZDPSX), we will use the dataverse search functionality to look it up:
x = dataverse_search("muschelli AND head ct")
1 of 1 result retrieved
doi = x$global_id
doi
[1] "doi:10.7281/T1/CZDPSX"
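As a cross-check on the search result, the same identifier can be pulled out of the dataset URL with base R string handling. This is only a sketch that assumes the URL keeps the persistentId= query parameter as its last component; the variable names are illustrative:

```r
url = "https://archive.data.jhu.edu/dataset.xhtml?persistentId=doi:10.7281/T1/CZDPSX"

# Drop everything up to and including "persistentId="
doi_from_url = sub("^.*persistentId=", "", url)
doi_from_url
# [1] "doi:10.7281/T1/CZDPSX"
```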
Listing the Data Files
We will get the files from the data set so that we can download individual files and show how to segment a specific scan.
files = dataverse::get_dataset(doi)
files
Dataset (304):
Version: 1.0, RELEASED
Release Date: 2019-09-18T15:58:57Z
License: NONE
21 Files:
label version id contentType
1 00_README.md 6 1241 application/octet-stream
2 00_README.pdf 5 1239 application/pdf
3 01.tar.xz 2 1311 application/x-xz
4 02.tar.xz 1 1283 application/x-xz
5 03.tar.xz 1 1298 application/x-xz
6 04.tar.xz 1 1289 application/x-xz
7 05.tar.xz 1 1288 application/x-xz
8 06.tar.xz 1 1286 application/x-xz
9 07.tar.xz 1 1295 application/x-xz
10 08.tar.xz 1 1306 application/x-xz
11 09.tar.xz 1 1309 application/x-xz
12 10.tar.xz 1 1296 application/x-xz
13 11.tar.xz 1 1297 application/x-xz
14 12.tar.xz 1 1287 application/x-xz
15 13.tar.xz 1 1299 application/x-xz
16 14.tar.xz 1 1291 application/x-xz
17 15.tar.xz 1 1302 application/x-xz
18 16.tar.xz 1 1310 application/x-xz
19 17.tar.xz 1 1284 application/x-xz
20 18.tar.xz 1 1279 application/x-xz
21 19.tar.xz 1 1303 application/x-xz
22 20.tar.xz 1 1305 application/x-xz
23 21.tar.xz 1 1290 application/x-xz
24 22.tar.xz 1 1300 application/x-xz
25 23.tar.xz 1 1308 application/x-xz
26 24.tar.xz 1 1285 application/x-xz
27 25.tar.xz 1 1312 application/x-xz
28 26.tar.xz 1 1280 application/x-xz
29 27.tar.xz 1 1292 application/x-xz
30 28.tar.xz 1 1278 application/x-xz
31 29.tar.xz 1 1294 application/x-xz
32 30.tar.xz 1 1293 application/x-xz
33 31.tar.xz 1 1304 application/x-xz
34 32.tar.xz 1 1301 application/x-xz
35 33.tar.xz 1 1282 application/x-xz
36 34.tar.xz 1 1307 application/x-xz
37 35.tar.xz 1 1281 application/x-xz
38 Demographics.tab 2 1238 text/tab-separated-values
39 ichseg_0.16.1.tar.gz 5 1240 application/x-gzip
We can download the demographics data from the repository to see some information about these patients. Because the get_file function always returns a raw vector, we will create a wrapper function that writes that vector to a temporary file and returns the path:
library(readr)
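dl_file = function(file, ...) {
  # destination in the session's temporary directory
  outfile = file.path(tempdir(), basename(file))
  # get_file returns the file contents as a raw vector
  out = get_file(file, ...)
  writeBin(out, outfile)
  return(outfile)
}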
fname = grep("Demog", files$files$label, value = TRUE)
demo_file = dl_file(fname, dataset = doi)
demo = readr::read_csv(demo_file)
── Column specification ────────────────────────────────────────────────────────────────────────────────────────────────────
cols(
id = col_character(),
age = col_double(),
sex = col_character(),
race = col_character(),
hispanic = col_character(),
dx = col_character(),
site = col_character()
)
head(demo)
# A tibble: 6 x 7
id age sex race hispanic dx site
<chr> <dbl> <chr> <chr> <chr> <chr> <chr>
1 01 50 Female Black/African American Not Hispanic/Lat… ICH with I… 02
2 02 66 Female Black/African American Not Hispanic/Lat… ICH 15
3 03 43 Male Black/African American Not Hispanic/Lat… ICH 10
4 04 70 Male White/Caucasian Not Hispanic/Lat… ICH 14
5 05 78 Male Asian or Pacific Islan… Not Hispanic/Lat… ICH with I… 16
6 06 52 Male Black/African American Not Hispanic/Lat… ICH with I… 02
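The write-then-read pattern used by the wrapper above can be tried without network access. The sketch below substitutes a small hand-made raw vector for the result of get_file; the file name and contents are invented for illustration, and base read.csv is used so the example has no package dependencies:

```r
# Stand-in for the raw vector that get_file() would return
raw_content = charToRaw("id,age\n01,50\n02,66\n")

# Write the bytes to a temporary file, as dl_file() does
outfile = file.path(tempdir(), "example_demographics.csv")
writeBin(raw_content, outfile)

# Read the id column as character so leading zeros are kept
example = read.csv(outfile, colClasses = c(id = "character"))
example$id
# [1] "01" "02"
```

Keeping the IDs as character matters later, because they are pasted into the tarball file names.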
Here we will grab one patient, download the tarball, and list the files it contains:
library(dplyr)
set.seed(20210217)
run_id = demo %>%
  filter(dx == "ICH") %>%
  sample_n(1) %>%
  pull(id)
fname = paste0(run_id, ".tar.xz")
tarball = dl_file(fname, dataset = doi)
xz_files = untar(tarball, list = TRUE)
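Setting the seed is what makes the choice of patient reproducible. A minimal base R sketch of the idea follows; the ids vector is a hypothetical stand-in for the demographics table, and base sample is not guaranteed to pick the same patient as sample_n:

```r
# Hypothetical two-digit IDs, mimicking the tarball naming scheme
ids = sprintf("%02d", 1:35)

# Resetting the same seed before each draw gives the same draw
set.seed(20210217)
first = sample(ids, 1)
set.seed(20210217)
second = sample(ids, 1)
identical(first, second)
# [1] TRUE

# Build the tarball name from the sampled ID
example_fname = paste0(first, ".tar.xz")
```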
Here we create a temporary directory and extract the tarball to that directory. We create a vector of the file names and extract specifically the image and the mask:
tdir = tempfile()
dir.create(tdir)
untar(tarball, exdir = tdir)
nii_files = list.files(path = tdir, recursive = TRUE, full.names = TRUE)
# escape the dots and anchor at the end so only .nii.gz files match
nii_file = nii_files[!grepl("Mask", nii_files) & grepl("\\.nii\\.gz$", nii_files)]
mask_file = nii_files[grepl("_Mask\\.nii\\.gz$", nii_files)]
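The selection logic can be checked on a small example before relying on it. In this sketch the file names are hypothetical and only mimic an image/mask pair; the actual names inside the tarball may differ:

```r
# Hypothetical file listing, for illustration only
example_files = c(
  "/tmp/01/01_CT.nii.gz",
  "/tmp/01/01_CT_Mask.nii.gz",
  "/tmp/01/notes.txt"
)

# Escaped dots and the end anchor restrict matches to real .nii.gz suffixes
is_nifti = grepl("\\.nii\\.gz$", example_files)
is_mask = grepl("_Mask\\.nii\\.gz$", example_files)

image_only = example_files[is_nifti & !is_mask]
mask_only = example_files[is_mask]

# Fail early if either pattern matched zero or several files
stopifnot(length(image_only) == 1, length(mask_only) == 1)
```

A check like the final stopifnot is useful because readnii expects a single file name.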
Reading in the Data
Here we read the data into R as nifti objects:
library(neurobase)
img = readnii(nii_file)
mask = readnii(mask_file)
ortho2(img)
range(img)
[1] -1024 3068
Here we plot the image and the Winsorized version to see the brain tissue:
ortho2(img, window = c(0, 100))
masked = window_img(mask_img(img, mask))
ortho2(masked)
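To make the windowing step concrete, the sketch below applies the same kind of clamping to a plain numeric vector of Hounsfield units. It assumes that window_img, with its default settings, sets values outside the window c(0, 100) to the window limits; the example values are made up apart from the observed range endpoints:

```r
# Example Hounsfield units, including the observed minimum and maximum
hu = c(-1024, -5, 0, 40, 100, 3068)

# Clamp to the window [0, 100]: raise low values to 0, cap high values at 100
windowed = pmin(pmax(hu, 0), 100)
windowed
# [1]   0   0   0  40 100 100
```

Restricting the range this way is what lets the brain tissue stand out, since air (around -1024) and bone (in the thousands) no longer dominate the intensity scale.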