This document outlines, step by step, how we built our pipeline to obtain atlas-wise individual-level brain representations. This is my attempt to make the decision processes as transparent as possible.
The final pipeline used to process individual-level neuroimaging data is here.
The UK Biobank study provides processed neuroimaging data in FreeSurfer formats as field ID 20263. We use field 20263 to extract our atlases and vertex-wise measures. Because the data are very large, we process each participant individually (about 300 in parallel), save the information we need, and delete the rest to make space for more data.
We downloaded the FreeSurfer software here and installed it on our Linux system. The software will only work if a license file is generated and saved in the FreeSurfer directory.
# download
wget https://surfer.nmr.mgh.harvard.edu/pub/dist/freesurfer/6.0.0/freesurfer-Linux-centos6_x86_64-stable-pub-v6.0.0.tar.gz
# unpack
tar -C /users/anna -xzvf freesurfer-Linux-centos6_x86_64-stable-pub-v6.0.0.tar.gz
# rename
mv freesurfer freesurfer-6.0.0
# setup
export FREESURFER_HOME=/users/anna/freesurfer-6.0.0
source $FREESURFER_HOME/SetUpFreeSurfer.sh
# remove downloaded file
rm freesurfer-Linux-centos6_x86_64-stable-pub-v6.0.0.tar.gz
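FreeSurfer looks for the license at $FREESURFER_HOME/license.txt. A minimal sketch, assuming the license file received after registration was saved to the home directory (the source path is illustrative):
# place the registration license where FreeSurfer expects it
cp ~/license.txt $FREESURFER_HOME/license.txt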
The Desikan-Killiany and the Destrieux atlases have previously been processed by UKB and are provided as anatomical stats as part of field 20263 (stats/?h.aparc.a2009s.stats for Destrieux; stats/?h.aparc.stats for DK). For all other atlases, we need annotation files in order to be able to map them onto our individuals' brains, and to know which vertices fall into specific regions-of-interest (ROIs). Ideally, we use .gcs mapping files, because the mapping of the atlas onto each participant is straightforward (find an illustration here). We obtain .gcs files for most of our atlases from the multiAtlasTT repository here. However, we could only find annotation files in .annot format for the Julich-Brain (here) and Yeo (?h.Yeo2011_17Networks.annot files are included when downloading the FreeSurfer software). To process the latter, we use a pipeline from the CJNeuroLab. Throughout this document, I may sometimes refer to this pipeline as the Barcelona pipeline.
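Before running the loop below, it is worth confirming that the .gcs files are organised the way the loop expects. A minimal sketch, assuming the ${atlasFolder} variable and atlas names used in the full script further down:
# sanity check (illustrative): confirm the expected .gcs files exist for both hemispheres
for atlasName in schaefer500-yeo17 hcp-mmp-b gordon333dil
do
for hemisphere in lh rh
do
ls ${atlasFolder}/${atlasName}/${hemisphere}.${atlasName}_6p0.gcs || echo "missing: ${hemisphere} ${atlasName}"
done
done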
####################################################################################
### Process mris_ca_label script to obtain atlases from gcs files
### illustration of what it does here: https://surfer.nmr.mgh.harvard.edu/fswiki/SurfaceLabelAtlas
####################################################################################
for hemisphere in lh rh
do
for atlasName in schaefer500-yeo17 hcp-mmp-b gordon333dil
do
echo $hemisphere
echo $atlasName
# 2) create annot file for each subject
${fsdir}/bin/mris_ca_label -sdir ${wd} ${subjID} $hemisphere ${wd}/${subjID}/surf/$hemisphere.sphere.reg ${atlasFolder}/$atlasName/$hemisphere.${atlasName}_6p0.gcs ${wd}/${subjID}/${hemisphere}.${atlasName}.${subjID}.annot
# 3) create stat file
${fsdir}/bin/mris_anatomical_stats -a ${wd}/${subjID}/${hemisphere}.${atlasName}.${subjID}.annot -f ${wd}/${subjID}/stats/$hemisphere.$atlasName.stats -b ${subjID} $hemisphere
done
done
echo "Done schaefer500-yeo17 hcp-mmp-b gordon333dil"
The output files we get from this step can be transformed into OSCA format with the script below (we use OSCA to calculate morphometricity estimates).
The OSCA format requires the following headers as described here:
individual ID and names of probes (e.g. FID IID cg00000658 cg26036652 cg00489772 …)
So, we are aiming for a file in which we have the following headers:
IID ROI1 ROI2 ROI3 etc.
The script outputs one file for each participant, with one row and these headers. We will later be able to easily row-bind all the results into one big file in OSCA format (I chose to do it this way because I launch parallel scripts processing many participants simultaneously, and would risk losing data if the different arrays edited one file simultaneously).
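To make the target format concrete: each per-participant file consists of a header line plus a single data row, so the later merge only needs to keep one header and stack the data rows. A minimal shell sketch under that assumption (the actual merge is done in R at the end of this document; the participant ID is hypothetical):
# keep one header, then append the data row of every participant file for one modality
head -n 1 ${od}/indiv_output_OSCA/sub_1234567_DK_SurfArea > DK_SurfArea_all
for f in ${od}/indiv_output_OSCA/*_DK_SurfArea
do
tail -n +2 "$f" >> DK_SurfArea_all
done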
Launch script to format .gcs-processed files with:
Rscript save_FS_stats_in_OSCA_format.R ${wd} ${subjID} ${od}/indiv_output_OSCA
# launch this as:
# Rscript save_FS_stats_in_OSCA_format.R ${wd} ${subjID} ${od}/indiv_output_OSCA
# the script assumes that object "ref" (below) incorporates all required atlases
args <- commandArgs(trailingOnly = TRUE)
print(args[1])
print(args[2])
print(args[3])
subjectdir <- args[1]
subjectID <- args[2]
outputdir <- args[3]
# dependencies
packages <- c("data.table","berryFunctions")
package.check <- lapply(
packages,
FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
}
)
print("All required R packages are loaded")
# this assumes the standard FreeSurfer directory structure
setwd(paste0(subjectdir,"/",subjectID,"/stats/"))
files = c(".aparc.stats",".aparc.a2009s.stats",".gordon333dil.stats",".hcp-mmp-b.stats",".schaefer500-yeo17.stats")
atlases = c("DK","Des","Gordon","Glasser","Schaefer")
ref <- cbind(files, atlases)
# repeating by atlas
for(i in 1:nrow(ref)){
# read left stats file (we read left and right hemisphere because Gordon atlas for example has different numbers of regions in each hemisphere)
lh.stats <- as.data.frame(read.table(paste0(subjectdir,"/",subjectID,"/stats/lh", as.character(ref[i,1]))))
# name columns
names(lh.stats) <- c("StructName","NumVert","SurfArea","GrayVol","ThickAvg","ThickStd","MeanCurv","GausCurv","FoldInd","CurvInd")
# read right stats file
rh.stats <- as.data.frame(read.table(paste0(subjectdir,"/",subjectID,"/stats/rh", as.character(ref[i,1]))))
# name columns
names(rh.stats) <- c("StructName","NumVert","SurfArea","GrayVol","ThickAvg","ThickStd","MeanCurv","GausCurv","FoldInd","CurvInd")
# read stats file in to get structnames
lh.areas <- paste0("lh.",as.character(lh.stats$StructName))
rh.areas <- paste0("rh.",as.character(rh.stats$StructName))
# in Schaefer atlas we still have Background and Medial wall as a variable which we don't want to include
if(as.character(ref[i, "atlases"]) == "Schaefer"){
lh.areas <- lh.areas[which(lh.areas !="lh.Background+FreeSurfer_Defined_Medial_Wall")]
rh.areas <- rh.areas[which(rh.areas !="rh.Background+FreeSurfer_Defined_Medial_Wall")]
}
areas <- append(lh.areas,rh.areas)
## now we create empty frames and save values in it
# repeating three times for surfarea, thickness and volume
for(meas in c("SurfArea","ThickAvg","GrayVol")){
#print(meas)
# the modality we're interested in: eg. DK_Volume or Glasser_Thickness
modality <- paste0(as.character(ref[i,2]),"_",meas)
print(modality)
# create empty matrix
frame <- as.data.frame(matrix(nrow=1, ncol = length(areas)+1))
names(frame) <- append("IID",areas)
frame[1,"IID"] <- subjectID
row.names(frame) <- subjectID
# show dimensions of the frame
print(ref[i,2])
print(dim(frame))
# take region name and value for modality
lh.areas <- paste0("lh.",as.character(lh.stats$StructName))
lh.vector <- lh.stats[,meas]
rh.areas <- paste0("rh.",as.character(rh.stats$StructName))
rh.vector <- rh.stats[,meas]
# in Schaefer atlas we still have Background and Medial wall as a variable which we don't want to include
if(as.character(ref[i, "atlases"]) == "Schaefer"){
lh.areas <- lh.areas[which(lh.areas !="lh.Background+FreeSurfer_Defined_Medial_Wall")]
rh.areas <- rh.areas[which(rh.areas !="rh.Background+FreeSurfer_Defined_Medial_Wall")]
}
vector <- append(lh.vector,rh.vector)
test_areas <- append(lh.areas, rh.areas)
# check that regions in the value file are in the same order as the output file (not applicable anymore)
if(!all(areas == test_areas)){print(paste0("Areas in subject file and output template file saved in ",modality, " don't include the same areas")); break}
# copy measures into frame into the row corresponding to subjectID
frame <- insertRows(frame, which(row.names(frame) == subjectID),append(subjectID, vector))
# print what's been saved in the file
print("This has been saved in our output file")
print(frame[which(row.names(frame) == subjectID),])
# save this file, and overwrite previous one
fwrite(frame[1,], file = paste0(outputdir,"/",subjectID,"_", modality), na=NA, quote=FALSE, col.names=T, row.names=F, sep = "\t")
rm("frame")
}
print("Output above should correspond to these stats files")
print(head(lh.stats))
print(head(rh.stats))
}
print(paste0("Done saving results from ",subjectID," as outfiles in ",outputdir))
####################################################################################################
### Run Barcelona pipeline for Yeo and Julich atlases
### https://cjneurolab.org/2016/11/22/hcp-mmp1-0-volumetric-nifti-masks-in-native-structural-space/
####################################################################################################
# create file containing subjectname as input to pipeline
cd ${wd}
echo $subjID > ${wd}/target_${subjID}.txt
# start loop to process Julich and Yeo
for atlasName in JulichBrain Yeo2011_17Networks test.aparc
do
echo ${atlasName}
bash create_subj_volume_parcellation.sh -L ${wd}/target_${subjID}.txt -f 1 -l 1 -a ${atlasName} -d ${wd}/${subjID} -s NO -m NO -t YES
# move into stats folder and rename
mv ${wd}/${subjID}/tables/lh.${atlasName}.${subjID}.txt ${wd}/${subjID}/stats/lh.${atlasName}.stats
mv ${wd}/${subjID}/tables/rh.${atlasName}.${subjID}.txt ${wd}/${subjID}/stats/rh.${atlasName}.stats
done
# remove subject_list created above
rm ${wd}/target_${subjID}.txt
echo "Done Julich-Brain & Yeo" $subjID
Launch script to format .annot-processed files with:
Rscript save_FS_stats_in_OSCA_format_Yeo_Julich.R ${wd} ${subjID} ${od}/indiv_output_OSCA
# the script assumes that object "ref" (below) incorporates all required atlases
args <- commandArgs(trailingOnly = TRUE)
print(args[1])
print(args[2])
print(args[3])
subjectdir <- args[1]
subjectID <- args[2]
outputdir <- args[3]
# dependencies
packages <- c("data.table","berryFunctions")
package.check <- lapply(
packages,
FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
}
)
print("All required R packages are loaded")
# this assumes the standard FreeSurfer directory structure
setwd(paste0(subjectdir,"/",subjectID,"/stats/"))
files = c(".Yeo2011_17Networks.stats",".JulichBrain.stats",".test.aparc.stats")
atlases = c("Yeo","JulichBrain","test.aparc")
ref <- cbind(files, atlases)
# repeating by atlas
for(i in 1:nrow(ref)){
# read left stats file (we read left and right hemisphere because Gordon atlas for example has different numbers of regions in each hemisphere)
lh.stats <- as.data.frame(read.table(paste0(subjectdir,"/",subjectID,"/stats/lh", as.character(ref[i,1])), skip=15))
# name columns
names(lh.stats) <- c("NumVert","SurfArea","GrayVol","ThickAvg","ThickStd","MeanCurv","GausCurv","FoldInd","CurvInd","StructName")
# read right stats file
rh.stats <- as.data.frame(read.table(paste0(subjectdir,"/",subjectID,"/stats/rh", as.character(ref[i,1])), skip=15))
# name columns
names(rh.stats) <- c("NumVert","SurfArea","GrayVol","ThickAvg","ThickStd","MeanCurv","GausCurv","FoldInd","CurvInd","StructName")
# read stats file in to get structnames
lh.areas <- paste0("lh.",as.character(lh.stats$StructName))
rh.areas <- paste0("rh.",as.character(rh.stats$StructName))
# in the Yeo atlas we still have the FreeSurfer-defined medial wall as a variable, which we don't want to include
if(as.character(ref[i, "atlases"]) == "Yeo"){
lh.areas <- lh.areas[which(lh.areas !="lh.FreeSurfer_Defined_Medial_Wall")]
rh.areas <- rh.areas[which(rh.areas !="rh.FreeSurfer_Defined_Medial_Wall")]
}
if(as.character(ref[i, "atlases"]) == "JulichBrain"){
lh.areas <- lh.areas[which(lh.areas !="lh.unlabelled")]
rh.areas <- rh.areas[which(rh.areas !="rh.unlabelled")]
}
# store all area names in the same vector
areas <- append(lh.areas,rh.areas)
## now we create empty frames and save values in it
# repeating three times for surfarea, thickness and volume
for(meas in c("SurfArea","ThickAvg","GrayVol")){
#print(meas)
# the modality we're interested in: eg. DK_Volume or Glasser_Thickness
modality <- paste0(as.character(ref[i,2]),"_",meas)
print(modality)
# create empty matrix
frame <- as.data.frame(matrix(nrow=1, ncol = length(areas)+1))
names(frame) <- append("IID",areas)
frame[1,"IID"] <- subjectID
row.names(frame) <- subjectID
# show dimensions of the frame
print(ref[i,2])
print(dim(frame))
# take region name and value for modality
lh.areas <- paste0("lh.",as.character(lh.stats$StructName))
lh.vector <- lh.stats[,meas]
rh.areas <- paste0("rh.",as.character(rh.stats$StructName))
rh.vector <- rh.stats[,meas]
# in the Yeo atlas we still have the FreeSurfer-defined medial wall as a variable, which we don't want to include
if(as.character(ref[i, "atlases"]) == "Yeo"){
lh.areas <- lh.areas[which(lh.areas !="lh.FreeSurfer_Defined_Medial_Wall")]
rh.areas <- rh.areas[which(rh.areas !="rh.FreeSurfer_Defined_Medial_Wall")]
}
if(as.character(ref[i, "atlases"]) == "JulichBrain"){
lh.areas <- lh.areas[which(lh.areas !="lh.unlabelled")]
rh.areas <- rh.areas[which(rh.areas !="rh.unlabelled")]
}
vector <- append(lh.vector,rh.vector)
test_areas <- append(lh.areas, rh.areas)
# check that regions in the value file are in the same order as the output file (not applicable anymore)
if(!all(areas == test_areas)){print(paste0("Areas in subject file and output template file saved in ",modality, " don't include the same areas")); break}
# copy measures into frame into the row corresponding to subjectID
frame <- insertRows(frame, which(row.names(frame) == subjectID),append(subjectID, vector))
# print what's been saved in the file
#print("This has been saved in our output file")
#print(frame[which(row.names(frame) == subjectID),])
# save this file, and overwrite previous one
fwrite(frame[1,], file = paste0(outputdir,"/",subjectID,"_", modality), na=NA, quote=FALSE, col.names=T, row.names=F, sep = "\t")
rm("frame")
}
#print("Output above should correspond to these stats files")
#print(head(lh.stats))
#print(head(rh.stats))
}
print(paste0("Done saving results from ",subjectID," as outfiles in ",outputdir))
Note that here we also process the “test.aparc” atlas. I have only introduced this to quantify how accurately the Barcelona pipeline works, but we won’t get morphometricity estimates from it later on. I process the Desikan-Killiany atlas through the Barcelona pipeline, and then compare the output per region with the results as downloaded from UKB.
#!/usr/bin/env bash
# Creates subject-level parcellation image from annotation files in fsaverage space. Can be used with the HCP-MMP1.0 projected on fsaverage annotation files available from https://figshare.com/articles/HCP-MMP1_0_projected_on_fsaverage/3498446
# usage:
# bash create_subj_volume_parcellation.sh -L <subject_list> -a <name_of_annot_file> -f <first_subject_row> -l <last_subject_row> -d <name_of_output_dir>
#
#
# HOW TO USE
#
# Ingredients:
#
# Subject data. First of all, you need to have your subjects’ structural data preprocessed with FreeSurfer.
# Shell script. Download the script: create_subj_volume_parcellation, unzip it (because wordpress won’t upload .sh files directly), and copy it to your $SUBJECTS_DIR/ folder.
# Fsaverage data. Copy the fsaverage folder from the FreeSurfer directory ($FREESURFER_HOME/subjects/fsaverage) to your $SUBJECTS_DIR/ folder.
# Annotation files. Download rh.HCPMMP1.annot and lh.HCPMMP1.annot from https://figshare.com/articles/HCP-MMP1_0_projected_on_fsaverage/3498446. Copy them to your $SUBJECTS_DIR/ folder or to $SUBJECTS_DIR/fsaverage/label/.
# Subject list. Create a list with the identifiers of the desired target subjects (named exactly as their corresponding names in $SUBJECTS_DIR/, of course).
#
# Instructions:
#
# Launch the script: bash create_subj_volume_parcellation.sh (this will show the compulsory and optional arguments).
# The compulsory arguments are:
# -L subject_list_name
# -a name_of_annotation_file (without hemisphere or extension; in this case, HCPMMP1)
# -d name_of_output_dir (will be created in $SUBJECTS_DIR)
# Optional arguments:
# -f and -l indicate the first and last subjects in the subject list to be processed. E.g., in order to process the third to the fifth subject, one would enter -f 3 -l 5 (the whole thing takes a bit of time, so one might want to launch it in separate terminals for speed)
# -m (YES or NO, default NO) indicates whether individual volume files for each parcellation region should be created. This requires FSL
# -s (“YES” or “NO”, default is NO) indicates whether individual volume files for each subcortical aseg region should be created. Also requires FSL, and requires that the FreeSurferColorLUT.txt file be present at the base (subjects) folder
# -t (YES or NO, default YES) indicates whether to create anatomical stats table (number of vertices, area, volume, mean thickness, etc.) per region
# Output:
# An output folder named as specified with the -d option will be created, which will contain a directory called label/, where the labels for the regions projected on fsaverage will be stored. The output directory will also contain a folder for each subject. Inside these subject folders, a .nii.gz file named as # the annotation file (-a option) will contain the final parcellation volume. A look-up table will also be created inside each subject’s folder, named LUT_<name_of_annotation_file>.txt. In each subject’s folder, a directory called label/ will also be created, where the transformed labels will be stored
# If the -m option is set to YES, each subject’s directory will also contain a masks/ directory containing one volume .nii.gz file for each binary mask
# If the -s option is set to YES, an aseg_masks/ directory will be created, containing one .nii.gz file for each subcortical region
# Inside the original subjects’ label folders, post-transformation annotation files will be created. These are not overwritten if the script is relaunched; so, if you ran into a problem and want to start over, you should delete these files (named lh(rh).<subject>_<name_of_annotation_file>.annot)
# define compulsory and optional arguments
while getopts ":L:f:l:a:d:m:t:s:" o; do
case "${o}" in
L)
L=${OPTARG}
;;
f)
f=${OPTARG}
;;
l)
l=${OPTARG}
;;
a)
a=${OPTARG}
;;
d)
d=${OPTARG}
;;
m)
m=${OPTARG}
;;
t)
t=${OPTARG}
;;
s)
s=${OPTARG}
;;
esac
done
if [ -z "${L}" ] || [ -z "${a}" ] || [ -z "${d}" ]; then printf '\n Usage:\n -To be run from the directory containing FreeSurfer subject folders, which should also contain the fsaverage folder. User must have writing permission. Original annotation files (eg, lh.HCPMMP1.annot, rh.HCPMMP1.annot) must be present in $SUBJECTS_DIR/fsaverage/label/, or in the base folder ($SUBJECTS_DIR/).\n -Output: individual nifti volume, where regions are indicated by voxel values, is stored in each subject folder inside the output directory. Regions can then be identified through the region_index_table.txt stored in the output folder. Final annotation file in subject space is stored in the original subject folder ($SUBJECTS_DIR/subject/label/), and will NOT overwrite old files.\n\n Compulsory arguments:\n -L <subject_list> (names must correspond to names of folders in $SUBJECTS_DIR)\n -a <name_of_input_annot_file> (without specifying hemisphere and without extension. Usually, HCPMMP1)\n -d <name_of_output_dir> \n\n Optional arguments:\n -f: row in subject list indicating first subject to be processed\n -l: row in subject list indicating last subject to be processed\n -m <YES or NO> create individual nii.gz masks for cortical regions (requires FSL. Default is NO. Masks will be saved in /output_dir/subject/masks/)\n -s <YES or NO> create individual nii.gz masks for 14 subcortical regions (from the FreeSurfer automatic segmentation. Requires the FreeSurferColorLUT.txt file in the base folder. Requires FSL. Default is NO)\n -t <YES or NO> generate anatomical stats table (mean area, volume, thickness, etc., per region. Saved in /output_dir/subject/tables/. Default is YES)\n\n 2018 CJNeurolab\n University of Barcelona\n by Hugo C Baggio & Alexandra Abos\n\n'; exit 1; fi
create_individual_masks="NO"
annotation_file=$a
subject_list_all=$L
output_dir=$d
get_anatomical_stats="YES"
create_aseg_files="NO"
if [ ! -z "${f}" ] ; then first=$f; else first=1; fi
if [ ! -z "${l}" ] ; then last=$l; else last=`wc -l < ${subject_list_all}`; fi
if [ ! -z "${m}" ] ; then create_individual_masks=$m; fi
if [ ! -z "${s}" ] ; then create_aseg_files=$s; fi
if [ ! -z "${t}" ] ; then get_anatomical_stats=$t; fi
printf "\n >>>> Current FreeSurfer subjects folder is $SUBJECTS_DIR\n\n"
#Check if FreeSurferColorLUT.txt is present in base folder
if [[ ${create_aseg_files} == "YES" ]]; then if [[ ! -e FreeSurferColorLUT.txt ]]; then printf " >>>> ERROR: FreeSurferColorLUT.txt file not found. Subcortical masks will NOT be created\n\n"; create_aseg_files=NO; colorlut_miss=YES; fi; fi
# Create subject list with subjects defined in the input
#sed -n "${first},${last} p" ${subject_list_all} > temp_subject_list_${first}_${last}
#subject_list=temp_subject_list_${first}_${last}
# MODIFICATION, AEF, 22.11.2021, for simplicity
subject_list=$L
#mkdir -p ${output_dir}
mkdir -p ${output_dir}/label_new
rand_id=$RANDOM
mkdir -p ${output_dir}/temp_${first}_${last}_${rand_id}
rm -f ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_?
rm -f ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}?
# Check whether original annotation files are in fsaverage/label folder, copy them if not
if [[ ! -e $SUBJECTS_DIR/fsaverage/label/lh.${annotation_file}.annot ]]
then cp $SUBJECTS_DIR/lh.${annotation_file}.annot $SUBJECTS_DIR/fsaverage/label/
fi
if [[ ! -e $SUBJECTS_DIR/fsaverage/label/rh.${annotation_file}.annot ]]
then cp $SUBJECTS_DIR/rh.${annotation_file}.annot $SUBJECTS_DIR/fsaverage/label/
fi
# Convert annotation to label, and get color lookup tables
rm -f ./${output_dir}/log_annotation2label
/users/anna/freesurfer-6.0.0/bin/mri_annotation2label --subject fsaverage --hemi lh --outdir ${output_dir}/label_new --annotation ${annotation_file} >> ${output_dir}/temp_${first}_${last}_${rand_id}/log_annotation2label
/users/anna/freesurfer-6.0.0/bin/mri_annotation2label --subject fsaverage --hemi lh --outdir ${output_dir}/label_new --annotation ${annotation_file} --ctab ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_L1 >> ${output_dir}/temp_${first}_${last}_${rand_id}/log_annotation2label
/users/anna/freesurfer-6.0.0/bin/mri_annotation2label --subject fsaverage --hemi rh --outdir ${output_dir}/label_new --annotation ${annotation_file} >> ${output_dir}/temp_${first}_${last}_${rand_id}/log_annotation2label
/users/anna/freesurfer-6.0.0/bin/mri_annotation2label --subject fsaverage --hemi rh --outdir ${output_dir}/label_new --annotation ${annotation_file} --ctab ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_R1 >> ${output_dir}/temp_${first}_${last}_${rand_id}/log_annotation2label
# Remove number columns from ctab
awk '!($1="")' ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_L1 >> ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_L2
awk '!($1="")' ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_R1 >> ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_R2
# Create list with region names
awk '{print $2}' ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_L1 > ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}L1
awk '{print $2}' ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_R1 > ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}R1
echo "Done mri_annotation2label"
## MODIFICATION AEF: 22.11.2021
# Figured out that the label names in ./label don't match the label names in list_labels_${annotation_file}L because the label names are separated by spaces
# The following line gets rid of spaces
if [ $annotation_file = "JulichBrain" ]
then
cd ${output_dir}/label_new
for f in *\ *; do mv "$f" "${f// /}"; done
fi
# Create lists with regions that actually have corresponding labels
for labelsL in `cat ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}L1`
do if [[ -e ${output_dir}/label_new/lh.${labelsL}.label ]]
then
echo lh.${labelsL}.label >> ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}L
grep " ${labelsL} " ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_L2 >> ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_L3
fi
done
echo "Done past line 141"
for labelsR in `cat ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}R1`
do if [[ -e ${output_dir}/label_new/rh.${labelsR}.label ]]
then
echo rh.${labelsR}.label >> ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}R
grep " ${labelsR} " ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_R2 >> ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_R3
fi
done
# Create new numbers column
number_labels_R=`wc -l < ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}R`
number_labels_L=`wc -l < ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}L`
for ((i=1;i<=${number_labels_L};i+=1))
do num=`echo "${i}+1000" | bc`
printf "$num\n" >> ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_number_table_${annotation_file}L
printf "$i\n" >> ${output_dir}/temp_${first}_${last}_${rand_id}/${annotation_file}_number_tableL
done
for ((i=1;i<=${number_labels_R};i+=1))
do num=`echo "${i}+2000" | bc`
printf "$num\n" >> ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_number_table_${annotation_file}R
printf "$i\n" >> ${output_dir}/temp_${first}_${last}_${rand_id}/${annotation_file}_number_tableR
done
# Create ctabs with actual regions
paste ${output_dir}/temp_${first}_${last}_${rand_id}/${annotation_file}_number_tableL ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_L3 > ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_L
paste ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_number_table_${annotation_file}L ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}L > ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_left_${annotation_file}
paste ${output_dir}/temp_${first}_${last}_${rand_id}/${annotation_file}_number_tableR ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_R3 > ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_R
paste ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_number_table_${annotation_file}R ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}R > ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_right_${annotation_file}
cat ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_left_${annotation_file} ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_right_${annotation_file} > ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_${annotation_file}.txt
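# LUT_${annotation_file}.txt now maps voxel values to region labels, one line per region,
# e.g. (illustrative): 1001 lh.<region>.label and 2001 rh.<region>.label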
echo "Done past line 175"
# Take labels from fsaverage to subject space
cd $SUBJECTS_DIR
for subject in `cat ${subject_list}`
do printf "\n >>>> PREPROCESSING ${subject} <<<< \n"
echo $(date) > ${output_dir}/temp_${first}_${last}_${rand_id}/start_date
echo " >>>> START TIME: `cat ${output_dir}/temp_${first}_${last}_${rand_id}/start_date` <<<<"
#mkdir -p ${output_dir}/${subject}
#mkdir -p ${output_dir}/${subject}/label
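# copy the LUT to the output folder, dropping the hippocampal placeholder regions (_H_ROI)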
sed '/_H_ROI/d' ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_${annotation_file}.txt > ${output_dir}/LUT_${annotation_file}.txt
if [[ -e $SUBJECTS_DIR/${subject}/label/lh.${subject}_${annotation_file}.annot ]] && [[ -e $SUBJECTS_DIR/${subject}/label/rh.${subject}_${annotation_file}.annot ]]
then
echo ">>>> Annotation files lh.${subject}_${annotation_file}.annot and rh.${subject}_${annotation_file}.annot already exist in ${subject}/label. Won't perform transformations"
else
rm -f $SUBJECTS_DIR/${subject}/label2annot_${annotation_file}?h.log
rm -f $SUBJECTS_DIR/${subject}/log_label2label
for label in `cat ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}R`
do echo "transforming ${label}"
mri_label2label --srcsubject fsaverage --srclabel ${output_dir}/label_new/${label} --trgsubject ${subject} --trglabel ${output_dir}/label_new/${label}.label --regmethod surface --hemi rh >> ${output_dir}/log_label2label
done
for label in `cat ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}L`
do echo "transforming ${label}"
mri_label2label --srcsubject fsaverage --srclabel ${output_dir}/label_new/${label} --trgsubject ${subject} --trglabel ${output_dir}/label_new/${label}.label --regmethod surface --hemi lh >> ${output_dir}/log_label2label
done
# Convert labels to annot (in subject space)
rm -f ${output_dir}/temp_${first}_${last}_${rand_id}/temp_cat_${annotation_file}_R
rm -f ${output_dir}/temp_${first}_${last}_${rand_id}/temp_cat_${annotation_file}_L
for labelsR in `cat ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}R`
do printf " --l ${output_dir}/label_new/${labelsR}" >> ${output_dir}/temp_${first}_${last}_${rand_id}/temp_cat_${annotation_file}_R
done
for labelsL in `cat ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}L`
do if [ -e ${output_dir}/label_new/${labelsL} ]
then printf " --l ${output_dir}/label_new/${labelsL}" >> ${output_dir}/temp_${first}_${last}_${rand_id}/temp_cat_${annotation_file}_L
fi
done
mris_label2annot --s ${subject} --h rh `cat ${output_dir}/temp_${first}_${last}_${rand_id}/temp_cat_${annotation_file}_R` --a ${subject}_${annotation_file} --ctab ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_R >> ${output_dir}/label2annot_${annotation_file}rh.log
mris_label2annot --s ${subject} --h lh `cat ${output_dir}/temp_${first}_${last}_${rand_id}/temp_cat_${annotation_file}_L` --a ${subject}_${annotation_file} --ctab ${output_dir}/temp_${first}_${last}_${rand_id}/colortab_${annotation_file}_L >> ${output_dir}/label2annot_${annotation_file}lh.log
fi
echo "Done mris_label2annot, past line 221"
# Convert annot to volume
rm -f ${output_dir}/log_aparc2aseg
mri_aparc2aseg --s ${subject} --o ${output_dir}/temp_${first}_${last}_${rand_id}/${annotation_file}.nii.gz --annot ${subject}_${annotation_file} >> ${output_dir}/log_aparc2aseg
# Remove hippocampal 'residue' --> voxels assigned to hippocampus in the HCPMMP1.0 parcellation will be very few, corresponding to vertices around the actual structure. These will be given the same voxel values as the hippocampi (as defined by the FS automatic segmentation): 17 (L) and 53 (R)
### AEF, 22.11.2021: I think I can ignore this because I will delete the subcortical measures anyway
l_hipp_index=`grep 'L_H_ROI.label' ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_${annotation_file}.txt | cut -c-4`
r_hipp_index=`grep 'R_H_ROI.label' ${output_dir}/temp_${first}_${last}_${rand_id}/LUT_${annotation_file}.txt | cut -c-4`
fslmaths ${output_dir}/temp_${first}_${last}_${rand_id}/${annotation_file}.nii.gz -thr $l_hipp_index -uthr $l_hipp_index ${output_dir}/temp_${first}_${last}_${rand_id}/l_hipp_HCP
fslmaths ${output_dir}/temp_${first}_${last}_${rand_id}/l_hipp_HCP -bin -mul 17 ${output_dir}/temp_${first}_${last}_${rand_id}/l_hipp_FS
fslmaths ${output_dir}/temp_${first}_${last}_${rand_id}/${annotation_file}.nii.gz -thr $r_hipp_index -uthr $r_hipp_index -add ${output_dir}/temp_${first}_${last}_${rand_id}/l_hipp_HCP ${output_dir}/temp_${first}_${last}_${rand_id}/l_r_hipp_HCP
fslmaths ${output_dir}/temp_${first}_${last}_${rand_id}/${annotation_file}.nii.gz -thr $r_hipp_index -uthr $r_hipp_index -bin -mul 53 -add ${output_dir}/temp_${first}_${last}_${rand_id}/l_hipp_FS ${output_dir}/temp_${first}_${last}_${rand_id}/l_r_hipp_FS
fslmaths ${output_dir}/temp_${first}_${last}_${rand_id}/${annotation_file}.nii.gz -sub ${output_dir}/temp_${first}_${last}_${rand_id}/l_r_hipp_HCP -add ${output_dir}/temp_${first}_${last}_${rand_id}/l_r_hipp_FS ${output_dir}/${annotation_file}.nii.gz
# Create individual mask files
if [[ ${create_individual_masks} == "YES" ]]
then
printf ">> Creating individual region masks for subject ${subject}\n"
mkdir -p ${output_dir}/masks
for ((i=1;i<=${number_labels_L};i+=1))
do num=`echo "${i}+1000" | bc`
if [[ $num != $l_hipp_index ]]
then
temp_region=`sed -n "$i,$i p" ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}L1`
echo "Mask left hemisphere: $i ${temp_region}"
fslmaths ${output_dir}/${annotation_file}.nii.gz -thr ${num} -uthr ${num} -bin ${output_dir}/masks/${temp_region}
else
echo ">> Skipping left hippocampus"
fi
done
for ((i=1;i<=${number_labels_R};i+=1))
do num=`echo "${i}+2000" | bc`
if [[ $num != $r_hipp_index ]]
then
temp_region=`sed -n "$i,$i p" ${output_dir}/temp_${first}_${last}_${rand_id}/list_labels_${annotation_file}R1`
echo "Mask right hemisphere: $i ${temp_region}"
fslmaths ${output_dir}/${annotation_file}.nii.gz -thr ${num} -uthr ${num} -bin ${output_dir}/masks/${temp_region}
else
echo ">> Skipping right hippocampus"
fi
done
fi
# Create individual subcortical masks
if [[ ${create_aseg_files} == "YES" ]]
then
mkdir -p ${output_dir}/aseg_masks
printf ">> Creating subcortical aseg masks for subject ${subject}\n"
if [[ -e ${output_dir}/aseg_masks/list_aseg ]]; then rm ${output_dir}/aseg_masks/list_aseg; fi
for side in Left Right
do printf "$side-Thalamus-Proper\n$side-Caudate\n$side-Pallidum\n$side-Hippocampus\n$side-Amygdala\n$side-Accumbens-area\n" >> ${output_dir}/aseg_masks/list_aseg
done
for rois in `cat ${output_dir}/aseg_masks/list_aseg`
do roi_index=`grep "${rois} " FreeSurferColorLUT.txt | cut -c-2` # the trailing space after ${rois} is intentional (it avoids partial matches)
fslmaths ${output_dir}/${annotation_file}.nii.gz -thr ${roi_index} -uthr ${roi_index} -bin ${output_dir}/aseg_masks/${rois}
done
fi
# Get anatomical stats table
if [[ $get_anatomical_stats == "YES" ]]
then
echo "Starting anatomical stats"
mkdir -p ${output_dir}/tables
mris_anatomical_stats -a $SUBJECTS_DIR/${subject}/label/lh.${subject}_${annotation_file}.annot -b ${subject} lh > ${output_dir}/temp_${first}_${last}_${rand_id}/lh.${annotation_file}.${subject}.txt
sed '/_H_ROI/d; /???/d' ${output_dir}/temp_${first}_${last}_${rand_id}/lh.${annotation_file}.${subject}.txt > ${output_dir}/tables/lh.${annotation_file}.${subject}.txt
mris_anatomical_stats -a $SUBJECTS_DIR/${subject}/label/rh.${subject}_${annotation_file}.annot -b ${subject} rh > ${output_dir}/temp_${first}_${last}_${rand_id}/rh.${annotation_file}.${subject}.txt
sed '/_H_ROI/d; /???/d' ${output_dir}/temp_${first}_${last}_${rand_id}/rh.${annotation_file}.${subject}.txt > ${output_dir}/tables/rh.${annotation_file}.${subject}.txt
# Get tables with numerical values only
grep -n 'structure' ${output_dir}/tables/lh.${annotation_file}.${subject}.txt > ${output_dir}/temp_${first}_${last}_${rand_id}/temp_line_structure_name
grep -Eo '[0-9]{1,4}' ${output_dir}/temp_${first}_${last}_${rand_id}/temp_line_structure_name > ${output_dir}/temp_${first}_${last}_${rand_id}/temp_line_structure_name2
line_structure_name=`cat ${output_dir}/temp_${first}_${last}_${rand_id}/temp_line_structure_name2`
post_end_line=`echo "1+${line_structure_name}" | bc`
sed "1,${post_end_line}d" ${output_dir}/tables/rh.${annotation_file}.${subject}.txt > ${output_dir}/tables/table_rh_values_${annotation_file}.${subject}
sed "1,${post_end_line}d" ${output_dir}/tables/lh.${annotation_file}.${subject}.txt > ${output_dir}/tables/table_lh_values_${annotation_file}.${subject}
sed -i -r 's/\S+//10' ${output_dir}/tables/table_lh_values_${annotation_file}.${subject}
sed -i -r 's/\S+//10' ${output_dir}/tables/table_rh_values_${annotation_file}.${subject}
# Get variable names
grep -n 'number of vertices' ${output_dir}/tables/lh.${annotation_file}.${subject}.txt > ${output_dir}/temp_${first}_${last}_${rand_id}/temp_number_vert
grep -Eo '[0-9]{1,4}' ${output_dir}/temp_${first}_${last}_${rand_id}/temp_number_vert > ${output_dir}/temp_${first}_${last}_${rand_id}/temp_number_vert2
line_number_vert=`cat ${output_dir}/temp_${first}_${last}_${rand_id}/temp_number_vert2`
pre_number_vert=`echo "${line_number_vert}-1" | bc`
number_lines=`wc -l < ${output_dir}/tables/rh.${annotation_file}.${subject}.txt`
sed "1,${pre_number_vert}d;${post_end_line},${number_lines}d" ${output_dir}/tables/rh.${annotation_file}.${subject}.txt > ${output_dir}/temp_${first}_${last}_${rand_id}/rh_mri_anatomical_stats_variables.txt
cut -c5- ${output_dir}/temp_${first}_${last}_${rand_id}/rh_mri_anatomical_stats_variables.txt > ${output_dir}/temp_${first}_${last}_${rand_id}/rh_mri_anatomical_stats_variables2.txt
sed -i 's/ /_/g' ${output_dir}/temp_${first}_${last}_${rand_id}/rh_mri_anatomical_stats_variables2.txt
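# the awk one-liner below transposes the column of cleaned variable names into a single header row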
awk '{ for (f = 1; f <= NF; f++) a[NR, f] = $f } NF > nf { nf = NF } END { for (f = 1; f <= nf; f++) for (r = 1; r <= NR; r++) printf a[r, f] (r==NR ? RS : FS) }' ${output_dir}/temp_${first}_${last}_${rand_id}/rh_mri_anatomical_stats_variables2.txt > ${output_dir}/tables/rh_mri_anatomical_stats_variables
sed "1,${pre_number_vert}d;${post_end_line},${number_lines}d" ${output_dir}/tables/lh.${annotation_file}.${subject}.txt > ${output_dir}/temp_${first}_${last}_${rand_id}/lh_mri_anatomical_stats_variables.txt
cut -c5- ${output_dir}/temp_${first}_${last}_${rand_id}/lh_mri_anatomical_stats_variables.txt > ${output_dir}/temp_${first}_${last}_${rand_id}/lh_mri_anatomical_stats_variables2.txt
sed -i 's/ /_/g' ${output_dir}/temp_${first}_${last}_${rand_id}/lh_mri_anatomical_stats_variables2.txt
awk '{ for (f = 1; f <= NF; f++) a[NR, f] = $f } NF > nf { nf = NF } END { for (f = 1; f <= nf; f++) for (r = 1; r <= NR; r++) printf a[r, f] (r==NR ? RS : FS) }' ${output_dir}/temp_${first}_${last}_${rand_id}/lh_mri_anatomical_stats_variables2.txt > ${output_dir}/tables/lh_mri_anatomical_stats_variables
fi
if [[ ${colorlut_miss} == "YES" ]]; then printf "\n >>>> ERROR: FreeSurferColorLUT.txt file not found. Individual subcortical masks NOT created\n"; fi
printf "\n >>>> ${subject} STARTED AT `cat ${output_dir}/temp_${first}_${last}_${rand_id}/start_date`, ENDED AT: $(date)\n\n"
echo "Done mris_anatomical_stats"
done
rm -r ${output_dir}/temp_${first}_${last}_${rand_id}
#rm ${subject_list}
echo "Done whole script"
The Barcelona pipeline as displayed here is mostly identical to the one provided by the CJNeuroLab. I only changed a few lines, as it did not instantly work on my cluster as downloaded. Make sure all the directories and links are correct and set up on your cluster.
I ran into an issue specific to the Julich-Brain atlas (version 2.9): there are inconsistencies in the way that regions are labelled. Briefly, create_subj_volume_parcellation.sh creates label files with spaces (e.g., /label/lh.AreaFG3 (FusG).label), which is how the labels are stored in the Julich-Brain annotation files. However, later in the pipeline, the labels are expected to be printed without spaces (i.e., /label/lh.AreaFG3(FusG).label). Therefore, I added an if statement to the Barcelona pipeline to get rid of the spaces.
# line 128
if [ $annotation_file = "JulichBrain" ]
then
cd ${output_dir}/label_new
for f in *\ *; do mv "$f" "${f// /}"; done
fi
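As a quick illustration of what the rename does, here is a hypothetical demo in a scratch directory (the file name mirrors the Julich label style):
# demo of the space-stripping rename on a dummy label file
mkdir -p /tmp/space_fix_demo && cd /tmp/space_fix_demo
touch 'lh.AreaFG3 (FusG).label'
for f in *\ *; do mv "$f" "${f// /}"; done
ls # -> lh.AreaFG3(FusG).label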
However, there is one region in the JulichBrain atlas that cannot be rescued with this fix: Area TPJ (STG/SMG). Linux and FreeSurfer cannot deal with a slash in a file name, which is why this region is entirely missing from the final mapping of the atlas onto the individuals' brains. I contacted the developers of the annotation files to ask whether they knew of a tool that would allow me to edit annotation files and output them in their binary format, or whether they could provide a version of the annotation file without the slash in the label (something like Area TPJ (STG_SMG)). They agreed it was not best practice to name their regions with slashes, but also said there was nothing they could do for me. I decided to proceed by excluding this one region. As I am assessing atlas performances as they are implemented in FS, I think this remains a genuine representation of atlas performance as provided by the developers.
With the JulichBrain processing so far, we end up with 137 regions in the right hemisphere and 136 in the left hemisphere after deleting the unlabelled regions. This is one ROI fewer than pre-registered.
As indicated above, I test the Barcelona pipeline by processing the Desikan-Killiany atlas and comparing the output with the Desikan-Killiany anatomical stats as they are provided by UKB. To get a sense of the accuracy, I processed all subjects with the lh.aparc.annot files (renamed to lh.test.aparc.annot so that the stats files don’t overwrite each other).
Refer to this site for a comparison between the original UKB DK estimates, and estimates we get from processing data with .annot files. We used the original DK estimates in any subsequent analysis.
Some individuals’ Schaefer outputs have glitches. I realised that if a participant’s “Background+FreeSurfer_Defined_Medial_Wall” region exceeds a certain size, the output does not contain a space between the label (i.e., Background+FreeSurfer_Defined_Medial_Wall) and the first value. Consequently, my formatting scripts cannot read the Schaefer atlas files into R, because the number of column names and actual columns is inconsistent. I added the following piece of code, which checks whether that has happened and fixes it.
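For illustration (numbers hypothetical), a glitched line in a ?h.schaefer500-yeo17.stats table looks like this:
Background+FreeSurfer_Defined_Medial_Wall12034  7721  ...
Here the leading digits of NumVert are fused onto the label. After the sed fix below, the line reads:
Background+FreeSurfer_Defined_Medial_Wall 12034  7721  ...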
###################################################################################################
## LAZY FIX: Schaefer atlas produces some weird formatting in some participants
## as a result of this, there is the wrong number of columns in the stats files, and Rscript fails
## the code below specifically targets this issue in the Schaefer atlas
## (issue = missing space between "Background+FreeSurfer_Defined_Medial_Wall" label and NumVert)
## I only test for a missing space between the label and a value starting with 1 or 2 because it's easier
## Most participants don't have enough NumVert in this area for it to be an issue; only some do
## It's unlikely that a participant would have double the NumVert of those who currently have the largest NumVert
## (fewer vertices wouldn't create this problem)
###################################################################################################
for hemisphere in lh rh
do
if grep -q "Background+FreeSurfer_Defined_Medial_Wall1" ${wd}/${subjID}/stats/${hemisphere}.schaefer500-yeo17.stats
then
echo "Schaefer atlas had this weird error that rarely occurs (${hemisphere})"
sed -i -e 's/Background.FreeSurfer_Defined_Medial_Wall/Background.FreeSurfer_Defined_Medial_Wall /g' ${wd}/${subjID}/stats/${hemisphere}.schaefer500-yeo17.stats
else
echo "Schaefer atlas behaved in ${hemisphere} (1 of 2)"
fi
if grep -q "Background+FreeSurfer_Defined_Medial_Wall2" ${wd}/${subjID}/stats/${hemisphere}.schaefer500-yeo17.stats
then
echo "Schaefer atlas had this weird error that rarely occurs (${hemisphere})"
sed -i -e 's/Background.FreeSurfer_Defined_Medial_Wall/Background.FreeSurfer_Defined_Medial_Wall /g' ${wd}/${subjID}/stats/${hemisphere}.schaefer500-yeo17.stats
else
echo "Schaefer atlas behaved in ${hemisphere} (2 of 2)"
fi
done
########################################################################################################################
##### extract vertex-wise tables
##### convert .mgh format into .asc
##### and extract vertices
########################################################################################################################
for hemisphere in lh rh
do
for meas in area thickness volume
do
echo ${hemisphere}.${meas}
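# note: mris_convert -c writes a five-column ASCII table (vertex index, x, y, z, value),
# so $1 below is the vertex index and $5 the measure of interest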
${fsdir}/bin/mris_convert -c ${wd}/${subjID}/surf/${hemisphere}.${meas}.fwhm0.fsaverage.mgh ${wd}/fsaverage/surf/${hemisphere}.orig ${wd}/${subjID}/tmp/${hemisphere}.${meas}.fwhm0.fsaverage.asc
touch ${wd}/${subjID}/tmp/${hemisphere}.${meas}.fwhm0.UKB.txt
awk '{print $1}' ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".fwhm0.fsaverage.asc >> ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".fwhm0.UKB.txt
awk '{print $5}' ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".fwhm0.fsaverage.asc >> ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".temp.lta
paste ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".fwhm0.UKB.txt ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".temp.lta > ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".temp2.lta
cp ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".temp2.lta ${od}/indiv_vertex_wise_output/"${subjID}"."${hemisphere}"."${meas}".fwhm0.UKB.txt
done
done
The vertex coordinates should be the same across all participants, because they are mapped into fsaverage standard space. As a sanity check, I compared the vertex coordinates in my subjects with the ones used by Baptiste, and they are identical.
awk '{print $1 " " $2 " " $3 " " $4}' lh.area.fwhm0.fsaverage.asc > lh.fsaverage_vertex_coordinates
mv lh.fsaverage_vertex_coordinates /mnt/lustre/datasets/ukbiobank/ukb40933/imaging/output/lh.fsaverage_vertex_coordinates
# 163842 vertices
awk '{print $1 " " $2 " " $3 " " $4}' rh.area.fwhm0.fsaverage.asc > rh.fsaverage_vertex_coordinates
mv rh.fsaverage_vertex_coordinates /mnt/lustre/datasets/ukbiobank/ukb40933/imaging/output/rh.fsaverage_vertex_coordinates
# 163842 vertices
In addition to the individual bits of the script explained above, the final script also includes some sanity checks (e.g., whether the download was successful), and it keeps a record of all successful and failed processes. This is because, with 40,000 participants, it would be too much to double-check every participant manually.
#!/bin/bash -l
#SBATCH --job-name=FS_processing
#SBATCH --output=~/datasets/ukbiobank/ukb40933/imaging/output/bash_output/FS_processing_%A_%a.out
#SBATCH -D ~/datasets/ukbiobank/ukb40933/imaging/FS_processing
#SBATCH --time=7-00:00
#SBATCH --mem-per-cpu=2G
#SBATCH --array=1-400000%300
start_date=$(date)
start=$(date +%s)
### setup dependencies
module load utilities/use.dev
export FREESURFER_HOME=/users/anna/freesurfer-6.0.0
source $FREESURFER_HOME/SetUpFreeSurfer.sh
FSLDIR=/opt/apps/apps/fsl/6.0.5
. ${FSLDIR}/etc/fslconf/fsl.sh
PATH=${FSLDIR}/bin:${PATH}
export FSLDIR PATH
# save directories (use $HOME rather than "~": a tilde inside quotes is not expanded)
download_dir="$HOME/datasets/ukbiobank/ukb40933/imaging/download/"
fsdir="$HOME/freesurfer-6.0.0" # path to your FreeSurfer binary files
wd="$HOME/datasets/ukbiobank/ukb40933/imaging/FS_processing" # Path to your freesurfer folders
atlasFolder="$HOME/datasets/ukbiobank/ukb40933/imaging/atlas_data" # Path to your atlas data downloaded above
od="$HOME/datasets/ukbiobank/ukb40933/imaging/output" # Output directory
export SUBJECTS_DIR=${wd}
# determine subjList from which we will take subjNUM and subjID
cd ${download_dir}
subjList=${download_dir}ukb46867_altered1.bulk
### ${download_dir}only_IDs.bulk must exist and be named appropriately
### will change with every re-submission
subjNUM=$(awk "FNR==$SLURM_ARRAY_TASK_ID" ${download_dir}resubmitIDs_10122021.bulk)
#### The following part of the script will run through each subjNUM and process every participant individually; using arrays it will iterate through all subjects saved in $subjList
echo $subjNUM
echo "output in ${od}/bash_output/FS_processing_"$SLURM_ARRAY_TASK_ID".out"
############################################
### download files
############################################
## grep line in file corresponding to subjNUM
echo `grep "$subjNUM" $subjList` > ${download_dir}temp_downloadfile${subjNUM}
cd ${download_dir}
/scratch/datasets/ukbiobank/bin/ukbfetch -btemp_downloadfile${subjNUM} -ak40933r46867.key -osuccessfetches.out -m1
# unzip file
unzip ${download_dir}${subjNUM}_20263_2_0.zip -d ${download_dir}sub_${subjNUM}
# remove downloaded directory
rm ${download_dir}${subjNUM}_20263_2_0.zip
# move sub_ to another folder
mv ${download_dir}sub_${subjNUM}/FreeSurfer ${wd}/sub_${subjNUM}
# remove temp_downloadfile and left-over directory
rm ${download_dir}temp_downloadfile${subjNUM}
rmdir ${download_dir}"sub_"${subjNUM}
subjID="sub_"${subjNUM}
echo "Done downloading" $subjID
############################################
### Check if folder has been downloaded
### And if it's approx the right size
############################################
# save subjectName in error if directory doesn't exist and exit
### touch ${od}/error/download_failed
### touch ${od}/error/download_success
# extract size; du also prints the directory name, so keep the first 5 characters (in case the directory is a lot bigger) and then extract numbers only. In the test example this is 300M
SIZE=$(du -hs ${wd}/${subjID} | sed -e 's/^\(.\{5\}\).*/\1/' | sed 's/[^0-9]*//g')
echo "Downloaded folder size is $SIZE"
# -z checks whether the string is empty (the du line above will run but can produce an empty SIZE object)
if [ ! -d "${wd}/sub_${subjNUM}" ] || [ -z "$SIZE" ] || [ "$SIZE" -lt 200 ]
then
echo "directory $subjID missing or too small, this array will now exit"
echo $subjID >> ${od}/error/download_failed
exit 1
else
echo "directory $subjID exists and is larger than 200MB"
echo $subjID >> ${od}/error/download_success
fi
############################################
### Process with qcache
############################################
cd ${wd}
# 1) run recon-all pipeline for extra processing AND to obtain vertex-wise files
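# note: -qcache resamples each vertex-wise measure onto fsaverage (and smooths it at several
# kernel widths); the unsmoothed surf/?h.<meas>.fwhm0.fsaverage.mgh maps are extracted further down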
${fsdir}/bin/recon-all -sd ${wd} -s ${subjID} -qcache
echo "Done qcache"
echo $subjID
####################################################################################
### Process mris_ca_label script to obtain atlases from gcs files
### illustration of what it does here: https://surfer.nmr.mgh.harvard.edu/fswiki/SurfaceLabelAtlas
####################################################################################
for hemisphere in lh rh
do
for atlasName in schaefer500-yeo17 hcp-mmp-b gordon333dil
do
echo $hemisphere
echo $atlasName
# 2) create annot file for each subject
${fsdir}/bin/mris_ca_label -sdir ${wd} ${subjID} $hemisphere ${wd}/${subjID}/surf/$hemisphere.sphere.reg ${atlasFolder}/$atlasName/$hemisphere.${atlasName}_6p0.gcs ${wd}/${subjID}/${hemisphere}.${atlasName}.${subjID}.annot
# 3) create stat file
${fsdir}/bin/mris_anatomical_stats -a ${wd}/${subjID}/${hemisphere}.${atlasName}.${subjID}.annot -f ${wd}/${subjID}/stats/$hemisphere.$atlasName.stats -b ${subjID} $hemisphere
done
done
echo "Done schaefer500-yeo17 hcp-mmp-b gordon333dil"
###################################################################################################
## LAZY FIX: Schaefer atlas produces some weird formatting in some participants
## as a result of this, there is the wrong number of columns in the stats files, and Rscript fails
## the code below specifically targets this issue in the Schaefer atlas
## (issue = missing space between "Background+FreeSurfer_Defined_Medial_Wall" label and NumVert)
## I only test for a missing space between the label and a value starting with 1 or 2 because it's easier
## Most participants don't have enough NumVert in this area for it to be an issue; only some do
## It's unlikely that a participant would have double the NumVert of those who currently have the largest NumVert
## (fewer vertices wouldn't create this problem)
###################################################################################################
for hemisphere in lh rh
do
if grep -q "Background+FreeSurfer_Defined_Medial_Wall1" ${wd}/${subjID}/stats/${hemisphere}.schaefer500-yeo17.stats
then
echo "Schaefer atlas had this weird error that rarely occurs (${hemisphere})"
sed -i -e 's/Background.FreeSurfer_Defined_Medial_Wall/Background.FreeSurfer_Defined_Medial_Wall /g' ${wd}/${subjID}/stats/${hemisphere}.schaefer500-yeo17.stats
else
echo "Schaefer atlas behaved in ${hemisphere} (1 of 2)"
fi
if grep -q "Background+FreeSurfer_Defined_Medial_Wall2" ${wd}/${subjID}/stats/${hemisphere}.schaefer500-yeo17.stats
then
echo "Schaefer atlas had this weird error that rarely occurs (${hemisphere})"
sed -i -e 's/Background.FreeSurfer_Defined_Medial_Wall/Background.FreeSurfer_Defined_Medial_Wall /g' ${wd}/${subjID}/stats/${hemisphere}.schaefer500-yeo17.stats
else
echo "Schaefer atlas behaved in ${hemisphere} (2 of 2)"
fi
done
#######################################################################################################################
### Process FS stats to get tables in OSCA format for atlases already provided by UKB or processed with mris_ca_label
#######################################################################################################################
module load apps/R/3.6.0
Rscript ~/datasets/ukbiobank/ukb40933/imaging/scripts/save_FS_stats_in_OSCA_format.R ${wd} ${subjID} ${od}/indiv_output_OSCA
echo "Saved UKB atlases in OSCA format for "$subjID
####################################################################################################
### Run Barcelona pipeline for Yeo and Julich atlases
### https://cjneurolab.org/2016/11/22/hcp-mmp1-0-volumetric-nifti-masks-in-native-structural-space/
####################################################################################################
# create file containing subjectname as input to pipeline
cd ${wd}
echo $subjID > ${wd}/target_${subjID}.txt
# start loop to process Julich and Yeo
for atlasName in JulichBrain Yeo2011_17Networks test.aparc
do
echo ${atlasName}
bash create_subj_volume_parcellation.sh -L ${wd}/target_${subjID}.txt -f 1 -l 1 -a ${atlasName} -d ${wd}/${subjID} -s NO -m NO -t YES
# move into stats folder and rename
mv ${wd}/${subjID}/tables/lh.${atlasName}.${subjID}.txt ${wd}/${subjID}/stats/lh.${atlasName}.stats
mv ${wd}/${subjID}/tables/rh.${atlasName}.${subjID}.txt ${wd}/${subjID}/stats/rh.${atlasName}.stats
done
# remove subject_list created above
rm ${wd}/target_${subjID}.txt
echo "Done Julich-Brain & Yeo" $subjID
#######################################################################################
### Process FS stats to get tables in OSCA format for Yeo and Julich atlases
#######################################################################################
module load apps/R/3.6.0
Rscript ~/datasets/ukbiobank/ukb40933/imaging/scripts/save_FS_stats_in_OSCA_format_Yeo_Julich.R ${wd} ${subjID} ${od}/indiv_output_OSCA
echo "Saved Julich-Brain & Yeo UKB atlases in OSCA format for "$subjID
############################################################
### Check if expected output files exist and are not empty
############################################################
# touch ${od}/error/results_present
# touch ${od}/error/results_missing
for atlasID in Yeo JulichBrain Gordon Schaefer Glasser DK Des
do
for measurement in ThickAvg SurfArea GrayVol
do
if [ -f ${od}/indiv_output_OSCA/${subjID}_${atlasID}_${measurement} ] && [ -s ${od}/indiv_output_OSCA/${subjID}_${atlasID}_${measurement} ]
then
echo "${subjID}_${atlasID}_${measurement} exists and is not zero, AWESOME"
echo ${subjID} ${atlasID} ${measurement} >> ${od}/error/results_present
else
echo "${subjID}_${atlasID}_${measurement} either doesn't exist or is zero - OH DEAR"
echo ${subjID} ${atlasID} ${measurement} >> ${od}/error/results_missing
fi
done
done
########################################################################################################################
##### extract vertex-wise tables
##### convert .mgh format into .asc
##### and extract vertices
########################################################################################################################
for hemisphere in lh rh
do
for meas in area thickness volume
do
echo ${hemisphere}.${meas}
${fsdir}/bin/mris_convert -c ${wd}/${subjID}/surf/${hemisphere}.${meas}.fwhm0.fsaverage.mgh ${wd}/fsaverage/surf/${hemisphere}.orig ${wd}/${subjID}/tmp/${hemisphere}.${meas}.fwhm0.fsaverage.asc
touch ${wd}/${subjID}/tmp/${hemisphere}.${meas}.fwhm0.UKB.txt
awk '{print $1}' ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".fwhm0.fsaverage.asc >> ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".fwhm0.UKB.txt
awk '{print $5}' ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".fwhm0.fsaverage.asc >> ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".temp.lta
paste ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".fwhm0.UKB.txt ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".temp.lta > ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".temp2.lta
cp ${wd}/${subjID}/tmp/"${hemisphere}"."${meas}".temp2.lta ${od}/indiv_vertex_wise_output/"${subjID}"."${hemisphere}"."${meas}".fwhm0.UKB.txt
done
done
################################################################################################################
## Check if all vertex-wise tables are there
###############################################################################################################
for meas in area thickness volume
do
for hemisphere in lh rh
do
if [ -f ${od}/indiv_vertex_wise_output/${subjID}.${hemisphere}.${meas}.fwhm0.UKB.txt ] && [ -s ${od}/indiv_vertex_wise_output/${subjID}.${hemisphere}.${meas}.fwhm0.UKB.txt ]
then
echo "${subjID}.${hemisphere}.${meas}.fwhm0.UKB.txt exists and is not zero, AWESOME"
echo ${subjID} vertices ${hemisphere}.${meas} >> ${od}/error/results_present
else
echo "${subjID}.${hemisphere}.${meas}.fwhm0.UKB.txt either doesn't exist or is zero - OH DEAR"
echo ${subjID} vertices ${hemisphere}.${meas} >> ${od}/error/results_missing
fi
done
done
# remove FS directory
rm -rf ${wd}/sub_${subjNUM}
###############################################################################################################
### Record how long it took
###############################################################################################################
end=$(date +%s)
end_date=$(date)
seconds=$(($end-$start))
echo "Elapsed Time: " `expr $seconds % 60` " minutes"
echo $subjID "successfully started" $start_date "and finished" $end_date
# get info on computational requirements
/usr/bin/sacct
/usr/bin/sacct --format="CPUTime,MaxRSS"
######### The script above downloads, processes and saves the data we need for the morphometricity project.
######### It was submitted with: sbatch -p brc,shared ~/datasets/ukbiobank/ukb40933/imaging/scripts/Serious_pipeline09122021.sh
For some participants the pipeline stopped running prematurely. Most of these cases were caught by the pipeline itself and logged in ${od}/error/download_failed. To double-check that we have captured all failed runs, the following script reads the output files written by the pipeline, flags the ones that are suspiciously small, and identifies which participants have not been fully processed. We then write a table with the missing IDs, which we can use as input to the pipeline to re-run these participants.
### here we identify participants whose processing stopped prematurely
setwd("/mnt/lustre/datasets/ukbiobank/ukb40933/imaging/output/bash_output")
bash_output <- list.files(pattern="FS_")
# only keep output files that are suspiciously small (incomplete runs)
bash_output <- bash_output[which(file.size(bash_output) < 700000)]
print(length(bash_output))
# extract the subject ID that follows "sub_" (approach adapted from
# https://statisticsglobe.com/extract-substring-before-or-after-pattern-in-r)
missing_IDs <- NULL
for(i in bash_output){
lines <- readLines(i, n=30)
# identify line that contains "sub_"
keep <- grep("sub_", lines)[1]
# keep the line that contains subj ID
lines <- lines[keep]
# remove the string before "sub_"
subjID <- gsub(".*sub_","",lines)
#print(lines)
subjID <- regmatches(subjID,regexpr("[0-9]+",subjID))
# save subjID in an object
missing_IDs <- append(missing_IDs, subjID)
}
# only keep the ones that are unique
missing_IDs <- as.data.frame(unique(missing_IDs))
print(nrow(missing_IDs))
write.table(missing_IDs, "~/ukbiobank/ukb40933/imaging/output/error/processing_suddenly_stopped_26122021",quote=F, col.names=F, row.names=F, sep = "\t")
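To re-submit these participants, the missing-ID table can be fed back to the pipeline. Below is a minimal sketch of one way to do this from R; it assumes that Serious_pipeline09122021.sh accepts a single subject ID as its first argument, which may differ from the actual submission set-up.
# hypothetical re-submission loop (assumes the pipeline takes one subject ID as its first argument)
missing <- read.table("~/ukbiobank/ukb40933/imaging/output/error/processing_suddenly_stopped_26122021", header = FALSE)
for(id in missing$V1){
system(paste("sbatch -p brc,shared",
"~/datasets/ukbiobank/ukb40933/imaging/scripts/Serious_pipeline09122021.sh", id))
}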
This pipeline creates individual-level output data, which we now need to merge. We merge the atlas-wise and vertex-wise data separately because they have different computational requirements, and for the vertex-wise files we additionally check that they contain the correct number of vertices.
###### merge all individual-level atlases
setwd("~/morphometricity/data/atlases")
library(data.table)
atlases <- c("DK","Des","Glasser","Gordon","Schaefer","JulichBrain","Yeo","test.aparc")
mod <- c("GrayVol","SurfArea","ThickAvg")
for(i in mod){
for(j in atlases){
print(paste0("Starting ",j,"_",i))
#make new object to store data
output<-NULL
# list files of same modality
#target <- list.files(pattern=paste0(j,"_",i))
target <- append(list.files(pattern=paste0(j,"_",i,"_all")),list.files(pattern=paste0("remains_",j,"_",i)))
# test that there are 4
#if(length(target) != 4){print(paste0("There are not enough files for ", j,"_",i,"; only ", length(target))); next}
for(k in target){
# read in one file
newfile <- fread(k, header=T)
print(paste0("File ",k," has ", nrow(newfile)," participants included and has ", (ncol(newfile)-1), " ROIs"))
# merge with output file
output <- rbind(output, newfile)
}
# drop duplicated participants; logical indexing is used because
# output[-which(duplicated(...)),] would empty the table when there are no duplicates
output <- output[!duplicated(output$IID),]
# how many participants are included?
numsubj <- (nrow(output)-1)
fwrite(output, file=paste0(j,"_",i,"_all_",numsubj), na=NA, quote=FALSE, col.names=T, row.names=F, sep = "\t")
print(paste0("There are ",sum(duplicated(output$IID))," duplicated IIDs."))
}
}
print("Done all atlases")
##### merge all big vertex files
setwd("~/morphometricity/data/vertices")
library(data.table)
# save all mods in vector
mods <- c("lh.volume","rh.volume","lh.area","rh.area","lh.thickness","rh.thickness")
##### loop through all mods
for(i in 1:length(mods)){
output<-NULL
# list files that hold mods
files <- list.files(pattern = mods[i])
# test if that indexes 4 files
if(length(files) != 4){print(paste0("There are ", length(files), " files in ", mods[i], "; there should be 4")); next}
## loop through files, merge and delete
for(j in files){
# read in file
newfile <- fread(j, header=T)
print(paste0("File ",j," has ", ncol(newfile)," participants included"))
# test that all files have correct number of vertices
if(nrow(newfile)!= 163842){print(paste0("Wrong number of included vertices: ",nrow(newfile), " instead of 163842 (fsaverage)")); next}
# merge with output file
output <- cbind(output, newfile)
# clean up
rm("newfile")
}
# delete duplicated participants
#which(duplicated(names(output)))
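# optional addition: drop duplicated participant columns if any slipped in, e.g.
# output <- output[, !duplicated(names(output)), with = FALSE]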
# how many participants are included?
numsubj <- (ncol(output)-1)
# save
fwrite(output, file=paste0(mods[i],"_all_vertices_",numsubj), na=NA, quote=FALSE, col.names=T, row.names=F, sep = "\t")
# clean up
rm("output")
Sys.sleep(5)
}
We now have large files that contain all individual-level data in one place, so we can delete the redundant individual-level files to save storage.
####### delete files in indiv_output once their contents are included in the combined files written by combine_atlases.R
####### Strategy:
### 1. Read in output files as ref
### 2. Navigate to folder with indiv_output
### 3. If subjID is contained in ref, delete the file with file.remove()
### 4. Check that the file was really deleted with file.exists()
### 5. loop this script through each atlas & modality
### 6. submit straight after the combine_atlases script to clean up (and it will only clean if the combine script successfully ran)
combined_dir="~/ukbiobank/ukb40933/imaging/output/atlases_formatted_OSCA"
delete_dir="/mnt/lustre/datasets/ukbiobank/ukb40933/imaging/output/indiv_output_OSCA"
library(data.table)
# list atlases and modality
atlases <- c("DK","Des","Glasser","Gordon","Schaefer","JulichBrain","Yeo","test.aparc")
mod <- c("ThickAvg","SurfArea","GrayVol")
for(i in atlases){
for(j in mod){
# select file of interest (combined output file)
file_name <- paste0(combined_dir,"/",i,"_",j)
# read in file of interest
file <- fread(as.character(file_name), header=T)
# save subjIDs in var name
IID <- file$IID
rm("file")
#iterate through each subjID
for(k in IID){
# piece together the name of the file to delete, based on what exists in the ref file
to_delete <- paste0("sub_",k, "_",i,"_",j)
# check if this file exists, if TRUE, delete it
if(file.exists(paste0(delete_dir,"/",to_delete))){file.remove(paste0(delete_dir,"/",to_delete))}
}
}
print(paste0("Done ",i))
}
print("Cleaned up all atlases indiv output")
By Anna Elisabeth Fürtjes
anna.furtjes@kcl.ac.uk