# Analysis of memory task performance and brain microstructure and their predictors in the Gut-Brain Study

This GitLab repository documents all data manipulation and analysis steps for the evaluation of memory performance and its predictors in the Gut-Brain Study.
* created by thieleking@cbs.mpg.de on 2021-10-19
* preregistration submitted on 2022-01-25 here: https://osf.io/2z4dn
* analysis with R version 4.1.1

### Packages included in the analysis 

* stringr (v. 1.4.1)
* tibble (v. 3.1.8)
* ggplot2 (v. 3.3.6)
* MetBrewer (v. 0.2.0)
* magrittr (v. 2.0.3)
* vioplot (v. 0.3.7)
* brms (v. 2.16.3)
* loo (v. 2.4.1)
* dplyr (v. 1.0.7)
* rstatix (v. 0.7.0)
* ggpubr (v. 0.4.0)
* corrplot (v. 0.92)
* ggExtra (v. 0.10.0)
* sjPlot (v. 2.8.11)
* mice (v. 3.14.0)
### Analysis steps

##### Data preparation

1. import memory task logfiles and code single-image performance as "hit", "correct rejection", "false alarm", or "miss"

2. import hunger ratings from logfiles for each session (4 ratings = pre- and post-wanting task + pre- and post-memory task) and calculate:
    - overall hunger (mean of 4 hunger ratings)
    - mean hunger rating during wanting task
    - mean hunger rating during memory task

3. import wanting ratings for single images from wanting task (memory encoding) and from post-memory wanting task (rating of new and similar images after memory task)

4. add image categories to single images: food (F) and art/non-food (NF)

5. calculate d' (as defined by Macmillan and Kaplan, 1985) and LDI (adapted from Stark and colleagues; Leal et al., 2014; Granger et al., 2021)
    
    *Table 1: Stimulus response matrix from signal detection theory*
    |               | Response “old” | Response “new”    |
    |---------------|:--------------:|:-----------------:|
    | Targets       |  Hit           | Miss              |
    | Lures & Novels|  False Alarm   | Correct Rejection |

    [1] d’ = z(hit rate) – z(false alarm rate) = z(p(“old” | target)) – z(p(“old” | lure/novel))

    [2] LDI = z(rate of correct rejection of lures) – z(miss rate) = z(p(“new” | lure)) – z(p(“new” | target))

6. correct d’ and LDI for the possibility that some subjects performed perfectly on the recognition task, i.e. hit/correct rejection rate = 1 and false alarm/miss rate = 0. In these cases, the z-values and hence d’ or LDI cannot be calculated. As suggested by Upton (1979) and Hautus (1995), we applied the log-linear rule, which makes these extreme rates impossible (see the first code sketch after this list).

    *Table 2: Stimulus response matrix corrected with log-linear rule*
    |                       | Response “old”      | Response “new”          | Sum     |
    |-----------------------|:-------------------:|:-----------------------:|--------:|
    | Targets (30)          |  Hit + 0.5          | Miss  + 0.5             | 30+1    |
    | Lures & Novels (30+20)|  False Alarm + 0.5  | Correct Rejection + 0.5 | 30+20+1 |

    [1*] d’ = z( (sum(hits) + 0.5) / (sum(targets) + 1) ) – z( (sum(false alarms) + 0.5) / (sum(lures + novels) + 1) )

    [2*] LDI = z( (sum(correct rejections of lures) + 0.5 × 2/3) / (sum(lures) + 1 × 2/3) ) – z( (sum(misses) + 0.5) / (sum(targets) + 1) )

7. add picture set (A, B, C, D) and scanning session (1 to 4) to single images

8. calculate wanting categories "unwanted", "neutral", and "wanted" by splitting wanting ratings into tertiles based on the rating range each participant used in each session (see the second code sketch after this list)

9. remove excluded data sets as specified in the preregistration 

10. add tract statistics of uncinate fasciculus and its sub-bundle per participant per session

11. import well-being ratings (Visual Analogue Scale) on nausea, anxiety, exhaustion by task, and perceived task difficulty per participant per session

12. import eating behaviour traits, socio-economic status, and personality traits per participant

13. calculate response accuracy per participant per image

    [3] response_accuracy = 1 if hit or correct rejection

    [4] response_accuracy = 0 if false alarm or miss

14. import liking ratings per participant per image

15. import body composition measures and fasted ghrelin serum levels per participant per session
16. import image characteristics (arousal, familiarity, recognizability, and valence) per image
17. import attention network performance per participant per session
18. add white matter statistics of whole brain per participant per session
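
For concreteness, here is a minimal R sketch of the log-linear-corrected d’ and LDI from steps 5 and 6 (formulas [1*] and [2*]). The function and argument names are illustrative, not taken from the repository's code; z() corresponds to qnorm(), the standard normal quantile function.

```r
# Sketch of formulas [1*] and [2*]; all names are illustrative.
loglinear_dprime <- function(n_hits, n_targets, n_false_alarms, n_lures_novels) {
  hit_rate <- (n_hits + 0.5) / (n_targets + 1)
  fa_rate  <- (n_false_alarms + 0.5) / (n_lures_novels + 1)
  qnorm(hit_rate) - qnorm(fa_rate)  # z() = qnorm(), standard normal quantile
}

loglinear_ldi <- function(n_cr_lures, n_lures, n_misses, n_targets) {
  cr_rate   <- (n_cr_lures + 0.5 * 2/3) / (n_lures + 1 * 2/3)
  miss_rate <- (n_misses + 0.5) / (n_targets + 1)
  qnorm(cr_rate) - qnorm(miss_rate)
}

# Example with the set sizes from Table 2 (30 targets, 30 lures + 20 novels):
loglinear_dprime(n_hits = 25, n_targets = 30, n_false_alarms = 10, n_lures_novels = 50)
loglinear_ldi(n_cr_lures = 20, n_lures = 30, n_misses = 5, n_targets = 30)
```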
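
The range-based tertile split of step 8 could look as follows, assuming equal-width bins over the rating range a participant actually used in a session; wanting_category and ratings are placeholder names, not the repository's code.

```r
# Sketch of step 8: split one participant's wanting ratings from one session
# into "unwanted"/"neutral"/"wanted" based on the range they actually used.
wanting_category <- function(ratings) {
  breaks <- seq(min(ratings), max(ratings), length.out = 4)  # 3 equal-width bins
  cut(ratings, breaks = breaks, include.lowest = TRUE,
      labels = c("unwanted", "neutral", "wanted"))
}

wanting_category(c(10, 25, 40, 55, 70, 85, 100))
```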

### Statistical analysis

We applied **Bayesian multilevel regression modeling** with the brm() function from the brms package (version 2.16.3) (R Core Team, 2021) and the following options (see the sketch after this list):

* family = gaussian for d’/LDI analyses and family = bernoulli(link = "logit") for response accuracy analyses
* more iterations than preregistered (n = 4000 instead of n = 3000, incl. 1000 warm-up iterations) as the Bulk Effective Sample Size (ESS) was too low, indicating that posterior means and medians may have been unreliable
* chains = 2, cores = 2
* prior = set_prior("normal(0,10)", class = "b"), a generic, weakly informative prior, for gaussian model fitting
* sample_prior = TRUE to be able to plot prior and posterior distributions
* save_pars = save_pars(all = TRUE) to use the option moment_match = TRUE for leave-one-out cross-validation
* control = list(adapt_delta = 0.995) (instead of the preregistered 0.99); smaller steps were necessary to avoid divergent transitions after warmup; we accepted up to 10 divergent transitions as we had good R-hat and ESS values, so the inferences can be assumed to be reliable (Stan Development Team, n.d.)
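
A minimal sketch of a brm() call with these options; the formula, the data frame dat, and the variable names are placeholders, not the preregistered model:

```r
library(brms)

# Placeholder formula and data; for the response accuracy analyses swap in
# family = bernoulli(link = "logit").
fit <- brm(
  dprime ~ hunger + wanting + (1 + wanting | subject),
  data         = dat,
  family       = gaussian(),
  prior        = set_prior("normal(0, 10)", class = "b"),
  iter         = 4000,
  warmup       = 1000,
  chains       = 2,
  cores        = 2,
  sample_prior = TRUE,
  save_pars    = save_pars(all = TRUE),
  control      = list(adapt_delta = 0.995)
)
```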

To assess the **predictive accuracy of a Bayesian regression model** (BRM), Vehtari, Gelman, and Gabry proposed the expected log pointwise predictive density (elpd) (Vehtari et al., 2017). The elpd of each model can be estimated by leave-one-out (loo) cross-validation: the higher the predictive accuracy of the model, the higher its elpd. To calculate elpds, we applied the loo_compare() function from the loo package (version 2.4.1) (Vehtari et al., 2017) with the options:

* moment_match = TRUE → to apply an implicit adaptive importance sampling method (Paananen et al., 2021)
* reloo = TRUE → to refit the model in case some Pareto k diagnostic values were too high
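
In brms, moment_match and reloo are passed to loo(), whose results are then compared with loo_compare(); a sketch with placeholder model names:

```r
library(brms)

# fit_full and fit_null are placeholders for fitted brmsfit objects
loo_full <- loo(fit_full, moment_match = TRUE, reloo = TRUE)
loo_null <- loo(fit_null, moment_match = TRUE, reloo = TRUE)

# The model with the highest elpd becomes the reference (elpd_diff = 0);
# all other models are reported relative to it.
loo_compare(loo_full, loo_null)
```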

We first tested for **random effects** present in the data and then fitted full and null models to detect probable **fixed effects**. To test which random effects are present in the data, we compared the **full model** (van Doorn et al., 2021) (fixed effects, random intercepts, and random slopes) to "less-random-effects" models, i.e. the full model with random slopes dropped step by step while fixed effects and random intercepts were kept constant. Then, we defined **null models** (van Doorn et al., 2021) by dropping the fixed effects of interest step by step while keeping the previously determined random intercepts and random slopes (see the sketch below).
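
To illustrate the model hierarchy, a sketch with placeholder variable names (not the preregistered formulas):

```r
library(brms)

# Full model: fixed effects, random intercept, and random slope
f_full <- bf(dprime ~ wanting + hunger + (1 + wanting | subject))

# "Less-random-effects" model: a random slope is dropped while fixed
# effects and the random intercept are kept constant
f_less <- bf(dprime ~ wanting + hunger + (1 | subject))

# Null model: a fixed effect of interest is dropped while the previously
# determined random-effects structure is kept
f_null <- bf(dprime ~ hunger + (1 + wanting | subject))
```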

To find out **which random and fixed effects are most probably present in the data**, the BRMs were compared with regard to their predictive accuracy (elpd). The difference in elpd (elpd_diff) and its standard deviation (elpd_sd) served as "quality of fit" measures. The model with the highest predictive accuracy becomes the reference model so that, based on elpd_diff and elpd_sd, we can make inferences about which random and fixed effects are most and least probably present in the data.

Due to the high computational requirements of Bayesian multilevel regression model fitting, we decided to test for random effects only in the models of the main hypotheses. In the exploratory analyses, we instead included all random effects of the variables of interest. Thereby, we lowered the CO2 emissions caused by the long run-time and high computational requirements of our analysis (Li et al., 2015) while still accounting for possible random effects. In addition, we learned from the main analysis that most random effects we tested for were present in the data, which supported our approach to the random effects in the exploratory analyses.

### Code Review
The code review was conducted by Tilman Stephani and finished on 9 September 2022. His comments were implemented and replied to in the provided txt file. All issues were resolved except for the imputation of missing values, which will be addressed as soon as possible.