Commit 3488048

prepare for artifact evaluation, add raw data, analysis scripts

1 parent: a5e711e

150 files changed: 2432 additions, 107 deletions

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+# General
+.idea
+/venv
+__pycache__
+
+# Project-specific
+/analysis/output

Readme.md

Lines changed: 31 additions & 9 deletions
@@ -1,29 +1,47 @@
 # Program Comprehension and Code Complexity Metrics: An fMRI Study
 
-This repository contains the replication package and additional information on our paper at ICSE2021.
+This repository contains the replication package, analysis scripts, and additional information on our paper accepted at ICSE 2021.
 
-## Replication Package
+Publication: Norman Peitek, Sven Apel, Chris Parnin, André Brechmann, and Janet Siegmund. *Program Comprehension and Code Complexity Metrics: An fMRI Study*. In Proceedings of the International Conference on Software Engineering (ICSE), 2021.
+
+# Replication Package
 
 In `/replication`, we provide:
 
-- all used stimuli for the comprehension, control condition, and distractor tasks (`/tasks-*`)
+- all used stimuli for the comprehension, control condition, and distractor tasks (`/tasks-*`), as image and/or text files. We used our [CodeImageGenerator](https://github.com/peitek/CodeImageGenerator) to create the image files for our Java code snippets.
 - experiment protocol for the fMRI session
-- analysis protocol
 - the meta protocol for the post-session interview
 
-## Experiment Data: Pilot Study
+# Data
+
+In `/data`, we share all raw and preprocessed data that we can. Due to our local privacy law, we cannot publicly provide the fMRI data at this time. Please contact us for an individual solution.
+
+# Analysis
+
+In `/analysis`, we share our analysis scripts, which process the input data, compute the results, create the plots, and run the statistics.
+
+To run the scripts yourself, you will need:
 
-We share our insights from the pilot studies in `/pilot-data`.
+- Python 3.x
+- numpy
+- pandas
+- scipy
+- seaborn
+- statsmodels
 
-Due to our local privacy law, we cannot publicly provide the fMRI data at this time. You can contact us to individually share experiment data (once double-blind is lifted).
+Once your system is set up, run `/analysis/main.py`.
+
+# Results
+
+For convenience, we provide all output that the analysis script yields for our data in `/output`.
 
 ## Experiment Data: fMRI Correlations
 
-In addition to the selected plots presented in the paper, we added all correlation plots in `/plots`.
+In addition to the selected plots presented in the paper, we provide all correlation plots in `/output/plots`.
 
 ## Experiment Data: Additional Metrics
 
-We explored overall 41 additional metrics. In `/Metrics.md`, we provide an description of each metric.
+We explored 41 additional metrics overall. In `/data/metrics/Metrics.md`, we provide a description of each metric.
 
 In the paper, we provide a shortened overview. Here is the full correlation table:
 
@@ -65,3 +83,7 @@ In the paper, we provide a shortened overview. Here is the full correlation table:
 | Halstead | 0.485714 | 0.542857 | 0.428571 | 0.600000 | 0.393218 | 0.209351 | 0.213938 | 0.288564 | 0.600000 |
 | Sonar_Cog | 0.239046 | 0.139443 | 0.039841 | 0.298807 | 0.237528 | 0.057054 | 0.016282 | 0.131811 | 0.298807 |
 | Sonar_Cyclo | 0.102869 | 0.123443 | 0.000000 | 0.205738 | 0.051420 | 0.022982 | 0.000800 | 0.046978 | 0.205738 |
+
+# Contact
+
+If you have questions, please contact me directly: `norman.peitek@lin-magdeburg.de`. Thank you!

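The Analysis section added to the README above lists the required Python packages and says to run `/analysis/main.py`. As a minimal sketch of that setup (the wrapper script below and the pip-based install hint are illustrative additions, not part of the repository), one could check the listed dependencies before launching the entry point:

```python
# run_analysis.py -- illustrative sketch; assumes it is placed in /analysis next to
# main.py and that the packages listed in the README are available via pip.
import importlib.util
import runpy
import sys

REQUIRED = ["numpy", "pandas", "scipy", "seaborn", "statsmodels"]


def check_dependencies():
    """Exit with a hint if any package listed in the README is missing."""
    missing = [name for name in REQUIRED if importlib.util.find_spec(name) is None]
    if missing:
        sys.exit("Missing packages (install them, e.g., with pip): " + ", ".join(missing))


if __name__ == "__main__":
    check_dependencies()
    # Run the analysis entry point as if it had been started directly.
    runpy.run_path("main.py", run_name="__main__")
```
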
analysis/BehavioralBrain.py

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+import numpy
+import pandas as pd
+import matplotlib.pyplot as plt
+
+import BrainActivationAnalysis
+from analysis import BehavioralSubjective
+
+plt.rcParams.update({'font.size': 26})
+import seaborn as sns
+import scipy.stats
+
+# TODO this file is supposed to analyze complexity metrics versus behavioral data
+graph_label = dict(color='#202020', alpha=0.9)
+
+def plot_correlation(df, computeResponseTime, ba, activation=True):
+    if computeResponseTime:
+        variable = 'ResponseTime'
+    else:
+        variable = 'Correct'
+
+    ax1 = df.plot(kind='scatter', x=variable, y=ba, s=50, figsize=(7, 4))
+    z = numpy.polyfit(df[variable], df[ba], 1)
+    p = numpy.poly1d(z)
+
+    plt.plot(df[variable], p(df[variable]), linewidth=1)
+
+    if computeResponseTime:
+        plt.xlabel("Response Time (in sec.)")
+    else:
+        plt.xlabel("Correctness (in %)")
+
+    if activation:
+        plt.ylabel("Activation in %\n" + ba)
+    else:
+        plt.ylabel("Deactivation in %\n" + ba)
+
+    corr = df[ba].corr(df[variable], method='kendall')
+    print('Kendall corr:', corr)
+
+    slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(df[ba], df[variable])
+    print('r squared:', r_value**2)
+
+    left, right = plt.xlim()
+    bottom, top = plt.ylim()
+    ax1.text(left + ((right - left) / 40), bottom + ((top - bottom) / 7), 'Kendall τ: ' + format(corr, '.2f'), fontdict=graph_label)
+    ax1.text(left + ((right - left) / 40), bottom + ((top - bottom) / 40), 'r squared: ' + format(r_value**2, '.2f'), fontdict=graph_label)
+
+    sns.despine()
+    plt.tight_layout()
+
+    if activation:
+        pre = 'Activation_'
+    else:
+        pre = 'Deactivation_'
+
+    plt.savefig('output/' + pre + ba + '_' + variable + '.pdf', dpi=300, bbox_inches='tight', pad_inches=0)
+
+
+def compute_behavioral_brain(df_ba_cond, behavioral_data, activation):
+
+    behavioral_ba = pd.merge(df_ba_cond, behavioral_data, how='left', left_on=['participant', 'Snippet'], right_on=['Participant', 'Snippet'])
+
+    # treat missing response times as 60,000 ms, then exclude response times below 5 s as accidental clicks
+    behavioral_ba["ResponseTime"].fillna(60000, inplace=True)
+    behavioral_ba.loc[behavioral_ba['ResponseTime'] < 5000, 'ResponseTime'] = numpy.nan
+    behavioral_ba = behavioral_ba.dropna(subset=['ResponseTime'])
+    behavioral_ba["ResponseTime"] = behavioral_ba['ResponseTime'].apply(BehavioralSubjective.convert_to_second)
+
+    # average per snippet and convert correctness to percent
+    behavioral_ba = behavioral_ba.groupby('Snippet').mean()
+    behavioral_ba["Correct"] = behavioral_ba['Correct'].apply(BehavioralSubjective.convert_to_percent)
+
+    for ba in BrainActivationAnalysis.get_bas(activation):
+        plot_correlation(behavioral_ba, True, ba, activation)
+        plot_correlation(behavioral_ba, False, ba, activation)
+
+
+def main():
+    df_ba_part_cond_act = pd.read_csv('../data/fMRI/fMRI_Analyzed_BA_Snippet_Participant_Activation.csv')
+    df_ba_part_cond_deact = pd.read_csv('../data/fMRI/fMRI_Analyzed_BA_Snippet_Participant_Deactivation.csv')
+
+    behavioral_data = BehavioralSubjective.load_behavioral_data()
+
+    compute_behavioral_brain(df_ba_part_cond_act, behavioral_data, True)
+    compute_behavioral_brain(df_ba_part_cond_deact, behavioral_data, False)
+
+    print('\n#####\n-> all done \\o/')
+
+
+if __name__ == "__main__":
+    main()

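For reference, `plot_correlation` in `BehavioralBrain.py` reports two statistics per brain area: Kendall's τ between the behavioral variable and the (de)activation strength, and the r² of a least-squares fit. The following self-contained sketch reproduces that computation on made-up numbers (the column name `BA21` and all values are purely illustrative and not taken from the study data):

```python
# Illustrative only: the two statistics printed by plot_correlation,
# computed on a tiny synthetic DataFrame instead of the fMRI CSV files.
import pandas as pd
import scipy.stats

# Hypothetical per-snippet averages: response time in seconds, activation strength in percent.
df = pd.DataFrame({
    "ResponseTime": [12.0, 18.5, 22.1, 30.4, 41.8, 47.2],
    "BA21": [0.21, 0.25, 0.24, 0.33, 0.35, 0.41],
})

# Kendall's tau, as in df[ba].corr(df[variable], method='kendall')
tau = df["BA21"].corr(df["ResponseTime"], method="kendall")

# r squared from a linear fit, as in scipy.stats.linregress
fit = scipy.stats.linregress(df["BA21"], df["ResponseTime"])

print(f"Kendall tau: {tau:.2f}, r squared: {fit.rvalue ** 2:.2f}")
```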