Commit 3488048

prepare for artifact evaluation, add raw data, analysis scripts

1 parent: a5e711e

150 files changed: 2432 additions, 107 deletions

.gitignore

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+# General
+.idea
+/venv
+__pycache__
+
+# Project-specific
+/analysis/output

Readme.md

Lines changed: 31 additions & 9 deletions
@@ -1,29 +1,47 @@
 # Program Comprehension and Code Complexity Metrics: An fMRI Study
 
-This repository contains the replication package and additional information on our paper at ICSE2021.
+This repository contains the replication package, analysis scripts, and additional information on our paper accepted at ICSE 2021.
 
-## Replication Package
+Publication: Norman Peitek, Sven Apel, Chris Parnin, André Brechmann, and Janet Siegmund. *Program Comprehension and Code Complexity Metrics: An fMRI Study*. In Proceedings of the International Conference on Software Engineering (ICSE), 2021.
+
+# Replication Package
 
 In `/replication`, we provide:
 
-- all used stimuli for the comprehension, control condition, and distractor tasks (`/tasks-*`)
+- all used stimuli for the comprehension, control condition, and distractor tasks (`/tasks-*`), as image and/or text files. We used our [CodeImageGenerator](https://github.com/peitek/CodeImageGenerator) to create the image files for our Java code snippets.
 - experiment protocol for the fMRI session
-- analysis protocol
 - the meta protocol for the post-session interview
 
-## Experiment Data: Pilot Study
+# Data
+
+In `/data`, we share all raw and preprocessed data that we can. Due to our local privacy law, we cannot publicly provide the fMRI data at this time. Please contact us for an individual solution.
+
+# Analysis
+
+In `/analysis`, we share our analysis scripts, which process the input data, compute the results, create the plots, and run the statistics.
+
+To run the scripts yourself, you will need:
 
-We share our insights from the pilot studies in `/pilot-data`.
+- Python 3.x
+- numpy
+- pandas
+- scipy
+- seaborn
+- statsmodels
 
-Due to our local privacy law, we cannot publicly provide the fMRI data at this time. You can contact us to individually share experiment data (once double-blind is lifted).
+Once your system is set up, run `/analysis/main.py`.
+
+# Results
+
+For convenience, we provide all output that the analysis script yields for our data in `/output`.
 
 ## Experiment Data: fMRI Correlations
 
-In addition to the selected plots presented in the paper, we added all correlation plots in `/plots`.
+In addition to the selected plots presented in the paper, we provide all correlation plots in `/output/plots`.
 
 ## Experiment Data: Additional Metrics
 
-We explored overall 41 additional metrics. In `/Metrics.md`, we provide an description of each metric.
+We explored 41 additional metrics overall. In `/data/metrics/Metrics.md`, we provide a description of each metric.
 
 In the paper, we provide a shortened overview. Here is the full correlation table:
 
@@ -65,3 +83,7 @@ In the paper, we provide a shortened overview. Here is the full correlation table:
 | Halstead | 0.485714 | 0.542857 | 0.428571 | 0.600000 | 0.393218 | 0.209351 | 0.213938 | 0.288564 | 0.600000 |
 | Sonar_Cog | 0.239046 | 0.139443 | 0.039841 | 0.298807 | 0.237528 | 0.057054 | 0.016282 | 0.131811 | 0.298807 |
 | Sonar_Cyclo | 0.102869 | 0.123443 | 0.000000 | 0.205738 | 0.051420 | 0.022982 | 0.000800 | 0.046978 | 0.205738 |
+
+# Contact
+
+If you have questions, please contact me directly: `norman.peitek@lin-magdeburg.de`. Thank you!

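The Analysis section added to the README above lists the required Python packages and says to run `/analysis/main.py`. As a minimal sketch of that setup (the wrapper script below and the pip-based install hint are illustrative additions, not part of the repository), one could check the listed dependencies before launching the entry point:

```python
# run_analysis.py -- illustrative sketch; assumes it is placed in /analysis next to
# main.py and that the packages listed in the README are available via pip.
import importlib.util
import runpy
import sys

REQUIRED = ["numpy", "pandas", "scipy", "seaborn", "statsmodels"]


def check_dependencies():
    """Exit with a hint if any package listed in the README is missing."""
    missing = [name for name in REQUIRED if importlib.util.find_spec(name) is None]
    if missing:
        sys.exit("Missing packages (install them, e.g., with pip): " + ", ".join(missing))


if __name__ == "__main__":
    check_dependencies()
    # Run the analysis entry point as if it had been started directly.
    runpy.run_path("main.py", run_name="__main__")
```
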
analysis/BehavioralBrain.py

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+import numpy
+import pandas as pd
+import matplotlib.pyplot as plt
+
+import BrainActivationAnalysis
+from analysis import BehavioralSubjective
+
+plt.rcParams.update({'font.size': 26})
+import seaborn as sns
+import scipy.stats
+
+# TODO this file is supposed to analyze complexity metrics versus behavioral data
+graph_label = dict(color='#202020', alpha=0.9)
+
+def plot_correlation(df, computeResponseTime, ba, activation=True):
+    if computeResponseTime:
+        variable = 'ResponseTime'
+    else:
+        variable = 'Correct'
+
+    ax1 = df.plot(kind='scatter', x=variable, y=ba, s=50, figsize=(7, 4))
+    z = numpy.polyfit(df[variable], df[ba], 1)
+    p = numpy.poly1d(z)
+
+    plt.plot(df[variable], p(df[variable]), linewidth=1)
+
+    if computeResponseTime:
+        plt.xlabel("Response Time (in sec.)")
+    else:
+        plt.xlabel("Correctness (in %)")
+
+    if activation:
+        plt.ylabel("Activation in %\n" + ba)
+    else:
+        plt.ylabel("Deactivation in %\n" + ba)
+
+    corr = df[ba].corr(df[variable], method='kendall')
+    print('Kendall corr:', corr)
+
+    slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(df[ba], df[variable])
+    print('r squared:', r_value**2)
+
+    left, right = plt.xlim()
+    bottom, top = plt.ylim()
+    ax1.text(left + ((right - left) / 40), bottom + ((top - bottom) / 7), 'Kendall τ: ' + format(corr, '.2f'), fontdict=graph_label)
+    ax1.text(left + ((right - left) / 40), bottom + ((top - bottom) / 40), 'r squared: ' + format(r_value**2, '.2f'), fontdict=graph_label)
+
+    sns.despine()
+    plt.tight_layout()
+
+    if activation:
+        pre = 'Activation_'
+    else:
+        pre = 'Deactivation_'
+
+    plt.savefig('output/' + pre + ba + '_' + variable + '.pdf', dpi=300, bbox_inches='tight', pad_inches=0)
+
+
+def compute_behavioral_brain(df_ba_cond, behavioral_data, activation):
+
+    behavioral_ba = pd.merge(df_ba_cond, behavioral_data, how='left', left_on=['participant', 'Snippet'], right_on=['Participant', 'Snippet'])
+
+    # treat missing response times as 60,000 ms, then exclude response times below 5 s as accidental clicks
+    behavioral_ba["ResponseTime"].fillna(60000, inplace=True)
+    behavioral_ba.loc[behavioral_ba['ResponseTime'] < 5000, 'ResponseTime'] = numpy.nan
+    behavioral_ba = behavioral_ba.dropna(subset=['ResponseTime'])
+    behavioral_ba["ResponseTime"] = behavioral_ba['ResponseTime'].apply(BehavioralSubjective.convert_to_second)
+
+    # average per snippet and convert correctness to percent
+    behavioral_ba = behavioral_ba.groupby('Snippet').mean()
+    behavioral_ba["Correct"] = behavioral_ba['Correct'].apply(BehavioralSubjective.convert_to_percent)
+
+    for ba in BrainActivationAnalysis.get_bas(activation):
+        plot_correlation(behavioral_ba, True, ba, activation)
+        plot_correlation(behavioral_ba, False, ba, activation)
+
+
+def main():
+    df_ba_part_cond_act = pd.read_csv('../data/fMRI/fMRI_Analyzed_BA_Snippet_Participant_Activation.csv')
+    df_ba_part_cond_deact = pd.read_csv('../data/fMRI/fMRI_Analyzed_BA_Snippet_Participant_Deactivation.csv')
+
+    behavioral_data = BehavioralSubjective.load_behavioral_data()
+
+    compute_behavioral_brain(df_ba_part_cond_act, behavioral_data, True)
+    compute_behavioral_brain(df_ba_part_cond_deact, behavioral_data, False)
+
+    print('\n#####\n-> all done \\o/')
+
+
+if __name__ == "__main__":
+    main()

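For reference, `plot_correlation` in `BehavioralBrain.py` reports two statistics per brain area: Kendall's τ between the behavioral variable and the (de)activation strength, and the r² of a least-squares fit. The following self-contained sketch reproduces that computation on made-up numbers (the column name `BA21` and all values are purely illustrative and not taken from the study data):

```python
# Illustrative only: the two statistics printed by plot_correlation,
# computed on a tiny synthetic DataFrame instead of the fMRI CSV files.
import pandas as pd
import scipy.stats

# Hypothetical per-snippet averages: response time in seconds, activation strength in percent.
df = pd.DataFrame({
    "ResponseTime": [12.0, 18.5, 22.1, 30.4, 41.8, 47.2],
    "BA21": [0.21, 0.25, 0.24, 0.33, 0.35, 0.41],
})

# Kendall's tau, as in df[ba].corr(df[variable], method='kendall')
tau = df["BA21"].corr(df["ResponseTime"], method="kendall")

# r squared from a linear fit, as in scipy.stats.linregress
fit = scipy.stats.linregress(df["BA21"], df["ResponseTime"])

print(f"Kendall tau: {tau:.2f}, r squared: {fit.rvalue ** 2:.2f}")
```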