
Project Management

John M. Drake edited this page Sep 28, 2023 · 42 revisions

Page Editor: @allopole
To request edits to this page, open an issue and tag @allopole.


This page covers project structure, collaboration, and version control systems.

Come here when beginning a new project for steps to set up a folder directory structure, collaborate electronically, and set up version control, a system for managing projects over time.


Getting a computational project started with a Project Protocol

Once you have a computational project in mind, you will be asked to write a modeling project protocol. The goal of the protocol is to avoid "mission creep" and to guide practical decisions that have to be made late in the research process. The protocol includes a short background paragraph on the context of the study, a description of the method of data collection or study design, any external data sources, and a checklist of things that need to be done (and when). Once you and your co-authors agree on the protocol, it shouldn't be changed much. If you do change it, document why at the end of the protocol. As you fill out the protocol, take note of the methods, software, and data you will need. You will need to write some code yourself, but there are many resources online (GitHub, SourceForge, CRAN) that may be helpful. Also, don't forget to ask other members of the Lab or OSE community!

If you are looking for more data sources, check out this list that Drake Lab members have found useful in the past:

  • Project Tycho -- take note of how clean specific data are (in the past, we've noticed some oddities with cumulative/incidence data, missing data, wrong data)
  • CDC Influenza and COVID-19 Data
  • WHO

Setting up a neat project/folder directory

A good project layout will ultimately make your life easier:

  • Ensures the integrity of data
  • Makes it simpler to share your code with someone else (a lab-mate, collaborator, or supervisor)
  • Allows you to easily upload your code with your manuscript submission
  • Makes it easier to pick the project back up after a break

See here for tips on doing this in R with Rmd. Here are more specific tips for organizing files on your computer, including for starting project directories. If you'd like to clone a GitHub repo to get started, you can visit here or here.
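As a minimal sketch of such a layout, the shell commands below create a project skeleton in a temporary location. Every name here (`my-project`, `data/raw`, `data/processed`, `code`, `output/figures`, `docs`) is an illustrative assumption rather than a lab standard; adapt it to whatever layout the resources above recommend for your project.

```shell
# Hypothetical project skeleton; all directory names are assumptions.
project_dir="$(mktemp -d)/my-project"   # stand-in for your real project path

# Keep raw data apart from anything derived from it, so the originals
# are never overwritten by processing code.
mkdir -p "$project_dir/data/raw" \
         "$project_dir/data/processed" \
         "$project_dir/code" \
         "$project_dir/output/figures" \
         "$project_dir/docs"

# A top-level README tells a lab-mate (or future you) where things live.
cat > "$project_dir/README.md" <<'EOF'
my-project (placeholder name)

data/raw        original data, never edited by hand
data/processed  data produced by scripts in code/
code            analysis scripts
output/figures  generated figures
docs            protocol, notes, manuscript drafts
EOF

# List the resulting tree.
ls -R "$project_dir"
```

Separating `data/raw` from `data/processed` is one common way to protect data integrity: anything in `data/processed` can be deleted and regenerated from the raw files and the code.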

Get ready to learn a Version Control System, especially if the project requires collaboration

Start by understanding what version control can do: how it helps in your everyday work and how it makes collaborative work less painful. We recommend watching this introductory video from the Software Carpentry course: https://www.youtube.com/watch?v=gY2JwRfin1M

We typically use Git & GitHub in the Drake Lab. Good links to learn how to use Git are:

You can work with Git on GitHub.com, in RStudio, or on the command line.

Git Workflows for individual and collaborative projects

Once you are familiar with the basic functioning of Git and GitHub, it's time to think about how to organize your workflow. From a Git point of view, a workflow is a strategy for using branches. How you use branches depends on the nature of the project. Here are some suggestions for simple individual and collaborative workflows that work for the sciences:

Individual Project

The simplest project would use just one branch, called "master". Each substantive change you make should be "committed" (i.e. a "commit" is created). This allows you to rewind your project to a previous state if needed.
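On the command line, the single-branch workflow might look like the sketch below. It runs in a throwaway directory; the file name `analysis.R`, its contents, the commit messages, and the author identity are all placeholders chosen for illustration.

```shell
# Work in a throwaway directory so the sketch cannot touch a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
# Name the first branch "master" regardless of the local Git default.
git symbolic-ref HEAD refs/heads/master
# Placeholder identity, stored for this repository only.
git config user.name "Example Author"
git config user.email "author@example.org"

# First substantive change: create a file and commit it.
echo "model <- 1" > analysis.R
git add analysis.R
git commit -q -m "Add first version of the analysis"

# Second substantive change: edit the file and commit again.
echo "model <- 2" > analysis.R
git add analysis.R
git commit -q -m "Change the model"

# Show the history, newest commit first.
git log --oneline

# Rewind one file to its state in the previous commit,
# then record the restored version as a new commit.
git checkout HEAD~1 -- analysis.R
git commit -q -m "Restore the first version of the analysis"
cat analysis.R
```

Restoring the old file as a new commit, rather than erasing the later commit, is only one way to rewind; it keeps the full history, so the discarded change can still be recovered.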

A slightly more flexible workflow allows you to try new ideas in a side branch before "merging" those new ideas back into the master branch.

Alternatively, you can create a new, appropriately named branch for each new idea you want to try. If the idea doesn't work, you can simply stop working in that branch and keep working in the master branch. If the idea works, merge it back into the "master" branch, then continue working in the "master" branch.
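The idea-branch pattern can be sketched on the command line as follows. The branch names `try-seasonality` and `try-spline`, the file `model.txt`, and the author identity are hypothetical; one idea is merged and the other is simply left behind.

```shell
# Throwaway repository with one commit on "master".
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.org"
echo "baseline" > model.txt
git add model.txt
git commit -q -m "Add baseline model"

# Idea 1 works: develop it in its own branch, then merge it into master.
git checkout -q -b try-seasonality
echo "seasonal term" >> model.txt
git commit -q -a -m "Add a seasonal term"
git checkout -q master
git merge -q try-seasonality

# Idea 2 does not work: switch back to master and leave the branch unmerged.
git checkout -q -b try-spline
echo "spline term" >> model.txt
git commit -q -a -m "Try a spline term"
git checkout -q master

# master now holds the baseline plus the seasonal term only.
cat model.txt
git branch
```

The abandoned branch still exists after this, so the failed idea remains available for reference; deleting it is optional.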

Collaborative projects

For a collaborative project, we suggest that each person's individual contribution be done in a separate branch with their name on it. Treat your named branch like you would "master" in an individual project. Make idea branches off your named branch, and merge them back into your named branch if they are successful. When you reach endpoints, merge your named branch back into "master". You can then keep working in your named branch if you have more to do, merging into master periodically.
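A local sketch of the named-branch workflow is shown below. The contributor name `alice`, the idea branch `alice-new-prior`, and the file `notes.txt` are hypothetical, and sharing branches through GitHub (pushing and pulling) is left out because the sketch has no remote. The `--no-ff` option is a choice made here, not something this page prescribes: it records each endpoint as an explicit merge commit on master.

```shell
# Throwaway repository with a shared starting point on "master".
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.org"
echo "shared baseline" > notes.txt
git add notes.txt
git commit -q -m "Add shared baseline"

# Each contributor works in a branch carrying their name.
git checkout -q -b alice
echo "alice: data cleaning" >> notes.txt
git commit -q -a -m "Clean the data"

# Ideas branch off the named branch and merge back into it if they succeed.
git checkout -q -b alice-new-prior
echo "alice: new prior" >> notes.txt
git commit -q -a -m "Try a new prior"
git checkout -q alice
git merge -q alice-new-prior

# At an endpoint, merge the named branch into master.
git checkout -q master
git merge -q --no-ff -m "Merge alice's work at an endpoint" alice

# Keep working in the named branch afterwards.
git checkout -q alice
echo "alice: sensitivity analysis" >> notes.txt
git commit -q -a -m "Start the sensitivity analysis"

# master holds everything up to the endpoint; alice is one step ahead.
git show master:notes.txt
git log --oneline --graph --all
```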

Bon voyage!

Our inspiration is a Git workflow used in software development:

https://nvie.com/img/git-model@2x.png

Communication & task management

  • Email for anything you may want to be able to refer back to over a longer period of time
  • Slack for more immediate communication about projects that you are less likely to need to refer back to later. Mobile and desktop apps are available at https://slack.com
  • We have not adopted or imposed a task management system for the lab. We have considered Trello, an app that allows grouping and assigning tasks and can be integrated with Slack, but have not implemented it. Other task management tools we have used successfully, either individually or within small groups, are RTM (Remember the Milk), GitHub Issues, and GitHub Projects. GitHub Issues are always tied to a particular repository. GitHub Projects are fancy task lists accessible within a GitHub organization. RTM is useful for individual and small-group task management, but the full-featured version requires a paid subscription.

More resources

Osborne JM, Bernabeu MO, Bruna M, Calderhead B, Cooper J, Dalchau N, et al. (2014) "Ten Simple Rules for Effective Computational Research." PLoS Comput Biol 10(3): e1003506. https://doi.org/10.1371/journal.pcbi.1003506
