3. Storing data
Motivating scenario: You have collected data and want to take good care of it.
Learning goals: By the end of this subsection you should be able to:
- Safely store and back up data.
- Make folders to house all files for a project.
- Understand why and how to submit data for long-term storage.
Storing data
Collecting data is hard work, so we need to make sure our data are not lost or corrupted, and that we and other researchers can easily analyze them. This means planning for stable, convenient data storage over both the short and the long term.
Backing up data
Computers can die, cloud storage can fail, and external hard drives can be lost or overwritten. So I suggest keeping copies in all three places: on your computer, in cloud storage, and on an external hard drive. Update each storage location every time you add data to your spreadsheet; automatic syncing is even safer. Once you have entered data for the day, protect those files (for example, by making them read-only) so that they are never altered.
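One way to make copying and protecting routine is a small script. The sketch below is in Python and assumes your backup locations (an external drive, a cloud-synced folder) appear as ordinary folders on your computer; the function names `sha256`, `make_read_only`, and `back_up` are hypothetical. It copies a data file to each location, checks that each copy is byte-for-byte identical to the original using a SHA-256 checksum, and then removes write permission from the copy so it is not altered by accident.

```python
import hashlib
import os
import shutil
import stat
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file, used to confirm that copies match."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_read_only(path):
    """Remove all write permission from a file (read-only for owner, group, others)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def back_up(data_file, backup_dirs):
    """Copy data_file into each backup folder, verify each copy, and make it read-only.

    Returns the list of backup paths. Raises OSError if any copy does not match.
    """
    data_file = Path(data_file)
    original_hash = sha256(data_file)
    copies = []
    for backup_dir in backup_dirs:
        backup_dir = Path(backup_dir)
        backup_dir.mkdir(parents=True, exist_ok=True)
        target = backup_dir / data_file.name
        if target.exists():
            # An earlier backup was made read-only; allow the owner to replace it.
            os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
        shutil.copy2(data_file, target)  # copies contents plus timestamps
        if sha256(target) != original_hash:
            raise OSError(f"Backup {target} does not match {data_file}")
        make_read_only(target)
        copies.append(target)
    return copies
```

For example, `back_up("field_data.csv", ["/Volumes/backup_drive/project", Path.home() / "Dropbox" / "project"])`, where both folders are placeholders for your own locations. Read-only permissions guard against accidental edits, not against deliberate changes or hardware failure, which is why multiple copies still matter.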
Do not edit the raw data in your datasheet. Instead, process and filter the data in a script, so that you have a record of every change you made.
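As a minimal sketch of this idea in Python, the function below reads a raw datasheet, applies a couple of cleaning rules, and writes the result to a separate file, so the raw data are only ever read and the script itself documents every change. The column names (`site`, `count`), the cleaning rules, and the function name `clean_counts` are hypothetical stand-ins for your own data.

```python
import csv
from pathlib import Path


def clean_counts(raw_file, processed_file):
    """Read the raw datasheet, drop unusable rows, and write a separate cleaned file.

    The raw file is opened only for reading, so it stays exactly as recorded.
    Returns the number of rows kept.
    """
    kept = []
    with open(raw_file, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        for row in reader:
            # Hypothetical rule: skip rows where no count was recorded.
            if row["count"].strip() == "":
                continue
            # Hypothetical fix: standardize site labels typed inconsistently in the field.
            row["site"] = row["site"].strip().upper()
            kept.append(row)
    with open(processed_file, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)
```

If you later disagree with a cleaning decision, you change one line of the script and rerun it on the untouched raw file, rather than trying to undo edits made by hand.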
Structuring your folders
You are likely working on many things at once. To keep your work clean and reproducible, I suggest:
- Keeping all aspects of a project in a single folder.
- Guarding this folder against unrelated material.
Exactly how you structure a project folder is up to you and depends on the scope and scale of the project. For small projects (one script, one data file, etc.) like the ones in this course, I suggest one folder with a small handful of files (Figure 1). For larger projects with multiple scripts and datasets (e.g., an honors thesis or a scientific manuscript), it is sometimes cleaner to have a separate subfolder for each kind of file (e.g., all datasheets go in a data folder, all scripts go in a scripts folder, and so on).
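The folder setup can also be scripted so that every project starts the same way. This Python sketch uses hypothetical subfolder names (`data`, `scripts`, `figures`) and a hypothetical function name, `make_project`; pass no subfolders for a small project that lives in a single folder.

```python
from pathlib import Path

# Hypothetical subfolder names for a larger project; adjust them to suit your work.
LARGE_PROJECT_SUBFOLDERS = ("data", "scripts", "figures")


def make_project(root, subfolders=()):
    """Create a project folder, plus optional subfolders for each kind of file.

    With no subfolders (a small project), everything lives directly in `root`.
    Existing folders are left untouched, so running this twice is harmless.
    Returns the sorted names of the subfolders now in `root`.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    for name in subfolders:
        (root / name).mkdir(exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

For example, `make_project("honors_thesis", LARGE_PROJECT_SUBFOLDERS)` creates `honors_thesis/data`, `honors_thesis/figures`, and `honors_thesis/scripts`, while `make_project("course_project")` creates just the one folder.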
Long-term storage
By sharing our data, we make our science more transparent and our work more reproducible, and we allow others to investigate our data further (or combine them with data from other studies). As such, most fields now expect data to be made available after publication. Repositories like Dryad, figshare, or DRUM make this easy.
Perhaps the person who benefits most is you, the author of the project. You are the one most likely to want to revisit your previous code and data, and, as noted above, the long-term survival of data kept only in your hands cannot be guaranteed.