Late in Study, Dataset-Sharing Orientation, Low Level of Data Sharing Resources

Data Packaging Timeline

What to do right away

Set yourself up for success

Review all study files/resources already produced by or for your study
- If you are late in your study, you have likely accumulated many study files already, so establishing file naming and organization conventions now and back-applying them may be burdensome and prone to error. We therefore generally recommend that you leave file names and organization as is. However, we do request that you consider the following exceptions:
  - Where practicable to implement (without duplicating original files), organize all study files/resources into a single study folder/directory (study folder/directory may of course have sub-directories; see here for guidance on and examples of recommended study folder/directory structure)
  - If you have sets of "like" files (e.g. a similarly formatted tabular data file or brain imaging file per study subject per study timepoint), it may be well worth establishing file naming and organization conventions now, based on HEAL recommendations for organizing and naming study files/resources, and back-applying them just for these file sets. Doing so may make it possible/easier to annotate each file set in one go instead of one file at a time, and so may substantially reduce the annotation/data-sharing burden for the study group
All study files/resources should be stored in a location where the person(s) who will be creating/contributing to your data package documentation can access them all at the same time (e.g. you can have files located on different network drives as long as all network drives can be mounted and accessed at the same time by the person documenting; you CANNOT have files located on two different local computer drives, even if the person documenting can access both computers separately)
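
One way to confirm that every study location is reachable at the same time is to check all of them from the machine the person documenting will use. The sketch below is only an illustration: the function name `find_inaccessible` and the example paths are hypothetical, not part of any HEAL tooling.

```python
from pathlib import Path
from typing import Iterable, List


def find_inaccessible(paths: Iterable[Path]) -> List[Path]:
    """Return the paths that cannot be reached from this machine right now.

    An empty result suggests that all listed files/folders are accessible
    at the same time, as the guidance above requires.
    """
    return [p for p in paths if not p.exists()]


# Hypothetical usage: list the folders (e.g. mounted network drives)
# that hold your study files, then check them in one pass.
# missing = find_inaccessible([Path("/mnt/lab-share/study-1"),
#                              Path("/mnt/imaging/study-1")])
# print(missing)
```

If the returned list is not empty, the listed locations need to be mounted (or the files moved) before documentation starts.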

A note on copying study files

Although these guidelines provide suggestions on some useful adjustments you can make before you start documenting that will make documentation easier (e.g., applying naming conventions and ensuring files are accessible to the person documenting), these guidelines do not mean that you should copy your study files into new or existing folders to group all "to share" files together.

All documentation should be completed based on original files. Creating copies of your files for documentation can introduce inconsistencies in your final package (e.g., if you edit the original file but not the copy, the file documented and likely shared will not be the most up-to-date version).

The only exception to this rule: After you have completely finished documenting your study files at their local (or network) paths, you will copy the files that you intend to share to finish preparing your data package for submission.

What to do when your dataset-of-interest is finalized

Initialize your Data Package

Create a "dsc-pkg" folder/directory that will hold all Standard Data Package Metadata Files for your data package
- If all study files/resources are organized into a single study folder/directory, create this folder/directory as a direct sub-directory of your study folder/directory, and name it "dsc-pkg"; consistency in naming and location of this folder/directory relative to your overall study folder/directory will make it easy to recognize as the folder that contains the Standard Data Package Metadata files for your study's data package
- If all study files/resources are NOT organized into a single study folder/directory, create this folder/directory in a disk location that makes sense for you; name it "dsc-pkg", optionally appending a suffix to the name that will make it easy to recognize as "belonging" to a specific study (e.g. "dsc-pkg-study-1" or "dsc-pkg-mindfulness-for-oud"); consistency in naming (i.e. including the "dsc-pkg" prefix) and appending a suffix to the name that is a human-recognizable identifier for the relevant study will make it easy to recognize as the folder that contains the Standard Data Package Metadata files for your study's data package
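
As a minimal sketch of the two cases above, the folder can be created with Python's standard library. The function name `init_dsc_pkg` and its parameters are hypothetical and used only for illustration; creating the folder by hand works just as well.

```python
from pathlib import Path
from typing import Optional


def init_dsc_pkg(parent_dir: Path, study_suffix: Optional[str] = None) -> Path:
    """Create the "dsc-pkg" folder inside parent_dir and return its path.

    study_suffix is optional and intended for the case where study files
    are NOT organized into a single study folder, e.g. "study-1" gives
    "dsc-pkg-study-1".
    """
    name = f"dsc-pkg-{study_suffix}" if study_suffix else "dsc-pkg"
    pkg_dir = parent_dir / name
    # exist_ok=True makes the call safe to repeat without raising an error
    pkg_dir.mkdir(parents=True, exist_ok=True)
    return pkg_dir


# Hypothetical usage:
# init_dsc_pkg(Path("my-study"))                   # my-study/dsc-pkg
# init_dsc_pkg(Path("data-packages"), "study-1")   # data-packages/dsc-pkg-study-1
```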

Make a list of contributing experiments/activities for the final dataset-of-interest

Make a list of the full set of study experiments/activities that produced supporting data or other support for the final dataset-of-interest

Start your Experiment Tracker

Start your Experiment Tracker by initializing an empty Experiment Tracker file based on the Experiment Tracker csv template
- Save your Experiment Tracker in your "dsc-pkg" folder as "heal-csv-experiment-tracker.csv"
- Each row in your Experiment Tracker will represent a study experiment/activity
- Use the Experiment Tracker schema to understand what each "question"/field in the Experiment Tracker means and how to "answer"/complete each "question"/field
Add all study experiments/activities that produced supporting data or other support for the final dataset-of-interest to your Experiment Tracker
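
If you have downloaded the csv template, initializing the tracker amounts to copying the template into your "dsc-pkg" folder under the expected file name. The sketch below assumes a locally saved template; the function name `start_tracker` and the template path in the usage example are hypothetical.

```python
import shutil
from pathlib import Path


def start_tracker(template_csv: Path, pkg_dir: Path, tracker_name: str) -> Path:
    """Copy a csv template into the dsc-pkg folder under the given name.

    Refuses to overwrite an existing tracker, so rows that have already
    been entered are not lost by accident.
    """
    target = pkg_dir / tracker_name
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting")
    shutil.copyfile(template_csv, target)
    return target


# Hypothetical usage (the template location is an assumption):
# start_tracker(Path("templates/experiment-tracker-template.csv"),
#               Path("my-study/dsc-pkg"),
#               "heal-csv-experiment-tracker.csv")
```

You would then open the new file and add one row per study experiment/activity, following the Experiment Tracker schema.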

Start your Data Dictionary(ies)

If the dataset-of-interest is a tabular data file, create a Data Dictionary for the dataset
- You'll add this Data Dictionary as an associated file/dependency of your dataset-of-interest when you list your dataset-of-interest in the Resource Tracker because it is necessary for interpretation and use of the dataset-of-interest
Start a Data Dictionary for a tabular data file by initializing an empty Data Dictionary file based on the Data Dictionary csv template
- Save your Data Dictionary in your "dsc-pkg" folder as "heal-csv-dd-my-datafile.csv" (i.e. the file name starts with the prefix "heal-csv-dd-", you append the name of the data file to which the Data Dictionary applies, and save as a csv file)
- Each row in your Data Dictionary will represent a variable that is collected/populated in your tabular data file
- Use the Data Dictionary schema to understand what each "question"/field in the Data Dictionary means and how to "answer"/complete each "question"/field
- Add all variables in the tabular data file to your Data Dictionary
Add a Data Dictionary for each tabular data file
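
The naming rule and the "one row per variable" rule above can be sketched as follows. Two assumptions are made here and are not stated in the guidance: the data file's extension is dropped when building the Data Dictionary name, and the tabular data file is a csv whose first row holds the variable names. Both function names are hypothetical.

```python
import csv
from pathlib import Path
from typing import List


def data_dictionary_name(data_file: Path) -> str:
    """Build the Data Dictionary file name for a tabular data file.

    Assumption: the data file's extension is dropped, so "my-datafile.csv"
    maps to "heal-csv-dd-my-datafile.csv".
    """
    return f"heal-csv-dd-{data_file.stem}.csv"


def list_variables(data_file: Path) -> List[str]:
    """Return the variable names of a csv data file.

    Assumption: the first row of the file is a header row. Each returned
    name would become one row in the Data Dictionary.
    """
    with data_file.open(newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])


# Hypothetical usage:
# data_dictionary_name(Path("my-datafile.csv"))
# list_variables(Path("my-study/data/my-datafile.csv"))
```

The list of variable names is only a starting point; the remaining fields for each variable still need to be completed according to the Data Dictionary schema.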

Start your Resource Tracker

Start your Resource Tracker by initializing an empty Resource Tracker file based on the Resource Tracker csv template
- Save your Resource Tracker in your "dsc-pkg" folder as "heal-csv-resource-tracker.csv"
- Each row in your Resource Tracker will represent a study file/resource that you have annotated. Study files include data and non-data supporting files, including HEAL-formatted data dictionaries you may have created
- Use the Resource Tracker schema to understand what each "question"/field in the Resource Tracker means and how to "answer"/complete each "question"/field
- The Resource Tracker will ask you to list associated files/dependencies for each study file/resource (i.e. files that are required to interpret, replicate, or use the study file/resource; for tabular data files, this will include a data dictionary for the file)

Add items to your Resource Tracker

First, add your finalized dataset-of-interest to your Resource Tracker.
- The Resource Tracker will ask you to list associated files/dependencies of the dataset-of-interest
- If the finalized dataset-of-interest is a tabular data file, include the Data Dictionary you created for it as one of the dependencies of this resource.
- List any other dependencies of the dataset-of-interest in its Resource Tracker entry (e.g. raw data pieces, code, analysis plan)
Next, add any items you have listed as associated files/dependencies of the finalized dataset-of-interest as resources to your Resource Tracker ONLY if the file will be shared in a public repository.
- Remember that whenever you add a tabular data file to your Resource Tracker, you should first create a Data Dictionary. Then, add the tabular file to the Resource Tracker and list the Data Dictionary you have created as a dependency.
Finally, any file that has been added as an associated file/dependency of a resource in the Resource Tracker should also be added as a resource ONLY if it will be shared in a public repository.
- Repeat this process with the dependencies of these files until you're listing files without any dependencies.
If there are any other study files that you will share in a public repository that have not already been added to the Resource Tracker, add them to the Resource Tracker now
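
The steps above describe a walk through the dependency graph of your dataset-of-interest. The sketch below illustrates that walk under stated assumptions: you have already written down each file's dependencies in a dictionary and know which files will be shared publicly. The function name `resources_to_track` and all file names in the usage example are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def resources_to_track(dataset: str,
                       dependencies: Dict[str, List[str]],
                       shared: Set[str]) -> List[str]:
    """Return the files to add as Resource Tracker rows, in order.

    dataset      : the finalized dataset-of-interest (always added first)
    dependencies : maps a file to its associated files/dependencies
    shared       : files that will be shared in a public repository
    """
    tracked = [dataset]
    seen = {dataset}
    queue = deque([dataset])
    while queue:
        current = queue.popleft()
        for dep in dependencies.get(current, []):
            # A dependency becomes its own resource only if it will be shared
            if dep in seen or dep not in shared:
                continue
            seen.add(dep)
            tracked.append(dep)
            queue.append(dep)  # later, walk this file's dependencies too
    # Last step: any other shared file not reached through dependencies
    tracked.extend(sorted(shared - seen))
    return tracked


# Hypothetical usage:
# deps = {"final.csv": ["heal-csv-dd-final.csv", "raw.csv", "clean.py"],
#         "raw.csv": ["heal-csv-dd-raw.csv"]}
# shared = {"final.csv", "heal-csv-dd-final.csv", "raw.csv",
#           "heal-csv-dd-raw.csv", "protocol.pdf"}
# resources_to_track("final.csv", deps, shared)
```

In this example, "clean.py" is still listed as a dependency in the Resource Tracker entry for "final.csv", but it does not get its own row because it will not be shared.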

Congratulations! You have finished preparing your data package locally.

You can now prepare your data package for submission to a public repository.

Tip to confirm your local data package is ready for submission

Before you share your data package, it may be useful to ask someone else in your study group to review your data package and annotations, thinking about whether they would be accessible and understandable to a researcher looking at the data package for the first time.

If they are able to easily understand the study structure and how they might go about replicating your dataset/results, that is a good sign that your data package has the necessary resources and adequate detail to be shared.