Early in Study, Data-Sharing Orientation, Standard Level of Data Sharing Resources¶
Data Packaging Timeline¶
What to do right away¶
Set yourself up for success¶
- Review all study files/resources already produced by or for your study
- Come up with file organization and naming conventions for your study folders and files now
- Consider applying HEAL recommendations for file organization and naming. Apply these conventions to existing study files and to any future files and folders as they are created
- Where practicable (and without duplicating original files), organize all study files/resources into a single study folder/directory. The study folder/directory may have sub-directories; see here for guidance on and examples of recommended study folder/directory structure
- Store all study files/resources in a location where the person(s) creating or contributing to your data package documentation can access them all at the same time. For example, files may sit on different network drives as long as the person documenting can mount and access all of those drives at the same time; files CANNOT be split across the local drives of two different computers, even if the person documenting can access both computers separately
A note on copying study files
These guidelines suggest adjustments you can make before you start documenting that will make documentation easier (e.g., applying naming conventions and ensuring files are accessible to the person documenting). They do not mean that you should copy your study files into new or existing folders to group all "to share" files together.
All documentation should be completed based on original files. Creating copies of your files for documentation can introduce inconsistencies in your final package (e.g., if you edit the original file but not the copy, the file that is documented, and likely shared, will not be the most up-to-date version).
The only exception to this rule: After you have completely finished documenting your study files at their local (or network) paths, you will copy the files that you intend to share to finish preparing your data package for submission.
Initialize your Data Package¶
- Create a "dsc-pkg" folder/directory that will hold all Standard Data Package Metadata Files for your data package
- If all study files/resources are organized into a single study folder/directory, create the "dsc-pkg" folder/directory as a direct sub-directory of your study folder/directory. A consistent name and a consistent location relative to your overall study folder/directory make it easy to recognize as the folder that contains the Standard Data Package Metadata Files for your study's data package
- If all study files/resources are NOT organized into a single study folder/directory, create the "dsc-pkg" folder/directory in a disk location that makes sense for you. Optionally append a suffix to the name that identifies the specific study it belongs to (e.g. "dsc-pkg-study-1" or "dsc-pkg-mindfulness-for-oud"). Keeping the "dsc-pkg" prefix and appending a human-recognizable study identifier make the folder easy to recognize as the one that contains the Standard Data Package Metadata Files for your study's data package
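The folder setup described above can be scripted. The following is a minimal sketch, not an official HEAL tool: the function name `init_dsc_pkg` and its parameters are hypothetical, and only the "dsc-pkg" name and the optional study suffix come from the guidance above.

```python
from pathlib import Path
from typing import Optional


def init_dsc_pkg(base_dir: Path, study_suffix: Optional[str] = None) -> Path:
    """Create the "dsc-pkg" folder inside base_dir and return its path.

    base_dir: the study folder when all study files live in one place,
              otherwise any disk location that makes sense for you.
    study_suffix: optional human-recognizable study identifier
                  (hypothetical parameter), e.g. "study-1" gives
                  "dsc-pkg-study-1".
    """
    name = f"dsc-pkg-{study_suffix}" if study_suffix else "dsc-pkg"
    pkg_dir = Path(base_dir) / name
    # exist_ok=True makes the call safe to repeat; existing files are untouched.
    pkg_dir.mkdir(parents=True, exist_ok=True)
    return pkg_dir
```

For a single study folder, `init_dsc_pkg(Path("my-study"))` would create `my-study/dsc-pkg`; the path `my-study` is only an illustration.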
Start your Experiment Tracker¶
- Start your Experiment Tracker by initializing an empty Experiment Tracker file based on the Experiment Tracker csv template
- Save your Experiment Tracker in your "dsc-pkg" folder as "heal-csv-experiment-tracker.csv"
- Each row in your Experiment Tracker will represent a study experiment/activity
- Use the Experiment Tracker schema to understand what each "question"/field in the Experiment Tracker means and how to "answer"/complete each "question"/field
- Add all study experiments/activities which have already been designed to your Experiment Tracker
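Initializing the tracker file can be sketched as a simple copy of the csv template. This sketch assumes you have already downloaded the Experiment Tracker csv template to a local path of your choosing; the function name `init_tracker` and its parameters are hypothetical, and the template's column names are deliberately not reproduced here.

```python
import shutil
from pathlib import Path

# File name taken from the guidance above.
EXPERIMENT_TRACKER_NAME = "heal-csv-experiment-tracker.csv"


def init_tracker(template_csv: Path, pkg_dir: Path,
                 file_name: str = EXPERIMENT_TRACKER_NAME) -> Path:
    """Copy an empty csv template into the existing "dsc-pkg" folder.

    An existing tracker is never overwritten, so rows that were already
    entered are kept if the function is called again.
    """
    target = Path(pkg_dir) / file_name
    if not target.exists():
        shutil.copyfile(template_csv, target)
    return target
```

The same pattern would work for other template-based files in the data package by passing a different `file_name`.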
Start your Data Dictionary(ies)¶
- Create a Data Dictionary for each tabular data file already produced by or for your study
- Start a Data Dictionary for a tabular data file by initializing an empty Data Dictionary file based on the Data Dictionary csv template
- Save your Data Dictionary in your "dsc-pkg" folder as "heal-csv-dd-my-datafile.csv" (i.e. the file name starts with the prefix "heal-csv-dd-", followed by the name of the data file to which the Data Dictionary applies, and the file is saved as a csv file)
- Each row in your Data Dictionary will represent a variable that is collected/populated in your tabular data file
- Use the Data Dictionary schema to understand what each "question"/field in the Data Dictionary means and how to "answer"/complete each "question"/field
- Add all variables in the tabular data file to your Data Dictionary
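The naming rule and the "one row per variable" rule can be sketched in a few lines. This is an illustration under stated assumptions: the tabular data file is a csv whose first row holds the variable names, and the helper names `data_dictionary_name` and `list_variables` are hypothetical. It only lists the variables that need a row; how each row is completed is defined by the Data Dictionary schema.

```python
import csv
from pathlib import Path
from typing import List


def data_dictionary_name(data_file: Path) -> str:
    """Build the Data Dictionary file name for a tabular data file."""
    # Prefix "heal-csv-dd-" + data file name (without extension) + ".csv"
    return f"heal-csv-dd-{Path(data_file).stem}.csv"


def list_variables(data_file: Path) -> List[str]:
    """Return the variable names found in the header row of a csv file.

    Assumes the first row is a header. "utf-8-sig" also reads files
    that start with a byte order mark, as some spreadsheet exports do.
    """
    with open(data_file, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f), [])
```

Each name returned by `list_variables` corresponds to one row that the Data Dictionary for that file should contain.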
Start your Resource Tracker¶
- Start your Resource Tracker by initializing an empty Resource Tracker file based on the Resource Tracker csv template
- Save your Resource Tracker in your "dsc-pkg" folder as "heal-csv-resource-tracker.csv"
- Each row in your Resource Tracker will represent a study file/resource that you have annotated. Study files include data and non-data supporting files, including HEAL-formatted data dictionaries you may have created
- Use the Resource Tracker schema to understand what each "question"/field in the Resource Tracker means and how to "answer"/complete each "question"/field
- The Resource Tracker will ask you to list associated files/dependencies for each study file/resource (i.e. files that are required to interpret, replicate, or use the study file/resource; for tabular data files, this will include a data dictionary for the file)
- Add all study files/resources already produced by or for your study to your Resource Tracker
What to do continuously, as-you-go¶
New study experiments/activities:¶
- Add new study experiments/activities to your Experiment Tracker as they are designed (as soon as possible)
New tabular data files:¶
- Create a Data Dictionary for new tabular data files as they are produced
New study files/resources:¶
- Consistently apply the file organization and naming conventions you decided on for your study files (in the "Set yourself up for success" section) as you create/collect new study files
- If you need new file organization and naming conventions for new data types, study experiments/activities, or other unanticipated file collection, decide on them proactively, consider applying HEAL recommendations for file organization and naming, and apply the new conventions to study files/resources as they are created
- Add new study files/resources to your Resource Tracker as they are created/collected (as soon as possible), including data and non-data support files, such as protocols, data dictionaries, code, and analysis files
What to do when your dataset-of-interest is finalized¶
Add items to your Resource Tracker¶
- Add your finalized dataset-of-interest to your Resource Tracker
- Confirm that all associated files/dependencies for the finalized dataset-of-interest (i.e. files required to interpret, replicate, or use the finalized dataset-of-interest) are also listed as study files/resources in your Resource Tracker; add any that are missing
- Confirm that all associated files/dependencies for each study file/resource listed in your Resource Tracker are also listed as study files/resources in your Resource Tracker; add any that are missing
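The two confirmation steps above amount to checking that every file named as a dependency is itself listed in the Resource Tracker. A minimal sketch of that check follows. It assumes you have already loaded the tracker into a mapping from each listed file to its associated files; the function name `missing_resources` is hypothetical, and reading the mapping out of the csv depends on the Resource Tracker schema, which is not reproduced here.

```python
from typing import Dict, List, Set


def missing_resources(tracker: Dict[str, List[str]]) -> Set[str]:
    """Return files named as dependencies that are not listed themselves.

    tracker maps each listed study file/resource to the associated
    files/dependencies recorded for it (hypothetical in-memory form).
    """
    listed = set(tracker)
    referenced = {dep for deps in tracker.values() for dep in deps}
    return referenced - listed
```

Because a newly added file can bring its own dependencies, it may help to repeat the check after each round of additions until it returns an empty set.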
Congratulations! You have finished preparing your data package locally.
You can now prepare your data package for submission to a public repository.
Tip to confirm your local data package is ready for submission
Before you share your data package, it may be useful to ask someone else in your study group to review the data package and its annotations, with an eye to whether they would be accessible and understandable to a researcher looking at the data package for the first time.
If they are able to easily understand the study structure and how they might go about replicating your dataset/results, that is a good sign that your data package has the necessary resources and adequate detail to be shared.