Raw and primary data
A fundamental principle of Psych-DS is the preservation of the earliest available form of research data. This page explains the distinction between raw and primary data and how to handle different data types within the Psych-DS framework.
Core principles
Key Requirements
- The earliest form of data must be preserved
- Original data should never be modified
- Different versions of the data should be kept separate
- All transformations should be documented
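Psych-DS does not prescribe a mechanism for confirming that original files stay unmodified. One possible approach, sketched here in Python under our own assumptions, is to record a SHA-256 checksum for every file in the directory holding your originals (such as data/raw/) and re-check those checksums later. The function names and the manifest file are hypothetical, not part of Psych-DS or its validator; keep the manifest outside the directory being hashed so it does not end up checksumming itself.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(raw_dir: Path, manifest_path: Path) -> dict[str, str]:
    """Hypothetical helper: record a checksum for every file under raw_dir.

    Keys are paths relative to raw_dir, written with forward slashes.
    """
    manifest = {
        p.relative_to(raw_dir).as_posix(): sha256_of(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def changed_files(raw_dir: Path, manifest_path: Path) -> list[str]:
    """Hypothetical helper: list recorded files that are missing or altered."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    changed = []
    for rel_path, expected in manifest.items():
        path = raw_dir / rel_path
        if not path.is_file() or sha256_of(path) != expected:
            changed.append(rel_path)
    return changed
```

In practice you would call write_manifest once, right after the originals are stored, and run changed_files before any analysis; an empty list means every recorded original still matches its checksum. A checksum detects modification but does not prevent it, so it complements rather than replaces keeping the originals untouched.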
Understanding raw vs primary data
Raw data
Raw data refers to the very first form in which data exists, regardless of format. This could be:
- Paper questionnaires
- Audio/video recordings
- Device output files
- Hand-written notes
- Original Excel workbooks
- Survey software exports
Primary data
Primary data refers to the first digital form of the data. The distinction matters when raw data begins in a non-digital format; when data are collected digitally, the raw and primary data are the same.
Raw vs Primary Data Examples
Scenario 1: Paper-based collection
- Raw Data: Physical paper questionnaires
- Primary Data: Scanned PDFs or a spreadsheet of transcribed responses
Scenario 2: Digital collection
- Raw Data: Survey software export file
- Primary Data: Same file (raw and primary are identical)
Scenario 3: Behavioral observation
- Raw Data: Video recordings of participant behavior
- Primary Data: Spreadsheet of coded behaviors from videos
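In Scenario 3, moving from raw to primary data means coding the videos: the recordings stay untouched and the coded behaviors are written to a new CSV file. The Python sketch below assumes hypothetical column names (participant_id, video_file, timestamp_s, behavior) and made-up example rows purely for illustration; Psych-DS does not define these columns. Including a video_file column is one way to let every coded row be traced back to the raw recording it came from.

```python
import csv
from pathlib import Path

# Hypothetical coded observations, one row per behavior noted while watching
# the videos. Column names and values are illustrative only.
CODED_EVENTS = [
    {"participant_id": "001", "video_file": "participant001.mp4",
     "timestamp_s": 12.5, "behavior": "smile"},
    {"participant_id": "001", "video_file": "participant001.mp4",
     "timestamp_s": 40.0, "behavior": "gaze_away"},
    {"participant_id": "002", "video_file": "participant002.mp4",
     "timestamp_s": 8.2, "behavior": "smile"},
]


def write_coded_behaviors(rows: list[dict], out_path: Path) -> None:
    """Write coded observations to a CSV, the first digital form of the data.

    The source videos are never opened or changed; only out_path is written.
    """
    fieldnames = ["participant_id", "video_file", "timestamp_s", "behavior"]
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Participant IDs are stored as strings so leading zeros such as "001" survive the round trip through CSV.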
Data organization in Psych-DS
The data/ directory
Your Psych-DS data/ directory should contain:
- Primary data (or the earliest version available)
- Any Psych-DS-compliant versions of that data
- All subsequent processed/transformed versions
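One way to keep versions separate is to treat every transformation as "read one file, write a new one". The hypothetical derive_version helper sketched below in Python reads a CSV, keeps the rows that pass a filter you supply, and writes them to a new file. Opening the target in exclusive-creation mode ("x") makes the call fail rather than overwrite an existing file, including the source itself. The helper does not record what it did, so documenting the transformation (for example, by keeping the script alongside the dataset) remains up to you.

```python
import csv
from pathlib import Path
from typing import Callable


def derive_version(source: Path, target: Path,
                   keep_row: Callable[[dict], bool]) -> int:
    """Write the rows of `source` that pass `keep_row` to a new file `target`.

    `source` is only opened for reading. `target` is opened in
    exclusive-creation mode ("x"), so an existing file (including `source`)
    raises FileExistsError instead of being overwritten.
    Returns the number of data rows written.
    """
    with source.open(newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        rows = [row for row in reader if keep_row(row)]
        fieldnames = reader.fieldnames or []
    with target.open("x", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The target file name is left to you and should follow the naming conventions you use for the rest of your Psych-DS dataset.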
The data/raw/ subdirectory
If your primary data is not in CSV format, store it in the data/raw/ subdirectory. The Psych-DS validator ignores files in data/raw/, so they are preserved as part of your dataset without being checked for compliance.
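To find files in an existing project that may still need to be moved, a small script can sort everything under data/ into groups. This Python sketch encodes only the rule stated above (files under data/raw/ are ignored by the validator, and primary data that is not CSV belongs there); it is not the validator's actual logic, and the function name and group labels are hypothetical.

```python
from pathlib import Path


def classify_data_files(data_dir: Path) -> dict[str, list[str]]:
    """Hypothetical helper: group files under data_dir by where they belong.

    - raw_ignored: files inside data_dir/raw (skipped by the validator)
    - csv_data:    CSV files outside data_dir/raw
    - move_to_raw: non-CSV files outside data_dir/raw, which by the rule
                   above probably belong in data_dir/raw
    Paths are returned relative to data_dir, with forward slashes.
    """
    raw_dir = data_dir / "raw"
    groups = {"csv_data": [], "raw_ignored": [], "move_to_raw": []}
    for path in sorted(data_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(data_dir).as_posix()
        if raw_dir in path.parents:
            groups["raw_ignored"].append(rel)
        elif path.suffix.lower() == ".csv":
            groups["csv_data"].append(rel)
        else:
            groups["move_to_raw"].append(rel)
    return groups
```

On the example layout that follows, classify_data_files(Path("my-study/data")) would place the two participant CSV files in csv_data, the workbook and the two videos in raw_ignored, and leave move_to_raw empty.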
Directory structure example
my-study/
  data/
    raw/
      original-survey-responses.xlsx
      participant-videos/
        participant001.mp4
        participant002.mp4
    participant-001_data.csv
    participant-002_data.csv