HEALTH DATA COMPLIANCE
Health Data — Configuration guide based on distribution and usage restrictions
- Key points (health.gov.au perspective):
  - Many public health datasets carry usage, consent, and anonymization requirements and are managed under dedicated frameworks (e.g. HCP/PHDB/NIHSI/EDW).
  - Data classification and governance (Data Governance Framework) are in place, and the resulting policies apply to storage, sharing, and usage.
  - Some data is publicly available, but research use and detailed access require an application, approval, or data use agreement (DUA).
- Items that should be included in the configuration (recommended additions to config/connectome_config.yaml):

      access_type: "public" | "restricted"   # determines whether automatic download is possible
      requires_approval: true/false
      approval_contact: "email@custodian.gov.au"   # where to apply
      data_classification: "sensitive" | "public"
      duc_required: true/false
      duc_url: "https://..."   # DUA/DUC reference
      ci_download_secret: "CONNECTOME_DOWNLOAD_URL"   # name of the CI secret (required for restricted data)
      storage: { encrypted: true, allowed_environments: ["secure_cluster"] }
      provenance: { source_url: ..., version: ..., contact: ... }

- Example (expanding a dataset entry in sources):

      hpd_restricted_dataset:
        dataset_id: "hpd_2024"
        source_url: "https://data.custodian.gov.au/hpd/metadata"
        format: "json"
        access_type: "restricted"
        requires_approval: true
        duc_required: true
        duc_url: "https://custodian.gov.au/duc"
        approval_contact: "data.custodian@agency.gov.au"
        data_classification: "sensitive"
        storage:
          encrypted: true
          allowed_environments:
            - "secure_cluster"
        ci_download_secret: "HPD_DOWNLOAD_URL"

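
The field list above implies a few consistency rules (for example, restricted data needs a CI secret name and encrypted storage). A minimal Python sketch of such a check follows. The function name `validate_dataset_entry`, the exact rule set, and the `EXAMPLE_ENTRY` constant are assumptions made for illustration, not part of any existing schema or script; the entry is passed as a plain dict so no YAML library is assumed.

```python
from typing import Any

# Hypothetical rules inferred from the field list above; not an official schema.
ALLOWED_ACCESS_TYPES = {"public", "restricted"}
ALLOWED_CLASSIFICATIONS = {"public", "sensitive"}

# Mirrors the example dataset entry above, already parsed into a dict.
EXAMPLE_ENTRY: dict[str, Any] = {
    "dataset_id": "hpd_2024",
    "source_url": "https://data.custodian.gov.au/hpd/metadata",
    "format": "json",
    "access_type": "restricted",
    "requires_approval": True,
    "duc_required": True,
    "duc_url": "https://custodian.gov.au/duc",
    "approval_contact": "data.custodian@agency.gov.au",
    "data_classification": "sensitive",
    "storage": {"encrypted": True, "allowed_environments": ["secure_cluster"]},
    "ci_download_secret": "HPD_DOWNLOAD_URL",
}


def validate_dataset_entry(name: str, entry: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems: list[str] = []

    access_type = entry.get("access_type")
    if access_type not in ALLOWED_ACCESS_TYPES:
        problems.append(f"{name}: access_type must be 'public' or 'restricted'")

    if entry.get("data_classification") not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"{name}: data_classification must be 'public' or 'sensitive'")

    # Treat either flag as marking the dataset restricted.
    restricted = access_type == "restricted" or entry.get("requires_approval") is True
    if restricted:
        for field in ("approval_contact", "ci_download_secret"):
            if not entry.get(field):
                problems.append(f"{name}: {field} is required for restricted data")
        storage = entry.get("storage") or {}
        if storage.get("encrypted") is not True:
            problems.append(f"{name}: storage.encrypted must be true for restricted data")

    if entry.get("duc_required") is True and not entry.get("duc_url"):
        problems.append(f"{name}: duc_url is required when duc_required is true")

    return problems
```

Calling `validate_dataset_entry("hpd_restricted_dataset", EXAMPLE_ENTRY)` returns an empty list. Returning a list of messages rather than raising on the first failure lets a CI job report every problem in one run.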
- CI and automation notes:
  - Avoid workflows that download automatically from public sites when access_type is "restricted" or requires_approval is true.
  - When importing in CI, keep the download URL in Secrets outside the repository, and bootstrap through a manual workflow_dispatch run with a required approval step.
  - Require confirmation of DUA/DUC consent before download (leave a record of the consent in Docs).
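
The notes above can be sketched as a small download gate in Python. This is an illustration under stated assumptions: `resolve_download_url`, `DownloadBlocked`, and the `manually_approved` flag are hypothetical names, and using `source_url` as the download location for public data is an assumption as well. The only behavior taken from the notes is that restricted data must not be fetched unattended and that its URL comes from a secret exposed to the job, not from the repository.

```python
import os
from typing import Any, Mapping, Optional


class DownloadBlocked(RuntimeError):
    """Raised when policy forbids an unattended download."""


def is_restricted(entry: Mapping[str, Any]) -> bool:
    """A dataset counts as restricted if either flag says so."""
    return entry.get("access_type") == "restricted" or entry.get("requires_approval") is True


def resolve_download_url(
    entry: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
    *,
    manually_approved: bool = False,
) -> str:
    """Return the URL to download from, or raise DownloadBlocked."""
    if env is None:
        env = os.environ

    if not is_restricted(entry):
        # Assumption: public datasets can be fetched directly from source_url.
        return entry["source_url"]

    if not manually_approved:
        raise DownloadBlocked("restricted dataset: run from a manually approved workflow")

    # The config stores only the secret's name; the value lives in the CI environment.
    secret_name = entry.get("ci_download_secret")
    url = env.get(secret_name) if secret_name else None
    if not url:
        raise DownloadBlocked(f"secret {secret_name!r} is not set in the environment")
    return url
```

Passing `env` explicitly keeps the function testable without touching the real environment; in a CI job the default `os.environ` would be used.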
- Actual operation flow (recommended):
  - Apply to the data custodian for access and agree to the DUA.
  - After approval, register the secure download URL (with an expiration date) or the access credentials provided by the custodian as a CI secret.
  - Run scripts/bootstrap_connectome.py --download-url in a protected step (a workflow with manual approval) on the runner.
  - Store the downloaded JSON/NPZ in an artifact repository with encrypted storage and ACLs, and do not include it in the public repository.
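
One way to enforce the consent step in this flow is a thin wrapper that refuses to proceed until consent is confirmed and that appends a record of the confirmation to a log. The sketch below is hypothetical and does not describe the actual scripts/bootstrap_connectome.py: only `--download-url` comes from the flow above, while `--dataset-id`, `--confirm-duc`, `--consent-log`, and the function names are assumptions. It reads `GITHUB_ACTOR`, which GitHub Actions sets to the user who triggered the run; on other CI systems a different variable would be needed.

```python
import argparse
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical consent gate for the bootstrap step")
    parser.add_argument("--download-url", required=True, help="expiring URL issued by the custodian")
    parser.add_argument("--dataset-id", required=True)
    parser.add_argument("--confirm-duc", action="store_true", help="confirm DUA/DUC consent was given")
    parser.add_argument("--consent-log", type=Path, default=Path("consent_log.jsonl"))
    return parser


def record_consent(log_path: Path, dataset_id: str, approver: str) -> dict:
    """Append one JSON line describing who confirmed consent and when."""
    record = {
        "dataset_id": dataset_id,
        "approver": approver,
        "confirmed_at": datetime.now(timezone.utc).isoformat(),
    }
    # The download URL is deliberately not logged: it is a secret.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if not args.confirm_duc:
        print("Refusing to download: DUA/DUC consent not confirmed")
        return 2
    approver = os.environ.get("GITHUB_ACTOR", "unknown")
    record_consent(args.consent_log, args.dataset_id, approver)
    # The download and the encrypted upload are omitted from this sketch.
    return 0
```

For example, `main(["--download-url", url, "--dataset-id", "hpd_2024", "--confirm-duc"])` returns 0 and appends one line to the consent log, while omitting `--confirm-duc` returns 2 and writes nothing.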
- Recommended next actions (short term):
  - Add the fields above to the target dataset in config/connectome_config.yaml (I will add the example with a patch).
  - Add a note about using manual approval and secrets to the CI workflow bootstrap-connectome.yml.
A simple patch (apply it if you want to automate it).