HEALTH DATA COMPLIANCE

Health Data: configuration guide for distribution and usage restrictions

  • Key points (health.gov.au perspective):
      • Many public health datasets are subject to usage, consent, and anonymization requirements and sit within management frameworks (e.g. HCP/PHDB/NIHSI/EDW).
      • A Data Governance Framework defines data classification and governance, and its policies apply to storage, sharing, and use.
      • Some data is publicly available, but research use and detailed access require an application, approval, and a data use agreement (DUA).

  • Fields to include in the configuration (recommended additions to config/connectome_config.yaml):

      • access_type: "public" | "restricted" # determines whether automatic download is allowed
      • requires_approval: true/false
      • approval_contact: "email@custodian.gov.au" # where to apply for access
      • data_classification: "sensitive" | "public"
      • duc_required: true/false
      • duc_url: "https://..." # DUA/DUC reference
      • ci_download_secret: "CONNECTOME_DOWNLOAD_URL" # name of the CI secret holding the download URL (required for restricted data)
      • storage: { encrypted: true, allowed_environments: ["secure_cluster"] }
      • provenance: { source_url: ..., version: ..., contact: ... }

  • Example (an expanded dataset entry under sources):

hpd_restricted_dataset:
  dataset_id: "hpd_2024"
  source_url: "https://data.custodian.gov.au/hpd/metadata"
  format: "json"
  access_type: "restricted"
  requires_approval: true
  duc_required: true
  duc_url: "https://custodian.gov.au/duc"
  approval_contact: "data.custodian@agency.gov.au"
  data_classification: "sensitive"
  storage:
    encrypted: true
    allowed_environments:
      - "secure_cluster"
  ci_download_secret: "HPD_DOWNLOAD_URL"
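
Before a config change is merged, a small check can catch dataset entries whose fields contradict each other. The sketch below is a minimal, hypothetical validator: DatasetPolicy, policy_from_entry, and validate_policy are illustrative names, not existing tooling. Apart from the ci_download_secret requirement for restricted data stated above, its rules (approval needs a contact, a required DUC needs a URL, sensitive data must declare encrypted storage) are assumptions drawn from this guide. It expects the YAML to be parsed already (for instance with PyYAML's yaml.safe_load), so the example entry is repeated as a Python dict.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List

# Allowed values, taken from the field list above.
ACCESS_TYPES = {"public", "restricted"}
CLASSIFICATIONS = {"public", "sensitive"}


@dataclass
class DatasetPolicy:
    """Access-policy fields of one dataset entry (hypothetical schema)."""
    name: str
    access_type: str = ""
    requires_approval: bool = False
    data_classification: str = "public"
    duc_required: bool = False
    duc_url: str = ""
    approval_contact: str = ""
    ci_download_secret: str = ""
    storage: Dict[str, Any] = field(default_factory=dict)


def policy_from_entry(name: str, entry: Dict[str, Any]) -> DatasetPolicy:
    """Build a DatasetPolicy from a parsed entry, ignoring unrelated keys like source_url."""
    known = {f.name for f in fields(DatasetPolicy)} - {"name"}
    return DatasetPolicy(name=name, **{k: v for k, v in entry.items() if k in known})


def validate_policy(p: DatasetPolicy) -> List[str]:
    """Return human-readable problems; an empty list means the entry passes."""
    problems = []
    if p.access_type not in ACCESS_TYPES:
        problems.append(f"{p.name}: access_type must be one of {sorted(ACCESS_TYPES)}")
    if p.data_classification not in CLASSIFICATIONS:
        problems.append(f"{p.name}: data_classification must be one of {sorted(CLASSIFICATIONS)}")
    gated = p.access_type == "restricted" or p.requires_approval
    if gated and not p.ci_download_secret:
        problems.append(f"{p.name}: gated data needs ci_download_secret")
    # Assumed rule: approval-gated data must say where to apply.
    if p.requires_approval and not p.approval_contact:
        problems.append(f"{p.name}: requires_approval is true but approval_contact is empty")
    # Assumed rule: a required DUC must be referenced.
    if p.duc_required and not p.duc_url:
        problems.append(f"{p.name}: duc_required is true but duc_url is empty")
    # Assumed rule: sensitive data must declare encrypted storage.
    if p.data_classification == "sensitive" and not p.storage.get("encrypted", False):
        problems.append(f"{p.name}: sensitive data must set storage.encrypted: true")
    return problems


# The YAML example above, as it would look after parsing.
example = {
    "dataset_id": "hpd_2024",
    "source_url": "https://data.custodian.gov.au/hpd/metadata",
    "format": "json",
    "access_type": "restricted",
    "requires_approval": True,
    "duc_required": True,
    "duc_url": "https://custodian.gov.au/duc",
    "approval_contact": "data.custodian@agency.gov.au",
    "data_classification": "sensitive",
    "storage": {"encrypted": True, "allowed_environments": ["secure_cluster"]},
    "ci_download_secret": "HPD_DOWNLOAD_URL",
}
print(validate_policy(policy_from_entry("hpd_restricted_dataset", example)))  # []
```

Returning a list rather than raising on the first problem lets a CI check report every inconsistent field in one run.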

  • CI and automation notes:
      • Do not run workflows that download automatically and directly from public sites when access_type is "restricted" or requires_approval is true.
      • When importing via CI, keep the download URL in a CI secret outside the repository, and bootstrap through a manually triggered workflow_dispatch run with a required approval step.
      • Require confirmation of DUA/DUC consent before any download, and keep a record of that consent in the docs.
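
The CI rules above can also be checked a second time inside the job, in case a workflow is misconfigured. This is a minimal sketch, assuming GitHub Actions (which sets GITHUB_EVENT_NAME for every run) and assuming the workflow maps the secret named by ci_download_secret into an environment variable of the same name. The download_allowed function and the consent-record format (duc_url, accepted_by, with "data-steward" as a placeholder) are hypothetical. The manual approval itself is best enforced by the CI platform, for example a protected environment with required reviewers; this function only checks what the job can see.

```python
from typing import Any, Mapping, Optional


def download_allowed(entry: Mapping[str, Any], env: Mapping[str, str],
                     consent: Optional[Mapping[str, Any]] = None) -> bool:
    """Return True only if this CI run may fetch the dataset described by entry."""
    gated = entry.get("access_type") == "restricted" or bool(entry.get("requires_approval"))
    if not gated:
        return True  # public, approval-free data: automatic download is acceptable
    # Gated data: only a manually triggered run may fetch it (no push/schedule triggers).
    if env.get("GITHUB_EVENT_NAME") != "workflow_dispatch":
        return False
    # The download URL must arrive via the CI secret, never from the repository.
    secret_name = entry.get("ci_download_secret")
    if not secret_name or not env.get(secret_name):
        return False
    if entry.get("duc_required"):
        # Hypothetical consent record: must name the same DUC and who accepted it.
        if not consent or consent.get("duc_url") != entry.get("duc_url"):
            return False
        if not consent.get("accepted_by"):
            return False
    return True


# In a real job this would be called as download_allowed(entry, os.environ, consent_record).
entry = {
    "access_type": "restricted",
    "requires_approval": True,
    "duc_required": True,
    "duc_url": "https://custodian.gov.au/duc",
    "ci_download_secret": "HPD_DOWNLOAD_URL",
}
consent = {"duc_url": "https://custodian.gov.au/duc", "accepted_by": "data-steward"}
print(download_allowed(entry, {"GITHUB_EVENT_NAME": "push", "HPD_DOWNLOAD_URL": "x"}, consent))  # False
```

Failing closed (returning False whenever something is missing) keeps a push-triggered or misconfigured run from ever touching restricted data.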

  • Recommended operational flow:

      • Apply to the data custodian for access and accept the DUA.
      • After approval, register the secure download URL (which has an expiry date) or the access credentials provided by the custodian as a CI secret.
      • Run scripts/bootstrap_connectome.py --download-url on the runner in a protected step (a workflow with manual approval).
      • Store the downloaded JSON/NPZ files in an artifact repository with encrypted storage and ACLs; never commit them to the public repository.
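
The last two steps of the flow can be sketched as a helper that a bootstrap script might call. This is not the implementation of scripts/bootstrap_connectome.py, whose internals are not described here; the fetch_restricted name, the runner_env argument, and the .sha256 sidecar file are assumptions. The helper refuses to run outside allowed_environments, refuses to write inside the repository working tree, reads the URL only from the configured secret, and records a SHA-256 checksum for provenance. Encryption and ACLs are assumed to come from the destination volume; the code itself encrypts nothing. It needs Python 3.9+ for Path.is_relative_to.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path
from typing import Any, Mapping


def fetch_restricted(entry: Mapping[str, Any], env: Mapping[str, str],
                     repo_root: Path, dest_dir: Path, runner_env: str) -> Path:
    """Download a gated dataset using its CI secret; return the path of the stored file."""
    # Only run in environments the config allows (e.g. "secure_cluster").
    allowed = entry.get("storage", {}).get("allowed_environments", [])
    if runner_env not in allowed:
        raise PermissionError(f"environment {runner_env!r} is not in {allowed}")
    # Refuse destinations inside the repository, so the data cannot be committed by accident.
    dest_dir = dest_dir.resolve()
    if dest_dir.is_relative_to(repo_root.resolve()):
        raise ValueError(f"{dest_dir} is inside the repository working tree")
    # The URL comes only from the secret named in the config.
    secret_name = entry["ci_download_secret"]
    url = env.get(secret_name)
    if not url:
        raise RuntimeError(f"CI secret {secret_name} is not set")
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{entry['dataset_id']}.{entry.get('format', 'json')}"
    # Stream the response to disk; the URL is never logged because it is a secret.
    with urllib.request.urlopen(url) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    # Write a checksum sidecar (hypothetical convention) for the provenance record.
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    (dest_dir / f"{entry['dataset_id']}.sha256").write_text(f"{digest}  {target.name}\n")
    return target
```

A protected CI step might call it as fetch_restricted(entry, os.environ, Path.cwd(), Path("/secure/connectome"), "secure_cluster"), where /secure/connectome is a hypothetical encrypted, ACL-protected mount.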

  • Recommended short-term actions:

      • Add the fields above to the target dataset in config/connectome_config.yaml (I will add the example as a patch).
      • Add a note to the CI workflow bootstrap-connectome.yml about using manual approval and secrets.

A simple patch (apply it if you want to automate this):