Data User Guide Template

Author

Camilla Green

Published

January 15, 2026

1 Data User Guide Template


1.1 Examples:

  • TOPST SCHOOL project: https://ciesin-geospatial.github.io/TOPSTSCHOOL-SPHINX/
    • Specific lesson: NRT Water Lesson: https://ciesin-geospatial.github.io/TOPSTSCHOOL-water/m102-lance-modis-nrt-global-flood.html
  • Pytidycensus examples page: https://github.com/mmann1123/pytidycensus/tree/main/examples
  • NASA Earthdata Cloud Cookbook: https://nasa-openscapes.github.io/earthdata-cloud-cookbook/tutorials/
  • GSAPP Smorgasbord: https://cdp.arch.columbia.edu/smorgasbord/modules/5-computational-design-modeling-in-grasshopper/5-7_Design-Space/

1.2 Tips and Pointers

  • Provide links to relevant resources, including code packages and data documentation
  • Link to existing help pages and guides where relevant rather than writing your own. For example, link to a guide on accessing REST APIs instead of writing a whole section yourself. Two options for this:
    • https://zapier.com/blog/how-to-use-api/#what
    • https://guides.dataverse.org/en/6.8/api/getting-started.html
  • Use open-source tools whenever possible so the guide stays accessible to users. Where no open tool is available, explain why the chosen tool is necessary and point to possible alternatives

1.3 1. Overview

1.3.1 1.1 What Is This Dataset?

  • Brief, plain-language description of the dataset
  • Links to our data hub and API
  • Time period covered
  • Geographic or population coverage
  • License (e.g., CC BY 4.0)
  • Citation guidance
  • Any usage restrictions

1.3.2 1.2 Who Are This Dataset and Tutorial For?

  • Intended audience
  • Suggested skill level (beginner / intermediate / advanced)

1.4 2. Key Questions This Data Can Answer

List 5–6 example questions users can explore, in addition to the key question that we will be exploring in this guide.

  • How has X changed over time?
  • Are there differences in X by region or demographic group?
  • Which areas experience the highest or lowest values?

Explain why this dataset is appropriate for assessing the key question.


1.5 3. Data Access

1.5.1 3.1 Where to Find the Data

  • Link(s) to download location (portal, GitHub, API, cloud storage)
  • File formats available (CSV, Excel, Parquet, API endpoint)

1.5.2 3.2 How to Download

Step-by-step instructions:

  1. Navigate to …
  2. Click …
  3. Select format …

Include screenshots if possible.


1.6 4. Dataset Structure

1.6.1 4.1 Files Included

File Name     | Description
data.csv      | Main cleaned dataset
codebook.csv  | Variable definitions
metadata.json | Collection and methodology details

1.6.2 4.2 Unit of Analysis

Clearly state what one row represents:

  • One road
  • One ward
  • One health facility
  • Etc.
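
Once the unit of analysis is stated, it is worth checking that the data actually honors it. The sketch below assumes a hypothetical identifier column `facility_id` and invented rows; it reports any key that appears on more than one row, which would contradict a "one row per health facility" claim.

```python
from collections import Counter

# Invented rows for illustration; `facility_id` is a hypothetical key column.
rows = [
    {"facility_id": "HF-001", "name": "Clinic A"},
    {"facility_id": "HF-002", "name": "Clinic B"},
    {"facility_id": "HF-001", "name": "Clinic A (duplicate)"},
]

def duplicate_keys(rows, key_fields):
    """Return the key tuples that occur on more than one row."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)

dupes = duplicate_keys(rows, ["facility_id"])
print(dupes)
```

An empty result means each key identifies exactly one row. For data with a compound unit (for example, one row per county per year), pass both fields in `key_fields`.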

1.7 5. Variables & Definitions

1.7.1 5.1 Key Variables

Highlight the most important fields.

Variable    | Description   | Type        | Example
county      | County name   | Categorical | Cook
year        | Calendar year | Integer     | 2022
funding_usd | Total funding | Numeric     | 125000

1.7.2 5.2 Missing Data

  • How missing values are represented (e.g., NA, blank, -999)
  • Known gaps or limitations
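
A guide can show readers how to turn the documented missing-value markers into a single representation before analysis. This is a minimal sketch that assumes the three markers given as examples above (NA, blank, -999); a real guide should substitute the codes documented for its dataset.

```python
# Markers assumed for illustration; replace with the dataset's documented codes.
MISSING_MARKERS = {"", "NA", "-999"}

def parse_number(raw):
    """Return a float, or None when the raw text is a missing-data marker."""
    text = raw.strip()
    if text in MISSING_MARKERS:
        return None
    return float(text)

values = [parse_number(v) for v in ["125000", "NA", "", "-999", " 42.5 "]]
print(values)  # [125000.0, None, None, None, 42.5]
```

Comparing the raw text rather than the parsed number keeps a legitimate value such as -999.5 from being mistaken for the -999 marker.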

1.8 6. Data Quality & Limitations

  • Known sources of bias
  • Data collection limitations
  • Changes in methodology over time
  • What the data should not be used for

1.9 7. Getting Started: Quick Setup

1.9.1 7.1 Tools You Can Use

  • R (which packages?)
  • Python (which packages?)
  • ArcGIS Pro / QGIS
  • Excel / Google Sheets
  • Local computing requirements (if any), with alternatives to local processing such as:
    • JupyterLab / Google Colab

1.9.2 7.2 Load the Data
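
One possible shape for this step, using only the Python standard library. The column names follow the example table in section 5.1. Apart from the Cook / 2022 / 125000 example, the sample rows are invented placeholders, and the in-memory string stands in for a downloaded `data.csv`.

```python
import csv
import io

# Placeholder sample standing in for data.csv; values are illustrative only.
SAMPLE_CSV = """county,year,funding_usd
Cook,2022,125000
Cook,2021,110000
Lake,2022,NA
"""

def load_rows(handle):
    """Read CSV rows into dictionaries keyed by column name."""
    return list(csv.DictReader(handle))

# For a real file, open it instead of the sample string:
#   with open("data.csv", newline="", encoding="utf-8") as f:
#       rows = load_rows(f)
rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows; columns:", list(rows[0].keys()))
```

Every value arrives as text at this point, so type conversion and missing-value handling belong in the cleaning step.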

1.9.3 7.3 Clean / Filter Data

  • Subset to year or area of interest
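
Subsetting could be sketched as a small filter function. The rows below are invented and assume the data has already been loaded and converted to typed values; the column names `year` and `county` come from the example table in section 5.1.

```python
# Invented, already-typed rows for illustration.
rows = [
    {"county": "Cook", "year": 2022, "funding_usd": 125000.0},
    {"county": "Cook", "year": 2021, "funding_usd": 110000.0},
    {"county": "Lake", "year": 2022, "funding_usd": None},
]

def subset(rows, year=None, county=None):
    """Keep rows matching the requested year and/or county; None means no filter."""
    return [
        row for row in rows
        if (year is None or row["year"] == year)
        and (county is None or row["county"] == county)
    ]

rows_2022 = subset(rows, year=2022)
cook_2022 = subset(rows, year=2022, county="Cook")
print(len(rows_2022), len(cook_2022))  # 2 1
```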

1.9.4 7.4 Exploratory Data Analysis (EDA)

  • Calculate summary statistics
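
A minimal sketch of summary statistics with the standard `statistics` module. The funding values are invented, and `None` stands for a missing observation that is counted but excluded from the calculations.

```python
import statistics

# Invented funding values; None marks a missing observation.
funding = [125000.0, 110000.0, None, 98000.0]

observed = [v for v in funding if v is not None]
summary = {
    "count": len(observed),
    "missing": len(funding) - len(observed),
    "min": min(observed),
    "max": max(observed),
    "mean": statistics.mean(observed),
    "median": statistics.median(observed),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Reporting the missing count next to the statistics makes it clear how many rows the summary is based on.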

1.10 8. Analysis / Visualization

1.11 9. Conclusion / Summary / Further Steps


2 TO DO

  • Adapt Matt W’s Colab notebook, which takes the roads dataset, converts it to a network, and calculates travel time

    • Need to work out a fix for accessing ArcGIS WMS in Python
  • Decide whether to use Astro or Quarto (Matt H. and Juan are running some tests)

  • Map out which user guides we will develop next

    • Catchment areas from friction surface
    • DRC Data layers
    • Waterways