toys_dataset — Downloading example data

The pydownloader module provides helpers that fetch example files from the WOLFHECE GitLab repositories. Files are cached locally, so a repeated call skips the download and returns the cached copy.

| Function | Source repository | Purpose |
| --- | --- | --- |
| toys_dataset(dir, file) | wolf_examples | WolfArrays, vectors, shapefiles, … |
| toys_gpu_dataset(dir) | wolfgpu_examples | Complete GPU simulations |
| toys_injector_gpu(file) | wolfgpu_injectors | GPU injector files |
| toys_laz_grid(dir, file) | wolf_examples | LiDAR grid tiles (list-based) |
| toys_lidaxe(dir, file) | lidaxe_data | Lidaxe flow accumulation data |
| download_file(url, …) | any URL | Generic HTTP/HTTPS/FTP downloader |

Imports and cache directory

[1]:
from wolfhece.pydownloader import (
    toys_dataset, toys_gpu_dataset, toys_injector_gpu,
    download_file, download_gpu_simulation,
    DATADIR, GITLAB_EXAMPLE, DownloadFiles,
)
from pathlib import Path

print(f"Cache directory: {DATADIR}")
print(f"GitLab base URL: {GITLAB_EXAMPLE}")
Cache directory: C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads
GitLab base URL: https://gitlab.uliege.be/HECE/wolf_examples/-/raw/main

Downloading a WolfArray

toys_dataset(dir, file) builds the URL as GITLAB_EXAMPLE/{dir}/{file}, downloads the file (plus companion files such as .bin.txt), and returns the local Path. If the file already exists locally, the download is skipped.

[2]:
# Download a DEM (Digital Elevation Model)
fpath = toys_dataset('Array_Theux_Pepinster', 'mnt.bin')
print(f"Downloaded to: {fpath}")
print(f"Exists?       {fpath.exists()}")

# The companion header file is also downloaded
header_path = Path(str(fpath) + '.txt')
print(f"Header exists? {header_path.exists()}")

# Load it as a WolfArray
from wolfhece.wolf_array import WolfArray
wa = WolfArray(fname=str(fpath))
print(f"Shape: {wa.nbx} x {wa.nby}, dx={wa.dx}, dy={wa.dy}")
print(f"Bounds: x=[{wa.origx:.0f}, {wa.origx + wa.nbx * wa.dx:.0f}], "
      f"y=[{wa.origy:.0f}, {wa.origy + wa.nby * wa.dy:.0f}]")
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\Array_Theux_Pepinster\mnt.bin already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\Array_Theux_Pepinster\mnt.bin.txt already exists. Skipping download.
Downloaded to: C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\Array_Theux_Pepinster\mnt.bin
Exists?       True
Header exists? True
Shape: 480 x 1160, dx=5.0, dy=5.0
Bounds: x=[251000, 253400], y=[135500, 141300]
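
The URL and cache layout described above can be sketched in a few lines of plain Python. This is only an illustration: example_url and cache_path are hypothetical helpers, not part of wolfhece, and the DATADIR/{dir}/{file} layout is inferred from the log paths shown above.

```python
from pathlib import Path

# Base URL as printed by the import cell above.
GITLAB_EXAMPLE = "https://gitlab.uliege.be/HECE/wolf_examples/-/raw/main"


def example_url(dir_name: str, file_name: str, base: str = GITLAB_EXAMPLE) -> str:
    """Hypothetical helper: join base, directory and file with single slashes."""
    return "/".join([base.rstrip("/"), dir_name.strip("/"), file_name])


def cache_path(cache_dir: Path, dir_name: str, file_name: str) -> Path:
    """Hypothetical helper: mirror the remote layout under the cache directory."""
    return Path(cache_dir) / dir_name / file_name


url = example_url("Array_Theux_Pepinster", "mnt.bin")
local = cache_path(Path("downloads"), "Array_Theux_Pepinster", "mnt.bin")
print(url)
print(local.as_posix())
```

In real use the cache directory would be DATADIR rather than the relative "downloads" folder used here.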

Downloading vectors and shapefiles

download_file detects the file type from the extension and fetches all companion files. For shapefiles, that includes .dbf, .shx, .prj, .cpg, etc. The toys_dataset calls below trigger the same behaviour, as the log output shows.

[3]:
# Download a WOLF vector file (.vec + .vec.extra)
vec_path = toys_dataset('Extract_part_array', 'extraction.vec')
print(f"Vector file:  {vec_path}")
print(f"Extra exists? {Path(str(vec_path) + '.extra').exists()}")

# Download a shapefile (all components)
shp_path = toys_dataset('PICC', 'PICC_Vesdre.shp')
parent = shp_path.parent
companions = list(parent.glob('PICC_Vesdre.*'))
print(f"\nShapefile components: {[p.suffix for p in companions]}")
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\Extract_part_array\extraction.vec already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\Extract_part_array\extraction.vec.extra already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\PICC\PICC_Vesdre.shp already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\PICC\PICC_Vesdre.dbf already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\PICC\PICC_Vesdre.shx already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\PICC\PICC_Vesdre.prj already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\PICC\PICC_Vesdre.cpg already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\PICC\PICC_Vesdre.sbn already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\PICC\PICC_Vesdre.sbx already exists. Skipping download.
Vector file:  C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\Extract_part_array\extraction.vec
Extra exists? True

Shapefile components: ['.cpg', '.dbf', '.prj', '.sbn', '.sbx', '.shp', '.shx']
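
After a download it can be useful to confirm that the mandatory shapefile parts (.shp, .shx, .dbf) are all present. The sketch below uses only pathlib; missing_shapefile_parts is a hypothetical helper, and the demo runs on a throwaway directory rather than the real cache.

```python
import tempfile
from pathlib import Path

# The three mandatory parts of a shapefile; the other sidecars are optional.
REQUIRED_SUFFIXES = {".shp", ".shx", ".dbf"}


def missing_shapefile_parts(shp_path: Path) -> set[str]:
    """Return the mandatory suffixes that are absent next to shp_path."""
    shp_path = Path(shp_path)
    present = {p.suffix.lower() for p in shp_path.parent.glob(shp_path.stem + ".*")}
    return REQUIRED_SUFFIXES - present


# Demo: create an incomplete shapefile (no .shx) in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for suffix in (".shp", ".dbf", ".prj"):
        (base / f"demo{suffix}").touch()
    missing = missing_shapefile_parts(base / "demo.shp")

print(sorted(missing))  # ['.shx']
```

With the shp_path returned by toys_dataset above, an empty set would indicate that all mandatory parts were fetched.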

Available example directories

The wolf_examples repository contains several directories:

| Directory | Contents |
| --- | --- |
| Array_Theux_Pepinster | mnt.bin, mnt.tif (DEM Theux/Pepinster) |
| Extract_part_array | extraction.vec, Array_vector.proj |
| SurfaceVolume | elevation.tif, waterdepth.tif, selection.vec |
| PICC | PICC_Vesdre.shp (cadastral polygons) |
| Profiles | profiles.txt, support.vec (cross-section data) |
| Triangulation_cross_sections | support_cs.vecz |
| GPU_for_dummies_data | bathy_topo.tif, manning.tif (GPU setup) |
| LAZ | .laz LiDAR point clouds |
| DXF | example.dxf |
| Communes_Belgique | Belgian municipalities (shapefile) |
| Analysis_PICC_poly | Flood return period arrays |

DownloadFiles — companion file handling

The DownloadFiles enum maps each file type to its set of companion extensions. This is why requesting a .shp also downloads .dbf, .shx, .prj, etc.

[4]:
# Show companion extensions for each file type
for df in DownloadFiles:
    print(f"{df.name:15s} -> {df.value}")
WOLFARRAYS      -> ('bin', 'bin.txt')
TIFARRAYS       -> ('tif',)
TIFFARRAYS      -> ('tiff',)
SHPFILES        -> ('shp', 'dbf', 'shx', 'prj', 'cpg', 'sbn', 'sbx')
GPKGFILES       -> ('gpkg',)
VECFILES        -> ('vec', 'vec.extra')
VECZFILES       -> ('vecz', 'vecz.extra')
PROJECTFILES    -> ('proj',)
NUMPYFILES      -> ('npy',)
NPZFILES        -> ('npz',)
JSONFILES       -> ('json',)
TXTFILES        -> ('txt',)
CSVFILES        -> ('csv',)
DXFFILES        -> ('dxf',)
ZIPFILES        -> ('zip',)
LAZFILES        -> ('laz',)
LISTFILES       -> ('lst',)
LAZBIN          -> ('bin',)
VRT             -> ('vrt',)
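
To make the mapping concrete, here is a minimal sketch of how such an enum could be turned into a list of files to fetch. It is not the wolfhece implementation: Companions is a hypothetical subset of the values printed above, and the rule "stem + '.' + extension, with the first tuple entry as the primary extension" is an assumption that happens to match the file names in the logs.

```python
from enum import Enum
from pathlib import Path


class Companions(Enum):
    """Hypothetical subset of the mapping printed above."""
    WOLFARRAYS = ("bin", "bin.txt")
    SHPFILES = ("shp", "dbf", "shx", "prj", "cpg", "sbn", "sbx")
    VECFILES = ("vec", "vec.extra")
    TIFARRAYS = ("tif",)


def companion_names(file_name: str) -> list[str]:
    """List every file to fetch for file_name, or just file_name if unknown."""
    path = Path(file_name)
    ext = path.suffix.lstrip(".").lower()
    for group in Companions:
        # Assumption: the first entry of each tuple is the primary extension.
        if group.value[0] == ext:
            return [f"{path.stem}.{e}" for e in group.value]
    return [file_name]


print(companion_names("mnt.bin"))         # ['mnt.bin', 'mnt.bin.txt']
print(companion_names("extraction.vec"))  # ['extraction.vec', 'extraction.vec.extra']
```

Each returned name would then be appended to the directory URL and downloaded in turn.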

Downloading a GPU simulation

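toys_gpu_dataset(dir) downloads a complete GPU simulation: input maps (.npy), parameters.json, and all result time steps (.npz files in simul_gpu_results/). It returns a ResultsStore. In the run below, a file that is missing on the server (bridge_deck.npy, 404) is logged as an error and the ResultsStore is still returned.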

[5]:
# Download a complete GPU simulation (this may take a moment)
store = toys_gpu_dataset('channel_w_archbridge_fully_man004')
print(f"Type: {type(store).__name__}")
print(f"Number of saved results: {store.nb_results}")
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\NAP.npy already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\bathymetry.npy already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\bridge_roof.npy already exists. Skipping download.
ERROR:root:HTTP error occurred while downloading https://gitlab.uliege.be/HECE/wolfgpu_examples/-/raw/main/channel_w_archbridge_fully_man004/bridge_deck.npy: 404 Client Error: Not Found for url: https://gitlab.uliege.be/HECE/wolfgpu_examples/-/raw/main/channel_w_archbridge_fully_man004/bridge_deck.npy
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\h.npy already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\manning.npy already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\qx.npy already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\qy.npy already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\parameters.json already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\metadata.json already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\nap.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\nb_results.txt already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\sim_times.csv already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\h_0000001.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qx_0000001.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qy_0000001.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\h_0000002.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qx_0000002.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qy_0000002.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\h_0000003.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qx_0000003.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qy_0000003.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\h_0000004.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qx_0000004.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qy_0000004.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\h_0000005.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qx_0000005.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qy_0000005.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\h_0000006.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qx_0000006.npz already exists. Skipping download.
INFO:root:File C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\channel_w_archbridge_fully_man004\simul_gpu_results\qy_0000006.npz already exists. Skipping download.
Type: ResultsStore
Number of saved results: 6
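
The number of saved results can be cross-checked against the files on disk. The sketch below counts the h_<index>.npz files in the results folder; count_saved_steps is a hypothetical helper, the naming pattern is taken from the log above, and the demo runs on a temporary directory that imitates that layout.

```python
import tempfile
from pathlib import Path


def count_saved_steps(sim_dir: Path, results_subdir: str = "simul_gpu_results") -> int:
    """Count the h_<index>.npz snapshots in the results folder."""
    results = Path(sim_dir) / results_subdir
    return sum(1 for _ in results.glob("h_[0-9]*.npz"))


# Demo on a throwaway directory with three time steps plus an extra file.
with tempfile.TemporaryDirectory() as tmp:
    res = Path(tmp) / "simul_gpu_results"
    res.mkdir()
    for i in range(1, 4):
        for var in ("h", "qx", "qy"):
            (res / f"{var}_{i:07d}.npz").touch()
    (res / "nap.npz").touch()
    n_steps = count_saved_steps(Path(tmp))

print(n_steps)  # 3
```

For the simulation downloaded above, the same count on the cached directory would be expected to match store.nb_results.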

Generic download_file

download_file(url, destination, ...) is the low-level downloader. It supports HTTP/HTTPS/FTP and optional GitLab tokens for private repositories.

download_file(
    url='https://example.com/data/myfile.tif',
    destination='local_data/myfile.tif',
    load_from_cache=True,        # skip if exists
    token='glpat-xxxx',          # GitLab private token (optional)
)
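
The caching behaviour controlled by load_from_cache can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the wolfhece code: cached_download is a hypothetical function, it ignores tokens and companion files, and the demo uses a file:// URL so that it runs without network access.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def cached_download(url: str, destination: Path, load_from_cache: bool = True) -> Path:
    """Fetch url into destination unless a cached copy may be reused."""
    destination = Path(destination)
    if load_from_cache and destination.exists():
        return destination  # cache hit: nothing is fetched
    destination.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        shutil.copyfileobj(response, out)
    return destination


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_text("v1")
    target = Path(tmp) / "cache" / "copy.txt"

    cached_download(source.as_uri(), target)   # first call: fetches "v1"
    source.write_text("v2")
    cached_download(source.as_uri(), target)   # cached copy is reused
    cached = target.read_text()

    cached_download(source.as_uri(), target, load_from_cache=False)  # forced refresh
    refreshed = target.read_text()

print(cached, refreshed)  # v1 v2
```

The second call leaves the stale "v1" copy in place, which is the trade-off of caching: pass load_from_cache=False whenever the remote file may have changed.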

Force re-download

Pass load_from_cache=False to bypass the local cache and re-download.

[6]:
# Force re-download
fpath2 = toys_dataset('Array_Theux_Pepinster', 'mnt.bin', load_from_cache=False)
print(f"Re-downloaded: {fpath2}")
Re-downloaded: C:\Users\pierre\Documents\Gitlab\HECEPython\wolfhece\data\downloads\Array_Theux_Pepinster\mnt.bin

Summary

| Function | Signature | Returns |
| --- | --- | --- |
| toys_dataset | (dir, file, load_from_cache=True) | Path |
| toys_gpu_dataset | (dir, dirweb=None, load_from_cache=True) | ResultsStore |
| toys_injector_gpu | (file, load_from_cache=True) | Path |
| toys_laz_grid | (dir, file, load_from_cache=True) | Path (directory) |
| toys_lidaxe | (dir, file, load_from_cache=True) | Path |
| download_file | (url, destination=None, load_from_cache=True, token=None) | Path |
| download_gpu_simulation | (url, destination, load_from_cache=True) | ResultsStore |
