{ "cells": [ { "cell_type": "markdown", "id": "904070f7", "metadata": {}, "source": [ "# toys_dataset — Downloading example data\n", "\n", "The `pydownloader` module provides helpers to fetch example files from the\n", "WOLFHECE GitLab repositories. Files are cached locally, so subsequent calls\n", "skip the download.\n", "\n", "| Function | Source repository | Purpose |\n", "|----------|-------------------|---------|\n", "| `toys_dataset(dir, file)` | `wolf_examples` | WolfArrays, vectors, shapefiles, … |\n", "| `toys_gpu_dataset(dir)` | `wolfgpu_examples` | Complete GPU simulations |\n", "| `toys_injector_gpu(file)` | `wolfgpu_injectors` | GPU injector files |\n", "| `toys_laz_grid(dir, file)` | `wolf_examples` | LiDAR grid tiles (list-based) |\n", "| `toys_lidaxe(dir, file)` | `lidaxe_data` | Lidaxe flow accumulation data |\n", "| `download_file(url, …)` | any URL | Generic HTTP/HTTPS/FTP downloader |" ] }, { "cell_type": "markdown", "id": "27442ace", "metadata": {}, "source": [ "## Imports and cache directory" ] }, { "cell_type": "code", "execution_count": 1, "id": "7b5ead37", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Cache directory: C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\n", "GitLab base URL: https://gitlab.uliege.be/HECE/wolf_examples/-/raw/main\n" ] } ], "source": [ "from wolfhece.pydownloader import (\n", "    toys_dataset, toys_gpu_dataset, toys_injector_gpu,\n", "    download_file, download_gpu_simulation,\n", "    DATADIR, GITLAB_EXAMPLE, DownloadFiles,\n", ")\n", "from pathlib import Path\n", "\n", "print(f\"Cache directory: {DATADIR}\")\n", "print(f\"GitLab base URL: {GITLAB_EXAMPLE}\")" ] }, { "cell_type": "markdown", "id": "48820613", "metadata": {}, "source": [ "## Downloading a WolfArray\n", "\n", "`toys_dataset(dir, file)` builds the URL from `GITLAB_EXAMPLE/{dir}/{file}`,\n", "downloads the file (plus companion files such as `.bin.txt`), and returns the local\n", "`Path`. 
If the file already exists locally, the download is skipped." ] }, { "cell_type": "code", "execution_count": 2, "id": "9a48c3d8", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\Array_Theux_Pepinster\\mnt.bin already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\Array_Theux_Pepinster\\mnt.bin.txt already exists. Skipping download.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Downloaded to: C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\Array_Theux_Pepinster\\mnt.bin\n", "Exists? True\n", "Header exists? True\n", "Shape: 480 x 1160, dx=5.0, dy=5.0\n", "Bounds: x=[251000, 253400], y=[135500, 141300]\n" ] } ], "source": [ "# Download a DEM (Digital Elevation Model)\n", "fpath = toys_dataset('Array_Theux_Pepinster', 'mnt.bin')\n", "print(f\"Downloaded to: {fpath}\")\n", "print(f\"Exists? {fpath.exists()}\")\n", "\n", "# The companion header file is also downloaded\n", "header_path = Path(str(fpath) + '.txt')\n", "print(f\"Header exists? {header_path.exists()}\")\n", "\n", "# Load it as a WolfArray\n", "from wolfhece.wolf_array import WolfArray\n", "wa = WolfArray(fname=str(fpath))\n", "print(f\"Shape: {wa.nbx} x {wa.nby}, dx={wa.dx}, dy={wa.dy}\")\n", "print(f\"Bounds: x=[{wa.origx:.0f}, {wa.origx + wa.nbx * wa.dx:.0f}], \"\n", " f\"y=[{wa.origy:.0f}, {wa.origy + wa.nby * wa.dy:.0f}]\")" ] }, { "cell_type": "markdown", "id": "ecc9a8d9", "metadata": {}, "source": [ "## Downloading vectors and shapefiles\n", "\n", "`download_file` auto-detects the file type from the extension and downloads\n", "**all companion files**. For shapefiles, that includes `.dbf`, `.shx`,\n", "`.prj`, `.cpg`, etc." 
] }, { "cell_type": "code", "execution_count": 3, "id": "0ceb62ea", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\Extract_part_array\\extraction.vec already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\Extract_part_array\\extraction.vec.extra already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\PICC\\PICC_Vesdre.shp already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\PICC\\PICC_Vesdre.dbf already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\PICC\\PICC_Vesdre.shx already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\PICC\\PICC_Vesdre.prj already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\PICC\\PICC_Vesdre.cpg already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\PICC\\PICC_Vesdre.sbn already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\PICC\\PICC_Vesdre.sbx already exists. Skipping download.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Vector file: C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\Extract_part_array\\extraction.vec\n", "Extra exists? 
True\n", "\n", "Shapefile components: ['.cpg', '.dbf', '.prj', '.sbn', '.sbx', '.shp', '.shx']\n" ] } ], "source": [ "# Download a WOLF vector file (.vec + .vec.extra)\n", "vec_path = toys_dataset('Extract_part_array', 'extraction.vec')\n", "print(f\"Vector file: {vec_path}\")\n", "print(f\"Extra exists? {Path(str(vec_path) + '.extra').exists()}\")\n", "\n", "# Download a shapefile (all components)\n", "shp_path = toys_dataset('PICC', 'PICC_Vesdre.shp')\n", "parent = shp_path.parent\n", "# Sort so the listing order does not depend on the filesystem\n", "companions = sorted(parent.glob('PICC_Vesdre.*'))\n", "print(f\"\\nShapefile components: {[p.suffix for p in companions]}\")" ] }, { "cell_type": "markdown", "id": "655d9a83", "metadata": {}, "source": [ "## Available example directories\n", "\n", "The `wolf_examples` repository contains several directories:\n", "\n", "| Directory | Contents |\n", "|-----------|---------|\n", "| `Array_Theux_Pepinster` | `mnt.bin`, `mnt.tif` — DEM Theux/Pepinster |\n", "| `Extract_part_array` | `extraction.vec`, `Array_vector.proj` |\n", "| `SurfaceVolume` | `elevation.tif`, `waterdepth.tif`, `selection.vec` |\n", "| `PICC` | `PICC_Vesdre.shp` — cadastral polygons |\n", "| `Profiles` | `profiles.txt`, `support.vec` — cross-section data |\n", "| `Triangulation_cross_sections` | `support_cs.vecz` |\n", "| `GPU_for_dummies_data` | `bathy_topo.tif`, `manning.tif` — GPU setup |\n", "| `LAZ` | `.laz` LiDAR point clouds |\n", "| `DXF` | `example.dxf` |\n", "| `Communes_Belgique` | Belgian municipalities (shapefile) |\n", "| `Analysis_PICC_poly` | Flood return period arrays |" ] }, { "cell_type": "markdown", "id": "4a0d8090", "metadata": {}, "source": [ "## DownloadFiles — companion file handling\n", "\n", "The `DownloadFiles` enum maps each file type to its set of companion\n", "extensions. This is why requesting a `.shp` also downloads `.dbf`,\n", "`.shx`, `.prj`, etc." 
] }, { "cell_type": "code", "execution_count": 4, "id": "de92e693", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WOLFARRAYS -> ('bin', 'bin.txt')\n", "TIFARRAYS -> ('tif',)\n", "TIFFARRAYS -> ('tiff',)\n", "SHPFILES -> ('shp', 'dbf', 'shx', 'prj', 'cpg', 'sbn', 'sbx')\n", "GPKGFILES -> ('gpkg',)\n", "VECFILES -> ('vec', 'vec.extra')\n", "VECZFILES -> ('vecz', 'vecz.extra')\n", "PROJECTFILES -> ('proj',)\n", "NUMPYFILES -> ('npy',)\n", "NPZFILES -> ('npz',)\n", "JSONFILES -> ('json',)\n", "TXTFILES -> ('txt',)\n", "CSVFILES -> ('csv',)\n", "DXFFILES -> ('dxf',)\n", "ZIPFILES -> ('zip',)\n", "LAZFILES -> ('laz',)\n", "LISTFILES -> ('lst',)\n", "LAZBIN -> ('bin',)\n", "VRT -> ('vrt',)\n" ] } ], "source": [ "# Show companion extensions for each file type\n", "for df in DownloadFiles:\n", " print(f\"{df.name:15s} -> {df.value}\")" ] }, { "cell_type": "markdown", "id": "0c6f497e", "metadata": {}, "source": [ "## Downloading a GPU simulation\n", "\n", "`toys_gpu_dataset(dir)` downloads a complete GPU simulation:\n", "input maps (`.npy`), `parameters.json`, and all result time steps\n", "(`.npz` files in `simul_gpu_results/`). It returns a `ResultsStore`." ] }, { "cell_type": "code", "execution_count": 5, "id": "dcce1d61", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\NAP.npy already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\bathymetry.npy already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\bridge_roof.npy already exists. 
Skipping download.\n", "ERROR:root:HTTP error occurred while downloading https://gitlab.uliege.be/HECE/wolfgpu_examples/-/raw/main/channel_w_archbridge_fully_man004/bridge_deck.npy: 404 Client Error: Not Found for url: https://gitlab.uliege.be/HECE/wolfgpu_examples/-/raw/main/channel_w_archbridge_fully_man004/bridge_deck.npy\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\h.npy already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\manning.npy already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\qx.npy already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\qy.npy already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\parameters.json already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\metadata.json already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\nap.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\nb_results.txt already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\sim_times.csv already exists. 
Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\h_0000001.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qx_0000001.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qy_0000001.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\h_0000002.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qx_0000002.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qy_0000002.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\h_0000003.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qx_0000003.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qy_0000003.npz already exists. 
Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\h_0000004.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qx_0000004.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qy_0000004.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\h_0000005.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qx_0000005.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qy_0000005.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\h_0000006.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qx_0000006.npz already exists. Skipping download.\n", "INFO:root:File C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\channel_w_archbridge_fully_man004\\simul_gpu_results\\qy_0000006.npz already exists. 
Skipping download.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Type: ResultsStore\n", "Number of saved results: 6\n" ] } ], "source": [ "# Download a complete GPU simulation (this may take a moment)\n", "store = toys_gpu_dataset('channel_w_archbridge_fully_man004')\n", "print(f\"Type: {type(store).__name__}\")\n", "print(f\"Number of saved results: {store.nb_results}\")" ] }, { "cell_type": "markdown", "id": "fc5095af", "metadata": {}, "source": [ "## Generic download_file\n", "\n", "`download_file(url, destination, ...)` is the low-level downloader.\n", "It supports HTTP/HTTPS/FTP and optional GitLab tokens for private\n", "repositories.\n", "\n", "```python\n", "download_file(\n", " url='https://example.com/data/myfile.tif',\n", " destination='local_data/myfile.tif',\n", " load_from_cache=True, # skip if exists\n", " token='glpat-xxxx', # GitLab private token (optional)\n", ")\n", "```" ] }, { "cell_type": "markdown", "id": "a6a82083", "metadata": {}, "source": [ "## Force re-download\n", "\n", "Pass `load_from_cache=False` to bypass the local cache and re-download." 
] }, { "cell_type": "code", "execution_count": 6, "id": "0d555201", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Re-downloaded: C:\\Users\\pierre\\Documents\\Gitlab\\HECEPython\\wolfhece\\data\\downloads\\Array_Theux_Pepinster\\mnt.bin\n" ] } ], "source": [ "# Force re-download\n", "fpath2 = toys_dataset('Array_Theux_Pepinster', 'mnt.bin', load_from_cache=False)\n", "print(f\"Re-downloaded: {fpath2}\")" ] }, { "cell_type": "markdown", "id": "6f3d4117", "metadata": {}, "source": [ "## Summary\n", "\n", "| Function | Signature | Returns |\n", "|----------|-----------|--------|\n", "| `toys_dataset` | `(dir, file, load_from_cache=True)` | `Path` |\n", "| `toys_gpu_dataset` | `(dir, dirweb=None, load_from_cache=True)` | `ResultsStore` |\n", "| `toys_injector_gpu` | `(file, load_from_cache=True)` | `Path` |\n", "| `toys_laz_grid` | `(dir, file, load_from_cache=True)` | `Path` (directory) |\n", "| `toys_lidaxe` | `(dir, file, load_from_cache=True)` | `Path` |\n", "| `download_file` | `(url, destination=None, load_from_cache=True, token=None)` | `Path` |\n", "| `download_gpu_simulation` | `(url, destination, load_from_cache=True)` | `ResultsStore` |\n", "\n", "See also:\n", "- [2D simulation results](wolfresults_2d.ipynb) — loading and analyzing results\n", "- [WolfArray tutorial](wolfarray.ipynb) — array operations" ] } ], "metadata": { "kernelspec": { "display_name": "python311", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 5 }