Native Crash — Re-entrant Import Under the Python Import Lock
This document describes a reproducible native crash at wolf.py startup, its root cause, and a diagnostic procedure that can be reused for any similar crash.
—
Symptom
wolf.py terminates abruptly (non-zero exit code, no visible Python exception)
during the call to pd.DataFrame() in kiwis.py
(hydrometry.get_requests()).
No Python exception is raised (so try / except blocks catch nothing).
faulthandler produces no traceback (the crash occurs inside a Windows DLL before the SIGSEGV signal reaches Python).
The Windows Event Viewer reports an ACCESS_VIOLATION 0xc0000005 in arrow.dll.
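When wolf.py is started from a launcher script or a test harness, the exit code alone can already hint at a native crash. The sketch below assumes the code was obtained from something like `subprocess.run(...).returncode`. The helper name `describe_exit_code` is hypothetical, and the table lists only a few well-known NTSTATUS values.

```python
# Hypothetical helper: map a Windows process exit code to an NTSTATUS name.
# Only a handful of well-known codes are listed; extend as needed.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_code(returncode: int) -> str:
    """Return the exit code as unsigned 32-bit hex plus a name if known."""
    # Some tools report the code as a negative signed integer;
    # masking normalises both representations.
    code = returncode & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"
```

An exit code of 3221225477 (or -1073741819 in its signed form) decodes to 0xC0000005, the same code the Event Viewer reports.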
—
Root Cause
pandas 3.x enables infer_string=True by default: every string column in a
pd.DataFrame uses the new default string dtype, which is backed by PyArrow (Arrow C++ backend) when pyarrow is installed.
On the first call to pd.DataFrame(), pandas lazy-imports
pyarrow.pandas_compat (it is needed for the first time at that point).
If that call occurs while the wolfhece import tree has not yet finished building — typically along the chain:
wolf.py
→ wolfhece.apps.splashscreen
→ wolfhece (__init__ in progress)
→ kiwis_wolfgui
→ kiwis.__init__ → get_requests() → pd.DataFrame()
— Python attempts a re-entrant import of pyarrow.pandas_compat while
the import lock is already held by the parent module.
Arrow’s C extensions (arrow.dll, arrow_python.dll, …) are then loaded
in a partially initialised state → ACCESS_VIOLATION crash.
Note on pyarrow DLL isolation (delvewheel)
pyarrow ships its own renamed MSVC runtimes (msvcp140-<hash>.dll) in a
pyarrow.libs/ subdirectory and registers them via os.add_dll_directory()
inside pyarrow/__init__.py.
If a directory on PATH contains a different version of msvcp140.dll
(e.g. wolf_libs/), that is not the cause of this crash:
os.add_dll_directory() takes priority over PATH, and System32
takes priority over PATH for vcruntime140.dll.
The crash originates from the import-lock re-entrance, not from a DLL version
conflict.
—
Fix Applied
Pre-import pyarrow and pyarrow.pandas_compat at the top of
wolfhece/__init__.py, before any wolfhece sub-module is imported:
# wolfhece/__init__.py
from . import _add_path # add wolf_libs, wolfhece/libs to PATH
from .libs import * # load MKL / Fortran DLLs
# Pre-import Arrow BEFORE any sub-module that may call pd.DataFrame()
import pyarrow # pyarrow/__init__.py → os.add_dll_directory(pyarrow.libs)
import pyarrow.pandas_compat # prevents lazy-loading under the import lock
With this in place, pyarrow.pandas_compat is already in sys.modules
when get_requests() calls pd.DataFrame(): pandas triggers no further
import at that point.
—
Preventing Re-entrant Import Crashes in General
There is no Python built-in mechanism to “force the end of the import tree” at an arbitrary point. Python’s import system uses per-module locks (since 3.3) and deliberately returns a partially-initialised module object when a re-entrant import is detected — this avoids a deadlock, but leaves C extensions in an undefined state when their DLL-load constructors reference not-yet-initialised globals.
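The partially-initialised module behaviour can be observed with pure Python modules. The sketch below writes two throwaway modules (the names `demo_outer` and `demo_inner` are hypothetical) that import each other, and reports what the inner module sees. It only illustrates the import semantics; it does not reproduce the native crash, which requires a C extension.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def demo_partial_module():
    """Import two circular throwaway modules and report what the inner
    module saw of the outer one while the outer was still mid-import."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_outer.py").write_text(textwrap.dedent("""
            import demo_inner      # runs before VALUE is defined
            VALUE = 42
        """))
        Path(tmp, "demo_inner.py").write_text(textwrap.dedent("""
            import sys
            import demo_outer      # re-entrant: demo_outer is mid-import
            SAW_VALUE = hasattr(demo_outer, "VALUE")
            OUTER_IN_SYS_MODULES = "demo_outer" in sys.modules
        """))
        sys.path.insert(0, tmp)
        try:
            outer = importlib.import_module("demo_outer")
            inner = sys.modules["demo_inner"]
            return inner.SAW_VALUE, inner.OUTER_IN_SYS_MODULES, outer.VALUE
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("demo_outer", None)
            sys.modules.pop("demo_inner", None)
```

The inner module receives a `demo_outer` object that is already registered in `sys.modules` but does not yet have its `VALUE` attribute. The attribute only exists once the outer import has finished.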
The three available strategies, from least to most invasive:
Strategy 1 — Pre-import at package top level (applied here)
Import every “heavy” dependency (C extensions, DLL-backed modules) at the very
top of the package __init__.py, before any sub-module that might trigger
them lazily is loaded.
Pros: minimal code change, zero impact on the API.
Cons: the package __init__ must be kept in sync whenever a new lazy
dependency is introduced deeper in the tree.
# wolfhece/__init__.py — keep these at the top, before any sub-module import
import pyarrow
import pyarrow.pandas_compat
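One way to notice when the pre-import list has fallen out of sync is a small check that runs in a test after the package has been imported. This is a sketch only: the helper name `missing_preimports` is hypothetical, and the list of required modules has to be maintained by hand.

```python
import sys


def missing_preimports(required, loaded=None):
    """Return the names from `required` that are not yet imported.

    `loaded` defaults to sys.modules; it is a parameter only so that the
    function can be exercised without importing anything heavy.
    """
    loaded = sys.modules if loaded is None else loaded
    return [name for name in required if name not in loaded]
```

A test could then run `import wolfhece` and assert that `missing_preimports(["pyarrow", "pyarrow.pandas_compat"])` returns an empty list.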
Strategy 2 — Lazy object initialisation (deferred __init__ work)
Move any code that calls C extensions, makes network requests, or creates
DataFrames out of __init__ methods and into an explicit setup method
or a @property that runs on first access.
Pros: the only truly robust solution; the object only does work when the
caller is ready.
Cons: requires refactoring the class API; callers must either call
setup() explicitly or accept a lazy-init pattern.
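# Instead of calling get_requests() in __init__:
import pandas as pd

class kiwis:
    def __init__(self, *args, **kwargs):  # real parameters elided
        self._requests = None  # nothing loaded yet

    @property
    def requests(self):
        # The first access triggers the fetch; later accesses reuse the result
        if self._requests is None:
            self._requests = self._fetch_requests()
        return self._requests

    def _fetch_requests(self):
        ...  # request and decode json_data (elided)
        return pd.DataFrame(json_data[0]['Requests'])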
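The same lazy pattern can also be written with `functools.cached_property` from the standard library, which stores the result on the instance after the first access. This is an alternative to the hand-written property, not the approach used in wolfhece. The class name `LazyRequests` and the `fetch` callable are hypothetical stand-ins for the real network call.

```python
from functools import cached_property


class LazyRequests:
    """Defers an expensive fetch until the attribute is first read."""

    def __init__(self, fetch):
        # Store the callable only; nothing heavy runs at construction time.
        self._fetch = fetch

    @cached_property
    def requests(self):
        # Runs once; the returned value is cached on the instance.
        return self._fetch()
```

Constructing the object during import is then cheap, and the heavy work happens only when `obj.requests` is first read.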
Strategy 3 — Isolate the event loop (wx.CallAfter)
If the heavy initialisation cannot be moved out of __init__, schedule it
to run after the wx event loop has started and all imports are complete:
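# In the wx frame or application class, at the end of __init__:
wx.CallAfter(self._deferred_init)

# Method of the same class, called once the event loop is processing events:
def _deferred_init(self):
    self.hydrometry = hydrometry(dir=self.dir)  # safe: all imports done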
Pros: guarantees that the entire Python import tree has settled before any
heavy work starts.
Cons: the object is unusable until the event loop runs; requires careful
handling of code that accesses the object before _deferred_init fires.
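Early access can be made explicit with a guarded property that fails with a clear message instead of an obscure AttributeError. The sketch below uses a stand-in class without wx so that it runs anywhere. `DemoFrame` and the placeholder object are hypothetical; in the real frame, `_deferred_init` would be scheduled with wx.CallAfter.

```python
class DemoFrame:
    """Stand-in for the wx frame; deliberately has no wx dependency."""

    def __init__(self):
        self._hydrometry = None  # filled in later by _deferred_init

    @property
    def hydrometry(self):
        if self._hydrometry is None:
            raise RuntimeError(
                "hydrometry is not ready: _deferred_init has not run yet"
            )
        return self._hydrometry

    def _deferred_init(self):
        # Placeholder for hydrometry(dir=self.dir) in the real application.
        self._hydrometry = object()
```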
Summary table
| Strategy | Code change | Robustness | When to use |
|---|---|---|---|
| Pre-import at package top | Minimal | Medium | Known dependency, one-off fix |
| Lazy __init__ | Medium | High | Any class with heavy __init__ |
| wx.CallAfter | Medium | High | GUI objects only, event loop available |
—
Detection Method
When facing a native crash at startup, proceed in this order:
1. Check the Windows Event Viewer
Open PowerShell and run:
Get-WinEvent -LogName Application -MaxEvents 50 |
Where-Object { $_.Id -eq 1000 } |
Select-Object TimeCreated, Message |
Format-List
Look for recent Application Error entries.
The AppPath field shows which Python crashed (system or venv) — do not
confuse it with the Python process used by Pylance / VS Code.
2. Enable faulthandler
import faulthandler, tempfile, os
_log = open(os.path.join(tempfile.gettempdir(), 'wolf_crash.log'), 'w')
faulthandler.enable(file=_log)
Warning
faulthandler does not capture STATUS_HEAP_CORRUPTION or certain
ACCESS_VIOLATIONs that occur inside native DLLs before Python is notified.
An empty log does not rule out a native crash.
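Because an empty log is inconclusive, it can help to run the suspect import in a child interpreter and look at both its exit code and its stderr. The sketch below uses the standard `-X faulthandler` interpreter option. The function name `run_import_isolated` is hypothetical, and the module name is interpolated directly into the command, so it should only be called with trusted names.

```python
import subprocess
import sys


def run_import_isolated(module_name: str, timeout: float = 120.0):
    """Import `module_name` in a fresh interpreter with faulthandler on.

    Returns (returncode, stderr). A native crash shows up as a non-zero
    return code even when stderr contains no Python traceback.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr
```

On Windows, a return code of 3221225477 (0xC0000005) with empty stderr points to an access violation inside a native DLL rather than to a Python-level error.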
3. Trace imports at the crash site
Temporarily insert into the suspected module (here kiwis.py):
import sys

# Just before pd.DataFrame():
print("=== sys.modules pyarrow before DataFrame ===")
for k in sorted(sys.modules):
    if 'arrow' in k or 'pandas' in k:
        print('  ', k)
print("============================================")
If pyarrow.pandas_compat is not in sys.modules at that point, the
lazy import will fire here — proof that the pre-import is missing.
4. Locate the position in the import tree
Add an import hook to detect re-entrances:
import builtins, traceback

_real_import = builtins.__import__
_in_import = set()

def _tracing_import(name, *args, **kwargs):
    reentrant = name in _in_import
    if reentrant:
        print(f"[REENTRANT] {name}")
        traceback.print_stack(limit=8)
    _in_import.add(name)
    try:
        return _real_import(name, *args, **kwargs)
    finally:
        # Only the outermost import of a name clears it
        if not reentrant:
            _in_import.discard(name)

builtins.__import__ = _tracing_import
Place this block before the wolfhece import.
Any [REENTRANT] line accompanied by a stack trace pointing to a C extension
(arrow, numpy, scipy, …) is the likely cause of the crash.
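For a diagnostic session it may be convenient to install the hook only temporarily and to collect the re-entrant names instead of just printing them. The context manager below is a sketch of that idea. The name `trace_reentrant_imports` is hypothetical, it is not thread-safe, and relative imports (which arrive with short or empty names) can produce false positives.

```python
import builtins
import contextlib
import traceback


@contextlib.contextmanager
def trace_reentrant_imports(limit=8):
    """Temporarily wrap builtins.__import__ and yield a list that fills
    with every module name imported while it was already being imported."""
    real_import = builtins.__import__
    in_progress = []   # stack of names currently being imported
    reentrant = []     # names seen twice on the stack

    def tracing_import(name, *args, **kwargs):
        if name in in_progress:
            reentrant.append(name)
            traceback.print_stack(limit=limit)
        in_progress.append(name)
        try:
            return real_import(name, *args, **kwargs)
        finally:
            in_progress.pop()

    builtins.__import__ = tracing_import
    try:
        yield reentrant
    finally:
        # Always restore the original hook, even if the import crashes
        # with a Python exception.
        builtins.__import__ = real_import
```

Typical use would be `with trace_reentrant_imports() as seen: import wolfhece`, followed by inspecting `seen`. Note that a native crash inside the `with` block still terminates the process, so the printed stack traces remain the primary evidence.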
5. Rule out VCRUNTIME / DLL version conflicts
Check the MSVC DLL versions on the search path:
# Skip empty PATH entries, which would make Join-Path fail
$dirs = @(($env:PATH -split ';') | Where-Object { $_ }) + 'C:\Windows\System32'
foreach ($d in $dirs) {
    $f = Join-Path $d 'vcruntime140.dll'
    if (Test-Path $f) {
        $v = (Get-Item $f).VersionInfo.FileVersion
        Write-Host "$v  $f"
    }
}
A DLL version conflict typically causes a crash at C extension load time, before any Python-level import code of that extension runs, and does not depend on a re-entrant import.
—
References
wolfhece.__init__: active pyarrow pre-import (fix applied here)
wolfhece/hydrometry/kiwis.py: get_requests() calls pd.DataFrame()
wolfhece/hydrometry/kiwis_wolfgui.py: instantiates kiwis in __init__