Hi, I hit an intermittent abort inside pthread_mutex_lock() during a research-oriented multi-threaded stress test I run as part of my concurrency testing work. The stress test links against HTTrack's own implementation of hts_mutexlock() (from src/htsthread.c). I was not able to get a clean, deterministic black-box reproducer in the full webhttrack app, so I kept the report focused on a minimal reproducer and a source-level analysis pointing to hts_mutexlock() in src/htsthread.c.
In short: hts_mutexlock() does lazy initialization when *mutex == HTSMUTEX_INIT (NULL), but that initialization is not synchronized. If two threads try to lock the same htsmutex for the first time at the same time, the init path can race, leading to undefined behavior (on my system: an abort inside glibc).
Environment
- OS: Linux (glibc/pthreads)
- Compiler: gcc
- Repro: minimal PoC (re-implements HTTrack's hts_mutexlock/hts_mutexinit logic)
Observed behavior
Under concurrency, I can trigger an abort like:
pthread_mutex_lock.c: Assertion `mutex->__data.__owner == 0' failed.
This looks like the underlying pthread_mutex_t state is getting corrupted or the program is not consistently locking the same mutex object across threads.
Minimal reproduction (PoC)
To avoid relying on a full project build, I wrote a tiny PoC that re-implements the core logic used by HTTrack:
- if (*mutex == NULL) hts_mutexinit(mutex);
- then immediately pthread_mutex_lock(&(*mutex)->handle);
The PoC starts many threads which all attempt to lock the same htsmutex for the first time (starting from NULL), to maximize the race window.
PoC code (inline)
/*
 * Minimal reproducer for an unsynchronized lazy-init mutex wrapper.
 *
 * Build:
 *   gcc -O2 -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc
 */
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

typedef struct htsmutex_s { pthread_mutex_t handle; } htsmutex_s, *htsmutex;

#define HTSMUTEX_INIT NULL

static void hts_mutexinit(htsmutex *m) {
  htsmutex_s *p = (htsmutex_s *) malloc(sizeof(*p));
  if (!p) abort();
  pthread_mutex_init(&p->handle, 0);
  *m = p;
}

static void hts_mutexlock(htsmutex *m) {
  if (*m == HTSMUTEX_INIT) hts_mutexinit(m); /* <-- unsynchronized lazy init */
  pthread_mutex_lock(&(*m)->handle);
}

static void hts_mutexrelease(htsmutex *m) {
  pthread_mutex_unlock(&(*m)->handle);
}

static htsmutex g_lock = HTSMUTEX_INIT;
static pthread_barrier_t g_bar;

static void *worker(void *arg) {
  (void) arg;
  pthread_barrier_wait(&g_bar); /* release all threads at once to widen the race window */
  for (int i = 0; i < 50000; i++) {
    hts_mutexlock(&g_lock);
    if ((i & 0xff) == 0) sched_yield();
    hts_mutexrelease(&g_lock);
  }
  return NULL;
}

int main(void) {
  const int n = 64;
  pthread_t th[n];
  pthread_barrier_init(&g_bar, NULL, (unsigned) n);
  for (int i = 0; i < n; i++) pthread_create(&th[i], NULL, worker, NULL);
  for (int i = 0; i < n; i++) pthread_join(th[i], NULL);
  return 0;
}
Build
gcc -O2 -pthread poc_mutex_lazy_init_race.c -o /tmp/httrack_mutex_race_poc
Run
/tmp/httrack_mutex_race_poc
On my Linux machine, this is fairly reliable at triggering the pthread_mutex_lock assertion abort shown above.
Suspected root cause (source location)
hts_mutexlock() on Linux does:
HTSEXT_API void hts_mutexlock(htsmutex * mutex) {
  assertf(mutex != NULL);
  if (*mutex == HTSMUTEX_INIT) {  /* must be initialized */
    hts_mutexinit(mutex);
  }
  assertf(*mutex != NULL);
#ifndef _WIN32
  pthread_mutex_lock(&(*mutex)->handle);
#endif
}
The key issue is:
- when *mutex == NULL, it allocates/initializes a new htsmutex_s
- but this path has no synchronization (no global lock / pthread_once / atomic CAS) to ensure the initialization is performed exactly once
So if two threads call hts_mutexlock(&m) concurrently when m == NULL, it can go like:
- T1 sees m == NULL and starts init
- T2 also sees m == NULL and starts init
- both allocate different htsmutex_s objects and race to store to m
This makes mutual exclusion unreliable. In my case it manifests as a crash/assertion abort; in general it may also lead to two threads "locking" different underlying mutexes and entering a critical section concurrently.
Why this seems plausible in normal usage
The API itself explicitly supports *mutex == NULL by performing initialization inside hts_mutexlock(), so "locking without explicit init" seems to be part of the intended semantics.
webhttrack also uses this pattern (static mutex set to HTSMUTEX_INIT and then first-use init via hts_mutexlock()):
src/htsweb.c: static htsmutex pingMutex = HTSMUTEX_INIT;
- background thread client_ping() locks pingMutex
- main thread locks it as well after each accept()
- note: in the current client_ping() implementation it mostly sleeps while the parent process is alive; it starts calling getPingId() (and thus locking pingMutex) only after it detects the parent is gone, so the overlap is more likely later in runtime (after parent death) rather than during early startup
So this can happen in normal operation (probability depends on scheduling/load). More generally, any htsmutex that relies on first-use initialization is affected if its first lock can occur concurrently.
Expected behavior
hts_mutexlock() should be safe under concurrent first use:
- the lazy-init path should be thread-safe (initialize exactly once)
- should not crash or break mutual exclusion
Questions
- Is hts_mutexlock()'s lazy-init intended to be safe under multi-threading? If yes, it looks like the init path needs synchronization.
- Is there a recommended way to initialize htsmutex objects (e.g. require explicit init in a single-threaded phase)?