diff --git a/docs/cli/index.rst b/docs/cli/index.rst
index fd0b970c9..d051c9dd5 100644
--- a/docs/cli/index.rst
+++ b/docs/cli/index.rst
@@ -45,6 +45,7 @@ to learn more about the configuration.
    :hidden:
 
    configuration
+   parallel-processing
 
 Extending
 =========
diff --git a/docs/cli/parallel-processing.rst b/docs/cli/parallel-processing.rst
new file mode 100644
index 000000000..3c9f8d3a2
--- /dev/null
+++ b/docs/cli/parallel-processing.rst
@@ -0,0 +1,92 @@
+.. include:: /include.rst.txt
+
+===================
+Parallel Processing
+===================
+
+The guides tool can render your documentation using multiple CPU cores,
+significantly reducing build times for large documentation projects.
+
+.. note::
+
+    Parallel processing requires the ``pcntl`` PHP extension, which is only
+    available on Linux and macOS. On Windows, the tool falls back to
+    sequential processing automatically.
+
+Automatic Detection
+===================
+
+By default, the guides tool automatically:
+
+1. **Detects** available CPU cores on your system
+2. **Enables** parallel processing when beneficial
+3. **Falls back** to sequential when parallel isn't available
+
+No configuration is required; parallel processing works out of the box.
+
+When Parallel Processing Is Used
+================================
+
+The tool uses parallel processing when:
+
+- The ``pcntl`` PHP extension is available
+- Your documentation has 10 or more files
+- Multiple CPU cores are detected
+
+For small documentation sets (< 10 files), sequential processing is used
+as the forking overhead isn't worth it.
+
+Requirements
+============
+
+- **PHP Extension**: ``pcntl`` (included in most Linux/macOS PHP builds)
+- **Operating System**: Linux or macOS (Windows not supported)
+- **PHP Version**: 8.1 or higher
+
+To check if pcntl is available:
+
+.. code-block:: bash
+
+    php -m | grep pcntl
+
+Performance Benefits
+====================
+
+The performance gain depends on:
+
+- **Number of CPU cores**: More cores = more parallel workers
+- **Documentation size**: Larger projects benefit more
+- **I/O speed**: SSD storage helps maximize throughput
+
+Typical speedups:
+
+- 4-core system: ~2-3x faster
+- 8-core system: ~4-6x faster
+- 16-core system: ~6-10x faster
+
+Troubleshooting
+===============
+
+If parallel processing isn't working:
+
+1. **Check pcntl extension**:
+
+   .. code-block:: bash
+
+       php -m | grep pcntl
+
+   If not listed, install it or enable it in your php.ini.
+
+2. **Check file count**: With fewer than 10 files, sequential is used.
+
+3. **Check logs**: Enable verbose output to see processing mode:
+
+   .. code-block:: bash
+
+       ./vendor/bin/guides docs -v
+
+For Developers
+==============
+
+For implementation details and integration into custom applications,
+see :doc:`/developers/parallel-processing`.
diff --git a/docs/developers/index.rst b/docs/developers/index.rst
index c51818060..c756ec7de 100644
--- a/docs/developers/index.rst
+++ b/docs/developers/index.rst
@@ -14,3 +14,4 @@ it in some other way that is not possible with the ``guides`` command line tool.
     extensions/index
     compiler
     directive
+    parallel-processing
diff --git a/docs/developers/parallel-processing.rst b/docs/developers/parallel-processing.rst
new file mode 100644
index 000000000..dec5c203c
--- /dev/null
+++ b/docs/developers/parallel-processing.rst
@@ -0,0 +1,298 @@
+===================
+Parallel Processing
+===================
+
+The guides library provides infrastructure for parallel processing using PHP's
+``pcntl_fork()``. This enables applications to utilize multiple CPU cores for
+parsing, compiling, and rendering documentation.
+
+.. note::
+
+    Parallel processing requires the ``pcntl`` PHP extension, which is only
+    available on Linux and macOS. Windows users should use sequential processing.
+
+    For user-facing documentation, see :doc:`/cli/parallel-processing`.
+
+Architecture Overview
+=====================
+
+Parallel processing support consists of several layers:
+
+**Core Utilities** (``Build\Parallel``):
+
+- ``CpuDetector``: Cross-platform CPU core detection
+- ``ProcessManager``: Forked process management with timeout
+- ``ParallelSettings``: Configuration for worker counts
+
+**Compiler** (``Compiler\Parallel``):
+
+- ``ParallelCompiler``: Phase-based parallel compilation
+- ``DocumentCompilationResult``: Serializable compilation results
+- ``CompilationCacheInterface``: Cache state for parallel compilation
+
+**Renderer** (``Renderer\Parallel``):
+
+- ``ForkingRenderer``: Parallel Twig rendering with COW memory
+- ``DocumentNavigationProvider``: Pre-computed prev/next navigation
+- ``StaticDocumentIterator``: Thread-safe document iteration
+- ``DirtyDocumentProvider``: Interface for incremental rendering
+
+Core Utilities
+==============
+
+CPU Detection
+-------------
+
+The ``CpuDetector`` class provides cross-platform detection of available CPU
+cores:
+
+.. code-block:: php
+
+    use phpDocumentor\Guides\Build\Parallel\CpuDetector;
+
+    // Detect cores with default settings (max 8, default 4)
+    $workerCount = CpuDetector::detectCores();
+
+    // Customize limits
+    $workerCount = CpuDetector::detectCores(
+        maxWorkers: 16, // Allow up to 16 workers
+        defaultWorkers: 2, // Use 2 if detection fails
+    );
+
+Detection methods (in order):
+
+1. **Linux**: Reads ``/proc/cpuinfo``
+2. **Linux/GNU**: Executes ``nproc``
+3. **macOS/BSD**: Executes ``sysctl -n hw.ncpu``
+4. **Fallback**: Returns the configured default
+
+Process Management
+------------------
+
+The ``ProcessManager`` class provides utilities for managing forked processes:
+
+.. code-block:: php
+
+    use phpDocumentor\Guides\Build\Parallel\ProcessManager;
+
+    // Fork workers
+    $childPids = [];
+    for ($i = 0; $i < $workerCount; $i++) {
+        $pid = pcntl_fork();
+        if ($pid === 0) {
+            ProcessManager::clearTempFileTracking();
+            processDocuments($i);
+            exit(0);
+        }
+        $childPids[$i] = $pid;
+    }
+
+    // Wait with timeout
+    $result = ProcessManager::waitForChildrenWithTimeout(
+        $childPids,
+        timeoutSeconds: 300,
+    );
+
+Key features:
+
+- Non-blocking wait with configurable timeout
+- Automatic SIGKILL for stuck processes
+- Secure temp file creation (0600 permissions)
+- Signal handlers for cleanup on SIGTERM/SIGINT
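+
+The temp-file helpers tie these features together. A minimal sketch of the
+lifecycle, assuming a hypothetical ``renderBatch()`` worker function and with
+error handling omitted:
+
+.. code-block:: php
+
+    use phpDocumentor\Guides\Build\Parallel\ProcessManager;
+
+    // Parent: 0600-permission temp file, registered for cleanup on shutdown
+    $tempFile = ProcessManager::createSecureTempFile('render_');
+
+    $pid = pcntl_fork();
+    if ($pid === 0) {
+        // Child: drop inherited tracking so exiting here does not
+        // delete temp files the parent still needs
+        ProcessManager::clearTempFileTracking();
+        file_put_contents($tempFile, serialize(renderBatch()));
+        exit(0);
+    }
+
+    ProcessManager::waitForChildrenWithTimeout([$pid]);
+    $result = unserialize((string) file_get_contents($tempFile));
+
+    // Parent: delete the file and unregister it from shutdown cleanup
+    ProcessManager::cleanupTempFile($tempFile);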
+
+Parallel Settings
+-----------------
+
+Configure parallel processing behavior:
+
+.. code-block:: php
+
+    use phpDocumentor\Guides\Build\Parallel\ParallelSettings;
+
+    $settings = new ParallelSettings();
+    $settings->setWorkerCount(8); // Explicit count
+    $settings->setWorkerCount(0); // Auto-detect
+    $settings->setWorkerCount(-1); // Disable (sequential)
+
+    // Get effective count
+    $workers = $settings->resolveWorkerCount(
+        CpuDetector::detectCores()
+    );
+
+Parallel Compiler
+=================
+
+The ``ParallelCompiler`` splits compilation into phases based on shared state
+dependencies:
+
+**Phase 1 - Collection (parallel)**: priority ≥ 4900
+
+ - DocumentEntryRegistrationTransformer, CollectLinkTargetsTransformer
+ - Write to ProjectNode, don't read cross-document data
+ - Results serialized via ``DocumentCompilationResult``
+
+**Phase 2 - Merge (sequential)**: O(n)
+
+ - Merge all DocumentCompilationResults into ProjectNode
+ - Reconstruct toctree relationships from path-based data
+
+**Phase 3 - Resolution (parallel)**: priority 1000-4500
+
+ - Menu resolvers, citation resolvers
+ - Read from complete ProjectNode, write to documents
+
+**Phase 4 - Finalization (sequential)**: priority < 1000
+
+ - AutomaticMenuPass, GlobalMenuPass, ToctreeValidationPass
+ - Cross-document mutations
+
+Usage:
+
+.. code-block:: php
+
+    use phpDocumentor\Guides\Compiler\Parallel\ParallelCompiler;
+
+    $compiler = new ParallelCompiler(
+        $sequentialCompiler, // Fallback compiler
+        $compilerPasses,
+        $nodeTransformerFactory,
+        $compilationCache, // Optional, for incremental
+        $logger, // Optional
+        $workerCount, // Optional, null = auto-detect
+    );
+
+    $documents = $compiler->run($documents, $compilerContext);
+
+Document Compilation Result
+---------------------------
+
+The ``DocumentCompilationResult`` class captures all data written to
+ProjectNode during compilation, surviving serialization across process
+boundaries:
+
+.. code-block:: php
+
+    // In child process:
+    $result = DocumentCompilationResult::extractFromProjectNode($projectNode);
+
+    // Contains:
+    // - $documentEntries: All DocumentEntryNode objects
+    // - $internalLinkTargets: Link target mappings
+    // - $citationTargets: Citation references
+    // - $toctreeRelationships: Path-based parent/child relations
+
+    // In parent process:
+    $result->mergeIntoProjectNode($parentProjectNode);
+
+Parallel Renderer
+=================
+
+The ``ForkingRenderer`` implements ``TypeRenderer`` and parallelizes Twig
+rendering:
+
+.. code-block:: php
+
+    use phpDocumentor\Guides\Renderer\Parallel\ForkingRenderer;
+
+    $renderer = new ForkingRenderer(
+        $commandBus,
+        $navigationProvider,
+        $dirtyDocumentProvider, // Optional, for incremental
+        $parallelSettings, // Optional
+        $logger, // Optional
+    );
+
+    $renderer->render($renderCommand);
+
+Key design decisions:
+
+1. **Fork after parsing**: AST is in memory, inherited via copy-on-write
+2. **No write conflicts**: Each child renders to different output files
+3. **Graceful fallback**: Sequential when pcntl unavailable or < 10 docs
+   (sketched below)
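+
+A condensed sketch of the checks behind decision 3 (illustrative only; the
+real predicate is internal to the renderer, but the thresholds match the
+documented behavior):
+
+.. code-block:: php
+
+    // Illustrative: when forking is worthwhile
+    function shouldUseForking(int $documentCount, int $workerCount): bool
+    {
+        return function_exists('pcntl_fork') // pcntl extension loaded
+            && $documentCount >= 10          // enough docs to repay fork cost
+            && $workerCount >= 2;            // more than one worker available
+    }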
+
+Document Navigation Provider
+----------------------------
+
+When forking, each child renders a subset of documents, but prev/next
+navigation needs knowledge of the full document order:
+
+.. code-block:: php
+
+    use phpDocumentor\Guides\Renderer\Parallel\DocumentNavigationProvider;
+
+    $provider = new DocumentNavigationProvider();
+
+    // Initialize BEFORE forking
+    $provider->initializeFromArray($allDocuments);
+
+    // After fork, in child process:
+    $previous = $provider->getPreviousDocument($currentPath);
+    $next = $provider->getNextDocument($currentPath);
+
+The provider's state is inherited via copy-on-write and provides O(1) lookups
+for any document in any child process.
+
+Incremental Rendering Integration
+---------------------------------
+
+The ``DirtyDocumentProvider`` interface enables integration with incremental
+build systems:
+
+.. code-block:: php
+
+    use phpDocumentor\Guides\Renderer\Parallel\DirtyDocumentProvider;
+
+    class MyIncrementalProvider implements DirtyDocumentProvider
+    {
+        public function isIncrementalEnabled(): bool
+        {
+            return true;
+        }
+
+        public function computeDirtySet(): array
+        {
+            // Return paths of documents that need re-rendering
+            return ['chapter1', 'chapter2'];
+        }
+    }
+
+When provided to ``ForkingRenderer``, only dirty documents are rendered.
+
+Best Practices
+==============
+
+1. **Limit worker count**: Use ``CpuDetector`` with appropriate ``maxWorkers``
+   to avoid overloading the system
+
+2. **Handle failures gracefully**: Always check ``$result['failures']`` and
+   provide appropriate error handling
+
+3. **Clear temp tracking in children**: Call ``clearTempFileTracking()`` in
+   child processes to prevent cleanup conflicts
+
+4. **Use appropriate timeouts**: Set timeouts based on expected workload to
+   detect stuck processes
+
+5. **Consider fallback**: Provide sequential fallback for systems without
+   ``pcntl`` extension
+
+6. **Initialize navigation before forking**: The ``DocumentNavigationProvider``
+   must be initialized with the full document order before ``pcntl_fork()``
+
+7.
**Serialize data, not objects**: Use path-based relationships rather than + object references across process boundaries + +Memory Considerations +===================== + +Parallel processing uses copy-on-write (COW) semantics: + +- **Before fork**: Parent has parsed AST in memory +- **After fork**: Children share memory until they write +- **During rendering**: Each child's writes trigger COW copies + +This means: + +- Read-only access to shared data is efficient +- Write-heavy operations should be minimized pre-fork +- The ``DocumentNavigationProvider`` should be populated before forking diff --git a/packages/guides/src/Build/Parallel/CpuDetector.php b/packages/guides/src/Build/Parallel/CpuDetector.php new file mode 100644 index 000000000..8c2f2edf7 --- /dev/null +++ b/packages/guides/src/Build/Parallel/CpuDetector.php @@ -0,0 +1,72 @@ + 0) { + return min($count, $maxWorkers); + } + } + } + + // Try nproc command (Linux/GNU) + $nproc = @shell_exec('nproc 2>/dev/null'); + if ($nproc !== null && $nproc !== false) { + $count = (int) trim($nproc); + if ($count > 0) { + return min($count, $maxWorkers); + } + } + + // Try sysctl on macOS/BSD + $sysctl = @shell_exec('sysctl -n hw.ncpu 2>/dev/null'); + if ($sysctl !== null && $sysctl !== false) { + $count = (int) trim($sysctl); + if ($count > 0) { + return min($count, $maxWorkers); + } + } + + return $defaultWorkers; + } +} diff --git a/packages/guides/src/Build/Parallel/ParallelSettings.php b/packages/guides/src/Build/Parallel/ParallelSettings.php new file mode 100644 index 000000000..0af269a4d --- /dev/null +++ b/packages/guides/src/Build/Parallel/ParallelSettings.php @@ -0,0 +1,95 @@ +workerCount = $count; + + // -1 means disabled + if ($count === -1) { + $this->enabled = false; + $this->workerCount = 1; + } else { + $this->enabled = true; + } + } + + public function getWorkerCount(): int + { + return $this->workerCount; + } + + public function isEnabled(): bool + { + return $this->enabled; + } + + /** + * Get effective worker count for forking. + * Returns 0 for auto-detection, or the explicit count. + */ + public function getEffectiveWorkerCount(): int + { + if (!$this->enabled) { + return 1; // Sequential + } + + return $this->workerCount; + } + + /** + * Resolve the actual number of workers to use. + * + * @param int $autoDetectedCount The auto-detected CPU count + * + * @return int The number of workers to use + */ + public function resolveWorkerCount(int $autoDetectedCount): int + { + if (!$this->enabled) { + return 1; + } + + if ($this->workerCount === 0) { + return $autoDetectedCount; // Auto-detect + } + + return min($this->workerCount, self::MAX_WORKERS); + } +} diff --git a/packages/guides/src/Build/Parallel/ProcessManager.php b/packages/guides/src/Build/Parallel/ProcessManager.php new file mode 100644 index 000000000..09bf381c2 --- /dev/null +++ b/packages/guides/src/Build/Parallel/ProcessManager.php @@ -0,0 +1,314 @@ + + */ + private static array $tempFilesToClean = []; + + /** + * Whether shutdown handler is registered. + * + * Static to ensure handlers are only registered once per process. + */ + private static bool $shutdownRegistered = false; + + /** + * Wait for all child processes with timeout. + * + * Uses non-blocking WNOHANG to poll process status, allowing timeout detection. + * Sends SIGKILL to stuck processes after timeout expires. 
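+ * Exited children are reaped as part of the poll, so no zombie processes
+ * are left behind even when individual workers overrun the timeout.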
+ * + * @param array $childPids Map of workerId => pid + * @param int $timeoutSeconds Maximum time to wait (default 300s) + * + * @return array{successes: list, failures: array} + */ + public static function waitForChildrenWithTimeout( + array $childPids, + int $timeoutSeconds = self::DEFAULT_TIMEOUT_SECONDS, + ): array { + $startTime = time(); + $remaining = $childPids; + $successes = []; + $failures = []; + + while ($remaining !== []) { + foreach ($remaining as $workerId => $pid) { + $status = 0; + $result = pcntl_waitpid($pid, $status, WNOHANG); + + if ($result === 0) { + // Still running + continue; + } + + if ($result === -1) { + // Error - child doesn't exist + $failures[$workerId] = 'waitpid failed'; + unset($remaining[$workerId]); + continue; + } + + // Child exited + unset($remaining[$workerId]); + + assert(is_int($status)); + + if (pcntl_wifexited($status)) { + $exitCode = pcntl_wexitstatus($status); + if ($exitCode === 0) { + $successes[] = $workerId; + } else { + $failures[$workerId] = sprintf('exit code %d', $exitCode); + } + } elseif (pcntl_wifsignaled($status)) { + $signal = pcntl_wtermsig($status); + $failures[$workerId] = sprintf('killed by signal %d', $signal); + } + } + + // Check timeout + if (time() - $startTime > $timeoutSeconds) { + // Kill remaining children + foreach ($remaining as $workerId => $pid) { + self::killProcess($pid); + pcntl_waitpid($pid, $status); // Reap zombie + $failures[$workerId] = sprintf('killed after %ds timeout', $timeoutSeconds); + } + + break; + } + + // Don't spin-wait if processes still running + if ($remaining === []) { + continue; + } + + usleep(self::POLL_INTERVAL_USEC); + } + + return ['successes' => $successes, 'failures' => $failures]; + } + + /** + * Create a secure temp file with restricted permissions. + * + * Creates temp file with 0600 permissions to prevent other users from reading. + * Registers file for cleanup on shutdown/signal. + * + * @param string $prefix Temp file prefix + * + * @return string|false Path to temp file, or false on failure + */ + public static function createSecureTempFile(string $prefix): string|false + { + self::ensureShutdownHandler(); + + $tempFile = tempnam(sys_get_temp_dir(), $prefix); + if ($tempFile === false) { + return false; + } + + // Set restrictive permissions (owner read/write only) + chmod($tempFile, 0o600); + + // Register for cleanup + self::$tempFilesToClean[] = $tempFile; + + return $tempFile; + } + + /** + * Remove a temp file from cleanup list (already cleaned). + */ + public static function unregisterTempFile(string $tempFile): void + { + $key = array_search($tempFile, self::$tempFilesToClean, true); + if ($key === false) { + return; + } + + unset(self::$tempFilesToClean[$key]); + self::$tempFilesToClean = array_values(self::$tempFilesToClean); + } + + /** + * Clean up a temp file and unregister it. + */ + public static function cleanupTempFile(string $tempFile): void + { + @unlink($tempFile); + self::unregisterTempFile($tempFile); + } + + /** + * Clear temp file tracking list. + * + * Call this in child processes after fork to prevent them from cleaning up + * temp files that belong to the parent process when they exit. + */ + public static function clearTempFileTracking(): void + { + self::$tempFilesToClean = []; + } + + /** + * Ensure shutdown and signal handlers are registered. 
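+ *
+ * Registration happens at most once per process, and it is skipped entirely
+ * under PHPUnit so tests do not install process-global handlers.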
+ */ + private static function ensureShutdownHandler(): void + { + if (self::$shutdownRegistered) { + return; + } + + // Skip handler registration in test environments + if (defined('PHPUNIT_COMPOSER_INSTALL') || defined('__PHPUNIT_PHAR__')) { + self::$shutdownRegistered = true; + + return; + } + + // Register shutdown function for normal termination + register_shutdown_function([self::class, 'cleanupAllTempFiles']); + + // Register signal handlers if pcntl available + if (function_exists('pcntl_signal')) { + pcntl_signal(SIGTERM, [self::class, 'handleSignal']); + pcntl_signal(SIGINT, [self::class, 'handleSignal']); + } + + self::$shutdownRegistered = true; + } + + /** + * Handle termination signals by cleaning up temp files and re-raising signal. + * + * Re-raises the signal after cleanup to ensure proper termination status + * is visible to parent processes and shell (WIFSIGNALED instead of WIFEXITED). + */ + public static function handleSignal(int $signal): void + { + self::cleanupAllTempFiles(); + + // Restore default handler and re-raise signal for proper termination + // This ensures the exit status correctly reflects the signal + if (function_exists('pcntl_signal')) { + pcntl_signal($signal, SIG_DFL); + } + + posix_kill(posix_getpid(), $signal); + } + + /** + * Kill a process by PID. + * + * Uses posix_kill if available, falls back to shell command otherwise. + * The PID is always an integer from pcntl_fork, so the shell fallback is safe. + */ + private static function killProcess(int $pid): void + { + if (function_exists('posix_kill')) { + posix_kill($pid, SIGKILL); + + return; + } + + // Fallback for systems without posix extension: use shell command + // Safe because $pid is always an integer from pcntl_fork + @exec(sprintf('kill -9 %d', $pid)); + } + + /** + * Clean up all registered temp files. + */ + public static function cleanupAllTempFiles(): void + { + foreach (self::$tempFilesToClean as $file) { + if (!file_exists($file)) { + continue; + } + + @unlink($file); + } + + self::$tempFilesToClean = []; + } +} diff --git a/packages/guides/src/Compiler/Parallel/CompilationCacheInterface.php b/packages/guides/src/Compiler/Parallel/CompilationCacheInterface.php new file mode 100644 index 000000000..a7ed049f2 --- /dev/null +++ b/packages/guides/src/Compiler/Parallel/CompilationCacheInterface.php @@ -0,0 +1,44 @@ + Serializable cache state + */ + public function extractState(): array; + + /** + * Merge cache state from a child process. + * + * @param array $state State extracted from child process + */ + public function mergeState(array $state): void; + + /** + * Get all document exports for logging/debugging. + * + * @return array + */ + public function getAllExports(): array; +} diff --git a/packages/guides/src/Compiler/Parallel/DocumentCompilationResult.php b/packages/guides/src/Compiler/Parallel/DocumentCompilationResult.php new file mode 100644 index 000000000..5185795be --- /dev/null +++ b/packages/guides/src/Compiler/Parallel/DocumentCompilationResult.php @@ -0,0 +1,192 @@ +> + */ + public array $internalLinkTargets = []; + + /** + * Citation targets collected from ProjectNode. + * + * @var array + */ + public array $citationTargets = []; + + /** + * Any warnings or errors collected during processing. + * + * @var list + */ + public array $messages = []; + + /** + * Toctree relationships as path-based data (serialization-safe). + * + * Stored as path strings instead of object references to survive serialization + * across process boundaries during parallel compilation. 
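+ * For example (illustrative paths), 'guide/index' might map to
+ * ['children' => [['type' => 'document', 'path' => 'guide/install']], 'parent' => 'index'].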
+ * + * Structure: [documentPath => ['children' => array, 'parent' => string|null]] + * Children can be: + * - ['type' => 'document', 'path' => string] for DocumentEntryNode + * - ['type' => 'external', 'url' => string, 'title' => string] for ExternalEntryNode + * + * @var array, parent: string|null}> + */ + public array $toctreeRelationships = []; + + /** + * Extract all relevant data from a ProjectNode after running collection transformers. + * + * This is called in the child process after transformers have run, to capture + * all the data that was added to the child's copy of ProjectNode. + */ + public static function extractFromProjectNode(ProjectNode $projectNode): self + { + $result = new self(); + + // Extract document entries (keyed by file path) + $result->documentEntries = $projectNode->getAllDocumentEntries(); + + // Extract internal link targets + $result->internalLinkTargets = $projectNode->getAllInternalTargets(); + + // Extract citation targets + $result->citationTargets = $projectNode->getAllCitationTargets(); + + // Extract toctree relationships as path-based data (serialization-safe) + $result->toctreeRelationships = self::extractToctreeRelationships($result->documentEntries); + + return $result; + } + + /** + * Extract toctree parent/child relationships as path-based data. + * + * Object references don't survive serialization across process boundaries, + * so we convert them to path strings that can be resolved later. + * + * @param DocumentEntryNode[] $documentEntries + * + * @return array, parent: string|null}> + */ + private static function extractToctreeRelationships(array $documentEntries): array + { + $relationships = []; + + foreach ($documentEntries as $entry) { + $path = $entry->getFile(); + + // Extract children as path-based references + $children = []; + foreach ($entry->getMenuEntries() as $child) { + if ($child instanceof DocumentEntryNode) { + $children[] = [ + 'type' => 'document', + 'path' => $child->getFile(), + ]; + } elseif ($child instanceof ExternalEntryNode) { + $children[] = [ + 'type' => 'external', + 'url' => $child->getValue(), + 'title' => $child->getTitle(), + ]; + } + } + + // Extract parent as path reference + $parent = $entry->getParent(); + $parentPath = $parent instanceof DocumentEntryNode ? $parent->getFile() : null; + + $relationships[$path] = [ + 'children' => $children, + 'parent' => $parentPath, + ]; + } + + return $relationships; + } + + /** + * Merge this result into a ProjectNode. + * + * Called by the parent process to merge child results into the real ProjectNode. + */ + public function mergeIntoProjectNode(ProjectNode $projectNode): void + { + // Merge document entries + foreach ($this->documentEntries as $entry) { + $projectNode->addDocumentEntry($entry); + } + + // Merge internal link targets + foreach ($this->internalLinkTargets as $targets) { + foreach ($targets as $anchor => $target) { + try { + // Cast to string as PHP converts numeric string keys to int + $projectNode->addLinkTarget((string) $anchor, $target); + } catch (DuplicateLinkAnchorException) { + // Ignore duplicates - first writer wins + } + } + } + + // Merge citation targets + foreach ($this->citationTargets as $target) { + $projectNode->addCitationTarget($target); + } + } + + /** + * Add a message (warning/error) collected during processing. 
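+ *
+ * Messages are plain arrays, so they travel back to the parent process
+ * with the serialized result.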
+ */ + public function addMessage(string $level, string $message): void + { + $this->messages[] = ['level' => $level, 'message' => $message]; + } +} diff --git a/packages/guides/src/Compiler/Parallel/ParallelCompiler.php b/packages/guides/src/Compiler/Parallel/ParallelCompiler.php new file mode 100644 index 000000000..63ba553a8 --- /dev/null +++ b/packages/guides/src/Compiler/Parallel/ParallelCompiler.php @@ -0,0 +1,835 @@ += 4900 + * - DocumentEntryRegistrationTransformer, CollectLinkTargetsTransformer, etc. + * - These WRITE to ProjectNode but don't READ cross-document data + * - Each child collects metadata to DocumentCompilationResult + * + * Phase 2 - Merge (sequential): fast O(n) + * - Merge all DocumentCompilationResults into ProjectNode + * + * Phase 3 - Resolution (parallel): priority 4500-1000 + * - Menu resolvers, citation resolvers, etc. + * - These READ from ProjectNode (now complete) and WRITE to documents + * + * Phase 4 - Finalization (sequential): priority < 1000 + * - AutomaticMenuPass, GlobalMenuPass, ToctreeValidationPass + * - These do cross-document mutations + */ +final class ParallelCompiler +{ + /** Minimum document count before parallelization is worthwhile */ + private const MIN_DOCS_FOR_PARALLEL = 10; + + /** Priority threshold for collection phase */ + private const COLLECTION_PRIORITY_MIN = 4900; + + /** Priority threshold for resolution phase */ + private const RESOLUTION_PRIORITY_MIN = 1000; + private const RESOLUTION_PRIORITY_MAX = 4500; + + /** @var SplPriorityQueue */ + private readonly SplPriorityQueue $collectionPasses; + + /** @var SplPriorityQueue */ + private readonly SplPriorityQueue $resolutionPasses; + + /** @var SplPriorityQueue */ + private readonly SplPriorityQueue $finalizationPasses; + + private readonly int $workerCount; + private bool $parallelEnabled = true; + + /** @param iterable $passes */ + public function __construct( + private readonly Compiler $sequentialCompiler, + iterable $passes, + NodeTransformerFactory $nodeTransformerFactory, + private readonly CompilationCacheInterface|null $compilationCache = null, + private readonly LoggerInterface|null $logger = null, + int|null $workerCount = null, + ) { + $this->collectionPasses = new SplPriorityQueue(); + $this->resolutionPasses = new SplPriorityQueue(); + $this->finalizationPasses = new SplPriorityQueue(); + $this->workerCount = $workerCount ?? $this->detectCpuCount(); + + // Convert to array to allow multiple iterations + $passesArray = $passes instanceof Traversable ? 
iterator_to_array($passes) : (array) $passes; + + // Categorize compiler passes for parallel execution + foreach ($passesArray as $pass) { + $this->categorizePass($pass); + } + + // Categorize transformer passes + $transformerPriorities = $nodeTransformerFactory->getPriorities(); + foreach ($transformerPriorities as $priority) { + $pass = new TransformerPass( + new DocumentNodeTraverser($nodeTransformerFactory, $priority), + $priority, + ); + $this->categorizePass($pass); + } + } + + private function categorizePass(CompilerPass $pass): void + { + $priority = $pass->getPriority(); + + if ($priority >= self::COLLECTION_PRIORITY_MIN) { + $this->collectionPasses->insert($pass, $priority); + } elseif ($priority >= self::RESOLUTION_PRIORITY_MIN && $priority <= self::RESOLUTION_PRIORITY_MAX) { + $this->resolutionPasses->insert($pass, $priority); + } else { + $this->finalizationPasses->insert($pass, $priority); + } + } + + /** + * @param DocumentNode[] $documents + * + * @return DocumentNode[] + */ + public function run(array $documents, CompilerContext $compilerContext): array + { + $documentCount = count($documents); + + if (!$this->shouldFork($documentCount)) { + $this->logger?->debug(sprintf( + 'Using sequential compilation: %d documents (parallel=%s, pcntl=%s)', + $documentCount, + $this->parallelEnabled ? 'enabled' : 'disabled', + function_exists('pcntl_fork') ? 'available' : 'unavailable', + )); + + return $this->runSequentially($documents, $compilerContext); + } + + $this->logger?->info(sprintf( + 'Starting parallel compilation: %d documents across %d workers', + $documentCount, + $this->workerCount, + )); + + // Phase 1: Parallel Collection + $this->logger?->debug('Phase 1: Parallel collection'); + [$documents, $results] = $this->runCollectionPhase($documents, $compilerContext); + + // Phase 2: Sequential Merge (including toctree relationships) + $this->logger?->debug('Phase 2: Sequential merge'); + $mergedRelationships = $this->runMergePhase($results, $compilerContext); + + // Phase 2.5: Fix document entry references + // After serialization, documents have different DocumentEntryNode instances than ProjectNode. + // We must fix this BEFORE resolution phase so transformers work with correct entries. + $this->logger?->debug('Phase 2.5: Fixing document entry references'); + $documents = $this->fixDocumentEntryReferences($documents, $compilerContext); + + // Phase 2.6: Resolve toctree relationships + // Convert path-based relationships back to object references on ProjectNode's entries + $this->logger?->debug('Phase 2.6: Resolving toctree relationships'); + $this->resolveDocumentRelationships($mergedRelationships, $compilerContext); + + // Phase 3: Parallel Resolution + $this->logger?->debug('Phase 3: Parallel resolution'); + $documents = $this->runResolutionPhase($documents, $compilerContext); + + // Phase 3.5: Fix document entry references again + // Resolution phase serializes documents, breaking references. Fix them again. + $this->logger?->debug('Phase 3.5: Fixing document entry references post-resolution'); + $documents = $this->fixDocumentEntryReferences($documents, $compilerContext); + + // Phase 4: Sequential Finalization + $this->logger?->debug('Phase 4: Sequential finalization'); + $documents = $this->runFinalizationPhase($documents, $compilerContext); + + $this->logger?->info('Parallel compilation complete'); + + return $documents; + } + + /** + * Run all passes sequentially using the original Compiler. 
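+ *
+ * This is the fallback path taken when forking is unavailable, disabled,
+ * or not worthwhile for the current document count.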
+ * + * This ensures exact compatibility with the standard compilation behavior. + * + * @param DocumentNode[] $documents + * + * @return DocumentNode[] + */ + private function runSequentially(array $documents, CompilerContext $compilerContext): array + { + return $this->sequentialCompiler->run($documents, $compilerContext); + } + + /** + * Phase 1: Run collection transformers in parallel. + * + * @param DocumentNode[] $documents + * + * @return array{0: DocumentNode[], 1: DocumentCompilationResult[]} + */ + private function runCollectionPhase(array $documents, CompilerContext $compilerContext): array + { + // Partition documents into batches + $batches = $this->partitionDocuments($documents, $this->workerCount); + + $tempFiles = []; + $childPids = []; + + foreach ($batches as $workerId => $batch) { + if ($batch === []) { + continue; + } + + $tempFile = ProcessManager::createSecureTempFile('compile_collect_' . $workerId . '_'); + if ($tempFile === false) { + $this->logger?->error('Failed to create temp file, falling back to sequential'); + + return [$this->runSequentially($documents, $compilerContext), []]; + } + + $tempFiles[$workerId] = $tempFile; + + $pid = pcntl_fork(); + + if ($pid === -1) { + $this->logger?->error('pcntl_fork failed, falling back to sequential'); + foreach ($tempFiles as $tf) { + ProcessManager::cleanupTempFile($tf); + } + + return [$this->runSequentially($documents, $compilerContext), []]; + } + + if ($pid === 0) { + // Child process: clear inherited temp file tracking + ProcessManager::clearTempFileTracking(); + $this->processCollectionBatch($batch, $compilerContext, $tempFile); + exit(0); + } + + $childPids[$workerId] = $pid; + } + + // Wait for children with timeout and collect results + $waitResult = ProcessManager::waitForChildrenWithTimeout($childPids); + $allDocuments = []; + $allResults = []; + + foreach ($childPids as $workerId => $pid) { + // Only read results from successful workers + if (in_array($workerId, $waitResult['successes'], true)) { + $serialized = file_get_contents($tempFiles[$workerId]); + if ($serialized !== false && $serialized !== '') { + $data = unserialize($serialized); + if (is_array($data) && isset($data['documents'], $data['result'])) { + /** @var array $batchDocuments */ + $batchDocuments = $data['documents']; + foreach ($batchDocuments as $doc) { + if (!($doc instanceof DocumentNode)) { + continue; + } + + $allDocuments[$doc->getFilePath()] = $doc; + } + + if ($data['result'] instanceof DocumentCompilationResult) { + $allResults[] = $data['result']; + } + } + } + } + + ProcessManager::cleanupTempFile($tempFiles[$workerId]); + } + + // Fail fast on worker failures to prevent incomplete ProjectNode + if ($waitResult['failures'] !== []) { + $errorDetails = []; + foreach ($waitResult['failures'] as $workerId => $reason) { + $errorDetails[] = sprintf('Worker %d: %s', $workerId, $reason); + $this->logger?->error(sprintf('Collection worker %d failed: %s', $workerId, $reason)); + } + + throw new RuntimeException( + 'Parallel collection failed: ' . implode(', ', $errorDetails), + ); + } + + // Preserve document order + $orderedDocuments = []; + foreach ($documents as $doc) { + if (!isset($allDocuments[$doc->getFilePath()])) { + continue; + } + + $orderedDocuments[] = $allDocuments[$doc->getFilePath()]; + } + + return [$orderedDocuments, $allResults]; + } + + /** + * Process a batch of documents in child process for collection phase. 
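+ *
+ * Runs only in the forked child; results reach the parent exclusively
+ * through the serialized payload written to $tempFile.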
+ * + * @param DocumentNode[] $batch + */ + private function processCollectionBatch( + array $batch, + CompilerContext $compilerContext, + string $tempFile, + ): void { + // Run collection passes on this batch + // These transformers will write to the child's copy of ProjectNode + $passes = clone $this->collectionPasses; + foreach ($passes as $pass) { + $batch = $pass->run($batch, $compilerContext); + } + + // Extract all data that was added to ProjectNode during collection + // This captures document entries, link targets, citations, etc. + $result = DocumentCompilationResult::extractFromProjectNode( + $compilerContext->getProjectNode(), + ); + + // Serialize documents and extracted result + $serialized = serialize([ + 'documents' => $batch, + 'result' => $result, + ]); + + if (file_put_contents($tempFile, $serialized) === false) { + fwrite(STDERR, "Failed to write collection results to temp file\n"); + exit(1); + } + } + + /** + * Phase 2: Merge all collected data into ProjectNode. + * + * This is O(n) where n = total entries across all results. + * Uses hash-based deduplication for O(1) duplicate checks. + * + * @param DocumentCompilationResult[] $results + * + * @return array, parent: string|null}> + */ + private function runMergePhase(array $results, CompilerContext $compilerContext): array + { + $projectNode = $compilerContext->getProjectNode(); + + // Merge toctree relationships from all batches + // Use separate tracking for seen children per path for O(1) deduplication + /** @var array, parent: string|null}> $allRelationships */ + $allRelationships = []; + /** @var array> $seenChildren path -> [childKey => true] */ + $seenChildren = []; + + foreach ($results as $result) { + $result->mergeIntoProjectNode($projectNode); + + // Merge toctree relationships + foreach ($result->toctreeRelationships as $path => $relations) { + if (!isset($allRelationships[$path])) { + $allRelationships[$path] = ['children' => [], 'parent' => null]; + $seenChildren[$path] = []; + } + + // Merge children using hash-based deduplication (O(1) per child) + foreach ($relations['children'] as $child) { + // Generate unique key for this child + $childKey = $this->getChildKey($child); + + // O(1) duplicate check using isset + if (isset($seenChildren[$path][$childKey])) { + continue; + } + + $seenChildren[$path][$childKey] = true; + $allRelationships[$path]['children'][] = $child; + } + + // Take non-null parent (should be consistent across batches) + if ($relations['parent'] === null) { + continue; + } + + $allRelationships[$path]['parent'] = $relations['parent']; + } + } + + $this->logger?->debug(sprintf( + 'Merged %d results: %d document entries, %d link target types, %d toctree relationships', + count($results), + count($projectNode->getAllDocumentEntries()), + count($projectNode->getAllInternalTargets()), + count($allRelationships), + )); + + return $allRelationships; + } + + /** + * Generate a unique key for a toctree child entry. + * + * @param array{type: string, path?: string, url?: string, title?: string} $child + */ + private function getChildKey(array $child): string + { + return $child['type'] . ':' . ($child['path'] ?? $child['url'] ?? ''); + } + + /** + * Resolve path-based toctree relationships to actual object references. + * + * During parallel compilation, relationships are stored as path strings to survive + * serialization. This method reconstructs the object graph by looking up paths + * in ProjectNode's document entries. 
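+ *
+ * Paths that have no matching document entry are skipped silently rather
+ * than failing the build.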
+ * + * @param array, parent: string|null}> $relationships + */ + private function resolveDocumentRelationships( + array $relationships, + CompilerContext $compilerContext, + ): void { + $projectNode = $compilerContext->getProjectNode(); + $allEntries = $projectNode->getAllDocumentEntries(); + + // Build path => DocumentEntryNode lookup for O(1) resolution + $entriesByPath = []; + foreach ($allEntries as $entry) { + $entriesByPath[$entry->getFile()] = $entry; + } + + $resolvedCount = 0; + $externalCount = 0; + + // Resolve relationships for each document entry + foreach ($relationships as $path => $relations) { + $entry = $entriesByPath[$path] ?? null; + if ($entry === null) { + continue; + } + + // Clear existing children (they have broken object refs from serialization) + $entry->setMenuEntries([]); + + // Resolve and add children + foreach ($relations['children'] as $childData) { + if ($childData['type'] === 'document') { + $childPath = $childData['path'] ?? ''; + $childEntry = $entriesByPath[$childPath] ?? null; + if ($childEntry !== null) { + $entry->addChild($childEntry); + $resolvedCount++; + } + } elseif ($childData['type'] === 'external') { + // Reconstruct ExternalEntryNode + $url = $childData['url'] ?? ''; + $title = $childData['title'] ?? ''; + $externalEntry = new ExternalEntryNode($url, $title); + $entry->addChild($externalEntry); + $externalCount++; + } + } + + // Resolve and set parent + if ($relations['parent'] === null) { + continue; + } + + $parentEntry = $entriesByPath[$relations['parent']] ?? null; + $entry->setParent($parentEntry); + } + + $this->logger?->debug(sprintf( + 'Resolved %d document relationships, %d external entries', + $resolvedCount, + $externalCount, + )); + } + + /** + * Phase 3: Run resolution transformers in parallel. + * + * @param DocumentNode[] $documents + * + * @return DocumentNode[] + */ + private function runResolutionPhase(array $documents, CompilerContext $compilerContext): array + { + if (count(clone $this->resolutionPasses) === 0) { + return $documents; + } + + $batches = $this->partitionDocuments($documents, $this->workerCount); + $tempFiles = []; + $childPids = []; + + foreach ($batches as $workerId => $batch) { + if ($batch === []) { + continue; + } + + $tempFile = ProcessManager::createSecureTempFile('compile_resolve_' . $workerId . 
'_'); + if ($tempFile === false) { + return $this->runResolutionSequentially($documents, $compilerContext); + } + + $tempFiles[$workerId] = $tempFile; + + $pid = pcntl_fork(); + + if ($pid === -1) { + foreach ($tempFiles as $tf) { + ProcessManager::cleanupTempFile($tf); + } + + return $this->runResolutionSequentially($documents, $compilerContext); + } + + if ($pid === 0) { + // Child process: clear inherited temp file tracking + ProcessManager::clearTempFileTracking(); + $this->processResolutionBatch($batch, $compilerContext, $tempFile); + exit(0); + } + + $childPids[$workerId] = $pid; + } + + // Wait for children with timeout and collect results + $waitResult = ProcessManager::waitForChildrenWithTimeout($childPids); + $allDocuments = []; + $cacheStates = []; + + foreach ($childPids as $workerId => $pid) { + // Only read results from successful workers + if (in_array($workerId, $waitResult['successes'], true)) { + $serialized = file_get_contents($tempFiles[$workerId]); + if ($serialized !== false && $serialized !== '') { + $data = unserialize($serialized); + if (is_array($data)) { + // New format with cache state + if (isset($data['documents']) && is_array($data['documents'])) { + foreach ($data['documents'] as $doc) { + if (!($doc instanceof DocumentNode)) { + continue; + } + + $allDocuments[$doc->getFilePath()] = $doc; + } + + // Collect cache state for merging + if (isset($data['cacheState']) && is_array($data['cacheState'])) { + /** @var array{exports?: array>, dependencies?: array, outputPaths?: array} $cacheState */ + $cacheState = $data['cacheState']; + $cacheStates[] = $cacheState; + } + } else { + // Legacy format (just documents array) + foreach ($data as $doc) { + if (!($doc instanceof DocumentNode)) { + continue; + } + + $allDocuments[$doc->getFilePath()] = $doc; + } + } + } + } + } + + ProcessManager::cleanupTempFile($tempFiles[$workerId]); + } + + // Fail fast on worker failures to prevent incomplete compilation + if ($waitResult['failures'] !== []) { + $errorDetails = []; + foreach ($waitResult['failures'] as $workerId => $reason) { + $errorDetails[] = sprintf('Worker %d: %s', $workerId, $reason); + $this->logger?->error(sprintf('Resolution worker %d failed: %s', $workerId, $reason)); + } + + throw new RuntimeException( + 'Parallel resolution failed: ' . implode(', ', $errorDetails), + ); + } + + // Merge cache states from all children + if ($this->compilationCache !== null && $cacheStates !== []) { + foreach ($cacheStates as $state) { + $this->compilationCache->mergeState($state); + } + + $this->logger?->debug(sprintf( + 'Merged cache states from %d workers, now have %d exports', + count($cacheStates), + count($this->compilationCache->getAllExports()), + )); + } + + // Preserve order + $orderedDocuments = []; + foreach ($documents as $doc) { + if (!isset($allDocuments[$doc->getFilePath()])) { + continue; + } + + $orderedDocuments[] = $allDocuments[$doc->getFilePath()]; + } + + return $orderedDocuments; + } + + /** @param DocumentNode[] $batch */ + private function processResolutionBatch( + array $batch, + CompilerContext $compilerContext, + string $tempFile, + ): void { + $passes = clone $this->resolutionPasses; + foreach ($passes as $pass) { + $batch = $pass->run($batch, $compilerContext); + } + + // Serialize documents and cache state (if cache is available) + $data = [ + 'documents' => $batch, + 'cacheState' => $this->compilationCache?->extractState() ?? 
[], + ]; + + if (file_put_contents($tempFile, serialize($data)) === false) { + fwrite(STDERR, "Failed to write resolution results to temp file\n"); + exit(1); + } + } + + /** + * @param DocumentNode[] $documents + * + * @return DocumentNode[] + */ + private function runResolutionSequentially(array $documents, CompilerContext $compilerContext): array + { + $passes = clone $this->resolutionPasses; + foreach ($passes as $pass) { + $documents = $pass->run($documents, $compilerContext); + } + + return $documents; + } + + /** + * Phase 4: Run finalization passes sequentially. + * + * @param DocumentNode[] $documents + * + * @return DocumentNode[] + */ + private function runFinalizationPhase(array $documents, CompilerContext $compilerContext): array + { + $passes = clone $this->finalizationPasses; + foreach ($passes as $pass) { + $documents = $pass->run($documents, $compilerContext); + } + + return $documents; + } + + /** + * Fix document entry references after parallel processing. + * + * After serialization/unserialization in child processes, DocumentNode objects + * have different DocumentEntryNode instances than those stored in ProjectNode. + * The renderer uses identity comparison (===) to match documents with entries, + * so we need to restore object identity by setting the ProjectNode's entries + * on each document. + * + * Additionally, during resolution phase, transformers may have added children + * to the unserialized entries. We must transfer those children to ProjectNode's + * entries before replacing the references. + * + * @param DocumentNode[] $documents + * + * @return DocumentNode[] + */ + private function fixDocumentEntryReferences(array $documents, CompilerContext $compilerContext): array + { + $projectNode = $compilerContext->getProjectNode(); + $projectEntries = $projectNode->getAllDocumentEntries(); + + // Build a lookup map by file path + $entriesByPath = []; + foreach ($projectEntries as $entry) { + $entriesByPath[$entry->getFile()] = $entry; + } + + // Pre-build existing children sets for O(1) duplicate detection + /** @var array> $existingChildrenByEntry path -> [childKey => true] */ + $existingChildrenByEntry = []; + foreach ($projectEntries as $entry) { + $entryPath = $entry->getFile(); + $existingChildrenByEntry[$entryPath] = []; + foreach ($entry->getMenuEntries() as $child) { + if ($child instanceof DocumentEntryNode) { + $existingChildrenByEntry[$entryPath]['doc:' . $child->getFile()] = true; + } elseif ($child instanceof ExternalEntryNode) { + $existingChildrenByEntry[$entryPath]['ext:' . $child->getValue()] = true; + } + } + } + + // Update each document to use the ProjectNode's entry instance + foreach ($documents as $document) { + $filePath = $document->getFilePath(); + if (!isset($entriesByPath[$filePath])) { + continue; + } + + $projectEntry = $entriesByPath[$filePath]; + $documentEntry = $document->getDocumentEntry(); + + // Transfer children from unserialized entry to ProjectNode's entry + // (children may have been added during resolution phase) + if ($documentEntry !== null && $documentEntry !== $projectEntry) { + foreach ($documentEntry->getMenuEntries() as $child) { + // Resolve child to ProjectNode's entry if it's a document + if ($child instanceof DocumentEntryNode) { + $childPath = $child->getFile(); + $childKey = 'doc:' . $childPath; + $resolvedChild = $entriesByPath[$childPath] ?? 
null; + + // O(1) duplicate check using isset + if ($resolvedChild !== null && !isset($existingChildrenByEntry[$filePath][$childKey])) { + $projectEntry->addChild($resolvedChild); + $resolvedChild->setParent($projectEntry); + $existingChildrenByEntry[$filePath][$childKey] = true; + } + } elseif ($child instanceof ExternalEntryNode) { + $childKey = 'ext:' . $child->getValue(); + + // O(1) duplicate check using isset + if (!isset($existingChildrenByEntry[$filePath][$childKey])) { + $projectEntry->addChild($child); + $existingChildrenByEntry[$filePath][$childKey] = true; + } + } + } + + // Transfer parent if set and not already set on ProjectNode's entry + $parent = $documentEntry->getParent(); + if ($parent !== null && $projectEntry->getParent() === null) { + $parentPath = $parent->getFile(); + $resolvedParent = $entriesByPath[$parentPath] ?? null; + if ($resolvedParent !== null) { + $projectEntry->setParent($resolvedParent); + } + } + } + + $document->setDocumentEntry($projectEntry); + } + + return $documents; + } + + /** + * @param DocumentNode[] $documents + * + * @return array + */ + private function partitionDocuments(array $documents, int $workerCount): array + { + $batchSize = (int) ceil(count($documents) / $workerCount); + + return array_chunk($documents, max(1, $batchSize)); + } + + private function shouldFork(int $documentCount): bool + { + if (!$this->parallelEnabled) { + return false; + } + + if (!function_exists('pcntl_fork')) { + return false; + } + + if ($documentCount < self::MIN_DOCS_FOR_PARALLEL) { + return false; + } + + return $this->workerCount >= 2; + } + + private function detectCpuCount(): int + { + return CpuDetector::detectCores(); + } + + public function setParallelEnabled(bool $enabled): void + { + $this->parallelEnabled = $enabled; + } + + public function isParallelEnabled(): bool + { + return $this->parallelEnabled; + } +} diff --git a/packages/guides/src/Nodes/ProjectNode.php b/packages/guides/src/Nodes/ProjectNode.php index 2afce43e2..e0c590e0c 100644 --- a/packages/guides/src/Nodes/ProjectNode.php +++ b/packages/guides/src/Nodes/ProjectNode.php @@ -148,6 +148,12 @@ public function getCitationTarget(string $name): CitationTarget|null return $this->citationTargets[$name] ?? null; } + /** @return array */ + public function getAllCitationTargets(): array + { + return $this->citationTargets; + } + /** @throws DuplicateLinkAnchorException */ public function addLinkTarget(string $anchorName, InternalTarget $target): void { diff --git a/packages/guides/src/Pipeline/SingleForkPipeline.php b/packages/guides/src/Pipeline/SingleForkPipeline.php new file mode 100644 index 000000000..7c033ceb6 --- /dev/null +++ b/packages/guides/src/Pipeline/SingleForkPipeline.php @@ -0,0 +1,357 @@ + + * + * @see https://regex101.com/r/9IjvEa/1 + */ + private const NAV_PLACEHOLDER_REGEX = '//'; + + private int $workerCount; + + public function __construct( + private readonly LoggerInterface|null $logger = null, + int|null $workerCount = null, + ) { + $this->workerCount = $workerCount ?? $this->detectCpuCount(); + } + + /** + * Execute the full pipeline with optional parallelization. 
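+ *
+ * Falls back to invoking $pipelineExecutor sequentially when forking is
+ * unavailable or the file count is too small to benefit.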
+ * + * @param callable(string[]): array{documents: DocumentNode[], projectNode: ProjectNode} $pipelineExecutor + * Function that executes parse→compile→render for a batch of files + * @param string[] $allFiles All files to process + * @param string $outputDir Output directory for rendered HTML + * + * @return array{documents: DocumentNode[], projectNode: ProjectNode} + */ + public function execute( + callable $pipelineExecutor, + array $allFiles, + string $outputDir, + ): array { + // Check if parallel is worthwhile + if (!$this->shouldFork(count($allFiles))) { + $this->logger?->debug('Using sequential pipeline'); + + return $pipelineExecutor($allFiles); + } + + $this->logger?->info(sprintf( + 'Starting single-fork pipeline: %d files across %d workers', + count($allFiles), + $this->workerCount, + )); + + // Partition files into batches + $batchSize = (int) ceil(count($allFiles) / $this->workerCount); + $batches = array_chunk($allFiles, max(1, $batchSize)); + + // Create temp files for results + $tempFiles = []; + $childPids = []; + + foreach ($batches as $workerId => $batch) { + if ($batch === []) { + continue; + } + + $tempFile = ProcessManager::createSecureTempFile('pipeline_' . $workerId . '_'); + if ($tempFile === false) { + $this->logger?->error('Failed to create temp file, falling back to sequential'); + + return $pipelineExecutor($allFiles); + } + + $tempFiles[$workerId] = $tempFile; + + $pid = pcntl_fork(); + + if ($pid === -1) { + $this->logger?->error('pcntl_fork failed, falling back to sequential'); + foreach ($tempFiles as $tf) { + ProcessManager::cleanupTempFile($tf); + } + + return $pipelineExecutor($allFiles); + } + + if ($pid === 0) { + // Child: clear inherited temp file tracking + ProcessManager::clearTempFileTracking(); + try { + $result = $pipelineExecutor($batch); + // Only serialize document paths (not full AST) to save memory + $paths = array_map( + static fn (DocumentNode $doc) => $doc->getFilePath(), + $result['documents'], + ); + + if (file_put_contents($tempFile, serialize(['paths' => $paths])) === false) { + fwrite(STDERR, '[Worker ' . $workerId . '] Failed to write results to temp file' . "\n"); + exit(1); + } + } catch (Throwable $e) { + fwrite(STDERR, sprintf( + "[Worker %d] Pipeline failed: %s\n", + $workerId, + $e->getMessage(), + )); + // Best effort to write error - if this fails too, exit with error code + if (file_put_contents($tempFile, serialize(['error' => $e->getMessage()])) === false) { + exit(1); + } + } + + exit(0); + } + + // Parent: record child PID + $childPids[$workerId] = $pid; + } + + // Wait for all children with timeout + $waitResult = ProcessManager::waitForChildrenWithTimeout($childPids); + $allPaths = []; + $failures = []; + + foreach ($childPids as $workerId => $pid) { + // Only read results from successful workers + if (in_array($workerId, $waitResult['successes'], true)) { + $serialized = file_get_contents($tempFiles[$workerId]); + if ($serialized !== false && $serialized !== '') { + $data = unserialize($serialized); + if (is_array($data) && isset($data['paths']) && is_array($data['paths'])) { + /** @var string[] $paths */ + $paths = $data['paths']; + $allPaths = array_merge($allPaths, $paths); + } + + if (is_array($data) && isset($data['error']) && is_string($data['error'])) { + $failures[$workerId] = $data['error']; + } + } + } else { + $reason = $waitResult['failures'][$workerId] ?? 
'unknown'; + $failures[$workerId] = $reason; + } + + ProcessManager::cleanupTempFile($tempFiles[$workerId]); + } + + // Fail fast on worker failures to prevent incomplete documentation + if ($failures !== []) { + $errorDetails = []; + foreach ($failures as $workerId => $reason) { + $errorDetails[] = sprintf('Worker %d: %s', $workerId, $reason); + $this->logger?->error(sprintf('Pipeline worker %d failed: %s', $workerId, $reason)); + } + + throw new RuntimeException( + 'Single-fork pipeline failed: ' . implode(', ', $errorDetails), + ); + } + + // Post-process: resolve navigation placeholders + $this->resolveNavigationPlaceholders($outputDir, $allPaths); + + $this->logger?->info(sprintf( + 'Single-fork pipeline complete: %d documents processed', + count($allPaths), + )); + + // Return empty result since documents were rendered by children + return ['documents' => [], 'projectNode' => new ProjectNode()]; + } + + /** + * Post-process HTML files to resolve navigation placeholders. + * + * Placeholders format: + * After all rendering is complete, we know the full document order and can resolve these. + * + * @param string[] $documentPaths + */ + private function resolveNavigationPlaceholders(string $outputDir, array $documentPaths): void + { + // Build path -> index map for quick lookup + $pathIndex = array_flip($documentPaths); + + // Scan all HTML files using RecursiveDirectoryIterator (portable, works everywhere) + $htmlFiles = $this->findHtmlFiles($outputDir); + + foreach ($htmlFiles as $htmlFile) { + $content = file_get_contents($htmlFile); + if ($content === false) { + continue; + } + + // Check if file has placeholders + if (strpos($content, '