Skip to content

feat: decouple log configuration from apifylog#3399

Merged
l2ysho merged 62 commits intov4from
3068-decouple-log-configuration-from-apifylog
Mar 3, 2026
Merged

feat: decouple log configuration from apifylog#3399
l2ysho merged 62 commits intov4from
3068-decouple-log-configuration-from-apifylog

Conversation

@l2ysho
Copy link
Contributor

@l2ysho l2ysho commented Feb 9, 2026

Adds a pluggable logger abstraction so users can swap @apify/log for any logger (Winston, Pino, Bunyan, etc.).

What's added:

  • CrawleeLogger interface — the contract any logger must satisfy (13 methods: error, warning, info, debug, exception, warningOnce, deprecated, perf, softFail, internal, child, getLevel/setLevel, getOptions/setOptions)
  • BaseCrawleeLogger abstract class — implement _log() + _createChild(), get everything else for free
  • ServiceLocator.getLogger() / setLogger() — injection point; defaults to @apify/log

Full example with 60e3c30 build

import { BaseCrawleeLogger,  CheerioCrawler,  Configuration } from "../crawlee-v4/packages/crawlee/dist";
import type { CrawleeLogger, CrawleeLoggerOptions } from "../crawlee-v4/packages/crawlee/dist";
import winston from "winston";

const winstonLogger = winston.createLogger({
  level: "debug",
  format: winston.format.combine(
    winston.format.colorize(),
    winston.format.timestamp(),
    winston.format.printf(({ level, message, timestamp, prefix }) => {
      const tag = prefix ? `[${prefix}] ` : "";
      return `${timestamp} ${level}: ${tag}${message}`;
    }),
  ),
  transports: [new winston.transports.Console()],
});

class WinstonAdapter extends BaseCrawleeLogger {
  constructor(
    private logger: winston.Logger,
    options?: Partial<CrawleeLoggerOptions>,
  ) {
    super(options);
  }

  protected _log(level: number, message: string, data?: Record<string, unknown>): void {
    const winstonLevel = CRAWLEE_TO_WINSTON[level] ?? "info";
    this.logger.log(winstonLevel, message, { ...data, prefix: this.getOptions().prefix });
  }

  protected _createChild(options: Partial<CrawleeLoggerOptions>): CrawleeLogger {
    return new WinstonAdapter(
      this.logger.child({ prefix: options.prefix }),
      { ...this.getOptions(), ...options },
    );
  }
}

// --- Usage ---

const config = new Configuration({
  loggerProvider: new WinstonAdapter(winstonLogger),
  logLevel: 5,
});

const crawler = new CheerioCrawler(
  {
    async requestHandler({ request, $, log }) {
      log.info(`Processing ${request.url}`);
      const title = $("title").text();
      log.debug("Page title extracted", { title });
      console.log(`Title: ${title}`);
    },
  },
  config,
);

await crawler.run(["https://crawlee.dev"]);

@l2ysho l2ysho self-assigned this Feb 9, 2026
l2ysho and others added 5 commits February 17, 2026 19:04
Allow connecting to remote browser services (Browserless, BrowserBase,
Playwright Server) instead of always launching local browsers. (#1822)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@l2ysho l2ysho changed the title 3068 decouple log configuration from apifylog feat: decouple log configuration from apifylog Feb 19, 2026
@l2ysho l2ysho marked this pull request as ready for review February 20, 2026 09:27
@l2ysho l2ysho requested review from B4nan and barjin February 20, 2026 09:28
@l2ysho
Copy link
Contributor Author

l2ysho commented Feb 20, 2026

ready for review, user now only need to define 2 functions and wrapper handle others.

Copy link
Member

@barjin barjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @l2ysho 🙌 I only have some minor notes ⬇️

Comment on lines +18 to +24
// Lazy singleton — evaluated on first use so a user-configured logger is picked up
// rather than the default that exists at module load time.
let _log: CrawleeLogger | undefined;
const getLog = () => {
_log ??= Configuration.getGlobalConfig().getLogger().child({ prefix: 'Request' });
return _log;
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit / refactor: can getLog(ger) be a static method on Request, just to make the design a bit tidier?

/**
* Internal logging method used by some Crawlee internals.
*/
internal(level: number, message: string, data?: Record<string, unknown>, exception?: Error): void;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is exposed in the public interface, we should assume people will use this (especially if there is no other method that does this).

How about naming this logger.logWithLevel or sth similar and add a friendly comment to encourage the (power)users instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean rename just in interface and keep the internal implementation in Abstract Class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you'll need to rename both the interface member and the implementing function, but yes, it's just about the name / JSDoc for me. I believe our more advanced users could find usecases for this method too, no need to hide it from them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am digging deep into it and I am not sure how you mean it. Internal function is implemented in apify/log, you want this change to bubble up to apify/log package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry if I'm confusing.

I'd prefer to have a more descriptive name for the internal method, as it's public. This change would have to rename both CrawleeLogger.internal and BaseCrawleeLogger.internal.

As Honza mentions in the other comment

We don't really need to keep @apify/log directly assignable to CrawlingContext.log

so the @apify/log's interface doesn't really bother me now (ok it does - for the same reason as here - but we don't need to fix it now :)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry if I'm confusing.

I believe I am the source of my confusion :]

But I have this example, this is one of a few usages of .internal -> basic-crawler.ts

this.log.internal(LogLevel[(options.level as 'DEBUG') ?? 'DEBUG'], message, data);

this.log is either @apify/log or custom logger provided by user. So i have to ensure that it exists on both. I cannot rename it in CrawleeLogger and BaseCrawleeLogger because I have to match @apify/log

Or I am missing something? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Honza mentions in the other comment

Feel free to tweak stuff to a reasonable degree; we can afford BC breaks. We don't really need to keep @apify/log directly assignable to CrawlingContext.log. It's no problem if that's going to require some additional wrapper or even adding a new method to @apify/log.

I.e. right now, you don't have to match the @apify/log and CrawleeLogger APIs (I believe that was the original reason for these changes; we want to free ourselves from @apify/log and create our own Crawlee thing).

this.log is either @apify/log or custom logger provided by user.

imo after merging this PR, this.log should implement CrawleeLogger interface (and that's all we'll know about it). Whether it's a custom user-supplied logger or a CrawleeLogger wrapper for @apify/log should be inconsequential.

Let's keep the .internal() method for now if it's easier. I don't want to muddle your PR with minor things like this :) It's still good that we discussed this, because maybe I'm understanding the motivations behind this PR wrong.

Can we please get a third pair of eyes @janbuchar? 🙏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. What is internal used for? And how many usages are there? It sounds like a band-aid fix that may not be necessary anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

single usage is in basic-crawler.ts

 this.log.internal(LogLevel[(options.level as 'DEBUG') ?? 'DEBUG'], message, data);

I believe only reason to use it is we can dynamically set level. It is bad name for what it does but there is no other function in @apify/log to do the job. In most of logging libraries this method is called simply log or _log :]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apify/log wrapped to get rid of internal, now logWithError


if (BasicCrawler.useStateCrawlerIds.size > 1) {
defaultLog.warningOnce(
this.log.warningOnce(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a semantic change - originally, we would call warningOnce on the global instance (so it would log the warning only once).

Now we're calling warningOnce on each crawler's instance separately. These do not share the state, so this will log the message n times (n == crawler count).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it matter really? Feels all right to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It logs the same multiline message multiple times... not the end of the world, but imo unnecessary. I noticed this just because I dealt with this when implementing this not so long ago :)

If using the global log instance proves too complicated, I'm fine with this; if it's a one-line change, I'd prefer clean logs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yop, also claude suggested this to me but I was not believe him at the moment. I can take a look, probably I can store the state and pass it to child.

Copy link
Contributor Author

@l2ysho l2ysho Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed by static logger instance for basic-crawler

@janbuchar janbuchar self-requested a review February 20, 2026 11:30
hasFinishedBefore = false;

readonly log: Log;
readonly log: CrawleeLogger;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a get property that reaches into serviceLocator instead? Just to avoid storing stuff in random places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this basically following how it worked before (log property storing child logger)
before:

  • log -> child instance with prefix -> log = defaultLog.child({ prefix: this.constructor.name })
  • parent call for warning once -> defaultLog.warningOnce()

now:

  • log -> log = serviceLocator.getLogger().child({ prefix: this.constructor.name })
  • parent call for wanring once -> serviceLocator.getLogger().warningOnce()

only now there is internal log property and logger property for scopedServiceLocator, i think we do not need both 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, warning once is now implemented in the crawlee logger, so we probably don't need any special handling for that.

/**
* Internal logging method used by some Crawlee internals.
*/
internal(level: number, message: string, data?: Record<string, unknown>, exception?: Error): void;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. What is internal used for? And how many usages are there? It sounds like a band-aid fix that may not be necessary anymore.

}

private _dispatch(level: number, message: string, data?: Record<string, unknown>): void {
if (level <= this.level) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels weird to implement log-level filtering at this level, pretty much every logging library will also implement this. Why is it necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove it, question is if configuration.logLevel or CRAWLEE_LOG_LEVEL should propagate also to provided logger or only if default apify/log is used?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. Maybe the CrawleeLogger interface could have a setLevel method that we'd call at some point? But it is not exactly clear at which point...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is used only if no user logger provided and logLevel set in configuration (as level handling of provided logger is in user's hands) so no need to have it in interface at the moment

@l2ysho l2ysho requested a review from janbuchar February 27, 2026 15:07
Copy link
Contributor

@janbuchar janbuchar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks mostly good to me, just please resolve that one comment

hasFinishedBefore = false;

readonly log: Log;
readonly log: CrawleeLogger;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, warning once is now implemented in the crawlee logger, so we probably don't need any special handling for that.

@janbuchar
Copy link
Contributor

One more thing, it'd be great to add a guide/example to set a custom logger to the docs. Could you either do it or make an issue?

@l2ysho
Copy link
Contributor Author

l2ysho commented Mar 3, 2026

I will merge this and create an issue.

@l2ysho l2ysho merged commit c467d18 into v4 Mar 3, 2026
6 checks passed
@l2ysho l2ysho deleted the 3068-decouple-log-configuration-from-apifylog branch March 3, 2026 10:48
janbuchar pushed a commit that referenced this pull request Mar 5, 2026
Adds a pluggable logger abstraction so users can swap @apify/log for any                                                                    logger (Winston, Pino, Bunyan, etc.).

What's added:
  - CrawleeLogger interface — the contract any logger must satisfy (13
    methods: error, warning, info, debug, exception, softFail,
    warningOnce, deprecated, perf, logWithLevel, child, getOptions,
    setOptions)
  - BaseCrawleeLogger abstract class — implement logWithLevel() +
    createChild(), get everything else for free
  - ServiceLocator.getLogger() / setLogger() — injection point; defaults
    to @apify/log
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants