feat: decouple log configuration from apifylog#3399
Conversation
Allow connecting to remote browser services (Browserless, BrowserBase, Playwright Server) instead of always launching local browsers. (#1822) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ghtCrawler" This reverts commit 90fbb0c.
|
ready for review, user now only need to define 2 functions and wrapper handle others. |
packages/core/src/request.ts
Outdated
| // Lazy singleton — evaluated on first use so a user-configured logger is picked up | ||
| // rather than the default that exists at module load time. | ||
| let _log: CrawleeLogger | undefined; | ||
| const getLog = () => { | ||
| _log ??= Configuration.getGlobalConfig().getLogger().child({ prefix: 'Request' }); | ||
| return _log; | ||
| }; |
There was a problem hiding this comment.
nit / refactor: can getLog(ger) be a static method on Request, just to make the design a bit tidier?
packages/core/src/log.ts
Outdated
| /** | ||
| * Internal logging method used by some Crawlee internals. | ||
| */ | ||
| internal(level: number, message: string, data?: Record<string, unknown>, exception?: Error): void; |
There was a problem hiding this comment.
If this is exposed in the public interface, we should assume people will use this (especially if there is no other method that does this).
How about naming this logger.logWithLevel or sth similar and add a friendly comment to encourage the (power)users instead?
There was a problem hiding this comment.
You mean rename just in interface and keep the internal implementation in Abstract Class?
There was a problem hiding this comment.
I suppose you'll need to rename both the interface member and the implementing function, but yes, it's just about the name / JSDoc for me. I believe our more advanced users could find usecases for this method too, no need to hide it from them.
There was a problem hiding this comment.
I am digging deep into it and I am not sure how you mean it. Internal function is implemented in apify/log, you want this change to bubble up to apify/log package?
There was a problem hiding this comment.
I'm sorry if I'm confusing.
I'd prefer to have a more descriptive name for the internal method, as it's public. This change would have to rename both CrawleeLogger.internal and BaseCrawleeLogger.internal.
As Honza mentions in the other comment
We don't really need to keep @apify/log directly assignable to CrawlingContext.log
so the @apify/log's interface doesn't really bother me now (ok it does - for the same reason as here - but we don't need to fix it now :)).
There was a problem hiding this comment.
I'm sorry if I'm confusing.
I believe I am the source of my confusion :]
But I have this example, this is one of a few usages of .internal -> basic-crawler.ts
this.log.internal(LogLevel[(options.level as 'DEBUG') ?? 'DEBUG'], message, data);
this.log is either @apify/log or custom logger provided by user. So i have to ensure that it exists on both. I cannot rename it in CrawleeLogger and BaseCrawleeLogger because I have to match @apify/log
Or I am missing something? 🤔
There was a problem hiding this comment.
As Honza mentions in the other comment
Feel free to tweak stuff to a reasonable degree; we can afford BC breaks. We don't really need to keep
@apify/logdirectly assignable toCrawlingContext.log. It's no problem if that's going to require some additional wrapper or even adding a new method to@apify/log.
I.e. right now, you don't have to match the @apify/log and CrawleeLogger APIs (I believe that was the original reason for these changes; we want to free ourselves from @apify/log and create our own Crawlee thing).
this.log is either @apify/log or custom logger provided by user.
imo after merging this PR, this.log should implement CrawleeLogger interface (and that's all we'll know about it). Whether it's a custom user-supplied logger or a CrawleeLogger wrapper for @apify/log should be inconsequential.
Let's keep the .internal() method for now if it's easier. I don't want to muddle your PR with minor things like this :) It's still good that we discussed this, because maybe I'm understanding the motivations behind this PR wrong.
Can we please get a third pair of eyes @janbuchar? 🙏
There was a problem hiding this comment.
Huh. What is internal used for? And how many usages are there? It sounds like a band-aid fix that may not be necessary anymore.
There was a problem hiding this comment.
single usage is in basic-crawler.ts
this.log.internal(LogLevel[(options.level as 'DEBUG') ?? 'DEBUG'], message, data);I believe only reason to use it is we can dynamically set level. It is bad name for what it does but there is no other function in @apify/log to do the job. In most of logging libraries this method is called simply log or _log :]
There was a problem hiding this comment.
apify/log wrapped to get rid of internal, now logWithError
|
|
||
| if (BasicCrawler.useStateCrawlerIds.size > 1) { | ||
| defaultLog.warningOnce( | ||
| this.log.warningOnce( |
There was a problem hiding this comment.
This is a semantic change - originally, we would call warningOnce on the global instance (so it would log the warning only once).
Now we're calling warningOnce on each crawler's instance separately. These do not share the state, so this will log the message n times (n == crawler count).
There was a problem hiding this comment.
Does it matter really? Feels all right to me.
There was a problem hiding this comment.
It logs the same multiline message multiple times... not the end of the world, but imo unnecessary. I noticed this just because I dealt with this when implementing this not so long ago :)
If using the global log instance proves too complicated, I'm fine with this; if it's a one-line change, I'd prefer clean logs.
There was a problem hiding this comment.
yop, also claude suggested this to me but I was not believe him at the moment. I can take a look, probably I can store the state and pass it to child.
There was a problem hiding this comment.
fixed by static logger instance for basic-crawler
packages/playwright-crawler/src/internals/utils/playwright-utils.ts
Outdated
Show resolved
Hide resolved
packages/puppeteer-crawler/src/internals/enqueue-links/click-elements.ts
Outdated
Show resolved
Hide resolved
| hasFinishedBefore = false; | ||
|
|
||
| readonly log: Log; | ||
| readonly log: CrawleeLogger; |
There was a problem hiding this comment.
Could this be a get property that reaches into serviceLocator instead? Just to avoid storing stuff in random places.
There was a problem hiding this comment.
this basically following how it worked before (log property storing child logger)
before:
- log -> child instance with prefix ->
log = defaultLog.child({ prefix: this.constructor.name }) - parent call for warning once ->
defaultLog.warningOnce()
now:
- log ->
log = serviceLocator.getLogger().child({ prefix: this.constructor.name }) - parent call for wanring once ->
serviceLocator.getLogger().warningOnce()
only now there is internal log property and logger property for scopedServiceLocator, i think we do not need both 🤔
There was a problem hiding this comment.
Yeah, warning once is now implemented in the crawlee logger, so we probably don't need any special handling for that.
packages/core/src/log.ts
Outdated
| /** | ||
| * Internal logging method used by some Crawlee internals. | ||
| */ | ||
| internal(level: number, message: string, data?: Record<string, unknown>, exception?: Error): void; |
There was a problem hiding this comment.
Huh. What is internal used for? And how many usages are there? It sounds like a band-aid fix that may not be necessary anymore.
packages/core/src/log.ts
Outdated
| } | ||
|
|
||
| private _dispatch(level: number, message: string, data?: Record<string, unknown>): void { | ||
| if (level <= this.level) { |
There was a problem hiding this comment.
It feels weird to implement log-level filtering at this level, pretty much every logging library will also implement this. Why is it necessary?
There was a problem hiding this comment.
I can remove it, question is if configuration.logLevel or CRAWLEE_LOG_LEVEL should propagate also to provided logger or only if default apify/log is used?
There was a problem hiding this comment.
Right. Maybe the CrawleeLogger interface could have a setLevel method that we'd call at some point? But it is not exactly clear at which point...
There was a problem hiding this comment.
it is used only if no user logger provided and logLevel set in configuration (as level handling of provided logger is in user's hands) so no need to have it in interface at the moment
janbuchar
left a comment
There was a problem hiding this comment.
Looks mostly good to me, just please resolve that one comment
| hasFinishedBefore = false; | ||
|
|
||
| readonly log: Log; | ||
| readonly log: CrawleeLogger; |
There was a problem hiding this comment.
Yeah, warning once is now implemented in the crawlee logger, so we probably don't need any special handling for that.
|
One more thing, it'd be great to add a guide/example to set a custom logger to the docs. Could you either do it or make an issue? |
|
I will merge this and create an issue. |
Adds a pluggable logger abstraction so users can swap @apify/log for any logger (Winston, Pino, Bunyan, etc.).
What's added:
- CrawleeLogger interface — the contract any logger must satisfy (13
methods: error, warning, info, debug, exception, softFail,
warningOnce, deprecated, perf, logWithLevel, child, getOptions,
setOptions)
- BaseCrawleeLogger abstract class — implement logWithLevel() +
createChild(), get everything else for free
- ServiceLocator.getLogger() / setLogger() — injection point; defaults
to @apify/log
Adds a pluggable logger abstraction so users can swap @apify/log for any logger (Winston, Pino, Bunyan, etc.).
What's added:
Full example with 60e3c30 build