-
Notifications
You must be signed in to change notification settings - Fork 0
Add support for partial results tracking on a scan #68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Add RUNNING status to JobStatus enum - Add ThreadLocal in BulkScanWorker to pass ScanJobDescription to scan implementations - Add upsertPartialScanResult() method to IPersistenceProvider interface - Implement upsertPartialScanResult() in MongoPersistenceProvider - Update DummyPersistenceProvider test implementation - Update BulkScanWorkerManager to pass ScanJobDescription to worker.handle() This enables real-time persistence of intermediate scan results to MongoDB as probes complete during TLS scans.
… input. scanTarget can be accessed from scanJobDescription, and the UUID in scanJobDescription is needed to write partial results to MongoDB during a scan
|
Add one more test using DummyPersistenceProvider to check the partial results. |
…ersistenceProvider
src/main/java/de/rub/nds/crawler/core/BulkScanWorkerManager.java
Outdated
Show resolved
Hide resolved
juaninf
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check my comments
Replace all hostname usages with TEST-NET-1 IPs (RFC 5737)
|
looks good to me @XoMEX are you ok with this? |
|
I don't get what |
|
|
|
Do I get it correctly, that you modify the mongodb also from the TLS-Crawler then? |
…al results persistence. Rework Worker, BulkScanWorker and BulkScanWorkerManager handle method to use ScheduledScan instead of Future<Document> and removed use of ThreadLocal.
|
yes, that is correct, mongoDB was also modified in TLS-Crawler. I changed it so now mongoDB is only modified in Crawler Core. |
XoMEX
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool that you got it working that fast.
I am still a bit unhappy about how the partial results are passed back to the scanner. In my head I envision that the report is observable, and as soon as it is updated, the new data is persisted into the DB. But that comes with other issues (like, do we want to write all changes?) and a redesign of the scanner result class. I am not sure if it is worth the effort.
Other than that, I left some minor comments
| public class ScheduledScan { | ||
|
|
||
| private volatile Document currentResult; | ||
| private final CompletableFuture<Document> finalResult = new CompletableFuture<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reasoning for wrapping the future instead of extending it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also find the name to not capture the functionality.
If this would extend the future, I'd call it something like ProgressableFuture<Document>.
| public void setPersistenceProvider(IPersistenceProvider persistenceProvider) { | ||
| this.persistenceProvider = persistenceProvider; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead pass this in the constructor? Is there a use-case to having this as a variable instead of a constant?
| * @param partialResult The partial result document to persist | ||
| */ | ||
| protected void persistPartialResult(ScanJobDescription jobDescription, Document partialResult) { | ||
| if (persistenceProvider != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this check?
| public abstract Document scan(ScanTarget scanTarget); | ||
| public abstract Document scan(ScanJobDescription jobDescription, ScheduledScan scheduledScan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike having an "out parameter". The only alternative I see would be a functional interface to feed the partial results into. But I guess that isn't that much of a difference.
| * @param jobDescription The job description for the scan | ||
| * @param partialResult The partial result document to persist | ||
| */ | ||
| protected void persistPartialResult(ScanJobDescription jobDescription, Document partialResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not somehow linked to ScheduledScan.updateResult?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see two approaches:
Either this function updates the results in the ScheduledScan object
Or this function is registered as an observer for updates on the ScheduledScan object; i.e. if updateResult is called, it fires an even that causes persistPratialResult to be called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or are there cases where one function is called and not the other?
Adds infrastructure to support partial results by providing scan implementations access to their ScanJobDescription context during execution
Added: