Hello,
I tried the fges-mb algorithm to work around the issue I had when running the fges algorithm. My system has 14 threads and 128 GB of RAM. The input data consist of approximately 20,000 continuous variables and 3,000 samples. One of the commands I used was the following:
java -Xmx128G -jar ~/bin/causal-cmd/causal-cmd-1.4.1-SNAPSHOT-jar-with-dependencies.jar --data-type continuous --default --delimiter tab --json-graph --algorithm fges-mb --dataset dataset.txt --target targetname --score sem-bic-score --penaltyDiscount 10.0 --maxDegree 100
As with fges, the largest dataset it could handle was 2000 variables with 100 samples, and it printed this additional message: "heuristicSpeedup = false".
The --parallelized option was not available for this algorithm, but it could run in parallel when I used the --default switch. The problem is that with --default I could not change the other algorithm options, even when I specified them in the command (e.g. the log shows "verbose: yes" even if I pass "--verbose no"). Is there a way to get parallel execution while still being able to choose the other algorithm options?
When I tried to run the fges-mb algorithm on larger datasets, I got errors. With a dataset of 2000 variables and 2000 samples, I got "Exception in thread "main" java.util.ConcurrentModificationException".
With an even larger dataset (10,000 variables and 3,000 samples), I got the following error: Exception in thread "main" java.lang.NullPointerException. After displaying the errors, the executable exits.
I have attached the full error messages to this ticket. I tried running the fges-mb algorithm with several different variables as the target. Is there a way to get around these errors?
One other issue: when I ran the fges-mb algorithm on the whole dataset, it took approximately one hour just to display the message. At that rate, repeating the run for every variable would take years.
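For a rough sense of scale, here is a back-of-the-envelope estimate. It assumes that each target takes about one hour, as in the run on the whole dataset, and that the roughly 20,000 variables are processed one after another. Both are simplifying assumptions, not measurements.

```python
# Rough estimate only: assumes ~1 hour per target (observed for a single run)
# and that all 20,000 targets are processed sequentially.
HOURS_PER_TARGET = 1.0
NUM_TARGETS = 20_000

total_hours = HOURS_PER_TARGET * NUM_TARGETS
total_days = total_hours / 24
total_years = total_days / 365

print(f"{total_hours:.0f} hours = {total_days:.0f} days = {total_years:.1f} years")
# prints "20000 hours = 833 days = 2.3 years"
```

So even under these optimistic assumptions, covering every variable would take more than two years of continuous running.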
All the tests were done on the same system with 14 threads and 128 GB of RAM, but the fges-mb errors also occurred on a different system with less RAM and fewer threads. The paper accompanying the causal-cmd executable (Ramsey et al. 2017) states that it can be used for a million variables or more, but it seems impossible to use it on more than 2000 variables, even on an above-average system.
Thank you in advance,
George
ConcurrentModificationException.txt
NullPointerException.txt