Description
When I use the CLI to validate my data package with the command:

data validate datapackage.json

I first get a warning about a memory leak:
(node:21287) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 121 end listeners added. Use emitter.setMaxListeners() to increase limit

This is followed by a core dump an hour or more later.

The data package I'm working with can be found on datahub. It currently consists of two tabular data resources. The first (mines) contains ~30 MB of CSV data; it triggers the memory leak warning but validates successfully in under a minute. The second (employment-production-quarterly) contains ~160 MB of CSV data. It also triggers the memory leak warning, then runs for many minutes at ~100-150% of a CPU while its memory footprint grows slowly and continuously (though only to ~10% of available memory), and eventually fails with the following error:
<--- Last few GCs --->
[21287:0x5610d38d7aa0] 2412787 ms: Mark-sweep 2011.0 (2121.7) -> 2011.0 (2091.2) MB, 1581.1 / 0.0 ms last resort GC in old space requested
[21287:0x5610d38d7aa0] 2414404 ms: Mark-sweep 2011.0 (2091.2) -> 2011.0 (2091.7) MB, 1615.7 / 0.0 ms last resort GC in old space requested
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x74ce6898fe1 <JSObject>
1: push(this=0x1115d9486161 <JSArray[1956331]>)
2: _callee2$ [/home/zane/anaconda3/lib/node_modules/data-cli/node_modules/tableschema/lib/table.js:~469] [pc=0x27385caf7d07](this=0x1115d94826e9 <Table map = 0x224e9de6491>,_context2=0x1115d9482689 <Context map = 0x224e9de1211>)
3: tryCatch(aka tryCatch) [/home/zane/anaconda3/lib/node_modules/data-cli/node_modules/regenerator-runtime/run...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node::Abort() [node]
2: 0x5610d228a3b3 [node]
3: v8::Utils::ReportOOMFailure(char const*, bool) [node]
4: v8::internal::V8::FatalProcessOutOfMemory(char const*, bool) [node]
5: v8::internal::Factory::NewUninitializedFixedArray(int) [node]
6: 0x5610d1e698a5 [node]
7: 0x5610d1e69a9f [node]
8: v8::internal::JSObject::AddDataElement(v8::internal::Handle<v8::internal::JSObject>, unsigned int, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::Object::ShouldThrow) [node]
9: v8::internal::Object::AddDataProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyAttributes, v8::internal::Object::ShouldThrow, v8::internal::Object::StoreFromKeyed) [node]
10: v8::internal::Object::SetProperty(v8::internal::LookupIterator*, v8::internal::Handle<v8::internal::Object>, v8::internal::LanguageMode, v8::internal::Object::StoreFromKeyed) [node]
11: v8::internal::Runtime::SetObjectProperty(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, v8::internal::LanguageMode) [node]
12: v8::internal::Runtime_SetProperty(int, v8::internal::Object**, v8::internal::Isolate*) [node]
13: 0x27385c8040bd
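The warning at the start comes from Node's `EventEmitter`, which by default allows 10 listeners per event on a single emitter and prints `MaxListenersExceededWarning` once that is exceeded. Seeing 121 `end` listeners suggests one is registered per unit of work and never removed. The sketch below is a hypothetical illustration of that pattern under that assumption; it is not data-cli's or tableschema's code, and the function names are made up.

```javascript
const { EventEmitter } = require('events');

// Leaky pattern (hypothetical): register a fresh 'end' listener for every
// unit of work and never remove it, so listeners pile up on one emitter.
function attachWithoutCleanup(emitter, count) {
  for (let i = 0; i < count; i++) {
    emitter.on('end', () => {});
  }
  return emitter.listenerCount('end');
}

// Bounded pattern: once() removes each listener automatically after it fires,
// so the count drops back to zero as each unit of work finishes.
function attachWithCleanup(emitter, count) {
  for (let i = 0; i < count; i++) {
    emitter.once('end', () => {});
    emitter.emit('end'); // stands in for the unit of work finishing
  }
  return emitter.listenerCount('end');
}

// Node prints MaxListenersExceededWarning once the listener count for an event
// exceeds getMaxListeners() (10 by default; 0 disables the check).
function exceedsLimit(emitter, eventName) {
  const max = emitter.getMaxListeners();
  return max > 0 && emitter.listenerCount(eventName) > max;
}
```

Note that `emitter.setMaxListeners()` only raises the threshold at which the warning is printed; it does not release any listeners, so the listeners themselves still occupy memory.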
From within Python, running goodtables.validate() on the same data package with all ~2 million records, validation completes successfully in about 10 minutes.
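Frame 1 of the JS stack trace is `push` on an array of 1,956,331 elements, called from `tableschema/lib/table.js` (around line 469). That is consistent with, though not proof of, rows being collected into a single array before the heap runs out. A hypothetical sketch of the difference between accumulating rows and streaming them; the row generator and validity rule are made-up stand-ins, not tableschema's API.

```javascript
// Accumulating pattern: every row is kept, so memory grows with file size.
function countInvalidCollecting(rows, isValid) {
  const all = [];
  for (const row of rows) all.push(row); // the whole table is resident here
  const invalid = all.filter((row) => !isValid(row)).length;
  return { seen: all.length, invalid };
}

// Streaming pattern: each row is checked and then dropped; only counters survive.
function countInvalidStreaming(rows, isValid) {
  let seen = 0;
  let invalid = 0;
  for (const row of rows) {
    seen += 1;
    if (!isValid(row)) invalid += 1;
  }
  return { seen, invalid };
}

// Hypothetical row source: a generator yields rows one at a time instead of
// materialising them, standing in for a CSV parser reading from disk.
function* generateRows(n) {
  for (let i = 0; i < n; i++) {
    yield [String(i), i % 1000 === 0 ? '' : 'value'];
  }
}

// Hypothetical rule: the second field must be non-empty.
const secondFieldPresent = (row) => row[1] !== '';
```

Both functions return the same counts; the difference is that the first holds every row in memory at once, while the second holds at most the current row.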
I am running Ubuntu 18.04.1 on a ThinkPad T470s with 2 cores (2 threads each) and 24 GB of RAM. The versions of Node (v8.11.1) and npm (v6.4.1) I'm using are the ones distributed with the current Anaconda3 distribution (v5.2). My version of data is 0.9.5.
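The crash happens at roughly 2 GB of heap on a 24 GB machine, which fits V8's heap ceiling being reached rather than physical RAM running out; in the Node 8 line that default ceiling does not scale with installed memory. A small diagnostic sketch, assuming it is saved as a hypothetical `heap-report.js`, that prints the ceiling and current usage via Node's built-in `v8.getHeapStatistics()`:

```javascript
const v8 = require('v8');

// Convert a byte count to mebibytes, rounded to one decimal place.
function toMiB(bytes) {
  return Math.round((bytes / (1024 * 1024)) * 10) / 10;
}

// Report V8's heap ceiling for this process alongside the heap currently in use.
function heapReport() {
  const stats = v8.getHeapStatistics();
  return {
    heapLimitMiB: toMiB(stats.heap_size_limit),
    usedHeapMiB: toMiB(stats.used_heap_size),
  };
}

console.log(heapReport());
```

Running it as `node --max-old-space-size=4096 heap-report.js` should report a correspondingly larger `heapLimitMiB`. If memory grows with the number of rows, a larger heap would likely only postpone the failure rather than prevent it.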