WIP: log dask task stream and rmm events into results dir #225

kevingerman · 2022-01-06T15:55:12Z

Adds a couple of utilities to capture per-query metrics, triggered from benchmark_runner.py:
rmm_monitor logs CUDA allocations from each Dask worker
dask_task_logger dumps the Dask task stream into a JSON file
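
As a rough illustration of the task-logging half, the sketch below writes a list of task records to a per-query JSON file in a results directory. The function name `write_task_stream`, the file naming, and the record layout are assumptions made for this example and are not taken from the PR. In a live cluster the records could plausibly come from `distributed.Client.get_task_stream()`, but how dask_task_logger actually gathers them is not shown here.

```python
import json
from pathlib import Path


def write_task_stream(records, results_dir, query_name):
    """Write one query's task records to <results_dir>/<query_name>_tasks.json.

    Hypothetical helper: the name and file layout are illustrative only.
    """
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{query_name}_tasks.json"
    with out_path.open("w") as fh:
        # default=str stringifies any value JSON cannot encode natively
        # (for example bytes or custom objects inside a record).
        json.dump(list(records), fh, default=str, indent=2)
    return out_path
```

Writing one file per query keeps each run's tasks separate, which makes it easier to discard or compare individual queries later.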

TBD:

Gate with an environment variable, as the logs add up quickly.
Mark RMM logs with the worker ID.
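
The two TBD items could be sketched roughly as follows. The environment variable name `GPU_BDB_LOG_METRICS`, both function names, and the file naming scheme are hypothetical and chosen only for this example; the PR does not specify them.

```python
import os
from pathlib import Path

# Hypothetical flag name, not taken from the PR.
ENV_FLAG = "GPU_BDB_LOG_METRICS"


def metrics_enabled(environ=None):
    """Return True only when the flag is set to a truthy value."""
    environ = os.environ if environ is None else environ
    return environ.get(ENV_FLAG, "").strip().lower() in {"1", "true", "yes", "on"}


def rmm_log_path(results_dir, query_name, worker_id=None):
    """Build a per-worker log file path; fall back to the process ID."""
    tag = os.getpid() if worker_id is None else worker_id
    return Path(results_dir) / f"{query_name}_rmm_worker_{tag}.csv"
```

With something like this, logging stays off unless the flag is set, and each worker writes to its own file. The resulting path could then be handed to RMM's logging setup on each worker (for example `rmm.enable_logging(log_file_name=...)`), though that wiring is an assumption here.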


randerzander

Thanks for this work.

Can you maybe include some README changes about this configuration? Do you have a parser that uses these logs, or have you just eyeballed them so far?

gpu_bdb/benchmark_runner.py

kevingerman · 2022-01-25T16:46:55Z

I have another commit pending that fetches the logs into a single DataFrame, which can then be serialized (Parquet?).
Still need an example analysis. I'll probably go back to a Bokeh plot that works as a standalone example.
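
Until that commit lands, here is a minimal sketch of the idea, assuming the per-worker RMM logs are CSV files with a header row. The function name `collect_rmm_logs`, the glob pattern, and the `source` column are hypothetical. It uses only the standard library and returns plain rows rather than a DataFrame.

```python
import csv
from pathlib import Path


def collect_rmm_logs(results_dir, pattern="*_rmm_worker_*.csv"):
    """Read every matching CSV log and return rows tagged with their source file.

    Hypothetical helper: the pattern and the "source" column are illustrative.
    """
    rows = []
    for path in sorted(Path(results_dir).glob(pattern)):
        with path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                # Remember which worker log each event came from.
                row["source"] = path.stem
                rows.append(row)
    return rows
```

The returned list of dicts can be passed to `pandas.DataFrame(rows)` and written out with `DataFrame.to_parquet`, which would give the single serialized table described above.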


Chris Jarrett and others added 30 commits August 12, 2021 11:01

Add dask-sql query files

46eefa1

Fixes

b14335a

Specify web_clickstreams columns

2de0db5

Set persist=False when creating tables

0f20be6

Remove debugging code

c216a3f

Cleanup query 5

7b96544

Add DISTRIBUTE BY Operator

5c1eeb8

Reset index on train/test df's after sorting

0ba6115

Merge pull request #1 from ayushdg/dask-sql

41054eb

Q28: Reset index on train/test df's after sorting

Fix query 18

9a2eace

Merge std_dev aggregation to query2, and remove persist's to reduce memory usage

0b5eb2c

Merge pull request rapidsai#2 from ayushdg/dask-sql

2101d1e

Q23 Dask-sql oom fix

Fix duplicate index for 5, 8, and 26

ddf0c9f

Updating bdb_tools & test dask-sql queries for shared Context

73ee7a9

Merge pull request rapidsai#3 from randerzander/224_2

e726156

Updating bdb_tools & test dask-sql queries for shared Context

added split_out to q29

ed7b137

Added persist to prevent duplicate computation

a2a52e2

fixed comment

c5b40f0

fixed comment

0dc6440

Merge pull request rapidsai#4 from VibhuJawa/q29_sql

c0d7471

Fix Q29 OOM issue

Update remaining queries to use shared context

c44374e

Remove extra dask-sql imports

382c7c1

Fix q22 errors by casting date column to int

b71cb50

Merge pull request rapidsai#5 from ayushdg/dask-sql

a1ec59e

Update q22 date logic to go via int

added Query-25 dask-sql alternate implementation

3a26c91

fixed comment

558c5de

Merge pull request rapidsai#6 from VibhuJawa/q25_fix

a51521e

[REVIEW]Q25 fix

removed not useful order bys

fbaa648

remove persist from query-02

c453417

q03 removed persist

a0334a2

ChrisJar and others added 8 commits December 14, 2021 17:08

Merge pull request rapidsai#9 from randerzander/224_9

db45969

Variable renaming: bsql->dask-sql

Update copyrights

d3ca1b1

Cleanup

f5303bb

Address reviews

19a271d

Remove load_q03

6637e11

Share code between sql and Dask queries

0720a66

Remove lock files

999818b

Remove category codes casts

10c71d1

randerzander requested changes Jan 19, 2022

gpu_bdb/benchmark_runner.py (outdated)

sft-managed added 4 commits January 20, 2022 06:11

Refactor read_tables and constants into shared files

8d9db0d

Update copyrights

6470303

Remove unused imports

3e9d688

Cleanup remaining repeated code

a55eddc

sft-managed and others added 15 commits January 25, 2022 18:01

Add dask-sql environment file

d73b1c0

Update dask-sql version

a9a0833

fix query 22

b60d2f1

Query 03 fi

3cf17b5

small style fix

e3a94c1

Merge pull request rapidsai#10 from VibhuJawa/q03_drop_na_fix

5fe4e41

Query 03 drop na fix

replace deprecated df.one_hot_encoder with cudf.get_dummies

ca49940

Merge pull request rapidsai#12 from ayushdg/replace-one-hot-encoder

cac2144

Replace deprecated df.one_hot_encoder with cudf.get_dummies

Q22 result verified

1dcc3f5

Merge pull request rapidsai#11 from VibhuJawa/q22_fix

7b1c16c

[REVIEW]Query 22 Float Fix

log dask task stream and rmm events into results dir

99e5f30

unnecessary

cbd632e

rmm logs named for each worker by pid

d38e827

gate logging with config options

db13a58

rebase dask-sql

e4d7817

kevingerman force-pushed the metrics branch from 7f9ffd1 to e4d7817 February 3, 2022 02:57
