Commit a040691

njzjz, pre-commit-ci[bot], and coderabbitai[bot] authored
feat: dpdisp run (#456)
<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **New Features**
  - Introduced a new `run` command to execute Python scripts with associated PEP 723 metadata.
  - Added documentation on how to use the `run` command in `dpdispatcher`.
  - Expanded the glossary in the documentation to include the term "run."

- **Documentation**
  - Corrected a spelling error in the `README.md`.
  - Added a new guide `doc/run.md` for running Python scripts using `dpdispatcher`.

- **Bug Fixes**
  - Ensured the `run` command is included in CLI tests.

- **Chores**
  - Updated dependencies in `pyproject.toml` to include `tomli` for Python versions below 3.11.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent 8ac13da commit a040691

13 files changed, +306 -1 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -36,4 +36,4 @@ See [Contributing Guide](CONTRIBUTING.md) to become a contributor! 🤓
 
 ## References
 
-DPDispatcher is derivated from the [DP-GEN](https://github.com/deepmodeling/dpgen) package. To mention DPDispatcher in a scholarly publication, please read Section 3.3 in the [DP-GEN paper](https://doi.org/10.1016/j.cpc.2020.107206).
+DPDispatcher is derived from the [DP-GEN](https://github.com/deepmodeling/dpgen) package. To mention DPDispatcher in a scholarly publication, please read Section 3.3 in the [DP-GEN paper](https://doi.org/10.1016/j.cpc.2020.107206).

doc/index.rst

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ DPDispatcher will monitor (poke) until these jobs finish and download the result
    machine
    resources
    task
+   run
    cli
    api/api

doc/pep723.rst

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+.. dargs::
+   :module: dpdispatcher.run
+   :func: pep723_args

doc/run.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Run Python scripts
+
+DPDispatcher can be used to directly run a single Python script:
+
+```sh
+dpdisp run script.py
+```
+
+The script must include [inline script metadata](https://packaging.python.org/en/latest/specifications/inline-script-metadata/) compliant with [PEP 723](https://peps.python.org/pep-0723/).
+An example of the script is shown below.
+
+```{literalinclude} ../examples/dpdisp_run.py
+:language: py
+:linenos:
+```
+
+The PEP 723 metadata entries for `tool.dpdispatcher` are defined as follows:
+
+```{eval-rst}
+.. include:: pep723.rst
+```

dpdispatcher/dpdisp.py

Lines changed: 15 additions & 0 deletions
@@ -3,6 +3,7 @@
 from typing import List, Optional
 
 from dpdispatcher.entrypoints.gui import start_dpgui
+from dpdispatcher.entrypoints.run import run
 from dpdispatcher.entrypoints.submission import handle_submission
 
 
@@ -81,6 +82,18 @@ def main_parser() -> argparse.ArgumentParser:
             "to the network on both IPv4 and IPv6 (where available)."
         ),
     )
+    ##########################################
+    # run
+    parser_run = subparsers.add_parser(
+        "run",
+        help="Run a Python script.",
+        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
+    )
+    parser_run.add_argument(
+        "filename",
+        type=str,
+        help="Python script to run. PEP 723 metadata should be contained in this file.",
+    )
     return parser
 
 
@@ -117,6 +130,8 @@ def main():
             port=args.port,
             bind_all=args.bind_all,
         )
+    elif args.command == "run":
+        run(filename=args.filename)
     elif args.command is None:
         pass
     else:
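
The subcommand wiring in this hunk can be tried in isolation. The sketch below is a minimal stand-in, not the project's CLI: `build_parser`, `dispatch`, and the `dpdisp-sketch` program name are hypothetical, and `dispatch` only reports what would be run instead of calling `run(filename=...)`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy CLI with a single `run` subcommand (hypothetical names)."""
    parser = argparse.ArgumentParser(prog="dpdisp-sketch")
    # `dest="command"` stores the chosen subcommand name on the namespace.
    subparsers = parser.add_subparsers(dest="command")
    parser_run = subparsers.add_parser(
        "run",
        help="Run a Python script.",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser_run.add_argument("filename", type=str, help="Python script to run.")
    return parser


def dispatch(argv):
    """Parse argv and return a description of what would be executed."""
    args = build_parser().parse_args(argv)
    if args.command == "run":
        # The CLI in the diff calls run(filename=args.filename) at this point.
        return ("run", args.filename)
    # No subcommand was given, so there is nothing to do.
    return (None, None)


print(dispatch(["run", "script.py"]))
```

Because the subparsers are not marked as required, calling the parser with no subcommand leaves `args.command` as `None`, which is the case the `elif args.command is None` branch in the diff handles.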

dpdispatcher/entrypoints/run.py

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+"""Run PEP 723 script."""
+
+from dpdispatcher.run import run_pep723
+
+
+def run(*, filename: str):
+    with open(filename) as f:
+        script = f.read()
+    run_pep723(script)

dpdispatcher/run.py

Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
+import os
+import re
+import sys
+from glob import glob
+from hashlib import sha1
+
+from dpdispatcher.machine import Machine
+from dpdispatcher.submission import Resources, Submission, Task
+
+if sys.version_info >= (3, 11):
+    import tomllib
+else:
+    import tomli as tomllib
+from typing import List, Optional
+
+from dargs import Argument
+
+from dpdispatcher.arginfo import machine_dargs, resources_dargs, task_dargs
+
+REGEX = r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
+
+
+def read_pep723(script: str) -> Optional[dict]:
+    """Read a PEP 723 script metadata from a script string.
+
+    Parameters
+    ----------
+    script : str
+        Script content.
+
+    Returns
+    -------
+    dict
+        PEP 723 metadata.
+    """
+    name = "script"
+    matches = list(
+        filter(lambda m: m.group("type") == name, re.finditer(REGEX, script))
+    )
+    if len(matches) > 1:
+        # TODO: Add tests for scenarios where multiple script blocks are found
+        raise ValueError(f"Multiple {name} blocks found")
+    elif len(matches) == 1:
+        content = "".join(
+            line[2:] if line.startswith("# ") else line[1:]
+            for line in matches[0].group("content").splitlines(keepends=True)
+        )
+        return tomllib.loads(content)
+    else:
+        # TODO: Add tests for scenarios where no metadata is found
+        return None
+
+
+def pep723_args() -> Argument:
+    """Return the argument parser for PEP 723 metadata."""
+    machine_args = machine_dargs()
+    machine_args.fold_subdoc = True
+    machine_args.doc = "Machine configuration. See related documentation for details."
+    resources_args = resources_dargs(detail_kwargs=False)
+    resources_args.fold_subdoc = True
+    resources_args.doc = (
+        "Resources configuration. See related documentation for details."
+    )
+    task_args = task_dargs()
+    command_arg = task_args["command"]
+    command_arg.doc = (
+        "Python interpreter or launcher. No need to contain the Python script filename."
+    )
+    command_arg.default = "python"
+    command_arg.optional = True
+    task_args["task_work_path"].doc += " Can be a glob pattern."
+    task_args.name = "task_list"
+    task_args.doc = "List of tasks to execute."
+    task_args.repeat = True
+    task_args.dtype = (list,)
+    return Argument(
+        "pep723",
+        dtype=dict,
+        doc="PEP 723 metadata",
+        sub_fields=[
+            Argument(
+                "work_base",
+                dtype=str,
+                optional=True,
+                default="./",
+                doc="Base directory for the work",
+            ),
+            Argument(
+                "forward_common_files",
+                dtype=List[str],
+                optional=True,
+                default=[],
+                doc="Common files to forward to the remote machine",
+            ),
+            Argument(
+                "backward_common_files",
+                dtype=List[str],
+                optional=True,
+                default=[],
+                doc="Common files to backward from the remote machine",
+            ),
+            machine_args,
+            resources_args,
+            task_args,
+        ],
+    )
+
+
+def create_submission(metadata: dict, hash: str) -> Submission:
+    """Create a Submission instance from a PEP 723 metadata.
+
+    Parameters
+    ----------
+    metadata : dict
+        PEP 723 metadata.
+    hash : str
+        Submission hash.
+
+    Returns
+    -------
+    Submission
+        Submission instance.
+    """
+    base = pep723_args()
+    metadata = base.normalize_value(metadata, trim_pattern="_*")
+    base.check_value(metadata, strict=False)
+
+    tasks = []
+    for task in metadata["task_list"]:
+        task = task.copy()
+        task["command"] += f" $REMOTE_ROOT/script_{hash}.py"
+        task_work_path = os.path.join(
+            metadata["machine"]["local_root"],
+            metadata["work_base"],
+            task["task_work_path"],
+        )
+        if os.path.isdir(task_work_path):
+            tasks.append(Task.load_from_dict(task))
+        elif glob(task_work_path):
+            for file in glob(task_work_path):
+                tasks.append(Task.load_from_dict({**task, "task_work_path": file}))
+                # TODO: Add tests for scenarios where the task work path is a glob pattern
+        else:
+            # TODO: Add tests for scenarios where the task work path is not found
+            raise FileNotFoundError(f"Task work path {task_work_path} not found.")
+    return Submission(
+        work_base=metadata["work_base"],
+        forward_common_files=metadata["forward_common_files"],
+        backward_common_files=metadata["backward_common_files"],
+        machine=Machine.load_from_dict(metadata["machine"]),
+        resources=Resources.load_from_dict(metadata["resources"]),
+        task_list=tasks,
+    )
+
+
+def run_pep723(script: str):
+    """Run a PEP 723 script.
+
+    Parameters
+    ----------
+    script : str
+        Script content.
+    """
+    metadata = read_pep723(script)
+    if metadata is None:
+        raise ValueError("No PEP 723 metadata found.")
+    dpdispatcher_metadata = metadata["tool"]["dpdispatcher"]
+    script_hash = sha1(script.encode("utf-8")).hexdigest()
+    submission = create_submission(dpdispatcher_metadata, script_hash)
+    submission.machine.context.write_file(f"script_{script_hash}.py", script)
+    # write script
+    submission.run_submission()
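
To see what `read_pep723` extracts before the TOML parser runs, the metadata step can be reproduced with the standard library alone. This is a sketch under assumptions: `PEP723_REGEX` copies the `REGEX` constant from the diff (which appears to follow the reference implementation given in PEP 723), while `extract_script_block` and `SAMPLE` are hypothetical names. The function stops at the TOML text and does not call `tomllib.loads`.

```python
import re
from typing import Optional

# Copied from REGEX in dpdispatcher/run.py as shown in the diff.
PEP723_REGEX = (
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def extract_script_block(script: str) -> Optional[str]:
    """Return the TOML text of the `script` metadata block, or None if absent."""
    matches = [
        m for m in re.finditer(PEP723_REGEX, script) if m.group("type") == "script"
    ]
    if len(matches) > 1:
        raise ValueError("Multiple script blocks found")
    if not matches:
        return None
    # Strip the leading "# " (or a bare "#" on otherwise empty lines).
    return "".join(
        line[2:] if line.startswith("# ") else line[1:]
        for line in matches[0].group("content").splitlines(keepends=True)
    )


# Hypothetical, shortened script used only for illustration.
SAMPLE = '''# /// script
# [tool.dpdispatcher]
# work_base = "./"
#
# [tool.dpdispatcher.machine]
# batch_type = "Shell"
# ///

print("hello world!")
'''

print(extract_script_block(SAMPLE))
```

In the diff, the string returned at this stage is passed to `tomllib.loads` (or `tomli` on Python versions below 3.11), and `run_pep723` then reads the `["tool"]["dpdispatcher"]` table from the result.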

examples/dpdisp_run.py

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+# /// script
+# # dpdispatcher doesn't use `requires-python` and `dependencies`
+# requires-python = ">=3"
+# dependencies = [
+# ]
+# [tool.dpdispatcher]
+# work_base = "./"
+# forward_common_files=[]
+# backward_common_files=[]
+# [tool.dpdispatcher.machine]
+# batch_type = "Shell"
+# local_root = "./"
+# context_type = "LazyLocalContext"
+# [tool.dpdispatcher.resources]
+# number_node = 1
+# cpu_per_node = 1
+# gpu_per_node = 0
+# group_size = 0
+# [[tool.dpdispatcher.task_list]]
+# # no need to contain the script filename
+# command = "python"
+# # can be a glob pattern
+# task_work_path = "./"
+# forward_files = []
+# backward_files = ["log"]
+# ///
+
+print("hello world!")
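
The two comments in the example (`no need to contain the script filename` and `can be a glob pattern`) correspond to the loop in `create_submission`. The sketch below mirrors that loop with plain dictionaries instead of `Task` objects, so it runs without dpdispatcher installed. `expand_task` is a hypothetical helper, and sorting the glob results is an addition for reproducible output that the diff does not do.

```python
import os
import tempfile
from glob import glob
from hashlib import sha1
from typing import Dict, List


def expand_task(task: Dict, local_root: str, work_base: str, script: str) -> List[Dict]:
    """Expand one task_list entry into concrete task dicts (hypothetical helper)."""
    script_hash = sha1(script.encode("utf-8")).hexdigest()
    task = dict(task)
    # The script path is appended here, so `command` only names the interpreter.
    task["command"] += f" $REMOTE_ROOT/script_{script_hash}.py"
    pattern = os.path.join(local_root, work_base, task["task_work_path"])
    if os.path.isdir(pattern):
        # A literal directory yields exactly one task.
        return [task]
    matched = sorted(glob(pattern))  # sorted only to make the order stable
    if matched:
        # A glob pattern yields one task per matching path.
        return [{**task, "task_work_path": path} for path in matched]
    raise FileNotFoundError(f"Task work path {pattern} not found.")


with tempfile.TemporaryDirectory() as root:
    for name in ("job_a", "job_b"):
        os.mkdir(os.path.join(root, name))
    tasks = expand_task(
        {"command": "python", "task_work_path": "job_*"},
        root,
        "./",
        'print("hello world!")\n',
    )
    print([os.path.basename(t["task_work_path"]) for t in tasks])
    print(tasks[0]["command"])
```

As in the diff, a task expanded from a glob carries the full matched path (including `local_root` and `work_base`) as its `task_work_path`, whereas a literal directory keeps the value written in the metadata.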

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -28,6 +28,7 @@ dependencies = [
     'tqdm>=4.9.0',
     'typing_extensions; python_version < "3.7"',
     'pyyaml',
+    'tomli >= 1.1.0; python_version < "3.11"',
 ]
 requires-python = ">=3.7"
 readme = "README.md"

tests/context.py

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@
 
 # test backward compatibility with dflow
 from dpdispatcher.dpcloudserver.client import RequestInfoException as _ # noqa: F401
+from dpdispatcher.entrypoints.run import run # noqa: F401
 from dpdispatcher.entrypoints.submission import handle_submission # noqa: F401
 from dpdispatcher.machine import Machine # noqa: F401
 from dpdispatcher.machines.distributed_shell import DistributedShell # noqa: F401
