Commit 11177e7

feat: allow to run T containers as any user, for better volume permission handling (#253)
#### Relevant issue or PR

n/a

#### Description of changes

This enables users to execute Tesseracts as any system user, even in the presence of volume mounts. We achieve this in the following ways:

1. Default to running Tesseracts with the same uid as the current host user. This fixes most permission issues right away (since everything that's readable from the outside also becomes readable from the inside).
2. Create mount points ahead of time and ensure they have lax permissions (rw for everyone). This makes it so named and bind-mounted volumes are mounted with appropriate permissions, even if they're not owned by the host user.

**Some sharp edges:**

- By default, podman mounts all volumes as owned by root, not the host user. This needs to be overridden via `PODMAN_USERNS=keep-id`. We automatically add the equivalent `--userns keep-id` flag if podman is detected.
- There is no reliable way to enable this option in `podman-compose`. That is, podman users will have to use `--no-compose` for sane volume permissions. We raise an exception if podman is used with compose + volume mounts.
- In cases where the Tesseract user is neither `root` nor the local user / file owner, [files in bind-mounts need to be world-readable on the host](https://github.com/pasteurlabs/tesseract-core/pull/253/files#diff-26bec4f76a9f4e334b738405b56d680ad704871cb6233560e9b8e703cd2becf6R494).
- Currently assumes (but does not enforce) that users will use `/tesseract/input_data` and `/tesseract/output_data` as their mount points. I expect this to be coupled to the `--input-path` and `--output-path` CLI args in the future (see e.g. #249).

**Main code changes:**

- Do not create a `tesseractor` user in Docker images. Instead, set global permissions on files so containers can run as any uid / gid.
- Test [all relevant scenarios](https://github.com/pasteurlabs/tesseract-core/pull/253/files#diff-26bec4f76a9f4e334b738405b56d680ad704871cb6233560e9b8e703cd2becf6R421).
- Add logic to detect podman and automatically inject required flags for consistent permission handling.
- Introduce `TESSERACT_DOCKER_RUN_ARGS` config which allows users to pass custom args to `tesseract run`. This isn't used in the final form of this PR but I figured it's nice to have anyway.

#### Testing done

CI, + manually with Docker Desktop on OSX (where permissions are always mapped automatically)
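
The world-readable requirement listed under the sharp edges can be checked on the host before a file is bind-mounted. The following is a minimal sketch and not part of this commit: `is_world_readable` is a hypothetical helper that inspects the "other" read bit using only the Python standard library.

```python
import os
import stat
import tempfile


def is_world_readable(path: str) -> bool:
    """Return True if the 'other' read bit is set on path (hypothetical helper)."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


# Demonstrate on a temporary file with two different permission sets.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

os.chmod(demo_path, 0o644)  # owner rw, group r, others r
readable_644 = is_world_readable(demo_path)

os.chmod(demo_path, 0o600)  # owner rw only
readable_600 = is_world_readable(demo_path)

os.remove(demo_path)
print(readable_644, readable_600)
```

Note that this only looks at the file itself; every directory on the way to the file also needs the execute bit for "other" before an unrelated uid can reach it.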
1 parent c695c46 commit 11177e7

14 files changed, +224 -129 lines changed

docs/content/examples/building-blocks/arm64.md

Lines changed: 1 addition & 1 deletion
@@ -12,4 +12,4 @@ Via `tesseract_config.yaml`, it is possible to somewhat flexibly alter the build
 :language: yaml
 ```
 
-Using the `custom_build_steps` field, we can run arbitrary commands on the image as if they were in a Dockerfile. We start here by temporarily setting the user to `root`, as the default user in the Tesseract build process is `tesseractor` -- which does not have root privileges -- and then switch back to the `tesseractor` user at the very end. We then run commands directly on the shell via `RUN` commands. All these steps specified in `custom_build_steps` are executed at the very end of the build process, followed only by a last execution of `tesseract-runtime check` that checks that the runtime can be launched and the user-defined `tesseract_api` module can be imported.
+Using the `custom_build_steps` field, we can run arbitrary commands on the image as if they were in a Dockerfile. We run commands directly on the shell via `RUN` commands. All these steps specified in `custom_build_steps` are executed at the very end of the build process, followed only by a last execution of `tesseract-runtime check` that checks that the runtime can be launched and the user-defined `tesseract_api` module can be imported.

docs/content/using-tesseracts/advanced.md

Lines changed: 13 additions & 1 deletion
@@ -1,6 +1,6 @@
 # Advanced usage
 
-## File system I/O
+## File aliasing
 
 The `tesseract` command can take care of
 passing data from local disk
@@ -16,6 +16,18 @@ target path:
 $ tesseract run vectoradd apply --output-path /tmp/output @inputs.json
 ```
 
+## Volume mounts and user permissions
+
+When mounting a volume into a Tesseract container, default behavior depends on the Docker engine being used. Specifically, Docker Desktop, Docker Engine, and Podman have different ways of handling user permissions for mounted volumes.
+
+Tesseract tries to ensure that the container user has the same permissions as the host user running the `tesseract` command. This is done by setting the user ID and group ID of the container user to match those of the host user.
+
+In cases where this fails or is not desired, you can explicitly set the user ID and group ID of the container user using the `--user` argument. This allows you to specify a different user or group for the container, which can be useful for ensuring proper permissions when accessing mounted volumes.
+
+```{warning}
+In cases where the Tesseract user is neither `root` nor the local user / file owner, you may encounter permission issues when accessing files in mounted volumes. To resolve this, ensure that the user ID and group ID are set correctly using the `--user` argument, or modify the permissions of files to be readable by any user.
+```
+
 ## Passing environment variables to Tesseract containers
 
 Through the optional `--env` argument, you can pass environment variables to Tesseracts.
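
The `--user` value documented above takes the form `uid` or `uid:gid`. As an illustration only, the sketch below splits such a value into integers. `parse_user_spec` is a hypothetical helper that does not exist in this commit, and it assumes numeric ids, although container engines generally accept user names as well.

```python
from __future__ import annotations


def parse_user_spec(spec: str) -> tuple[int, int | None]:
    """Split 'uid' or 'uid:gid' into integers (hypothetical helper, numeric ids only)."""
    uid, sep, gid = spec.partition(":")
    # Reject empty or non-numeric parts, e.g. "1000:" or "alice".
    if not uid.isdecimal() or (sep and not gid.isdecimal()):
        raise ValueError(f"Expected 'uid' or 'uid:gid' with numeric ids, got {spec!r}")
    return int(uid), (int(gid) if sep else None)


print(parse_user_spec("1000"))
print(parse_user_spec("1000:1000"))
```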

examples/pyvista-arm64/tesseract_config.yaml

Lines changed: 0 additions & 2 deletions
@@ -13,12 +13,10 @@ build_config:
 
   custom_build_steps:
     - |
-      USER root
       # Python bindings into the Python site-packages directory
       RUN python_site=$(python -c "import site; print(site.getsitepackages()[0])") && \
           ln -s /usr/lib/python3/dist-packages/vtk* $python_site && \
           ls -l $python_site/vtk* && \
           python -c "import vtk"
       # Must install pyvista with --no-deps to avoid installing vtk (which we copied from the system)
       RUN pip install matplotlib numpy pillow pooch scooby && pip install --no-deps pyvista==0.44.1
-      USER tesseractor

tesseract_core/sdk/cli.py

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -523,7 +523,8 @@ def serve(
523523
typer.Option(
524524
"--user",
525525
help=(
526-
"User to run the Tesseracts as e.g. '1000' or '1000:1000' (uid:gid)."
526+
"User to run the Tesseracts as e.g. '1000' or '1000:1000' (uid:gid). "
527+
"Defaults to the current user."
527528
),
528529
),
529530
] = None,
@@ -876,7 +877,10 @@ def run_container(
876877
str | None,
877878
typer.Option(
878879
"--user",
879-
help=("User to run the Tesseract as e.g. '1000' or '1000:1000' (uid:gid)."),
880+
help=(
881+
"User to run the Tesseract as e.g. '1000' or '1000:1000' (uid:gid). "
882+
"Defaults to the current user."
883+
),
880884
),
881885
] = None,
882886
) -> None:

tesseract_core/sdk/config.py

Lines changed: 1 addition & 0 deletions
@@ -50,6 +50,7 @@ class RuntimeConfig(BaseModel):
     docker_build_args: Annotated[
         tuple[str, ...], BeforeValidator(maybe_split_args)
     ] = ()
+    docker_run_args: Annotated[tuple[str, ...], BeforeValidator(maybe_split_args)] = ()
 
     model_config = ConfigDict(frozen=True, extra="forbid")

tesseract_core/sdk/docker_client.py

Lines changed: 26 additions & 3 deletions
@@ -31,6 +31,21 @@ def _get_executable(program: Literal["docker", "docker-compose"]) -> tuple[str,
     raise ValueError(f"Unknown program: {program}")
 
 
+def is_podman() -> bool:
+    """Check if the current environment is using Podman instead of Docker."""
+    docker = _get_executable("docker")
+    try:
+        result = subprocess.run(
+            [*docker, "version"],
+            capture_output=True,
+            text=True,
+            check=True,
+        )
+        return "podman" in result.stdout.lower()
+    except subprocess.CalledProcessError:
+        return False
+
+
 @dataclass
 class Image:
     """Image class to wrap Docker image details."""
@@ -514,6 +529,7 @@ def run(
         stdout: bool = True,
         stderr: bool = False,
         user: str | None = None,
+        extra_args: list_[str] | None = None,
     ) -> Container | tuple[bytes, bytes] | bytes:
         """Run a command in a container from an image.
 
@@ -535,16 +551,17 @@ def run(
                 and the values are the container ports.
             stdout: If True, return stdout.
             stderr: If True, return stderr.
+            environment: Environment variables to set in the container.
+            extra_args: Additional arguments to pass to the `docker run` CLI command.
 
         Returns:
             Container object if detach is True, otherwise returns list of stdout and stderr.
         """
+        config = get_config()
         docker = _get_executable("docker")
 
-        # If command is a type string and not list, make list
         if isinstance(command, str):
             command = [command]
-        logger.debug(f"Running command: {command}")
 
         optional_args = []
 
@@ -586,15 +603,21 @@ def run(
             for host_port, container_port in ports.items():
                 optional_args.extend(["-p", f"{host_port}:{container_port}"])
 
-        # Run with detached to get the container id of the running container.
+        if extra_args is None:
+            extra_args = []
+
         full_cmd = [
             *docker,
             "run",
             *optional_args,
+            *config.docker_run_args,
+            *extra_args,
             image,
             *command,
         ]
 
+        logger.debug(f"Running command: {full_cmd}")
+
         result = subprocess.run(
             full_cmd,
             capture_output=True,
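
The order in which `run` assembles its arguments matters: config-level run args and per-call `extra_args` both land between the built-in options and the image name, so that the container engine reads them as options rather than as part of the container command. The sketch below isolates that assembly, together with the string check used for Podman detection, so both can be exercised without a container engine. `looks_like_podman` and `build_run_command` are hypothetical stand-alone functions written for illustration, not names from the codebase.

```python
from __future__ import annotations


def looks_like_podman(version_output: str) -> bool:
    """Mirror the substring check applied to the output of `docker version`."""
    return "podman" in version_output.lower()


def build_run_command(
    docker: list[str],
    image: str,
    command: str | list[str],
    optional_args: list[str] | None = None,
    config_run_args: tuple[str, ...] = (),
    extra_args: list[str] | None = None,
) -> list[str]:
    """Assemble the argument list in the same order as the diff above."""
    if isinstance(command, str):
        command = [command]
    return [
        *docker,
        "run",
        *(optional_args or []),
        *config_run_args,
        *(extra_args or []),
        image,
        *command,
    ]


cmd = build_run_command(
    ["docker"],
    "vectoradd:latest",
    "serve",
    optional_args=["-p", "8000:8000"],
    config_run_args=("--memory", "2g"),
    extra_args=["--userns", "keep-id"],
)
print(cmd)
```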

tesseract_core/sdk/engine.py

Lines changed: 42 additions & 6 deletions
@@ -39,6 +39,7 @@
     ContainerError,
     Image,
     build_docker_image,
+    is_podman,
 )
 from .exceptions import UserError
 
@@ -570,6 +571,7 @@ def serve(
         no_compose: if True, do not use Docker Compose to serve the Tesseracts.
         service_names: list of service names under which to expose each Tesseract container on the shared network.
         user: user to run the Tesseracts as, e.g. '1000' or '1000:1000' (uid:gid).
+            Defaults to the current user.
 
     Returns:
         A string representing the Tesseract project ID.
@@ -597,6 +599,10 @@ def serve(
             )
         _validate_service_names(service_names)
 
+    if user is None:
+        # Use the current user if not specified
+        user = f"{os.getuid()}:{os.getgid()}" if os.name != "nt" else None
+
     if no_compose:
         if len(images) > 1:
             raise ValueError(
@@ -639,15 +645,22 @@ def serve(
         if debug:
             logger.info(f"Debugpy server listening at http://{ping_ip}:{debugpy_port}")
 
+        parsed_volumes = _parse_volumes(volumes) if volumes else {}
+
+        extra_args = []
+        if is_podman():
+            extra_args.extend(["--userns", "keep-id"])
+
         container = docker_client.containers.run(
             image=image_ids[0],
             command=["serve", *args],
             device_requests=gpus,
             ports=port_mappings,
             detach=True,
-            volumes=volumes,
+            volumes=parsed_volumes,
             user=user,
             environment=environment,
+            extra_args=extra_args,
         )
         # wait for server to start
         timeout = 30
@@ -668,6 +681,12 @@ def serve(
 
         return container.name
 
+    if is_podman() and volumes:
+        raise UserError(
+            "Podman does not support volume mounts in Docker Compose. "
+            "Please use --no-compose / no_compose=True instead."
+        )
+
     template = _create_docker_compose_template(
         image_ids,
         host_ip,
@@ -751,13 +770,15 @@ def _create_docker_compose_template(
     else:
         gpu_settings = f"device_ids: {gpus}"
 
+    parsed_volumes = _parse_volumes(volumes) if volumes else {}
+
     for i, image_id in enumerate(image_ids):
         service = {
             "name": service_names[i],
             "user": user,
             "image": image_id,
             "port": f"{ports[i]}:8000",
-            "volumes": volumes,
+            "volumes": parsed_volumes,
             "gpus": gpu_settings,
             "environment": {
                 "TESSERACT_DEBUG": "1" if debug else "0",
@@ -818,8 +839,11 @@ def _parse_option(option: str):
                 f"Invalid mount volume specification {option} "
                 "(must be `/path/to/source:/path/totarget:(ro|rw)`)",
             )
-        # Docker doesn't like paths like ".", so we convert to absolute path here
-        source = str(Path(source).resolve())
+
+        is_local_mount = "/" in source or Path(source).exists()
+        if is_local_mount:
+            # Docker doesn't like paths like ".", so we convert to absolute path here
+            source = str(Path(source).resolve())
         return source, {"bind": target, "mode": mode}
 
     return dict(_parse_option(opt) for opt in options)
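
The change to `_parse_option` distinguishes named volumes from local paths: a source is treated as a local mount if it contains a `/` or exists on disk, and only then is it resolved to an absolute path. The sketch below reproduces that rule in a self-contained form. `parse_volume_option` is a hypothetical stand-in, and the way it splits the specification (two or three colon-separated parts, with `rw` as the default mode) is an assumption, since the diff does not show that part of the original function.

```python
from __future__ import annotations

from pathlib import Path


def parse_volume_option(option: str) -> tuple[str, dict[str, str]]:
    """Parse 'source:target[:mode]' (hypothetical stand-in for _parse_option)."""
    parts = option.split(":")
    if len(parts) == 2:
        source, target = parts
        mode = "rw"  # assumption: default mode when none is given
    elif len(parts) == 3:
        source, target, mode = parts
    else:
        raise ValueError(f"Invalid mount volume specification {option}")
    if mode not in ("ro", "rw"):
        raise ValueError(f"Invalid mount mode {mode!r} in {option}")

    # Same rule as in the diff: a slash or an existing path means a local mount.
    is_local_mount = "/" in source or Path(source).exists()
    if is_local_mount:
        # Relative paths such as "." are converted to absolute paths.
        source = str(Path(source).resolve())
    return source, {"bind": target, "mode": mode}


named = parse_volume_option("demo_volume_not_on_disk:/tesseract/input_data:ro")
local = parse_volume_option("./data:/tesseract/output_data")
print(named)
print(local)
```

One consequence of this rule is that a relative directory without a slash (for example `data`) is only recognized as a local mount if it already exists, otherwise it is passed through as a named volume.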
@@ -865,6 +889,7 @@ def run_tesseract(
         environment: list of environment variables to set in the container,
             in Docker format: key=value.
         user: user to run the Tesseract as, e.g. '1000' or '1000:1000' (uid:gid).
+            Defaults to the current user.
 
     Returns:
         Tuple with the stdout and stderr of the Tesseract.
@@ -880,6 +905,10 @@ def run_tesseract(
     else:
         parsed_volumes = _parse_volumes(volumes)
 
+    if user is None:
+        # Use the current user if not specified
+        user = f"{os.getuid()}:{os.getgid()}" if os.name != "nt" else None
+
     for arg in args:
         if arg.startswith("-"):
             current_cmd = arg
@@ -901,7 +930,7 @@ def run_tesseract(
                         f"Path {local_path} provided as output is not a directory"
                     )
 
-                path_in_container = "/mnt/output"
+                path_in_container = "/tesseract/output_data"
                 arg = path_in_container
 
                 # Bind-mount directory
@@ -914,7 +943,9 @@ def run_tesseract(
                 if not local_path.is_file():
                     raise RuntimeError(f"Path {local_path} provided as input is not a file")
 
-                path_in_container = os.path.join("/mnt", f"payload{local_path.suffix}")
+                path_in_container = os.path.join(
+                    "/tesseract/input_data", f"payload{local_path.suffix}"
+                )
                 arg = f"@{path_in_container}"
 
                 # Bind-mount file
@@ -923,6 +954,10 @@ def run_tesseract(
             current_cmd = None
         cmd.append(arg)
 
+    extra_args = []
+    if is_podman():
+        extra_args.extend(["--userns", "keep-id"])
+
     # Run the container
     stdout, stderr = docker_client.containers.run(
         image=image,
@@ -935,6 +970,7 @@ def run_tesseract(
         remove=True,
         stderr=True,
         user=user,
+        extra_args=extra_args,
     )
     stdout = stdout.decode("utf-8")
     stderr = stderr.decode("utf-8")
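
Both `serve` and `run_tesseract` now repeat the same two decisions: fall back to the host uid and gid when no user is given, and add `--userns keep-id` when Podman is in use. A minimal sketch of that logic as two small functions follows. The names `default_container_user` and `podman_extra_args` are hypothetical; the commit inlines this logic rather than factoring it out.

```python
from __future__ import annotations

import os


def default_container_user(user: str | None = None) -> str | None:
    """Return the explicit user, else 'uid:gid' of the host user (None on Windows)."""
    if user is not None:
        return user
    if os.name == "nt":
        # os.getuid / os.getgid are not available on Windows.
        return None
    return f"{os.getuid()}:{os.getgid()}"


def podman_extra_args(using_podman: bool) -> list[str]:
    """Return the extra run arguments injected when Podman is detected."""
    return ["--userns", "keep-id"] if using_podman else []


print(default_container_user("1000:1000"))
print(podman_extra_args(True))
```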

tesseract_core/sdk/templates/Dockerfile.base

Lines changed: 8 additions & 6 deletions
@@ -63,12 +63,8 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
     && rm -rf /var/lib/apt/lists/*
 {% endif %}
 
-# Drop to a non-root user
-RUN groupadd -o -g 1000 tesseractor && \
-    useradd -o -u 1000 -g 1000 --create-home -s /bin/bash tesseractor
 WORKDIR /tesseract
-RUN chown tesseractor:tesseractor /tesseract
-USER tesseractor
+RUN chmod 755 /tesseract
 
 # Set environment variables
 ENV TESSERACT_NAME="{{ config.name | replace('"', '\\"') | replace('\n', '\\n') }}" \
@@ -78,7 +74,7 @@ ENV TESSERACT_NAME="{{ config.name | replace('"', '\\"') | replace('\n', '\\n')
 
 # Copy only necessary files
 COPY --from=build_stage /python-env /python-env
-COPY --chown=1000:1000 "{{ tesseract_source_directory }}/tesseract_api.py" ${TESSERACT_API_PATH}
+COPY "{{ tesseract_source_directory }}/tesseract_api.py" ${TESSERACT_API_PATH}
 
 ENV PATH="/python-env/bin:$PATH"
 
@@ -94,6 +90,12 @@ COPY ["{{ tesseract_source_directory }}/{{ source_path }}", "{{ target_path }}"]
 {{ config.build_config.custom_build_steps | join("\n") }}
 {% endif %}
 
+RUN mkdir -p /tesseract/input_data /tesseract/output_data && \
+    chmod 755 /tesseract/input_data && \
+    chmod 777 /tesseract/output_data
+
+USER 1000:1000
+
 # Final sanity check to ensure the runtime is installed and tesseract_api.py is valid
 {% if not config.build_config.skip_checks %}
 RUN ["tesseract-runtime", "check"]
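
The two mount points get different modes: `755` for `/tesseract/input_data` and `777` for `/tesseract/output_data`. The short sketch below, using only the Python `stat` module, shows what those octal modes mean for a uid that is neither the owner nor in the owning group. The helper names are hypothetical and exist only for this illustration.

```python
import stat


def describe_dir_mode(mode: int) -> str:
    """Render an octal permission mode the way `ls -l` shows a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


def others_can_write(mode: int) -> bool:
    """Return True if users outside the owner and group may write."""
    return bool(mode & stat.S_IWOTH)


for label, mode in [("input_data", 0o755), ("output_data", 0o777)]:
    print(label, describe_dir_mode(mode), others_can_write(mode))
```

Under these modes an arbitrary container uid can list and read the input directory but can create files only in the output directory.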

tesseract_core/sdk/templates/docker-compose.yml

Lines changed: 3 additions & 2 deletions
@@ -14,8 +14,8 @@ services:
     {% endif %}
     {%- if service.volumes %}
     volumes:
-    {%- for volume in service.volumes %}
-      - {{ volume }}
+    {%- for volume_source, volume_data in service.volumes.items() %}
+      - {{ volume_source }}:{{ volume_data["bind"] }}
     {% endfor -%}
     {% endif %}
     healthcheck:
@@ -42,6 +42,7 @@ services:
     {% endif %}
 {% endfor %}
 
+
 {%- if docker_volumes %}
 volumes:
   {%- for name, external in docker_volumes.items() %}
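
The compose template now iterates over the parsed volume mapping instead of the raw strings. As a rough illustration of what the loop emits, the sketch below builds the same `source:bind` entries in plain Python. `compose_volume_entries` is a hypothetical function; like the template shown in this diff, it uses only the `bind` key, so the `mode` value from the parsed mapping does not appear in the output.

```python
from __future__ import annotations


def compose_volume_entries(parsed_volumes: dict[str, dict[str, str]]) -> list[str]:
    """Build 'source:target' strings, mirroring the template loop above."""
    return [f"{source}:{data['bind']}" for source, data in parsed_volumes.items()]


entries = compose_volume_entries(
    {
        "/home/user/data": {"bind": "/tesseract/input_data", "mode": "ro"},
        "results": {"bind": "/tesseract/output_data", "mode": "rw"},
    }
)
print(entries)
```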

tests/conftest.py

Lines changed: 4 additions & 6 deletions
@@ -210,15 +210,12 @@ def docker_client():
 @pytest.fixture
 def docker_volume(docker_client):
     # Create the Docker volume
-    volume = docker_client.volumes.create(name="docker_client_test_volume")
+    volume_name = f"test_volume_{''.join(random.choices(string.ascii_lowercase + string.digits, k=8))}"
+    volume = docker_client.volumes.create(name=volume_name)
     try:
         yield volume
     finally:
-        try:
-            volume.remove()
-        except Exception:
-            # already removed
-            pass
+        volume.remove(force=True)
 
 
 @pytest.fixture(scope="module")
@@ -449,6 +446,7 @@ def exists(project_id: str) -> bool:
 
     mock_instance = MockedDocker()
     monkeypatch.setattr(engine, "docker_client", mock_instance)
+    monkeypatch.setattr(engine, "is_podman", lambda: False)
     monkeypatch.setattr(
         tesseract_core.sdk.docker_client, "CLIDockerClient", MockedDocker
     )
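
The fixture change replaces a fixed volume name with a randomized one, presumably so that leftover volumes or parallel test runs do not collide. The naming scheme can be sketched on its own as follows; `random_volume_name` is a hypothetical helper that reuses the prefix, alphabet, and suffix length from the fixture.

```python
import random
import string

ALPHABET = string.ascii_lowercase + string.digits


def random_volume_name(prefix: str = "test_volume_", length: int = 8) -> str:
    """Return the prefix followed by `length` random lowercase letters or digits."""
    return prefix + "".join(random.choices(ALPHABET, k=length))


name = random_volume_name()
print(name)
```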
