Skip to content

Conversation

@VeeraRajasekhar
Copy link
Contributor

Description

Added the dockerfile, which can be used to create the ci-artifactory images.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Added a new file docker/Dockerfile

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Copy link
Collaborator

@wenchenvincent wenchenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the comments.

RUN apt update \
&& apt install -y nano wget ninja-build \
&& apt install -y python3 python3-pip git \
&& apt install -y sqlite3 libsqlite3-dev libfmt-dev libmsgpack-dev libsuitesparse-dev
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are those packages (sqlite3 and further) for?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall aotriton need sqlite3 long time ago, not sure about now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have derived this dockerfile from this https://github.com/ROCm/DeepLearningModels/blob/main/docker/pyt_semianalysis_models.ubuntu.amd.Dockerfile,

Will remove these, if these are not required.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@VeeraRajasekhar Those are needed for the semianalysis models. We don't need them for the CI images. We only need to keep those packages necessary for building TE and running CI tests.


RUN python3 -m pip install --upgrade pip
RUN pip install ninja cmake setuptools wheel
RUN pip install uv tabulate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are those for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is derived from https://github.com/ROCm/DeepLearningModels/blob/main/docker/pyt_semianalysis_models.ubuntu.amd.Dockerfile, I have removed these, will test the resultant docker after my changes

RUN apt update \
&& apt install -y nano wget ninja-build \
&& apt install -y python3 python3-pip git \
&& apt install -y sqlite3 libsqlite3-dev libfmt-dev libmsgpack-dev libsuitesparse-dev
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall aotriton need sqlite3 long time ago, not sure about now

Copy link
Collaborator

@ipanfilo ipanfilo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why conversations are marked as resolved w/o any actual action?

@VeeraRajasekhar
Copy link
Contributor Author

Why conversations are marked as resolved w/o any actual action?

Some of them, I have resolved, some I have currently resolved in my local, just to keep track I will mark them resolved.

@wenchenvincent
Copy link
Collaborator

@VeeraRajasekhar Is this PR still needed?

@wenchenvincent
Copy link
Collaborator

@VeeraRajasekhar Could you remind me of what we had decided on this PR? It seemed that it is no longer relevant and we should close it.

@VeeraRajasekhar
Copy link
Contributor Author

Hi @ipanfilo, @wangye805

I have updated this PR with latest 7.2 docker file and moved to .github/scripts.

Let me know if I need to add an action to automate docker build and upload to our artifactory?

Thanks.

RUN pip install ipython pytest fire pydantic pybind11 ninja pandas
RUN apt-get update && apt-get install -y vim

ARG PYTORCH_ROCM_ARCH=gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1200;gfx1201;gfx950;gfx1151
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this arg needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed, when I manually installed torch I mentioned this, I can remove this now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@@ -0,0 +1,27 @@
# TE CI Dockerfile
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put copyright here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed


ARG PYTORCH_ROCM_ARCH=gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1200;gfx1201;gfx950;gfx1151

# Install flash-attention v2.8.1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better make FA version arg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated



RUN pip install \
https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/jax_rocm7_pjrt-0.8.0%2Brocm7.2.0-py3-none-manylinux_2_28_x86_64.whl \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since base image is ARG, JAX wheels should also be built based on ARG cause their ROCm and Python versions should match. So in general there should be ARGs for ROCm version, base image, JAX version, JAX wheels

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@VeeraRajasekhar
Copy link
Contributor Author

I had to force push to include new FA 2.8.3 support commit and my changes for 7.2 support to run the CI.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants