Simplify and relax dependencies #456
Conversation
…plify-dependencies
- Remove `mpi4py` and `einops`
- Update `numpy`, `wandb`, `transformers`, `lm_dataformat` and `ftfy`
- Revert `lm_dataformat` version back down to 0.0.19
- Update requirements files to specify ranges of package versions
- Make `wandb` optional
Stella athena patch 1 1
Restores inoperability with pre-Volta hardware.
This reverts commit 21ba55c.
This looks good to me, and @EricHallahan’s explanation of how he decided which imports go inside the try block makes sense. I haven’t run this myself, as I’m under the impression that you’ve been testing it extensively.
Aside from the above ^ lgtm 🚀
@EricHallahan is there something you’re waiting on to merge this?
I am working on verifying that the behavior with regards to missing Weights & Biases dependencies is what I intended/makes sense, and I also need to add corresponding documentation. I expect to have this ready to merge sometime early tomorrow.
Eliminate the usage of `shortuuid`
- Fix CITATION.cff
- Update README.md to reflect changes to wandb installation
- Remove `shortuuid` from requirements-wandb.txt
…/gpt-neox into simplify-dependencies
Enable wandb by default for the EleutherAI cluster.
@@ -22,14 +22,9 @@ def _get_cuda_bare_metal_version(cuda_dir):
srcpath = Path(__file__).parent.absolute()
cc_flag = []
_, bare_metal_major, _ = _get_cuda_bare_metal_version(cpp_extension.CUDA_HOME)
if int(bare_metal_major) >= 11:
can you explain what this change is doing?
This code forces building for `sm_70` (Volta) and `sm_80` (Ampere) and no other architectures, which means the built extensions will fail to execute on older architectures like Kepler, Maxwell, or Pascal. By default `CUDAExtension` builds for the local hardware, so it is safe to remove this code. Should compiling for non-local hardware be needed, it is as easy as setting `TORCH_CUDA_ARCH_LIST` in the environment prior to running the script.
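For anyone who does need to target non-local hardware, here is a minimal sketch of what that could look like; the architecture list and the build invocation below are illustrative assumptions, not project guidance:

```python
# Sketch: force torch.utils.cpp_extension to emit code for specific CUDA
# architectures instead of autodetecting the local GPU. The list below
# (Pascal 6.1, Volta 7.0, Ampere 8.0) is only an example.
import os
import subprocess
import sys

os.environ["TORCH_CUDA_ARCH_LIST"] = "6.1;7.0;8.0"
# Assumes this is run from the directory containing the extension's setup.py.
subprocess.run([sys.executable, "setup.py", "install"], check=True)
```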
Ok, sounds good. Did you test that the kernels built with this change
- still work the same on the A100s, and
- work on older architectures?

Re: the `TORCH_CUDA_ARCH_LIST` thing, should we add a comment saying as much?
I have tested it many times and have not seen any issues, and yes, we should absolutely add documentation about what to do if you need to force the arch.
try:
    import wandb
except ModuleNotFoundError:
    pass
Will wandb not being present cause some errors later in training?
If `wandb` is not installed it sets `use_wandb` to `False`. All subsequent Weights & Biases code relies on `use_wandb` being `True`, and therefore it never executes anything imported from `wandb`.
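A minimal sketch of this pattern, assuming a module-level `use_wandb` flag and a hypothetical `log_metrics` helper; this is an illustration, not the actual gpt-neox code:

```python
# Illustration only: optional wandb import with a guard flag.
try:
    import wandb
except ModuleNotFoundError:
    wandb = None

use_wandb = wandb is not None  # disabled automatically when wandb is missing


def log_metrics(metrics: dict, step: int) -> None:
    # Every Weights & Biases call is gated on use_wandb, so nothing
    # imported from wandb executes when the dependency is absent.
    if use_wandb:
        wandb.log(metrics, step=step)
```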
I'm not sure of the motivation for the change to `fused_kernels.py`.
Also, I really don't like the requirements being so granular like this. I don't see the need for separate requirements files for wandb / tensorboard.
If someone doesn't have WandB available and doesn't wish to use it, how would you prefer they proceed?
The reason sparse attention and onebitadam were separated out is because they're optional dependencies which are also a bit of a pain to install (cupy-cuda requires you to specify the CUDA version, and triton used to break often), so removing them from requirements.txt reduced complexity for most users. As far as I'm aware, there are no such problems with the installation of wandb: you can just pip install it. Including it in requirements.txt does nothing more than take up a few kB more space on the user's device. Including it in a separate file may mean someone has to take a few minutes to figure out why their logging isn't working, and go back and realise that it's actually not in requirements.txt and needs to be installed separately. I can't see a counter scenario where it would actually save time / decrease complexity. I don't know why / how
The only reason I separated it out is that it mirrored how TensorBoard was handled. If we don't think that makes sense, I'm happy to change it. A counterargument to instructing users to "just install
Maybe too much for this PR, but related: any reason not to use pip-compile to create the requirements files and actually fix dependencies?
I had originally planned this PR with a larger scope which included more granular dependency management (such as only installing the dependencies for evaluation if the user specifies they are interested in evaluation) and a
Closing as it’s unfixably far behind and better done from scratch.
The requirements files specified in ./requirements have historically been strict so as to prevent CI Docker images from changing without our prior knowledge. However, this places a burden on users who would like to run GPT-NeoX on a host without containerization, and many non-critical packages are unnecessarily strictly specified as well. This motivates the following changes in this PR:
- Remove the `einops` and `mpi4py` packages
- Make `wandb` optional, making installation accessible through a new requirements file ./requirements/requirements-wandb.txt