winglian committed
Commit a21935f
Parent: 8966a6f

add to docs (#703)

Files changed (2):
  1. README.md +2 -0
  2. docs/faq.md +14 -0
README.md CHANGED
@@ -901,6 +901,8 @@ CUDA_VISIBLE_DEVICES="" python3 -m axolotl.cli.merge_lora ...
 
 ## Common Errors 🧰
 
+See also the [FAQs](./docs/faq.md).
+
 > If you encounter a 'Cuda out of memory' error, it means your GPU ran out of memory during the training process. Here's how to resolve it:
 
 Please reduce any below
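The truncated "Please reduce any below" context refers to memory-related training settings. As a hedged sketch (key names follow common axolotl config conventions; the values are purely illustrative, not recommendations from this commit), lowering settings like these reduces GPU memory pressure:

```yaml
# Illustrative values only -- actual keys and values depend on your axolotl config.
micro_batch_size: 1             # fewer samples per GPU per step
gradient_accumulation_steps: 4  # preserve effective batch size via accumulation
sequence_len: 1024              # shorter sequences need less activation memory
```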
docs/faq.md ADDED
@@ -0,0 +1,14 @@
+# Axolotl FAQs
+
+
+> The trainer stopped and hasn't progressed in several minutes.
+
+This is usually an issue with the GPUs communicating with each other. See the [NCCL doc](../docs/nccl.md).
+
+> Exitcode -9
+
+This usually happens when you run out of system RAM.
+
+> Exitcode -7 while using deepspeed
+
+Try upgrading deepspeed with: `pip install -U deepspeed`
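The "Exitcode -9" entry can be made concrete: -9 is how Python-based launchers report a child process killed by SIGKILL, which on Linux is typically sent by the kernel OOM killer when system RAM is exhausted. A minimal sketch (not axolotl code) showing where the -9 comes from:

```python
import signal
import subprocess
import sys

# Start a child process that would otherwise sleep for a minute.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Simulate what the Linux OOM killer does: deliver SIGKILL (signal 9).
proc.send_signal(signal.SIGKILL)
proc.wait()

# subprocess reports death-by-signal as a negative return code on POSIX.
print(proc.returncode)  # -9
```

So when a trainer exits with -9, the process did not crash on its own; the OS killed it, and freeing system RAM (or adding swap) is the usual fix.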