Commit 39acd5f

updated quick_docs
Signed-off-by: eplatero <[email protected]>
1 parent 58458c0 commit 39acd5f

File tree

1 file changed

+1
-1
lines changed

docs/source/quick_start.md

Lines changed: 1 addition & 1 deletion
@@ -151,7 +151,7 @@ qeff_model.generate(prompts=["My name is"])
 End to End demo examples for various models are available in **notebooks** directory. Please check them out.

 ### Draft-Based Speculative Decoding
-Draft-based speculative decoding is the approach where a small Draft Language Model (DLM) makes `num_speculative_tokens` autoregressive speculations ahead of the Target Language Model (TLM). The objective is to predict what the TLM would have predicted if it would have been used instead of the DLM. This approach is beneficial when the autoregressive decode phase of the TLM is memory bound and thus, we can leverage the extra computing resources of our hardware by batching the speculations of the DLM as an input to TLM to validate the speculations.
+Draft-based speculative decoding is a technique where a small Draft Language Model (DLM) makes `num_speculative_tokens` autoregressive speculations ahead of the Target Language Model (TLM). The objective is to predict what the TLM would have predicted if it would have been used instead of the DLM. This approach is beneficial when the autoregressive decode phase of the TLM is memory bound and thus, we can leverage the extra computing resources of our hardware by batching the speculations of the DLM as an input to TLM to validate the speculations.

 To export and compile both DLM/TLM, add corresponding `is_tlm` and `num_speculative_tokens` for TLM and export DLM as you would any other QEfficient LLM model:
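For readers of this change, the draft-then-verify loop that the edited paragraph describes can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not QEfficient's implementation: the "models" are hypothetical callables that map a token prefix to one greedy next token, the names `speculative_step`, `generate`, `toy_tlm` and `toy_dlm` are invented for this sketch, and acceptance uses simple greedy matching rather than a sampling-based rule. Only `num_speculative_tokens` comes from the documentation itself.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: token prefix -> greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(dlm: NextToken, tlm: NextToken,
                     prefix: List[int], num_speculative_tokens: int) -> List[int]:
    """Run one draft-then-verify iteration and return the accepted tokens."""
    # 1. The DLM speculates autoregressively, one token at a time.
    draft: List[int] = []
    for _ in range(num_speculative_tokens):
        draft.append(dlm(prefix + draft))

    # 2. The TLM predicts the next token at every draft position. On real
    #    hardware this would be one batched forward pass; a loop is used
    #    here only for clarity.
    targets = [tlm(prefix + draft[:i])
               for i in range(num_speculative_tokens + 1)]

    # 3. Keep draft tokens while they match the TLM, then append the TLM's
    #    own token: a correction on mismatch, a bonus token if all matched.
    accepted: List[int] = []
    for i, token in enumerate(draft):
        if token != targets[i]:
            break
        accepted.append(token)
    accepted.append(targets[len(accepted)])
    return accepted


def generate(dlm: NextToken, tlm: NextToken, prompt: List[int],
             max_new_tokens: int, num_speculative_tokens: int) -> List[int]:
    """Repeat speculative steps until max_new_tokens have been produced."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(dlm, tlm, out, num_speculative_tokens))
    return out[:len(prompt) + max_new_tokens]


# Toy models (assumptions for the demo): the TLM counts upward modulo 10,
# and the DLM agrees with it except right after token 4.
def toy_tlm(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def toy_dlm(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


tokens = generate(toy_dlm, toy_tlm, prompt=[0],
                  max_new_tokens=8, num_speculative_tokens=3)
```

With greedy matching, the output is identical to what the TLM alone would have produced; the gain is that each TLM verification pass can emit up to `num_speculative_tokens + 1` tokens instead of one.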
