mirror of
https://github.com/skindhu/Build-A-Large-Language-Model-CN.git
synced 2026-07-01 01:10:17 +08:00
Merge pull request #14 from voltage-poppy/patch-1
typo fix: <unk> corrected to <|unk|>
@@ -346,7 +346,7 @@ KeyError: 'Hello'
<img src="../Image/chapter2/figure2.10.png" width="75%" />
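-Now, let's modify the vocabulary to include these two special tokens, <unk> and <|endoftext|>, by adding them to the list of unique words we created in the previous section:
+Now, let's modify the vocabulary to include these two special tokens, <|unk|> and <|endoftext|>, by adding them to the list of unique words we created in the previous section: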
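For illustration only (this block is not part of the commit), here is a minimal sketch of what the corrected sentence describes. The sample text, the splitting regex, and the `encode` helper are assumptions made for this example; only `preprocessed`, `all_tokens`, and the two special token strings come from the diff itself.

```python
import re

# Hypothetical sample text standing in for the chapter's training corpus.
raw_text = "In the sunlit terraces, do you like tea?"

# Assumed tokenization rule: split on punctuation and whitespace,
# keep the punctuation, and drop empty or whitespace-only pieces.
SPLIT_PATTERN = r'([,.:;?_!"()\']|--|\s)'


def split_text(text):
    return [piece.strip() for piece in re.split(SPLIT_PATTERN, text) if piece.strip()]


preprocessed = split_text(raw_text)

# Build the sorted list of unique tokens, then append the two special tokens.
all_tokens = sorted(set(preprocessed))
all_tokens.extend(["<|endoftext|>", "<|unk|>"])

# Map each token string to an integer ID.
vocab = {token: token_id for token_id, token in enumerate(all_tokens)}


def encode(text, vocab):
    """Hypothetical helper: map unknown words to the <|unk|> ID instead of raising KeyError."""
    unk_id = vocab["<|unk|>"]
    return [vocab.get(token, unk_id) for token in split_text(text)]


ids = encode("Hello, do you like tea?", vocab)
print(ids)
```

Because the key is spelled `<|unk|>` in the vocabulary, a lookup for `<unk>` would fail, which is the mismatch this typo fix removes from the prose.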
```python
all_tokens = sorted(list(set(preprocessed)))