Download Transformers Models Without Garbled Filenames (Hugging Face models)

Overview

Purpose: we often need to inspect or move pretrained model files, so we want readable (non-garbled) filenames and want to avoid re-downloading the same model again and again. There are four options:

  • a. (avoids garbled names) use snapshot_download from huggingface_hub (recommended);
  • b. (no garbled names) download manually with wget;
  • c. use git lfs;
  • d. use files already downloaded locally.

1. (Avoids Garbled Names) Use snapshot_download from huggingface_hub

Setting local_dir_use_symlinks=False is what keeps the filenames readable: real files are written to local_dir instead of symlinks into the hashed cache.

from huggingface_hub import snapshot_download

# repo_id = "ziqingyang/chinese-alpaca-lora-7b"
repo_id = "nghuyong/ernie-3.0-micro-zh"
local_dir = repo_id.replace("/", "_")
cache_dir = local_dir + "/cache"
snapshot_download(cache_dir=cache_dir,
                  local_dir=local_dir,
                  repo_id=repo_id,
                  # False writes real files instead of the hashed-cache
                  # symlinks; the default "auto" duplicates small files
                  # (<5MB) in local_dir and symlinks the bigger ones.
                  local_dir_use_symlinks=False,
                  resume_download=True,  # pick up interrupted downloads
                  allow_patterns=["*.model", "*.json", "*.bin",
                                  "*.py", "*.md", "*.txt"],
                  ignore_patterns=["*.safetensors", "*.msgpack",
                                   "*.h5", "*.ot"],
                  )
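After the call finishes, local_dir contains plain files, so it can be loaded directly as a local path with no network access. A quick check, assuming the snapshot_download call above has run:

from transformers import BertTokenizer, BertModel

# local_dir is the plain-file directory written by snapshot_download
# above, e.g. "nghuyong_ernie-3.0-micro-zh".
local_dir = "nghuyong_ernie-3.0-micro-zh"
tokenizer = BertTokenizer.from_pretrained(local_dir)
model = BertModel.from_pretrained(local_dir)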

2. (No Garbled Names) Download Manually with wget

The downside is that large-model weights are usually split into many files, and downloading this way can be slow.
Each model has a page at https://huggingface.co/{repo_id},
and each individual file can be fetched from https://huggingface.co/{repo_id}/resolve/main/{filename}.
Take repo_id = "THUDM/chatglm-6b" as an example.
Model page: https://huggingface.co/THUDM/chatglm-6b

On Linux, for example, you can use wget directly:
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/README.md
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/config.json
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/configuration_chatglm.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/tokenizer_config.json
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/ice_text.model
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/quantization.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/tokenization_chatglm.py
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/modeling_chatglm.py

wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model.bin.index.json

wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00001-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00002-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00003-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00004-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00005-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00006-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00007-of-00008.bin
wget https://huggingface.co/THUDM/chatglm-6b/resolve/main/pytorch_model-00008-of-00008.bin
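The same resolve/main URL pattern can also be scripted instead of typed by hand. A minimal Python sketch, assuming the requests package is installed; the file list here is illustrative, not the complete set:

import os
import requests

# Same URL pattern as the wget commands above:
# https://huggingface.co/{repo_id}/resolve/main/{filename}
repo_id = "THUDM/chatglm-6b"
filenames = ["config.json", "tokenizer_config.json"]  # extend as needed

out_dir = repo_id.replace("/", "_")
os.makedirs(out_dir, exist_ok=True)
for name in filenames:
    url = f"https://huggingface.co/{repo_id}/resolve/main/{name}"
    resp = requests.get(url, stream=True)  # stream to handle big files
    resp.raise_for_status()
    with open(os.path.join(out_dir, name), "wb") as f:
        for chunk in resp.iter_content(chunk_size=1 << 20):
            f.write(chunk)
    print("downloaded", name)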

3. Use git lfs

Initialize git lfs (after installing the git-lfs package): git lfs install
Download the model: git clone https://huggingface.co/THUDM/chatglm-6b
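One caveat: a plain clone downloads every LFS file. If you only need some of the weights, git lfs can skip the large files at clone time and pull them selectively afterwards, for example:

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm-6b
cd chatglm-6b
git lfs pull --include="*.bin"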

4. Use Already-Downloaded Local Files


Models that are already downloaded locally can be reused, and the model directory can also be moved elsewhere.
Default Windows location: C:\Users\{account}\.cache\huggingface\hub
Default Linux location: /home/{account}/.cache/huggingface/hub

from transformers import BertTokenizer, BertModel

repo_id = "nghuyong/ernie-3.0-micro-zh"
cache_dir = "{fill in the actual path}"  # point at the existing cache directory
tokenizer = BertTokenizer.from_pretrained(repo_id, cache_dir=cache_dir)
model = BertModel.from_pretrained(repo_id, cache_dir=cache_dir)
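If you have moved the model files out of the cache entirely, from_pretrained can also be pointed at the directory itself (hypothetical path shown):

from transformers import BertTokenizer, BertModel

# Load directly from a local directory containing config.json, the
# weights, and the tokenizer files; no repo_id lookup takes place.
local_dir = "./nghuyong_ernie-3.0-micro-zh"  # hypothetical location
tokenizer = BertTokenizer.from_pretrained(local_dir)
model = BertModel.from_pretrained(local_dir)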


Hope this helps!

Reposted from blog.csdn.net/rensihui/article/details/130135178