一、引言
pipeline(管道)是huggingface transformers库中一种极简方式使用大模型推理的抽象,将所有大模型分为语音(Audio)、计算机视觉(Computer vision)、自然语言处理(NLP)、多模态(Multimodal)等4大类,28小类任务(tasks)。共计覆盖32万个模型
本文对pipeline进行整体介绍,之后本专栏以每个task为主题,分别介绍各种task使用方法。
二、pipeline库
2.1 概述
管道是一种使用模型进行推理的简单而好用的方法。这些管道是从库中抽象出大部分复杂代码的对象,提供了专用于多项任务的简单 API,包括命名实体识别、掩码语言建模、情感分析、特征提取和问答
。在使用上,主要有2种方法
- 使用task实例化pipeline对象
- 使用model实例化pipeline对象
2.2 使用task实例化pipeline对象
2.2.1 基于task实例化“自动语音识别”
自动语音识别的task为automatic-speech-recognition:
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
from transformers import pipeline
speech_file = "./output_video_enhanced.mp3"
pipe = pipeline(task="automatic-speech-recognition")
result = pipe(speech_file)
print(result)
2.2.2 task列表
task共计28类,按首字母排序,列表如下,直接替换2.2.1代码中的pipeline的task即可应用:
"audio-classification"
:将返回一个AudioClassificationPipeline。"automatic-speech-recognition"
:将返回一个AutomaticSpeechRecognitionPipeline。"depth-estimation"
:将返回一个DepthEstimationPipeline。"document-question-answering"
:将返回一个DocumentQuestionAnsweringPipeline。"feature-extraction"
:将返回一个FeatureExtractionPipeline。"fill-mask"
:将返回一个FillMaskPipeline:。"image-classification"
:将返回一个ImageClassificationPipeline。"image-feature-extraction"
:将返回一个ImageFeatureExtractionPipeline。"image-segmentation"
:将返回一个ImageSegmentationPipeline。"image-to-image"
:将返回一个ImageToImagePipeline。"image-to-text"
:将返回一个ImageToTextPipeline。"mask-generation"
:将返回一个MaskGenerationPipeline。"object-detection"
:将返回一个ObjectDetectionPipeline。"question-answering"
:将返回一个QuestionAnsweringPipeline。"summarization"
:将返回一个SummarizationPipeline。"table-question-answering"
:将返回一个TableQuestionAnsweringPipeline。"text2text-generation"
:将返回一个Text2TextGenerationPipeline。"text-classification"
("sentiment-analysis"
可用别名):将返回一个 TextClassificationPipeline。"text-generation"
:将返回一个TextGenerationPipeline:。"text-to-audio"
("text-to-speech"
可用别名):将返回一个TextToAudioPipeline:。"token-classification"
("ner"
可用别名):将返回一个TokenClassificationPipeline。"translation"
:将返回一个TranslationPipeline。"translation_xx_to_yy"
:将返回一个TranslationPipeline。"video-classification"
:将返回一个VideoClassificationPipeline。"visual-question-answering"
:将返回一个VisualQuestionAnsweringPipeline。"zero-shot-classification"
:将返回一个ZeroShotClassificationPipeline。"zero-shot-image-classification"
:将返回一个ZeroShotImageClassificationPipeline。"zero-shot-audio-classification"
:将返回一个ZeroShotAudioClassificationPipeline。"zero-shot-object-detection"
:将返回一个ZeroShotObjectDetectionPipeline。
2.2.3 task默认模型
针对每一个task,pipeline默认配置了模型,可以通过pipeline源代码查看:
SUPPORTED_TASKS = {
"audio-classification": {
"impl": AudioClassificationPipeline,
"tf": (),
"pt": (AutoModelForAudioClassification,) if is_torch_available() else (),
"default": {"model": {"pt": ("superb/wav2vec2-base-superb-ks", "372e048")}},
"type": "audio",
},
"automatic-speech-recognition": {
"impl": AutomaticSpeechRecognitionPipeline,
"tf": (),
"pt": (AutoModelForCTC, AutoModelForSpeechSeq2Seq) if is_torch_available() else (),
"default": {"model": {"pt": ("facebook/wav2vec2-base-960h", "55bb623")}},
"type": "multimodal",
},
"text-to-audio": {
"impl": TextToAudioPipeline,
"tf": (),
"pt": (AutoModelForTextToWaveform, AutoModelForTextToSpectrogram) if is_torch_available() else (),
"default": {"model": {"pt": ("suno/bark-small", "645cfba")}},
"type": "text",
},
"feature-extraction": {
"impl": FeatureExtractionPipeline,
"tf": (TFAutoModel,) if is_tf_available() else (),
"pt": (AutoModel,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("distilbert/distilbert-base-cased", "935ac13"),
"tf": ("distilbert/distilbert-base-cased", "935ac13"),
}
},
"type": "multimodal",
},
"text-classification": {
"impl": TextClassificationPipeline,
"tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
"pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
"tf": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
},
},
"type": "text",
},
"token-classification": {
"impl": TokenClassificationPipeline,
"tf": (TFAutoModelForTokenClassification,) if is_tf_available() else (),
"pt": (AutoModelForTokenClassification,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
"tf": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
},
},
"type": "text",
},
"question-answering": {
"impl": QuestionAnsweringPipeline,
"tf": (TFAutoModelForQuestionAnswering,) if is_tf_available() else (),
"pt": (AutoModelForQuestionAnswering,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
"tf": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
},
},
"type": "text",
},
"table-question-answering": {
"impl": TableQuestionAnsweringPipeline,
"pt": (AutoModelForTableQuestionAnswering,) if is_torch_available() else (),
"tf": (TFAutoModelForTableQuestionAnswering,) if is_tf_available() else (),
"default": {
"model": {
"pt": ("google/tapas-base-finetuned-wtq", "69ceee2"),
"tf": ("google/tapas-base-finetuned-wtq", "69ceee2"),
},
},
"type": "text",
},
"visual-question-answering": {
"impl": VisualQuestionAnsweringPipeline,
"pt": (AutoModelForVisualQuestionAnswering,) if is_torch_available() else (),
"tf": (),
"default": {
"model": {"pt": ("dandelin/vilt-b32-finetuned-vqa", "4355f59")},
},
"type": "multimodal",
},
"document-question-answering": {
"impl": DocumentQuestionAnsweringPipeline,
"pt": (AutoModelForDocumentQuestionAnswering,) if is_torch_available() else (),
"tf": (),
"default": {
"model": {"pt": ("impira/layoutlm-document-qa", "52e01b3")},
},
"type": "multimodal",
},
"fill-mask": {
"impl": FillMaskPipeline,
"tf": (TFAutoModelForMaskedLM,) if is_tf_available() else (),
"pt": (AutoModelForMaskedLM,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("distilbert/distilroberta-base", "ec58a5b"),
"tf": ("distilbert/distilroberta-base", "ec58a5b"),
}
},
"type": "text",
},
"summarization": {
"impl": SummarizationPipeline,
"tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
"pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
"default": {
"model": {"pt": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"), "tf": ("google-t5/t5-small", "d769bba")}
},
"type": "text",
},
# This task is a special case as it's parametrized by SRC, TGT languages.
"translation": {
"impl": TranslationPipeline,
"tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
"pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
"default": {
("en", "fr"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
("en", "de"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
("en", "ro"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
},
"type": "text",
},
"text2text-generation": {
"impl": Text2TextGenerationPipeline,
"tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
"pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
"default": {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
"type": "text",
},
"text-generation": {
"impl": TextGenerationPipeline,
"tf": (TFAutoModelForCausalLM,) if is_tf_available() else (),
"pt": (AutoModelForCausalLM,) if is_torch_available() else (),
"default": {"model": {"pt": ("openai-community/gpt2", "6c0e608"), "tf": ("openai-community/gpt2", "6c0e608")}},
"type": "text",
},
"zero-shot-classification": {
"impl": ZeroShotClassificationPipeline,
"tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
"pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("facebook/bart-large-mnli", "c626438"),
"tf": ("FacebookAI/roberta-large-mnli", "130fb28"),
},
"config": {
"pt": ("facebook/bart-large-mnli", "c626438"),
"tf": ("FacebookAI/roberta-large-mnli", "130fb28"),
},
},
"type": "text",
},
"zero-shot-image-classification": {
"impl": ZeroShotImageClassificationPipeline,
"tf": (TFAutoModelForZeroShotImageClassification,) if is_tf_available() else (),
"pt": (AutoModelForZeroShotImageClassification,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("openai/clip-vit-base-patch32", "f4881ba"),
"tf": ("openai/clip-vit-base-patch32", "f4881ba"),
}
},
"type": "multimodal",
},
"zero-shot-audio-classification": {
"impl": ZeroShotAudioClassificationPipeline,
"tf": (),
"pt": (AutoModel,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("laion/clap-htsat-fused", "973b6e5"),
}
},
"type": "multimodal",
},
"image-classification": {
"impl": ImageClassificationPipeline,
"tf": (TFAutoModelForImageClassification,) if is_tf_available() else (),
"pt": (AutoModelForImageClassification,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("google/vit-base-patch16-224", "5dca96d"),
"tf": ("google/vit-base-patch16-224", "5dca96d"),
}
},
"type": "image",
},
"image-feature-extraction": {
"impl": ImageFeatureExtractionPipeline,
"tf": (TFAutoModel,) if is_tf_available() else (),
"pt": (AutoModel,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("google/vit-base-patch16-224", "3f49326"),
"tf": ("google/vit-base-patch16-224", "3f49326"),
}
},
"type": "image",
},
"image-segmentation": {
"impl": ImageSegmentationPipeline,
"tf": (),
"pt": (AutoModelForImageSegmentation, AutoModelForSemanticSegmentation) if is_torch_available() else (),
"default": {"model": {"pt": ("facebook/detr-resnet-50-panoptic", "fc15262")}},
"type": "multimodal",
},
"image-to-text": {
"impl": ImageToTextPipeline,
"tf": (TFAutoModelForVision2Seq,) if is_tf_available() else (),
"pt": (AutoModelForVision2Seq,) if is_torch_available() else (),
"default": {
"model": {
"pt": ("ydshieh/vit-gpt2-coco-en", "65636df"),
"tf": ("ydshieh/vit-gpt2-coco-en", "65636df"),
}
},
"type": "multimodal",
},
"object-detection": {
"impl": ObjectDetectionPipeline,
"tf": (),
"pt": (AutoModelForObjectDetection,) if is_torch_available() else (),
"default": {"model": {"pt": ("facebook/detr-resnet-50", "2729413")}},
"type": "multimodal",
},
"zero-shot-object-detection": {
"impl": ZeroShotObjectDetectionPipeline,
"tf": (),
"pt": (AutoModelForZeroShotObjectDetection,) if is_torch_available() else (),
"default": {"model": {"pt": ("google/owlvit-base-patch32", "17740e1")}},
"type": "multimodal",
},
"depth-estimation": {
"impl": DepthEstimationPipeline,
"tf": (),
"pt": (AutoModelForDepthEstimation,) if is_torch_available() else (),
"default": {"model": {"pt": ("Intel/dpt-large", "e93beec")}},
"type": "image",
},
"video-classification": {
"impl": VideoClassificationPipeline,
"tf": (),
"pt": (AutoModelForVideoClassification,) if is_torch_available() else (),
"default": {"model": {"pt": ("MCG-NJU/videomae-base-finetuned-kinetics", "4800870")}},
"type": "video",
},
"mask-generation": {
"impl": MaskGenerationPipeline,
"tf": (),
"pt": (AutoModelForMaskGeneration,) if is_torch_available() else (),
"default": {"model": {"pt": ("facebook/sam-vit-huge", "997b15")}},
"type": "multimodal",
},
"image-to-image": {
"impl": ImageToImagePipeline,
"tf": (),
"pt": (AutoModelForImageToImage,) if is_torch_available() else (),
"default": {"model": {"pt": ("caidas/swin2SR-classical-sr-x2-64", "4aaedcb")}},
"type": "image",
},
}
2.3 使用model实例化pipeline对象
2.3.1 基于model实例化“自动语音识别”
如果不想使用task中默认的模型,可以指定huggingface中的模型:
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
from transformers import pipeline
speech_file = "./output_video_enhanced.mp3"
#transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-medium")
pipe = pipeline(model="openai/whisper-medium")
result = pipe(speech_file)
print(result)
2.3.2 查看model与task的对应关系
可以登录https://huggingface.co/tasks查看
三、总结
本文为transformers之pipeline专栏的第0篇,后面会以每个task为一篇,共计讲述28+个tasks的用法,通过28个tasks的pipeline使用学习,可以掌握语音、计算机视觉、自然语言处理、多模态乃至强化学习等30w+个huggingface上的开源大模型。让你成为大模型领域的专家!
如何学习大模型技术,享受AI红利?
面对AI大模型开发领域的复杂与深入,精准学习显得尤为重要。一份系统的技术路线图,详尽的全套学习资料,不仅能够帮助开发者清晰地了解从入门到精通所需掌握的知识点,还能提供一条高效、有序的学习路径。
无论是初学者,还是希望在某一细分领域深入发展的资深开发者,这样的学习路线图都能够起到事半功倍的效果。它不仅能够节省大量时间,避免无效学习,更能帮助开发者建立系统的知识体系,为职业生涯的长远发展奠定坚实的基础。
这份完整版的AI大模型全套学习资料已经上传CSDN,朋友们如果需要可以微信扫描下方CSDN官方认证二维码免费领取【保证100%免费】
大模型知识脑图
为了成为更好的 AI大模型 开发者,这里为大家提供了总的路线图。它的用处就在于,你可以按照上面的知识点去找对应的学习资源,保证自己学得较为全面。
经典书籍阅读
阅读AI大模型经典书籍可以帮助读者提高技术水平,开拓视野,掌握核心技术,提高解决问题的能力,同时也可以借鉴他人的经验。对于想要深入学习AI大模型开发的读者来说,阅读经典书籍是非常有必要的。
实战案例
光学理论是没用的,要学会跟着一起敲,要动手实操,才能将自己的所学运用到实际当中去,这时候可以搞点实战案例来学习。
面试资料
我们学习AI大模型必然是想找到高薪的工作,下面这些面试题都是总结当前最新、最热、最高频的面试题,并且每道题都有详细的答案,面试前刷完这套面试题资料,小小offer,不在话下
640套AI大模型报告合集
这套包含640份报告的合集,涵盖了AI大模型的理论研究、技术实现、行业应用等多个方面。无论您是科研人员、工程师,还是对AI大模型感兴趣的爱好者,这套报告合集都将为您提供宝贵的信息和启示。
这份完整版的AI大模型全套学习资料已经上传CSDN,朋友们如果需要可以微信扫描下方CSDN官方认证二维码免费领取【保证100%免费】