Abandoning Softmax: the first large-scale linear attention Transformer model, with 175 billion parameters and better speed and accuracy
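The article body did not survive here, but the title names the core technique: replacing softmax attention with linear attention. As a point of reference only, below is a minimal sketch of generic (non-causal) linear attention in PyTorch, using the ELU+1 feature map from Katharopoulos et al. (2020). Everything in it is an assumption for illustration; it is not the implementation of the model the article describes. The key idea is that dropping the softmax lets the attention product be regrouped as phi(Q) (phi(K)^T V), which costs O(n) in sequence length instead of O(n^2).

```python
# Illustrative sketch of linear attention -- NOT the article's model.
# Softmax attention: softmax(Q K^T / sqrt(d)) V, O(n^2) in sequence length n.
# Linear attention: phi(Q) (phi(K)^T V), O(n) by associativity of matmul.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """q, k, v: (batch, heads, seq_len, dim). Returns same shape as v."""
    # ELU + 1 is one common positive feature map (Katharopoulos et al., 2020);
    # the feature map of the model in the article may well differ.
    q = F.elu(q) + 1.0
    k = F.elu(k) + 1.0
    # phi(K)^T V: sum over sequence positions -> (batch, heads, dim, dim)
    kv = torch.einsum("bhnd,bhne->bhde", k, v)
    # Per-query normalizer: phi(Q) . sum_n phi(K_n) -> (batch, heads, seq_len)
    z = torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps
    # phi(Q) (phi(K)^T V), normalized per query
    return torch.einsum("bhnd,bhde->bhne", q, kv) / z.unsqueeze(-1)

if __name__ == "__main__":
    q, k, v = (torch.randn(2, 4, 128, 64) for _ in range(3))
    print(linear_attention(q, k, v).shape)  # torch.Size([2, 4, 128, 64])
```

Note that the `kv` term is a fixed (dim x dim) summary of the whole sequence, which is why memory and compute no longer grow quadratically with sequence length.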



Origin: blog.csdn.net/qq_27590277/article/details/131989985