Adding regularization will often help to prevent the overfitting problem (the high-variance problem).
1. Logistic regression
Recall the optimization objective used during training:

$$\min_{w,b} J(w,b), \quad w \in \mathbb{R}^{n_x},\ b \in \mathbb{R} \tag{1-1}$$
where

$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) \tag{1-2}$$
L2 regularization (most commonly used):

$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) + \frac{\lambda}{2m}\left\|w\right\|_2^2 \tag{1-3}$$

where

$$\left\|w\right\|_2^2 = \sum_{j=1}^{n_x} w_j^2 = w^T w \tag{1-4}$$
Why do we regularize just the parameter w? Because w is usually a high-dimensional parameter vector while b is a scalar; almost all the parameters are in w rather than in b.
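A minimal numpy sketch of the L2-regularized logistic regression cost in (1-3)/(1-4); the function name `compute_cost_l2` and the assumed shapes of `X`, `y`, `w` are illustrative choices, not code from the notes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def compute_cost_l2(w, b, X, y, lambd):
    """Cross-entropy cost plus (lambda/2m) * ||w||_2^2, eq. (1-3)/(1-4).
    Assumed shapes: X (n_x, m), y (1, m), w (n_x, 1); b is a scalar."""
    m = X.shape[1]
    y_hat = sigmoid(np.dot(w.T, X) + b)                    # predictions, shape (1, m)
    cross_entropy = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m
    l2_penalty = (lambd / (2 * m)) * np.sum(np.square(w))  # only w is regularized, not b
    return cross_entropy + l2_penalty
```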
L1 regularization
$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) + \frac{\lambda}{m}\left\|w\right\|_1 \tag{1-5}$$

where

$$\left\|w\right\|_1 = \sum_{j=1}^{n_x} \left|w_j\right| \tag{1-6}$$
With L1 regularization, w will end up being sparse; in other words, the w vector will have a lot of zeros in it. This can help with compressing the model a little.
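As a rough illustration of the sparsity point, here is a hypothetical numpy snippet that computes the L1 penalty from (1-5)/(1-6) and measures how many entries of w are exactly zero; the values of `w`, `lambd`, and `m` are made up.

```python
import numpy as np

def l1_penalty(w, lambd, m):
    """(lambda/m) * ||w||_1, the L1 term in eq. (1-5)."""
    return (lambd / m) * np.sum(np.abs(w))

w = np.array([[0.0], [0.31], [0.0], [-0.07], [0.0]])  # hypothetical weights after L1-regularized training
sparsity = np.mean(w == 0)                            # fraction of weights that are exactly zero
print(l1_penalty(w, lambd=0.1, m=100), sparsity)      # penalty and sparsity (0.6 here)
```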
2. Neural network "Frobenius norm"
In a neural network, the regularized cost function is

$$J\left(w^{[1]}, b^{[1]}, \ldots, w^{[L]}, b^{[L]}\right) = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}\left(\hat{y}^{(i)}, y^{(i)}\right) + \frac{\lambda}{2m}\sum_{l=1}^{L} \left\|w^{[l]}\right\|_F^2 \tag{2-1}$$

where

$$\left\|w^{[l]}\right\|_F^2 = \sum_{i=1}^{n^{[l-1]}} \sum_{j=1}^{n^{[l]}} \left(w_{ij}^{[l]}\right)^2 \tag{2-2}$$
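A small sketch of how the Frobenius-norm penalty in (2-1)/(2-2) could be accumulated over all layers; the `parameters` dictionary keyed as `'W1'`, ..., `'WL'` is an assumed layout, not prescribed by the notes.

```python
import numpy as np

def frobenius_penalty(parameters, lambd, m, L):
    """(lambda/2m) * sum over layers of ||W[l]||_F^2, eq. (2-1)/(2-2)."""
    total = 0.0
    for l in range(1, L + 1):
        W = parameters['W' + str(l)]    # weight matrix of layer l, shape (n[l], n[l-1])
        total += np.sum(np.square(W))   # squared Frobenius norm of that layer
    return (lambd / (2 * m)) * total
```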
L2 regularization is also called weight decay:

$$\begin{aligned}
dw^{[l]} &= (\text{from backprop}) + \frac{\lambda}{m} w^{[l]} \\
w^{[l]} &:= w^{[l]} - \alpha \, dw^{[l]} = \left(1 - \frac{\alpha\lambda}{m}\right) w^{[l]} - \alpha \, (\text{from backprop})
\end{aligned} \tag{2-3}$$
This keeps the weights w from growing too large and thus helps avoid overfitting.
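A one-function sketch of the weight-decay update in (2-3); `dW_backprop` stands for the gradient of the unregularized cost, and the function name is made up for illustration.

```python
import numpy as np

def update_with_weight_decay(W, dW_backprop, alpha, lambd, m):
    """Gradient step with the L2 term added, eq. (2-3)."""
    dW = dW_backprop + (lambd / m) * W  # regularization adds (lambda/m) * W to the gradient
    return W - alpha * dW               # same as (1 - alpha*lambd/m) * W - alpha * dW_backprop
```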
3. Inverted dropout
For each training example, a different random subset of units can be dropped.

Inverted dropout (dropout must be applied in both the forward and backward passes):

$$\begin{aligned}
\text{d3} &= \text{np.random.rand(a3.shape[0], a3.shape[1])} < \text{keep.prob} \\
\text{a3} &= \text{np.multiply(a3, d3)} \quad \text{\# a3 * d3, element-wise multiplication} \\
\text{a3} &\mathrel{/=} \text{keep.prob} \quad \text{\# in order to not reduce the expected value of a3 (inverted dropout)} \\
z^{[4]} &= w^{[4]} a^{[3]} + b^{[4]} \\
z^{[4]} &\mathrel{/=} \text{keep.prob}
\end{aligned} \tag{3-1}$$
By dividing by keep.prob, the inverted dropout technique ensures that the expected value of a3 remains the same. This makes test time easier because there is less of a scaling problem: dropout is not used at test time.
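The steps of (3-1) wrapped into a reusable function, as a sketch; the function name, the generic activation matrix `a`, and the `training` flag are assumptions for illustration. At test time the activations pass through unchanged, matching the note above.

```python
import numpy as np

def dropout_forward(a, keep_prob, training=True):
    """Inverted dropout on an activation matrix a, as in eq. (3-1)."""
    if not training:
        return a                                # no dropout (and no scaling) at test time
    d = np.random.rand(*a.shape) < keep_prob    # boolean mask: keep each unit with prob keep_prob
    a = np.multiply(a, d)                       # zero out the dropped units
    a = a / keep_prob                           # rescale so the expected value of a is unchanged
    return a
```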