https://www.toutiao.com/a6703348023008690701/

Hello everyone. Today the "AI不惑境" column begins. This is its first article, and it describes how data drives deep learning.

Entering the 不惑 ("no doubts") stage marks the start of becoming an expert, and at this stage you need to think independently. If learning is a process that runs from imitation, to following, to creation, then by now you should have moved past imitation and following and entered the stage of creation. From this stage on, the questions discussed may not have answers; the aim is more to stimulate everyone to think together.

Author & editor | Yan Yousan (言有三)

Deep learning's success rests on a troika of models, data, and hardware, and of the three, data is the most fundamental. Deep learning can complete such a wide variety of tasks precisely because it learns abstract knowledge from data.

The development of artificial intelligence has gone hand in hand with the evolution of how data is used, and that is today's topic.

"AI不惑境": The harder the data is squeezed, the more successful the AI

1 Data and Learning

I always tell students: if you do not recognize how important data is to a task, and do not know what kind of data the task at hand requires, you have not truly gotten started in deep learning.

Until you reach that point, you are merely immersing yourself in frameworks, tricks, and projects.

Recall how most people grow up.

(1) A newborn child is equally interested in everything in the world and grows by taking in all kinds of information.

(2) In childhood, under the guidance of parents and teachers, we begin to learn by reciting and doing homework. Most wrong behavior is corrected, and correct behavior is rewarded.

(3) As they grow up, some people become skilled at working with data and models in their own field, fully exploring and applying existing knowledge. Others take on things that have no answer, so they must discover new laws and create new knowledge, for example by founding their own company.

At the core of each of these stages is data.

(1) Before we have any knowledge, all existing data is knowledge.

(2) When learning, we have to pick the existing data that suits the field: to learn a language we memorize vocabulary lists, to learn mathematics we work through exam problems, to learn music we practice reading scores. At this point we are learning from existing data.

(3) When applying knowledge, we must learn to adjust what we know to new input data, and in the process the knowledge itself is updated.

(4) When creating knowledge, we have to observe the laws of science and society and draw conclusions from them, and the data we face is no longer fixed.

It is no exaggeration to say that we spend most of our lives acquiring, organizing, and analyzing data and drawing knowledge from it. As I have said before, we "learn by imitating nature."

2 From Supervised Feature Engineering to Unsupervised Feature Learning

When talking about supervised and unsupervised methods, it helps to contrast rule by law with governing by non-action (无为而治).

The core of rule by law is to set out all kinds of laws for everyone to follow, while the core of governing by non-action is non-intervention, letting the country run according to the laws of nature. The latter is clearly more advanced, but it is also harder to achieve and far more uncertain.

This example can be seen as the sociological counterpart of supervised and unsupervised methods. Moving from supervised to unsupervised is progress. Next, let us look at how intelligent systems have grown.

(1) The most basic intelligent system simply has the machine apply expert knowledge, relying on experts' extensive experience in a particular field. From the 1960s through the start of the second AI wave in the 1980s, expert systems were very popular; interested readers can look into them.

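(2) As the technology developed, researchers found that expert systems were too simple and too brittle, so they developed a series of models, including artificial neural networks and SVMs. Expert experience is used to preprocess the data and perform an initial abstraction of knowledge (feature extraction), and the result is then handed to a model for further learning. Compared with expert systems, these models are far more complex, so they can tackle harder problems such as face detection and speech recognition. At the end of the 20th century and the beginning of the 21st, supervised machine learning methods were very widely applied and studied.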

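(3) With the explosion of big data and continued exploration by scientists, researchers came to realize that preprocessing data based on expert experience is inadequate: the data is too high-dimensional, and experts cannot know what preprocessing each task actually needs. Unsupervised feature learning was born as a result. For an unsupervised feature learning system, the input should be as close to the raw data as possible, so that as much information as possible is preserved. The learning rules, however, are still set by experts.

Experts therefore design all kinds of model architectures and optimization objectives to guide the system in learning from data. The biggest difference from supervised feature engineering lies in how the data is used. This class of methods is also called feature learning, and it is what separates traditional machine learning algorithms from deep learning algorithms.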

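(4) Going further, machines will need to create models themselves, with human experts playing little or no role. This is the future of artificial intelligence; perhaps, once society has developed far enough, the day will really come when we create life.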
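The contrast between stages (2) and (3) can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the toy task (telling rising signals from falling ones), the function names, and the hyperparameters are all hypothetical. The first path uses a feature an expert designed; the second feeds raw values to a small logistic-regression model and lets the data decide how each input position is weighted. Labels are still used for training in both cases; what differs is who designs the feature.

```python
import math
import random

def handcrafted_feature(signal):
    # Expert-designed feature: overall change (last value minus first value).
    return signal[-1] - signal[0]

def train_linear_on_raw(samples, labels, epochs=200, lr=0.1):
    # Feature learning in miniature: logistic regression over the raw values,
    # so the weight given to each input position is learned from data.
    n = len(samples[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def make_signal(rising, rng):
    # Hypothetical toy data: a short line with a random start and slope.
    start = rng.uniform(-1, 1)
    step = rng.uniform(0.1, 0.5) * (1 if rising else -1)
    return [start + step * i for i in range(4)]

rng = random.Random(0)
labels = [i % 2 for i in range(200)]
samples = [make_signal(bool(y), rng) for y in labels]

# Path 1: threshold the expert's feature directly.
handcrafted_accuracy = sum(
    (handcrafted_feature(x) > 0) == bool(y) for x, y in zip(samples, labels)
) / len(samples)

# Path 2: learn the weighting of the raw inputs from the data.
w, b = train_linear_on_raw(samples, labels)
accuracy = sum(predict(w, b, x) == y for x, y in zip(samples, labels)) / len(samples)
print(handcrafted_accuracy, accuracy)
```

On a task this simple the expert's feature is already ideal; the point of the sketch is only that the second path never needed to be told which feature to compute.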

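3 The First Stage of Deep Learning: Learning Features

In the first stage of deep learning's development, the focus was on experts designing models and optimization strategies so that feature representations could be learned from data.

Deep learning's success owes much to the convolutional neural network (CNN) architecture, which brought major breakthroughs in images, speech, and other fields. In the terminology used here, a CNN is an unsupervised feature learning model: it takes raw data as input and learns the features itself instead of relying on expert-designed ones (training labels are usually still required). For the basics of CNNs, see the related articles on our official account.

In this process the model architecture certainly affects the final result, but the dataset matters more: without a good dataset, no amount of effort will produce a good model. On the importance of datasets, see this earlier article:

「Datasets」Deep learning starts with the "dataset"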

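4 The Second Stage of Deep Learning: Learning Models

In the second stage of deep learning's development, the focus is on learning the network model itself, along with the various related strategies.

In the first stage, the typical workflow was to prepare the data, choose a model framework, define the various optimization parameters, and then start training.

The model architecture had to be designed by hand, and the training choices, including the normalization method, initialization method, and activation function, also had to be tuned by researchers based on experience. How the data was used, including preprocessing and augmentation strategies, likewise required trial and error.

But with technology where it is today, researchers have begun to learn the model itself from data.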

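4.1 AutoML: Automatic Model Architecture Design

Over the years of deep learning's development, researchers have tried every means to explore and design all kinds of networks, studying depth, width, the form of convolution, and the flow and fusion of information between shallow and deep layers; see the following article.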

https://dwz.cn/t8oWtKTg

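Today, however, a new approach to network design is becoming popular. Techniques represented by AutoML, proposed by Google Brain, let the machine automatically search for the best model architecture for each task (dataset), so that data drives the learning of the model.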
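To make "searching for an architecture" concrete, here is a minimal sketch that swaps in plain random search, a far simpler strategy than the learned controllers used in AutoML-style work. Everything in it is an assumption for illustration: the two-key search space, the XOR toy task, and the helper names are hypothetical, and each candidate is scored by its training error on the four XOR points, not on held-out data.

```python
import math
import random

# Hypothetical, deliberately tiny search space: one choice per key.
SEARCH_SPACE = {"hidden_units": [1, 2, 4, 8], "activation": ["tanh", "relu"]}

# Each activation is paired with its derivative written in terms of its output.
ACTIVATIONS = {
    "tanh": (math.tanh, lambda a: 1.0 - a * a),
    "relu": (lambda z: max(0.0, z), lambda a: 1.0 if a > 0.0 else 0.0),
}

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

def forward(x, w1, b1, w2, b2, act):
    hidden = [act(w[0] * x[0] + w[1] * x[1] + b) for w, b in zip(w1, b1)]
    return hidden, sum(v * h for v, h in zip(w2, hidden)) + b2

def train_and_score(arch, rng, epochs=1000, lr=0.05):
    """Train a one-hidden-layer network on XOR with SGD; return its final MSE."""
    n = arch["hidden_units"]
    act, dact = ACTIVATIONS[arch["activation"]]
    w1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    b1 = [0.0] * n
    w2 = [rng.uniform(-1, 1) for _ in range(n)]
    b2 = 0.0
    for _ in range(epochs):
        for x, y in XOR:
            hidden, out = forward(x, w1, b1, w2, b2, act)
            err = out - y
            for j in range(n):
                grad_h = err * w2[j] * dact(hidden[j])  # uses w2[j] before its update
                w2[j] -= lr * err * hidden[j]
                w1[j][0] -= lr * grad_h * x[0]
                w1[j][1] -= lr * grad_h * x[1]
                b1[j] -= lr * grad_h
            b2 -= lr * err
    total = 0.0
    for x, y in XOR:
        diff = forward(x, w1, b1, w2, b2, act)[1] - y
        total += diff * diff
    mse = total / len(XOR)
    return mse if math.isfinite(mse) else math.inf  # treat divergence as worst case

def random_search(num_trials, seed=0):
    """Sample architectures at random, train each one, and keep the best."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_trials):
        arch = {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}
        history.append((arch, train_and_score(arch, rng)))
    best_arch, best_score = min(history, key=lambda item: item[1])
    return best_arch, best_score, history

best_arch, best_score, history = random_search(num_trials=6, seed=0)
print(best_arch, round(best_score, 4))
```

The structure is the part that carries over: a search space, a way to sample from it, and a data-driven score that decides which candidate wins. Real systems replace the random sampler with a learned or evolutionary strategy and the toy score with validation accuracy.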

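4.2 AutoAugment: Automatic Data Augmentation Strategies

Once upon a time, we used all kinds of geometric and color transformations for data augmentation. Random cropping and color jitter both play a crucial role in improving a model's generalization.

Now it is time to look for something better. Methods represented by AutoAugment, proposed by Google Brain, use reinforcement learning to learn the most suitable augmentation methods for each task; see this earlier article:

「Technical Review」What data augmentation methods are there in deep learning?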
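As a rough sketch of what such a learned augmentation strategy looks like once the search is done: as I understand the AutoAugment formulation, a policy is a set of sub-policies, each a short sequence of operations with its own probability and magnitude, and one sub-policy is picked at random per image. The code below only applies a policy; it does not perform the reinforcement-learning search. The operations, the `POLICY` values, and the nested-list "image" are hypothetical stand-ins chosen for illustration.

```python
import random

def flip_horizontal(img, magnitude):
    # Mirror each row; this operation ignores the magnitude.
    return [row[::-1] for row in img]

def adjust_brightness(img, magnitude):
    # Add a constant to every pixel, clamped to the 0..255 range.
    return [[min(255, max(0, p + int(magnitude))) for p in row] for row in img]

def shift_right(img, magnitude):
    # Shift pixels right by `magnitude` columns, padding with zeros on the left.
    shifted = []
    for row in img:
        k = min(max(int(magnitude), 0), len(row))
        shifted.append([0] * k + row[:len(row) - k])
    return shifted

OPS = {
    "flip_horizontal": flip_horizontal,
    "adjust_brightness": adjust_brightness,
    "shift_right": shift_right,
}

# Hypothetical policy: each sub-policy is a list of (operation, probability, magnitude).
POLICY = [
    [("flip_horizontal", 0.5, 0), ("adjust_brightness", 0.8, 30)],
    [("shift_right", 0.6, 1), ("adjust_brightness", 0.4, -20)],
]

def apply_policy(img, policy, rng):
    """Pick one sub-policy at random and apply each of its operations with its probability."""
    sub_policy = rng.choice(policy)
    for op_name, prob, magnitude in sub_policy:
        if rng.random() < prob:
            img = OPS[op_name](img, magnitude)
    return img

rng = random.Random(0)
image = [[10, 20, 30], [40, 50, 60]]
augmented = [apply_policy(image, POLICY, rng) for _ in range(4)]
for result in augmented:
    print(result)
```

What the search learns from data is the content of `POLICY`: which operations to pair, how often to apply them, and how strongly.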

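4.3 Automatic Selection of Optimization Parameters

Once upon a time, we designed, compared, and analyzed how activation functions such as sigmoid, tanh, and ReLU affect network performance.

Methods represented by Swish, proposed by Google Brain, ran a combinatorial search over a search space built from unary and binary functions, and used data to learn an activation function that performs better than ReLU; see this earlier article:

「AI初识境」Activation functions: from manual design to automatic search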
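The idea of a search space built from unary and binary functions can be illustrated with a small enumeration. Swish itself is f(x) = x · sigmoid(βx); the sketch below builds candidates of the form binary(unary1(x), unary2(x)) and shows that Swish with β = 1 is one point in that space. The particular function sets are a small assumed subset chosen for illustration, not the search space used in the original work, and the step that trains a network with each candidate to rank them is omitted.

```python
import itertools
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def swish(x, beta=1.0):
    # Swish: the input scaled by a sigmoid gate of itself.
    return x * sigmoid(beta * x)

# Assumed, illustrative building blocks.
UNARY = {
    "identity": lambda x: x,
    "sigmoid": sigmoid,
    "tanh": math.tanh,
    "relu": lambda x: max(x, 0.0),
}
BINARY = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "max": max,
}

def enumerate_candidates():
    """Build every activation of the form binary(unary1(x), unary2(x))."""
    candidates = {}
    for (bn, bf), (n1, u1), (n2, u2) in itertools.product(
        BINARY.items(), UNARY.items(), UNARY.items()
    ):
        name = f"{bn}({n1}, {n2})"
        # Default arguments freeze the current functions for this candidate.
        candidates[name] = lambda x, bf=bf, u1=u1, u2=u2: bf(u1(x), u2(x))
    return candidates

candidates = enumerate_candidates()
found = candidates["mul(identity, sigmoid)"]  # the same function as swish with beta=1
for x in (-2.0, 0.0, 1.5):
    print(x, found(x), swish(x))
```

In an actual search, each candidate would be plugged into a network and scored on data, which is how a function like Swish is selected over hand-picked alternatives.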

Once upon a time, we argued over whether max pooling or average pooling was better; now data-driven pooling strategies have been widely studied.

Once upon a time, we did not know which normalization method to choose; now data-driven normalization strategies are being studied too.

Once upon a time, we did not know which optimization method to choose; now data-driven optimization methods are being studied as well.

For the topics above, see the "AI初识境" column on our official account; we will come back with more detailed explanations later.

It is fair to say that, from model design and the selection of optimization parameters to data usage strategies, deep learning is moving toward full automation.

Summary

A long time ago, we could only use data that had already been abstracted for us. Later, we learned to abstract features from the data ourselves. Later still, we invented systems and let them abstract the features. And now, we want the data to learn the system itself.