How One-Hot Vector Encoding Works

Other 2018-08-14 05:08:41 Views: 0

`# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np
import sklearn.preprocessing as sp

raw_samples = np.array([
    [1, 3, 2],
    [7, 5, 4],
    [1, 8, 6],
    [7, 3, 9]])
print(raw_samples)

# Underlying principle

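code_tables = []
# Create a dictionary that maps each raw value to its encoded vector;
# key-value pairs are the natural representation for this.
# One dictionary per column, so the result is a list of dictionaries.
for col in raw_samples.T:
    code_table = {}
    for val in col:
        code_table[val] = None
    code_tables.append(code_table)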

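# Fill in the dictionary values: the value at sorted position `one`
# gets a zero vector with a single 1 at index `one`.
for code_table in code_tables:
    size = len(code_table)
    for one, key in enumerate(sorted(code_table.keys())):
        code_table[key] = np.zeros(shape=size, dtype=int)
        code_table[key][one] = 1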
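ohe_samples = []
# The result is collected as a list of per-row arrays (a nested list).
# Start encoding
for raw_sample in raw_samples:
    ohe_sample = np.array([], dtype=int)
    for i, key in enumerate(raw_sample):
        # Append the code for column i to this row's encoding
        ohe_sample = np.hstack(
            (ohe_sample, code_tables[i][key]))
    # One full row has been encoded
    ohe_samples.append(ohe_sample)
# Convert the list to an array
ohe_samples = np.array(ohe_samples)
print(ohe_samples)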

# One-hot encoder

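# scikit-learn >= 1.2 names this parameter sparse_output;
# older releases use sparse=False instead.
ohe = sp.OneHotEncoder(sparse_output=False, dtype=int)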
ohe_samples = ohe.fit_transform(raw_samples)
print(ohe_samples)
`
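The hand-written encoder above concatenates one small vector per column. Another way to read the result, sketched here in plain Python, is that each column owns a contiguous slice of the output row, and each value sets exactly one position inside its column's slice. The function name `one_hot_by_offset` and its internal variables are illustrative assumptions, not part of the original post or of scikit-learn.

```python
def one_hot_by_offset(rows):
    """One-hot encode rows by computing the position of each 1 directly.

    Illustrative sketch only; the name and structure are assumptions.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    # Sorted distinct values per column, like sorted(code_table.keys())
    categories = [sorted(set(col)) for col in columns]
    # Rank of each value within its own column
    ranks = [{v: r for r, v in enumerate(cats)} for cats in categories]
    # Start index of each column's slice in the output row
    offsets, width = [], 0
    for cats in categories:
        offsets.append(width)
        width += len(cats)
    encoded = []
    for row in rows:
        out = [0] * width
        for i, value in enumerate(row):
            out[offsets[i] + ranks[i][value]] = 1  # exactly one 1 per column
        encoded.append(out)
    return encoded


raw = [[1, 3, 2], [7, 5, 4], [1, 8, 6], [7, 3, 9]]
for row, code in zip(raw, one_hot_by_offset(raw)):
    print(row, code)
```

With the sample data the columns have 2, 3 and 4 distinct values, so every encoded row has 2 + 3 + 4 = 9 positions and contains exactly three 1s.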
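Pseudocode:
The goal is to turn a 2D array whose entries are not all 0s and 1s into a 2D array that contains only 0s and 1s.
This requires a dictionary container.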
Encoded rows:
1 3 2 : 101001000
7 5 4 : 010100100
1 8 6 …
7 3 9

Per-column code tables:
1:10  3:100  2:1000
7:01  5:010  4:0100
      8:001  6:0010
             9:0001
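The listing above can be reproduced mechanically. The sketch below, in plain Python, builds each column's code table as bit strings: a value's code is all zeros except for a single 1 at the value's rank among the sorted distinct values of its column. The helper name `column_code_table` is hypothetical and chosen only for this illustration.

```python
def column_code_table(column):
    """Map each distinct value in a column to its one-hot bit string."""
    values = sorted(set(column))
    size = len(values)
    return {v: "0" * rank + "1" + "0" * (size - rank - 1)
            for rank, v in enumerate(values)}


raw = [[1, 3, 2], [7, 5, 4], [1, 8, 6], [7, 3, 9]]
tables = [column_code_table(col) for col in zip(*raw)]

for table in tables:
    print(table)

for row in raw:
    # A row's encoding is the concatenation of its per-column codes
    print(row, ":", "".join(tables[i][v] for i, v in enumerate(row)))
```

Because the code length of a column equals its number of distinct values, columns with many distinct values produce wide, mostly-zero rows.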


Reposted from blog.csdn.net/lc574260570/article/details/81625294
