tsfresh documentation: https://tsfresh.readthedocs.io/en/latest/
tsfresh GitHub repository: https://github.com/blue-yonder/tsfresh
Installing tsfresh: pip install tsfresh
Usage example
import pandas as pd
import numpy as np
from tsfresh import extract_features
def convert_to_extract_df(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Convert a wide DataFrame (one column per series) into the long format that extract_features expects."""
    frames = []
    # DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement
    for _col, col_series in dataframe.items():
        # One block of rows per original column; the column name becomes the series id
        _col_df = pd.DataFrame({'value': col_series.values})
        _col_df['id'] = _col
        frames.append(_col_df)
    convert_df = pd.concat(frames, axis=0, ignore_index=True)
    convert_df['value'] = convert_df['value'].astype("float")
    return convert_df
def get_line_features(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Extract the features of each series (curve)."""
    from tsfresh import extract_features  # todo: time-consuming
    ext_feature = extract_features(dataframe, column_id="id")
    return ext_feature
Construct a time series:
time_df = pd.DataFrame(
    np.arange(400).reshape((100, 4)),
    index=pd.date_range(start="20190101", periods=100, freq="10D"),
    columns=["col1", "col2", "col3", "col4"],
)
Once the time series is built, it has to be converted into the input format that tsfresh accepts for feature extraction:
ext_df = convert_to_extract_df(time_df)
ext_df.head()
The converted frame now looks like this (first and last rows):
     value    id
0      0.0  col1
1      4.0  col1
2      8.0  col1
3     12.0  col1
4     16.0  col1
...    ...   ...
395  383.0  col4
396  387.0  col4
397  391.0  col4
398  395.0  col4
399  399.0  col4
Each id stands for one series: rows that share an id belong to the same time series, and different ids mark different time series.
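The same wide-to-long reshaping can also be written with pandas' melt. This is an alternative sketch, not the approach used in this post; the function name wide_to_long and the small demo frame are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def wide_to_long(dataframe: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: stack every column into (value, id) rows."""
    # melt stacks the columns one after another; the column name goes into 'id'
    long_df = dataframe.melt(var_name="id", value_name="value")
    long_df["value"] = long_df["value"].astype("float")
    # Put 'value' first to match the (value, id) layout used in this post
    return long_df[["value", "id"]]


# Small demo frame (assumed data) with the same column names as time_df
demo_df = pd.DataFrame(
    np.arange(12).reshape((3, 4)),
    columns=["col1", "col2", "col3", "col4"],
)
long_demo = wide_to_long(demo_df)
print(long_demo.head(4))
```

melt drops the original index, so the resulting rows are numbered from 0 and each block of consecutive rows carries a single id.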
Then extract the features:
ext_feature = get_line_features(ext_df)
This gives the result:
ext_feature.shape # (4, 787)
The 4 equals the number of series: each series (identified by its own id) gets one row, and the 787 columns are the extracted features.
An excerpt of ext_feature, transposed here so that the long feature names fit (the first ten and the last ten features):
feature                                                           col1          col2          col3          col4
value__variance_larger_than_standard_deviation                     1.0           1.0           1.0           1.0
value__has_duplicate_max                                           0.0           0.0           0.0           0.0
value__has_duplicate_min                                           0.0           0.0           0.0           0.0
value__has_duplicate                                               0.0           0.0           0.0           0.0
value__sum_values                                              19800.0       19900.0       20000.0       20100.0
value__abs_energy                                            5253600.0     5293300.0     5333200.0     5373300.0
value__mean_abs_change                                             4.0           4.0           4.0           4.0
value__mean_change                                                 4.0           4.0           4.0           4.0
value__mean_second_derivative_central                              0.0           0.0           0.0           0.0
value__median                                                    198.0         199.0         200.0         201.0
...                                                                ...           ...           ...           ...
value__permutation_entropy__dimension_5__tau_1                    -0.0          -0.0          -0.0          -0.0
value__permutation_entropy__dimension_6__tau_1                    -0.0          -0.0          -0.0          -0.0
value__permutation_entropy__dimension_7__tau_1                    -0.0          -0.0          -0.0          -0.0
value__query_similarity_count__query_None__threshold_0.0           NaN           NaN           NaN           NaN
value__matrix_profile__feature_"min"__threshold_0.98      1.685874e-07  1.685874e-07  1.685874e-07  1.685874e-07
value__matrix_profile__feature_"max"__threshold_0.98      1.685874e-07  1.685874e-07  1.685874e-07  1.685874e-07
value__matrix_profile__feature_"mean"__threshold_0.98     1.685874e-07  1.685874e-07  1.685874e-07  1.685874e-07
value__matrix_profile__feature_"median"__threshold_0.98   1.685874e-07  1.685874e-07  1.685874e-07  1.685874e-07
value__matrix_profile__feature_"25"__threshold_0.98       1.685874e-07  1.685874e-07  1.685874e-07  1.685874e-07
value__matrix_profile__feature_"75"__threshold_0.98       1.685874e-07  1.685874e-07  1.685874e-07  1.685874e-07
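A few of the simpler features can be recomputed by hand as a sanity check. The sketch below rebuilds the col1 series with numpy and evaluates the sum, the absolute energy (sum of squares), the mean absolute change and the median. The formulas follow the usual definitions of these features and are written out here as assumptions, not copied from tsfresh.

```python
import numpy as np

# col1 of time_df is the first column of the reshaped range: 0, 4, 8, ..., 396
col1 = np.arange(400).reshape((100, 4))[:, 0].astype(float)

sum_values = col1.sum()                            # value__sum_values
abs_energy = np.sum(col1 ** 2)                     # value__abs_energy
mean_abs_change = np.mean(np.abs(np.diff(col1)))   # value__mean_abs_change
median = np.median(col1)                           # value__median

print(sum_values, abs_energy, mean_abs_change, median)
```

These agree with the col1 values in the table above: 19800.0, 5253600.0, 4.0 and 198.0.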