A Review of Tensor Completion in the MATLAB Tensor Toolbox and N-Way Toolbox

In MATLAB, the relatively mature toolboxes are the Tensor Toolbox, the N-way Toolbox, and the TT-Toolbox (TT = Tensor Train). Among these, the functions that perform tensor completion via tensor decomposition are:

  • cp_wopt (CP via optimization): the weighted direct optimization (WOPT) method in the Tensor Toolbox for handling missing data. It is a variant of cp_opt adapted to missing data.
  • parafac (CP via ALS): the CP decomposition function in the N-way Toolbox; missing entries are set to NaN and handled with an EM-style algorithm. It can be viewed as a missing-data variant of cp_als.
  • tucker (Tucker decomposition): the Tucker decomposition function in the N-way Toolbox; simply set the missing entries to NaN. A sketch of how all three are called follows this list.
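
A hedged sketch of how these three are invoked (the 85% missing ratio, rank choices, and variable names are illustrative assumptions, not values from the toolbox docs; cp_wopt additionally needs an optimization backend such as Poblano on the path):

Yd = double(Y);                          % Y: a 3-way data tensor, say
miss = rand(size(Yd)) < 0.85;            % mark 85% of entries as missing

% cp_wopt: zero-filled data plus a 0/1 weight tensor of observed entries
W = tensor(double(~miss));
Yzero = Yd;  Yzero(miss) = 0;
M1 = cp_wopt(tensor(Yzero), W, 10);      % rank-10 weighted CP

% parafac / tucker (N-way Toolbox): mark missing entries with NaN
Ynan = Yd;  Ynan(miss) = NaN;
Fcp = parafac(Ynan, 10);                 % rank-10 CP, EM handling of NaNs
[Ftk, G] = tucker(Ynan, [10 10 10]);     % Tucker with a 10x10x10 core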

The Various Matrix Products

The Kronecker product multiplies each entry of the left matrix by the whole right matrix and tiles the results into one large block matrix:
$$\mathbf{A} \otimes \mathbf{B}=\left[\begin{array}{cccc} a_{11} \mathbf{B} & a_{12} \mathbf{B} & \cdots & a_{1J} \mathbf{B} \\ a_{21} \mathbf{B} & a_{22} \mathbf{B} & \cdots & a_{2J} \mathbf{B} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1} \mathbf{B} & a_{I2} \mathbf{B} & \cdots & a_{IJ} \mathbf{B} \end{array}\right]$$
The Khatri-Rao product of two matrices with the same number of columns is the columnwise Kronecker product:
$$\mathbf{A} \odot \mathbf{B}=\left[\begin{array}{cccc} \mathbf{a}_1 \otimes \mathbf{b}_1 & \mathbf{a}_2 \otimes \mathbf{b}_2 & \cdots & \mathbf{a}_K \otimes \mathbf{b}_K \end{array}\right]$$
The Hadamard product of two matrices of the same size is the elementwise product:
$$\mathbf{A} * \mathbf{B}=\left[\begin{array}{cccc} a_{11} b_{11} & a_{12} b_{12} & \cdots & a_{1J} b_{1J} \\ a_{21} b_{21} & a_{22} b_{22} & \cdots & a_{2J} b_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1} b_{I1} & a_{I2} b_{I2} & \cdots & a_{IJ} b_{IJ} \end{array}\right]$$
These products satisfy the following identities:
$$\begin{aligned} (\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) &= \mathbf{A}\mathbf{C} \otimes \mathbf{B}\mathbf{D} \\ (\mathbf{A} \otimes \mathbf{B})^{\dagger} &= \mathbf{A}^{\dagger} \otimes \mathbf{B}^{\dagger} \\ \mathbf{A} \odot \mathbf{B} \odot \mathbf{C} &= (\mathbf{A} \odot \mathbf{B}) \odot \mathbf{C} = \mathbf{A} \odot (\mathbf{B} \odot \mathbf{C}) \\ (\mathbf{A} \odot \mathbf{B})^{\top}(\mathbf{A} \odot \mathbf{B}) &= \mathbf{A}^{\top}\mathbf{A} * \mathbf{B}^{\top}\mathbf{B} \\ (\mathbf{A} \odot \mathbf{B})^{\dagger} &= \left(\left(\mathbf{A}^{\top}\mathbf{A}\right) * \left(\mathbf{B}^{\top}\mathbf{B}\right)\right)^{\dagger}(\mathbf{A} \odot \mathbf{B})^{\top} \end{aligned}$$
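
As a quick numerical sanity check, the Kronecker and Khatri-Rao identities above can be verified in a few lines of base MATLAB (the columnwise-kron loop is a generic construction of the Khatri-Rao product, not a toolbox call):

A = randn(4,3); B = randn(5,3); C = randn(3,2); D = randn(3,2);
norm(kron(A,B)*kron(C,D) - kron(A*C,B*D), 'fro')   % ~1e-15: mixed product
KR = zeros(size(A,1)*size(B,1), size(A,2));
for r = 1:size(A,2)
    KR(:,r) = kron(A(:,r), B(:,r));                % Khatri-Rao, column r
end
norm(KR'*KR - (A'*A).*(B'*B), 'fro')               % Gram identity
norm(pinv(KR) - pinv((A'*A).*(B'*B))*KR', 'fro')   % pseudo-inverse identity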

The ALS Algorithm

Alternating least squares (ALS) is a fairly effective method for computing the CP decomposition of a tensor. Its basic idea is as follows. Taking a three-way tensor as an example, the problem it optimizes is:

$$\min_{\hat{\boldsymbol{X}}}\|\boldsymbol{X}-\hat{\boldsymbol{X}}\| \quad \text{with} \quad \hat{\boldsymbol{X}}=\sum_{r=1}^R \lambda_r\, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r=[\boldsymbol{\lambda} ; \mathbf{A}, \mathbf{B}, \mathbf{C}]$$

ALS first fixes B and C to solve for A, then fixes A and C to solve for B, then fixes A and B to solve for C, and keeps repeating the whole cycle until a convergence criterion is met.

Take one of the subproblems, say fixing B and C to solve for A. It amounts to optimizing
$$\min_{\hat{\mathbf{A}}}\left\|\mathbf{X}_{(1)}-\hat{\mathbf{A}}(\mathbf{C} \odot \mathbf{B})^{\top}\right\|_F$$
Here $\hat{\mathbf{A}}=\mathbf{A} \cdot \operatorname{diag}(\boldsymbol{\lambda})$, and $\mathbf{X}_{(1)}$ denotes the matrix obtained by unfolding the tensor along its first mode. As an example, consider a tensor $\boldsymbol{X} \in \mathbb{R}^{3 \times 4 \times 2}$ with frontal slices
$$\mathbf{X}_1=\left[\begin{array}{cccc} 1 & 4 & 7 & 10 \\ 2 & 5 & 8 & 11 \\ 3 & 6 & 9 & 12 \end{array}\right], \quad \mathbf{X}_2=\left[\begin{array}{cccc} 13 & 16 & 19 & 22 \\ 14 & 17 & 20 & 23 \\ 15 & 18 & 21 & 24 \end{array}\right]$$
Unfolding along each mode in turn gives the matrices (the mode-n unfoldings)
$$\begin{aligned} &\mathbf{X}_{(1)}=\left[\begin{array}{cccccccc} 1 & 4 & 7 & 10 & 13 & 16 & 19 & 22 \\ 2 & 5 & 8 & 11 & 14 & 17 & 20 & 23 \\ 3 & 6 & 9 & 12 & 15 & 18 & 21 & 24 \end{array}\right], \\ &\mathbf{X}_{(2)}=\left[\begin{array}{cccccc} 1 & 2 & 3 & 13 & 14 & 15 \\ 4 & 5 & 6 & 16 & 17 & 18 \\ 7 & 8 & 9 & 19 & 20 & 21 \\ 10 & 11 & 12 & 22 & 23 & 24 \end{array}\right], \\ &\mathbf{X}_{(3)}=\left[\begin{array}{cccccccccc} 1 & 2 & 3 & 4 & 5 & \cdots & 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 & 17 & \cdots & 21 & 22 & 23 & 24 \end{array}\right] \end{aligned}$$
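
These unfoldings can be reproduced in base MATLAB with reshape and permute (a sketch of the convention above, without using the toolbox's tenmat):

X  = reshape(1:24, [3 4 2]);                 % the example tensor above
X1 = reshape(X, 3, []);                      % mode-1 unfolding, 3 x 8
X2 = reshape(permute(X, [2 1 3]), 4, []);    % mode-2 unfolding, 4 x 6
X3 = reshape(permute(X, [3 1 2]), 2, []);    % mode-3 unfolding, 2 x 12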
Furthermore, from the identity
$$\begin{aligned} &\mathbf{Y}=\boldsymbol{X} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)} \;\Leftrightarrow\; \\ &\mathbf{Y}_{(n)}=\mathbf{A}^{(n)} \mathbf{X}_{(n)}\left(\mathbf{A}^{(N)} \otimes \cdots \otimes \mathbf{A}^{(n+1)} \otimes \mathbf{A}^{(n-1)} \otimes \cdots \otimes \mathbf{A}^{(1)}\right)^{\top}, \end{aligned}$$
the mode-n unfoldings of the CP model above can be written as
$$\begin{aligned} \mathbf{X}_{(1)} &\approx \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{\top}, \\ \mathbf{X}_{(2)} &\approx \mathbf{B}(\mathbf{C} \odot \mathbf{A})^{\top}, \\ \mathbf{X}_{(3)} &\approx \mathbf{C}(\mathbf{B} \odot \mathbf{A})^{\top}. \end{aligned}$$
This is where the matrix optimization problem above comes from. It is a plain least-squares problem, and one solution is
$$\hat{\mathbf{A}}=\mathbf{X}_{(1)}\left[(\mathbf{C} \odot \mathbf{B})^{\top}\right]^{\dagger}$$
Applying the product identities listed earlier, this becomes
$$\hat{\mathbf{A}}=\mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B})\left(\mathbf{C}^{\top} \mathbf{C} * \mathbf{B}^{\top} \mathbf{B}\right)^{\dagger}$$
After computing $\hat{\mathbf{A}}$, normalize its columns to extract $\lambda_r=\left\|\hat{\mathbf{a}}_r\right\|$. Notice that in the end we only need the pseudo-inverse of an $R \times R$ matrix, so each iteration step is remarkably cheap. A compact end-to-end sketch follows.
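
Here is a minimal base-MATLAB sketch of 3-way CP-ALS built from exactly these updates (no toolboxes; the sizes, rank, and iteration count are illustrative, and krp is a local Khatri-Rao helper, so save this as a script in R2016b+ or as a function file):

I = 30; J = 20; K = 10; R = 5;
A = randn(I,R); B = randn(J,R); C = randn(K,R);        % ground-truth factors
X1 = A * krp(C,B).';                                   % X_(1) = A (C kr B)'
X  = reshape(X1, [I J K]);
X2 = reshape(permute(X,[2 1 3]), J, []);               % mode-2 unfolding
X3 = reshape(permute(X,[3 1 2]), K, []);               % mode-3 unfolding
Ah = randn(I,R); Bh = randn(J,R); Ch = randn(K,R);     % random initialization
for iter = 1:50
    Ah = X1 * krp(Ch,Bh) * pinv((Ch'*Ch) .* (Bh'*Bh)); % fix B,C, solve A
    Bh = X2 * krp(Ch,Ah) * pinv((Ch'*Ch) .* (Ah'*Ah)); % fix A,C, solve B
    Ch = X3 * krp(Bh,Ah) * pinv((Bh'*Bh) .* (Ah'*Ah)); % fix A,B, solve C
end
fprintf('relative error: %g\n', ...
    norm(X1 - Ah*krp(Ch,Bh).', 'fro') / norm(X1, 'fro'));

function P = krp(U, V)                                 % Khatri-Rao product
P = zeros(size(U,1)*size(V,1), size(U,2));
for r = 1:size(U,2)
    P(:,r) = kron(U(:,r), V(:,r));                     % columnwise Kronecker
end
end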

For the higher-dimensional case, the updates generalize mode by mode.
[Figure from the original post omitted; the image could not be recovered.]
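
For reference, the general mode-n least-squares update for an N-way tensor has the same shape (this is the standard CP-ALS update from the literature, e.g. Kolda and Bader's survey, stated here rather than taken from the lost figure):

$$\mathbf{A}^{(n)} \leftarrow \mathbf{X}_{(n)}\left(\mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)\left(\mathbf{A}^{(1)\top}\mathbf{A}^{(1)} * \cdots * \mathbf{A}^{(n-1)\top}\mathbf{A}^{(n-1)} * \mathbf{A}^{(n+1)\top}\mathbf{A}^{(n+1)} * \cdots * \mathbf{A}^{(N)\top}\mathbf{A}^{(N)}\right)^{\dagger}$$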

cp_als VS parafac VS parafac with NaN

Let us now compare the efficiency and accuracy of the ALS algorithm and its modified variants. cp_als is from the Tensor Toolbox; parafac is from the N-way Toolbox. We use the data file data274D5333T100.mat as the benchmark, write totalT for the total time cost, and $\epsilon$ for the relative error.

For cp_als with Rank set to 100: totalT = 0.428248, $\epsilon = 5.6475\mathrm{e}{-04}$.

For the N-way Toolbox we set Rank = 10 (anything larger and MATLAB cannot fit the arrays in memory). parafac: totalT = 9.4, $\epsilon = 0.016$. Next consider parafac with 85% of entries set to NaN; in principle this should be faster, but the result is even more absurd: totalT = 58.95, $\epsilon = 0.0125$. With slice missing, the outcome is worse still: totalT = 183, $\epsilon = 0.279$. A sketch of the measurement harness follows.
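
A hedged sketch of the harness behind these numbers (the variable name Y for the loaded array, the error formula, and the use of nmodel to rebuild the N-way model are assumptions; the data file and ranks are from the text):

load data274D5333T100.mat                    % assumed to provide an array Y

tic; M = cp_als(tensor(Y), 100); totalT = toc
eps_rel = norm(tensor(Y) - full(M)) / norm(tensor(Y))

Ynan = Y; Ynan(rand(size(Y)) < 0.85) = NaN;  % 85% missing at random
tic; F = parafac(Ynan, 10); totalT = toc
Yhat = nmodel(F);                            % N-way model reconstruction
eps_rel = norm(Y(:) - Yhat(:)) / norm(Y(:))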

The N-way implementation is remarkably naive, which makes it very slow. Why is it so naive? One example: when it computes the Khatri-Rao product of all factor matrices except one, it does so one matrix at a time from scratch, which is not smart at all. Schematically:
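
The pattern looks roughly like this (a paraphrase of what the N-way code effectively does, not a verbatim excerpt; krp is the Khatri-Rao helper from the ALS sketch above):

Z = ones(1, R);                  % rebuilt from scratch for every mode n
for m = N:-1:1
    if m ~= n
        Z = krp(Z, U{m});        % materializes a huge (prod dims)-by-R matrix
    end
end
Unew = Xn * Z;                   % Xn: mode-n unfolding of the data

The Tensor Toolbox's mttkrp computes Unew without ever forming Z explicitly, which accounts for much of the speed difference.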

The conclusion is that for large tensors the N-way toolbox is naively implemented and very slow. tucker is presumably no better, so I did not bother trying it; parafac is simply unusable here. For high-order tensors, the N-way toolbox feels very clumsy.

cp_opt VS cp_wopt

Optimization-based iterative methods always take a long time, and the results are not necessarily better. cp_wopt is also prone to overfitting. A sketch of the two calls is below.
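
For completeness, a hedged sketch of the two calls (with W and the zero-filled Yzero built as in the first sketch above; per the Tensor Toolbox docs, both functions require an optimization backend such as Poblano):

tic; Mo = cp_opt(tensor(Y), 10);         t_opt  = toc  % full data, no weights
tic; Mw = cp_wopt(tensor(Yzero), W, 10); t_wopt = toc  % missing-aware, weighted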

These optimization-based methods really are slow and inaccurate; we may as well give up on them.

Improving cp_als

Since the existing methods are this poor, let us build an improvement on the faster decomposition method. One small tweak: during the alternating least-squares steps of cp_als, at every iteration overwrite the positions of the known entries with their known values. This gives a modified cp_als, which I name cp_tcals.

We compare it against the existing SPC method, again with data274D5333T100.mat as the benchmark.

For cp_tcals:

| $\epsilon$ | TIME COST |
| --- | --- |
| 0.0985 | 1.335 |
| 0.0467 | 2.281213 |
| 0.0212 | 3.44 |
| 0.0104 | 4.891780 |
| 0.0050 | 6.813247 |
| 0.0031 | 8.718322 |

For SPC, reaching an accuracy of $\epsilon = 0.0067$ already costs 111 in time.

I believe TT and Tucker decompositions could be extended to completion in the same straightforward way.

The code is as follows:

function [P,Uinit,output] = cp_tcals(X,R,Tind,Tval,tolerance,varargin)
%CP_TCALS Compute a CP decomposition of a tensor with missing entries.
%
%   M = CP_TCALS(X,R,Tind,Tval,tolerance) computes an estimate of the best
%   rank-R CP model of a tensor X by alternating least squares, for data
%   with missing entries. Tind holds the subscripts of the observed
%   entries and Tval their values; after every factor update, the current
%   reconstruction is overwritten with the known values at Tind. The
%   iteration stops once the relative error drops below tolerance. The
%   input X can be a tensor, sptensor, ktensor, or ttensor. The result M
%   is a ktensor.
%
%   M = CP_TCALS(X,R,Tind,Tval,tolerance,'param',value,...) specifies
%   optional parameters and values. Valid parameters and their default
%   values are:
%      'tol' - Tolerance on difference in fit {1.0e-4}
%      'maxiters' - Maximum number of iterations {50}
%      'dimorder' - Order to loop through dimensions {1:ndims(A)}
%      'init' - Initial guess [{'random'}|'nvecs'|cell array]
%      'printitn' - Print fit every n iterations; 0 for no printing {1}
%      'fixsigns' - Call fixsigns at end of iterations {true}
%
%   [M,U0] = CP_TCALS(...) also returns the initial guess.
%
%   [M,U0,out] = CP_TCALS(...) also returns additional output that
%   contains the input parameters.
%
%   Note: The "fit" is defined as 1 - norm(X-full(M))/norm(X) and is
%   loosely the proportion of the data described by the CP model, i.e., a
%   fit of 1 is perfect.
%
%   NOTE: Updated in various minor ways per work of Phan Anh Huy. See Anh
%   Huy Phan, Petr Tichavský, Andrzej Cichocki, On Fast Computation of
%   Gradients for CANDECOMP/PARAFAC Algorithms, arXiv:1204.1586, 2012.
%
%   Example (adapted from cp_als; cal_acc and the global Y_true used by
%   the stopping test must be set up by the caller):
%   X = sptenrand([5 4 3], 10);
%   [subs,vals] = find(X);
%   global Y_true; Y_true = full(X);
%   M = cp_tcals(full(X),2,subs,vals,1e-6);
%
%   <a href="matlab:web(strcat('file://',fullfile(getfield(what('tensor_toolbox'),'path'),'doc','html','cp_als_doc.html')))">Documentation page for CP-ALS</a>
%
%   See also KTENSOR, TENSOR, SPTENSOR, TTENSOR.
%
%Tensor Toolbox for MATLAB: <a href="https://www.tensortoolbox.org">www.tensortoolbox.org</a>



%% Extract number of dimensions and norm of X.
N = ndims(X);
normX = norm(X);

%% Set algorithm parameters from input or by using defaults
params = inputParser;
params.addParameter('tol',1e-4,@isscalar);
params.addParameter('maxiters',50,@(x) isscalar(x) & x > 0);
params.addParameter('dimorder',1:N,@(x) isequal(sort(x),1:N));
params.addParameter('init', 'random', @(x) (iscell(x) || ismember(x,{'random','nvecs'})));
params.addParameter('printitn',1,@isscalar);
params.addParameter('fixsigns',true,@islogical);
params.parse(varargin{:});

%% Copy from params object
fitchangetol = params.Results.tol;
maxiters = params.Results.maxiters;
dimorder = params.Results.dimorder;
init = params.Results.init;
printitn = params.Results.printitn;

%% Error checking 

%% Set up and error checking on initial guess for U.
if iscell(init)
    Uinit = init;
    if numel(Uinit) ~= N
        error('OPTS.init does not have %d cells',N);
    end
    for n = dimorder(2:end)
        if ~isequal(size(Uinit{n}),[size(X,n) R])
            error('OPTS.init{%d} is the wrong size',n);
        end
    end
else
    % Observe that we don't need to calculate an initial guess for the
    % first index in dimorder because that will be solved for in the first
    % inner iteration.
    if strcmp(init,'random')
        Uinit = cell(N,1);
        for n = dimorder(2:end)
            Uinit{n} = rand(size(X,n),R);
        end
    elseif strcmp(init,'nvecs') || strcmp(init,'eigs') 
        Uinit = cell(N,1);
        for n = dimorder(2:end)
            Uinit{n} = nvecs(X,n,R);
        end
    else
        error('The selected initialization method is not supported');
    end
end

%% Set up for iterations - initializing U and the fit.
U = Uinit;
fit = 0;

% Store the last MTTKRP result to accelerate fitness computation.
U_mttkrp = zeros(size(X, dimorder(end)), R);

if printitn>0
  fprintf('\nTC_CP_ALS:\n');
end

%% Main Loop: Iterate until convergence

% NOTE: the cpals_core mex fast path inherited from cp_als is disabled
% here, because it would bypass the known-value overwrite that makes this
% a completion algorithm.
if false % (isa(X,'sptensor') || isa(X,'tensor')) && (exist('cpals_core','file') == 3)

    %fprintf('Using C++ code\n');
    [lambda,U] = cpals_core(X, Uinit, fitchangetol, maxiters, dimorder);
    P = ktensor(lambda,U);

else
    
    UtU = zeros(R,R,N);
    for n = 1:N
        if ~isempty(U{n})
            UtU(:,:,n) = U{n}'*U{n};
        end
    end
    
    for iter = 1:maxiters
        
        fitold = fit;
        
        % Iterate over all N modes of the tensor
        for n = dimorder(1:end)

            % Calculate Unew = X_(n) * khatrirao(all U except n, 'r').
            Unew = mttkrp(X,U,n);
            % Save the last MTTKRP result for fitness check.
            if n == dimorder(end)
              U_mttkrp = Unew;
            end
            
            % Compute the matrix of coefficients for linear system
            Y = prod(UtU(:,:,[1:n-1 n+1:N]),3);
            Unew = Unew / Y;
            if issparse(Unew)
                Unew = full(Unew);   % for the case R=1
            end
                        
            % Normalize each vector to prevent singularities in coefmatrix
            if iter == 1
                lambda = sqrt(sum(Unew.^2,1))'; %2-norm
            else
                lambda = max( max(abs(Unew),[],1), 1 )'; %max-norm
            end            
            
            Unew = bsxfun(@rdivide, Unew, lambda');

            U{n} = Unew;
            UtU(:,:,n) = U{n}'*U{n};
            
            % Core completion step: rebuild the full tensor from the current
            % factors and overwrite the observed positions with their known
            % values, so the next mode's update sees the completed data.
            P = ktensor(lambda,U);
            X = tensor(P);
            X(Tind) = Tval;   % modified ALS for missing data
            normX = norm(X);
        end
        
        P = ktensor(lambda,U);
        

        % This is equivalent to innerprod(X,P).
        iprod = sum(sum(P.U{dimorder(end)} .* U_mttkrp) .* lambda');
        if normX == 0
            fit = norm(P)^2 - 2 * iprod;
        else
            normresidual = sqrt( normX^2 + norm(P)^2 - 2 * iprod );
            fit = 1 - (normresidual / normX); %fraction explained by model
        end
        fitchange = abs(fitold - fit);
        
        % Check for convergence
        if (iter > 1) && (fitchange < fitchangetol)
            flag = 0;
        else
            flag = 1;
        end
        
        if (mod(iter,printitn)==0) || ((printitn>0) && (flag==0))
            fprintf(' Iter %2d: f = %e f-delta = %7.1e\n', iter, fit, fitchange);
        end
        
        % Stop once the relative error against the ground truth is small
        % enough. This relies on a global tensor Y_true and the author's
        % cal_acc helper (relative error between two arrays) being on the
        % path; err has no semicolon, so it is displayed each iteration.
        global Y_true
        err = cal_acc(double(X),double(Y_true))
        if(err<tolerance)
            break;
        end
        
        % Check for convergence
        if (flag == 0)
            break;
        end        
    end   
end


%% Clean up final result
% Arrange the final tensor so that the columns are normalized.
P = arrange(P);
% Fix the signs
if params.Results.fixsigns
    P = fixsigns(P);
end

if printitn>0
    if normX == 0
        fit = norm(P)^2 - 2 * innerprod(X,P);
    else
        normresidual = sqrt( normX^2 + norm(P)^2 - 2 * innerprod(X,P) );
        fit = 1 - (normresidual / normX); %fraction explained by model
    end
  fprintf(' Final f = %e \n', fit);
end

output = struct;
output.params = params.Results;
output.iters = iter;
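
A hedged usage sketch for cp_tcals (the observation ratio and rank are illustrative; cal_acc is the author's relative-error helper and must be on the path, and Y_true must be set as a global because the stopping test inside the loop reads it):

load data274D5333T100.mat                     % assumed to provide an array Y
global Y_true
Y_true = tensor(Y);                           % ground truth read by cal_acc
mask = tensor(double(rand(size(Y)) < 0.15));  % observe 15% of the entries
Tind = find(mask);                            % subscripts of observed entries
Tval = Y_true(Tind);                          % their values
X0 = tenzeros(size(Y_true));
X0(Tind) = Tval;                              % zero-filled initialization
P = cp_tcals(X0, 30, Tind, Tval, 1e-2);       % rank 30, stop at 1% error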


Reposted from blog.csdn.net/lusongno1/article/details/127261546