A Review of Tensor Completion in the MATLAB Tensor Toolbox and N-Way Toolbox

In MATLAB, the relatively mature toolboxes are the Tensor Toolbox, the N-way Toolbox, and the TT-Toolbox (TT = Tensor Train). Among these, the functions that perform tensor completion via tensor decomposition are:

  • cp_wopt (CP via optimization): the weighted direct optimization (WOPT) method in the Tensor Toolbox for handling missing data. It is a variant of cp_opt adapted to missing data.
  • parafac (CP via ALS): the CP decomposition function in the N-way Toolbox; missing entries are set to NaN and handled with an EM-style algorithm. It can be viewed as a missing-data variant of cp_als.
  • tucker (Tucker decomposition): the Tucker decomposition function in the N-way Toolbox; simply set the missing entries to NaN. A sketch of how all three are called follows this list.
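
A hedged sketch of how these three are invoked (the 85% missing ratio, rank choices, and variable names are illustrative assumptions, not values from the toolbox docs; cp_wopt additionally needs an optimization backend such as Poblano on the path):

Yd = double(Y);                          % Y: a 3-way data tensor, say
miss = rand(size(Yd)) < 0.85;            % mark 85% of entries as missing

% cp_wopt: zero-filled data plus a 0/1 weight tensor of observed entries
W = tensor(double(~miss));
Yzero = Yd;  Yzero(miss) = 0;
M1 = cp_wopt(tensor(Yzero), W, 10);      % rank-10 weighted CP

% parafac / tucker (N-way Toolbox): mark missing entries with NaN
Ynan = Yd;  Ynan(miss) = NaN;
Fcp = parafac(Ynan, 10);                 % rank-10 CP, EM handling of NaNs
[Ftk, G] = tucker(Ynan, [10 10 10]);     % Tucker with a 10x10x10 core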

The Various Matrix Products

The Kronecker product multiplies each entry of the left matrix by the whole right matrix and tiles the results into one large block matrix:
$$\mathbf{A} \otimes \mathbf{B}=\left[\begin{array}{cccc} a_{11} \mathbf{B} & a_{12} \mathbf{B} & \cdots & a_{1J} \mathbf{B} \\ a_{21} \mathbf{B} & a_{22} \mathbf{B} & \cdots & a_{2J} \mathbf{B} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1} \mathbf{B} & a_{I2} \mathbf{B} & \cdots & a_{IJ} \mathbf{B} \end{array}\right]$$
The Khatri-Rao product of two matrices with the same number of columns is the columnwise Kronecker product:
$$\mathbf{A} \odot \mathbf{B}=\left[\begin{array}{cccc} \mathbf{a}_1 \otimes \mathbf{b}_1 & \mathbf{a}_2 \otimes \mathbf{b}_2 & \cdots & \mathbf{a}_K \otimes \mathbf{b}_K \end{array}\right]$$
The Hadamard product of two matrices of the same size is the elementwise product:
$$\mathbf{A} * \mathbf{B}=\left[\begin{array}{cccc} a_{11} b_{11} & a_{12} b_{12} & \cdots & a_{1J} b_{1J} \\ a_{21} b_{21} & a_{22} b_{22} & \cdots & a_{2J} b_{2J} \\ \vdots & \vdots & \ddots & \vdots \\ a_{I1} b_{I1} & a_{I2} b_{I2} & \cdots & a_{IJ} b_{IJ} \end{array}\right]$$
These products satisfy the following identities:
$$\begin{aligned} (\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) &= \mathbf{A}\mathbf{C} \otimes \mathbf{B}\mathbf{D} \\ (\mathbf{A} \otimes \mathbf{B})^{\dagger} &= \mathbf{A}^{\dagger} \otimes \mathbf{B}^{\dagger} \\ \mathbf{A} \odot \mathbf{B} \odot \mathbf{C} &= (\mathbf{A} \odot \mathbf{B}) \odot \mathbf{C} = \mathbf{A} \odot (\mathbf{B} \odot \mathbf{C}) \\ (\mathbf{A} \odot \mathbf{B})^{\top}(\mathbf{A} \odot \mathbf{B}) &= \mathbf{A}^{\top}\mathbf{A} * \mathbf{B}^{\top}\mathbf{B} \\ (\mathbf{A} \odot \mathbf{B})^{\dagger} &= \left(\left(\mathbf{A}^{\top}\mathbf{A}\right) * \left(\mathbf{B}^{\top}\mathbf{B}\right)\right)^{\dagger}(\mathbf{A} \odot \mathbf{B})^{\top} \end{aligned}$$
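
As a quick numerical sanity check, the Kronecker and Khatri-Rao identities above can be verified in a few lines of base MATLAB (the columnwise-kron loop is a generic construction of the Khatri-Rao product, not a toolbox call):

A = randn(4,3); B = randn(5,3); C = randn(3,2); D = randn(3,2);
norm(kron(A,B)*kron(C,D) - kron(A*C,B*D), 'fro')   % ~1e-15: mixed product
KR = zeros(size(A,1)*size(B,1), size(A,2));
for r = 1:size(A,2)
    KR(:,r) = kron(A(:,r), B(:,r));                % Khatri-Rao, column r
end
norm(KR'*KR - (A'*A).*(B'*B), 'fro')               % Gram identity
norm(pinv(KR) - pinv((A'*A).*(B'*B))*KR', 'fro')   % pseudo-inverse identity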

The ALS Algorithm

Alternating least squares (ALS) is a fairly effective method for computing the CP decomposition of a tensor. Its basic idea is as follows. Taking a three-way tensor as an example, the problem it optimizes is:

$$\min_{\hat{\boldsymbol{X}}}\|\boldsymbol{X}-\hat{\boldsymbol{X}}\| \quad \text{with} \quad \hat{\boldsymbol{X}}=\sum_{r=1}^R \lambda_r\, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r=[\boldsymbol{\lambda} ; \mathbf{A}, \mathbf{B}, \mathbf{C}]$$

ALS first fixes B and C to solve for A, then fixes A and C to solve for B, then fixes A and B to solve for C, and keeps repeating the whole cycle until a convergence criterion is met.

Take one of the subproblems, say fixing B and C to solve for A. It amounts to optimizing
$$\min_{\hat{\mathbf{A}}}\left\|\mathbf{X}_{(1)}-\hat{\mathbf{A}}(\mathbf{C} \odot \mathbf{B})^{\top}\right\|_F$$
Here $\hat{\mathbf{A}}=\mathbf{A} \cdot \operatorname{diag}(\boldsymbol{\lambda})$, and $\mathbf{X}_{(1)}$ denotes the matrix obtained by unfolding the tensor along its first mode. As an example, consider a tensor $\boldsymbol{X} \in \mathbb{R}^{3 \times 4 \times 2}$ with frontal slices
$$\mathbf{X}_1=\left[\begin{array}{cccc} 1 & 4 & 7 & 10 \\ 2 & 5 & 8 & 11 \\ 3 & 6 & 9 & 12 \end{array}\right], \quad \mathbf{X}_2=\left[\begin{array}{cccc} 13 & 16 & 19 & 22 \\ 14 & 17 & 20 & 23 \\ 15 & 18 & 21 & 24 \end{array}\right]$$
Unfolding along each mode in turn gives the matrices (the mode-n unfoldings)
$$\begin{aligned} &\mathbf{X}_{(1)}=\left[\begin{array}{cccccccc} 1 & 4 & 7 & 10 & 13 & 16 & 19 & 22 \\ 2 & 5 & 8 & 11 & 14 & 17 & 20 & 23 \\ 3 & 6 & 9 & 12 & 15 & 18 & 21 & 24 \end{array}\right], \\ &\mathbf{X}_{(2)}=\left[\begin{array}{cccccc} 1 & 2 & 3 & 13 & 14 & 15 \\ 4 & 5 & 6 & 16 & 17 & 18 \\ 7 & 8 & 9 & 19 & 20 & 21 \\ 10 & 11 & 12 & 22 & 23 & 24 \end{array}\right], \\ &\mathbf{X}_{(3)}=\left[\begin{array}{cccccccccc} 1 & 2 & 3 & 4 & 5 & \cdots & 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 & 17 & \cdots & 21 & 22 & 23 & 24 \end{array}\right] \end{aligned}$$
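
These unfoldings can be reproduced in base MATLAB with reshape and permute (a sketch of the convention above, without using the toolbox's tenmat):

X  = reshape(1:24, [3 4 2]);                 % the example tensor above
X1 = reshape(X, 3, []);                      % mode-1 unfolding, 3 x 8
X2 = reshape(permute(X, [2 1 3]), 4, []);    % mode-2 unfolding, 4 x 6
X3 = reshape(permute(X, [3 1 2]), 2, []);    % mode-3 unfolding, 2 x 12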
Furthermore, from the identity
$$\begin{aligned} &\mathbf{Y}=\boldsymbol{X} \times_1 \mathbf{A}^{(1)} \times_2 \mathbf{A}^{(2)} \cdots \times_N \mathbf{A}^{(N)} \;\Leftrightarrow\; \\ &\mathbf{Y}_{(n)}=\mathbf{A}^{(n)} \mathbf{X}_{(n)}\left(\mathbf{A}^{(N)} \otimes \cdots \otimes \mathbf{A}^{(n+1)} \otimes \mathbf{A}^{(n-1)} \otimes \cdots \otimes \mathbf{A}^{(1)}\right)^{\top}, \end{aligned}$$
the mode-n unfoldings of the CP model above can be written as
$$\begin{aligned} \mathbf{X}_{(1)} &\approx \mathbf{A}(\mathbf{C} \odot \mathbf{B})^{\top}, \\ \mathbf{X}_{(2)} &\approx \mathbf{B}(\mathbf{C} \odot \mathbf{A})^{\top}, \\ \mathbf{X}_{(3)} &\approx \mathbf{C}(\mathbf{B} \odot \mathbf{A})^{\top}. \end{aligned}$$
This is where the matrix optimization problem above comes from. It is a plain least-squares problem, and one solution is
$$\hat{\mathbf{A}}=\mathbf{X}_{(1)}\left[(\mathbf{C} \odot \mathbf{B})^{\top}\right]^{\dagger}$$
Applying the product identities listed earlier, this becomes
$$\hat{\mathbf{A}}=\mathbf{X}_{(1)}(\mathbf{C} \odot \mathbf{B})\left(\mathbf{C}^{\top} \mathbf{C} * \mathbf{B}^{\top} \mathbf{B}\right)^{\dagger}$$
After computing $\hat{\mathbf{A}}$, normalize its columns to extract $\lambda_r=\left\|\hat{\mathbf{a}}_r\right\|$. Notice that in the end we only need the pseudo-inverse of an $R \times R$ matrix, so each iteration step is remarkably cheap. A compact end-to-end sketch follows.
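
Here is a minimal base-MATLAB sketch of 3-way CP-ALS built from exactly these updates (no toolboxes; the sizes, rank, and iteration count are illustrative, and krp is a local Khatri-Rao helper, so save this as a script in R2016b+ or as a function file):

I = 30; J = 20; K = 10; R = 5;
A = randn(I,R); B = randn(J,R); C = randn(K,R);        % ground-truth factors
X1 = A * krp(C,B).';                                   % X_(1) = A (C kr B)'
X  = reshape(X1, [I J K]);
X2 = reshape(permute(X,[2 1 3]), J, []);               % mode-2 unfolding
X3 = reshape(permute(X,[3 1 2]), K, []);               % mode-3 unfolding
Ah = randn(I,R); Bh = randn(J,R); Ch = randn(K,R);     % random initialization
for iter = 1:50
    Ah = X1 * krp(Ch,Bh) * pinv((Ch'*Ch) .* (Bh'*Bh)); % fix B,C, solve A
    Bh = X2 * krp(Ch,Ah) * pinv((Ch'*Ch) .* (Ah'*Ah)); % fix A,C, solve B
    Ch = X3 * krp(Bh,Ah) * pinv((Bh'*Bh) .* (Ah'*Ah)); % fix A,B, solve C
end
fprintf('relative error: %g\n', ...
    norm(X1 - Ah*krp(Ch,Bh).', 'fro') / norm(X1, 'fro'));

function P = krp(U, V)                                 % Khatri-Rao product
P = zeros(size(U,1)*size(V,1), size(U,2));
for r = 1:size(U,2)
    P(:,r) = kron(U(:,r), V(:,r));                     % columnwise Kronecker
end
end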

For the higher-dimensional case, the updates generalize mode by mode.
[Figure from the original post omitted; the image could not be recovered.]
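
For reference, the general mode-n least-squares update for an N-way tensor has the same shape (this is the standard CP-ALS update from the literature, e.g. Kolda and Bader's survey, stated here rather than taken from the lost figure):

$$\mathbf{A}^{(n)} \leftarrow \mathbf{X}_{(n)}\left(\mathbf{A}^{(N)} \odot \cdots \odot \mathbf{A}^{(n+1)} \odot \mathbf{A}^{(n-1)} \odot \cdots \odot \mathbf{A}^{(1)}\right)\left(\mathbf{A}^{(1)\top}\mathbf{A}^{(1)} * \cdots * \mathbf{A}^{(n-1)\top}\mathbf{A}^{(n-1)} * \mathbf{A}^{(n+1)\top}\mathbf{A}^{(n+1)} * \cdots * \mathbf{A}^{(N)\top}\mathbf{A}^{(N)}\right)^{\dagger}$$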

cp_als VS parafac VS parafac with NaN

Let us now compare the efficiency and accuracy of the ALS algorithm and its modified variants. cp_als is from the Tensor Toolbox; parafac is from the N-way Toolbox. We use the data file data274D5333T100.mat as the benchmark, write totalT for the total time cost, and $\epsilon$ for the relative error.

For cp_als with Rank set to 100: totalT = 0.428248, $\epsilon = 5.6475\mathrm{e}{-04}$.

For the N-way Toolbox we set Rank = 10 (anything larger and MATLAB cannot fit the arrays in memory). parafac: totalT = 9.4, $\epsilon = 0.016$. Next consider parafac with 85% of entries set to NaN; in principle this should be faster, but the result is even more absurd: totalT = 58.95, $\epsilon = 0.0125$. With slice missing, the outcome is worse still: totalT = 183, $\epsilon = 0.279$. A sketch of the measurement harness follows.
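
A hedged sketch of the harness behind these numbers (the variable name Y for the loaded array, the error formula, and the use of nmodel to rebuild the N-way model are assumptions; the data file and ranks are from the text):

load data274D5333T100.mat                    % assumed to provide an array Y

tic; M = cp_als(tensor(Y), 100); totalT = toc
eps_rel = norm(tensor(Y) - full(M)) / norm(tensor(Y))

Ynan = Y; Ynan(rand(size(Y)) < 0.85) = NaN;  % 85% missing at random
tic; F = parafac(Ynan, 10); totalT = toc
Yhat = nmodel(F);                            % N-way model reconstruction
eps_rel = norm(Y(:) - Yhat(:)) / norm(Y(:))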

The N-way implementation is remarkably naive, which makes it very slow. Why is it so naive? One example: when it computes the Khatri-Rao product of all factor matrices except one, it does so one matrix at a time from scratch, which is not smart at all. Schematically:
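
The pattern looks roughly like this (a paraphrase of what the N-way code effectively does, not a verbatim excerpt; krp is the Khatri-Rao helper from the ALS sketch above):

Z = ones(1, R);                  % rebuilt from scratch for every mode n
for m = N:-1:1
    if m ~= n
        Z = krp(Z, U{m});        % materializes a huge (prod dims)-by-R matrix
    end
end
Unew = Xn * Z;                   % Xn: mode-n unfolding of the data

The Tensor Toolbox's mttkrp computes Unew without ever forming Z explicitly, which accounts for much of the speed difference.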

The conclusion is that for large tensors the N-way toolbox is naively implemented and very slow. tucker is presumably no better, so I did not bother trying it; parafac is simply unusable here. For high-order tensors, the N-way toolbox feels very clumsy.

cp_opt VS cp_wopt

Optimization-based iterative methods always take a long time, and the results are not necessarily better. cp_wopt is also prone to overfitting. A sketch of the two calls is below.
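
For completeness, a hedged sketch of the two calls (with W and the zero-filled Yzero built as in the first sketch above; per the Tensor Toolbox docs, both functions require an optimization backend such as Poblano):

tic; Mo = cp_opt(tensor(Y), 10);         t_opt  = toc  % full data, no weights
tic; Mw = cp_wopt(tensor(Yzero), W, 10); t_wopt = toc  % missing-aware, weighted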

These optimization-based methods really are slow and inaccurate; we may as well give up on them.

Improving cp_als

Since the existing methods are this poor, let us build an improvement on the faster decomposition method. One small tweak: during the alternating least-squares steps of cp_als, at every iteration overwrite the positions of the known entries with their known values. This gives a modified cp_als, which I name cp_tcals.

We compare it against the existing SPC method, again with data274D5333T100.mat as the benchmark.

For cp_tcals:

| $\epsilon$ | TIME COST |
| --- | --- |
| 0.0985 | 1.335 |
| 0.0467 | 2.281213 |
| 0.0212 | 3.44 |
| 0.0104 | 4.891780 |
| 0.0050 | 6.813247 |
| 0.0031 | 8.718322 |

For SPC, reaching an accuracy of $\epsilon = 0.0067$ already costs 111 in time.

I believe TT and Tucker decompositions could be extended to completion in the same straightforward way.

The code is as follows:

function [P,Uinit,output] = cp_tcals(X,R,Tind,Tval,tolerance,varargin)
%CP_TCALS Compute a CP decomposition of a tensor with missing entries.
%
%   M = CP_TCALS(X,R,Tind,Tval,tolerance) computes an estimate of the best
%   rank-R CP model of a tensor X by alternating least squares, for data
%   with missing entries. Tind holds the subscripts of the observed
%   entries and Tval their values; after every factor update, the current
%   reconstruction is overwritten with the known values at Tind. The
%   iteration stops once the relative error drops below tolerance. The
%   input X can be a tensor, sptensor, ktensor, or ttensor. The result M
%   is a ktensor.
%
%   M = CP_TCALS(X,R,Tind,Tval,tolerance,'param',value,...) specifies
%   optional parameters and values. Valid parameters and their default
%   values are:
%      'tol' - Tolerance on difference in fit {1.0e-4}
%      'maxiters' - Maximum number of iterations {50}
%      'dimorder' - Order to loop through dimensions {1:ndims(A)}
%      'init' - Initial guess [{'random'}|'nvecs'|cell array]
%      'printitn' - Print fit every n iterations; 0 for no printing {1}
%      'fixsigns' - Call fixsigns at end of iterations {true}
%
%   [M,U0] = CP_TCALS(...) also returns the initial guess.
%
%   [M,U0,out] = CP_TCALS(...) also returns additional output that
%   contains the input parameters.
%
%   Note: The "fit" is defined as 1 - norm(X-full(M))/norm(X) and is
%   loosely the proportion of the data described by the CP model, i.e., a
%   fit of 1 is perfect.
%
%   NOTE: Updated in various minor ways per work of Phan Anh Huy. See Anh
%   Huy Phan, Petr Tichavský, Andrzej Cichocki, On Fast Computation of
%   Gradients for CANDECOMP/PARAFAC Algorithms, arXiv:1204.1586, 2012.
%
%   Example (adapted from cp_als; cal_acc and the global Y_true used by
%   the stopping test must be set up by the caller):
%   X = sptenrand([5 4 3], 10);
%   [subs,vals] = find(X);
%   global Y_true; Y_true = full(X);
%   M = cp_tcals(full(X),2,subs,vals,1e-6);
%
%   <a href="matlab:web(strcat('file://',fullfile(getfield(what('tensor_toolbox'),'path'),'doc','html','cp_als_doc.html')))">Documentation page for CP-ALS</a>
%
%   See also KTENSOR, TENSOR, SPTENSOR, TTENSOR.
%
%Tensor Toolbox for MATLAB: <a href="https://www.tensortoolbox.org">www.tensortoolbox.org</a>



%% Extract number of dimensions and norm of X.
N = ndims(X);
normX = norm(X);

%% Set algorithm parameters from input or by using defaults
params = inputParser;
params.addParameter('tol',1e-4,@isscalar);
params.addParameter('maxiters',50,@(x) isscalar(x) & x > 0);
params.addParameter('dimorder',1:N,@(x) isequal(sort(x),1:N));
params.addParameter('init', 'random', @(x) (iscell(x) || ismember(x,{'random','nvecs'})));
params.addParameter('printitn',1,@isscalar);
params.addParameter('fixsigns',true,@islogical);
params.parse(varargin{:});

%% Copy from params object
fitchangetol = params.Results.tol;
maxiters = params.Results.maxiters;
dimorder = params.Results.dimorder;
init = params.Results.init;
printitn = params.Results.printitn;

%% Error checking 

%% Set up and error checking on initial guess for U.
if iscell(init)
    Uinit = init;
    if numel(Uinit) ~= N
        error('OPTS.init does not have %d cells',N);
    end
    for n = dimorder(2:end)
        if ~isequal(size(Uinit{n}),[size(X,n) R])
            error('OPTS.init{%d} is the wrong size',n);
        end
    end
else
    % Observe that we don't need to calculate an initial guess for the
    % first index in dimorder because that will be solved for in the first
    % inner iteration.
    if strcmp(init,'random')
        Uinit = cell(N,1);
        for n = dimorder(2:end)
            Uinit{n} = rand(size(X,n),R);
        end
    elseif strcmp(init,'nvecs') || strcmp(init,'eigs') 
        Uinit = cell(N,1);
        for n = dimorder(2:end)
            Uinit{n} = nvecs(X,n,R);
        end
    else
        error('The selected initialization method is not supported');
    end
end

%% Set up for iterations - initializing U and the fit.
U = Uinit;
fit = 0;

% Store the last MTTKRP result to accelerate fitness computation.
U_mttkrp = zeros(size(X, dimorder(end)), R);

if printitn>0
  fprintf('\nTC_CP_ALS:\n');
end

%% Main Loop: Iterate until convergence

% NOTE: the cpals_core mex fast path inherited from cp_als is disabled
% here, because it would bypass the known-value overwrite that makes this
% a completion algorithm.
if false % (isa(X,'sptensor') || isa(X,'tensor')) && (exist('cpals_core','file') == 3)

    %fprintf('Using C++ code\n');
    [lambda,U] = cpals_core(X, Uinit, fitchangetol, maxiters, dimorder);
    P = ktensor(lambda,U);

else
    
    UtU = zeros(R,R,N);
    for n = 1:N
        if ~isempty(U{n})
            UtU(:,:,n) = U{n}'*U{n};
        end
    end
    
    for iter = 1:maxiters
        
        fitold = fit;
        
        % Iterate over all N modes of the tensor
        for n = dimorder(1:end)

            % Calculate Unew = X_(n) * khatrirao(all U except n, 'r').
            Unew = mttkrp(X,U,n);
            % Save the last MTTKRP result for fitness check.
            if n == dimorder(end)
              U_mttkrp = Unew;
            end
            
            % Compute the matrix of coefficients for linear system
            Y = prod(UtU(:,:,[1:n-1 n+1:N]),3);
            Unew = Unew / Y;
            if issparse(Unew)
                Unew = full(Unew);   % for the case R=1
            end
                        
            % Normalize each vector to prevent singularities in coefmatrix
            if iter == 1
                lambda = sqrt(sum(Unew.^2,1))'; %2-norm
            else
                lambda = max( max(abs(Unew),[],1), 1 )'; %max-norm
            end            
            
            Unew = bsxfun(@rdivide, Unew, lambda');

            U{n} = Unew;
            UtU(:,:,n) = U{n}'*U{n};
            
            % Core completion step: rebuild the full tensor from the current
            % factors and overwrite the observed positions with their known
            % values, so the next mode's update sees the completed data.
            P = ktensor(lambda,U);
            X = tensor(P);
            X(Tind) = Tval;   % modified ALS for missing data
            normX = norm(X);
        end
        
        P = ktensor(lambda,U);
        

        % This is equivalent to innerprod(X,P).
        iprod = sum(sum(P.U{dimorder(end)} .* U_mttkrp) .* lambda');
        if normX == 0
            fit = norm(P)^2 - 2 * iprod;
        else
            normresidual = sqrt( normX^2 + norm(P)^2 - 2 * iprod );
            fit = 1 - (normresidual / normX); %fraction explained by model
        end
        fitchange = abs(fitold - fit);
        
        % Check for convergence
        if (iter > 1) && (fitchange < fitchangetol)
            flag = 0;
        else
            flag = 1;
        end
        
        if (mod(iter,printitn)==0) || ((printitn>0) && (flag==0))
            fprintf(' Iter %2d: f = %e f-delta = %7.1e\n', iter, fit, fitchange);
        end
        
        % Stop once the relative error against the ground truth is small
        % enough. This relies on a global tensor Y_true and the author's
        % cal_acc helper (relative error between two arrays) being on the
        % path; err has no semicolon, so it is displayed each iteration.
        global Y_true
        err = cal_acc(double(X),double(Y_true))
        if(err<tolerance)
            break;
        end
        
        % Check for convergence
        if (flag == 0)
            break;
        end        
    end   
end


%% Clean up final result
% Arrange the final tensor so that the columns are normalized.
P = arrange(P);
% Fix the signs
if params.Results.fixsigns
    P = fixsigns(P);
end

if printitn>0
    if normX == 0
        fit = norm(P)^2 - 2 * innerprod(X,P);
    else
        normresidual = sqrt( normX^2 + norm(P)^2 - 2 * innerprod(X,P) );
        fit = 1 - (normresidual / normX); %fraction explained by model
    end
  fprintf(' Final f = %e \n', fit);
end

output = struct;
output.params = params.Results;
output.iters = iter;
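
A hedged usage sketch for cp_tcals (the observation ratio and rank are illustrative; cal_acc is the author's relative-error helper and must be on the path, and Y_true must be set as a global because the stopping test inside the loop reads it):

load data274D5333T100.mat                     % assumed to provide an array Y
global Y_true
Y_true = tensor(Y);                           % ground truth read by cal_acc
mask = tensor(double(rand(size(Y)) < 0.15));  % observe 15% of the entries
Tind = find(mask);                            % subscripts of observed entries
Tval = Y_true(Tind);                          % their values
X0 = tenzeros(size(Y_true));
X0(Tind) = Tval;                              % zero-filled initialization
P = cp_tcals(X0, 30, Tind, Tval, 1e-2);       % rank 30, stop at 1% error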


Reposted from blog.csdn.net/lusongno1/article/details/127261546