Is there a way to work out a formula over a column using previous rows in Python without using a for loop?

M.E. :

I have a NumPy array of size 5000 by 7. I want to compute the seventh column (index 6) for all rows from position 200 to 4999 as the mean of the 200 preceding values of a source column:

dataset[i,6] = (dataset[i-200,3] + dataset[i-199,3] + dataset[i-198,3] + ... + dataset[i-1,3])/200

I have tried the following, which works:

import numpy as np

numberOfDataItems = 5000
dataset = np.random.rand(numberOfDataItems, 7)
for i in range(200, numberOfDataItems):
    # mean of the 200 rows before row i in column 3, stored in column 6
    dataset[i,6] = np.sum(dataset[i-200:i,3])/200

My question is whether the loop can be eliminated by using another NumPy-based approach.

Lepakk :

Here is a solution using np.r_, slicing and skimage.util.view_as_windows.

For simplicity, I use np.arange as the data. If you have more than one data series, you can repeat this for each series that needs this backward averaging:

import numpy as np
from skimage.util import view_as_windows

# small sizes, matching the outputs shown below
numberOfDataItems = 50
sumwindow = 10
data = np.arange(numberOfDataItems)

Using np.r_ and view_as_windows, I can stack every cyclic shift of the data into a 2-D array of shape len(data) x len(data):

# data followed by all but its last element, so each window of length len(data) is one cyclic shift
b = np.r_[data, data[:-1]]
c = view_as_windows(b, len(data))
c
Out[]: 
array([[ 0,  1,  2, ..., 47, 48, 49],
       [ 1,  2,  3, ..., 48, 49,  0],
       [ 2,  3,  4, ..., 49,  0,  1],
       ...,
       [47, 48, 49, ..., 44, 45, 46],
       [48, 49,  0, ..., 45, 46, 47],
       [49,  0,  1, ..., 46, 47, 48]])
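
Each row of the array above is a cyclic shift of data. The following is a minimal sketch to check that, assuming NumPy 1.20 or later and using numpy.lib.stride_tricks.sliding_window_view as a NumPy-only stand-in for skimage.util.view_as_windows (a substitution made here for illustration, not part of the answer's code):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(50)
n = len(data)

# Append all but the last element so that every cyclic shift
# of data appears as one contiguous window of length n.
b = np.r_[data, data[:-1]]

# NumPy-only stand-in for skimage.util.view_as_windows(b, n).
c = sliding_window_view(b, n)

# Row i of c is data rotated left by i positions.
for i in (0, 1, 7, n - 1):
    assert np.array_equal(c[i], np.roll(data, -i))

print(c.shape)  # (50, 50)
```

Because the windows are views into b, building c does not copy the data n times.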

Basically, row i of this array is np.roll(data, -i), and because the array is symmetric the same holds for column i, but it is built without a loop. Now I can sum the first 10 elements of column 0 to get the value needed for position 10, and do the same for the following columns.

d = c[:sumwindow, :-sumwindow].sum(axis=0) / sumwindow
d
Out[]: 
array([ 4.5,  5.5,  6.5,  7.5,  8.5,  9.5, 10.5, 11.5, 12.5, 13.5, 14.5,
       15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 25.5,
       26.5, 27.5, 28.5, 29.5, 30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5,
       37.5, 38.5, 39.5, 40.5, 41.5, 42.5, 43.5])

It is now easy to see that, when taking the mean of the 10 preceding elements, 4.5 is the value for position 10, and the value increases by 1 with every further column.

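# row 0: original data; row 1: data with the backward averages written from position sumwindow on
e = np.array([data, data], dtype=float)
e[1, sumwindow:] = d
e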
Out[]: 
array([[ 0. ,  1. ,  2. ,  3. ,  4. ,  5. ,  6. ,  7. ,  8. ,  9. , 10. ,
        11. , 12. , 13. , 14. , 15. , 16. , 17. , 18. , 19. , 20. , 21. ,
        22. , 23. , 24. , 25. , 26. , 27. , 28. , 29. , 30. , 31. , 32. ,
        33. , 34. , 35. , 36. , 37. , 38. , 39. , 40. , 41. , 42. , 43. ,
        44. , 45. , 46. , 47. , 48. , 49. ],
       [ 0. ,  1. ,  2. ,  3. ,  4. ,  5. ,  6. ,  7. ,  8. ,  9. ,  4.5,
         5.5,  6.5,  7.5,  8.5,  9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5,
        16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 25.5, 26.5,
        27.5, 28.5, 29.5, 30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5,
        38.5, 39.5, 40.5, 41.5, 42.5, 43.5]])

Viewing the initial data and the result together, the first sumwindow values are unchanged, and from then on each value is the average of the sumwindow values before it, just as with the window of 200 in your example.
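
For the 5000 by 7 array from the question, the same backward average can also be sketched with NumPy alone. The version below uses prefix sums (np.cumsum) instead of the windowed view, which is a different technique from the one above. The names number_of_items, window, csum and expected are introduced here for illustration, and columns 3 and 6 are taken from the loop in the question:

```python
import numpy as np

number_of_items = 5000   # rows, as in the question
window = 200             # number of preceding rows to average

rng = np.random.default_rng(0)
dataset = rng.random((number_of_items, 7))

# Reference result from the explicit loop in the question.
expected = dataset.copy()
for i in range(window, number_of_items):
    expected[i, 6] = np.sum(expected[i - window:i, 3]) / window

# Prefix sums with a leading zero: csum[k] is the sum of the first k source values.
csum = np.concatenate(([0.0], np.cumsum(dataset[:, 3])))

# The sum over rows i-window .. i-1 is csum[i] - csum[i - window],
# evaluated here for i = window .. number_of_items - 1 in one step.
dataset[window:, 6] = (csum[window:-1] - csum[:-window - 1]) / window

print(np.allclose(dataset, expected))
```

The prefix-sum version needs only O(n) extra memory, but its results can differ from the loop in the last few floating-point digits, which is why the comparison uses np.allclose rather than exact equality.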

I hope this solution fits your needs.
