Table of Contents
36. Find the correlation coefficient of two columns of numpy.ndarray
37. Determine whether there is a null value in numpy.ndarray
38. Replace the missing values in numpy.ndarray with a specified value
39. Calculate the frequency of numpy.ndarray elements
40. Convert numpy.ndarray elements from numeric to categorical type
41. Compute a new column from existing columns of numpy.ndarray
42. Probabilistic sampling from numpy.ndarray
43. Find the second largest value within one group of numpy.ndarray
44. Sort by a column of numpy.ndarray
45. Pick the element with the highest frequency in numpy.ndarray
46. Output the position of the first element in numpy.ndarray that is greater than a given value
47. Replace the elements that meet the conditions in numpy.ndarray with the given value
48. Get the positions and values of the top n elements in numpy.ndarray
49. Compute row-wise counts of all values in numpy.ndarray
50. Merge multiple numpy.ndarrays into one
51. Compute the one-hot encodings of numpy.ndarray
52. Create row numbers grouped by a categorical variable
53. Create group ids based on a given categorical variable
54. numpy.ndarray (one-dimensional) element rank
55. numpy.ndarray (multi-dimensional) element rank
56. Output the largest element of each row of numpy.ndarray
57. Output the ratio of the minimum value to the maximum value of each row of numpy.ndarray
58. Determine whether each element in numpy.ndarray is a repeat of an earlier element
59. Find the mean of each group of elements in numpy.ndarray
60. Convert PIL image to numpy.ndarray
61. Discard all missing values in numpy.ndarray
62. Calculate the Euclidean distance of two numpy.ndarrays
63. Find the local maximum positions of numpy.ndarray
64. numpy.ndarray subtraction operation
65. Output the position of the nth occurrence of an element in numpy.ndarray
66. Convert numpy.ndarray data format from datetime64 to datetime
67. Calculate the moving average of numpy.ndarray for a given window size
68. Build a numpy.ndarray from a given start, length, and step
69. Fill in the missing dates of a non-continuous time series numpy.ndarray
70. Construct a numpy.ndarray with a sliding window according to the specified step length
36. Find the correlation coefficient of two columns of numpy.ndarray
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
# Method 1
np.corrcoef(iris[:, 0], iris[:, 2])[0, 1]
# Method 2
from scipy.stats import pearsonr
corr, p_value = pearsonr(iris[:, 0], iris[:, 2])
print(corr)
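As an offline cross-check that needs no download, the sketch below computes the Pearson coefficient by hand and compares it with np.corrcoef. The arrays x and y are made-up illustrative values, not iris data.

```python
import numpy as np

# Hypothetical sample data (not taken from the iris dataset)
x = np.array([5.1, 4.9, 6.3, 5.8, 7.1])
y = np.array([1.4, 1.4, 6.0, 5.1, 5.9])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The off-diagonal entry of the 2x2 correlation matrix is the same quantity
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)
```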
37. Determine whether there is a null value in numpy.ndarray
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
np.isnan(iris_2d).any()
38. Replace the missing values in numpy.ndarray with a specified value
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
iris_2d[np.random.randint(150, size=20), np.random.randint(4, size=20)] = np.nan
iris_2d[np.isnan(iris_2d)] = 0  # Replace missing values with 0
iris_2d[:4]
39. Calculate the frequency of numpy.ndarray elements
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
species = np.array([row.tolist()[4] for row in iris])
# Get the unique values and the counts
np.unique(species, return_counts=True)
40. Convert numpy.ndarray elements from numeric to categorical type
'''
Requirement:
Less than 3 --> 'small'
3-5 --> 'medium'
>=5 --> 'large'
'''
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# Bin petallength
petal_length_bin = np.digitize(iris[:, 2].astype('float'), [0, 3, 5, 10])
# Map it to respective category
label_map = {1: 'small', 2: 'medium', 3: 'large', 4: np.nan}
petal_length_cat = [label_map[x] for x in petal_length_bin]
# View
petal_length_cat[:4]
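To see how np.digitize assigns the bin numbers used above, here is a small self-contained sketch. The lengths array is a hypothetical sample chosen so that each bin boundary is exercised.

```python
import numpy as np

# Hypothetical petal lengths, including values that sit exactly on a bin edge
lengths = np.array([1.4, 3.0, 4.7, 5.0, 6.9])

# With edges [0, 3, 5, 10] and the default right=False:
#   bin 1 covers [0, 3), bin 2 covers [3, 5), bin 3 covers [5, 10)
bins = np.digitize(lengths, [0, 3, 5, 10])

label_map = {1: 'small', 2: 'medium', 3: 'large'}
labels = [label_map[b] for b in bins]
print(labels)  # ['small', 'medium', 'medium', 'large', 'large']
```

A value equal to an edge falls into the bin that starts at that edge, which is why 3.0 is 'medium' and 5.0 is 'large'.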
41. Compute a new column from existing columns of numpy.ndarray
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
# Compute the new column
sepallength = iris_2d[:, 0].astype('float')
petallength = iris_2d[:, 2].astype('float')
volume = (np.pi * petallength * (sepallength**2))/3
# Add a new axis so the column can be stacked next to iris_2d
volume = volume[:, np.newaxis]
# Append the new column
out = np.hstack([iris_2d, volume])
out[:4]
42. Probabilistic sampling from numpy.ndarray
# Requirement: sample species so that setosa appears twice as often as versicolor and virginica
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Get the species column
species = iris[:, 4]
# Method 1
np.random.seed(100)
a = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
species_out = np.random.choice(a, 150, p=[0.5, 0.25, 0.25])
# Method 2
np.random.seed(100)
probs = np.r_[np.linspace(0, 0.500, num=50), np.linspace(0.501, .750, num=50), np.linspace(.751, 1.0, num=50)]
index = np.searchsorted(probs, np.random.random(150))
species_out = species[index]
print(np.unique(species_out, return_counts=True))
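The weighted draw can also be checked without the dataset. The sketch below uses np.random.default_rng, which is a swapped-in alternative to the seeded global generator above, and a larger sample size (10,000, an arbitrary choice) so the observed shares settle near the requested probabilities.

```python
import numpy as np

# Generator-based API; the seed value 100 is arbitrary
rng = np.random.default_rng(100)
names = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])

# Draw with setosa twice as likely as each of the other two species
sample = rng.choice(names, size=10_000, p=[0.5, 0.25, 0.25])

# Observed share of each species in the sample
vals, counts = np.unique(sample, return_counts=True)
shares = dict(zip(vals.tolist(), (counts / counts.sum()).tolist()))
print(shares)
```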
43. Find the second largest value within one group of numpy.ndarray
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
# Get the petal lengths of the setosa rows
petal_len_setosa = iris[iris[:, 4] == b'Iris-setosa', [2]].astype('float')
# Get the second largest value (np.unique returns the unique values already sorted)
np.unique(petal_len_setosa)[-2]
44. Sort by a column of numpy.ndarray
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
print(iris[iris[:, 0].argsort()][:20])  # Sort by the first column
45. Pick the element with the highest frequency in numpy.ndarray
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
vals, counts = np.unique(iris[:, 2], return_counts=True)
print(vals[np.argmax(counts)])
46. Output the position of the first element in numpy.ndarray that is greater than a given value
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
np.argwhere(iris[:, 3].astype(float) > 1.0)[0]
47. Replace the elements that meet the conditions in numpy.ndarray with the given value
# Requirement: replace values greater than 30 with 30 and values less than 10 with 10
np.set_printoptions(precision=2)
np.random.seed(100)
a = np.random.uniform(1,50, 20)
# Method 1
np.clip(a, a_min=10, a_max=30)
# Method 2
print(np.where(a < 10, 10, np.where(a > 30, 30, a)))
48. Get the positions and values of the top n elements in numpy.ndarray
np.random.seed(100)
a = np.random.uniform(1,50, 20)
## Get the positions of the 5 largest elements
# Method 1
print(a.argsort()[-5:])
# Method 2
np.argpartition(-a, 5)[:5]
## Get the 5 largest elements
# Method 1
a[a.argsort()][-5:]
# Method 2
np.sort(a)[-5:]
# Method 3
np.partition(a, kth=-5)[-5:]
# Method 4
a[np.argpartition(-a, 5)][:5]
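Note that np.argpartition only guarantees which elements land in the top group, not their order inside it. A minimal sketch, on a made-up array, of getting the top n positions sorted from largest to smallest:

```python
import numpy as np

# Hypothetical data; n is the number of top elements wanted
a = np.array([7.0, 1.5, 9.2, 3.3, 8.8, 0.4, 6.1])
n = 3

# Indices of the n largest values, in no particular order
top_idx = np.argpartition(a, -n)[-n:]

# Order just those n indices by value, largest first
top_idx = top_idx[np.argsort(a[top_idx])[::-1]]
top_vals = a[top_idx]

print(top_idx)   # [2 4 0]
print(top_vals)  # [9.2 8.8 7. ]
```

Only the n selected elements are fully sorted here, which tends to be cheaper than sorting the whole array when n is small.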
49. Compute row-wise counts of all values in numpy.ndarray
np.random.seed(100)
arr = np.random.randint(1,11,size=(6, 10))
print(arr)
def counts_of_all_values_rowwise(arr2d):
    # Unique values and their counts, row by row
    num_counts_array = [np.unique(row, return_counts=True) for row in arr2d]
    # Count of every value per row; 0 when the value is absent from that row
    return [[int(b[a == i][0]) if i in a else 0 for i in np.unique(arr2d)] for a, b in num_counts_array]
print(np.arange(1,11))
counts_of_all_values_rowwise(arr)
50. Merge multiple numpy.ndarrays into one
arr1 = np.arange(3)
arr2 = np.arange(3,7)
arr3 = np.arange(7,10)
# dtype=object is needed because the arrays have different lengths
array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)
print('array_of_arrays: ', array_of_arrays)
# Method 1
arr_2d = np.array([a for arr in array_of_arrays for a in arr])
# Method 2
arr_2d = np.concatenate(array_of_arrays)
print(arr_2d)
51. Compute the one-hot encodings of numpy.ndarray
np.random.seed(101)
arr = np.random.randint(1,4, size=6)
print(arr)
# Solution:
def one_hot_encodings(arr):
    uniqs = np.unique(arr)
    out = np.zeros((arr.shape[0], uniqs.shape[0]))
    for i, k in enumerate(arr):
        # Assumes the values are the consecutive integers 1..n
        out[i, k-1] = 1
    return out
one_hot_encodings(arr)
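The loop above relies on the values being 1, 2, 3. As a possible alternative that makes no such assumption, the sketch below maps each element to the index of its unique value and picks rows of an identity matrix. The input array is a made-up example.

```python
import numpy as np

# Hypothetical input; the values need not be consecutive or start at 1
arr = np.array([2, 3, 2, 2, 2, 1])

# inverse[i] is the position of arr[i] within the sorted unique values
uniqs, inverse = np.unique(arr, return_inverse=True)

# Row k of the identity matrix is the one-hot vector for the k-th unique value
one_hot = np.eye(uniqs.size)[inverse]
print(one_hot)
```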
52. Create row numbers grouped by a categorical variable
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
print(species_small)
print([i for val in np.unique(species_small) for i, grp in enumerate(species_small[species_small==val])])
53. Create group ids based on a given categorical variable
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
species = np.genfromtxt(url, delimiter=',', dtype='str', usecols=4)
np.random.seed(100)
species_small = np.sort(np.random.choice(species, size=20))
print(species_small)
output = [np.argwhere(np.unique(species_small) == s).tolist()[0][0] for val in np.unique(species_small) for s in species_small[species_small==val]]
output
54. numpy.ndarray (one-dimensional) element rank
np.random.seed(10)
a = np.random.randint(20, size=10)
print('Array: ', a)
print(a.argsort().argsort())
55. numpy.ndarray (multi-dimensional) element rank
np.random.seed(10)
a = np.random.randint(20, size=[2,5])
print(a)
print(a.ravel().argsort().argsort().reshape(a.shape))
56. Output the largest element of each row of numpy.ndarray
np.random.seed(100)
a = np.random.randint(1,10, [5,3])
print(a)
# Method 1
np.amax(a, axis=1)
# Method 2
np.apply_along_axis(np.max, arr=a, axis=1)
57. Output the ratio of the minimum value to the maximum value of each row of numpy.ndarray
np.random.seed(100)
a = np.random.randint(1,10, [5,3])
print(a)
np.apply_along_axis(lambda x: np.min(x)/np.max(x), arr=a, axis=1)
58. Determine whether each element in numpy.ndarray is a repeat of an earlier element
np.random.seed(100)
a = np.random.randint(0, 5, 10)
# There is no direct function to do this as of 1.13.3
# Create an all True array
out = np.full(a.shape[0], True)
# Find the index positions of unique elements
unique_positions = np.unique(a, return_index=True)[1]
# Mark first occurrences as False; repeated elements stay True
out[unique_positions] = False
print(out)
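A fixed input makes the meaning of the mask easier to read. In the sketch below (the array is a made-up example), True marks an element that has already appeared earlier and False marks a first occurrence.

```python
import numpy as np

# Hypothetical input with several repeated values
a = np.array([0, 0, 3, 0, 2, 4, 2, 2, 2, 2])

# Start by assuming every element is a repeat
is_repeat = np.ones(a.shape[0], dtype=bool)

# return_index gives the index of the first occurrence of each unique value
first_idx = np.unique(a, return_index=True)[1]
is_repeat[first_idx] = False

print(is_repeat)
# [False  True False  True False False  True  True  True  True]
```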
59. Find the mean of each group of elements in numpy.ndarray
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength', 'sepalwidth', 'petallength', 'petalwidth', 'species')
# No direct way to implement this. Just a version of a workaround.
numeric_column = iris[:, 1].astype('float') # sepalwidth
grouping_column = iris[:, 4] # species
# List comprehension version
[[group_val, numeric_column[grouping_column==group_val].mean()] for group_val in np.unique(grouping_column)]
# For loop version
output = []
for group_val in np.unique(grouping_column):
    output.append([group_val, numeric_column[grouping_column==group_val].mean()])
output
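The same grouped mean can be sketched without a Python-level loop by combining np.unique(return_inverse=True) with np.bincount. The groups and values arrays below are made-up stand-ins for the species and sepal width columns.

```python
import numpy as np

# Hypothetical grouping labels and numeric values
groups = np.array(['a', 'b', 'a', 'c', 'b', 'a'])
values = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 5.0])

# inverse[i] is the integer id of the group that row i belongs to
labels, inverse = np.unique(groups, return_inverse=True)

# Sum of values per group divided by the number of rows per group
sums = np.bincount(inverse, weights=values)
counts = np.bincount(inverse)
means = sums / counts

print(dict(zip(labels.tolist(), means.tolist())))  # {'a': 3.0, 'b': 4.0, 'c': 4.0}
```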
60. Convert PIL image to numpy.ndarray
from io import BytesIO
from PIL import Image
import PIL, requests
# Import image from URL
URL = 'https://upload.wikimedia.org/wikipedia/commons/8/8b/Denali_Mt_McKinley.jpg'
response = requests.get(URL)
# Read it as Image
I = Image.open(BytesIO(response.content))
# Optionally resize
I = I.resize([150,150])
# Convert to numpy array
arr = np.asarray(I)
# Optionally convert it back to an image and show it
im = Image.fromarray(np.uint8(arr))
im.show()
61. Discard all missing values in numpy.ndarray
a = np.array([1,2,3,np.nan,5,6,7,np.nan])
print(a)
a[~np.isnan(a)]
62. Calculate the Euclidean distance of two numpy.ndarrays
a = np.array([1,2,3,4,5])
b = np.array([4,5,6,7,8])
# Solution
dist = np.linalg.norm(a-b)
dist
63. Find the local maximum positions of numpy.ndarray
a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
doublediff = np.diff(np.sign(np.diff(a)))
peak_locations = np.where(doublediff == -2)[0] + 1
peak_locations
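The double-difference trick is easier to follow when the intermediate arrays are printed. The sketch below reuses the same input and names each step; the variable names are illustrative only.

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

# Sign of the slope between neighbours: +1 rising, -1 falling
slope_sign = np.sign(np.diff(a))     # [ 1  1 -1  1  1 -1  1]

# A change from +1 to -1 gives -2, which marks a peak between the two slopes
turn = np.diff(slope_sign)           # [ 0 -2  2  0 -2  2]

# Shift by one because turn[i] describes the element at position i + 1
peaks = np.where(turn == -2)[0] + 1  # [2 5]
print(peaks, a[peaks])               # [2 5] [7 6]
```

Flat-topped peaks (equal neighbouring values) produce a slope sign of 0 and are not detected by this test.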
64. numpy.ndarray subtraction operation
# Requirement: subtract the 1d array b_1d from the 2d array a_2d, such that each item of b_1d is subtracted from the respective row of a_2d.
a_2d = np.array([[3,3,3],[4,4,4],[5,5,5]])
b_1d = np.array([1,2,3])
print(a_2d - b_1d[:,None])
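The [:, None] index is what makes the subtraction run row by row. A short sketch contrasting it with plain broadcasting on the same arrays (the names row_wise and col_wise are illustrative):

```python
import numpy as np

a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

# b_1d[:, None] has shape (3, 1): item i is subtracted from every entry of row i
row_wise = a_2d - b_1d[:, None]

# b_1d alone has shape (3,): item j is subtracted from every entry of column j
col_wise = a_2d - b_1d

print(row_wise)  # [[2 2 2] [2 2 2] [2 2 2]]
print(col_wise)  # [[2 1 0] [3 2 1] [4 3 2]]
```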
65. Output the position of the nth occurrence of an element in numpy.ndarray
x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])
print(x)
n = 5
# Method 1: list comprehension
[i for i, v in enumerate(x) if v == 1][n-1]  # position of the 5th occurrence of the element 1
# Method 2
np.where(x == 1)[0][n-1]
66. Convert numpy.ndarray data format from datetime64 to datetime
dt64 = np.datetime64('2018-02-25 22:10:10')
from datetime import datetime
# Method 1
dt64.tolist()
# Method 2
dt64.astype(datetime)
67. Calculate the moving average of numpy.ndarray for a given window size
def moving_average(a, n=3):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n
np.random.seed(100)
Z = np.random.randint(10, size=10)
print('array: ', Z)
# Method 1
moving_average(Z, n=3).round(2)
# Method 2
np.convolve(Z, np.ones(3)/3, mode='valid')
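As a sanity check, the two methods should agree element for element. The sketch below repeats the cumulative-sum function and compares it with the convolution on a fixed, made-up array.

```python
import numpy as np

def moving_average(a, n=3):
    # Cumulative sums; the difference of two sums n apart is one window total
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n

# Hypothetical input of length 10
z = np.array([8, 8, 3, 7, 7, 0, 4, 2, 5, 2])

ma_cumsum = moving_average(z, n=3)
ma_conv = np.convolve(z, np.ones(3) / 3, mode='valid')

# Both give len(z) - n + 1 = 8 window means
print(np.allclose(ma_cumsum, ma_conv))  # True
```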
68. Build a numpy.ndarray from a given start, length, and step
length = 10
start = 5
step = 3
def seq(start, length, step):
    end = start + (step*length)
    return np.arange(start, end, step)
seq(start, length, step)
69. Fill in the missing dates of a non-continuous time series numpy.ndarray
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)
print(dates)
# Method 1
filled_in = np.array([
    np.arange(date, (date + d)) for date, d in zip(dates, np.diff(dates))
]).reshape(-1)
output = np.hstack([filled_in, dates[-1]])
output
# Method 2
out = []
for date, d in zip(dates, np.diff(dates)):
    out.append(np.arange(date, (date + d)))
filled_in = np.array(out).reshape(-1)
output = np.hstack([filled_in, dates[-1]])
output
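When the goal is simply every day between the first and last observed date, a single np.arange call may be enough. This is a simplified sketch, assuming daily resolution and sorted input.

```python
import numpy as np

dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)

# np.arange excludes the stop value, so add one day to keep the last date
filled = np.arange(dates[0], dates[-1] + np.timedelta64(1, 'D'))

print(filled[0], filled[-1], len(filled))  # 2018-02-01 2018-02-23 23
```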
70. Construct a numpy.ndarray with a sliding window according to the specified step length
import numpy as np
def gen_strides(a, stride_len=5, window_len=5):
    n_strides = ((a.size - window_len) // stride_len) + 1
    # return np.array([a[s:(s+window_len)] for s in np.arange(0, a.size, stride_len)[:n_strides]])
    return np.array([
        a[s:(s + window_len)]
        for s in np.arange(0, n_strides * stride_len, stride_len)
    ])
print(gen_strides(np.arange(15), stride_len=2, window_len=4))
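Newer NumPy releases offer a built-in helper for the same task. The sketch below assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available; slicing the result with a step reproduces the stride length.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(15)

# All windows of length 4, then keep every 2nd one (stride length 2)
windows = sliding_window_view(a, window_shape=4)[::2]

print(windows.shape)  # (6, 4)
print(windows[0], windows[-1])  # [0 1 2 3] [10 11 12 13]
```

The result is a read-only view into the original array; call .copy() on it if the windows need to be modified.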