ser.map(), applies a function or a dict-like mapping to each element of a Series
df.replace(to_replace, value), replaces one value with another; lists can be passed to replace several values at once, each with its own replacement, and a dictionary of {old: new} pairs is also accepted
df.rename(index=str.title, columns=str.upper), renames the axis labels and returns a new object; here each index label is title-cased (first letter of each word in uppercase) and each column label is converted to all uppercase
pd.cut(data, bins, labels=None), divides the data into bins whose edges are given by bins. For example, if bins is [18, 25, 35, 60, 100], the intervals are open on the left and closed on the right by default: (18, 25], (25, 35], (35, 60], (60, 100]. The labels parameter sets the bin names. If an integer number of bins is passed instead of explicit edges, equal-width bins are computed from the minimum and maximum values of the data
pd.qcut(), divides the data by sample quantiles, so each bin holds roughly the same number of observations; pass either the number of quantiles or a list of custom quantiles, such as [0, 0.1, 0.5, 0.9, 1.]
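np.sign(), sign function, returns -1, 0 or 1 for each element
np.random.permutation(n), produces a randomly ordered array of the integers 0 to n-1, which can serve as a new row order
df.take(indices), selects rows by integer position, for example using the array returned by np.random.permutation
df.sample(n=3, replace=True), selects a random subset of n rows; the replace parameter controls whether sampling is done with replacement (True allows the same row to be drawn more than once)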
pd.get_dummies(df['key'], prefix='key'), converts a categorical variable into "dummy" (indicator) variables; the prefix parameter adds a prefix to the resulting column names. The result can be joined back to the other columns: df_with_dummy = df[['data1']].join(dummies)
pd.unique(), returns the unique values
pd.get_dummies(pd.cut(values, bins)), combines cut and get_dummies to produce one indicator column per bin
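The transformation calls listed above can be tried together in a short session. This is only a sketch: the DataFrame, the ages, the bin labels and all variable names below are made up for illustration and do not come from the notes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "b"], "data1": range(6)})

# Series.map with a dict: every element is looked up in the mapping.
labels = df["key"].map({"a": "alpha", "b": "beta", "c": "gamma"})

# replace with a dict: each old value gets its own replacement.
raw = pd.Series([1.0, -999.0, 2.0, -1000.0])
cleaned = raw.replace({-999.0: np.nan, -1000.0: 0.0})

# cut: explicit bin edges, intervals are (18, 25], (25, 35], ... by default.
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
groups = pd.cut(ages, bins, labels=["youth", "young_adult", "middle_aged", "senior"])
group_counts = pd.Series(groups).value_counts()

# qcut: four quantile-based bins with equal frequency.
quartiles = pd.qcut(np.arange(100), 4)
quartile_counts = pd.Series(quartiles).value_counts()

# permutation + take: reorder the rows randomly.
order = np.random.permutation(len(df))
shuffled = df.take(order)

# sample: three rows drawn with replacement.
subset = df.sample(n=3, replace=True)

# get_dummies: one indicator column per category, then join back.
dummies = pd.get_dummies(df["key"], prefix="key")
df_with_dummy = df[["data1"]].join(dummies)
```

Note that the age 25 falls into the first bin because the right edge is closed; passing right=False to pd.cut would close the left edge instead.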
'::'.join(pieces), joins all elements of pieces into one string, separated by two colons
Python built-in string methods:
count: returns the number of non-overlapping occurrences of the substring in the string
endswith, startswith: return True if the string ends with the given suffix (or starts with the given prefix)
join: uses the string as a delimiter to concatenate a sequence of other strings
index: returns the position of the first character of the substring if it is found in the string; raises ValueError if it is not found
find: returns the position of the first character of the first occurrence of the substring, or -1 if it is not found
rfind: returns the position of the first character of the last occurrence of the substring, or -1 if it is not found
replace: replaces the specified substring with another string
strip, rstrip, lstrip: trim whitespace (including line breaks) from both ends, the right end or the left end
split: splits the string into a list of substrings at the specified separator
lower, upper: convert the string to lowercase and uppercase, respectively
ljust, rjust: left-justify or right-justify the string, padding the other side with spaces (or another fill character) up to a minimum width
ser.str.contains('gmail'), returns a boolean Series showing whether each string contains the given substring
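A few of the string methods above, shown on a made-up value; the strings, the e-mail addresses and the variable names are hypothetical and serve only to illustrate the behaviour, in particular the difference between find and index when the substring is missing.

```python
import pandas as pd

val = "a,b,  guido"  # hypothetical input string

# split + strip: break on commas, then trim the whitespace of each piece.
pieces = [x.strip() for x in val.split(",")]
joined = "::".join(pieces)

n_commas = val.count(",")
last_comma = val.rfind(",")   # position of the last comma
missing = val.find(":")       # find returns -1 when nothing is found

# index raises ValueError instead of returning -1.
try:
    val.index(":")
    index_raised = False
except ValueError:
    index_raised = True

swapped = val.replace(",", "::")
padded_left = "guido".ljust(8)    # text on the left, spaces on the right
padded_right = "guido".rjust(8)   # spaces on the left, text on the right

# Vectorized version on a Series (addresses are invented).
ser = pd.Series({"dave": "dave@google.com", "steve": "steve@gmail.com"})
has_gmail = ser.str.contains("gmail")
```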
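Hierarchical index:
df.unstack(), pivots the innermost level of the hierarchical row index into columns
df.stack(), pivots the columns into the innermost level of the row index, producing a hierarchical index
df.swaplevel('key1', 'key2'), swaps the order of these two levels; the data itself is unchanged
df.sort_index(level=1), sorts by the values in level 1
frame.swaplevel(0, 1).sort_index(level=0), swaps the two levels and then sorts by the new outer level
frame.sum(level='key2'), computes summary statistics by index level; the level argument was removed in pandas 2.0, where the equivalent is frame.groupby(level='key2').sum()
df.set_index(['a', 'd'], drop=True), converts one or more columns into the row index and returns a new DataFrame; the drop parameter controls whether those columns are removed from the data (the default True removes them, False keeps them)
df.reset_index(), moves the hierarchical index levels back into columns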
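The hierarchical index operations can be sketched on a small frame. The data, the level names key1/key2 and the column names x/y/z are assumptions chosen for illustration; the level-wise sum is written with groupby, which is an assumption about the installed pandas version (2.0 or later), not the form used in the notes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level row index.
frame = pd.DataFrame(
    np.arange(12).reshape(4, 3),
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
    columns=["x", "y", "z"],
)

# stack moves the columns into a third index level; unstack reverses it.
stacked = frame.stack()
restored = stacked.unstack()

# Swap the two levels, then sort by the new outer level (key2).
swapped = frame.swaplevel("key1", "key2").sort_index(level=0)

# Level-wise sum, the groupby form of frame.sum(level='key2').
totals = frame.groupby(level="key2").sum()

# reset_index moves the levels into columns; set_index moves them back.
flat = frame.reset_index()
reindexed = flat.set_index(["key1", "key2"], drop=True)
kept = flat.set_index(["key1", "key2"], drop=False)  # columns stay in the data
```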