How to use Seaborn to implement advanced statistical charts

This article is shared from Huawei Cloud Community: "Using Seaborn to implement advanced statistical charts, from boxplots to multi-variable relationship exploration", author: Lemony Hug.

In data science and data visualization, Seaborn is a popular Python visualization library. It is built on top of Matplotlib, offers a simpler, higher-level interface with attractive defaults, and adds a number of advanced statistical chart functions. This article introduces how to use Seaborn to implement some of these advanced statistical charts and provides corresponding code examples.

Install Seaborn

First, make sure you have Seaborn installed. You can install it using pip:

pip install seaborn

Import necessary libraries

Before starting, we need to import Seaborn and some other commonly used data processing and visualization libraries:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Box Plot

A box plot is a common statistical chart that summarizes the distribution of a dataset. Seaborn provides a simple interface for drawing box plots.

# Generate random data
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=100)

# Draw box plot
sns.boxplot(data=data)
plt.title('Box Plot of Random Data')
plt.show()

In this example, we generated a set of random data and drew a box plot with the sns.boxplot() function. The chart summarizes the distribution of the data at a glance: the median, the quartiles, and any outliers.
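The values a box plot encodes can also be computed directly. The following is a minimal sketch that assumes the common 1.5 × IQR rule for the whiskers (the default whis=1.5 setting of sns.boxplot()); the variable names are illustrative and not part of the original example.

```python
import numpy as np

# Same random data as in the box plot example
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=100)

# Quartiles and interquartile range (the box itself)
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1:.3f}, median={median:.3f}, Q3={q3:.3f}, IQR={iqr:.3f}")
print(f"fences=({lower_fence:.3f}, {upper_fence:.3f}), outliers={outliers.size}")
```

Comparing these numbers with the drawn box plot is a quick way to check that the chart is being read correctly.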

Violin Plot

The violin plot combines a box plot with a kernel density estimate, so it shows the shape of the distribution as well as its summary statistics.

# Generate random data
np.random.seed(0)
data1 = np.random.normal(loc=0, scale=1, size=100)
data2 = np.random.normal(loc=2, scale=1.5, size=100)
data = np.concatenate([data1, data2])
labels = ['Group 1'] * 100 + ['Group 2'] * 100

# Draw violin plot
sns.violinplot(x=labels, y=data)
plt.title('Violin Plot of Two Groups')
plt.show()

In this example, we generated two sets of random data with different means and spreads and drew a violin plot with the sns.violinplot() function. The chart lets us compare the two distributions side by side and see how they differ.

Heatmap

A heat map is a chart that uses color to represent a data matrix, often used to show correlation or data density.

# Generate random data
np.random.seed(0)
data = np.random.rand(10, 10)

# Draw heat map
sns.heatmap(data, annot=True, cmap='Greens')
plt.title('Heatmap of Random Data')
plt.show()

In this example, we generated a random 10x10 matrix and drew a heat map with the sns.heatmap() function. Each cell is colored and annotated with its value, so large and small entries stand out immediately. Applied to a correlation matrix, the same chart shows which variables move together.
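Since heat maps are often used to show correlation, the following sketch applies sns.heatmap() to a correlation matrix instead of raw random values. The synthetic columns, the 'coolwarm' colormap, and the fixed color range of -1 to 1 are illustrative assumptions and not part of the original example.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data: 'B' is built from 'A', so the two are correlated
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    'A': a,
    'B': 0.8 * a + rng.normal(scale=0.5, size=200),
    'C': rng.normal(size=200),
})

# Pairwise Pearson correlation between the columns
corr = df.corr()

# Fix the color range to [-1, 1] so colors are comparable across plots
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Heatmap (Illustrative Data)')
plt.show()
```

A diverging colormap with a symmetric range makes positive and negative correlations easy to tell apart.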

Kernel Density Estimation Plot

A kernel density estimation (KDE) plot shows a non-parametric estimate of the probability density function of the data: the observations are smoothed to produce a continuous density curve.

# Generate random data
np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=100)

# Draw kernel density estimation plot
sns.kdeplot(data, fill=True)
plt.title('Kernel Density Estimation Plot of Random Data')
plt.show()

In this example, we generated a set of random data and drew its kernel density estimate with the sns.kdeplot() function. The chart shows the estimated probability density of the data and helps us understand the shape of its distribution.
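To make the smoothing step concrete, here is a minimal sketch of a one-dimensional Gaussian kernel density estimate written with NumPy only. The helper name gaussian_kde_1d and the use of Scott's rule for the bandwidth are assumptions made for illustration; the curve Seaborn draws may differ slightly.

```python
import numpy as np

np.random.seed(0)
data = np.random.normal(loc=0, scale=1, size=100)

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centered on each sample (hypothetical helper)."""
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Scott's rule of thumb for the bandwidth (an assumption for this sketch)
bandwidth = data.std(ddof=1) * data.size ** (-1 / 5)

# Evaluate the density on a grid that extends a little past the data
grid = np.linspace(data.min() - 3 * bandwidth, data.max() + 3 * bandwidth, 400)
density = gaussian_kde_1d(data, grid, bandwidth)

# A valid density integrates to roughly 1 (simple Riemann sum)
area = np.sum(density) * (grid[1] - grid[0])
print(f"bandwidth={bandwidth:.3f}, area under curve={area:.3f}")
```

A larger bandwidth gives a smoother curve and a smaller one follows the individual observations more closely.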

Pair Plot

A pair plot visualizes the pairwise relationships between the variables in a dataset and is useful for exploratory data analysis.

# Generate random data set
np.random.seed(0)
data = pd.DataFrame(np.random.randn(100, 4), columns=['A', 'B', 'C', 'D'])

# Draw a pairwise relationship diagram
sns.pairplot(data)
plt.suptitle('Pair Plot of Random Data', y=1.02)
plt.show()

In this example, we generated a random dataset and drew a pair plot with the sns.pairplot() function. The chart shows a scatter plot for every pair of variables and the univariate distribution of each variable on the diagonal, which helps reveal patterns and correlations between variables.

Cluster Map

A cluster map displays the similarities between the rows and columns of a dataset: a hierarchical clustering algorithm reorders them so that similar ones are placed next to each other.

# Generate random data set
np.random.seed(0)
data = pd.DataFrame(np.random.rand(10, 10), columns=['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'])

# Draw clustering diagram
sns.clustermap(data, cmap='viridis')
plt.suptitle('Cluster Map of Random Data', y=1.02)
plt.show()

In this example, we generated a random dataset and drew a cluster map with the sns.clustermap() function. The chart groups similar rows and columns together through hierarchical clustering, which helps reveal patterns and structure in the data.

Clustered Heatmap

A clustered heat map is a heat map based on a hierarchical clustering algorithm: the rows and columns are clustered and then rearranged according to the clustering results, which better reveals the structure and correlation in the data.

# Generate random data
np.random.seed(0)
data = np.random.rand(10, 10)

# Draw clustering heatmap
sns.clustermap(data, cmap='coolwarm')
plt.suptitle('Clustermap of Random Data', y=1.02)
plt.show()

In this example, we generated a random 10x10 matrix and drew a clustered heat map with the sns.clustermap() function. The chart shows which rows and columns cluster together and how similar the different data points are.
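The rearrangement step can be sketched with SciPy directly. This example assumes average linkage with Euclidean distance, which are the documented defaults of sns.clustermap(). The variable names are illustrative, and the exact ordering Seaborn displays may differ.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

np.random.seed(0)
data = np.random.rand(10, 10)

# Hierarchically cluster the rows, then the columns (transpose)
row_linkage = linkage(data, method='average', metric='euclidean')
col_linkage = linkage(data.T, method='average', metric='euclidean')

# Leaf order of each dendrogram gives the new row/column order
row_order = leaves_list(row_linkage)
col_order = leaves_list(col_linkage)

# Rearranged matrix: same values, similar rows/columns now adjacent
reordered = data[np.ix_(row_order, col_order)]

print("row order:", row_order)
print("column order:", col_order)
```

Passing the reordered matrix to sns.heatmap() would give a heat map with the same layout idea, without the dendrograms.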

Categorical Scatter Plot (Pairplot)

A categorical scatter plot shows the relationships between several variables at once, with the points colored by category. It is often used to explore how the variables in a dataset relate within and across groups.

# Load sample dataset
iris = sns.load_dataset('iris')

# Draw a categorical scatter plot
sns.pairplot(iris, hue='species', markers=['o', 's', 'D'])
plt.suptitle('Pairplot of Iris Dataset', y=1.02)
plt.show()

In this example, we use the iris dataset that comes with Seaborn and draw a categorical scatter plot with the sns.pairplot() function. The chart shows at a glance how the features differ between iris species and how the features correlate with each other.

Time Series Plot

A time series chart is a chart used to display time series data and is often used to analyze trends and periodicity of data over time.

# Generate time series data
dates = pd.date_range(start='2022-01-01', end='2022-12-31')
data = np.random.randn(len(dates))

# Create DataFrame
df = pd.DataFrame({'Date': dates, 'Value': data})

# Draw time series graph
sns.lineplot(x='Date', y='Value', data=df)
plt.title('Time Series Plot of Random Data')
plt.xlabel('Date')
plt.ylabel('Value')
plt.show()

In this example, we generated a random time series and drew it with the sns.lineplot() function. Because the values here are pure noise, the line shows no real trend. With real data, the same chart makes trends and periodicity over time easy to see.
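When a daily series is noisy, a rolling average is one common way to bring out the underlying trend. The sketch below is a hedged illustration: the 30-day window and the Rolling30 column name are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd

# Same kind of daily random series as in the example above
np.random.seed(0)
dates = pd.date_range(start='2022-01-01', end='2022-12-31')
df = pd.DataFrame({'Date': dates, 'Value': np.random.randn(len(dates))})

# 30-day rolling mean; min_periods=1 avoids NaN at the start of the series
df['Rolling30'] = df['Value'].rolling(window=30, min_periods=1).mean()

print(df[['Value', 'Rolling30']].std())
```

Drawing Rolling30 with sns.lineplot(x='Date', y='Rolling30', data=df) on the same axes as the raw series overlays the smoothed line on the original data.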

Advanced Color Palettes

Seaborn provides a rich palette function that can help users choose appropriate color schemes in charts to highlight key points or enhance visualization effects.

# Use advanced palette
current_palette = sns.color_palette('husl', 5)

# Draw a bar chart
sns.barplot(x=np.arange(5), y=np.random.rand(5), palette=current_palette)
plt.title('Bar Plot with Advanced Color Palette')
plt.show()

In this example, we used the sns.color_palette() function to select the HUSL palette and generate five colors. We then used this palette to draw a bar chart to demonstrate the effect.
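It can help to look at what sns.color_palette() actually returns. The following short sketch prints the palette as RGB tuples and converts it to hex strings with as_hex(); the rounding and variable names are only for illustration.

```python
import seaborn as sns

# Five evenly spaced hues from the HUSL color space
current_palette = sns.color_palette('husl', 5)

# Each entry is an (r, g, b) tuple with components between 0 and 1
for rgb in current_palette:
    print(tuple(round(c, 3) for c in rgb))

# as_hex() converts the palette to hex strings, e.g. for use outside Seaborn
hex_codes = current_palette.as_hex()
print(hex_codes)
```

Because the palette behaves like a list of colors, it can be passed to any Seaborn function that accepts a palette argument.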

Customized Plot Styles

Seaborn allows users to customize the appearance of charts by setting different styles to meet individual needs.

# Set custom style
sns.set_style('whitegrid')

# Draw a scatter plot
sns.scatterplot(x=np.random.randn(100), y=np.random.randn(100))
plt.title('Scatter Plot with Customized Style')
plt.show()

In this example, we use the sns.set_style() function to set the chart style to a white grid and draw a scatter plot to demonstrate its effect.

Multi-panel drawing (Facet Grids)

Seaborn supports multi-panel plotting, which displays several subplots at once so that different subsets of the data can be compared.

# Load sample dataset
tips = sns.load_dataset('tips')

# Create FacetGrid object
g = sns.FacetGrid(tips, col='time', row='smoker')

# Draw violin plot
g.map(sns.violinplot, 'total_bill')
plt.show()

In this example, we use sns.FacetGrid() to create a FacetGrid object and draw a violin plot in each subplot, showing the distribution of the total bill for each combination of meal time and smoker status.

Data distribution comparison (Distribution Comparison)

Seaborn provides several ways to compare differences between different data distributions, such as using kernel density estimation or histograms.

# Load sample dataset
iris = sns.load_dataset('iris')

# Draw kernel density estimation map
sns.kdeplot(data=iris, x='sepal_length', hue='species', fill=True)
plt.title('Distribution Comparison of Sepal Length')
plt.show()

In this example, we use the sns.kdeplot() function to plot kernel density estimates of sepal length for each species in the iris dataset so that their distributions can be compared.

Grouped Visualization

Seaborn makes it easy to group data by categorical variables and visualize each group.

# Load sample dataset
titanic = sns.load_dataset('titanic')

# Draw grouped box plots
sns.boxplot(data=titanic, x='class', y='age', hue='sex')
plt.title('Grouped Box Plot of Age by Class and Sex')
plt.show()

In this example, we use the sns.boxplot() function to plot the age distribution of passengers in the Titanic dataset, grouped by cabin class and sex, so that the groups can be compared.

Exploring Multivariate Relationships

Seaborn provides several ways to explore relationships between multiple variables, such as using scatterplot matrices or pairwise relationship plots.

# Load sample dataset
iris = sns.load_dataset('iris')

# Draw a scatterplot matrix
sns.pairplot(data=iris, hue='species')
plt.suptitle('Pairplot for Exploring Multivariate Relationships', y=1.02)
plt.show()

In this example, we use the sns.pairplot() function to plot the pairwise relationships between the features of the iris dataset and explore how they relate to one another.

Summary

This article introduced how to use Seaborn to implement advanced statistical charts, with code examples throughout. First, we learned how to draw common statistical charts, including box plots, violin plots, and heat maps, which show the distribution and correlation of data. Next, we explored customization features such as color palettes, chart styles, and multi-panel plots, which let users tailor the appearance of charts. Finally, we covered applications such as distribution comparison, grouped display, and multivariate relationship exploration, which help users understand the relationships and patterns in their data. After working through this article, readers should be able to use Seaborn for basic data visualization and apply its flexible interfaces to data analysis and exploration. Seaborn's rich features and simple interface make it a valuable tool for data scientists and analysts.


Origin my.oschina.net/u/4526289/blog/11179361