AWS Redshift Study Notes

Amazon Redshift is a fast, powerful, fully managed, petabyte-scale data warehouse service in the cloud. Amazon Redshift is a relational database designed for OLAP scenarios and optimized for high-performance analysis and reporting of very large datasets.

Clusters and Nodes

The key component of an Amazon Redshift data warehouse is a cluster. A cluster is composed of a leader node and one or more compute nodes. The client application interacts directly only with the leader node; the compute nodes are transparent to external applications.

The Dense Compute node types support clusters up to 326TB using fast SSDs, while the Dense Storage nodes support clusters up to 2PB using large magnetic disks.

Each cluster contains one or more databases. User data for each table is distributed across the compute nodes. Your application or SQL client communicates with the leader node over standard JDBC or ODBC connections and does not interact directly with the compute nodes.

Each compute node is partitioned into slices, and each slice is allocated a portion of the node's memory and disk space. The number of slices per node depends on the node size of the cluster and typically varies between 2 and 16. All nodes participate in parallel query execution, working on data that is distributed as evenly as possible across the slices.

Amazon Redshift allows you to resize a cluster to add storage and compute capacity over time as your needs evolve. You can also change the node type of a cluster and keep the overall size the same. Whenever you perform a resize operation, Amazon Redshift will create a new cluster and migrate data from the old cluster to the new one. During a resize operation, the database will become read-only until the operation is finished.

Compression Encoding

One of the key performance optimizations used by Amazon Redshift is data compression.
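Amazon Redshift stores each column separately and lets every column use its own compression encoding, which cuts both storage and disk I/O. As a rough illustration, the Python sketch below applies run-length encoding (the idea behind Redshift's RUNLENGTH encoding) to a low-cardinality column. The function names are hypothetical, and the pairs it produces are not Redshift's actual on-disk format.

```python
from itertools import groupby
from typing import List, Tuple, TypeVar

T = TypeVar("T")

def rle_encode(values: List[T]) -> List[Tuple[T, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs: List[Tuple[T, int]]) -> List[T]:
    """Expand (value, run_length) pairs back into the original column."""
    out: List[T] = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

# A low-cardinality column (for example, a region code) stored in sorted order.
column = ["us-east"] * 4 + ["eu-west"] * 3 + ["ap-south"] * 2
encoded = rle_encode(column)
print(encoded)  # [('us-east', 4), ('eu-west', 3), ('ap-south', 2)]
assert rle_decode(encoded) == column
```

Nine values collapse into three pairs. The savings depend on long runs of repeated values, which is why this kind of encoding suits sorted or low-cardinality columns and does poorly on nearly unique ones.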

Distribution Strategy

One of the primary decisions when creating a table in Amazon Redshift is how to distribute the records across the nodes and slices in a cluster.
When creating a table, you can choose one of three distribution styles: EVEN, KEY, or ALL.

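EVEN distribution: Rows are distributed across the slices in a round-robin fashion, regardless of the values in any particular column. EVEN was the default before Amazon Redshift introduced AUTO distribution, which is now applied when no style is specified.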

KEY distribution: Rows are distributed according to the values in one column, so rows with the same value in that column are stored on the same slice. When two tables are distributed on their join columns, matching rows are already colocated, which improves join performance.

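ALL distribution: A full copy of the entire table is distributed to every node. This is useful for lookup tables and other relatively small tables that are not updated frequently, because the table's storage is multiplied by the number of nodes and every change must be applied to each copy.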
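The three styles can be compared with a toy Python model in which a table is just a list of rows. This is a sketch under simplifying assumptions: slices are plain lists, the character-sum hash is a hypothetical stand-in for whatever hash Redshift uses internally, and ALL copies the table once per node.

```python
from typing import Dict, List

Row = Dict[str, object]

def distribute_even(rows: List[Row], num_slices: int) -> List[List[Row]]:
    """EVEN: deal rows out round-robin, ignoring column values."""
    slices: List[List[Row]] = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices

def distribute_key(rows: List[Row], num_slices: int, key: str) -> List[List[Row]]:
    """KEY: rows with the same key value always land on the same slice."""
    slices: List[List[Row]] = [[] for _ in range(num_slices)]
    for row in rows:
        # Hypothetical stable hash; Python's built-in hash() of a str
        # is randomized per process, so it is avoided here.
        h = sum(ord(c) for c in str(row[key]))
        slices[h % num_slices].append(row)
    return slices

def distribute_all(rows: List[Row], num_nodes: int) -> List[List[Row]]:
    """ALL: every node gets a full copy of the table."""
    return [list(rows) for _ in range(num_nodes)]

orders = [{"customer_id": c, "amount": a}
          for c, a in [("c1", 10), ("c2", 5), ("c1", 7), ("c3", 1)]]
by_key = distribute_key(orders, num_slices=4, key="customer_id")

print([len(s) for s in distribute_even(orders, 4)])  # [1, 1, 1, 1]
print([len(s) for s in by_key])                      # [2, 1, 1, 0]
```

Under KEY, both c1 orders share slice 0, so a join on customer_id could combine them without moving rows between slices. The trade-off is that a skewed key can leave some slices (here slice 3) with no work.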

Sort Keys

Sorting enables efficient handling of range-restricted predicates.
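The sort keys for a table can be either compound or interleaved. A compound sort key sorts rows by the listed columns in order, so it helps most when queries filter on a prefix of those columns; an interleaved sort key gives equal weight to each column in the key.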
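Sorting helps range predicates because Amazon Redshift keeps the minimum and maximum value of every block (a zone map) and skips blocks whose range cannot match the filter. The Python sketch below models that idea with a hypothetical 4-value block size; real Redshift blocks are 1 MB, and the helper names are illustrative only.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # hypothetical; real Redshift blocks are 1 MB

def build_zone_map(column: List[int]) -> List[Tuple[int, int]]:
    """Record (min, max) for each fixed-size block of the column."""
    blocks = [column[i:i + BLOCK_SIZE] for i in range(0, len(column), BLOCK_SIZE)]
    return [(min(b), max(b)) for b in blocks]

def blocks_to_scan(zone_map: List[Tuple[int, int]], lo: int, hi: int) -> List[int]:
    """Indices of blocks that may hold values in [lo, hi]; the rest are skipped."""
    return [i for i, (bmin, bmax) in enumerate(zone_map) if bmax >= lo and bmin <= hi]

dates_sorted = list(range(1, 17))  # column stored in sort-key order
dates_unsorted = [1, 9, 5, 13, 2, 10, 6, 14, 3, 11, 7, 15, 4, 12, 8, 16]

print(blocks_to_scan(build_zone_map(dates_sorted), 5, 7))    # [1]
print(blocks_to_scan(build_zone_map(dates_unsorted), 5, 7))  # [0, 1, 2, 3]
```

With the column sorted, a filter on values 5 to 7 touches one block out of four; the same values scattered across every block force a full scan.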

Loading Data

Amazon Redshift provides the COPY command as a much more efficient alternative to repeatedly calling INSERT. A COPY command loads data into a table in the most efficient manner, and it supports multiple types of input data sources. The fastest way to load data into Amazon Redshift is to perform bulk loads from flat files stored in an Amazon Simple Storage Service (Amazon S3) bucket or from an Amazon DynamoDB table.

When loading data from Amazon S3, the COPY command can read from multiple files at the same time.
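Parallel reads are why the usual advice is to split a large input into several files, ideally a multiple of the number of slices. The Python sketch below is a back-of-the-envelope model, assuming files are handed to slices round-robin and that load time is proportional to megabytes; it is a simplification, not Redshift's actual scheduler, and the function names are hypothetical.

```python
from typing import Dict, List

def assign_files(file_sizes_mb: List[int], num_slices: int) -> Dict[int, int]:
    """Round-robin files to slices; return the total MB each slice must load."""
    load = {s: 0 for s in range(num_slices)}
    for i, size in enumerate(file_sizes_mb):
        load[i % num_slices] += size
    return load

def load_time_units(load: Dict[int, int]) -> int:
    """Slices work in parallel, so the busiest slice bounds the load time."""
    return max(load.values())

one_big_file = [800]
eight_parts = [100] * 8

print(load_time_units(assign_files(one_big_file, 8)))  # 800
print(load_time_units(assign_files(eight_parts, 8)))   # 100
```

The same 800 MB takes one eighth of the time when it is split so that all eight slices have a file to read.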

You will need to run the VACUUM command to reorganize your data and reclaim space after deletes. It is also recommended to run the ANALYZE command to update table statistics.
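VACUUM and ANALYZE matter because a DELETE in Amazon Redshift only marks rows for deletion, and rows added to a table that already holds data go into an unsorted region. The hypothetical ToyTable class below mimics that behavior in Python: vacuum() drops the marked rows and restores sort order, and analyze() refreshes simple statistics. It is a conceptual model only, not how Redshift stores data.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ToyTable:
    """Hypothetical model: deletes only mark rows; space comes back on vacuum."""
    rows: List[int] = field(default_factory=list)      # values of a sort-key column
    deleted: List[bool] = field(default_factory=list)  # deletion marks, one per row
    stats: Dict[str, int] = field(default_factory=dict)

    def load(self, values: List[int]) -> None:
        # New rows are appended to an unsorted region at the end of the table.
        self.rows.extend(values)
        self.deleted.extend([False] * len(values))

    def delete_where(self, value: int) -> None:
        # A DELETE only flags rows; they still occupy storage.
        for i, v in enumerate(self.rows):
            if v == value:
                self.deleted[i] = True

    def vacuum(self) -> None:
        # Reclaim space from deleted rows and restore sort order.
        self.rows = sorted(v for v, d in zip(self.rows, self.deleted) if not d)
        self.deleted = [False] * len(self.rows)

    def analyze(self) -> None:
        # Refresh the statistics a query planner would rely on.
        live = [v for v, d in zip(self.rows, self.deleted) if not d]
        self.stats = {"row_count": len(live), "distinct": len(set(live))}

t = ToyTable()
t.load([3, 1, 2])
t.load([5, 4, 1])
t.delete_where(1)
print(len(t.rows))      # 6: deleted rows still take space
t.vacuum()
t.analyze()
print(t.rows, t.stats)  # [2, 3, 4, 5] {'row_count': 4, 'distinct': 4}
```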

Data can also be exported out of Amazon Redshift using the UNLOAD command. This command can be used to generate delimited text files and store them in Amazon S3.
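By default, UNLOAD writes its output in parallel to multiple files based on the number of slices, using a pipe (|) as the delimiter. The Python sketch below only illustrates what such pipe-delimited parts look like. The unload_parts() helper is hypothetical, the row-to-slice placement is an assumed round-robin, and the parts are kept in memory instead of being written to Amazon S3.

```python
import io
from typing import List, Sequence

def unload_parts(rows: List[Sequence[object]], num_slices: int,
                 delimiter: str = "|") -> List[str]:
    """Write each slice's rows as delimited text, one in-memory 'file' per slice."""
    buffers = [io.StringIO() for _ in range(num_slices)]
    for i, row in enumerate(rows):
        # Assumed placement: row i lives on slice i % num_slices.
        buffers[i % num_slices].write(delimiter.join(str(v) for v in row) + "\n")
    return [b.getvalue() for b in buffers]

parts = unload_parts([(1, "alice", 9.5), (2, "bob", 3.0), (3, "carol", 7.25)],
                     num_slices=2)
for text in parts:
    print(repr(text))
# '1|alice|9.5\n3|carol|7.25\n'
# '2|bob|3.0\n'
```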

Querying Data

You can configure Workload Management (WLM) to queue and prioritize queries.
WLM allows you to define multiple queues and set the concurrency level for each queue. For example, you might set up one queue for long-running queries with limited concurrency and another queue for short-running queries with a higher level of concurrency.
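The effect of per-queue concurrency can be seen in a small Python simulation. The queue names, slot counts, and the assumption that every query arrives at time 0 and is served first in, first out are all hypothetical; the sketch only shows why separating long and short queries keeps short ones from waiting.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical layout: few slots for long reports, more for short lookups.
QUEUES = {"long": 2, "short": 5}  # queue name -> concurrency (slots)

def schedule(queries: List[Tuple[str, str, int]]) -> Dict[str, Tuple[int, int]]:
    """Assign (query_id, queue, duration) to slots; return id -> (start, end).

    All queries are assumed to arrive at time 0 and run in list order (FIFO).
    """
    # One min-heap per queue holding the time at which each slot becomes free.
    free_at = {q: [0] * slots for q, slots in QUEUES.items()}
    result: Dict[str, Tuple[int, int]] = {}
    for query_id, queue, duration in queries:
        start = heapq.heappop(free_at[queue])  # earliest slot to open up
        end = start + duration
        heapq.heappush(free_at[queue], end)
        result[query_id] = (start, end)
    return result

plan = schedule([
    ("r1", "long", 60), ("r2", "long", 60), ("r3", "long", 60),
    ("s1", "short", 2), ("s2", "short", 2),
])
print(plan["r3"])  # (60, 120): waits for one of the two long-queue slots
print(plan["s1"])  # (0, 2): short queries are not stuck behind long ones
```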

Snapshots

Amazon Redshift supports both automated and manual snapshots.
You can take manual snapshots at any time, copy snapshots to other AWS Regions, and share manual snapshots with other AWS accounts. Manual snapshots are retained until you explicitly delete them.
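Retention is the main practical difference between the two kinds of snapshot: automated snapshots are deleted once they are older than the retention period (one day by default), while manual snapshots stay until you delete them. A minimal Python sketch of that rule, with a hypothetical Snapshot record and expired() helper:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass
class Snapshot:
    name: str
    created: datetime
    automated: bool

def expired(snapshots: List[Snapshot], now: datetime,
            retention_days: int = 1) -> List[str]:
    """Names of automated snapshots older than the retention period.

    Manual snapshots never expire here: they stay until explicitly deleted.
    """
    cutoff = now - timedelta(days=retention_days)
    return [s.name for s in snapshots if s.automated and s.created < cutoff]

now = datetime(2024, 1, 10)
snaps = [
    Snapshot("auto-1", datetime(2024, 1, 8), automated=True),
    Snapshot("auto-2", datetime(2024, 1, 9, 12), automated=True),
    Snapshot("manual-1", datetime(2023, 6, 1), automated=False),
]
print(expired(snaps, now))  # ['auto-1']
```

The months-old manual snapshot is never selected, no matter how short the retention period is.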

Security

Amazon Redshift supports encryption of data in transit using SSL-encrypted connections, and also encryption of data at rest using multiple techniques. To encrypt data at rest, Amazon Redshift integrates with AWS Key Management Service (AWS KMS) and AWS CloudHSM for encryption key management.