Flink 1.10 Feature Analysis

Source: zhisheng

The Flink 1.10 release notes describe some of the more important changes between versions 1.9 and 1.10, such as differences in configuration, behavior, and dependencies. If you are planning to upgrade to Flink 1.10, it is recommended that you read the following carefully.

Clusters and Deployment

• File systems now need to be loaded via the plugin mechanism.
• The Flink client loads classes according to the configured classloading strategy, either parent-first or child-first.
• Tasks can be spread evenly across all TaskManagers by setting cluster.evenly-spread-out-slots: true in flink-conf.yaml (see the configuration sketch after this list).
• The high-availability storage directory has been changed to HA_STORAGE_DIR/HA_CLUSTER_ID, where HA_STORAGE_DIR is configured via high-availability.storageDir and HA_CLUSTER_ID via high-availability.cluster-id.
• When the -yarnship command-line option is used, resource directories and jar files are added to the classpath.
• The --yn/--yarncontainer command-line option has been removed.
• The --yst/--yarnstreaming command-line option has been removed.
• Flink's Mesos integration now rejects all expired requests.
• The Flink scheduler has been refactored with the goal of making scheduling strategies customizable in the future.
• Java 11 is supported; starting Flink with Java 11 prints some WARNING log messages. Note: connectors such as Cassandra, Hive, and HBase have not been tested with Java 11.
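As a minimal sketch of the slot-spreading and HA options mentioned above: these keys normally go into flink-conf.yaml, but they can also be set programmatically (for example in local MiniCluster experiments). The option keys are taken from the release notes; the storage path and cluster id values below are made up for illustration.

```java
import org.apache.flink.configuration.Configuration;

public class ClusterDeploymentConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Equivalent of "cluster.evenly-spread-out-slots: true" in flink-conf.yaml:
        // spread tasks evenly across all TaskManagers instead of filling one up first.
        conf.setBoolean("cluster.evenly-spread-out-slots", true);

        // The HA storage directory becomes HA_STORAGE_DIR/HA_CLUSTER_ID,
        // built from these two options (values here are placeholders).
        conf.setString("high-availability.storageDir", "hdfs:///flink/ha");
        conf.setString("high-availability.cluster-id", "my-cluster");

        System.out.println(conf);
    }
}
```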

Memory Management

• New Task Executor memory model. This affects standalone, YARN, Mesos, and Kubernetes deployments; the JobManager memory model is unchanged. If you reuse an existing Flink configuration without adjusting it, the new memory model may derive different JVM memory parameters and therefore change performance. A configuration sketch using the new options is shown below.
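As a rough sketch of the new Task Executor memory configuration: the option names used here (taskmanager.memory.process.size and taskmanager.memory.managed.fraction) come from the Flink 1.10 memory setup documentation as I recall it, not from this article, and the sizes are placeholder values. In practice these are usually placed in flink-conf.yaml rather than set in code.

```java
import org.apache.flink.configuration.Configuration;

public class TaskExecutorMemorySketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // In the new model the TaskManager is usually sized with a single top-level
        // option: total process memory (the container size on YARN/Kubernetes).
        conf.setString("taskmanager.memory.process.size", "4096m");

        // Managed memory (used e.g. by RocksDB and batch operators) is derived as a
        // fraction of the total Flink memory unless it is set explicitly.
        conf.setString("taskmanager.memory.managed.fraction", "0.4");

        System.out.println(conf);
    }
}
```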

The following options have been removed and no longer have any effect:

The following options have been replaced with other options:

 

• The memory used by the RocksDB state backend can now be controlled. Users can adjust the share of memory used for RocksDB write buffers with state.backend.rocksdb.memory.write-buffer-ratio (default 0.5) and the share reserved for indexes and filters with state.backend.rocksdb.memory.high-prio-pool-ratio (default 0.1); see the sketch after this list.
• Fine-grained operator resource management: the configuration options table.exec.resource.external-buffer-memory, table.exec.resource.hash-agg.memory, table.exec.resource.hash-join.memory, and table.exec.resource.sort.memory have been deprecated.
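A minimal sketch of the two RocksDB memory ratio options described above. The keys and defaults are those stated in the release notes; as before, the usual place for them is flink-conf.yaml.

```java
import org.apache.flink.configuration.Configuration;

public class RocksDbMemorySketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Share of the RocksDB memory budget used for write buffers (memtables); default 0.5.
        conf.setString("state.backend.rocksdb.memory.write-buffer-ratio", "0.5");

        // Share of the memory budget reserved for indexes and bloom filters; default 0.1.
        conf.setString("state.backend.rocksdb.memory.high-prio-pool-ratio", "0.1");

        System.out.println(conf);
    }
}
```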

Table API and SQL

• The ANY type has been renamed to RAW. The identifier raw is now a reserved keyword and must be escaped when used as a field name or function name.
• Table connector properties have been renamed to give a better experience when writing DDL statements, for example the Kafka connector properties connector.properties and connector.specific-offsets, and the Elasticsearch connector property connector.hosts.
• The previous methods for registering temporary tables and views have been deprecated; use createTemporaryView() instead (see the sketch after this list).
• The ExternalCatalog API (ExternalCatalog, SchematicDescriptor, MetadataDescriptor, StatisticsDescriptor) has been removed; the new Catalog API is recommended instead.
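A small sketch of the new registration style and of escaping the now-reserved raw keyword. The class and method names are the standard 1.10 Java Table API bridge as I understand it; the view name, fields, and data are invented for illustration.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class TemporaryViewSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        EnvironmentSettings settings =
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);

        DataStream<Tuple2<String, Long>> words =
                env.fromElements(Tuple2.of("flink", 1L), Tuple2.of("blink", 2L));

        // Replaces the older register-style calls for temporary tables/views.
        tEnv.createTemporaryView("words", words);

        // "raw" is now a reserved keyword, so it has to be escaped with backticks
        // when used as a column alias, field name, or function name.
        Table result = tEnv.sqlQuery("SELECT f0 AS word, f1 AS `raw` FROM words");

        tEnv.toAppendStream(result, Row.class).print();
        env.execute("createTemporaryView sketch");
    }
}
```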

Configuration

• If a configured value cannot be parsed into the required type, ConfigOptions now throws an IllegalArgumentException; previously the default value was returned.
• The default delay of the restart strategies (fixed-delay and failure-rate) is now 1s; previously it was 0 (see the sketch after this list).
• The cluster-level restart strategy configuration has been simplified: it is now determined only by the restart-strategy option and by whether checkpointing is enabled.
• The memory-mapped BoundedBlockingSubpartition is disabled by default.
• The non-credit-based network flow control has been removed.
• The HA_JOB_DELAY configuration has been removed from HighAvailabilityOptions.
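A job-level counterpart to the cluster-level restart-strategy option, as a minimal sketch; the number of attempts is an arbitrary example value, and the 1s delay simply mirrors the new default mentioned above.

```java
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartStrategySketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Job-level fixed-delay strategy: up to 3 restart attempts, 1 second apart.
        // Overrides whatever cluster-level "restart-strategy" is configured.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(1)));
    }
}
```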

State

• Background cleanup of expired TTL state is now enabled by default, and StateTtlConfig#Builder#cleanupInBackground() has been deprecated (see the sketch after this list).
• When using RocksDBStateBackend, timers are now stored in RocksDB by default; previously they were stored in heap memory.
• StateTtlConfig#TimeCharacteristic has been removed; use StateTtlConfig#TtlTimeCharacteristic instead.
• The new MapState#isEmpty() method checks whether a MapState is empty and is about 40% faster than mapState.keys().iterator().hasNext().
• RocksDB has been upgraded: Flink now ships its own FRocksDB (based on RocksDB 5.17.2), mainly because newer RocksDB versions show performance regressions in some cases.
• RocksDB logging is disabled by default; to enable it, use RocksDBOptionsFactory to create a DBOptions instance and set the log level to INFO_LEVEL via the setInfoLogLevel method.
• The mechanism for recovering from a RocksDB savepoint has been optimized: recovering from a savepoint containing large KV pairs could previously cause OOM. A configurable memory limit has been introduced in RocksDBWriteBatchWrapper, with a default of 2MB; RocksDB's WriteBatch is flushed before the limit is reached. The limit can be changed via state.backend.rocksdb.write-batch-size in flink-conf.yaml.
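A minimal sketch combining two of the points above: TTL state where background cleanup no longer needs to be requested explicitly, and the new MapState#isEmpty() check. It assumes the function runs on a keyed stream; the state name, TTL duration, and counting logic are invented for illustration.

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class TtlMapStateSketch extends RichFlatMapFunction<String, String> {

    private transient MapState<String, Long> counts;

    @Override
    public void open(Configuration parameters) {
        // In 1.10 background cleanup of expired state is enabled by default,
        // so calling cleanupInBackground() explicitly is no longer necessary.
        StateTtlConfig ttl = StateTtlConfig.newBuilder(Time.hours(1)).build();

        MapStateDescriptor<String, Long> descriptor =
                new MapStateDescriptor<>("counts", String.class, Long.class);
        descriptor.enableTimeToLive(ttl);
        counts = getRuntimeContext().getMapState(descriptor);
    }

    @Override
    public void flatMap(String value, Collector<String> out) throws Exception {
        // New in 1.10: isEmpty() is faster than counts.keys().iterator().hasNext().
        if (counts.isEmpty()) {
            out.collect("first value seen for this key: " + value);
        }
        Long old = counts.get(value);
        counts.put(value, old == null ? 1L : old + 1L);
    }
}
```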

PyFlink

• Python 2 is no longer supported.

Monitoring

• InfluxdbReporter now skips Inf and NaN values (InfluxDB does not support these, e.g. Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN).

Connectors

• The Kinesis connector license has been changed.

Interface Changes

• ExecutionConfig#getGlobalJobParameters() no longer returns null.
• MasterTriggerRestoreHook#triggerCheckpoint needs to be non-blocking where necessary.
• HA services have been split into client side and cluster side: HighAvailabilityServices has been separated into ClientHighAvailabilityServices for the client and HighAvailabilityServices for the cluster.
• HighAvailabilityServices#getWebMonitorLeaderElectionService() has been deprecated.
• The LeaderElectionService interface has been changed.
• The checkpoint lock has been deprecated.
• The OptionsFactory and ConfigurableOptionsFactory interfaces have been deprecated.

Reference: https://github.com/apache/flink/blob/master/docs/release-notes/flink-1.10.zh.md


After reading the official presentation of this new version, I feel that quite a few new features are missing from the description, for example:

• Which Blink features were actually integrated into version 1.10?
• There is not even an introduction to the native Kubernetes integration.
• Is that really all there is to say about PyFlink?
• Production-level Hive integration is not mentioned at all.
• The Table API / SQL optimizations are barely covered.

Probably due to space constraints, many features are not explained; we will have to dig into the source ourselves to learn about them!



Origin: blog.csdn.net/ailiandeziwei/article/details/104476191