Source: zhisheng
The Flink 1.10 release notes describe the most important changes between version 1.9 and version 1.10, such as differences in configuration, behavior, and dependencies. If you plan to upgrade to Flink 1.10, we recommend reading the following carefully.
Clusters and Deployment
• File systems must now be loaded through the plugin mechanism.
• The Flink client loads classes according to a configurable strategy, either parent-first or child-first.
• Tasks can be spread evenly across all TaskManagers by setting cluster.evenly-spread-out-slots: true in the flink-conf.yaml configuration file.
• The high-availability storage directory has been changed to HA_STORAGE_DIR/HA_CLUSTER_ID, where the HA_STORAGE_DIR path is configured via high-availability.storageDir and the HA_CLUSTER_ID path via high-availability.cluster-id.
• When the -yarnship command-line option is used, resource directories and jar files are now added to the classpath.
• The --yn/--yarncontainer command-line option has been removed.
• The --yst/--yarnstreaming command-line option has been removed.
• Flink on Mesos now rejects all expired requests.
• The Flink scheduler has been refactored so that scheduling strategies can be customized in the future.
• Java 11 is supported; starting Flink with Java 11 logs some WARNING messages. Note: the Cassandra, Hive, HBase, and some other connectors have not been tested with Java 11.
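As an illustration, the slot-spreading and HA options above could be set in flink-conf.yaml as follows (the storage path and cluster id are placeholders, not defaults):

```yaml
# Spread tasks evenly across all available TaskManagers.
cluster.evenly-spread-out-slots: true

# HA state is stored under <storageDir>/<cluster-id>.
high-availability.storageDir: hdfs:///flink/ha   # placeholder path
high-availability.cluster-id: /my-cluster        # placeholder id
```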
Memory Management
• A new Task Executor memory model affects standalone, YARN, Mesos, and Kubernetes deployments; the JobManager memory model is unchanged. If you reuse a previous Flink configuration without adjustments, the new memory model may derive different JVM memory parameters and thereby change performance.
Several memory options have been removed and no longer take effect, and others have been replaced with new options; the full tables are in the release notes linked at the end of this article.
• The memory used by the RocksDB state backend can now be controlled: users can adjust the fraction of memory used for RocksDB read/write buffers via state.backend.rocksdb.memory.write-buffer-ratio (default 0.5) and the fraction reserved for indexes/filters via state.backend.rocksdb.memory.high-prio-pool-ratio (default 0.1).
• For fine-grained operator resource management, the configuration options table.exec.resource.external-buffer-memory, table.exec.resource.hash-agg.memory, table.exec.resource.hash-join.memory, and table.exec.resource.sort.memory have been deprecated.
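As a sketch, the two ratios could be tuned in flink-conf.yaml like this (the values are illustrative, not recommendations):

```yaml
# Fraction of managed RocksDB memory used for write buffers (default 0.5).
state.backend.rocksdb.memory.write-buffer-ratio: 0.4
# Fraction reserved for index and filter blocks (default 0.1).
state.backend.rocksdb.memory.high-prio-pool-ratio: 0.2
```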
Table API and SQL
• The ANY type has been renamed to RAW; the identifier raw is now a reserved keyword and must be escaped when used as a field name or SQL function name.
• Table connector properties have been renamed to provide a better experience when writing DDL statements, e.g. the Kafka connector properties connector.properties and connector.specific-offsets, and the Elasticsearch connector property connector.hosts.
• The previous methods for interacting with temporary tables and views have been deprecated; use createTemporaryView() instead.
• The ExternalCatalog API (ExternalCatalog, SchematicDescriptor, MetadataDescriptor, StatisticsDescriptor) has been removed; the new Catalog API is recommended instead.
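For illustration, a 1.10-style DDL using the flattened Kafka connector properties might look like the following sketch (the table name, topic, and addresses are placeholders; check the connector documentation of your Flink version for the exact keys):

```sql
CREATE TABLE kafka_source (
  user_id BIGINT,
  action  STRING
) WITH (
  'connector.type' = 'kafka',
  'connector.version' = 'universal',
  'connector.topic' = 'user_actions',
  'connector.properties.bootstrap.servers' = 'localhost:9092',
  'connector.properties.group.id' = 'my-group',
  'connector.startup-mode' = 'earliest-offset',
  'format.type' = 'json'
);
```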
Configuration
• If a ConfigOption value cannot be parsed into the required type, an IllegalArgumentException is now thrown; previously the default value was returned.
• The default restart delay of the fixed-delay and failure-rate restart strategies is now 1s (previously 0).
• Cluster-level restart strategy configuration has been simplified: it is now determined only by the restart-strategy option and by whether checkpointing is enabled.
• The memory-mapped BoundedBlockingSubpartition is disabled by default.
• Non-credit-based network flow control has been removed.
• HighAvailabilityOptions#HA_JOB_DELAY has been removed.
State
• Background cleanup of expired state is now enabled by default for state with TTL; StateTtlConfig#Builder#cleanupInBackground() has been deprecated.
• When using RocksDBStateBackend, timers are now stored in RocksDB by default; previously they were stored in heap memory.
• StateTtlConfig#TimeCharacteristic has been removed; use StateTtlConfig#TtlTimeCharacteristic instead.
• The new MapState#isEmpty() method checks whether a MapState is empty; it is up to 40% faster than calling mapState.keys().iterator().hasNext().
• RocksDB has been upgraded via Flink's own FRocksDB build (based on RocksDB 5.17.2), mainly because newer RocksDB versions show performance regressions in some cases.
• RocksDB logging is disabled by default. To enable it, use a RocksDBOptionsFactory to create a DBOptions instance and set the log level to INFO_LEVEL via the setInfoLogLevel method.
• Recovery from RocksDB savepoints has been optimized: if a savepoint contained large KV pairs, users could previously run into OOM. A configurable memory limit has been introduced in RocksDBWriteBatchWrapper, 2MB by default; RocksDB's WriteBatch is flushed before the limit is reached. The limit can be changed via state.backend.rocksdb.write-batch-size in flink-conf.yaml.
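For example, the batch-size limit mentioned above could be raised in flink-conf.yaml like this (the value is illustrative):

```yaml
# Memory limit for RocksDB's WriteBatch during savepoint restore (default: 2mb).
state.backend.rocksdb.write-batch-size: 4mb
```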
PyFlink
• Python 2 is no longer supported.
Monitoring
• InfluxdbReporter now skips Inf and NaN values (e.g. Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN), since InfluxDB does not support these types.
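The underlying check is plain-Java classification of non-finite doubles; a minimal, Flink-free sketch of such a guard (the class and method names here are made up for illustration) could look like:

```java
public class NonFiniteFilter {

    // InfluxDB cannot store NaN or infinite doubles, so a metrics
    // reporter has to drop such values before writing them.
    public static boolean isReportable(double value) {
        return !Double.isNaN(value) && !Double.isInfinite(value);
    }

    public static void main(String[] args) {
        System.out.println(isReportable(42.0));                     // true
        System.out.println(isReportable(Double.NaN));               // false
        System.out.println(isReportable(Double.POSITIVE_INFINITY)); // false
    }
}
```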
Connectors
• The license of the Kinesis connector has changed.
Interface Changes
• ExecutionConfig#getGlobalJobParameters() no longer returns null.
• MasterTriggerRestoreHook#triggerCheckpoint has become non-blocking where necessary.
• Client-side and cluster-side HA services have been separated: HighAvailabilityServices has been split into a client-side ClientHighAvailabilityServices and a cluster-side HighAvailabilityServices.
• HighAvailabilityServices#getWebMonitorLeaderElectionService() has been deprecated.
• The LeaderElectionService interface has changed.
• The checkpoint lock has been deprecated.
• The OptionsFactory and ConfigurableOptionsFactory interfaces have been deprecated.
Reference: https://github.com/apache/flink/blob/master/docs/release-notes/flink-1.10.zh.md
After reading this official release announcement, I feel it omits quite a few new features, for example:
• Which Blink features were merged into version 1.10?
• Flink's native Kubernetes integration is not even introduced.
• Is PyFlink treated seriously?
• Production-grade Hive integration is not mentioned at all.
• The Table API / SQL optimizations are barely discussed.
Probably due to space constraints many features are left unexplained, so we will have to study the source ourselves!