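Atlas is a data governance and metadata framework for Hadoop.
Atlas is a scalable and extensible set of core foundational governance services that enables enterprises to effectively meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.
Apache Atlas provides open metadata management and governance capabilities for organizations to build a catalog of their data assets, classify and govern these assets, and provide collaboration capabilities around these data assets for data scientists, analysts, and the data governance team.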
Features:
Metadata types & instances:
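* Predefined types for various Hadoop and non-Hadoop metadata
* Ability to define new types for the metadata to be managed
* Types can have primitive attributes, complex attributes, and object references, and can inherit from other types
* Instances of types, called entities, capture metadata object details and their relationships
* REST APIs to work with types and instances allow easier integration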
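
As a rough illustration of the type and entity model above, the sketch below builds the JSON bodies one might send to the Atlas v2 REST API. The type name `sample_dataset`, its attributes, and the qualified name are hypothetical. The endpoint paths in the comments and the body shapes are assumptions based on the v2 API and should be checked against your Atlas version. Nothing is sent over the network.

```python
import json


def build_entity_def(name, super_type="DataSet"):
    """Describe a hypothetical custom type that inherits from an existing type."""
    return {
        "name": name,
        "superTypes": [super_type],  # inheritance from another type
        "attributeDefs": [
            # a primitive attribute
            {"name": "retention_days", "typeName": "int",
             "isOptional": True, "cardinality": "SINGLE"},
            # a complex (collection) attribute
            {"name": "labels", "typeName": "array<string>",
             "isOptional": True, "cardinality": "SET"},
        ],
    }


def build_typedefs_payload(entity_defs):
    # Assumed body shape for: POST /api/atlas/v2/types/typedefs
    return {"entityDefs": list(entity_defs)}


def build_entity_payload(type_name, qualified_name, **attrs):
    # Assumed body shape for: POST /api/atlas/v2/entity
    attributes = {
        "qualifiedName": qualified_name,
        "name": qualified_name.split("@")[0],
    }
    attributes.update(attrs)
    return {"entity": {"typeName": type_name, "attributes": attributes}}


if __name__ == "__main__":
    typedefs = build_typedefs_payload([build_entity_def("sample_dataset")])
    entity = build_entity_payload("sample_dataset", "orders@cluster1",
                                  retention_days=30)
    print(json.dumps(typedefs, indent=2))
    print(json.dumps(entity, indent=2))
```

Keeping the payload builders separate from any HTTP client makes it possible to inspect the type definition and the entity instance before posting them.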
Classification:
- Ability to dynamically create classifications, like PII, EXPIRES_ON, DATA_QUALITY, SENSITIVE
- Classifications can include attributes, like the expiry_date attribute in the EXPIRES_ON classification
- Entities can be associated with multiple classifications, enabling easier discovery and security enforcement
- Propagation of classifications via lineage automatically ensures that classifications follow the data as it goes through various processing steps
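
The idea of classifications following data through lineage can be sketched as a graph traversal. This is a simplified model for illustration only, not Atlas's actual propagation logic: the entity names, the edge list, and the function `propagate_classifications` are all hypothetical, and lineage is reduced to plain upstream-to-downstream pairs.

```python
from collections import deque


def propagate_classifications(lineage_edges, direct):
    """Return the effective classifications for every entity.

    lineage_edges: iterable of (upstream, downstream) entity names.
    direct: dict of entity name -> {classification name: attributes}.
    """
    downstream = {}
    for src, dst in lineage_edges:
        downstream.setdefault(src, []).append(dst)

    # Start from the directly assigned classifications.
    effective = {name: dict(tags) for name, tags in direct.items()}
    queue = deque(direct)
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            child_tags = effective.setdefault(child, {})
            changed = False
            for tag, attrs in effective[node].items():
                if tag not in child_tags:  # a direct assignment wins
                    child_tags[tag] = attrs
                    changed = True
            if changed:
                # Revisit the child so its own descendants are updated.
                queue.append(child)
    return effective


if __name__ == "__main__":
    edges = [("raw_customers", "clean_customers"),
             ("clean_customers", "customer_report")]
    direct = {
        "raw_customers": {"PII": {}},
        "clean_customers": {"EXPIRES_ON": {"expiry_date": "2030-01-01"}},
    }
    for name, tags in propagate_classifications(edges, direct).items():
        print(name, sorted(tags))
```

An entity can end up with several classifications at once, some assigned directly and some inherited from upstream data, which is what makes classification-based discovery and enforcement work on derived datasets.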
Lineage:
- Intuitive UI to view lineage of data as it moves through various processes
- REST APIs to access and update lineage
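
A minimal sketch of reading lineage over REST follows. The path `/api/atlas/v2/lineage/{guid}`, the `direction` and `depth` query parameters, and the `relations`, `fromEntityId`, and `toEntityId` response fields are assumptions based on the v2 API, so verify them against your Atlas version. The host, the GUID, and both helper functions are hypothetical, and no request is actually made.

```python
from urllib.parse import quote, urlencode

VALID_DIRECTIONS = {"INPUT", "OUTPUT", "BOTH"}


def lineage_url(base_url, guid, direction="BOTH", depth=3):
    """Build the (assumed) lineage URL for one entity GUID."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"direction must be one of {sorted(VALID_DIRECTIONS)}")
    query = urlencode({"direction": direction, "depth": depth})
    return (f"{base_url.rstrip('/')}/api/atlas/v2/lineage/"
            f"{quote(guid, safe='')}?{query}")


def edges_from_lineage(response):
    """Extract (from, to) GUID pairs from an (assumed) lineage response."""
    return [(rel["fromEntityId"], rel["toEntityId"])
            for rel in response.get("relations", [])]


if __name__ == "__main__":
    print(lineage_url("http://localhost:21000", "abc-123", "OUTPUT", 2))
    sample = {"relations": [{"fromEntityId": "g1", "toEntityId": "g2"}]}
    print(edges_from_lineage(sample))
```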
Search/Discovery:
- Intuitive UI to search entities by type, classification, attribute value or free-text
- Rich REST APIs to search by complex criteria
- SQL-like query language to search entities: Domain Specific Language (DSL)
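
To give a feel for DSL search, the sketch below URL-encodes a few example queries. The endpoint path `/api/atlas/v2/search/dsl` and the `query`, `limit`, and `offset` parameters are assumptions based on the v2 API. The queries themselves (type names such as `hive_table`, the table name `customers`) are illustrative, and the helper `dsl_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Illustrative DSL queries; exact syntax should be checked against the
# Atlas search documentation for your version.
EXAMPLE_QUERIES = [
    'hive_table where name = "customers"',  # filter by attribute value
    "hive_column isa PII",                  # filter by classification
    "hive_table select name, owner",        # project selected attributes
]


def dsl_search_url(base_url, query, limit=25, offset=0):
    """Build the (assumed) DSL search URL with a safely encoded query."""
    params = urlencode({"query": query, "limit": limit, "offset": offset})
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/dsl?{params}"


if __name__ == "__main__":
    for q in EXAMPLE_QUERIES:
        print(dsl_search_url("http://localhost:21000", q, limit=10))
```

Encoding the query with `urlencode` matters because DSL strings contain spaces, quotes, and `=` signs that are not valid in a raw URL.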
Security & Data Masking:
- Fine-grained security for metadata access, enabling controls on access to entity instances and on operations like add/update/remove classifications
- Integration with Apache Ranger enables authorization and data masking on data access based on the classifications associated with entities in Apache Atlas. For example:
  - who can access data classified as PII or SENSITIVE
  - customer-service users can only see the last 4 digits of columns classified as NATIONAL_ID
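
The NATIONAL_ID example can be modeled in a few lines. This is only a toy sketch of classification-based masking, not how Apache Ranger implements its policies: the role name check, the mask character, and the functions `mask_show_last_4` and `read_column` are hypothetical.

```python
def mask_show_last_4(value, mask_char="x"):
    """Mask every character except the last four."""
    text = str(value)
    if len(text) <= 4:
        # Nothing to hide beyond the visible tail in this simplified model.
        return text
    return mask_char * (len(text) - 4) + text[-4:]


def read_column(value, column_classifications, user_role):
    """Apply a hypothetical policy: customer-service sees a masked NATIONAL_ID."""
    if user_role == "customer-service" and "NATIONAL_ID" in column_classifications:
        return mask_show_last_4(value)
    return value


if __name__ == "__main__":
    print(read_column("123-45-6789", {"NATIONAL_ID"}, "customer-service"))
    print(read_column("123-45-6789", {"NATIONAL_ID"}, "auditor"))
```

The decision depends on the classification attached to the column rather than on the column's name, so the same rule applies to every column that carries the NATIONAL_ID classification.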