The DistributedFileSystem class, in the package org.apache.hadoop.hdfs, provides the API for developing applications on top of HDFS. It has the following member variables:
private Path workingDir;
private URI uri;
private String homeDirPrefix = DFSConfigKeys.DFS_USER_HOME_DIR_PREFIX_DEFAULT;
DFSClient dfs;
private boolean verifyChecksum = true;
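Inheritance: DistributedFileSystem extends the abstract class org.apache.hadoop.fs.FileSystem, which in turn extends org.apache.hadoop.conf.Configured.
Since DistributedFileSystem exists to expose an API to developers, how do developers actually use it? Consider the following example: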
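import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Load the configuration (core-default.xml and core-site.xml from the classpath)
Configuration conf = new Configuration();
// Get the file system for the given URI
FileSystem fs = FileSystem.get(URI.create("hdfs://hadoop1:9000"), conf);
// Directory to create (example value)
String path = "/user/test/newdir";
Path srcPath = new Path(path);
// mkdirs() creates the directory, together with any missing parent directories, in one call
boolean flag = fs.mkdirs(srcPath);
if (flag) {
System.out.println("create dir ok!");
} else {
System.out.println("create dir failure");
}
// Close the file system
fs.close();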
Notice that the example works with FileSystem, the parent class of DistributedFileSystem, rather than with DistributedFileSystem directly. Why? To find out, let's look at FileSystem.get:
/** Returns the FileSystem for this URI's scheme and authority. The scheme
* of the URI determines a configuration property name,
* <tt>fs.<i>scheme</i>.class</tt> whose value names the FileSystem class.
* The entire URI is passed to the FileSystem instance's initialize method.
*/
public static FileSystem get(URI uri, Configuration conf) throws IOException {
// uri is the address of the HDFS file system
// For how the URI class works, see https://blog.csdn.net/weixin_39935887/article/details/81432814 and https://www.jianshu.com/p/58b9245a6f16
String scheme = uri.getScheme();       // the URI's scheme (protocol), e.g. hdfs, http or https
String authority = uri.getAuthority(); // host[:port], e.g. hadoop1:9000
if (scheme == null && authority == null) { // use default FS
return get(conf); // neither scheme nor authority: use the default file system
}
if (scheme != null && authority == null) { // no authority
URI defaultUri = getDefaultUri(conf); // scheme present but no authority: fetch the default URI
if (scheme.equals(defaultUri.getScheme()) // if scheme matches default
&& defaultUri.getAuthority() != null) { // & default has authority
return get(defaultUri, conf); // return default
}
}
String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
if (conf.getBoolean(disableCacheName, false)) {
return createFileSystem(uri, conf);
}
return CACHE.get(uri, conf);
}
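To make the branching in get(URI, Configuration) easier to follow, here is a minimal, Hadoop-free sketch that only computes which URI the call ends up using. It assumes the default URI has already been read from fs.defaultFS and is passed in directly; caching, the fs.%s.impl.disable.cache check and instantiation are left out, and the class name FsGetSketch is hypothetical.

```java
import java.net.URI;

/**
 * Simplified sketch of the branching in FileSystem.get(URI, Configuration).
 * It only decides which URI would be used; caching and instantiation are omitted.
 */
class FsGetSketch {

    /** Returns the URI that FileSystem.get would effectively resolve for the given input. */
    static URI effectiveUri(URI uri, URI defaultUri) {
        String scheme = uri.getScheme();
        String authority = uri.getAuthority();
        if (scheme == null && authority == null) {
            // Neither scheme nor authority: fall back to the default file system (get(conf))
            return defaultUri;
        }
        if (scheme != null && authority == null
                && scheme.equals(defaultUri.getScheme())
                && defaultUri.getAuthority() != null) {
            // Same scheme as the default, no authority given: use the default (with its authority)
            return defaultUri;
        }
        // Otherwise the URI is used as given (then looked up in the cache or created)
        return uri;
    }

    public static void main(String[] args) {
        URI def = URI.create("hdfs://hadoop1:9000");
        System.out.println(effectiveUri(URI.create("/user/test"), def));        // prints hdfs://hadoop1:9000
        System.out.println(effectiveUri(URI.create("hdfs:///user/test"), def)); // prints hdfs://hadoop1:9000
        System.out.println(effectiveUri(URI.create("file:///tmp"), def));       // prints file:///tmp
    }
}
```

With the default hdfs://hadoop1:9000, both /user/test and hdfs:///user/test resolve to the default file system, while file:///tmp is used as given.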
Step into get(conf):
/**
* Returns the configured filesystem implementation.
* @param conf the configuration to use
*/
public static FileSystem get(Configuration conf) throws IOException {
return get(getDefaultUri(conf), conf);
}
This calls getDefaultUri; let's look at that function:
/** Get the default filesystem URI from a configuration.
* @param conf the configuration to use
* @return the uri of the default filesystem
*/
public static URI getDefaultUri(Configuration conf) {
// conf.get(FS_DEFAULT_NAME_KEY, DEFAULT_FS) looks up the configured name; fixName then rewrites old-style (deprecated) names and logs a warning
return URI.create(fixName(conf.get(FS_DEFAULT_NAME_KEY, DEFAULT_FS)));
}
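This first calls Configuration.get to obtain the default file system URI as a string, where:
FS_DEFAULT_NAME_KEY is fs.defaultFS
DEFAULT_FS is file:///
So if fs.defaultFS is not set (for example, core-site.xml is not on the classpath), the local file system is used.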
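getDefaultUri passes the value of fs.defaultFS through fixName before building the URI. The following is a small re-implementation of that conversion for illustration, assuming the behavior of Hadoop 2.x's private FileSystem.fixName (the real method also logs a deprecation warning, omitted here); FixNameSketch is a hypothetical name.

```java
import java.net.URI;

class FixNameSketch {
    /** Converts old-format default file system names to URI strings, like FileSystem.fixName. */
    static String fixName(String name) {
        if (name.equals("local")) {
            // "local" is the deprecated spelling of the local file system
            return "file:///";
        } else if (name.indexOf('/') == -1) {
            // No '/' at all: an unqualified host[:port] name, treated as HDFS
            return "hdfs://" + name;
        }
        // Anything containing '/' (e.g. hdfs://hadoop1:9000) is returned unchanged
        return name;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"local", "hadoop1:9000", "hdfs://hadoop1:9000"}) {
            URI uri = URI.create(fixName(s));
            System.out.println(s + " -> " + uri + " (scheme=" + uri.getScheme() + ")");
        }
    }
}
```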
Now let's step into Configuration.get:
/**
* Get the value of the <code>name</code>. If the key is deprecated,
* it returns the value of the first key which replaces the deprecated key
* and is not null.
* If no such property exists,
* then <code>defaultValue</code> is returned.
*
* @param name property name, will be trimmed before get value.
* @param defaultValue default value.
* @return property value, or <code>defaultValue</code> if the property
* doesn't exist.
*/
public String get(String name, String defaultValue) {
// Returns the key names to look up: the replacement keys if name is deprecated, otherwise name itself
String[] names = handleDeprecation(deprecationContext.get(), name);
String result = null;
for(String n : names) {
result = substituteVars(getProps().getProperty(n, defaultValue));
}
return result;
}
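The lookup loop in get(name, defaultValue) can be isolated as follows. This sketch assumes the key list has already been produced by handleDeprecation and skips substituteVars; java.util.Properties stands in for the object returned by getProps(), and GetLoopSketch is a hypothetical name.

```java
import java.util.Properties;

/** Sketch of the lookup loop in Configuration.get(name, defaultValue), without variable expansion. */
class GetLoopSketch {
    static String get(Properties props, String[] names, String defaultValue) {
        String result = null;
        for (String n : names) {
            // Value stored under n, or defaultValue when n is absent; each pass overwrites result
            result = props.getProperty(n, defaultValue);
        }
        return result;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("fs.defaultFS", "hdfs://hadoop1:9000");
        System.out.println(get(props, new String[] {"fs.defaultFS"}, "file:///"));            // prints hdfs://hadoop1:9000
        System.out.println(get(new Properties(), new String[] {"fs.defaultFS"}, "file:///")); // prints file:///
    }
}
```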
deprecationContext.get() returns a DeprecationContext object (see the introduction to the Configuration class for details). Now let's analyze handleDeprecation:
/**
* Checks for the presence of the property <code>name</code> in the
* deprecation map. Returns the first of the list of new keys if present
* in the deprecation map or the <code>name</code> itself. If the property
* is not presently set but the property map contains an entry for the
* deprecated key, the value of the deprecated key is set as the value for
* the provided property name.
*
* @param name the property name
* @return the first property in the list of properties mapping
* the <code>name</code> or the <code>name</code> itself.
*/
private String[] handleDeprecation(DeprecationContext deprecations,
String name) {
if (null != name) {
name = name.trim();
}
ArrayList<String> names = new ArrayList<String>();
// Check whether name appears in deprecations, i.e. whether it is a deprecated key
if (isDeprecated(name)) {
// If so, fetch its replacement keys and the associated description
DeprecatedKeyInfo keyInfo = deprecations.getDeprecatedKeyMap().get(name);
// Log a warning (once) that the key is deprecated and should be replaced by the new key
warnOnceIfDeprecated(deprecations, name);
// Iterate over the replacement keys
for (String newKey : keyInfo.newKeys) {
if(newKey != null) {
// Add each new key to the names list
names.add(newKey);
}
}
}
if(names.size() == 0) {
// Not a deprecated key: use name itself
names.add(name);
}
// Iterate over the keys to look up
for(String n : names) {
// Find the deprecated key that n replaces, if any
String deprecatedKey = deprecations.getReverseDeprecatedKeyMap().get(n);
// If such a deprecated key exists, and the overlay does not contain the new key but does contain the deprecated key,
// copy the deprecated key's value under the new key into both properties and overlay
if (deprecatedKey != null && !getOverlay().containsKey(n) &&
getOverlay().containsKey(deprecatedKey)) {
getProps().setProperty(n, getOverlay().getProperty(deprecatedKey));
getOverlay().setProperty(n, getOverlay().getProperty(deprecatedKey));
}
}
// Return the keys as an array
return names.toArray(new String[names.size()]);
}
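The core of handleDeprecation can be sketched with plain collections. This is a simplified model, not Hadoop's code: two HashMaps stand in for the deprecated and reverse-deprecated key maps of DeprecationContext, one Properties object plays the role of both props and overlay, and the warning log is omitted. The mapping from fs.default.name to fs.defaultFS used in main is a real Hadoop 2.x deprecation; DeprecationSketch itself is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

class DeprecationSketch {
    // deprecated key -> replacement keys (stand-in for getDeprecatedKeyMap())
    final Map<String, String[]> deprecatedKeyMap = new HashMap<>();
    // replacement key -> deprecated key (stand-in for getReverseDeprecatedKeyMap())
    final Map<String, String> reverseMap = new HashMap<>();
    // stand-in for both props and overlay
    final Properties props = new Properties();

    void addDeprecation(String oldKey, String... newKeys) {
        deprecatedKeyMap.put(oldKey, newKeys);
        for (String k : newKeys) {
            reverseMap.put(k, oldKey);
        }
    }

    String[] handleDeprecation(String name) {
        if (name != null) {
            name = name.trim();
        }
        List<String> names = new ArrayList<>();
        String[] newKeys = deprecatedKeyMap.get(name);
        if (newKeys != null) {
            // Deprecated key: collect its replacements (Hadoop also logs a warning once)
            for (String k : newKeys) {
                if (k != null) {
                    names.add(k);
                }
            }
        }
        if (names.isEmpty()) {
            names.add(name); // not deprecated: look up the name itself
        }
        for (String n : names) {
            String old = reverseMap.get(n);
            // Copy a value set under the old key to the new key if the new key is still unset
            if (old != null && !props.containsKey(n) && props.containsKey(old)) {
                props.setProperty(n, props.getProperty(old));
            }
        }
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        DeprecationSketch conf = new DeprecationSketch();
        conf.addDeprecation("fs.default.name", "fs.defaultFS");
        conf.props.setProperty("fs.default.name", "hdfs://hadoop1:9000");
        System.out.println(Arrays.toString(conf.handleDeprecation("fs.default.name"))); // prints [fs.defaultFS]
        System.out.println(conf.props.getProperty("fs.defaultFS"));                     // prints hdfs://hadoop1:9000
    }
}
```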
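In short, handleDeprecation checks whether name is a deprecated key; if it is, it returns the replacement keys registered in deprecations, otherwise it returns name itself. The next step in get is: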
for(String n : names) {
// Look up each key's value (defaultValue if absent) and expand any ${...} variables
result = substituteVars(getProps().getProperty(n, defaultValue));
}
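This loop iterates over the keys returned above. getProps().getProperty(n, defaultValue) returns the value (the <value></value> element in the XML) stored under key n (the <name></name> element), or defaultValue if n is not set. Because result is overwritten on every iteration, the value for the last key wins. The value is then passed through substituteVars: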
private String substituteVars(String expr) {
if (expr == null) {
return null;
}
Matcher match = VAR_PATTERN.matcher("");
String eval = expr;
for(int s=0; s<MAX_SUBST; s++) {
match.reset(eval);
if (!match.find()) {
return eval;
}
String var = match.group();
var = var.substring(2, var.length()-1); // remove ${ .. }
String val = null;
try {
val = System.getProperty(var);
} catch(SecurityException se) {
LOG.warn("Unexpected SecurityException in Configuration", se);
}
if (val == null) {
val = getRaw(var);
}
if (val == null) {
return eval; // return literal ${var}: var is unbound
}
// substitute
eval = eval.substring(0, match.start())+val+eval.substring(match.end());
}
throw new IllegalStateException("Variable substitution depth too large: "
+ MAX_SUBST + " " + expr);
}
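substituteVars expands variable references in the value rather than changing the key: for example, if the value is ${hadoop.tmp.dir}/dfs/name, ${hadoop.tmp.dir} is replaced by its actual value. Each variable is resolved first from Java system properties and then from the configuration itself (getRaw); an unbound variable is left as-is, and if expansion has not finished within MAX_SUBST rounds, an IllegalStateException is thrown.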
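A stand-alone version of the substitution loop is shown below. It assumes a plain Map in place of getRaw and a pattern modeled on Hadoop's VAR_PATTERN (${...} with no $, } or space inside); the limit of 20 rounds is an assumed value for MAX_SUBST, and SubstituteVarsSketch is a hypothetical name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SubstituteVarsSketch {
    // Matches ${name}; modeled on Hadoop's VAR_PATTERN (assumption)
    private static final Pattern VAR = Pattern.compile("\\$\\{[^}$ ]+\\}");
    // Assumed maximum number of substitution rounds
    private static final int MAX_SUBST = 20;

    static String substituteVars(String expr, Map<String, String> raw) {
        if (expr == null) {
            return null;
        }
        String eval = expr;
        for (int s = 0; s < MAX_SUBST; s++) {
            Matcher m = VAR.matcher(eval);
            if (!m.find()) {
                return eval; // nothing left to expand
            }
            String var = m.group();
            var = var.substring(2, var.length() - 1); // strip ${ and }
            String val = System.getProperty(var);     // system properties take precedence
            if (val == null) {
                val = raw.get(var);                    // then other configuration entries
            }
            if (val == null) {
                return eval;                           // unbound: keep ${var} literally
            }
            eval = eval.substring(0, m.start()) + val + eval.substring(m.end());
        }
        throw new IllegalStateException("Variable substitution depth too large: " + MAX_SUBST + " " + expr);
    }

    public static void main(String[] args) {
        Map<String, String> raw = Map.of("hadoop.tmp.dir", "/tmp/hadoop-${user.name}");
        // user.name is a JVM system property, so it is resolved from System.getProperty
        System.out.println(substituteVars("${hadoop.tmp.dir}/dfs/name", raw));
    }
}
```

Variables are expanded one at a time, always starting from the first remaining match, so a value that itself contains ${...} is expanded in a later round; a self-referencing entry such as loop=${loop} exhausts the limit and throws.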
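Back in FileSystem.getDefaultUri, consider again the line return URI.create(fixName(conf.get(FS_DEFAULT_NAME_KEY, DEFAULT_FS)));
Here conf.get(FS_DEFAULT_NAME_KEY, DEFAULT_FS) returns the value configured for fs.defaultFS (the <value> in the XML), and fixName converts old-format names: the value local denotes the local file system and becomes file:///, while a value containing no / at all, such as hadoop1:9000, is treated as an unqualified HDFS address and becomes hdfs://hadoop1:9000 (both cases log a deprecation warning). A value that already contains /, such as hdfs://hadoop1:9000, is returned unchanged. URI.create then builds the URI from the result. Back in FileSystem.get(URI uri, Configuration conf), execution continues with the following code: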
// For a URI such as hdfs://hadoop1:9000 the scheme is hdfs, so disableCacheName is fs.hdfs.impl.disable.cache
String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
// Look up fs.hdfs.impl.disable.cache (the <name></name> in the XML) in conf; if it is not set, false is used
if (conf.getBoolean(disableCacheName, false)) {
// Caching is disabled for this scheme: create a new FileSystem instance every time
return createFileSystem(uri, conf);
}
// Otherwise return the cached instance for this URI; on a cache miss CACHE.get also calls createFileSystem
return CACHE.get(uri, conf);
Let's now step into createFileSystem:
private static FileSystem createFileSystem(URI uri, Configuration conf
) throws IOException {
Class<?> clazz = getFileSystemClass(uri.getScheme(), conf);
if (clazz == null) {
throw new IOException("No FileSystem for scheme: " + uri.getScheme());
}
FileSystem fs = (FileSystem)ReflectionUtils.newInstance(clazz, conf);
fs.initialize(uri, conf);
return fs;
}
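Whether createFileSystem is called directly or through CACHE.get, the effect of fs.<scheme>.impl.disable.cache can be modeled as below. This is a simplified sketch: the cache key is only the lower-cased scheme and authority (Hadoop's FileSystem.Cache.Key also includes the current user and a uniqueness value), a Function stands in for createFileSystem, and FsCacheSketch is hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

class FsCacheSketch<T> {
    private final Map<String, T> cache = new HashMap<>();
    private final Function<URI, T> factory; // stands in for createFileSystem(uri, conf)

    FsCacheSketch(Function<URI, T> factory) {
        this.factory = factory;
    }

    // Simplified cache key: scheme + authority, both lower-cased
    private static String key(URI uri) {
        String scheme = uri.getScheme() == null ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
        String authority = uri.getAuthority() == null ? "" : uri.getAuthority().toLowerCase(Locale.ROOT);
        return scheme + "://" + authority;
    }

    synchronized T get(URI uri, boolean disableCache) {
        if (disableCache) {
            return factory.apply(uri); // caching disabled: always a new instance
        }
        // Cache hit returns the shared instance; a miss creates one and remembers it
        return cache.computeIfAbsent(key(uri), k -> factory.apply(uri));
    }

    public static void main(String[] args) {
        FsCacheSketch<Object> fsCache = new FsCacheSketch<>(u -> new Object());
        URI a = URI.create("hdfs://hadoop1:9000/user/a");
        URI b = URI.create("hdfs://HADOOP1:9000/user/b");
        System.out.println(fsCache.get(a, false) == fsCache.get(b, false)); // prints true (same key)
        System.out.println(fsCache.get(a, true) == fsCache.get(a, true));   // prints false (cache bypassed)
    }
}
```

This sharing is also why closing a FileSystem obtained from FileSystem.get affects every other caller holding the same cached instance.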
In the example, calling getScheme() on "hdfs://hadoop1:9000" returns hdfs. Let's step into getFileSystemClass:
public static Class<? extends FileSystem> getFileSystemClass(String scheme,Configuration conf) throws IOException {
if (!FILE_SYSTEMS_LOADED) {
loadFileSystems();
}
Class<? extends FileSystem> clazz = null;
if (conf != null) {
clazz = (Class<? extends FileSystem>) conf.getClass("fs." + scheme + ".impl", null);
}
if (clazz == null) {
clazz = SERVICE_FILE_SYSTEMS.get(scheme);
}
if (clazz == null) {
throw new IOException("No FileSystem for scheme: " + scheme);
}
return clazz;
}
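The lookup order in getFileSystemClass, followed by reflective instantiation as in createFileSystem, can be sketched without Hadoop on the classpath. Everything here is a stand-in: the maps replace Configuration and SERVICE_FILE_SYSTEMS, FakeFileSystem and FakeDistributedFileSystem are hypothetical classes, and calling the no-argument constructor only approximates ReflectionUtils.newInstance (which also makes the constructor accessible and passes conf to setConf).

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class FsClassLookupSketch {
    /** Hypothetical stand-in for org.apache.hadoop.fs.FileSystem. */
    static abstract class FakeFileSystem {
        abstract String name();
    }

    /** Hypothetical stand-in for org.apache.hadoop.hdfs.DistributedFileSystem. */
    static class FakeDistributedFileSystem extends FakeFileSystem {
        public FakeDistributedFileSystem() {}

        @Override
        String name() { return "FakeDistributedFileSystem"; }
    }

    // Stand-in for Configuration entries "fs.<scheme>.impl" -> class name
    static final Map<String, String> conf = new HashMap<>();
    // Stand-in for SERVICE_FILE_SYSTEMS (filled via ServiceLoader in Hadoop)
    static final Map<String, Class<? extends FakeFileSystem>> serviceFileSystems = new HashMap<>();

    static Class<? extends FakeFileSystem> getFileSystemClass(String scheme)
            throws IOException, ClassNotFoundException {
        Class<? extends FakeFileSystem> clazz = null;
        String configured = conf.get("fs." + scheme + ".impl");
        if (configured != null) {
            // An explicit fs.<scheme>.impl setting takes precedence
            clazz = Class.forName(configured).asSubclass(FakeFileSystem.class);
        }
        if (clazz == null) {
            // Otherwise fall back to the implementations registered as services
            clazz = serviceFileSystems.get(scheme);
        }
        if (clazz == null) {
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        return clazz;
    }

    static FakeFileSystem createFileSystem(String scheme)
            throws IOException, ReflectiveOperationException {
        // Instantiate through the no-argument constructor, roughly like ReflectionUtils.newInstance
        return getFileSystemClass(scheme).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        serviceFileSystems.put("hdfs", FakeDistributedFileSystem.class);
        System.out.println(createFileSystem("hdfs").name()); // prints FakeDistributedFileSystem
    }
}
```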
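The function first looks up the class configured under fs.hdfs.impl in the Configuration. In Hadoop 2.x and later this key is normally not set in the default configuration files, so conf.getClass returns null and the class is taken from SERVICE_FILE_SYSTEMS, which loadFileSystems() fills via Java's ServiceLoader from the META-INF/services/org.apache.hadoop.fs.FileSystem entries on the classpath (the HDFS client jar registers DistributedFileSystem there). Either way, for the hdfs scheme the class is org.apache.hadoop.hdfs.DistributedFileSystem, so createFileSystem then runs: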
FileSystem fs = (FileSystem)ReflectionUtils.newInstance(clazz, conf);
fs.initialize(uri, conf);
That is, it instantiates clazz by reflection and then initializes it. Let's look at initialize in org.apache.hadoop.hdfs.DistributedFileSystem:
@Override
public void initialize(URI uri, Configuration conf) throws IOException {
super.initialize(uri, conf);
setConf(conf);
// host is hadoop1
String host = uri.getHost();
if (host == null) {
throw new IOException("Incomplete HDFS URI, no host: "+ uri);
}
homeDirPrefix = conf.get(
DFSConfigKeys.DFS_USER_HOME_DIR_PREFIX_KEY,
DFSConfigKeys.DFS_USER_HOME_DIR_PREFIX_DEFAULT);
// Create the DFSClient, which performs the actual communication with HDFS
this.dfs = new DFSClient(uri, conf, statistics);
// Keep only scheme and authority: hdfs://hadoop1:9000
this.uri = URI.create(uri.getScheme()+"://"+uri.getAuthority());
// Set the working directory to the user's home directory
this.workingDir = getHomeDirectory();
}
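The URI handling in initialize, and how the home directory that becomes workingDir is formed, can be illustrated as follows. This sketch assumes homeDirPrefix keeps its default of /user and takes the user name as a parameter, whereas DistributedFileSystem.getHomeDirectory uses the current user's short name and qualifies the path against the file system URI; InitUriSketch and the user name alice are hypothetical.

```java
import java.io.IOException;
import java.net.URI;

class InitUriSketch {
    /** Host check and URI normalization as in DistributedFileSystem.initialize (DFSClient omitted). */
    static URI normalize(URI uri) throws IOException {
        if (uri.getHost() == null) {
            throw new IOException("Incomplete HDFS URI, no host: " + uri);
        }
        // Keep only scheme and authority; any path in the given URI is dropped
        return URI.create(uri.getScheme() + "://" + uri.getAuthority());
    }

    // Simplified home directory: fsUri + homeDirPrefix + "/" + user
    static URI homeDirectory(URI fsUri, String homeDirPrefix, String user) {
        return URI.create(fsUri + homeDirPrefix + "/" + user);
    }

    public static void main(String[] args) throws IOException {
        URI fsUri = normalize(URI.create("hdfs://hadoop1:9000/user/test"));
        System.out.println(fsUri);                                  // prints hdfs://hadoop1:9000
        System.out.println(homeDirectory(fsUri, "/user", "alice")); // prints hdfs://hadoop1:9000/user/alice
    }
}
```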
First, super.initialize(uri, conf):
/** Called after a new FileSystem instance is constructed.
* @param name a uri whose authority section names the host, port, etc.
* for this FileSystem
* @param conf the configuration
*/
public void initialize(URI name, Configuration conf) throws IOException {
// Obtain (creating it if needed) the Statistics object for this FileSystem class and scheme
statistics = getStatistics(name.getScheme(), getClass());
resolveSymlinks = conf.getBoolean(
CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_KEY,
CommonConfigurationKeys.FS_CLIENT_RESOLVE_REMOTE_SYMLINKS_DEFAULT);
}
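At this point DistributedFileSystem has been created and initialized. The actual reading and writing of file data is delegated to DFSClient, which we examine next.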