Druid defines a number of interfaces for its pluggable peripheral features. For storage alone it defines:
- DataSegmentArchiver: archives and restores segment files; on a store such as S3, it can move temporarily unused segments into a separate bucket.
- DataSegmentFinder: locates Druid segments under a given directory and, when asked, updates the descriptor.json files on deep storage with the correct loadSpec.
- DataSegmentKiller: deletes segment files.
- DataSegmentMover: moves segment files.
- DataSegmentPuller: pulls a given segment's data down into a specified local directory.
- DataSegmentPusher: pushes a given segment's data from a local directory up to deep storage (a minimal sketch of implementing one of these interfaces follows this list).
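To make the plug-in surface concrete, here is a minimal sketch of what implementing one of these interfaces looks like. The method signature is inferred from the HdfsDataSegmentKiller code shown later in this post rather than copied from the upstream interface, and MyStoreDataSegmentKiller is a hypothetical class:

```java
// Package names as of the io.druid-era releases; some Druid versions declare
// additional methods on this interface.
import io.druid.segment.loading.DataSegmentKiller;
import io.druid.segment.loading.SegmentLoadingException;
import io.druid.timeline.DataSegment;

public class MyStoreDataSegmentKiller implements DataSegmentKiller
{
  @Override
  public void kill(DataSegment segment) throws SegmentLoadingException
  {
    // The loadSpec written by the matching DataSegmentPusher records where the
    // segment lives; the killer reads it back and deletes those files.
    final String path = String.valueOf(segment.getLoadSpec().get("path"));
    // ... delete `path` from the hypothetical backing store ...
  }
}
```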
The HDFS storage extension implements four of these interfaces:
- HdfsDataSegmentFinder
- HdfsDataSegmentKiller
- HdfsDataSegmentPuller
- HdfsDataSegmentPusher
1. HdfsDataSegmentFinder: locates segments under an HDFS directory. Its findSegments method:
```java
@Override
public Set<DataSegment> findSegments(String workingDirPathStr, boolean updateDescriptor)
    throws SegmentLoadingException
{
  final Set<DataSegment> segments = Sets.newHashSet();
  final Path workingDirPath = new Path(workingDirPathStr);
  FileSystem fs;
  try {
    fs = workingDirPath.getFileSystem(config);
    log.info(fs.getScheme());
    log.info("FileSystem URI:" + fs.getUri().toString());
    if (!fs.exists(workingDirPath)) {
      throw new SegmentLoadingException("Working directory [%s] doesn't exist.", workingDirPath);
    }
    if (!fs.isDirectory(workingDirPath)) {
      throw new SegmentLoadingException("Working directory [%s] is not a directory!?", workingDirPath);
    }
    // Recursively walk the working directory looking for descriptor.json files.
    final RemoteIterator<LocatedFileStatus> it = fs.listFiles(workingDirPath, true);
    while (it.hasNext()) {
      final LocatedFileStatus locatedFileStatus = it.next();
      final Path path = locatedFileStatus.getPath();
      if (path.getName().equals("descriptor.json")) {
        // A descriptor only counts if its index.zip sits in the same directory.
        final Path indexZip = new Path(path.getParent(), "index.zip");
        if (fs.exists(indexZip)) {
          final DataSegment dataSegment = mapper.readValue(fs.open(path), DataSegment.class);
          log.info("Found segment [%s] located at [%s]", dataSegment.getIdentifier(), indexZip);
          final Map<String, Object> loadSpec = dataSegment.getLoadSpec();
          final String pathWithoutScheme = indexZip.toUri().getPath();
          // If the loadSpec no longer points at the actual index.zip, repair it
          // in memory, and optionally rewrite descriptor.json in place.
          if (!loadSpec.get("type").equals(HdfsStorageDruidModule.SCHEME)
              || !loadSpec.get("path").equals(pathWithoutScheme)) {
            loadSpec.put("type", HdfsStorageDruidModule.SCHEME);
            loadSpec.put("path", pathWithoutScheme);
            if (updateDescriptor) {
              log.info("Updating loadSpec in descriptor.json at [%s] with new path [%s]", path, pathWithoutScheme);
              mapper.writeValue(fs.create(path, true), dataSegment);
            }
          }
          segments.add(dataSegment);
        } else {
          throw new SegmentLoadingException(
              "index.zip didn't exist at [%s] while descriptor.json exists!?",
              indexZip
          );
        }
      }
    }
  }
  catch (IOException e) {
    throw new SegmentLoadingException(e, "Problems interacting with filesystem[%s].", workingDirPath);
  }
  return segments;
}
```
As the code shows, this method walks a given HDFS directory and collects every valid segment it finds; when the updateDescriptor parameter is true, it also rewrites stale descriptor.json files in place.
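As a usage sketch (hypothetical: the constructor arguments are inferred from the fields config and mapper used above, and the HDFS URL is a placeholder):

```java
// Hypothetical usage; constructor arguments inferred from the fields
// (a Hadoop Configuration and a Jackson ObjectMapper) used in findSegments.
Configuration hadoopConf = new Configuration();
ObjectMapper jsonMapper = new DefaultObjectMapper();
HdfsDataSegmentFinder finder = new HdfsDataSegmentFinder(hadoopConf, jsonMapper);

// Scan the segment root; `true` also repairs stale descriptor.json files in place.
Set<DataSegment> segments = finder.findSegments("hdfs://nameservice1/druid/segments", true);
for (DataSegment s : segments) {
  System.out.println(s.getIdentifier() + " -> " + s.getLoadSpec().get("path"));
}
```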
2. HdfsDataSegmentKiller: deletes a given segment from HDFS. The code:
```java
@Override
public void kill(DataSegment segment) throws SegmentLoadingException
{
  final Path path = getPath(segment);
  log.info("killing segment[%s] mapped to path[%s]", segment.getIdentifier(), path);
  try {
    if (path.getName().endsWith(".zip")) {
      final FileSystem fs = path.getFileSystem(config);
      if (!fs.exists(path)) {
        log.warn("Segment Path [%s] does not exist. It appears to have been deleted already.", path);
        return;
      }
      // path format --> .../dataSource/interval/version/partitionNum/xxx.zip
      Path partitionNumDir = path.getParent();
      if (!fs.delete(partitionNumDir, true)) {
        throw new SegmentLoadingException(
            "Unable to kill segment, failed to delete dir [%s]",
            partitionNumDir.toString()
        );
      }
      // Try to delete the enclosing directories as well; a non-recursive
      // delete fails when siblings remain, so only empty dirs are removed.
      Path versionDir = partitionNumDir.getParent();
      if (safeNonRecursiveDelete(fs, versionDir)) {
        Path intervalDir = versionDir.getParent();
        if (safeNonRecursiveDelete(fs, intervalDir)) {
          Path dataSourceDir = intervalDir.getParent();
          safeNonRecursiveDelete(fs, dataSourceDir);
        }
      }
    } else {
      throw new SegmentLoadingException("Unknown file type[%s]", path);
    }
  }
  catch (IOException e) {
    throw new SegmentLoadingException(e, "Unable to kill segment");
  }
}

private boolean safeNonRecursiveDelete(FileSystem fs, Path path)
{
  try {
    return fs.delete(path, false);
  }
  catch (Exception ex) {
    return false;
  }
}

private Path getPath(DataSegment segment)
{
  return new Path(String.valueOf(segment.getLoadSpec().get(PATH_KEY)));
}
```
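For illustration, a hypothetical invocation (in Druid the killer is wired up through injection; the single-Configuration constructor here is an assumption):

```java
// Hypothetical invocation; in Druid this object is created by Guice, and the
// exact constructor arguments vary across versions.
HdfsDataSegmentKiller killer = new HdfsDataSegmentKiller(new Configuration());

// Given a loadSpec path .../wikipedia/<interval>/<version>/0/index.zip, kill()
// recursively removes the partitionNum dir, then walks upward deleting the
// version, interval, and dataSource dirs only while they remain empty.
killer.kill(segment);
```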
3. HdfsDataSegmentPuller: pulls a segment from HDFS down into a local directory.
```java
@Override
public void getSegmentFiles(DataSegment segment, File dir) throws SegmentLoadingException
{
  getSegmentFiles(getPath(segment), dir);
}

public FileUtils.FileCopyResult getSegmentFiles(final Path path, final File outDir) throws SegmentLoadingException
{
  final LocalFileSystem localFileSystem = new LocalFileSystem();
  try {
    final FileSystem fs = path.getFileSystem(config);
    if (fs.isDirectory(path)) {
      // -------- directory ---------
      try {
        return RetryUtils.retry(
            new Callable<FileUtils.FileCopyResult>() {
              @Override
              public FileUtils.FileCopyResult call() throws Exception
              {
                if (!fs.exists(path)) {
                  throw new SegmentLoadingException("No files found at [%s]", path.toString());
                }
                final RemoteIterator<LocatedFileStatus> children = fs.listFiles(path, false);
                final ArrayList<FileUtils.FileCopyResult> localChildren = new ArrayList<>();
                final FileUtils.FileCopyResult result = new FileUtils.FileCopyResult();
                while (children.hasNext()) {
                  final LocatedFileStatus child = children.next();
                  final Path childPath = child.getPath();
                  final String fname = childPath.getName();
                  if (fs.isDirectory(childPath)) {
                    log.warn("[%s] is a child directory, skipping", childPath.toString());
                  } else {
                    final File outFile = new File(outDir, fname);
                    // Actual copy
                    fs.copyToLocalFile(childPath, new Path(outFile.toURI()));
                    result.addFile(outFile);
                  }
                }
                log.info(
                    "Copied %d bytes from [%s] to [%s]",
                    result.size(),
                    path.toString(),
                    outDir.getAbsolutePath()
                );
                return result;
              }
            },
            shouldRetryPredicate(),
            DEFAULT_RETRY_COUNT
        );
      }
      catch (Exception e) {
        throw Throwables.propagate(e);
      }
    } else if (CompressionUtils.isZip(path.getName())) {
      // -------- zip ---------
      final FileUtils.FileCopyResult result = CompressionUtils.unzip(
          new ByteSource() {
            @Override
            public InputStream openStream() throws IOException
            {
              return getInputStream(path);
            }
          },
          outDir,
          shouldRetryPredicate(),
          false
      );
      log.info(
          "Unzipped %d bytes from [%s] to [%s]",
          result.size(),
          path.toString(),
          outDir.getAbsolutePath()
      );
      return result;
    } else if (CompressionUtils.isGz(path.getName())) {
      // -------- gzip ---------
      final String fname = path.getName();
      final File outFile = new File(outDir, CompressionUtils.getGzBaseName(fname));
      final FileUtils.FileCopyResult result = CompressionUtils.gunzip(
          new ByteSource() {
            @Override
            public InputStream openStream() throws IOException
            {
              return getInputStream(path);
            }
          },
          outFile
      );
      log.info(
          "Gunzipped %d bytes from [%s] to [%s]",
          result.size(),
          path.toString(),
          outFile.getAbsolutePath()
      );
      return result;
    } else {
      throw new SegmentLoadingException("Do not know how to handle file type at [%s]", path.toString());
    }
  }
  catch (IOException e) {
    throw new SegmentLoadingException(e, "Error loading [%s]", path.toString());
  }
}
```
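A hypothetical usage sketch (again, the constructor argument is an assumption; in practice Druid's segment-loading machinery calls the puller):

```java
// Hypothetical usage; normally invoked by Druid's segment-loading machinery.
HdfsDataSegmentPuller puller = new HdfsDataSegmentPuller(new Configuration());

File localDir = new File("/tmp/druid/segment-cache/example");
localDir.mkdirs();  // the target directory must exist before unzipping into it

// For a loadSpec pointing at .../index.zip this takes the zip branch above
// and unpacks the segment files into localDir.
puller.getSegmentFiles(segment, localDir);
```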
4. HdfsDataSegmentPusher: copies segment files from a local directory up to HDFS:
```java
@Override
public DataSegment push(File inDir, DataSegment segment) throws IOException
{
  final String storageDir = DataSegmentPusherUtil.getHdfsStorageDir(segment);
  log.info(
      "Copying segment[%s] to HDFS at location[%s/%s]",
      segment.getIdentifier(),
      config.getStorageDirectory(),
      storageDir
  );

  Path outFile = new Path(String.format("%s/%s/index.zip", config.getStorageDirectory(), storageDir));
  FileSystem fs = outFile.getFileSystem(hadoopConfig);

  fs.mkdirs(outFile.getParent());
  log.info("Compressing files from[%s] to [%s]", inDir, outFile);

  // Zip the local segment directory straight into an HDFS output stream.
  final long size;
  try (FSDataOutputStream out = fs.create(outFile)) {
    size = CompressionUtils.zip(inDir, out);
  }

  // Write descriptor.json next to index.zip, with the loadSpec pointing back at it.
  return createDescriptorFile(
      segment.withLoadSpec(makeLoadSpec(outFile))
             .withSize(size)
             .withBinaryVersion(SegmentUtils.getVersionFromDir(inDir)),
      outFile.getParent(),
      fs
  );
}

private DataSegment createDescriptorFile(DataSegment segment, Path outDir, final FileSystem fs) throws IOException
{
  final Path descriptorFile = new Path(outDir, "descriptor.json");
  log.info("Creating descriptor file at[%s]", descriptorFile);
  ByteSource.wrap(jsonMapper.writeValueAsBytes(segment))
            .copyTo(new HdfsOutputStreamSupplier(fs, descriptorFile));
  return segment;
}

private ImmutableMap<String, Object> makeLoadSpec(Path outFile)
{
  return ImmutableMap.<String, Object>of("type", "hdfs", "path", outFile.toString());
}
```
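Finally, a hypothetical end-to-end sketch (the pusherConfig, hadoopConf, and jsonMapper objects are placeholders; in Druid they are provided by Guice):

```java
// Hypothetical round trip; pusherConfig, hadoopConf and jsonMapper stand in
// for the Guice-provided pusher config (whose storageDirectory points at the
// segment root on HDFS), Hadoop Configuration and Jackson ObjectMapper.
HdfsDataSegmentPusher pusher = new HdfsDataSegmentPusher(pusherConfig, hadoopConf, jsonMapper);

File indexedDir = new File("/tmp/druid/merged/wikipedia/0");  // output of indexing
DataSegment pushed = pusher.push(indexedDir, segment);

// The returned segment carries the rewritten loadSpec, e.g.
//   {type=hdfs, path=.../<dataSource>/<interval>/<version>/<partitionNum>/index.zip}
// which is exactly what the puller and killer above consume.
System.out.println(pushed.getLoadSpec());
```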