Flink Transitive Closure algorithm: finding new reachable paths

 

1. About the name "Transitive Closure": a literal translation can be misleading. The idea is to propagate reachability until the path set closes over itself, which fits this example well: keep deriving reachable paths until no new path exists (i.e., the set is closed).

2. The code itself is simple; the core concepts and principles are explained in detail in the comments.

/**
 * @author: xu.dm
 * @date: 2019/7/3 11:41
 * @Version: 1.0
 * @Description: Transitive closure algorithm. This example takes pairs of reachable
 * paths and derives new reachable paths from them.
 * For example: from the two pairs 1->2 and 2->4, the new path 1->4 can be derived.
 *
 * Iterative algorithm steps:
 * 1. Obtain the paired data set edges, which contains paths such as 1->2, 2->4, 2->5.
 *    If the edges are undirected, the data set can first be unioned with its reversed
 *    copy; this example treats the edges as directed.
 * 2. Build the iterative data set paths as the iteration head.
 * 3. Join paths with the original edges data set to find pairs that connect end to end,
 *    e.g. 1->2 and 2->4, and generate the new path 1->4 into nextPaths.
 * 4. Union nextPaths with the iteration head paths, producing a new nextPaths that
 *    contains both old and new data. At this point nextPaths >= paths always holds.
 * 5. Deduplicate. The first iteration produces no duplicates, but from the second
 *    iteration on the data contains duplicates, so groupBy over all fields and keep
 *    the first element of each group.
 * 6. The steps above form the core iteration body; the iteration loop must then be
 *    closed and a termination criterion defined.
 * 7. Termination principle: after each iteration, check whether any new path was
 *    generated; if not, the iteration can end.
 * 8. This is done by comparing nextPaths with paths: if nextPaths > paths, new paths
 *    were generated and iteration must continue, until paths == nextPaths.
 * 9. An important concept: paths and nextPaths are continuously updated by the
 *    iteration loop.
 * 10. Data flow between the iteration head and tail: paths -> nextPaths -> paths.
 * 11. This example achieves the effect of a delta iteration using a bulk iteration.
 */
public class TransitiveClosureNaive {
    public static void main(String[] args) throws Exception {
        // check input parameters
        final ParameterTool params = ParameterTool.fromArgs(args);

        // set up execution environment
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // make parameters available in the web interface
        env.getConfig().setGlobalJobParameters(params);

        final int maxIterations = params.getInt("iterations", 10);

        DataSet<Tuple2<Long, Long>> edges;
        if(params.has("edges")){
            edges = env.readCsvFile(params.get("edges")).fieldDelimiter(" ").types(Long.class, Long.class);
        }else {
            System.out.println("Executing TransitiveClosureNaive example with default edges data set.");
            System.out.println("Use --edges to specify file input.");
            edges = ConnectedComponentsData.getDefaultEdgeDataSet(env);
        }

        IterativeDataSet<Tuple2<Long,Long>> paths = edges.iterate(maxIterations);

        DataSet<Tuple2<Long,Long>> nextPaths = paths
                .join(edges)
                .where(1)
                .equalTo(0)
                .with(new JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>>() {
                    /**
                     * left:  Path (z, x) - x is reachable from z
                     * right: Edge (x, y) - y is reachable from x
                     * out:   Path (z, y) - final output: y is reachable from z
                     */
                    @Override
                    public Tuple2<Long, Long> join(Tuple2<Long, Long> left, Tuple2<Long, Long> right) throws Exception {
                        return new Tuple2<>(left.f0, right.f1);
                    }
                })
                // withForwardedFieldsFirst declares lossless field-forwarding semantics; it is
                // optional, but helps the Flink optimizer generate a more efficient execution plan.
                // Here: forward field 0 of the first input Tuple2<Long, Long>, and field 1 of the
                // second input Tuple2<Long, Long>.
                .withForwardedFieldsFirst("0").withForwardedFieldsSecond("1")
                // union with the original paths
                .union(paths)
                // deduplicate: groupBy on both fields, then keep one element per group via reduceGroup
                .groupBy(0, 1)
                .reduceGroup(new GroupReduceFunction<Tuple2<Long, Long>, Tuple2<Long, Long>>() {
                    @Override
                    public void reduce(Iterable<Tuple2<Long, Long>> values, Collector<Tuple2<Long, Long>> out) throws Exception {
                        out.collect(values.iterator().next());
                    }
                })
                .withForwardedFields("0;1");

        // compare paths with the newly generated nextPaths to obtain the paths present in
        // nextPaths but not in paths; as the operators above show, nextPaths is always a
        // superset of (or equal to) paths
        DataSet<Tuple2<Long, Long>> newPaths = paths
                .coGroup(nextPaths)
                .where(0).equalTo(0)
                .with(new CoGroupFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>>() {
                    Set<Tuple2<Long, Long>> prevSet = new HashSet<>();

                    @Override
                    public void coGroup(Iterable<Tuple2<Long, Long>> prevPaths, Iterable<Tuple2<Long, Long>> nextPaths, Collector<Tuple2<Long, Long>> out) throws Exception {
                        prevSet.clear(); // the function instance is reused across groups, so reset the set
                        for (Tuple2<Long, Long> prev : prevPaths) {
                            prevSet.add(prev);
                        }
                        // check whether any new data was generated: if so, continue iterating; otherwise terminate
                        for (Tuple2<Long, Long> next : nextPaths) {
                            if (!prevSet.contains(next)) {
                                out.collect(next);
                            }
                        }
                    }
                }).withForwardedFieldsFirst("0").withForwardedFieldsSecond("0");

        // iteration tail: this closes the loop. nextPaths is the feedback channel; its data is
        // fed back to the iteration head paths, and the iteration body is executed again.
        // The iteration ends when newPaths is empty or the maximum number of iterations is
        // reached; newPaths indicates whether any new path exists.
        // Data set iteration loop: paths -> nextPaths -> paths
        DataSet<Tuple2<Long, Long>> transitiveClosure = paths.closeWith(nextPaths, newPaths);

        // emit result
        if (params.has("output")) {
            transitiveClosure.writeAsCsv(params.get("output"), "\n", " ");

            // execute program explicitly, because file sinks are lazy
            env.execute("Transitive Closure Example");
        } else {
            System.out.println("Printing result to stdout. Use --output to specify output path.");
            transitiveClosure.print();
        }
    }
}
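The iteration steps described in the Javadoc above can be sketched as a plain-Java fixpoint loop, independent of Flink (a minimal illustration, not the Flink program itself; the class name `NaiveClosure` and the `List<Long>`-pair encoding are my own choices here): join paths with edges on matching endpoints, union the result with the existing paths, deduplicate via a `Set`, and stop when no new path appears.

```java
import java.util.*;

public class NaiveClosure {
    // Each path (from, to) is encoded as a two-element List<Long>, so a HashSet
    // deduplicates pairs for free (step 5 of the algorithm).
    static Set<List<Long>> transitiveClosure(Set<List<Long>> edges) {
        Set<List<Long>> paths = new HashSet<>(edges);
        boolean grew = true;
        while (grew) {                            // loop until no new path appears (step 7)
            Set<List<Long>> next = new HashSet<>(paths); // union with existing paths (step 4)
            for (List<Long> p : paths) {          // join paths with edges where p.f1 == e.f0 (step 3)
                for (List<Long> e : edges) {
                    if (p.get(1).equals(e.get(0))) {
                        next.add(Arrays.asList(p.get(0), e.get(1))); // new path (z, y)
                    }
                }
            }
            grew = next.size() > paths.size();    // nextPaths > paths means keep iterating (step 8)
            paths = next;                         // feedback: paths <- nextPaths (step 10)
        }
        return paths;
    }

    public static void main(String[] args) {
        // the Javadoc example: 1->2, 2->4, 2->5 yield the new paths 1->4 and 1->5
        Set<List<Long>> edges = new HashSet<>(Arrays.asList(
                Arrays.asList(1L, 2L), Arrays.asList(2L, 4L), Arrays.asList(2L, 5L)));
        System.out.println(transitiveClosure(edges).size()); // 3 original edges + 2 new paths = 5
    }
}
```

Unlike the Flink version, which is bounded by `maxIterations` and distributes the join, this sketch runs until the fixpoint on a single machine; it only serves to make the paths/nextPaths feedback cycle concrete.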

 

3. Raw data

public class ConnectedComponentsData {
    public static final long[] VERTICES  = new long[] {
            1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16};

    public static DataSet<Long> getDefaultVertexDataSet(ExecutionEnvironment env) {
        List<Long> verticesList = new LinkedList<Long>();
        for (long vertexId : VERTICES) {
            verticesList.add(vertexId);
        }
        return env.fromCollection(verticesList);
    }

    public static final Object[][] EDGES = new Object[][] {
            new Object[]{1L, 2L},
            new Object[]{2L, 3L},
            new Object[]{2L, 4L},
            new Object[]{3L, 5L},
            new Object[]{6L, 7L},
            new Object[]{8L, 9L},
            new Object[]{8L, 10L},
            new Object[]{5L, 11L},
            new Object[]{11L, 12L},
            new Object[]{10L, 13L},
            new Object[]{9L, 14L},
            new Object[]{13L, 14L},
            new Object[]{1L, 15L},
            new Object[]{16L, 1L}
    };

    public static DataSet<Tuple2<Long, Long>> getDefaultEdgeDataSet(ExecutionEnvironment env) {

        List<Tuple2<Long, Long>> edgeList = new LinkedList<Tuple2<Long, Long>>();
        for (Object[] edge : EDGES) {
            edgeList.add(new Tuple2<Long, Long>((Long) edge[0], (Long) edge[1]));
        }
        return env.fromCollection(edgeList);
    }

}
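As a quick sanity check on this data (a plain-Java sketch, independent of Flink; the class name `ReachabilityCheck` is hypothetical), a breadth-first search from vertex 1 over the same EDGES shows which vertices the transitive closure will connect vertex 1 to:

```java
import java.util.*;

public class ReachabilityCheck {
    // same edge pairs as ConnectedComponentsData.EDGES
    static final long[][] EDGES = {
            {1, 2}, {2, 3}, {2, 4}, {3, 5}, {6, 7}, {8, 9}, {8, 10},
            {5, 11}, {11, 12}, {10, 13}, {9, 14}, {13, 14}, {1, 15}, {16, 1}};

    // breadth-first search over EDGES, collecting every vertex reachable from `start`
    static Set<Long> reachableFrom(long start) {
        Set<Long> seen = new HashSet<>();
        Deque<Long> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            long v = queue.poll();
            for (long[] e : EDGES) {
                if (e[0] == v && seen.add(e[1])) {
                    queue.add(e[1]);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // vertex 1 reaches 2 and 15 directly, then 3, 4, 5, 11, 12 transitively
        System.out.println(reachableFrom(1L));
    }
}
```

So the final closure should contain exactly the pairs (1, x) for x in {2, 3, 4, 5, 11, 12, 15}, which is a handy way to spot-check the program's output on the default data set.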

Origin www.cnblogs.com/asker009/p/11131069.html