Using RxJava to write an infinite stream of grouped events to rotating files

Sheinbergon :

I'm trying to achieve the following behavior:

  • Have a stream of events periodically polled/generated (short duration, say 1 second)
  • Events are then grouped according to some internal trait.
  • Each group of events is written to a matching file immediately (this is a crucial part of the behavior I want to maintain)
  • Files are expected to be reused for matching groups (those with the same key) on subsequent events, until they are sealed/rotated
  • After a longer duration (say, 5 seconds), files are sealed/rotated and acted upon by additional subscribers

I wrote the following sample code to achieve the above behavior:


    private static final Integer EVENTS = 3;
    private static final Long SHORTER = 1L;
    private static final Long LONGER = 5L;
    private static final Long SLEEP = 100000L;

    public static void main(final String[] args) throws Exception {

        // Shared key -> file mapping that lives outside the Rx pipeline
        val files = new DualHashBidiMap<Integer, File>();

        Observable.just(EVENTS)
                // Generate EVENTS random integers per polling cycle
                .flatMap(num -> Observable.fromIterable(ThreadLocalRandom.current().ints(num).boxed().collect(Collectors.toList())))
                // Group the events by their internal trait (parity here)
                .groupBy(num -> Math.abs(num % 2))
                // Poll/generate again every SHORTER seconds
                .repeatWhen(completed -> completed.delay(SHORTER, TimeUnit.SECONDS))
                .map(group -> {
                    // Reuse the group's file if it exists, otherwise create it
                    val file = files.computeIfAbsent(group.getKey(), Unchecked.function(key -> File.createTempFile(String.format("%03d-", key), ".txt")));
                    // Append this cycle's events to the group's file immediately
                    group.map(Object::toString).toList().subscribe(lines -> FileUtils.writeLines(file, StandardCharsets.UTF_8.name(), lines, true));
                    return file;
                })
                // Every LONGER seconds, seal/rotate the files touched during the window
                .buffer(LONGER, TimeUnit.SECONDS)
                .flatMap(Observable::fromIterable)
                .distinct(File::getName)
                // Drop the sealed file from the map so a fresh one is created next time
                .doOnNext(files::removeValue)
                .doOnNext(file -> System.out.println("File - '" + file + "', Lines - " + FileUtils.readLines(file, StandardCharsets.UTF_8)))
                .subscribe();
        Thread.sleep(SLEEP);
    }

While it works as expected (setting aside the thread-safety issue with the map access for now; I'm using the bidi-map from commons-collections4 just for the sake of demonstration), I was wondering whether there's a way to implement the above functionality in a pure Rx form, without relying on external map access?

Note that it's crucial for the files to be written immediately upon group creation, meaning the files must live beyond the scope of the generated event groups.
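
To illustrate why (a minimal, self-contained sketch, separate from the code above): each repeat cycle produces brand-new GroupedObservable instances for the same keys, so any per-key state, such as an open file, has to live somewhere outside the groups themselves:

    Observable.just(1, 2, 3)
            .groupBy(num -> Math.abs(num % 2))
            .repeatWhen(completed -> completed.delay(1L, TimeUnit.SECONDS))
            .subscribe(group -> System.out.println("key=" + group.getKey() + ", group instance=" + System.identityHashCode(group)));
    // Keep the main thread alive long enough to observe a few repeat cycles
    Thread.sleep(5000L);
    // The same keys (0 and 1) reappear every second, but as different GroupedObservable instances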

Thanks in advance.

TrogDor :

Interesting question... I could be wrong, but I don't think you can avoid having a Map of Files somewhere in the pipeline.

I think my solution could be further cleaned up, but it seems to accomplish the following:

  • Removes the need for bidirectional mapping
  • Avoids the need to call Map.remove(...)

I'm proposing you treat the Map of Files being written as a distinct Observable, emitting a brand new Map at the slower interval:

    Observable<HashMap<Integer, File>> fileObservable = Observable.fromCallable(
                () -> new HashMap<Integer, File>() )
            .repeatWhen( completed -> completed.delay( LONGER, TimeUnit.SECONDS ));

Then, in your event Observable, you can call .withLatestFrom( fileObservable, ( group, files ) -> {...} ) (note: the following block is still incomplete):

    Observable.just( EVENTS )
        .flatMap( num -> Observable.fromIterable(
                ThreadLocalRandom.current().ints( num ).boxed().collect( Collectors.toList() )))
        .groupBy( num -> Math.abs( num % 2 ))
        .repeatWhen( completed -> completed.delay( SHORTER, TimeUnit.SECONDS ))
        .withLatestFrom( fileObservable, ( group, files ) -> {

            File file = files.computeIfAbsent(
                    group.getKey(),
                    Unchecked.function( key -> File.createTempFile( String.format( "%03d-", key ), ".txt" )));

            group.map( Object::toString ).toList()
                .subscribe( lines -> FileUtils.writeLines(file, StandardCharsets.UTF_8.name(), lines, true ));

            return files;
        } )

So far so good: you're getting your latest set of Files supplied alongside your events. Next you've got to process the Files. I think you can do that using distinctUntilChanged(). It should be pretty efficient, since it calls HashMap.equals(...) under the covers and the Map object's identity isn't changing most of the time; HashMap.equals(...) (inherited from AbstractMap) checks for identity first.
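
Here's a minimal, standalone sketch (with hypothetical values, not part of the pipeline) of that behavior: repeated emissions of the same map instance are suppressed cheaply, and only a genuinely different map passes through:

    HashMap<Integer, File> current = new HashMap<>();
    current.put( 0, new File( "000-example.txt" ));   // hypothetical entry

    Observable.just( current, current, current, new HashMap<Integer, File>() )
            .distinctUntilChanged()
            .subscribe( map -> System.out.println( "emitted a map with " + map.size() + " entries" ));
    // Prints twice: the populated map once, then the empty replacement once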

Since at this point you're really interested in processing the previous set of emitted Files rather than the current, you could use the .scan(( prev, current ) -> {...} ) operator. With that, here's the completed code block from above:

    Observable.just( EVENTS )
        .flatMap( num -> Observable.fromIterable(
                ThreadLocalRandom.current().ints( num ).boxed().collect( Collectors.toList() )))
        .groupBy( num -> Math.abs( num % 2 ))
        .repeatWhen( completed -> completed.delay( SHORTER, TimeUnit.SECONDS ))
        .withLatestFrom( fileObservable, ( group, files ) -> {

            File file = files.computeIfAbsent(
                    group.getKey(),
                    Unchecked.function( key -> File.createTempFile( String.format( "%03d-", key ), ".txt" )));

            group.map( Object::toString ).toList()
                .subscribe( lines -> FileUtils.writeLines(file, StandardCharsets.UTF_8.name(), lines, true ));

            return files;
        } )
        // Suppress repeats of the same map instance; only a changed map passes through
        .distinctUntilChanged()
        .scan(( prev, current ) -> {

            // Process the files collected in the previous (now retired) map
            Observable.fromIterable( prev.entrySet() )
                .map( Entry::getValue )
                .subscribe( file -> System.out.println( "File - '" + file + "', Lines - " +
                                FileUtils.readLines( file, StandardCharsets.UTF_8 )));

            return current;
        } )
        .subscribe();

    Thread.sleep( SLEEP );

A little lengthier than your original solution, but it might solve a couple of issues.
