Java print unicode glitch

Ian Rehwinkel :

I am currently writing a program to read Java class files. At the moment, I am reading the constant pool of the class file and printing it to the console. But when it gets printed, some of the Unicode seems to mess up my terminal, so that it looks like this: [image: messed-up terminal in IntelliJ IDEA]. In case it matters, the class file I'm reading is compiled from Kotlin, and the terminal I am using is the IntelliJ IDEA terminal; the glitch does not occur in the regular Ubuntu terminal. What I noticed is a weird Unicode sequence, which I think might be some kind of escape sequence.

Here is the entire output, with the strange Unicode sequence replaced by the placeholder WEIRD_UNICODE_SEQUENCE:

{1=UTF8: (42)'deerangle/decompiler/main/DecompilerMainKt', 2=Class index: 1, 3=UTF8: (16)'java/lang/Object', 4=Class index: 3, 5=UTF8: (4)'main', 6=UTF8: (22)'([Ljava/lang/String;)V', 7=UTF8: (35)'Lorg/jetbrains/annotations/NotNull;', 8=UTF8: (4)'args', 9=String index: 8, 10=UTF8: (30)'kotlin/jvm/internal/Intrinsics', 11=Class index: 10, 12=UTF8: (23)'checkParameterIsNotNull', 13=UTF8: (39)'(Ljava/lang/Object;Ljava/lang/String;)V', 14=Method name index: 12; Type descriptor index: 13, 15=Bootstrap method attribute index: 11; NameType index: 14, 16=UTF8: (12)'java/io/File', 17=Class index: 16, 18=UTF8: (6)'<init>', 19=UTF8: (21)'(Ljava/lang/String;)V', 20=Method name index: 18; Type descriptor index: 19, 21=Bootstrap method attribute index: 17; NameType index: 20, 22=UTF8: (15)'getAbsolutePath', 23=UTF8: (20)'()Ljava/lang/String;', 24=Method name index: 22; Type descriptor index: 23, 25=Bootstrap method attribute index: 17; NameType index: 24, 26=UTF8: (16)'java/lang/System', 27=Class index: 26, 28=UTF8: (3)'out', 29=UTF8: (21)'Ljava/io/PrintStream;', 30=Method name index: 28; Type descriptor index: 29, 31=Bootstrap method attribute index: 27; NameType index: 30, 32=UTF8: (19)'java/io/PrintStream', 33=Class index: 32, 34=UTF8: (5)'print', 35=UTF8: (21)'(Ljava/lang/Object;)V', 36=Method name index: 34; Type descriptor index: 35, 37=Bootstrap method attribute index: 33; NameType index: 36, 38=UTF8: (19)'[Ljava/lang/String;', 39=Class index: 38, 40=UTF8: (17)'Lkotlin/Metadata;', 41=UTF8: (2)'mv', 42=Int: 1, 43=Int: 11, 44=UTF8: (2)'bv', 45=Int: 0, 46=Int: 2, 47=UTF8: (1)'k', 48=UTF8: (2)'d1', 49=UTF8: (58)'WEIRD_UNICODE_SEQUENCE', 50=UTF8: (2)'d2', 51=UTF8: (0)'', 52=UTF8: (10)'Decompiler', 53=UTF8: (17)'DecompilerMain.kt', 54=UTF8: (4)'Code', 55=UTF8: (18)'LocalVariableTable', 56=UTF8: (15)'LineNumberTable', 57=UTF8: (13)'StackMapTable', 58=UTF8: (36)'RuntimeInvisibleParameterAnnotations', 59=UTF8: (10)'SourceFile', 60=UTF8: (20)'SourceDebugExtension', 61=UTF8: 
(25)'RuntimeVisibleAnnotations'}
AccessFlags: {ACC_PUBLIC, ACC_FINAL, ACC_SUPER}

And here is the Unicode sequence opened in Sublime Text: [image: strange Unicode in Sublime Text]

My questions are: Why does this Unicode break the console in IntelliJ IDEA? Is this common in Kotlin class files? And what can one do to remove all such "escape sequences" from a String before printing it?

Holger :

IntelliJ’s console most likely interprets certain characters of the string as control characters (compare to "Colorize console output in IntelliJ products").

Most likely, it will be an ANSI terminal emulation, which you can verify easily by executing

System.out.println("Hello "
    + "\33[31mc\33[32mo\33[33ml\33[34mo\33[35mr\33[36me\33[37md"
    + " \33[30mtext");

If you see this text printed using different colors, it’s an ANSI terminal compatible interpretation.

But it’s always a good idea to remove control characters when printing strings from an unknown source. The string constants from a class file are not required to have human readable content.

A simple way to do this is

System.out.println(string.replaceAll("\\p{IsControl}", "."));

which will replace all control characters with a dot before printing.

If you want some diagnostic output showing the actual char value, you could use, e.g.

System.out.println(Pattern.compile("\\p{IsControl}").matcher(string)
    .replaceAll(mr -> String.format("{%02X}", (int)string.charAt(mr.start()))));

This requires Java 9, but of course, the same logic can be implemented for earlier Java versions as well; it would only require somewhat more verbose code.
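
For illustration, here is one way the more verbose pre-Java 9 variant might look. It is only a sketch: the class name ControlCharEscaper and the method name escape are made up for this example, and it relies on the classic Matcher.appendReplacement/appendTail loop with a StringBuffer, which is available in older Java versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the original answer.
class ControlCharEscaper {
    private static final Pattern CONTROL = Pattern.compile("\\p{IsControl}");

    // Replaces every control character with its hex value in braces, e.g. ESC becomes {1B}.
    static String escape(String s) {
        Matcher m = CONTROL.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String hex = String.format("{%02X}", (int) s.charAt(m.start()));
            // quoteReplacement keeps '$' and '\' from being treated as replacement syntax
            m.appendReplacement(sb, Matcher.quoteReplacement(hex));
        }
        m.appendTail(sb); // copy the remainder after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: Hello {1B}[31mred{1B}[0m text
        System.out.println(escape("Hello \33[31mred\33[0m text"));
    }
}
```

Note that \p{IsControl} also matches tabs and line breaks, so these get escaped as well; whether that is desirable depends on what you are printing.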

The Pattern instance returned by Pattern.compile("\\p{IsControl}") can be stored and reused.
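
A minimal sketch of what storing and reusing the pattern could look like, assuming a small utility class (ConsoleSanitizer and sanitize are hypothetical names chosen for this example). The pattern is compiled once into a constant and then applied to each string before printing.

```java
import java.util.regex.Pattern;

// Hypothetical utility class; not part of the original answer.
class ConsoleSanitizer {
    // Compiled once; Pattern instances are immutable and can be shared freely.
    private static final Pattern CONTROL = Pattern.compile("\\p{IsControl}");

    // Replaces every control character with a dot.
    static String sanitize(String s) {
        return CONTROL.matcher(s).replaceAll(".");
    }

    public static void main(String[] args) {
        // Sample values standing in for strings read from a constant pool
        String[] constants = { "java/lang/Object", "\33[31mred", "tab\there" };
        for (String c : constants) {
            System.out.println(sanitize(c));
        }
    }
}
```

Unlike String.replaceAll, which compiles the regular expression on each call, this variant compiles it only once, which can matter when many strings are printed.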
