Getting the HashInfo of a BT Torrent

How to obtain the HashInfo of a torrent (.torrent) file

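Whether you are downloading from a torrent or running a service that converts torrents into links, you need the HashInfo, the SHA-1 hash that identifies the torrent.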

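Before that, you need a basic understanding of Bencoding, the encoding format used by torrent files. Search for the details yourself; one recommended article is "BT是怎么下载的" ("How BT downloads work").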

Here is a brief overview of Bencoding.

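String: a string starts with a number followed by a colon. For example, 5:nodes represents the string nodes, where 5 is the length (in bytes) of the string after the colon, so every string can be delimited unambiguously.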

List: a list starts with a lowercase "l" and ends with "e", and may contain values of any of the other types. For example, l13:www.baidu.com15:www.sina.com.cne represents ["www.baidu.com", "www.sina.com.cn"].

Integer: an integer starts with "i" and ends with "e". For example, i12345e represents the integer 12345.

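Dict: a dictionary starts with "d" and ends with "e", and consecutive pairs of elements form key-value pairs. Keys must be strings (and appear in sorted order), while values can be of any type. For example, d3:urll13:www.baidu.com15:www.sina.com.cnee represents a dictionary with a single key-value pair, {"url": ["www.baidu.com", "www.sina.com.cn"]}.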
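To make the four rules concrete, here is a minimal Java encoder sketch. BencodeEncoder is a hypothetical class, not part of the original code or of any library. It assumes a String should be written as UTF-8 (a byte[] is written as-is), and it sorts dictionary keys with a TreeMap, which matches raw byte order only as long as the keys are ASCII, as the usual .torrent keys are.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a bencode encoder for String, byte[], Integer/Long, List and Map<String, ?>.
final class BencodeEncoder {

    static byte[] encode(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, value);
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, Object value) {
        if (value instanceof String) {
            // The length prefix counts bytes, not characters, so convert to UTF-8 first
            writeBytes(out, ((String) value).getBytes(StandardCharsets.UTF_8));
        } else if (value instanceof byte[]) {
            writeBytes(out, (byte[]) value);
        } else if (value instanceof Integer || value instanceof Long) {
            writeAscii(out, "i" + value + "e");            // e.g. i12345e, i-3e
        } else if (value instanceof List) {
            out.write('l');
            for (Object item : (List<?>) value) write(out, item);
            out.write('e');
        } else if (value instanceof Map) {
            out.write('d');
            // Assumption: keys are ASCII, so TreeMap's String order equals raw byte order
            Map<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sorted.put((String) e.getKey(), e.getValue());
            }
            for (Map.Entry<String, Object> e : sorted.entrySet()) {
                write(out, e.getKey());
                write(out, e.getValue());
            }
            out.write('e');
        } else {
            throw new IllegalArgumentException("unsupported type: " + value);
        }
    }

    // Writes <length>:<bytes>
    private static void writeBytes(ByteArrayOutputStream out, byte[] bytes) {
        writeAscii(out, bytes.length + ":");
        out.write(bytes, 0, bytes.length);
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.US_ASCII);
        out.write(b, 0, b.length);
    }
}
```

For example, encode(List.of("www.baidu.com", "www.sina.com.cn")) yields the bytes of l13:www.baidu.com15:www.sina.com.cne from the list example, and encode(Map.of("url", ...)) wraps it as d3:url...e.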

Back to the main topic: how to derive the HashInfo fingerprint from a BT torrent.

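Extract the bytes after the "4:info" field up to (but not including) "5:nodes" and take their SHA-1 as a hex string; in a typical torrent those bytes are exactly the bencoded info dictionary.

I fell into a trap on my first attempt. To keep things simple, I read the whole file into a byte[], converted it to a String, matched it with the regular expression 4:info([\s\S]+)5:nodes, and converted the match back into bytes. I was sure the matched text was correct, but there was an encoding problem I could not explain at the time: even though the byte[]-to-String and String-to-byte[] conversions used the same charset, the SHA-1 of the resulting byte[] was wrong. The likely cause is that the info dictionary contains raw binary data (the pieces field holds SHA-1 hashes), and decoding arbitrary bytes with a charset such as UTF-8 replaces invalid sequences with U+FFFD, so converting back does not restore the original bytes.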
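The round-trip loss is easy to reproduce with the Java sketch below (CharsetRoundTrip is a hypothetical helper). It decodes a few bytes that are not valid UTF-8, re-encodes them with the same charset, and compares the result with the input. UTF-8 changes the bytes, while ISO-8859-1, which maps every byte to exactly one char, returns them unchanged. Under that assumption, the regex approach should work if both conversions use ISO-8859-1, though this is untested against real torrents.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class CharsetRoundTrip {

    // Decodes bytes into a String and re-encodes them with the same charset;
    // returns true only if the original bytes come back unchanged.
    static boolean survivesRoundTrip(byte[] bytes, Charset charset) {
        byte[] back = new String(bytes, charset).getBytes(charset);
        return Arrays.equals(bytes, back);
    }

    public static void main(String[] args) {
        // Arbitrary binary bytes, as found in the pieces field; 0xFF and a lone 0x80 are invalid UTF-8
        byte[] sample = {(byte) 0xFF, 0x3A, (byte) 0x80, 0x65};
        System.out.println(survivesRoundTrip(sample, StandardCharsets.UTF_8));      // false
        System.out.println(survivesRoundTrip(sample, StandardCharsets.ISO_8859_1)); // true
    }
}
```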

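In the end I located 4:info and 5:nodes by comparing bytes directly. It has not failed in my tests so far, but the code clearly has room for improvement. For example, if the torrent has no 5:nodes entry, an exception is thrown; and because dictionary keys are sorted, any other top-level key that sorts between info and nodes would end up inside the extracted bytes and silently produce a wrong hash. The matching itself could also be improved along the lines of the KMP algorithm.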
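The KMP improvement could look roughly like the following Java sketch; BytesKmp and its indexOf method are hypothetical names, not from the original code. It searches a byte[] directly, so no String conversion is involved, and it takes a starting offset so that 5:nodes can be searched for only after the 4:info match.

```java
// Knuth-Morris-Pratt search over raw bytes.
final class BytesKmp {

    // Returns the index of the first occurrence of pattern in text at or after from, or -1.
    static int indexOf(byte[] text, byte[] pattern, int from) {
        if (pattern.length == 0) throw new IllegalArgumentException("empty pattern");
        // failure[k] = length of the longest proper prefix of pattern[0..k] that is also a suffix of it
        int[] failure = new int[pattern.length];
        for (int k = 1, len = 0; k < pattern.length; k++) {
            while (len > 0 && pattern[k] != pattern[len]) len = failure[len - 1];
            if (pattern[k] == pattern[len]) len++;
            failure[k] = len;
        }
        // Single pass over the text: on a mismatch, fall back in the pattern, never in the text
        for (int i = Math.max(from, 0), matched = 0; i < text.length; i++) {
            while (matched > 0 && text[i] != pattern[matched]) matched = failure[matched - 1];
            if (text[i] == pattern[matched]) matched++;
            if (matched == pattern.length) return i - pattern.length + 1;
        }
        return -1;
    }
}
```

With this, start = indexOf(bytes, "4:info" bytes, 0) followed by end = indexOf(bytes, "5:nodes" bytes, start + 6) replaces the two byte-by-byte loops, and a result of -1 signals a missing marker without relying on 0 as a sentinel.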

package bEncoder;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BInfoEncoder {

    // Lookup table for converting to hexadecimal
    private static final char[] HEX_CHAR = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'};

    /**
     * Returns the HashInfo of a file in lowercase; no file-format check or file-size limit is applied
     * @param file  the target file
     * @return  the HashInfo if the file is a .torrent, or an empty string if '4:info' / '5:nodes' cannot be located
     */
    public static String getInfoSHA(File file){
        byte[] info = getInfoBytes(file);
        return info == null ? "" : getByteArraySHA1(info);
    }

    /**
     * Returns the bytes of a torrent file between 4:info and 5:nodes
     * @param file  the target file
     * @return  byte[] the extracted bytes, or null if either marker is missing or the file cannot be read
     */
    private static byte[] getInfoBytes(File file){
        try {
            // Files.readAllBytes closes the file even when reading fails
            byte[] byt = Files.readAllBytes(file.toPath());
            int start = -1;
            int end = -1;
            // First occurrence of 4:info
            for(int i=0 ;i<byt.length && start == -1 ;i++){
                if(checkIsBeginningOfInfo(byt ,i))
                    start = i;
            }
            // Search for 5:nodes only after 4:info, so end can never precede start
            if(start != -1){
                for(int i=start+6 ;i<byt.length && end == -1 ;i++){
                    if(checkIsBeginningOfNodes(byt ,i))
                        end = i;
                }
            }
            if(start == -1 || end == -1)
                throw new IOException("can not find start index of '4:info' or '5:nodes'");
            // Skip the 6 bytes of "4:info" and stop just before "5:nodes"
            int copyLen = end - start - 6;
            byte[] result = new byte[copyLen];
            System.arraycopy(byt ,start+6 ,result , 0 ,copyLen);
            return result;
        }catch (IOException e){
            e.printStackTrace();
            return null;
        }
    }

    /**
     * Neither this method nor the nodes check below is an optimal matching algorithm<br>they could be optimized with a fast string-matching algorithm such as KMP
     * @param byt   the array being checked
     * @param i the index currently being checked
     * @return  whether the bytes starting at this index are 4:info
     */
    private static boolean checkIsBeginningOfInfo(byte[] byt ,int i){
        // Bounds check first so the lookahead never runs past the end of the array
        return i + 6 <= byt.length
                && byt[i] == '4' && byt[i+1] == ':' && byt[i+2] == 'i' && byt[i+3] == 'n' && byt[i+4] == 'f' && byt[i+5] == 'o';
    }

    /**
     * @param byt   the array being checked
     * @param i the index currently being checked
     * @return  whether the bytes starting at this index are 5:nodes
     */
    private static boolean checkIsBeginningOfNodes(byte[] byt ,int i){
        return i + 7 <= byt.length
                && byt[i] == '5' && byt[i+1] == ':' && byt[i+2] == 'n' && byt[i+3] == 'o' && byt[i+4] == 'd' && byt[i+5] == 'e' && byt[i+6] == 's';
    }

    /**
     * Computes the SHA-1 of the input bytes and converts it to hexadecimal
     * <br>With commons-codec, the method body reduces to the commented-out line at the bottom
     * @param input the input byte array
     * @return  the lowercase hexadecimal SHA-1
     */
    private static String getByteArraySHA1(byte[] input) {
        try {
            MessageDigest sha_1 = MessageDigest.getInstance("SHA-1");
            byte[] dig = sha_1.digest(input);
            char[] dig_ch = new char[dig.length*2];
            int index = 0;
            for(byte b : dig){
                int temp = b & 0xFF;                     // treat the byte as unsigned (0..255)
                dig_ch[index++] = HEX_CHAR[temp >>> 4];  // high nibble
                dig_ch[index++] = HEX_CHAR[temp & 0x0F]; // low nibble
            }
            return new String(dig_ch);
        }catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
            return "";
        }
//        return DigestUtils.sha1Hex(input);
    }
}
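The dependence on 5:nodes goes away if the file is parsed instead of searched. The Java sketch below (InfoSpan and its methods are hypothetical names) walks the top-level dictionary using the Bencoding rules described earlier, skips each value by its structure, and returns the exact byte span stored under the info key; the SHA-1 of that span is the HashInfo no matter which keys follow info. It assumes well-formed input and does no validation, so a malformed file surfaces as a runtime exception.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Locates the top-level "info" value by parsing Bencoding structure rather than searching for markers.
final class InfoSpan {

    // Returns the index just past the bencoded value that starts at pos.
    static int skip(byte[] b, int pos) {
        byte c = b[pos];
        if (c == 'i') {                          // integer: i<digits>e
            while (b[pos] != 'e') pos++;
            return pos + 1;
        }
        if (c == 'l' || c == 'd') {              // list or dict: elements until the closing 'e'
            pos++;
            while (b[pos] != 'e') pos = skip(b, pos);
            return pos + 1;
        }
        if (c >= '0' && c <= '9') {              // string: <byte length>:<bytes>
            int colon = pos;
            while (b[colon] != ':') colon++;
            int len = Integer.parseInt(new String(b, pos, colon - pos, StandardCharsets.US_ASCII));
            return colon + 1 + len;
        }
        throw new IllegalArgumentException("unexpected byte at offset " + pos);
    }

    // Returns the raw bytes of the value stored under the top-level "info" key.
    static byte[] infoBytes(byte[] torrent) {
        if (torrent.length == 0 || torrent[0] != 'd') {
            throw new IllegalArgumentException("not a bencoded dictionary");
        }
        int pos = 1;
        while (torrent[pos] != 'e') {
            int keyEnd = skip(torrent, pos);     // keys are always strings
            int colon = pos;
            while (torrent[colon] != ':') colon++;
            String key = new String(torrent, colon + 1, keyEnd - colon - 1, StandardCharsets.ISO_8859_1);
            int valueEnd = skip(torrent, keyEnd);
            if (key.equals("info")) return Arrays.copyOfRange(torrent, keyEnd, valueEnd);
            pos = valueEnd;
        }
        throw new IllegalArgumentException("no info key in the top-level dictionary");
    }

    // Lowercase hex SHA-1 of the given bytes.
    static String sha1Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte x : digest) sb.append(String.format("%02x", x & 0xFF));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1
            throw new IllegalStateException(e);
        }
    }

    static String infoHashHex(byte[] torrent) {
        return sha1Hex(infoBytes(torrent));
    }
}
```

For a torrent in which nodes directly follows info, InfoSpan.infoHashHex(Files.readAllBytes(path)) should return the same 40-character lowercase string as BInfoEncoder.getInfoSHA; for torrents without nodes it still works.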
