Java Crawler Techniques (IP Proxies)

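Web crawling is an important technique. While collecting data, the client IP has to change frequently so that the target site does not block the crawler (for example with a forbidden response). This article shows how to use an API to obtain a proxy IP and fetch data through it.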

The proxy IP in the demo below comes from the provider ipidea; other providers work in much the same way.

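(1) Register an account

Register an account on the provider's site, http://sem.ipidea.net/, and complete verification.

(2) Add your IP to the whitelist as required (the public IP of your own server)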

http://sem.ipidea.net/getapi/

(3) Obtain an IP and port

This step yields one proxy IP and port.
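The value obtained in step (3) still has to be turned into a host and a port before it can be used in code. The sketch below assumes the API returns plain text with one `ip:port` entry per line; that format is an assumption, not something the provider's documentation is quoted on here, and `ProxyListParser` is a hypothetical helper name. It uses only the JDK.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

class ProxyListParser {
    // Assumption: the API body is plain text, one "ip:port" entry per line.
    static List<InetSocketAddress> parse(String body) {
        List<InetSocketAddress> proxies = new ArrayList<>();
        for (String line : body.split("\\R")) {
            String entry = line.trim();
            int colon = entry.lastIndexOf(':');
            if (colon <= 0 || colon == entry.length() - 1) {
                continue; // blank or malformed line
            }
            try {
                int port = Integer.parseInt(entry.substring(colon + 1));
                if (port < 1 || port > 65535) {
                    continue; // not a usable TCP port
                }
                // createUnresolved avoids a DNS lookup at parse time
                proxies.add(InetSocketAddress.createUnresolved(entry.substring(0, colon), port));
            } catch (NumberFormatException e) {
                // port was not a number; skip the entry
            }
        }
        return proxies;
    }

    public static void main(String[] args) {
        for (InetSocketAddress p : parse("58.218.205.47:13706\n")) {
            System.out.println(p.getHostString() + " " + p.getPort());
        }
    }
}
```

Each parsed entry provides `getHostString()` and `getPort()`, which are the two values the demo passes to `new HttpHost(...)`.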

(4) Put the obtained IP and port into the demo and run it.

Maven repository: https://mvnrepository.com/

Required JARs: httpcore5-5.1.jar, httpclient5-5.0.3.jar

package com.game.test;

import org.apache.hc.client5.http.classic.methods.HttpGet;
import org.apache.hc.client5.http.config.RequestConfig;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.CloseableHttpResponse;
import org.apache.hc.client5.http.impl.classic.HttpClientBuilder;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHost;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Created by ipidea on 2021/2/6
 * <p>
 * Dependency: compile 'org.apache.httpcomponents.client5:httpclient5:5.0.3'
 *
 * @see <a href="http://hc.apache.org/httpcomponents-client-5.0.x/httpclient5/dependency-info.html">httpcomponents</a>
 */
class HttpProxy {
    public static void httpProxy() {
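        // httpbin.org/get echoes the request back, including the caller's IP
        HttpGet request = new HttpGet("http://httpbin.org/get");
        // Replace the IP and port with the values obtained in step (3)
        RequestConfig requestConfig = RequestConfig.custom()
                .setProxy(new HttpHost("58.218.205.47", 13706))
                .build();
        request.setConfig(requestConfig);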

        // try-with-resources closes the client and the response even if the request fails
        try (CloseableHttpClient httpClient = HttpClientBuilder.create()
                .disableRedirectHandling()
                .build();
             CloseableHttpResponse response = httpClient.execute(request)) {

            // Print the protocol version, status code and reason phrase
            System.out.println(response.getVersion());
            System.out.println(response.getCode());
            System.out.println(response.getReasonPhrase());

            HttpEntity entity = response.getEntity();
            if (entity != null) {
                // Read the response body as a UTF-8 string
                String result = EntityUtils.toString(entity, StandardCharsets.UTF_8);
                System.out.println(result);
            }

        } catch (ParseException | IOException e) {
            e.printStackTrace();
        }
    }


    public static void main(String[] args) {
        httpProxy();
    }
}
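The demo hard-codes a single proxy, while the introduction notes that the IP should change often. One simple way to do that is round-robin rotation over a pool of proxies. The following is a minimal sketch under that assumption: `ProxyRotator` is a hypothetical helper, the second address in `main` is a placeholder, and the pool is expected to be filled from whatever the provider's API returns.

```java
import java.net.InetSocketAddress;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ProxyRotator {
    private final List<InetSocketAddress> pool;
    private final AtomicInteger cursor = new AtomicInteger();

    ProxyRotator(List<InetSocketAddress> pool) {
        if (pool.isEmpty()) {
            throw new IllegalArgumentException("proxy pool must not be empty");
        }
        this.pool = List.copyOf(pool); // defensive, immutable copy
    }

    // Returns the next proxy in round-robin order; safe to call from several threads.
    InetSocketAddress next() {
        // floorMod keeps the index valid even after the counter overflows
        int index = Math.floorMod(cursor.getAndIncrement(), pool.size());
        return pool.get(index);
    }

    public static void main(String[] args) {
        ProxyRotator rotator = new ProxyRotator(List.of(
                InetSocketAddress.createUnresolved("58.218.205.47", 13706),
                InetSocketAddress.createUnresolved("10.0.0.1", 8080))); // placeholder entry
        for (int i = 0; i < 3; i++) {
            InetSocketAddress proxy = rotator.next();
            System.out.println(proxy.getHostString() + ":" + proxy.getPort());
        }
    }
}
```

In the demo, each request would then build its `HttpHost` from `rotator.next()` instead of the fixed IP and port. Round-robin is only one possible policy; random choice or dropping proxies that fail are equally reasonable.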

The output of the run shows a new proxy IP.
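The IP can also be checked in code instead of by reading the output. httpbin.org/get reports the caller's address in the `origin` field of its JSON response, so a sketch like the one below can pull that field out and compare it with your own public IP. `OriginCheck` and its methods are hypothetical names, the regular expression is a shortcut that stands in for a real JSON parser, and the addresses in `main` are sample values.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginCheck {
    // Matches "origin": "<value>" in the httpbin response body
    private static final Pattern ORIGIN =
            Pattern.compile("\"origin\"\\s*:\\s*\"([^\"]*)\"");

    static Optional<String> extractOrigin(String json) {
        Matcher m = ORIGIN.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // True when an origin is present and does not contain our own public IP.
    static boolean isMasked(String json, String ownPublicIp) {
        return extractOrigin(json)
                .map(origin -> !origin.contains(ownPublicIp))
                .orElse(false);
    }

    public static void main(String[] args) {
        // Sample body; in the demo this would be the `result` string
        String body = "{\"origin\": \"58.218.205.47\", \"url\": \"http://httpbin.org/get\"}";
        System.out.println(extractOrigin(body).orElse("no origin field"));
        System.out.println(isMasked(body, "203.0.113.7"));
    }
}
```

Comparing against your own IP rather than the proxy address is a deliberate choice here, since the address a proxy service exits from is not necessarily the one you connect to.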

Reposted from blog.csdn.net/u011628753/article/details/116026914