Fetching Website Resources with Java

At work you may be asked to scrape data from some website. Two simple tools can handle this: httpclient and Jsoup.
Dependencies:
httpclient:

	<dependency>
		<groupId>org.apache.httpcomponents</groupId>
		<artifactId>httpclient</artifactId>
		<version>4.5.1</version>
	</dependency>

jsoup:

	<dependency>
		<groupId>org.jsoup</groupId>
		<artifactId>jsoup</artifactId>
		<version>1.8.3</version>
	</dependency>

1. Fetching data by sending requests with httpclient

I wrote a small utility class for this. It looks like this:

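import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.NameValuePair;
import org.apache.http.StatusLine;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;

public class HttpClientUtil {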

	/**
	 * GET request.
	 * 
	 * @param url the full URL to request, including the scheme
	 * @return the response body as a string, or null if the request fails
	 */
	public static String doGet(String url) {
		// try-with-resources closes both the client and the response, even on errors
		try (CloseableHttpClient httpClient = HttpClients.createDefault();
				CloseableHttpResponse response = httpClient.execute(new HttpGet(url))) {
			// The request went through and the server answered with 200
			if (response.getStatusLine().getStatusCode() == HttpStatus.SC_OK) {
				// Read the body returned by the server and hand it back as a string
				String result = null;
				HttpEntity entity = response.getEntity();
				if (entity != null) {
					// UTF-8 is only the fallback for responses that declare no charset
					result = EntityUtils.toString(entity, "UTF-8");
				}
				return result;
			}
		} catch (IOException e) {
			e.printStackTrace();
			System.out.println("GET request failed: " + url);
		}
		return null;
	}

	/**
	 * POST request (for parameters in key-value form).
	 * 
	 * @param url    the full URL to request, including the scheme
	 * @param params the form fields to send
	 * @return the response body as a string, or null if the request fails
	 */
	public static String doPost(String url, Map<String, Object> params) {
		// try-with-resources closes the client, the response and the reader, even on errors
		try (CloseableHttpClient httpClient = HttpClients.createDefault()) {
			// Create the HTTP method
			HttpPost request = new HttpPost();
			request.setURI(new URI(url));
			// Set the parameters
			List<NameValuePair> nvps = new ArrayList<NameValuePair>();
			for (Iterator<String> iterator = params.keySet().iterator(); iterator.hasNext();) {
				String name = iterator.next();
				String value = String.valueOf(params.get(name));
				nvps.add(new BasicNameValuePair(name, value));
			}
			request.setEntity(new UrlEncodedFormEntity(nvps, "UTF-8"));
			try (CloseableHttpResponse response = httpClient.execute(request)) {
				int code = response.getStatusLine().getStatusCode();
				if (code != HttpStatus.SC_OK) { // request failed
					System.out.println("Status code: " + code);
					return null;
				}
				// Request succeeded: read the body line by line
				try (BufferedReader in = new BufferedReader(
						new InputStreamReader(response.getEntity().getContent(), "UTF-8"))) {
					StringBuilder sb = new StringBuilder();
					String line;
					String separator = System.getProperty("line.separator");
					while ((line = in.readLine()) != null) {
						sb.append(line).append(separator);
					}
					return sb.toString();
				}
			}
		} catch (Exception e) {
			e.printStackTrace();
			return null;
		}
	}

	/**
	 * POST request (for parameters sent as a JSON body).
	 * 
	 * @param url    the full URL to request, including the scheme
	 * @param params the JSON string to send as the request body
	 * @return the response body as a string, or null if the status is not 200
	 */
	public static String doPost(String url, String params) throws Exception {
		HttpPost httpPost = new HttpPost(url); // create the POST request
		httpPost.setHeader("Accept", "application/json");
		httpPost.setHeader("Content-Type", "application/json");
		StringEntity entity = new StringEntity(params, "UTF-8");
		httpPost.setEntity(entity);
		// try-with-resources closes both the client and the response, even on errors
		try (CloseableHttpClient httpclient = HttpClients.createDefault();
				CloseableHttpResponse response = httpclient.execute(httpPost)) {
			StatusLine status = response.getStatusLine();
			int state = status.getStatusCode();
			if (state == HttpStatus.SC_OK) {
				HttpEntity responseEntity = response.getEntity();
				// UTF-8 is only the fallback for responses that declare no charset
				return responseEntity == null ? null : EntityUtils.toString(responseEntity, "UTF-8");
			} else {
				System.out.println("Request returned: " + state + " (" + url + ")");
			}
		}
		return null;
	}

}

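To use it, call the appropriate method with your arguments. It returns the response body as a String (JSON in this case), and a JSON library can then pull out the fields you need. For example: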

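// The URL needs its scheme; www.baidu.com is only a placeholder for an endpoint that returns JSON
String url = "https://www.baidu.com";
String res = HttpClientUtil.doGet(url);
// doGet returns null when the request fails, so check before parsing
if (res != null) {
	JSONObject data = JSONObject.parseObject(res);
	String value = data.getString("value");
}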
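
To make the key-value doPost variant less of a black box: a form POST carries its parameters in the request body as name=value pairs joined by &, with each name and value URL-encoded. The sketch below rebuilds such a body using only the JDK. FormEncodingSketch and encodeForm are hypothetical names introduced for illustration; this is not httpclient's own implementation, so edge-case encoding may differ from what UrlEncodedFormEntity produces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormEncodingSketch {

    // Hypothetical helper: builds an application/x-www-form-urlencoded body
    // from a map of parameters, in the map's iteration order.
    static String encodeForm(Map<String, Object> params) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, Object> entry : params.entrySet()) {
                if (body.length() > 0) {
                    body.append('&'); // pairs are separated by '&'
                }
                body.append(URLEncoder.encode(entry.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(String.valueOf(entry.getValue()), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is predictable
        Map<String, Object> params = new LinkedHashMap<String, Object>();
        params.put("keyword", "java jsoup");
        params.put("page", 2);
        System.out.println(encodeForm(params)); // prints keyword=java+jsoup&page=2
    }
}
```

Values go through String.valueOf, the same conversion the utility class applies before building its name-value pairs, so non-string values such as numbers are sent as their text form.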

2. Fetching page content with Jsoup

Jsoup can fetch an entire page. Once you have located the resource you want in it, jsoup can pull out the content by tag, id, class, attribute, and so on.

        // Jsoup.connect needs a full URL, including the scheme
        String url = "https://www.baidu.com";
        Document doc = Jsoup.connect(url).get();
        // Look up by id (returns null if no element has that id)
        Element e = doc.getElementById("id");
        // Look up by tag name
        Elements tag = e.getElementsByTag("tagName");
        // Look up by class
        Elements classes = e.getElementsByClass("className");
        // Look up by attribute name
        Elements attribute = e.getElementsByAttribute("key");
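
Both Jsoup.connect and the httpclient calls expect an absolute URL with its scheme, so a bare host name such as www.baidu.com is rejected before any request is sent. One way to guard against that, as a minimal sketch: UrlSchemeSketch and withScheme are hypothetical names, and defaulting to https is an assumption that will not suit every site.

```java
import java.util.Locale;

class UrlSchemeSketch {

    // Hypothetical helper: returns the URL unchanged if it already starts with
    // http:// or https://, otherwise prepends https:// (an assumed default).
    static String withScheme(String url) {
        String trimmed = url.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return trimmed;
        }
        return "https://" + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(withScheme("www.baidu.com"));        // prints https://www.baidu.com
        System.out.println(withScheme("http://www.baidu.com")); // prints http://www.baidu.com
    }
}
```

The result can then be passed to Jsoup.connect or to the doGet and doPost methods of the utility class.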

For more on using jsoup, see:
http://www.open-open.com/jsoup/

Reposted from blog.csdn.net/coolwindd/article/details/84306607