Using Watij with JRuby to Scrape the NetEase Open Course Download List

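NetEase Open Courses hosts a lot of translated open course material, but my home connection is cable broadband and online playback keeps stuttering, so I wanted to download the courses to local disk. NetEase does offer downloads, but a course has 20-odd lectures and clicking through them one at a time by hand is too tedious. So I wondered whether JRuby could grab all the download URLs for a course in one batch.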

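I originally assumed that open-uri plus hpricot for parsing the HTML would be enough to collect the URLs in bulk. Once I looked at the download page, though, it turned out that the download addresses are written into the page dynamically by JavaScript, so they cannot be obtained by parsing the HTML. After some searching I found Watij, a browser automation testing tool that runs under JRuby.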
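To illustrate why static parsing comes up empty, here is a minimal sketch in plain Ruby. The markup in `STATIC_HTML` and the helper `static_hrefs` are invented for illustration; they are not taken from the real download page. The point is only that links built by a script do not exist as `<a>` tags in the source the server sends.

```ruby
# Hypothetical page source: invented for illustration, not the real page.
STATIC_HTML = <<~HTML
  <div id="download">
    <ul id="list"></ul>
  </div>
  <script>
    var url = "http://example.com/lesson1.mp4";
    document.getElementById("list").innerHTML =
      "<li><a href='" + url + "'>Lesson 1</a></li>";
  </script>
HTML

# Collect the href of every <a> tag present in the raw markup.
def static_hrefs(html)
  # Drop script bodies first: their content is JavaScript source, not DOM nodes.
  markup = html.gsub(%r{<script\b.*?</script>}mi, "")
  markup.scan(/<a\b[^>]*\bhref\s*=\s*["']([^"']+)["']/i).flatten
end

# The list is empty in the static source; the link only appears after a
# browser has executed the script.
puts static_hrefs(STATIC_HTML).inspect
```

A real browser, driven by a tool such as Watij, runs the script first and then exposes the resulting DOM, which is why the links become visible there.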

What Watij does:

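Watij does the same job as Watir, the well-known Ruby tool. It is written in Java and ships with a JRuby interface that is very pleasant to use. It can drive IE and Firefox, so you can automate a lot of browser work with it, and grabbing download URLs is a trivial case.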

Watij website: http://watij.com/webspec-api/ (covers the basic interfaces and methods).

The code I used for the open course downloads (these are throwaway scripts, for reference only):

1. Get the list of download URLs for the course:

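# Download page of the course
classurl="http://download.v.163.com/dl/open/00DL0QDR0QDS0QeB.html"

# Turn off debug output and run quietly
WebSpec.debug false
WebSpec.silent_mode true

# Open the page in IE, then pause so the page's JavaScript can write the links
spec = WebSpec.new.ie
spec.open classurl
spec.pause(1000)
File.open("link.html","w") do |f|
  # Links inside the first <li> of each list under #download
  tag=spec.jquery("#download ul li:first-child a")
  # Write one URL per line
  0.upto(tag.all.length-1) do |i|
    f.puts(tag.at(i).get("href"))
  end
end
puts "finished"
spec.closeAll()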
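Once link.html exists, the collected lines still have to be turned into something a downloader can work through. The following is a rough sketch of that step, not part of the original scripts: the helper name `download_plan` and the sample URLs are made up, and it assumes each line of the file holds exactly one URL.

```ruby
require "uri"

# Turn the raw lines written by the scraper into a clean, de-duplicated list
# of [url, local_file_name] pairs.
def download_plan(lines)
  urls = lines.map(&:strip).reject(&:empty?).uniq
  urls.map { |url| [url, File.basename(URI.parse(url).path)] }
end

# Hypothetical sample input; real lines would come from
# File.readlines("link.html").
sample = [
  "http://example.com/video/lesson01.mp4\n",
  "\n",
  "http://example.com/video/lesson01.mp4\n",
  "http://example.com/video/lesson02.mp4\n"
]

download_plan(sample).each { |url, name| puts "#{name} <- #{url}" }
```

The resulting pairs can be handed to whatever download tool is at hand; the sketch deliberately stops short of the network transfer itself.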

2. Get the description text of each lecture:

# Download page of the course
classurl="http://download.v.163.com/dl/open/00DL0QDR0QDS0QeB.html"

# Turn off debug output and run quietly
WebSpec.debug false
WebSpec.silent_mode true

spec = WebSpec.new.ie
spec.open classurl
spec.pause(1000)
File.open("health_library.txt","w") do |f|
  # The page heading holds the course title
  classnametag=spec.jquery("#h1title")
  classname=classnametag.innerText
  # Keep only the text between 《 and 》 when the heading has that form
  if classname=~/《(.+?)》/
    classname=$1
  end
  f.puts("Course Name: "+classname)
  # Each .k1 element under #download holds one lecture description
  tag=spec.jquery("#download .k1")
  0.upto(tag.all.length-1) do |i|
    f.puts(tag.at(i).innerText)
  end
end
puts "finished"
spec.closeAll()
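The title clean-up in the script above can be tried on its own, without a browser. This is a small sketch of the same regular expression wrapped in a helper; the name `course_name` and the sample headings are invented for illustration.

```ruby
# Return the text between the first pair of 《 》 brackets, or the original
# string unchanged when no such brackets are present.
def course_name(title)
  match = title.match(/《(.+?)》/)
  match ? match[1] : title
end

# Hypothetical headings, not taken from the real page.
puts course_name("示例大学公开课：《示例课程》")
puts course_name("Plain heading without brackets")
```

The non-greedy `.+?` stops at the first closing bracket, so a heading that contains more than one bracketed title yields only the first one.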

OK, that's it.
