1. Principle
- Read the wordlist (dictionary) file and concatenate each entry onto the target URL
- Send an HTTP GET request to each resulting URL
- Check the response status code and print the directories that exist (a minimal sketch follows this list)
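A minimal sketch of these three steps, assuming a wordlist file named dir.txt and a target of http://127.0.0.1/ (both are placeholders); the full version is built up step by step in the sections below.

import requests

with open("dir.txt", "r") as f:
    for line in f:
        path = line.strip()
        r = requests.get("http://127.0.0.1/" + path)
        if r.status_code == 200:   # 200 OK -> the path exists
            print(r.url)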
2. Reading the dictionary file
with open("filename.txt","r") as f:
#with ... as ... closes the file stream automatically; with f = open("filename.txt","r")
#you have to call close() yourself.
#On why I/O streams must be closed: https://www.zhihu.com/question/46263042

f.readline()   #reads one line at a time
f.readlines()  #reads all remaining lines at once, as a list
f.read(X)      #reads X characters
Code:
f = open("dir.txt","r")
for line in f.readlines():
print(line.strip())
#strip()函数是去除字符串开头或者结尾的字符(默认为空格或者换行符)
f.close()
Supplement 1 (file reading test):
f = open("dir.txt","r")
line1 = f.readline()
print(line1)
line2 = f.read(2)
print(line2)
line = f.readlines()
print(line)
f.close()
Supplement 2 (file writing test):
Overwrite write (mode "w"):
f1 = open("dir.txt","w")
f1.write('ddd')
f1.close()
f2 = open("dir.txt")   #with no mode given, the default is read
for line in f2.readlines():
    print(line.strip())
f2.close()
Append write (mode "a"); since "w" above truncated the file to just ddd, appending eee here leaves dddeee:
f1 = open("dir.txt","a")
f1.write('eee')
f1.close()
f2 = open("dir.txt")   #with no mode given, the default is read
for line in f2.readlines():
    print(line.strip())
f2.close()
Supplement 3 (the with ... as ... structure):
with open("dir.txt","r") as f:
for line in f.readlines():
print(line.strip())
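As a side note (not part of the original), a file object can also be iterated directly, which reads one line at a time instead of loading the whole file into memory:

with open("dir.txt","r") as f:
    for line in f:   #iterating the file object yields one line per loop
        print(line.strip())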
3. Writing the tool:
Approach:
- Read the wordlist contents
- Send HTTP GET requests
- Refine the parameters
Resulting code:
import requests
url = "http://127.0.0.1/"   #do not forget the trailing "/"
with open("dir.txt","r") as f:
    for line in f.readlines():
        line = line.strip()
        r = requests.get(url+line)   #can also be written as r = requests.get(url = (url+line))
        if r.status_code == 200:
            print("url: " + r.url + " exists")
4. Improving the tool
Parameter improvement: taking the URL from the command line:
At this point directory scanning already works, but the target URL is hard-coded into the script.
A proper directory scanner should take the URL supplied by the user and scan that.
import requests
import sys
url = sys.argv[1]   #read the argument passed on the command line; for details see: https://www.cnblogs.com/liangmingshen/p/8906148.html
with open("dir.txt","r") as f:
    for line in f.readlines():
        line = line.strip()
        r = requests.get(url+"/"+line)   #note the newly added "/"
        if r.status_code == 200:
            print("url:" + r.url + " exists")
Fixing the parameters in PyCharm: the arguments can be entered in the run configuration (Run > Edit Configurations > Parameters), so the script can be launched from the IDE without retyping them.
import requests
import sys
url = sys.argv[1]   #first argument: the target URL
dic = sys.argv[2]   #second argument: path to the wordlist file
with open(dic,"r") as f:
    for line in f.readlines():
        line = line.strip()
        r = requests.get(url+"/"+line)   #note the newly added "/"
        if r.status_code == 200:
            print("url:" + r.url + " exists")
Custom User-Agent
Simply reuse what was covered earlier.
import requests
url = "https://www.baidu.com"
headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0'}
r = requests.get(url,headers=headers)
print(r.request.headers)
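Putting the pieces together, the custom User-Agent can also be sent with every request in the scanning loop; a sketch combining the command-line script with the header above (the UA string is the same example value):

import requests
import sys

url = sys.argv[1]
dic = sys.argv[2]
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:86.0) Gecko/20100101 Firefox/86.0'}
with open(dic,"r") as f:
    for line in f.readlines():
        line = line.strip()
        r = requests.get(url+"/"+line, headers=headers)   #send the custom User-Agent with every request
        if r.status_code == 200:
            print("url:" + r.url + " exists")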