Web page source code
Open the web page and press the shortcut [Ctrl + U] to view its source.
HTML
HTML defines the structure of a web page; it is the framework on which the whole site is built. Anything enclosed in "<" and ">" is an HTML tag, and most tags come in pairs.
Common tags are as follows:
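<html>..</html> marks everything between the tags as a web page
<body>..</body> holds the content visible to the user
<div>..</div> defines a block-level container (a frame for layout)
<p>..</p> defines a paragraph
<li>..</li> defines a list item
<img> embeds an image (a void element, so it has no closing tag)
<h1>..</h1> defines a heading
<a href="">..</a> defines a hyperlink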
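To see how a program reads these tags, here is a minimal sketch using Python's standard html.parser module. The class name TagLogger and the sample snippet are made up for illustration; the point is only that paired tags produce both an opening and a closing event, while <img> produces an opening event alone.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every opening and closing tag the parser encounters."""

    def __init__(self):
        super().__init__()
        self.opened = []  # tag names seen as <tag ...>
        self.closed = []  # tag names seen as </tag>

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


# Illustrative snippet, not taken from a real site.
snippet = "<div><h1>Title</h1><p>Text</p><img src='a.jpeg' alt='x'></div>"

logger = TagLogger()
logger.feed(snippet)
logger.close()

print(logger.opened)  # ['div', 'h1', 'p', 'img']
print(logger.closed)  # ['h1', 'p', 'div']
```

The img tag appears only in the first list because it is never closed.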
HTML example
A local hyperlink can use either a relative path or an absolute path.
An image address can likewise be a relative path or an absolute path.
<html>
<head>
<title>This is the title of the HTML test page</title>
</head>
<body>
<div>
<h1>This is a heading</h1>
<p>This is body text</p>
</div>
<div>
<ul>
<li>This is a list item</li>
<li><a href='https://www.dytt8.net/index0.html'>This is a link to a website</a></li>
<li><a href='1.html'>This is a local link</a></li>
<li>Below is an image</li>
<li><img src="20120830173930_PBfJE.jpeg" alt="Shown if the image cannot be displayed" /></li>
</ul>
</div>
</body>
</html>
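Type the code into Notepad, save it, and rename the file (including its extension) to "HTML.html". Opening the file in a browser shows the rendered page: a heading, a paragraph, and a list containing two links and an image.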
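A crawler that reads this page has to turn the relative paths into full addresses before it can fetch them. The sketch below uses urljoin from Python's standard library; the base address https://example.com/pages/HTML.html is a hypothetical location for the test page, chosen only for illustration.

```python
from urllib.parse import urljoin

# Hypothetical address at which the example page is assumed to be served.
base_url = "https://example.com/pages/HTML.html"

# The href and src values that appear in the HTML example.
links = [
    "https://www.dytt8.net/index0.html",  # absolute: already complete
    "1.html",                             # relative: resolved against base_url
    "20120830173930_PBfJE.jpeg",          # relative image path
]

resolved = [urljoin(base_url, link) for link in links]

for original, full in zip(links, resolved):
    print(original, "->", full)
```

The absolute link comes back unchanged, while the two relative paths are resolved into the same directory as the page itself.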
The legality of web crawlers
Most sites publish a file called robots.txt, although some sites do not set one. For a site without robots.txt, a web crawler can fetch any data that is not protected by a password, which means that all of the site's page data can be crawled. If the site does have a robots.txt file, you must check whether it prohibits visitors from fetching the data you want.
A site may allow certain crawlers to access some of its paths, while crawlers that have not been given permission are banned from crawling entirely.
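Such a check can be sketched with Python's standard urllib.robotparser module. The robots.txt rules, the crawler names GoodBot and OtherBot, and the example.com addresses below are all invented for illustration; a real crawler would load the site's actual file instead (for example with set_url and read).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: GoodBot may crawl /article, everyone else is banned.
robots_txt = """\
User-agent: GoodBot
Allow: /article
Disallow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())  # parse the text directly, no network access

# An allowed crawler on an allowed path.
good_article = parser.can_fetch("GoodBot", "https://example.com/article/1")
# The same crawler on a path that is not allowed.
good_product = parser.can_fetch("GoodBot", "https://example.com/product/1")
# A crawler that has no permission at all.
other_article = parser.can_fetch("OtherBot", "https://example.com/article/1")

print(good_article)   # True
print(good_product)   # False
print(other_article)  # False
```

Calling can_fetch before every request is a simple way for a crawler to respect the paths a site has closed off.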