Python Beginner Tutorial Notes (with a Web Scraper Example and Source Code Walkthrough)

Enterprise Development 2025-04-08 12:57:39

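This is a broad Python tutorial for beginners and for readers who already have some programming experience. Python is a high-level programming language known for its concise, readable syntax, and it is widely used in web development, data analysis, artificial intelligence, and many other fields. Additional reference material: a guide to setting up a Python development environment on Windows (with example source code and a walkthrough), a simple server/client download, a simple python-flask database server with a Vue web client download, Learn Python the Hard Way (two volumes), Python in One Hour, Python in Ten Minutes, A Concise Python Tutorial, byte-of-python, a collection of 100+ classic Python code examples, Python Programming: From Beginner to Mastery, a collection of Python written-test questions, and Python interview questions.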

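1. Introduction

Definition: Python is an interpreted, object-oriented, dynamically typed high-level programming language.
Uses:
- Web development (for example, the Django and Flask frameworks).
- Data analysis (for example, the Pandas and NumPy libraries).
- Machine learning and artificial intelligence (for example, the TensorFlow and PyTorch libraries).
- Automation scripts.
- Scientific computing.
Features:
- Clean, concise syntax.
- A rich standard library and a large ecosystem of third-party libraries.
- Cross-platform support (Windows, Linux, macOS).
- Strong community support.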

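2. Installing Python

Installing on Windows

Download the latest Python installer from the official Python website.
Run the downloaded .exe file and follow the prompts.
Check the "Add Python to PATH" option so that the environment variable is configured automatically.

Installing on macOS

Install Python with Homebrew: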
```
brew install python
```

Installing on Linux

Install Python with your package manager (the commands below are for Debian/Ubuntu-based distributions):

sudo apt-get update
sudo apt-get install python3

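3. Your First Python Program

Creating a project directory

Create a new directory for your Python project, for example myproject.

Writing the first program

In the myproject directory, create a file named hello.py.
Edit hello.py and add the following: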
```
print("Hello, World!")
```

Running the program

Open a terminal or command prompt and change to the myproject folder.
Run the program with the following command:
```
python hello.py
```
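You should see the output Hello, World!. On macOS and Linux the interpreter is often installed as python3, in which case run python3 hello.py instead.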

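4. Basic Python Syntax

Comments

Single-line comments start with #.
Python has no dedicated multi-line comment syntax. A triple-quoted string (''' ... ''' or """ ... """) that is not assigned to anything is often used for the same purpose; it is an ordinary string literal that the interpreter evaluates and discards.

# This is a single-line comment
"""
This is a multi-line comment
It can span multiple lines
"""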

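Variables

Variables do not need an explicit type declaration.
Python is dynamically typed: a variable's type is determined by the value currently bound to it.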

a = 42
b = 3.14
c = True
d = "Hello, World!"
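As a minimal sketch of what dynamic typing means in practice, the snippet below rebinds one name to values of different types. The variable name value is an illustrative assumption, not part of the tutorial's own code.

```python
# "value" is a hypothetical name used only for this illustration.
value = 42
print(type(value).__name__)   # int

value = "forty-two"           # rebinding the same name to a str is allowed
print(type(value).__name__)   # str

# Types are still checked at run time: adding a str and an int raises TypeError.
try:
    result = value + 1
except TypeError as exc:
    result = None
    print(f"TypeError: {exc}")
```

The name can be rebound freely, but each value keeps its own type, so mixing incompatible types still fails when the operation runs.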

Data types

Basic types: int, float, bool, str.
Compound types: list, tuple, dict, set.

my_int = 42
my_float = 3.14
my_bool = True
my_str = "Hello, World!"

my_list = [1, 2, 3]
my_tuple = (1, 2, 3)
my_dict = {"one": 1, "two": 2}
my_set = {1, 2, 3}
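The following sketch shows one way each compound type might be accessed. The variable names on the left of each assignment are illustrative assumptions.

```python
my_list = [1, 2, 3]
my_tuple = (1, 2, 3)
my_dict = {"one": 1, "two": 2}
my_set = {1, 2, 3}

first_item = my_list[0]    # lists and tuples are indexed by position
last_item = my_tuple[-1]   # negative indexes count from the end
two = my_dict["two"]       # dicts are indexed by key
my_set.add(3)              # adding an element that already exists changes nothing

print(first_item, last_item, two, len(my_set))  # 1 3 2 3
```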

Strings

Strings are written with single quotes ' ' or double quotes " ".
Triple quotes (''' ... ''' or """ ... """) create multi-line strings.

s1 = 'Hello, World!'
s2 = "This is a string."
s3 = '''This is a
multi-line string.'''

Lists and tuples

Lists are mutable.
Tuples are immutable.

# List
my_list = [1, 2, 3]
print(my_list)  # Output: [1, 2, 3]

# Tuple
my_tuple = (1, 2, 3)
print(my_tuple)  # Output: (1, 2, 3)

# Append an element to the list
my_list.append(4)
print(my_list)  # Output: [1, 2, 3, 4]
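To make the immutability of tuples concrete, here is a small sketch of what happens on an attempted item assignment. The names point, mutated, and new_point are hypothetical.

```python
point = (1, 2, 3)  # hypothetical tuple used for illustration

try:
    point[0] = 99          # tuples do not support item assignment
    mutated = True
except TypeError as exc:
    mutated = False
    print(f"TypeError: {exc}")

# To "change" a tuple, build a new one instead.
new_point = (99,) + point[1:]
print(new_point)  # (99, 2, 3)
```

The original tuple is left untouched; only the new tuple carries the changed value.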

Control structures

Conditional statements

The if...elif...else statement

age = 18

if age >= 18:
    print("You are an adult.")
elif age >= 13:
    print("You are a teenager.")
else:
    print("You are a child.")

Loops

for loop

for i in range(5):
    print(i)  # Prints 0 to 4, one number per line

while loop

i = 0
while i < 5:
    print(i)  # Prints 0 to 4, one number per line
    i += 1

Iterating over a list

my_list = [1, 2, 3]
for item in my_list:
    print(item)  # Prints 1, 2, 3, one item per line

5. Functions

Defining a function

Use the def keyword to define a function.

def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))  # Output: Hello, Alice!

Default parameters

A parameter can have a default value, which is used when the caller omits that argument.

def greet(name, greeting="Hello"):
    return f"{greeting}, {name}!"

print(greet("Alice"))            # Output: Hello, Alice!
print(greet("Bob", "Hi there"))  # Output: Hi there, Bob!

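Variable-length arguments

Use *args to accept any number of positional arguments.
Use **kwargs to accept any number of keyword arguments.

# Named sum_all rather than sum so that the built-in sum() is not shadowed
def sum_all(*args):
    total = 0
    for num in args:
        total += num
    return total

print(sum_all(1, 2, 3, 4))  # Output: 10

def display_info(**kwargs):
    for key, value in kwargs.items():
        print(f"{key}: {value}")

display_info(name="Alice", age=30)  # Prints "name: Alice" and "age: 30" on separate lines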
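The two forms can also be combined in one signature. The sketch below assumes a hypothetical helper, describe_order, with one required parameter followed by *items and **options; none of these names come from the tutorial.

```python
def describe_order(customer, *items, **options):
    """Hypothetical helper: one required argument, then *args and **kwargs."""
    # items is a tuple holding the extra positional arguments
    # options is a dict holding the extra keyword arguments
    summary = f"{customer} ordered {len(items)} item(s)"
    for key, value in options.items():
        summary += f", {key}={value}"
    return summary

print(describe_order("Alice", "tea", "cake", gift_wrap=True))
# Alice ordered 2 item(s), gift_wrap=True
```

Positional arguments beyond the first land in items, and any keyword arguments land in options, so the same function accepts calls of different shapes.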

6. Classes and Objects

Defining a class

Use the class keyword to define a class.

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def say_hello(self):
        return f"Hello, my name is {self.name} and I am {self.age} years old."

p = Person("Alice", 30)
print(p.say_hello())  # Output: Hello, my name is Alice and I am 30 years old.

Inheritance

To define a subclass, put the parent class in parentheses after the subclass name, and use super() to call the parent class's methods.

class Student(Person):
    def __init__(self, name, age, grade):
        super().__init__(name, age)
        self.grade = grade

    def say_hello(self):
        return f"Hello, I'm a student named {self.name} and I am {self.age} years old, in grade {self.grade}."

s = Student("Bob", 20, "A")
print(s.say_hello())  # Output: Hello, I'm a student named Bob and I am 20 years old, in grade A.
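A subclass can also extend a parent method instead of replacing it outright. The sketch below is self-contained and uses a simplified Person plus a hypothetical Teacher subclass, neither of which is the tutorial's exact code.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def say_hello(self):
        return f"Hello, my name is {self.name}."


class Teacher(Person):  # hypothetical subclass, for illustration only
    def __init__(self, name, age, subject):
        super().__init__(name, age)
        self.subject = subject

    def say_hello(self):
        # Reuse the parent's greeting and append to it
        return super().say_hello() + f" I teach {self.subject}."


t = Teacher("Carol", 41, "math")
print(t.say_hello())          # Hello, my name is Carol. I teach math.
print(isinstance(t, Person))  # True
```

Because Teacher inherits from Person, an instance passes isinstance checks for both classes.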

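7. File Operations

Reading a file

Open the file with the open function and read its contents with the read method. The with statement closes the file automatically when the block ends.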

with open("example.txt", "r") as file:
    content = file.read()
    print(content)

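Writing a file

Open the file in write mode ("w") with the open function and write to it with the write method. Write mode creates the file if it does not exist and overwrites it if it does.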

with open("example.txt", "w") as file:
    file.write("This is some text.")

Appending to a file

Open the file in append mode ("a") with the open function and add content with the write method.

with open("example.txt", "a") as file:
    file.write("\nThis is additional text.")
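The three modes can be tried together in one round trip. This sketch works in a temporary directory so that no existing file is touched; the explicit encoding="utf-8" argument is an addition not present in the snippets above.

```python
import os
import tempfile

# Work in a temporary directory so no real files are modified.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")

    with open(path, "w", encoding="utf-8") as file:  # "w" creates or truncates
        file.write("This is some text.")

    with open(path, "a", encoding="utf-8") as file:  # "a" appends to the end
        file.write("\nThis is additional text.")

    with open(path, "r", encoding="utf-8") as file:  # "r" reads the result back
        lines = file.read().splitlines()

print(lines)  # ['This is some text.', 'This is additional text.']
```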

8. Exception Handling

Catching exceptions

Use a try...except statement to catch and handle exceptions.

try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero.")

Multiple exceptions

Use several except clauses to handle different exception types.

try:
    result = 10 / 0
except ZeroDivisionError:
    print("Cannot divide by zero.")
except TypeError:
    print("Invalid data type.")
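In the snippet above only the ZeroDivisionError branch ever runs. The sketch below wraps the same pattern in a hypothetical helper, safe_divide, so that each except clause can be exercised in turn.

```python
def safe_divide(a, b):
    """Hypothetical helper that reports which except clause handled the error."""
    try:
        return a / b
    except ZeroDivisionError:
        return "Cannot divide by zero."
    except TypeError:
        return "Invalid data type."

print(safe_divide(10, 2))    # 5.0
print(safe_divide(10, 0))    # Cannot divide by zero.
print(safe_divide(10, "2"))  # Invalid data type.
```

Python checks the except clauses from top to bottom and runs the first one whose exception type matches.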

Raising exceptions

Use the raise keyword to raise an exception.

def divide(a, b):
    if b == 0:
        raise ValueError("Cannot divide by zero.")
    return a / b

try:
    result = divide(10, 0)
except ValueError as e:
    print(e)  # Output: Cannot divide by zero.

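9. The Standard Library

Math

Use the math module for mathematical operations.

import math

print(math.sqrt(16))  # Output: 4.0
print(math.pi)        # Output: 3.141592653589793

Dates and times

Use the datetime module to work with dates and times.

from datetime import datetime

now = datetime.now()
print(now)  # Prints the current local date and time

Random numbers

Use the random module to generate random numbers.

import random

print(random.randint(1, 10))  # Prints a random integer from 1 to 10, inclusive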
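As a slightly larger sketch of the same three modules, the snippet below formats a date, adds a time interval, and shows that seeding makes random output repeatable. The fixed timestamp and the seed value 42 are arbitrary example values.

```python
import math
import random
from datetime import datetime, timedelta

# A fixed timestamp (an arbitrary example value) keeps the output reproducible.
start = datetime(2025, 4, 8, 12, 57, 39)
deadline = start + timedelta(days=7)
print(deadline.strftime("%Y-%m-%d %H:%M"))  # 2025-04-15 12:57

# Seeding the generator makes the "random" sequence repeatable.
random.seed(42)
first_run = [random.randint(1, 10) for _ in range(3)]
random.seed(42)
second_run = [random.randint(1, 10) for _ in range(3)]
print(first_run == second_run)              # True

print(math.floor(3.7), math.ceil(3.2))      # 3 4
```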

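10. Third-Party Libraries

Installing a third-party library

Use the pip tool to install third-party libraries.

pip install numpy

Using a third-party library

Import the library and use it.

import numpy as np

arr = np.array([1, 2, 3])
print(arr)  # Output: [1 2 3]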

11. Virtual Environments

Creating a virtual environment

Use the venv module to create a virtual environment.

python -m venv myenv

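Activating a virtual environment

Activate the virtual environment on Windows:
```
myenv\Scripts\activate
```
Activate the virtual environment on macOS and Linux:
```
source myenv/bin/activate
```

Leaving a virtual environment

Use the deactivate command to leave the virtual environment.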

deactivate
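To confirm from inside Python whether a virtual environment is active, one common check compares sys.prefix with sys.base_prefix. The function name in_virtual_environment is a hypothetical choice for this sketch.

```python
import sys

def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix

print(in_virtual_environment())
print(sys.prefix)
```

Running this before and after activation should print False and then True, with sys.prefix switching to the myenv directory.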

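12. A Python Web Scraper Example with Walkthrough

Below is a simple Python scraper. It uses the requests library to fetch a page and the BeautifulSoup library to parse the HTML, then prints the data it extracts. Suppose we want to scrape the latest story titles and links from a news site such as https://news.ycombinator.com/.

12.1. Installing the required libraries

First install the requests and beautifulsoup4 libraries. You can install both with pip: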

pip install requests beautifulsoup4

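12.2. Writing the scraper

Create a new Python file, for example scraper.py, and add the following code:

import requests
from bs4 import BeautifulSoup

# Target URL
url = 'https://news.ycombinator.com/'

# Send the HTTP request
response = requests.get(url)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find all story entries. The class name is titleline (not titlelink) in the
    # site's markup at the time of writing; verify it in your browser's developer tools.
    news_items = soup.find_all('span', class_='titleline')

    # Print each story's title and link
    for item in news_items:
        anchor = item.find('a')
        title = anchor.get_text()
        link = anchor['href']
        print(f'Title: {title}')
        print(f'Link: {link}\n')
else:
    print(f'Failed to retrieve the webpage. Status code: {response.status_code}')

12.3. Code walkthrough

Importing the libraries

import requests
from bs4 import BeautifulSoup

requests sends HTTP requests.
BeautifulSoup parses HTML documents.

Setting the target URL

url = 'https://news.ycombinator.com/'

This is the URL of the site to scrape.

Sending the HTTP request

response = requests.get(url)

requests.get sends a GET request to the given URL and returns the response.

Checking whether the request succeeded

if response.status_code == 200:

A status code of 200 means the request succeeded.

Parsing the HTML content

soup = BeautifulSoup(response.text, 'html.parser')

BeautifulSoup parses the HTML document. response.text is the response body, and 'html.parser' selects the parser to use.

Finding the story entries

news_items = soup.find_all('span', class_='titleline')

find_all returns every <span> tag with the class titleline. On this page, each such span wraps the <a> tag that holds a story's title and link. Class names like this depend on the site's current markup and can change, so check them in your browser's developer tools if the list comes back empty.

Printing the titles and links

for item in news_items:
    anchor = item.find('a')
    title = anchor.get_text()
    link = anchor['href']
    print(f'Title: {title}')
    print(f'Link: {link}\n')

Loop over the story entries that were found.
find('a') returns the first <a> tag inside the span.
get_text() returns that tag's text content (the story title).
anchor['href'] reads the href attribute of the <a> tag (the story link).
Print the title and the link.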

12.4. Running the scraper

Run the scraper script from a terminal or command prompt:

python scraper.py

You should see output similar to the following:

Title: Example News Title 1
Link: https://example.com/news1

Title: Example News Title 2
Link: https://example.com/news2

...

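12.5. Notes

Legality: make sure you are allowed to scrape the site's data. Check the site's robots.txt file and terms of use.
Rate limiting: do not send requests to the same site too frequently, so that you do not put load on its server. You can use time.sleep to space out requests.
Error handling: add more error-handling logic to cope with network problems and other exceptional conditions.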
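The rate-limiting and error-handling notes above can be sketched as follows. This is one possible approach under stated assumptions, not part of the original scraper: fetch_all, flaky_fetch, delay_seconds, and max_retries are all hypothetical names, and the fetch function is passed in as a parameter so the example runs offline with a stand-in.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0, max_retries=2):
    """Fetch each URL politely: pause between requests and retry on network errors.

    fetch is any callable that takes a URL and returns a response,
    for example requests.get. All names here are hypothetical.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)          # rate limiting between URLs
        for attempt in range(max_retries + 1):
            try:
                results[url] = fetch(url)
                break
            except OSError as exc:             # requests' exceptions derive from OSError
                print(f"Attempt {attempt + 1} for {url} failed: {exc}")
                if attempt == max_retries:
                    results[url] = None        # give up on this URL
                else:
                    time.sleep(delay_seconds)  # wait before retrying
    return results


# Offline demonstration with a stand-in fetch function that fails once.
calls = []

def flaky_fetch(url):
    calls.append(url)
    if len(calls) == 1:
        raise ConnectionError("simulated network failure")
    return f"<html>{url}</html>"

pages = fetch_all(["https://example.com/a", "https://example.com/b"],
                  flaky_fetch, delay_seconds=0)
print(pages)
```

With the real library, the call might look like fetch_all(urls, lambda u: requests.get(u, timeout=10)), where the timeout keeps a stalled connection from hanging the script indefinitely.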

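With this simple example as a starting point, you can begin writing your own scrapers to collect information from web pages. As you gain experience, you can try more complex tasks such as logging in to sites and handling pagination.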

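Summary

This tutorial has covered the basics of Python, from basic syntax through classes and objects, file operations, exception handling, and the standard and third-party libraries. With this foundation you can start writing simple Python programs and go on to explore more advanced features.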

Reposted from blog.csdn.net/ashyyyy/article/details/142681290
