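Table of Contents

1. Understanding the os module
2. How do you extract an image's file name?
3. Traversing a folder and extracting the image names in it
4. How do you extract specific parts of an image file name?
5. Fixing a code error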

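1. Understanding the os module

os is a built-in Python module that provides functions for interacting with the operating system. It lets a Python program perform operating-system tasks such as file and directory operations, process management, and access to environment variables. Some commonly used functions in the os module are listed below: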

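1.1 File and directory operations

os.getcwd(): returns the path of the current working directory.
os.chdir(path): changes the current working directory to the given path.
os.listdir(path): returns a list of the names of the files and directories in the given directory.
os.mkdir(path): creates a new directory.
os.remove(path): deletes the file at the given path.
os.rmdir(path): deletes the empty directory at the given path.
os.path.exists(path): checks whether the given path exists.
os.path.isfile(path): checks whether the given path is a file.
os.path.isdir(path): checks whether the given path is a directory.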
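To see how these calls fit together, here is a minimal sketch that runs entirely inside a temporary directory. The names demo and a.txt are made up for illustration and are not part of the original article.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_dir = os.path.join(tmp_dir, "demo")      # hypothetical directory name
    demo_file = os.path.join(demo_dir, "a.txt")   # hypothetical file name

    os.mkdir(demo_dir)                            # create a new directory
    with open(demo_file, "w") as f:               # put a small file in it
        f.write("hello")

    entries = os.listdir(demo_dir)                # names only, not full paths
    is_file = os.path.isfile(demo_file)
    is_dir = os.path.isdir(demo_dir)

    os.remove(demo_file)                          # delete the file
    os.rmdir(demo_dir)                            # succeeds because the directory is now empty
    still_exists = os.path.exists(demo_dir)

print(entries, is_file, is_dir, still_exists)
```

Note that os.listdir() returns bare names, which is why later examples join them with the folder path before checking them.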

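1.2 Process management

os.system(command): runs a system command in a subshell.
os.spawnl(mode, path, ...): starts a new process that runs the given program file.
os.kill(pid, sig): sends a signal to the given process.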
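As a rough illustration of os.spawnl, the sketch below starts a second Python interpreter and waits for it to finish. The helper run_python_snippet is a hypothetical name introduced here, and the example assumes sys.executable points to a usable interpreter whose path contains no spaces.

```python
import os
import sys

def run_python_snippet(code):
    """Run `code` in a new Python process and return its exit code."""
    # With os.P_WAIT, spawnl blocks until the child exits and returns its exit code.
    # The first argument after the program path is argv[0] of the new process.
    return os.spawnl(os.P_WAIT, sys.executable, sys.executable, "-c", code)

exit_ok = run_python_snippet("pass")
exit_fail = run_python_snippet("import sys; sys.exit(3)")
print(exit_ok, exit_fail)
```

For new code, the standard library's subprocess module is usually the more convenient choice, but the call above shows what the os-level function does.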

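1.3 Environment variables

os.environ: a dictionary-like mapping of the environment variables.
os.getenv(key): returns the value of the given environment variable, or None if it is not set.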
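A short sketch of reading and setting environment variables follows. DEMO_IMAGE_DIR and DEMO_NOT_SET_ANYWHERE are hypothetical variable names chosen for this example, and the second one is assumed not to exist on your system.

```python
import os

# Set a variable for this process only (hypothetical name and value).
os.environ["DEMO_IMAGE_DIR"] = "/tmp/images"

value = os.getenv("DEMO_IMAGE_DIR")                    # the value set above
missing = os.getenv("DEMO_NOT_SET_ANYWHERE")           # None when the variable is unset
fallback = os.getenv("DEMO_NOT_SET_ANYWHERE", "images/")  # default used when unset

# Clean up so the sketch leaves the environment unchanged.
del os.environ["DEMO_IMAGE_DIR"]

print(value, missing, fallback)
```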

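1.4 Path operations

os.path.join(path1, path2, ...): joins several path components into one path.
os.path.basename(path): returns the file-name part of the given path.
os.path.dirname(path): returns the directory part of the given path.
os.path.splitext(path): splits the path into a (root, extension) pair.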
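The four path functions are often used together. The following sketch uses a made-up path (val/images/photo_001.jpg) purely for illustration.

```python
import os

# Build a path from components; the separator depends on the operating system.
path = os.path.join("val", "images", "photo_001.jpg")  # hypothetical path

name = os.path.basename(path)        # file-name part: 'photo_001.jpg'
folder = os.path.dirname(path)       # directory part: 'val/images' on Linux and macOS
stem, ext = os.path.splitext(name)   # ('photo_001', '.jpg')

print(name, folder, stem, ext)
```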

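These are only some of the commonly used functions in the os module; in practice you can use others as needed. To use the module, import it at the top of your Python script with import os, and then call its functions to perform operating-system tasks.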

2. How do you extract an image's file name?

To get an image's file name, you can use Python's os.path.basename function.

import os

image_path = './val/cropped_(0, 0, 7, 26)_obj365_val_000000605687.jpg'
image_name = os.path.basename(image_path)
print(image_name)

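In the code above, replace the value of image_path with the path to your own image file. os.path.basename returns the file-name part of the path, which you can store in a variable or process further.

Note that the image path can be absolute (for example '/path/to/image.jpg') or relative (for example 'images/image.jpg'). os.path.basename only works on the path string and does not check that the file exists, so make sure the path you pass in is the one you intend.

The code above prints: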

cropped_(0, 0, 7, 26)_obj365_val_000000605687.jpg
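As an alternative that the original code does not use, the standard library's pathlib module exposes the same information as attributes. A minimal sketch with the same path:

```python
from pathlib import Path

image_path = Path('./val/cropped_(0, 0, 7, 26)_obj365_val_000000605687.jpg')

file_name = image_path.name    # file name with extension
stem = image_path.stem         # file name without the final extension
suffix = image_path.suffix     # the extension, including the dot

print(file_name)
print(stem)
print(suffix)
```

Which of the two styles to use is mostly a matter of taste; os.path works on plain strings, while pathlib wraps the path in an object.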

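3. Traversing a folder and extracting the image names in it

To traverse a folder and extract the names of the images in it, we can combine the os module with a loop. Here is an example: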

import os

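folder_path = 'path/to/folder'

# List every entry (files and subfolders) in the folder
files = os.listdir(folder_path)

# Check each entry in turn
for file in files:
    file_path = os.path.join(folder_path, file)

    # Keep only regular files whose extension marks them as images
    if os.path.isfile(file_path) and file.lower().endswith(('.jpg', '.jpeg', '.png', '.gif')):
        image_name = os.path.splitext(file)[0]  # image name without the extension
        print(image_name)

In the code above, replace 'path/to/folder' with your actual folder path. os.listdir() first returns the names of all files and subfolders in the folder. The loop then checks each entry: os.path.isfile() confirms that it is a file, and file.lower().endswith(('.jpg', '.jpeg', '.png', '.gif')) decides whether it is an image. For each image, os.path.splitext() separates the name from the extension, and the name is printed (or stored, if you prefer).

Note that this example recognizes images by extension only and does not descend into subfolders. If your images use other extensions, add them to the tuple passed to endswith().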
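If the images are spread over subfolders, os.walk() can replace os.listdir(). The sketch below is one possible variant, not the code used in this article; find_image_names and IMAGE_EXTENSIONS are names introduced here, and the demo files are created in a temporary directory.

```python
import os
import tempfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.gif')

def find_image_names(root, extensions=IMAGE_EXTENSIONS):
    """Return the sorted image names (without extension) found under root, recursively."""
    names = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            # Case-insensitive extension check, as in the loop above
            if filename.lower().endswith(extensions):
                names.append(os.path.splitext(filename)[0])
    return sorted(names)

# Small demo: two images (one in a subfolder) and one non-image file.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("a.jpg", "notes.txt", os.path.join("sub", "b.PNG")):
        open(os.path.join(root, rel), "w").close()
    found = find_image_names(root)

print(found)  # ['a', 'b']
```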

My folder path is:

folder_path = 'segment-anything-main/notebooks/images/'

[Image]

The output is:

[Image: screenshot of the output]

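4. How do you extract specific parts of an image file name?

One of my image file names is cropped_(0, 0, 10, 13)_obj365_val_000000057887.jpg. I want to extract the coordinates in the middle and the image name that follows them. How can this be done?

To extract (0, 0, 10, 13) and "obj365_val_000000057887" from the string "cropped_(0, 0, 10, 13)_obj365_val_000000057887", you can use string operations and a regular expression.

Below is one possible approach, which uses a regular expression to match the pattern and extract the required parts: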

import re

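# Original string
filename = "cropped_(0, 0, 10, 13)_obj365_val_000000057887"

# Match the coordinates and the object name with a regular expression
pattern = r'\((\d+), (\d+), (\d+), (\d+)\)_(.+)'
match = re.search(pattern, filename)

if match:
    # Extract the coordinates
    coordinates = tuple(map(int, match.group(1, 2, 3, 4)))
    print(coordinates)

    # Extract the object name
    object_name = match.group(5)
    print(object_name)

In the code above, the pattern r'\((\d+), (\d+), (\d+), (\d+)\)_(.+)' matches the four integers inside the parentheses and then everything after the underscore that follows the closing parenthesis. The last group uses .+ so that the underscores inside the object name are kept; a group such as [^_]+ would stop at the first underscore and return only obj365. After matching with re.search(), the match.group() method returns the captured parts.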

The output is:

(0, 0, 10, 13)
obj365_val_000000057887

This gives us (0, 0, 10, 13) and "obj365_val_000000057887". Note that parentheses (...) in a regular expression pattern denote capture groups, and match.group() returns what each capture group matched.
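The \d+ groups above only accept whole numbers. If some file names carry decimal coordinates, a looser pattern can capture whatever sits between the parentheses and convert it afterwards. The following is a sketch under two assumptions that are mine rather than the article's: every file name follows the layout cropped_(x1, y1, x2, y2)_<name>.<ext>, and it always has an extension. parse_cropped_name is a hypothetical helper name.

```python
import os
import re

# Group 1: everything between the parentheses; group 2: everything after ")_"
PATTERN = re.compile(r'\(([^)]*)\)_(.+)$')

def parse_cropped_name(filename):
    """Return (coordinates, object_name), or None if the name does not match."""
    stem = os.path.splitext(filename)[0]   # drop the extension first
    match = PATTERN.search(stem)
    if match is None:
        return None
    # float() tolerates the spaces left over after splitting on commas
    coordinates = tuple(float(value) for value in match.group(1).split(','))
    return coordinates, match.group(2)

print(parse_cropped_name("cropped_(0, 0, 10, 13)_obj365_val_000000057887.jpg"))
print(parse_cropped_name("cropped_(12.5, 0, 355.0, 13)_x.png"))
print(parse_cropped_name("no_match.jpg"))
```

Returning None for names that do not match lets the caller skip unexpected files instead of crashing in the middle of a long run.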

5. Fixing a code error

While running the code, I hit the following error:

Traceback (most recent call last):
  File "/home/wangzhenkuan/CLIP_image_encoder_sam_with_info.py", line 55, in <module>
    coordinates = tuple(map(int, name_parts[1].strip('()').split(',')))
ValueError: invalid literal for int() with base 10: '355.0'

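The error message shows that the coordinate values in my file names contain decimal points, and int() cannot parse a string such as '355.0'. Converting with float() instead resolves this. If integer coordinates are required, convert to float first and then to int.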
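The failure is easy to reproduce in isolation. The sketch below also shows one way to keep whole numbers as int while still accepting decimals; parse_number is a hypothetical helper, not part of the script in this article.

```python
def parse_number(text):
    """Convert text such as '355' or ' 355.0' to int when it is whole, else to float."""
    value = float(text)                       # float() accepts '355', '355.0' and ' 12.5'
    return int(value) if value.is_integer() else value

# int() rejects a string that contains a decimal point.
try:
    int('355.0')
except ValueError as error:
    message = str(error)

coords = tuple(parse_number(part) for part in '(0, 12.5, 355.0, 26)'.strip('()').split(','))

print(message)
print(coords)
```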

Below is the modified code, which converts the coordinate values to floats:

import os
import torch
import clip
from PIL import Image
import pandas as pd

# Load the pretrained CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Path to the image folder
folder_path = 'segment-anything-main/notebooks/output/'  # replace with the actual path to your image folder

# Batch size and Excel file name prefix
batch_size = 10000  # number of images per batch
excel_prefix = 'output_batch_'  # prefix for the Excel file names

# Collect the image files
image_files = [filename for filename in os.listdir(folder_path) if filename.endswith(('.jpg', '.png'))]

# Number of batches (ceiling division, so no empty trailing batch is created)
num_batches = (len(image_files) + batch_size - 1) // batch_size

for batch_idx in range(num_batches):
    # Select the image files for the current batch
    start_idx = batch_idx * batch_size
    end_idx = min(start_idx + batch_size, len(image_files))
    batch_files = image_files[start_idx:end_idx]

    # Create an empty DataFrame to hold the image features and metadata
    df = pd.DataFrame()
    feature_list = []
    coordinates_list = []
    image_name_list = []

    # Process every image file in the current batch
    for filename in batch_files:
        image_path = os.path.join(folder_path, filename)

        # Load and preprocess the image
        image = Image.open(image_path).convert('RGB')
        image_resized = image.resize((28, 28))
        image_input = preprocess(image_resized).unsqueeze(0).to(device)

        # Encode the image
        with torch.no_grad():
            image_features = model.encode_image(image_input)

        # Append the image features to the list
        image_features_list = image_features.squeeze().tolist()
        feature_list.append(image_features_list)

        # Parse the file name to extract the coordinates and the object name
        name_parts = filename.split('_')
        coordinates = tuple(map(float, name_parts[1].strip('()').split(',')))
        object_name = '_'.join(name_parts[2:])  # everything after the coordinates, extension included
        coordinates_list.append(coordinates)
        image_name_list.append(object_name)

    df['features'] = feature_list
    df['coordinates'] = coordinates_list
    df['image_name'] = image_name_list

    # Write the Excel file for the current batch
    base_path = 'output/'
    os.makedirs(base_path, exist_ok=True)  # make sure the output folder exists
    excel_filename = f"{excel_prefix}{batch_idx + 1}.xlsx"
    output_path = os.path.join(base_path, excel_filename)
    df.to_excel(output_path, index=False)

    print(f"Batch {batch_idx + 1} processed. Excel file saved: {excel_filename}")

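Note that after conversion the coordinates are stored as floats in the coordinates column of the DataFrame, and the image names are stored in the image_name column. The code can be adjusted further as needed.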
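The batching step of the script can be isolated into a small helper. The sketch below uses ceiling division so that a file count that is an exact multiple of the batch size does not produce an extra, empty batch; split_into_batches is a hypothetical name, and the integers stand in for file names.

```python
import math

def split_into_batches(items, batch_size):
    """Split items into consecutive batches of at most batch_size elements."""
    num_batches = math.ceil(len(items) / batch_size)
    return [items[i * batch_size:(i + 1) * batch_size] for i in range(num_batches)]

batches = split_into_batches(list(range(25)), 10)
exact = split_into_batches(list(range(20)), 10)

print([len(batch) for batch in batches])  # [10, 10, 5]
print(len(exact))                         # 2
```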

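The generated dataset looks like this:

[Image: screenshot of the generated dataset]

Each entry in the first column (features) is a vector of 512 values.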

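[Running Experiments 06] Understanding the os module; extracting an image's file name; traversing a folder and extracting the image names in it; extracting specific parts of an image file name; fixing a code error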