How to read complex, multi-level nested dictionaries from a CSV file (with pandas)

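Preface

In many NLP annotation tasks, the annotators' results are saved to a CSV file once labeling is finished. The annotated content usually sits inside a fairly complex, multi-level nested dictionary. This post shows how to extract the contents of such nested dictionaries from a CSV file.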

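Task background

The CSV file contains several columns. Here we look specifically at the column that holds the annotators' results, "答案1" (Answer 1).

Below is the content of "答案1" for the first three rows: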

{"nodes":[{"end_index":15,"id":1,"num_index":1,"start_index":11,"text":"直升机制","text_index":1,"type":"特征词"},{"end_index":21,"id":2,"num_index":1,"start_index":17,"text":"比较合理","text_index":1,"type":"情感词"},{"end_index":44,"id":3,"num_index":1,"start_index":42,"text":"医疗","text_index":1,"type":"特征词"},{"end_index":57,"id":4,"num_index":1,"start_index":55,"text":"苇草","text_index":1,"type":"对象"},{"end_index":112,"id":5,"num_index":1,"start_index":110,"text":"干员","text_index":1,"type":"对象"},{"end_index":121,"id":6,"num_index":1,"start_index":120,"text":"抽","text_index":1,"type":"特征词"},{"end_index":129,"id":7,"num_index":1,"start_index":127,"text":"角色","text_index":1,"type":"对象"},{"end_index":134,"id":8,"num_index":1,"start_index":132,"text":"肉鸽","text_index":1,"type":"特征词"},{"end_index":134,"id":9,"num_index":1,"start_index":132,"text":"肉鸽","text_index":1,"type":"对象"},{"end_index":141,"id":10,"num_index":2,"start_index":139,"text":"角色","text_index":2,"type":"对象"},{"end_index":154,"id":11,"num_index":1,"start_index":151,"text":"全图鉴","text_index":1,"type":"特征词"},{"end_index":162,"id":12,"num_index":2,"start_index":160,"text":"干员","text_index":2,"type":"对象"}],"relations":[{"node1":1,"relation_type":"情感","relation_value":"正面"},{"node1":1,"relation_type":"维度","relation_value":"养成系统"},{"node1":3,"relation_type":"情感","relation_value":"中性"},{"node1":3,"relation_type":"维度","relation_value":"强度表现"},{"node1":6,"relation_type":"情感","relation_value":"中性"},{"node1":6,"relation_type":"维度","relation_value":"抽卡体验"},{"node1":8,"relation_type":"情感","relation_value":"中性"},{"node1":8,"relation_type":"维度","relation_value":"玩法体验"},{"node1":11,"relation_type":"情感","relation_value":"中性"},{"node1":11,"relation_type":"维度","relation_value":"强度表现"},{"node1":1,"node2":2,"relation_type":"相关","relation_value":"相关"}]}
{"nodes":[{"end_index":27,"id":1,"num_index":1,"start_index":25,"text":"天赋","text_index":1,"type":"特征词"},{"end_index":31,"id":2,"num_index":1,"start_index":27,"text":"确实有限","text_index":1,"type":"情感词"},{"end_index":41,"id":3,"num_index":1,"start_index":39,"text":"体质","text_index":1,"type":"特征词"},{"end_index":44,"id":4,"num_index":1,"start_index":42,"text":"不行","text_index":1,"type":"情感词"},{"end_index":49,"id":5,"num_index":1,"start_index":47,"text":"小刻","text_index":1,"type":"对象"}],"relations":[{"node1":1,"relation_type":"情感","relation_value":"负面"},{"node1":1,"relation_type":"维度","relation_value":"强度表现"},{"node1":3,"relation_type":"情感","relation_value":"负面"},{"node1":3,"relation_type":"维度","relation_value":"强度表现"},{"node1":1,"node2":2,"relation_type":"相关","relation_value":"相关"},{"node1":3,"node2":4,"relation_type":"相关","relation_value":"相关"}]}
{"nodes":[{"end_index":15,"id":1,"num_index":1,"start_index":13,"text":"弱智","text_index":1,"type":"情感词"},{"end_index":18,"id":2,"num_index":1,"start_index":16,"text":"机制","text_index":1,"type":"特征词"},{"end_index":31,"id":3,"num_index":1,"start_index":27,"text":"活动设计","text_index":1,"type":"特征词"},{"end_index":33,"id":4,"num_index":1,"start_index":31,"text":"幽默","text_index":1,"type":"情感词"},{"end_index":94,"id":5,"num_index":1,"start_index":87,"text":"HE1和HE5","text_index":1,"type":"特征词"},{"end_index":108,"id":6,"num_index":1,"start_index":105,"text":"远程怪","text_index":1,"type":"特征词"},{"end_index":155,"id":7,"num_index":1,"start_index":153,"text":"干员","text_index":1,"type":"对象"},{"end_index":175,"id":8,"num_index":1,"start_index":173,"text":"开局","text_index":1,"type":"特征词"},{"end_index":189,"id":9,"num_index":1,"start_index":183,"text":"cost上限","text_index":1,"type":"特征词"},{"end_index":209,"id":10,"num_index":1,"start_index":207,"text":"关卡","text_index":1,"type":"特征词"},{"end_index":356,"id":11,"num_index":2,"start_index":350,"text":"cost上限","text_index":2,"type":"特征词"},{"end_index":376,"id":12,"num_index":1,"start_index":374,"text":"射手","text_index":1,"type":"对象"},{"end_index":388,"id":13,"num_index":1,"start_index":382,"text":"体验新式开局","text_index":1,"type":"特征词"},{"end_index":397,"id":14,"num_index":1,"start_index":394,"text":"堵堵堵","text_index":1,"type":"情感词"},{"end_index":416,"id":15,"num_index":2,"start_index":414,"text":"关卡","text_index":2,"type":"特征词"}],"relations":[{"node1":2,"relation_type":"情感","relation_value":"负面"},{"node1":2,"relation_type":"维度","relation_value":"剧情文案"},{"node1":3,"relation_type":"情感","relation_value":"负面"},{"node1":3,"relation_type":"维度","relation_value":"玩法体验"},{"node1":5,"relation_type":"情感","relation_value":"中性"},{"node1":5,"relation_type":"维度","relation_value":"玩法体验"},{"node1":6,"relation_type":"情感","relation_value":"中性"},{"node1":6,"relation_type":"维度","relation_value":"玩法体验"},{"node1":8,"relation_type":"情感","relation_value":"中性"},{"node1":8,"relation_type":"维度","relation_value":"玩法体验"},{"node1":9,"relation_type":"情感","relation_value":"中性"},{"node1":9,"relation_type":"维度","relation_value":"玩法体验"},{"node1":10,"relation_type":"情感","relation_value":"中性"},{"node1":10,"relation_type":"维度","relation_value":"玩法体验"},{"node1":11,"relation_type":"情感","relation_value":"中性"},{"node1":11,"relation_type":"维度","relation_value":"玩法体验"},{"node1":13,"relation_type":"情感","relation_value":"负面"},{"node1":13,"relation_type":"维度","relation_value":"玩法体验"},{"node1":15,"relation_type":"情感","relation_value":"中性"},{"node1":15,"relation_type":"维度","relation_value":"玩法体验"},{"node1":2,"node2":1,"relation_type":"相关","relation_value":"相关"},{"node1":13,"node2":14,"relation_type":"相关","relation_value":"相关"},{"node1":3,"node2":4,"relation_type":"相关","relation_value":"相关"}]}

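At first glance, this does look quite complicated.

Code implementation

Let's start by printing the content of the "答案1" column for each row.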

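import pandas as pd

# Replace the placeholder with the path to your CSV file
data_path = r"path to the CSV file"
data = pd.read_csv(data_path)

# Print the raw "答案1" value of every row
for i, row in data.iterrows():
    answer = row['答案1']
    print(answer)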

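Each of these values is a string. Once we convert the string into Python objects, we can traverse the nested dictionary.

First, let's look at the keys of the outermost dictionary.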

for i, row in data.iterrows():
    answer = row['答案1']
    # eval turns the string into a Python dict; only use it on data you trust
    answer = eval(answer)
    print(answer.keys())
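
The `eval` call above works for this data, but it executes whatever expression the cell contains, and it raises a NameError on JSON literals such as null, true or false. Since the "答案1" strings shown earlier look like standard JSON, a safer alternative may be the standard-library `json.loads`. The following is a minimal sketch, assuming every cell is valid JSON; the `raw` sample is shortened from the first row, and the `{"node2": null}` string is a made-up illustration rather than data from the file.

```python
import json

# Shortened "答案1" value based on the first row shown earlier.
raw = (
    '{"nodes":[{"end_index":15,"id":1,"num_index":1,"start_index":11,'
    '"text":"直升机制","text_index":1,"type":"特征词"}],'
    '"relations":[{"node1":1,"relation_type":"情感","relation_value":"正面"}]}'
)

# json.loads only parses data; it never executes code from the cell.
answer = json.loads(raw)
print(list(answer.keys()))

# JSON null becomes Python None, whereas eval would fail on it.
# (Hypothetical string, used only to illustrate the difference.)
with_null = json.loads('{"node2": null}')
print(with_null)
```

In the loops of this post, `answer = eval(answer)` could then be swapped for `answer = json.loads(answer)` after adding `import json` at the top.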


As we can see, the outermost dictionary has two keys, 'nodes' and 'relations'. Let's assign each of them to a variable.

for i, row in data.iterrows():
    answer = row['答案1']
    answer = eval(answer)
    # Pull out the two top-level lists
    nodes = answer['nodes']
    relations = answer['relations']
    print('nodes:', nodes)
    print('relations:', relations)

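We can see that each of these two keys maps to a list, and every element of that list is itself a dictionary. At this point, we can print the dictionaries in each list one by one.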

for i, row in data.iterrows():
    answer = row['答案1']
    answer = eval(answer)
    nodes = answer['nodes']
    relations = answer['relations']
    # Print one node dictionary per line
    print("nodes:")
    for node in nodes:
        print(node)
    # Print one relation dictionary per line
    print("relations:")
    for relation in relations:
        print(relation)
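
Printing the raw dictionaries still leaves the relations expressed as numeric ids. The sketch below goes one step further and replaces those ids with the node text. It assumes that `node1` and `node2` refer to the `id` field of an entry in `nodes`, which matches the samples above but is not stated anywhere in the data. The helper name `describe_relations` is made up for this illustration, and `raw` is shortened from the second row.

```python
import json

# Shortened "答案1" value based on the second row shown earlier.
raw = (
    '{"nodes":[{"end_index":27,"id":1,"num_index":1,"start_index":25,'
    '"text":"天赋","text_index":1,"type":"特征词"},'
    '{"end_index":31,"id":2,"num_index":1,"start_index":27,'
    '"text":"确实有限","text_index":1,"type":"情感词"}],'
    '"relations":[{"node1":1,"relation_type":"情感","relation_value":"负面"},'
    '{"node1":1,"relation_type":"维度","relation_value":"强度表现"},'
    '{"node1":1,"node2":2,"relation_type":"相关","relation_value":"相关"}]}'
)


def describe_relations(answer):
    """Return one readable string per relation, using node text instead of ids."""
    # Assumption: node1/node2 hold the "id" of an entry in answer["nodes"].
    text_by_id = {node["id"]: node["text"] for node in answer["nodes"]}
    lines = []
    for rel in answer["relations"]:
        source = text_by_id[rel["node1"]]
        if "node2" in rel:
            # Relation between two nodes
            target = text_by_id[rel["node2"]]
            lines.append(f'{source} -> {target}: {rel["relation_type"]}')
        else:
            # Attribute attached to a single node
            lines.append(f'{source}: {rel["relation_type"]} = {rel["relation_value"]}')
    return lines


answer = json.loads(raw)
for line in describe_relations(answer):
    print(line)
```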

The annotators' results are now much easier to read.

Questions and discussion are welcome~
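
If the goal is further analysis rather than printing, the nodes of all rows could also be collected into one flat table. The following is a rough sketch under a few assumptions: the small in-memory `data` frame stands in for the result of `pd.read_csv(data_path)`, its two cells are shortened from the first two rows above, and the extra `row` column is a name invented here to record which CSV row each node came from.

```python
import json

import pandas as pd

# Stand-in for pd.read_csv(data_path): two shortened "答案1" cells.
data = pd.DataFrame({
    "答案1": [
        '{"nodes":[{"end_index":15,"id":1,"num_index":1,"start_index":11,'
        '"text":"直升机制","text_index":1,"type":"特征词"},'
        '{"end_index":21,"id":2,"num_index":1,"start_index":17,'
        '"text":"比较合理","text_index":1,"type":"情感词"}],"relations":[]}',
        '{"nodes":[{"end_index":27,"id":1,"num_index":1,"start_index":25,'
        '"text":"天赋","text_index":1,"type":"特征词"}],"relations":[]}',
    ]
})

node_frames = []
for i, row in data.iterrows():
    answer = json.loads(row["答案1"])
    # A list of flat dictionaries converts directly into a DataFrame
    nodes = pd.DataFrame(answer["nodes"])
    # Hypothetical helper column: index of the CSV row the nodes belong to
    nodes["row"] = i
    node_frames.append(nodes)

# One table holding every node from every row
all_nodes = pd.concat(node_frames, ignore_index=True)
print(all_nodes[["row", "id", "text", "type"]])
```

The same pattern should apply to `answer["relations"]`, where relations without a `node2` key would simply get a missing value in that column.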



Reposted from blog.csdn.net/weixin_57506268/article/details/135315354