How to iterate over rows in a DataFrame in Pandas?

This article is translated from: How to iterate over rows in a DataFrame in Pandas?

I have a pandas DataFrame:

import pandas as pd
inp = [{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}]
df = pd.DataFrame(inp)
print(df)

Output:

   c1   c2
0  10  100
1  11  110
2  12  120

Now I want to iterate over the rows of this frame. For every row, I want to be able to access its elements (the values in the cells) by column name. For example:

# Desired usage (pseudocode): df.rows is not a real pandas attribute
for row in df.rows:
    print(row['c1'], row['c2'])

Is it possible to do that in pandas?

I found a similar question, but it does not give me the answer I need. For example, it suggests using:

for date, row in df.T.iteritems():

or

for row in df.iterrows():

But I do not understand what the row object is or how I can work with it.


Reply #1

Reference: https://stackoom.com/question/178Oq/如何在Pandas的DataFrame中的行上进行迭代


Reply #2

You should use df.iterrows(). Note, though, that iterating row by row is not especially efficient, because a Series object has to be created for every row.
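
As a minimal sketch using the question's sample data, the following shows what iterrows() hands back on each step: an (index, Series) pair, where the Series is indexed by column name. The variable names first_index and first_row are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

# Each step of the generator yields a tuple: (index label, row as a Series)
first_index, first_row = next(df.iterrows())

print(type(first_row).__name__)           # Series
print(first_index)                        # 0
print(first_row['c1'], first_row['c2'])   # 10 100

# The Series is indexed by the column names, so cells are accessed by name
print(list(first_row.index))              # ['c1', 'c2']
```

Because a new Series is built for each row, this convenience is also where the per-row overhead comes from.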


Reply #3

DataFrame.iterrows is a generator that yields both the index and the row (as a Series):

import pandas as pd
import numpy as np

df = pd.DataFrame([{'c1':10, 'c2':100}, {'c1':11,'c2':110}, {'c1':12,'c2':120}])

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

Output: 
   10 100
   11 110
   12 120

Reply #4

You can also use df.apply() with axis=1 to call a function on each row and access multiple columns inside it.

Docs: DataFrame.apply()

def valuation_formula(x, y):
    return x * y * 0.5

# Assumes df has numeric columns named 'x' and 'y'
df['price'] = df.apply(lambda row: valuation_formula(row['x'], row['y']), axis=1)
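
The snippet above refers to columns x and y, which the question's DataFrame does not have. As a sketch, assuming the same formula is meant to be applied to c1 and c2 from the question, it could look like the following. The price_vectorized column is an illustrative addition, not part of the original answer, showing the same arithmetic done on whole columns.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

def valuation_formula(x, y):
    return x * y * 0.5

# axis=1 passes each row to the lambda as a Series indexed by column name
df['price'] = df.apply(lambda row: valuation_formula(row['c1'], row['c2']), axis=1)

# Same arithmetic on whole columns at once, without a per-row Python call
df['price_vectorized'] = valuation_formula(df['c1'], df['c2'])

print(df)
```

When the function consists only of arithmetic that pandas supports on whole columns, the column-wise form usually avoids the per-row overhead.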

Reply #5

While iterrows() is a good option, itertuples() can sometimes be much faster:

import pandas as pd
from numpy.random import randn, randint

df = pd.DataFrame({'a': randn(1000), 'b': randn(1000), 'N': randint(100, 1000, 1000), 'x': 'x'})

%timeit [row.a * 2 for idx, row in df.iterrows()]
# => 10 loops, best of 3: 50.3 ms per loop

%timeit [row[1] * 2 for row in df.itertuples()]
# => 1000 loops, best of 3: 541 µs per loop
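
The timing example indexes each tuple by position (row[1]). As a small sketch on the question's data, itertuples() also allows access by attribute name, since it yields namedtuples by default. The first field is the index unless index=False is passed.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

# Each row is a namedtuple: field 0 is the index, then one field per column
for row in df.itertuples():
    print(row.Index, row.c1, row.c2)

# index=False drops the index field, so position 0 is the first column
pairs = [(row.c1, row.c2) for row in df.itertuples(index=False)]
print(pairs)
```

Attribute access only works for column names that are valid Python identifiers; other names are replaced with positional names.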

Reply #6

You can use the df.iloc indexer as follows:

for i in range(len(df)):
    print(df.iloc[i]['c1'], df.iloc[i]['c2'])
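
Note that df.iloc[i]['c1'] first builds a Series for row i and then picks the column from it. A sketch of a variant that reads each cell directly by integer position is shown below. It uses df.columns.get_loc to turn a column name into a position, and the names c1_pos, c2_pos, and values are illustrative.

```python
import pandas as pd

df = pd.DataFrame([{'c1': 10, 'c2': 100}, {'c1': 11, 'c2': 110}, {'c1': 12, 'c2': 120}])

# Translate column names into integer positions once, outside the loop
c1_pos = df.columns.get_loc('c1')
c2_pos = df.columns.get_loc('c2')

values = []
for i in range(len(df)):
    # iloc[row, col] reads a single cell by position
    pair = (df.iloc[i, c1_pos], df.iloc[i, c2_pos])
    values.append(pair)
    print(pair[0], pair[1])
```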

Reposted from blog.csdn.net/asdfgh0077/article/details/105215083