How do I get the row count of a pandas DataFrame?

Translated from: How do I get the row count of a pandas DataFrame?

I'm trying to get the number of rows of a DataFrame df with pandas, and here is my code.

Method 1:

total_rows = df.count
print total_rows +1

Method 2:

total_rows = df['First_columnn_label'].count
print total_rows +1

Both code snippets give me this error:

TypeError: unsupported operand type(s) for +: 'instancemethod' and 'int'

What am I doing wrong?


#1

Reference: https://stackoom.com/question/14thZ/如何获取大熊猫DataFrame的行数


#2

You can use the .shape property or just len(DataFrame.index). However, there are notable performance differences (len(DataFrame.index) is fastest):

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.arange(12).reshape(4,3))

In [4]: df
Out[4]: 
   0   1   2
0  0   1   2
1  3   4   5
2  6   7   8
3  9  10  11

In [5]: df.shape
Out[5]: (4, 3)

In [6]: timeit df.shape
2.77 µs ± 644 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [7]: timeit df[0].count()
348 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [8]: len(df.index)
Out[8]: 4

In [9]: timeit len(df.index)
990 ns ± 4.97 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


EDIT: As @Dan Allen noted in the comments, len(df.index) and df[0].count() are not interchangeable, because count excludes NaNs.
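To make the NaN caveat concrete, here is a minimal sketch. The frame and its column names "a" and "b" are made up for illustration and do not come from the question. It also calls count() with parentheses. Referencing df.count without calling it yields the bound method itself, which is what the TypeError in the question complains about.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "a" has one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4, 5, 6]})

n_rows = len(df.index)        # counts every row, NaN or not
n_non_null = df["a"].count()  # count is a method, so it must be called

print(n_rows)      # 3
print(n_non_null)  # 2
```

So count() is the right tool when you want non-null values per column, and len(df.index) when you want the number of rows.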


#3

Use len(df). This works as of pandas 0.11 or maybe even earlier.

__len__() is currently (0.12) documented with "Returns length of index". Timing info, set up the same way as in root's answer:

In [7]: timeit len(df.index)
1000000 loops, best of 3: 248 ns per loop

In [8]: timeit len(df)
1000000 loops, best of 3: 573 ns per loop

Due to one additional function call, it is a bit slower than calling len(df.index) directly, but this should not matter in most use cases.
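If you want to reproduce the comparison outside IPython, something like the following sketch with the standard timeit module should work. The loop count n is an arbitrary choice, the lambda wrappers add a little overhead of their own, and the absolute numbers will depend on your machine and pandas version, so treat the output as a rough indication only.

```python
import timeit

import numpy as np
import pandas as pd

# Same 4x3 frame as in the earlier answer.
df = pd.DataFrame(np.arange(12).reshape(4, 3))

n = 100_000  # arbitrary number of repetitions
t_index = timeit.timeit(lambda: len(df.index), number=n)
t_frame = timeit.timeit(lambda: len(df), number=n)

# Convert total seconds to nanoseconds per call.
print(f"len(df.index): {t_index / n * 1e9:.0f} ns per call")
print(f"len(df):       {t_frame / n * 1e9:.0f} ns per call")
```

Both expressions return the same row count, so the choice between them is mostly about readability.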


#4

Apart from the above answers, you can use df.axes to get a list containing the row index and the column index, and then apply the len() function:

total_rows = len(df.axes[0])
total_cols = len(df.axes[1])

#5

Suppose df is your DataFrame; then:

count_row = df.shape[0]  # number of rows
count_col = df.shape[1]  # number of columns

Or, more succinctly:

r, c = df.shape

#6

len() is your friend; the short answer for the row count is len(df).

Alternatively, you can access the row labels through df.index and the column labels through df.columns. Just as len(anyList) gives the number of items in a list, len(df.index) gives the number of rows and len(df.columns) gives the number of columns.

You can also use df.shape, which returns the number of rows and columns together as a tuple. For the number of rows only, use df.shape[0]; for the number of columns only, use df.shape[1].
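As a quick sanity check, the sketch below runs the approaches from this thread side by side on a small made-up frame. The column names "x" and "y" and the variable names are assumptions for illustration only. It also covers an empty frame, where the row count is 0 but the columns are still reported.

```python
import pandas as pd

# Hypothetical 4-row, 2-column frame.
df = pd.DataFrame({"x": [10, 20, 30, 40], "y": list("abcd")})

n_rows_len = len(df)            # len() on the frame itself
n_rows_index = len(df.index)    # length of the row index
n_rows_shape = df.shape[0]      # first element of the (rows, cols) tuple
n_rows_axes = len(df.axes[0])   # row axis taken from df.axes
n_cols = len(df.columns)        # length of the column index

print(n_rows_len, n_rows_index, n_rows_shape, n_rows_axes)  # 4 4 4 4
print(n_cols, df.shape[1])                                  # 2 2

# An empty frame has zero rows but keeps its columns.
empty = pd.DataFrame(columns=["x", "y"])
print(len(empty), empty.shape)  # 0 (0, 2)
```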


Reposted from blog.csdn.net/xfxf996/article/details/105427128