jorijnsmit :
I have two dataframes which I am trying to concatenate. I made sure they have the same number of columns and that the data types match.
However, when calling pd.concat([df1, df2], ignore_index=True) I get a dataframe back with 24 columns and lots of NaN values. I expect pd.concat() to just place the second dataframe 'underneath' the first one (the default, axis=0).
What am I doing wrong?
>>> df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 798810 entries, 0 to 798809
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 798810 non-null Int64
1 1 798810 non-null float64
2 2 798810 non-null float64
3 3 798810 non-null float64
4 4 798810 non-null float64
5 5 798810 non-null float64
6 6 798810 non-null Int64
7 7 798810 non-null float64
8 8 798810 non-null Int64
9 9 798810 non-null float64
10 10 798810 non-null float64
11 11 798810 non-null float64
dtypes: Int64(3), float64(9)
memory usage: 75.4 MB
>>> df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 0 500 non-null Int64
1 1 500 non-null float64
2 2 500 non-null float64
3 3 500 non-null float64
4 4 500 non-null float64
5 5 500 non-null float64
6 6 500 non-null Int64
7 7 500 non-null float64
8 8 500 non-null Int64
9 9 500 non-null float64
10 10 500 non-null float64
11 11 500 non-null float64
dtypes: Int64(3), float64(9)
memory usage: 48.5 KB
>>> pd.concat([df1, df2], ignore_index=True).shape
(799310, 24)
jezrael :
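I think the column names in one of the DataFrames are strings rather than integers. pd.concat aligns on column labels, so 0 and '0' count as different columns, which would explain the 24 columns and the NaN values. You can try: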
df1.columns = df1.columns.astype(int)
df2.columns = df2.columns.astype(int)
df = pd.concat([df1, df2], ignore_index=True)
Or:
df = pd.concat([df1.rename(columns=int), df2.rename(columns=int)], ignore_index=True)
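To check whether this is really the cause, here is a minimal sketch that reproduces the symptom on two tiny made-up frames (the data and the names bad, good, label_types_1 and label_types_2 are only for illustration, not taken from the question). It assumes one frame has integer labels and the other has string labels that print the same way in info():

```python
import pandas as pd

# Hypothetical frames: df1 has integer column labels, df2 has string labels.
df1 = pd.DataFrame([[1, 2.0], [3, 4.0]], columns=[0, 1])
df2 = pd.DataFrame([[5, 6.0]], columns=["0", "1"])

# info() shows 0 and "0" the same way, so inspect the label types directly.
label_types_1 = [type(c).__name__ for c in df1.columns]
label_types_2 = [type(c).__name__ for c in df2.columns]
print(label_types_1, label_types_2)

# Mismatched labels: concat keeps all four labels and fills the gaps with NaN.
bad = pd.concat([df1, df2], ignore_index=True)
print(bad.shape)

# Converting the string labels to int makes the columns line up again.
good = pd.concat([df1, df2.rename(columns=int)], ignore_index=True)
print(good.shape)
```

If the label types printed for your real df1 and df2 differ, the conversion above should bring the result back to 12 columns.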