Putting Python Pandas DataFrames Together
If you have worked with Pandas for any length of time you have probably come across the need to stick a dataframe together with another dataframe. It turns out it’s not as simple as you might think.
I’m going to show you a couple of ways to put two or more dataframes together in Pandas, depending on the situation and the result you want.
The simplest approach is just tacking one dataframe on to the end of another dataframe. This can work well if the dataframes contain exactly the same structure.
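In this simple example we are appending df_two to the end of df_one. (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; pd.concat does the same job and accepts the same ignore_index and verify_integrity parameters.)
df_one = df_one.append(df_two, ignore_index=True)  # pandas < 2.0 only
df_one = pd.concat([df_one, df_two], ignore_index=True)  # any pandas version, same result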
So df_one now looks like this:
That ignore_index parameter is why we have a consecutively ordered index. If it’s omitted or set to False, the rows retain the index from their original dataframe. A related parameter is verify_integrity. If it’s set to True, a ValueError is raised when the resulting index contains duplicates.
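To make the effect of those two parameters concrete, here is a minimal sketch. The sample data and the column name plays are made up for illustration, and it uses pd.concat rather than append so that it also runs on pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical sample data, not the dataframes from the original notebook.
df_one = pd.DataFrame({"name": ["a", "b"], "plays": [10, 20]})
df_two = pd.DataFrame({"name": ["c", "d"], "plays": [30, 40]})

# ignore_index=True throws away the old labels and renumbers from 0.
renumbered = pd.concat([df_one, df_two], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]

# Without it, each row keeps the label it had in its source dataframe.
kept = pd.concat([df_one, df_two])
print(list(kept.index))  # [0, 1, 0, 1]

# verify_integrity=True refuses to build a result with duplicate labels.
raised = False
try:
    pd.concat([df_one, df_two], verify_integrity=True)
except ValueError as err:
    raised = True
    print("rejected:", err)
```

Both source dataframes are labelled 0 and 1 here, which is why the last call fails; combining it with ignore_index=True would succeed because the labels are regenerated.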
These dataframes have the same columns. If you try the append method on dataframes whose columns differ it will still work, but you will end up with NaN values wherever a column is missing from one of the dataframes:
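A small sketch of that mismatch, again with made-up data and pd.concat standing in for append. The column names col_two and col_three are used only as placeholders here.

```python
import pandas as pd

# Hypothetical data: the two frames share "name" but nothing else.
df_one = pd.DataFrame({"name": ["a", "b"], "col_two": [1, 2]})
df_two = pd.DataFrame({"name": ["c"], "col_three": [3.5]})

combined = pd.concat([df_one, df_two], ignore_index=True)

# The result has the union of the columns; gaps are filled with NaN.
print(combined)

# Which cells ended up missing in each column.
print(combined["col_two"].isna().tolist())    # [False, False, True]
print(combined["col_three"].isna().tolist())  # [True, True, False]
```

One side effect worth knowing about: an integer column that picks up NaN this way is converted to float, because NaN is a floating-point value.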
Appending dataframes is all well and good, but for more complicated operations, actual database-style merges may work better.
Consider the following:
Let’s say that we wanted to put the data in both col_two and col_three into a single dataframe and keep it associated with the appropriate artist:
df = df_first.merge(df_second, left_on='name', right_on='artist_name', how='outer')
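Much like the append method, if the two dataframes don’t line up you can expect NaNs: with how='outer', a key that appears on only one side still gets a row, with NaN in the other side’s columns. Another consideration is that the columns you merge on need to match exactly and uniquely. Values that differ even slightly (case, stray whitespace) will not match, and a key that is repeated produces one output row for every matching pair, which shows up as duplicate rows.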
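Here is a minimal sketch of both effects. The data is invented, and the indicator and validate arguments are not part of the original example; they are optional merge parameters that can help diagnose this kind of surprise.

```python
import pandas as pd

# Hypothetical data: "c" exists only on the left, "d" only on the right,
# and "b" appears twice on the right.
df_first = pd.DataFrame({"name": ["a", "b", "c"], "col_two": [1, 2, 3]})
df_second = pd.DataFrame(
    {"artist_name": ["a", "b", "b", "d"], "col_three": [10, 20, 21, 40]}
)

# indicator=True adds a _merge column saying where each row came from.
df = df_first.merge(
    df_second,
    left_on="name",
    right_on="artist_name",
    how="outer",
    indicator=True,
)
print(df)

# Unmatched keys survive the outer merge, with NaN on the missing side.
print(df["_merge"].value_counts())

# The repeated key "b" yields one row per matching pair.
print((df["name"] == "b").sum())  # 2

# validate makes pandas raise instead of silently duplicating rows.
duplicate_keys_detected = False
try:
    df_first.merge(
        df_second,
        left_on="name",
        right_on="artist_name",
        how="outer",
        validate="one_to_one",
    )
except pd.errors.MergeError as err:
    duplicate_keys_detected = True
    print("merge rejected:", err)
```

Filtering on the _merge column is a quick way to list the keys that failed to match, which is usually the first thing to check when a merge returns more or fewer rows than expected.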
To Merge, Join, Append, or Concat?
Hopefully this will get you started putting dataframes together, but this only scratches the surface of this topic. I plan on explaining this further in upcoming blog posts but if you can’t wait – here is the official Pandas merge documentation.
If you want the Jupyter Notebook I used to create these very simple examples, you can grab it here.