The other day I posted a Pandas for Python Primer. I promised a follow-up before the end of the week, and away we go…
Pandas for Python, A Slight Return
Previously I talked about creating DataFrames, their rows and columns, and how one might apply a function to them. In my experience, more often than not you’ll want to import some data. You start by importing Pandas, of course. A note here: by convention some coders import Pandas as pd. You’ll want to replace my pandas. with pd. if you are doing this.
Assume that there is an Excel spreadsheet in the current working directory called MoultrieEateries.xlsx. We’ll need some additional help for Pandas to understand Excel, so open your shell and: pip install xlrd. If you don’t use pip you should switch, lol, j/k. Use your package manager to get and install xlrd.
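For reference, here's a minimal sketch of that import step. It assumes the data lives on the first sheet of the workbook; the load_eateries helper and the exists() check are my own illustrative additions so the snippet doesn't blow up if the file isn't there. Heads up: recent Pandas releases read .xlsx files through openpyxl rather than xlrd, so install that one if the read fails.

```python
from pathlib import Path

import pandas

# File name from the post; it is expected in the current working directory.
XLSX_PATH = Path("MoultrieEateries.xlsx")


def load_eateries(path=XLSX_PATH):
    """Read the first sheet of the workbook into a DataFrame (hypothetical helper)."""
    return pandas.read_excel(path)


if XLSX_PATH.exists():
    df = load_eateries()
    # Print the column names as a quick sanity check.
    print(df.columns)
else:
    print(f"{XLSX_PATH} not found in the current directory")
```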
We’re importing the Pandas library, creating a DataFrame named df, and then printing the names of the columns. I just like to print something out to make sure nothing looks outta whack.
You can also call something like df.head() or df.tail() to see a few lines of the top or the bottom of the DataFrame. Head and tail are safer than calling the entire DataFrame when you have no idea how big your data is and you just want to check the structure. You can specify how many rows are returned, à la: df.head(3).
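A quick sketch of head and tail in action. The DataFrame here is made up just so the snippet runs on its own; swap in your own df.

```python
import pandas

# Hypothetical stand-in data: ten rows so head/tail have something to trim.
df = pandas.DataFrame(
    {
        "ID": range(1, 11),
        "Ratings Average": [4.0, 3.5, 4.8, 2.9, 4.1, 3.3, 4.6, 3.9, 2.5, 4.4],
    }
)

top = df.head()           # first 5 rows by default
bottom = df.tail()        # last 5 rows by default
first_three = df.head(3)  # ask for a specific number of rows

print(first_three)
```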
In my example here Pandas is showing me 4 rows and all of the columns. I would only be interested in some of the columns in this DataFrame. They are also indexed by ID, which is not what I want either. One of the best uses for Pandas is cleaning data. That is, taking a bunch of information and manipulating it into just the data you need. Remember the DataFrame we created is called df:
df.sort_values("Ratings Average", ascending=False)
The first line of code up there deletes all of the columns that I don’t need. The numbers in the brackets are a slice. The second line re-orders my data so that it makes more sense by showing the eateries by ranking instead of by ID. Keep in mind that sort_values() does not work in-place by default, so you’ll have to pass inplace=True when calling it, or assign the result to a variable, if you want to retain the returned, sorted data. Also notice that I had to add another parameter setting the sort to ascending=False so those joints with the highest rating were first.
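Here's a rough sketch of that drop-then-sort pattern on a made-up DataFrame. Only the "Ratings Average" column name comes from my spreadsheet; the other columns, the values, and the 2:4 slice are stand-ins, so adjust them to your own data.

```python
import pandas

# Made-up stand-in for the spreadsheet; only "Ratings Average" is a real column name.
df = pandas.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Name": ["Diner A", "Cafe B", "Grill C", "Bistro D"],
        "Address": ["1 Main St", "2 Oak Ave", "3 Pine Rd", "4 Elm Ct"],
        "Phone": ["555-0101", "555-0102", "555-0103", "555-0104"],
        "Ratings Average": [3.5, 4.8, 4.1, 2.9],
    }
)

# Drop a slice of columns by position (here "Address" and "Phone").
trimmed = df.drop(columns=df.columns[2:4])

# sort_values returns a new DataFrame, so keep the result in a variable...
ranked = trimmed.sort_values("Ratings Average", ascending=False)

# ...or sort the existing object in place instead.
trimmed.sort_values("Ratings Average", ascending=False, inplace=True)

print(ranked)
```

Either route leaves you with the highest-rated joints on top; the assignment version just keeps the unsorted DataFrame around in case you still need it.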
This has barely scratched the surface; there are tons of details about DataSets over here. Hopefully with these first two posts you have an idea about how to get started with Pandas. I’ll write one more post in a couple of days that will show you how to get longitude and latitude from these addresses.
Not for nothin’, but this is my favorite Pandas reference link on the web.