Welcome to the third (and final) installment in this series of posts about Pandas for Python. If you are new to Pandas, you might want to start with the first Pandas post. I left off in the previous post cleaning up my data, something that Pandas is perfect for. Today I’m gonna talk about geo-coding the data I’m working with.
Geo-coding your Pandas data
So now that we have our data in a format that is more useful, it’s time to do something with it! What I’ve decided to do today is get the longitude and latitude for all of our addresses. Luckily this data has the address information in a format that is all ready to be geo-coded:
But let’s say you’re not so lucky and your data has the address broken up into multiple fields like so:
Once again, Pandas data cleaning to the rescue! We’ll clean this up by concatenating our fields:
df["Address"] = df["Address"] + "," + df["City"] + "," + df["State"] + "," + df["Country"]
And that would add all of those address columns (city, state, and country) into the main address field so that we are ready to geo-code it! You could drop the now extraneous city, state and country columns if you wanted to, using df.drop, which was covered in the previous post; for example, df.drop(columns=["City", "State", "Country"]) would work here.
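To make the concatenate-then-drop step concrete, here is a minimal sketch. The two sample rows are made up for illustration, and the only assumption about your data is that it has the Address, City, State and Country columns described above. It uses ", " rather than "," as the separator, which is purely cosmetic.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the post.
df = pd.DataFrame({
    "Address": ["3666 21st St", "735 Dolores St"],
    "City": ["San Francisco", "San Francisco"],
    "State": ["CA", "CA"],
    "Country": ["USA", "USA"],
})

# Join the four fields into one comma-separated address string per row.
df["Address"] = df["Address"] + ", " + df["City"] + ", " + df["State"] + ", " + df["Country"]

# Drop the columns that are now folded into Address.
df = df.drop(columns=["City", "State", "Country"])

print(df["Address"].tolist())
```

One thing to watch for: if any of the four fields is missing for a row, the concatenated Address for that row comes out missing too, so you may want to fill or filter those rows first.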
Now, let’s go back to our original data. Remember we are calling our DataFrame df, and it looks like this:
Since we are going to geo-code these addresses to get their longitude and latitude, we’ll need to have geopy installed (pip install geopy), and then we’ll import it and make an instance like this:
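from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="my-geocoding-app")  # placeholder name and user_agent; recent geopy versions expect a custom user_agent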
Next, let’s make a new column and cycle through our addresses to geo-code them. We’ll call this column Coordinates. We are gonna apply our geocoder instance to the address column to fill up our new Coordinates column:
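Here is a rough sketch of what filling the Coordinates column can look like. So that it runs without a network connection, the demo uses a stand-in geocoder (fake_geocode, with placeholder coordinates) that mimics how geopy's geocode method returns a location object or None. The helper names add_coordinates and make_nominatim_geocode, and the one-second delay, are my own choices for illustration rather than anything from the original post.

```python
from dataclasses import dataclass
from typing import Callable, Optional

import pandas as pd


@dataclass(frozen=True)
class FakeLocation:
    """Stand-in for geopy's Location, used only so the demo runs offline."""
    address: str
    latitude: float
    longitude: float


def fake_geocode(address: str) -> Optional[FakeLocation]:
    """Hypothetical offline lookup; returns None for unknown addresses."""
    known = {
        # Placeholder coordinates, not real geocoder output.
        "3666 21st St, San Francisco, CA, USA": (37.7563, -122.429),
    }
    coords = known.get(address)
    return FakeLocation(address, *coords) if coords else None


def make_nominatim_geocode(user_agent: str) -> Callable:
    """Build a rate-limited geocode function (needs geopy and network access)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # Space requests at least one second apart to be polite to the service.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def add_coordinates(df: pd.DataFrame, geocode: Callable) -> pd.DataFrame:
    """Return a copy of df with a Coordinates column produced by geocode."""
    out = df.copy()
    out["Coordinates"] = out["Address"].apply(geocode)
    return out


df = pd.DataFrame({"Address": ["3666 21st St, San Francisco, CA, USA", "not a real place"]})
df = add_coordinates(df, fake_geocode)
print(df)
```

With geopy installed, you would swap fake_geocode for something like make_nominatim_geocode("my-geocoding-app"). Every row then triggers a web request, so expect it to be slow on a large DataFrame.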
As you can see, our new Coordinates column is filled up with new info. While it might not look like the info we were after at first glance, our longitude and latitude info is in there! Let’s add a couple more columns, not surprisingly called Latitude and Longitude. We’ll apply a nifty lambda expression to populate our new columns with info it’s getting from our Coordinates column. Finally, if our lambda function can’t get the longitude and latitude data (because the geocoder returned None for that address), it’ll substitute None for the missing info.
A quick aside: If you do not understand lambda, you need to invest some more time learning about it. You are going to see it over and over and need to understand how lambda works.
df["Latitude"] = df["Coordinates"].apply(lambda x: x.latitude if x is not None else None)
df["Longitude"] = df["Coordinates"].apply(lambda x: x.longitude if x is not None else None)
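If the lambda is hard to read, it may help to see it next to an ordinary named function that does the same job. The sketch below is offline and self-contained: FakeLocation is a made-up stand-in for the location objects in the Coordinates column, and latitude_of is a hypothetical helper name.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class FakeLocation:
    """Stand-in for a geocoder result; only the two attributes we need."""
    latitude: float
    longitude: float


# One row that geocoded fine and one where the geocoder found nothing.
df = pd.DataFrame({"Coordinates": [FakeLocation(37.7563, -122.429), None]})


def latitude_of(location):
    """Named-function equivalent of the Latitude lambda."""
    if location is None:
        return None
    return location.latitude


# These two lines follow the same pattern; one uses the named function,
# the other an inline lambda.
df["Latitude"] = df["Coordinates"].apply(latitude_of)
df["Longitude"] = df["Coordinates"].apply(lambda loc: loc.longitude if loc is not None else None)

print(df[["Latitude", "Longitude"]])
```

A lambda is just a small function written inline without a name, which is handy when you only need it once inside apply.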
You can see that our newly created columns are now filled with the longitude and latitude corresponding to the addresses. Pretty cool, huh? If you send Google those numbers, separated by a comma, you’ll see a point on a map… I’ll let you take it from there. I am sure that you can imagine a number of useful applications.
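As a small illustration of the "separated by a comma" idea, this sketch builds a "latitude,longitude" string for every row that has coordinates. The sample values are placeholders, and the six-decimal formatting is just one reasonable choice.

```python
import pandas as pd

# Placeholder values; the second row stands for an address that failed to geocode.
df = pd.DataFrame({
    "Latitude": [37.7563, None],
    "Longitude": [-122.429, None],
})

# Keep only rows where both coordinates are present.
located = df.dropna(subset=["Latitude", "Longitude"])

# Build one "lat,lon" string per remaining row.
queries = located.apply(
    lambda row: f"{row['Latitude']:.6f},{row['Longitude']:.6f}", axis=1
)

print(queries.tolist())
```

Each resulting string can be pasted into a map search box as-is.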
There you go, a three-part introduction to Pandas. Pandas is vast and powerful, and these posts are in no way meant to cover everything it can do. I hope that someone finds them useful as a way to get started. Let me know in the comments section what you are doing with Pandas or if you have any questions about these posts.