Pandas for Python
In my continuing education / love affair with Python, I have moved past the basics. I have started exploring the rich Python ecosystem, and my latest fascination is Pandas.
What is Pandas for Python?
Pandas for Python helps coders with data manipulation and analysis. It offers data structures and operations for manipulating numerical tables and time series. It’s free software released under a BSD license.
Much like how Python has nothing to do with snakes, the Pandas name doesn’t come from the cuddly-looking Asian bear. It is derived from the term “panel data”, an econometrics term for data sets that include both time-series and cross-sectional data.
Pandas is easy to get started with. I’m a pip guy, so for me it was as simple as: pip install pandas. This downloads and installs Pandas and the other packages Pandas needs, like NumPy. Pandas creates and uses DataFrames, and the simplest version looks something like this:
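A minimal sketch of that, assuming Pandas is imported under its full name `pandas` (as the later examples in this post do) rather than the common `pd` alias:

```python
import pandas

# Calling the constructor with no arguments gives a DataFrame
# with no rows and no columns.
df_tk = pandas.DataFrame()

print(df_tk.empty)  # True
print(df_tk.shape)  # (0, 0)
```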
That will create an empty DataFrame assigned to a variable called df_tk. Since an empty DataFrame is fairly useless, you can add data from a file, a stream, etc. You can also add data manually with a list of lists or a dictionary. Here is a simple version using the list of lists idea:
df_tk = pandas.DataFrame([[3,24,68],[3,1,3]])
That will give you a DataFrame (think of it kinda like a virtual spreadsheet) with 2 rows and 3 columns.
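The dictionary route mentioned earlier might look roughly like this. It is only a sketch: the column names "a", "b" and "c" and the variable name df_dict are made up for illustration. Each key becomes a column label, and each list supplies that column's values from top to bottom.

```python
import pandas

# Hypothetical column names; same numbers as the list-of-lists example,
# but supplied column by column instead of row by row.
df_dict = pandas.DataFrame({"a": [3, 3], "b": [24, 1], "c": [68, 3]})

print(df_dict.shape)          # (2, 3)
print(list(df_dict.columns))  # ['a', 'b', 'c']
```

Notice the data is laid out by column here, whereas the list of lists is laid out by row.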
If you wanted to give your rows and columns names, you could do so thusly:
df_tk = pandas.DataFrame([[2, 4, 6], [10, 20, 30]], columns=["ColI", "ColII", "ColIII"], index=["Row1", "Row2"])
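Once the rows and columns have names, you can use those names to pull data back out. A small sketch, assuming the same labeled DataFrame as above (the variable names cell and col are just for illustration):

```python
import pandas

df_tk = pandas.DataFrame(
    [[2, 4, 6], [10, 20, 30]],
    columns=["ColI", "ColII", "ColIII"],
    index=["Row1", "Row2"],
)

# A single cell, looked up by row label and column label
cell = df_tk.loc["Row2", "ColII"]

# A whole column, which comes back as a pandas Series
col = df_tk["ColIII"]

print(cell)          # 20
print(col.tolist())  # [6, 30]
```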
The real magic starts when you start applying functions to your DataFrame. For a simple example:
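One possible illustration, assuming the labeled df_tk from above. The particular functions used here (sum, mean, and apply with a lambda) are just my picks to show the idea:

```python
import pandas

df_tk = pandas.DataFrame(
    [[2, 4, 6], [10, 20, 30]],
    columns=["ColI", "ColII", "ColIII"],
    index=["Row1", "Row2"],
)

# Sum down each column: ColI 12, ColII 24, ColIII 36
col_totals = df_tk.sum()

# Average across each row: Row1 4.0, Row2 20.0
row_means = df_tk.mean(axis=1)

# Apply your own function to every column; here, double each value
doubled = df_tk.apply(lambda col: col * 2)

print(col_totals)
print(row_means)
print(doubled)
```

Each call returns a new object and leaves df_tk itself unchanged.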
Hopefully this gives you a quick little intro to Pandas. If you are Python inclined, I wholeheartedly recommend installing it and giving it a spin. It scratches a very particular itch, but it does it very well. I’ll post a follow-up Pandas post that explores it a little more in depth before the end of the week.