Using Matplotlib and Pandas to Analyze a Region’s Economic Activity

Applied Plotting, Charting and Data Representation using Python

Today we will use two popular Python libraries, Matplotlib and Pandas, to analyze how a region’s GDP compares to that of its neighboring areas over a specific timeframe.

Specifically, we will compare Michigan’s GDP against its neighboring states’ GDP over the last 23 years.

For this project we will use Pandas and Matplotlib to analyze, compare and ultimately visualize Michigan’s GDP against the respective GDPs of Ohio, Indiana and Wisconsin.

To do this we will start by looking up publicly accessible datasets. These could be links to files such as CSV or Excel files, or links to websites that present data in tabular form, such as Wikipedia pages. In this case we will be using data from the Federal Reserve Bank of St. Louis’s Federal Reserve Economic Data (FRED).

FRED is an online database consisting of hundreds of thousands of economic data time series from scores of national, international, public, and private sources.

We will start by looking up the GDP datasets for our particular states. Typing a state’s name followed by the acronym “GDP” (e.g. “Michigan GDP”) into FRED’s search bar should take us to that dataset’s webpage. We will download all four states’ data as separate CSV files and save them to our computer for further processing.

https://fred.stlouisfed.org/series/MINGSP

https://fred.stlouisfed.org/series/OHNGSP

https://fred.stlouisfed.org/series/INNGSP

https://fred.stlouisfed.org/series/WINGSP

Now let’s get into the coding part of our project. We will start by opening our favorite Python interpreter and importing our modules.

Next, we will import our datasets using Pandas’ read_csv function and assign each one to a variable named after its state.
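As a minimal sketch of this step, the snippet below reads one FRED-style file into a dataframe. The in-memory text stands in for a downloaded file so the example runs on its own, its numbers are made-up placeholders rather than real GDP figures, and the file name in the comment is only an assumption about how the download might be saved.

```python
import io

import pandas as pd

# Stand-in for a downloaded FRED file. The values are made-up placeholders,
# NOT real GDP figures; the column layout (DATE + series ID) is assumed.
michigan_file = io.StringIO(
    "DATE,MINGSP\n"
    "1997-01-01,100.0\n"
    "1998-01-01,110.0\n"
)

# With a real download we would pass a path instead,
# e.g. pd.read_csv("MINGSP.csv") (hypothetical file name).
michigan = pd.read_csv(michigan_file)

print(michigan.head())
```

The same call would be repeated for Ohio, Indiana and Wisconsin, giving us four dataframes that share a “DATE” column.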

We will then add our datasets to a list of dataframes (13) and perform an outer merge on our “DATE” column, using Pandas’ pd.merge function, in order to combine them into a single dataset (midwest). (14)
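One way to sketch this merge is to fold pd.merge over the list with functools.reduce. The use of reduce is an assumption on our part (any loop over the list would work as well), and the small dataframes below hold made-up placeholder values so the example is self-contained.

```python
from functools import reduce

import pandas as pd

# Placeholder frames with made-up numbers, standing in for the four downloads.
michigan = pd.DataFrame({"DATE": ["1997-01-01", "1998-01-01"], "MINGSP": [1.0, 2.0]})
ohio = pd.DataFrame({"DATE": ["1997-01-01", "1998-01-01"], "OHNGSP": [3.0, 4.0]})
indiana = pd.DataFrame({"DATE": ["1997-01-01", "1998-01-01"], "INNGSP": [5.0, 6.0]})
# Wisconsin has one extra date to show what an outer merge keeps.
wisconsin = pd.DataFrame(
    {"DATE": ["1997-01-01", "1998-01-01", "1999-01-01"], "WINGSP": [7.0, 8.0, 9.0]}
)

frames = [michigan, ohio, indiana, wisconsin]

# Merge the frames pairwise on DATE; how="outer" keeps every date that
# appears in any frame and fills the gaps with NaN.
midwest = reduce(
    lambda left, right: pd.merge(left, right, on="DATE", how="outer"),
    frames,
)

print(midwest)
```

An outer merge is a cautious choice here: if one state’s series starts or ends in a different year, those rows are kept rather than silently dropped.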

We will now label our columns to better understand our data. (16–17) Afterwards we will format our “DATE” column so that it only shows the year in which each observation was taken. (19–20)

On our last preprocessing line we will add an “expected value” (MEAN) column, holding the average GDP of the four states for each year, which gives us a baseline to compare each state against. (21)
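The three preprocessing steps above can be sketched as follows. The new column names, the use of pd.to_datetime to extract the year, and the row-wise mean are our assumptions about how these steps might be written, and the numbers are made-up placeholders.

```python
import pandas as pd

# Placeholder merged dataset with made-up values.
midwest = pd.DataFrame(
    {
        "DATE": ["1997-01-01", "1998-01-01"],
        "MINGSP": [10.0, 20.0],
        "OHNGSP": [20.0, 30.0],
        "INNGSP": [30.0, 40.0],
        "WINGSP": [40.0, 50.0],
    }
)

# Replace the FRED series IDs with readable state names (assumed labels).
states = ["Michigan", "Ohio", "Indiana", "Wisconsin"]
midwest.columns = ["DATE"] + states

# Keep only the year of each observation.
midwest["DATE"] = pd.to_datetime(midwest["DATE"]).dt.year

# Average across the four state columns for every year (row-wise mean).
midwest["MEAN"] = midwest[states].mean(axis=1)

print(midwest)
```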

Now, onto the Data Visualization part of our project. We will start by creating our plot using the “Accent” colormap. We will define an adequate figure size and font size for our chart. (25)

We will then set our chart’s title, X label and Y label using Matplotlib’s set_ functions. We will also set the font size for each individual label within these functions. (26–28)
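A minimal sketch of the plotting and labeling steps is shown here, assuming the dataframe is plotted through Pandas’ DataFrame.plot. The figure size, font sizes and label texts are illustrative choices of ours, and the data is again made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Placeholder preprocessed dataset with made-up values.
midwest = pd.DataFrame(
    {
        "DATE": [1997, 1998, 1999],
        "Michigan": [10.0, 12.0, 11.0],
        "Ohio": [20.0, 21.0, 23.0],
        "Indiana": [30.0, 31.0, 33.0],
        "Wisconsin": [40.0, 42.0, 44.0],
        "MEAN": [25.0, 26.5, 27.75],
    }
)

# One line per column, colored with the "Accent" colormap.
# figsize and fontsize are assumed values; adjust to taste.
ax = midwest.plot(x="DATE", colormap="Accent", figsize=(12, 7), fontsize=12)

# Title and axis labels, each with its own font size (placeholder texts).
ax.set_title("GDP of Michigan and its neighboring states", fontsize=16)
ax.set_xlabel("Year", fontsize=14)
ax.set_ylabel("GDP", fontsize=14)
```

When working interactively, the matplotlib.use("Agg") line can be dropped and the chart displayed with plt.show() from matplotlib.pyplot.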

Lastly, we will work on the visual aspect of our chart. We will start by turning on our chart’s minor ticks for the X and Y axis. (31)

Afterwards we will delete both the top and right spines in order to de-clutter our chart. (32–33)

On our last line of code, we will plot a legend for our chart in order to identify our different datasets more easily. (34)
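These finishing touches can be sketched on a small stand-alone chart. The two plotted lines use made-up values, and the legend position (anchored just outside the right edge of the axes) is an assumed way of keeping it clear of the data.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Small placeholder chart with made-up values.
fig, ax = plt.subplots(figsize=(12, 7))
years = [1997, 1998, 1999]
ax.plot(years, [10.0, 12.0, 11.0], label="Michigan")
ax.plot(years, [20.0, 21.0, 23.0], label="Ohio")

# Turn on minor ticks for both axes.
ax.minorticks_on()

# Hide the top and right spines to de-clutter the chart.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)

# Place the legend outside the plotting area, to the right of the axes.
ax.legend(loc="center left", bbox_to_anchor=(1.0, 0.5))
```

If the chart is saved to disk, passing bbox_inches="tight" to fig.savefig should keep the outside legend from being cropped.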

After running our code we should end up with our chart looking something like this.

Conclusion

The plot shows the GDP of Michigan and its neighboring states over the last 23 years. A dip in GDP can be seen in all states around the period of the 2009 Global Financial Crisis, with Michigan being the most affected state, as well as the last to recover from the recession. This suggests that variables such as unemployment and consumer spending were much more affected in Michigan than in its neighboring states.

The visual was kept simple so that the data is easy to read. Most gridlines and ticks were removed because they were unnecessary, but the minor X ticks for the years were left in, as they actually helped in visualizing the data. The top and right borders were also removed in order to de-junkify the graph. The legend was moved outside of the plot so that it does not interfere with the actual data.

A note from the author...

Congratulations on getting to the end of this project! I hope I managed to explain the process in a concise way. Pandas and Matplotlib have been some of my most used libraries for Data Science, and I hope you will find them useful as well. Please don’t hesitate to follow me on Medium, as I will be posting more articles related to my Data Science journey in the following weeks.

My name is Fernando Suárez, an aspiring Data Scientist from Guatemala. I like to write about basic data science concepts and play with different algorithms and Data Science tools. You can connect with me on LinkedIn.

I’m an Industrial Engineer from Guatemala. Join me on my Data Science journey!