Overview
Jupyter Notebook for Case Study 2
Welcome To The Data Analysis Process - Case Study 2
In this second case study, you’ll be analyzing fuel economy data provided by the EPA, or Environmental Protection Agency.
What is Fuel Economy?
Excerpt from Wikipedia page on Fuel Economy in Automobiles:
The fuel economy of an automobile is the fuel efficiency relationship between the distance traveled and the amount of fuel consumed by the vehicle. Consumption can be expressed in terms of volume of fuel to travel a distance, or the distance travelled per unit volume of fuel consumed.
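The two ways of expressing consumption mentioned in the excerpt are reciprocals of each other, up to unit conversion. As a minimal sketch, the function below (a hypothetical helper, not part of the case study) converts miles per US gallon into liters per 100 km, assuming US gallons rather than imperial gallons.

```python
# Conversion factors (both are exact by definition).
LITERS_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344


def mpg_to_l_per_100km(mpg: float) -> float:
    """Convert distance per volume (mpg) to volume per distance (L/100 km)."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    return 100 * LITERS_PER_US_GALLON / (mpg * KM_PER_MILE)


# A higher mpg value means a lower L/100 km value, and vice versa.
print(round(mpg_to_l_per_100km(30), 2))  # 7.84
```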
Data Overview
Data Source
Below are the web pages from this video. Note that the datasets we’ll be working with are slightly simpler than those found here.
Types of Merges
So far, we’ve learned about appending dataframes. Now we’ll learn about pandas merges, a different way of combining dataframes that works like a database-style “join.” If you’re familiar with SQL, comparing pandas merges with SQL joins may help you connect the two.
Here are the four types of merges in pandas. Below, “key” refers to common columns in both dataframes that we’re joining on.
- Inner Join - Use intersection of keys from both frames.
- Outer Join - Use union of keys from both frames.
- Left Join - Use keys from left frame only.
- Right Join - Use keys from right frame only.
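The four merge types listed above can be tried on two small dataframes. This is only a sketch: the frames, the `model` key, and the mpg values are made up for illustration and are not taken from the EPA data.

```python
import pandas as pd

# Two made-up frames that share the key column "model".
left = pd.DataFrame({"model": ["A", "B", "C"], "mpg_2008": [20, 25, 30]})
right = pd.DataFrame({"model": ["B", "C", "D"], "mpg_2018": [27, 33, 40]})

# Run each merge type on the same key and keep the results by name.
merged = {
    how: left.merge(right, on="model", how=how)
    for how in ["inner", "outer", "left", "right"]
}

# Show which keys survive each merge type.
for how, df in merged.items():
    print(how, sorted(df["model"]))
```

Keys missing from one side (here `A` and `D`) get `NaN` in the columns that come from the other frame, which is why the outer, left, and right results can contain missing values while the inner result cannot.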
Below are diagrams to visualize each type.
Jupyter Notebooks
- Cleaning Column Labels - drop extraneous columns and standardize all column labels, e.g. convert them to lowercase and replace spaces with underscores.
- Fixing Data Types Pt 2 - air_pollution_score - split each row with string values into two rows, append them to the original dataframe, then convert the values to ints. Used the pandas apply function.
- Fixing Data Types Pt 3 - city_mpg, hwy_mpg, cmb_mpg - used a loop to correct the data types. Result: final clean data.
- Exploring with Visuals - used scatter plots and histograms.
- Drawing Conclusions - used scatter plots and histograms. Udacity’s Solution
- Merging Datasets - performed an inner merge on the car models present in both 2008 and 2018.
- Results Merge - used idxmax() or filtered the dataframe to find the most improved vehicle.
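The cleaning and merging steps in the notebooks above can be sketched end to end on toy data. Everything here is an assumption made for illustration: the car names and values are invented, the "a/b" string format for rows that need splitting is assumed, and `clean_labels` and `split_rows` are hypothetical helpers rather than the notebooks' actual code. The sketch uses `pd.concat` in place of `DataFrame.append`, which was removed in pandas 2.0.

```python
import pandas as pd

# Made-up stand-ins for the 2008 and 2018 datasets (not real EPA values).
df_08 = pd.DataFrame({
    "Model": ["Car A", "Car B", "Car C"],
    "Air Pollution Score": ["6", "7", "6/4"],
    "Cmb MPG": ["19", "29", "21/15"],
})
df_18 = pd.DataFrame({
    "Model": ["Car A", "Car B", "Car D"],
    "Air Pollution Score": ["3", "7", "10"],
    "Cmb MPG": ["22", "36", "130"],
})


def clean_labels(df):
    """Lowercase column labels and replace spaces with underscores."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))


def split_rows(df, cols):
    """Split rows holding 'a/b' strings into two rows, one per value."""
    mask = df[cols[0]].str.contains("/")
    if not mask.any():
        return df.copy()
    first, second = df[mask].copy(), df[mask].copy()
    for col in cols:
        first[col] = first[col].apply(lambda x: x.split("/")[0])
        second[col] = second[col].apply(lambda x: x.split("/")[1])
    # Put the split rows back together with the untouched rows.
    return pd.concat([df[~mask], first, second], ignore_index=True)


cols = ["air_pollution_score", "cmb_mpg"]
df_08 = split_rows(clean_labels(df_08), cols)
df_18 = split_rows(clean_labels(df_18), cols)

# Loop over both frames to fix the data types.
for df in (df_08, df_18):
    df["air_pollution_score"] = df["air_pollution_score"].astype(int)
    df["cmb_mpg"] = df["cmb_mpg"].astype(float)

# Average mpg per model, then inner merge on models present in both years.
mpg_08 = df_08.groupby("model", as_index=False)["cmb_mpg"].mean()
mpg_18 = df_18.groupby("model", as_index=False)["cmb_mpg"].mean()
combined = mpg_08.merge(mpg_18, on="model", how="inner",
                        suffixes=("_2008", "_2018"))

# idxmax() returns the row label with the largest improvement.
combined["mpg_change"] = combined["cmb_mpg_2018"] - combined["cmb_mpg_2008"]
most_improved = combined.loc[combined["mpg_change"].idxmax(), "model"]
print(most_improved)
```

Filtering gives the same answer as `idxmax()`: `combined[combined["mpg_change"] == combined["mpg_change"].max()]` returns every row that ties for the largest change, whereas `idxmax()` returns only the first.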