
Overview

Jupyter Notebook for Case Study 2

Welcome To The Data Analysis Process - Case Study 2

In this second case study, you’ll be analyzing fuel economy data provided by the EPA, or Environmental Protection Agency.

What is Fuel Economy?

Excerpt from Wikipedia page on Fuel Economy in Automobiles:

The fuel economy of an automobile is the fuel efficiency relationship between the distance traveled and the amount of fuel consumed by the vehicle. Consumption can be expressed in terms of volume of fuel to travel a distance, or the distance travelled per unit volume of fuel consumed.
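As a small illustration of the two ways of expressing consumption, the sketch below converts a distance-per-volume figure (miles per US gallon) into a volume-per-distance figure (litres per 100 km). The function name `mpg_to_l_per_100km` is made up for this example and is not part of the case study notebooks.

```python
# Unit definitions (both are exact by definition).
LITRES_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344


def mpg_to_l_per_100km(mpg):
    """Convert miles per US gallon to litres per 100 km (hypothetical helper)."""
    km_per_litre = mpg * KM_PER_MILE / LITRES_PER_US_GALLON
    return 100 / km_per_litre


print(round(mpg_to_l_per_100km(30), 2))  # 7.84
```

Note that the two measures move in opposite directions: a higher mpg value means a lower litres-per-100-km value.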

Data Overview

Data Source

Below are the web pages from this video. Note that the datasets we’ll be working with are slightly simpler than those found here.

Types of Merges

So far, we’ve learned about appending dataframes. Now we’ll learn about pandas merges, a different way of combining dataframes. A merge is similar to a database-style “join.” If you’re familiar with SQL, a comparison with SQL may help you connect the two.

Here are the four types of merges in pandas. Below, “key” refers to common columns in both dataframes that we’re joining on.

  1. Inner Join - Use intersection of keys from both frames.
  2. Outer Join - Use union of keys from both frames.
  3. Left Join - Use keys from left frame only.
  4. Right Join - Use keys from right frame only.
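The four merge types listed above can be compared on a pair of toy dataframes. This is a minimal sketch: the frames, the `model` key, and the `mpg_2008`/`mpg_2018` columns are made-up values for illustration, not the EPA data.

```python
import pandas as pd

# Hypothetical toy frames; "model" is the key column shared by both.
left = pd.DataFrame({"model": ["A", "B", "C"], "mpg_2008": [20, 25, 30]})
right = pd.DataFrame({"model": ["B", "C", "D"], "mpg_2018": [27, 33, 40]})

# Record which key values survive each merge type.
keys = {}
for how in ["inner", "outer", "left", "right"]:
    merged = left.merge(right, on="model", how=how)
    keys[how] = sorted(merged["model"])
    print(how, keys[how])

# inner ['B', 'C']
# outer ['A', 'B', 'C', 'D']
# left ['A', 'B', 'C']
# right ['B', 'C', 'D']
```

In the outer, left, and right merges, rows whose key is missing from one frame get NaN in that frame's columns.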

Below are diagrams to visualize each type.

[Figure: diagrams of the inner, outer, left, and right joins]

Jupyter Notebooks

  1. Assessing

  2. Cleaning Column Labels - Drop extraneous columns and standardize all column labels, e.g., lowercase them and replace spaces with underscores.

  3. Filter, Drop Nulls, Dedupe

  4. Inspect Data Types

  5. Fixing Data Types Pt 1 - cyl

  6. Fixing Data Types Pt 2 - air_pollution_score - split each row containing string values into two rows, appended the new rows to the original dataframe, then converted the values to ints. Used the pandas apply function.

  7. Fixing Data Types Pt 3 - city_mpg, hwy_mpg, cmb_mpg - used a loop to correct the data types. Result: final clean data.

  8. Exploring with Visuals - used scatter plots and histograms.

  9. Drawing Conclusions - used scatter plots and histograms. Udacity’s Solution

  10. Merging Datasets - performed an inner merge on car model between the 2008 and 2018 datasets.

  11. Results Merge - used idxmax() or dataframe filtering to find the most improved vehicle.
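The cleaning and merging steps in the notebooks above can be sketched roughly as follows. This is not the notebooks' actual code. The miniature datasets, the helper names `clean_labels` and `split_slash_rows`, and the assumption that the string values are two numbers separated by "/" are all illustrative. It also uses `pd.concat` to append rows and averages `cmb_mpg` per model before merging, both of which are choices made for this sketch.

```python
import pandas as pd

# Hypothetical miniature datasets standing in for the 2008 and 2018 files.
df_08 = pd.DataFrame({
    "Model": ["CAR A", "CAR B", "CAR C"],
    "Air Pollution Score": ["6", "7", "9/8"],  # assumed "a/b" string format
    "Cmb MPG": ["19", "29", "46/40"],
})
df_18 = pd.DataFrame({
    "Model": ["CAR A", "CAR B", "CAR D"],
    "Air Pollution Score": [7, 7, 5],
    "Cmb MPG": [25.0, 30.0, 22.0],
})


def clean_labels(df):
    """Lowercase column labels and replace spaces with underscores."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))


def split_slash_rows(df, columns):
    """Turn each row holding 'a/b' values into two rows, one per value."""
    mask = df[columns[0]].str.contains("/", regex=False)
    first, second = df[mask].copy(), df[mask].copy()
    for col in columns:
        first[col] = first[col].apply(lambda v: v.split("/")[0])
        second[col] = second[col].apply(lambda v: v.split("/")[1])
    # Keep the untouched rows and append the two new sets of rows.
    return pd.concat([df[~mask], first, second], ignore_index=True)


df_08, df_18 = clean_labels(df_08), clean_labels(df_18)
df_08 = split_slash_rows(df_08, ["air_pollution_score", "cmb_mpg"])

# Fix data types once every value is a single number.
df_08["air_pollution_score"] = df_08["air_pollution_score"].astype(int)
for col in ["cmb_mpg"]:
    df_08[col] = df_08[col].astype(float)

# Average per model, then inner-merge so only models in both years remain.
avg_08 = df_08.groupby("model")["cmb_mpg"].mean().reset_index()
avg_18 = df_18.groupby("model")["cmb_mpg"].mean().reset_index()
combined = pd.merge(avg_08, avg_18, on="model", how="inner",
                    suffixes=("_2008", "_2018"))
combined["mpg_change"] = combined["cmb_mpg_2018"] - combined["cmb_mpg_2008"]

# idxmax() returns the row label of the largest improvement.
most_improved = combined.loc[combined["mpg_change"].idxmax(), "model"]
print(most_improved)  # CAR A
```

Filtering with `combined[combined["mpg_change"] == combined["mpg_change"].max()]` would give the same row here and also returns ties, which `idxmax()` does not.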
