Green Taxi Exploratory Data Analysis
In this assignment, I perform exploratory data analysis on Green Taxi pickup data to examine three things: how fares behave on longer trips, how many passengers typically travel together on one trip, and which vendor serves shorter and longer trips respectively.
I collected the data from publicly accessible sources such as www.data.gov.com.
Loading Packages
Let us load the packages needed for visualization and exploratory analysis.
I imported the pandas library to read the dataset from formats like CSV. NumPy supports scientific functions such as linear algebra, so I imported that library too. The seaborn library is used for data visualization.
Loading Data
Let us load the CSV as a DataFrame and check the structure of the dataset.
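A minimal sketch of this loading step is shown here. The in-memory sample and its column names (VendorID, pickup_datetime, dropoff_datetime, passenger_count, trip_distance, total_amount, store_and_fwd_flag) are assumptions standing in for the real file, whose exact headers may differ.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file; the column names
# are assumptions, not the actual headers of the downloaded data.
sample_csv = io.StringIO(
    "VendorID,pickup_datetime,dropoff_datetime,passenger_count,"
    "trip_distance,total_amount,store_and_fwd_flag\n"
    "1,2016-01-01 00:05:00,2016-01-01 00:15:00,1,1.2,8.5,N\n"
    "2,2016-01-01 00:10:00,2016-01-01 01:00:00,2,9.8,38.0,N\n"
    "2,2016-01-01 00:20:00,2016-01-01 00:32:00,1,2.1,11.3,Y\n"
)

# With a real file, the first argument would be the path to the CSV.
df = pd.read_csv(
    sample_csv,
    parse_dates=["pickup_datetime", "dropoff_datetime"],
)

# Check the structure: size, column types, and the first rows.
print(df.shape)
print(df.dtypes)
print(df.head())
```

Parsing the timestamp columns at load time makes it possible to compute trip durations later by simple subtraction.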
Let us look at the statistical details of the data.
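One way to get these statistics is with `describe()`, sketched below on a small hypothetical frame. The values and column names are made up for illustration and are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical rows; the real analysis would use the loaded taxi data.
df = pd.DataFrame(
    {
        "passenger_count": [1, 1, 2, 1, 5],
        "trip_distance": [1.2, 0.8, 9.8, 2.1, 3.4],
        "total_amount": [8.5, 6.0, 38.0, 11.3, 15.8],
    }
)

# Count, mean, standard deviation, min, quartiles and max per numeric column.
summary = df.describe()
print(summary)

# Number of missing values per column, useful before plotting distributions.
missing = df.isna().sum()
print(missing)
```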
Let us have a look at the distribution of various variables in the data set.
(i) Passenger Count
Here we see that most cabs carry 1 or 2 passengers. So we can conclude that people mostly travel alone or in pairs; large groups travelling together are rare.
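The passenger count observation can be checked numerically as well as visually. The sketch below uses a hypothetical series of counts; with the real data, the series would be the passenger count column of the loaded DataFrame.

```python
import pandas as pd

# Hypothetical passenger counts for ten trips.
passenger_count = pd.Series([1, 1, 2, 1, 1, 2, 5, 1, 3, 1], name="passenger_count")

# Number of trips for each passenger count, ordered by count value.
counts = passenger_count.value_counts().sort_index()
print(counts)

# Share of trips that carry one or two passengers.
share_1_or_2 = passenger_count.isin([1, 2]).mean()
print(f"Share of trips with 1 or 2 passengers: {share_1_or_2:.0%}")
```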
(ii) Fare price (total amount charged relative to trip distance)
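A rough way to relate fare to distance is to group trips into distance bands and compare the average total amount per band. The data, the column names, the band edges (2 and 5) and the assumption that distance is in miles are all hypothetical choices made for this sketch.

```python
import pandas as pd

# Hypothetical trips; distances are assumed to be in miles.
df = pd.DataFrame(
    {
        "trip_distance": [0.8, 1.2, 2.1, 3.4, 9.8, 12.5],
        "total_amount": [6.0, 8.5, 11.3, 15.8, 38.0, 46.2],
    }
)

# Drop zero-distance records so the per-distance ratio is well defined.
df = df[df["trip_distance"] > 0].copy()
df["fare_per_distance"] = df["total_amount"] / df["trip_distance"]

# Band edges are illustrative only: (0, 2], (2, 5], (5, inf).
df["distance_band"] = pd.cut(
    df["trip_distance"],
    bins=[0, 2, 5, float("inf")],
    labels=["short", "medium", "long"],
)

# Average total fare in each distance band.
mean_fare = df.groupby("distance_band", observed=True)["total_amount"].mean()
print(mean_fare)

# Linear correlation between distance and total amount.
corr = df["trip_distance"].corr(df["total_amount"])
print(f"Correlation: {corr:.2f}")
```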
Vendor availability for Shorter and Longer trip duration respectively
Here we see that vendor 1 mostly provides short-duration trips, while vendor 2 provides cabs for both short and long trips.
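The vendor comparison can be sketched as a cross-tabulation of vendor against a short/long trip label. The sample rows, the column names and the 30-minute cutoff between short and long trips are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical trips for two vendors.
df = pd.DataFrame(
    {
        "VendorID": [1, 1, 1, 2, 2, 2],
        "pickup_datetime": pd.to_datetime(
            ["2016-01-01 08:00"] * 6
        ),
        "dropoff_datetime": pd.to_datetime(
            [
                "2016-01-01 08:10",
                "2016-01-01 08:12",
                "2016-01-01 08:08",
                "2016-01-01 08:15",
                "2016-01-01 08:50",
                "2016-01-01 09:10",
            ]
        ),
    }
)

# Trip duration in minutes from the two timestamps.
elapsed = df["dropoff_datetime"] - df["pickup_datetime"]
df["duration_min"] = elapsed.dt.total_seconds() / 60

# Assumed cutoff: trips over 30 minutes count as long.
df["trip_type"] = (df["duration_min"] > 30).map({True: "long", False: "short"})

# Number of short and long trips per vendor.
table = pd.crosstab(df["VendorID"], df["trip_type"])
print(table)
```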
The relationship between the store-and-forward flag and trip duration
Thus we see that the store-and-forward flag was set only for short-duration trips; for long-duration trips it was never set.
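A simple check of this relationship is to summarize trip duration separately for flagged and unflagged trips. The sketch below assumes the flag takes the values "Y" and "N" and that a duration column in minutes has already been computed; the rows are hypothetical.

```python
import pandas as pd

# Hypothetical trips with a precomputed duration in minutes.
df = pd.DataFrame(
    {
        "store_and_fwd_flag": ["N", "Y", "N", "N", "Y", "N"],
        "duration_min": [10, 12, 50, 70, 8, 15],
    }
)

# Count, longest and median duration for each flag value.
stats = df.groupby("store_and_fwd_flag")["duration_min"].agg(
    ["count", "max", "median"]
)
print(stats)
```

If the maximum duration among flagged trips is far below that of unflagged trips, as in this made-up sample, it supports the observation that the flag appears only on short trips.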
Conclusion about the data set:
- Most trips carry only 1 or 2 passengers travelling together on one fare.
- The fare is higher for longer-distance trips.
- The longer trips are mostly provided by vendor 2.
- The store-and-forward flag was set only for short-duration trips and not for long-duration trips.