Airline On-Time Performance
Objective
The objective is to analyze airline data and provide airline on-time performance statistics to the end user using R programming.
Project Overview
Airline on-time performance refers to the rate at which an airline's services run according to schedule. Airline delay is one of the most important issues in the airline industry because it can lead to financial losses for airline owners. This project analyzes airline data to provide the necessary statistics related to airline on-time performance. One research study shows that nearly 20% of flights are delayed or cancelled every year. These delays and cancellations are a major problem for the airline industry, harming both service and business, and they significantly affect both travellers and airlines.
The project focuses on extracting airline on-time performance statistics from historical airline data using R programming. Factors such as weather, scheduling issues, and late passenger arrival cause airline delays. Airline on-time performance is measured by the following formula.
On-Time Performance = (Number of On-Time Services / Total Number of Services) * 100%
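As a minimal sketch, the formula can be written as a small R function. The function name `on_time_performance` and the input figures are hypothetical and used only for illustration; they are not taken from the project data.

```r
# On-time performance as a percentage of all services.
# Both arguments are counts; the function name is illustrative only.
on_time_performance <- function(on_time_services, total_services) {
  stopifnot(total_services > 0, on_time_services <= total_services)
  (on_time_services / total_services) * 100
}

# Hypothetical figures: 80 on-time services out of 100 in total.
otp <- on_time_performance(80, 100)
print(otp)
```

With these made-up counts, 80 of 100 services are on time, so the function returns 80 percent.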
Proposed System
The proposed system concentrates on analyzing historical airline data to provide important and interesting statistics related to airline on-time performance. The proposed system architecture is shown in the figure.
Figure: Proposed System Architecture
Module 1: Data Collection
The required data set, the US Department of Transportation airline on-time performance data, is collected from the web. The attributes of the data set are origin, destination, date, early time and late time.
Module 2: Data Preparation
The collected raw data set is loaded into a MySQL database through R integration. The raw data is susceptible to missing and noisy values, so preprocessing techniques such as data cleaning are applied to the data set to replace missing values and smooth noisy data.
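The cleaning step could look roughly like the base R sketch below. It is only one possible approach: the source does not say which imputation or smoothing method is used, so median replacement and percentile capping are assumptions, as are the toy data frame and the column names `origin` and `arr_delay`. Loading from MySQL is left out so the example runs on its own.

```r
# Toy data standing in for the raw table; column names are hypothetical.
flights <- data.frame(
  origin    = c("JFK", "JFK", "LAX", "ORD", "LAX"),
  arr_delay = c(5, NA, 12, 900, -3)
)

# Replace missing values with the column median (an assumed imputation choice).
clean_missing <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Smooth noisy values by capping them at chosen percentile bounds
# (an assumed smoothing choice, sometimes called winsorizing).
cap_outliers <- function(x, lower = 0.05, upper = 0.95) {
  bounds <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  pmin(pmax(x, bounds[1]), bounds[2])
}

flights$arr_delay <- cap_outliers(clean_missing(flights$arr_delay))
print(flights)
```

Imputing before capping keeps the percentile bounds based on a complete column. Other orderings or methods may suit the real data better.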
Module 3: Statistics
The preprocessed data set is processed in R to identify the important statistics. The R packages dplyr and ggplot2 are used here to generate the necessary statistics.
The statistics answer the following:
- Number of airlines from the same origin
- Number of airlines to the same destination
- Arrival delay reasons
  - Late Aircraft
  - Weather
  - Security
  - Carrier
  - National Aviation System
- Cancellations
  - Weather
  - Carrier
  - National Aviation System
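A few of the statistics listed above can be sketched with dplyr as follows. The toy data frame and its column names (`Origin`, `Dest`, `WeatherDelay`, `CarrierDelay`, `Cancelled`) are assumptions chosen for illustration; the real data set may name or encode these fields differently.

```r
library(dplyr)

# Toy data; column names and values are hypothetical.
flights <- data.frame(
  Origin       = c("JFK", "JFK", "LAX", "ORD", "LAX", "JFK"),
  Dest         = c("LAX", "ORD", "JFK", "JFK", "ORD", "LAX"),
  WeatherDelay = c(0, 20, 0, 0, 5, 0),
  CarrierDelay = c(10, 0, 0, 30, 0, 0),
  Cancelled    = c(0, 0, 1, 0, 0, 0)
)

# Number of records departing from each origin and arriving at each destination.
flights_per_origin <- flights %>% count(Origin, name = "flights")
flights_per_dest   <- flights %>% count(Dest, name = "flights")

# Total delay minutes attributed to two of the listed causes.
delay_minutes <- flights %>%
  summarise(
    weather = sum(WeatherDelay),
    carrier = sum(CarrierDelay)
  )

# Number of cancelled records (Cancelled is assumed to be a 0/1 flag).
cancellations <- sum(flights$Cancelled)

print(flights_per_origin)
print(delay_minutes)
```

The remaining delay causes and the cancellation reasons would follow the same `count()` and `summarise()` pattern once the matching columns are identified.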
Module 4: Data Visualization
The extracted statistics and information are visualized using the R packages dplyr and ggplot2.
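As an illustration of this step, the sketch below draws a bar chart of delay minutes by cause with ggplot2. The totals are invented placeholder values, not results from the project, and the data frame name `delay_by_cause` is hypothetical.

```r
library(ggplot2)

# Hypothetical delay totals per cause; the numbers are placeholders.
delay_by_cause <- data.frame(
  cause   = c("Late Aircraft", "Weather", "Security", "Carrier",
              "National Aviation System"),
  minutes = c(120, 45, 5, 90, 60)
)

# Bar chart with causes ordered from largest to smallest total delay.
p <- ggplot(delay_by_cause, aes(x = reorder(cause, -minutes), y = minutes)) +
  geom_col() +
  labs(
    x = "Delay cause",
    y = "Total delay (minutes)",
    title = "Arrival delay by cause (hypothetical data)"
  )

print(p)
```

Ordering the bars by size makes the dominant delay cause easy to spot, which suits the decision-making use described for business owners.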
Benefits
- This project identifies the interesting factors behind airline on-time performance, so business owners can use the statistics to make better decisions in the future and understand the business thoroughly.
- Travelers can choose a suitable airline based on the airline on-time performance statistics.
Software Requirements
- Windows
- MySQL
- R
Hardware Requirements
- Hard Disk – 500 GB or Above
- RAM required – 4 GB or Above
- Processor – Core i3 or Above
Technology Used
- Statistics
- Business Intelligence