Flight History Analysis Using Hadoop
Objective
- To analyze flight history data in order to identify the reasons for flight delays and for negative reviews by passengers.
Project Overview
Flight delays are an important issue in the airline industry because they can cause serious financial losses. This project identifies the factors that influence the occurrence of flight delays. Survey research indicates that about 20% of flights are delayed or cancelled every year, which is costly for both travelers and airlines.
The project analyzes flight history data gathered from an official web portal. The data maintained on the portal is large and grows every day, so big data analytics is a suitable way to analyze it and extract useful knowledge from the data set. Hadoop, MapReduce, the Hadoop Distributed File System (HDFS) and Hive are the big data technologies used in this project.
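To make the MapReduce idea concrete, the sketch below mimics the map, shuffle and reduce phases in memory with plain Python. It does not use the Hadoop API; the function names and the record keys (`airline`, `arrival_delay`) are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit one (airline, arrival_delay) pair per flight record."""
    yield record["airline"], record["arrival_delay"]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(airline, delays):
    """Reduce step: collapse one airline's delays into their average."""
    return airline, sum(delays) / len(delays)


def average_delay_by_airline(records):
    """Run the three phases in sequence over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

On a real cluster the same three steps would run in parallel over blocks of data stored in HDFS; here they run sequentially so the logic is easy to follow.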
Proposed System
The proposed Flight History Analysis Using Hadoop system concentrates on analyzing flight history data to identify the reasons for negative feedback from users and the reasons for flight delays. The proposed system architecture is shown in the figure.
Figure: Proposed System Architecture
Flight History Analysis Using Hadoop Queries
- Reasons for flight delay
- Reasons for negative feedback
- How to improve the business model?
Module 1: Data Collection
The required data set is collected from https://www.kaggle.com/open-flights/flight-route-database. The attributes of the data set are year, month, day, day of the week, airline name, origin airport, destination airport, scheduled departure, scheduled arrival, departure time, arrival time, departure delay, arrival delay and distance.
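As an illustration of how one record with these attributes could be represented, here is a minimal Python sketch. The column names, the `FlightRecord` class and the helper functions are assumptions made for this example; the real CSV header may differ.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional

# Hypothetical column names, one per attribute listed above.
COLUMNS = [
    "year", "month", "day", "day_of_week", "airline",
    "origin_airport", "destination_airport",
    "scheduled_departure", "scheduled_arrival",
    "departure_time", "arrival_time",
    "departure_delay", "arrival_delay", "distance",
]


@dataclass
class FlightRecord:
    year: int
    month: int
    day: int
    day_of_week: int
    airline: str
    origin_airport: str
    destination_airport: str
    scheduled_departure: str
    scheduled_arrival: str
    departure_time: Optional[str]    # may be blank, e.g. for a cancelled flight
    arrival_time: Optional[str]
    departure_delay: Optional[float]
    arrival_delay: Optional[float]
    distance: float


def _optional_text(value):
    """Return the stripped text, or None when the field is blank."""
    value = (value or "").strip()
    return value or None


def _optional_float(value):
    """Return the field as a float, or None when the field is blank."""
    text = _optional_text(value)
    return float(text) if text is not None else None


def parse_row(row):
    """Convert one csv.DictReader row into a typed FlightRecord."""
    return FlightRecord(
        year=int(row["year"]),
        month=int(row["month"]),
        day=int(row["day"]),
        day_of_week=int(row["day_of_week"]),
        airline=row["airline"].strip(),
        origin_airport=row["origin_airport"].strip(),
        destination_airport=row["destination_airport"].strip(),
        scheduled_departure=row["scheduled_departure"].strip(),
        scheduled_arrival=row["scheduled_arrival"].strip(),
        departure_time=_optional_text(row["departure_time"]),
        arrival_time=_optional_text(row["arrival_time"]),
        departure_delay=_optional_float(row["departure_delay"]),
        arrival_delay=_optional_float(row["arrival_delay"]),
        distance=float(row["distance"]),
    )


def read_flights(csv_text):
    """Parse CSV text whose header matches COLUMNS into FlightRecord objects."""
    return [parse_row(row) for row in csv.DictReader(io.StringIO(csv_text))]
```

Keeping the delay fields optional makes the missing values visible, so the data preparation step can decide how to handle them.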
Module 2: Data Preparation
The collected raw data set is loaded into an HDFS directory. Raw data is prone to impurities such as inconsistent, missing and noisy values, so data cleaning methods are applied to the missing and noisy data before any machine learning techniques are used.
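The cleaning step could look roughly like the following sketch, which drops rows with missing or malformed fields, filters implausible values and removes duplicates. The field names, the one-day delay limit and the specific rules are assumptions for illustration; the project does not specify which cleaning methods it applies.

```python
import math

# Hypothetical field names for the text and numeric attributes.
REQUIRED_TEXT = ("airline", "origin_airport", "destination_airport")
NUMERIC = ("departure_delay", "arrival_delay", "distance")


def to_float(value):
    """Convert a raw field to a finite float; return None if blank or malformed."""
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    return number if math.isfinite(number) else None


def clean_records(rows, max_abs_delay=1440.0):
    """Return cleaned copies of rows (dicts of raw strings).

    max_abs_delay is an assumed sanity limit in minutes (one day).
    """
    cleaned, seen = [], set()
    for row in rows:
        # Normalize inconsistent text, e.g. " aa " and "AA" become the same value.
        record = {key: str(row.get(key) or "").strip().upper() for key in REQUIRED_TEXT}
        if not all(record.values()):
            continue  # a required text field is missing

        numbers = {key: to_float(row.get(key, "")) for key in NUMERIC}
        if any(value is None for value in numbers.values()):
            continue  # a numeric field is missing or malformed

        # Treat implausible values as noise.
        if (abs(numbers["departure_delay"]) > max_abs_delay
                or abs(numbers["arrival_delay"]) > max_abs_delay
                or numbers["distance"] <= 0):
            continue

        record.update(numbers)
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            continue  # exact duplicate of an earlier row
        seen.add(fingerprint)
        cleaned.append(record)
    return cleaned
```

Dropping incomplete rows is the simplest policy; filling missing values with an average or a default is an alternative when too many rows would otherwise be lost.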
Module 3: Machine Learning
The preprocessed data set is divided into a training set and a test set. The training set is used to build the model, while the test set is used to measure the accuracy of the machine learning algorithm. If the accuracy is acceptable, the model is applied to future data.
Machine learning classification identifies:
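The split-train-evaluate cycle can be sketched as follows. Since the project does not name a specific algorithm, this example uses a deliberately simple stand-in: a single-threshold rule on departure delay. The 80/20 split, the 15-minute definition of "delayed" and all function names are assumptions.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle reproducibly, then return (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def is_delayed(record, cutoff=15.0):
    """Label: a flight counts as delayed if it arrives more than cutoff minutes late."""
    return record["arrival_delay"] > cutoff


def predict(threshold, record):
    """Predict 'delayed' when the departure delay reaches the threshold."""
    return record["departure_delay"] >= threshold


def fit_threshold(train):
    """Choose the departure-delay threshold with the best training accuracy."""
    best_threshold, best_correct = 0.0, -1
    for candidate in sorted({r["departure_delay"] for r in train}):
        correct = sum(predict(candidate, r) == is_delayed(r) for r in train)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def accuracy(threshold, records):
    """Fraction of records whose predicted label matches the true label."""
    hits = sum(predict(threshold, r) == is_delayed(r) for r in records)
    return hits / len(records)
```

A typical run fits the threshold on the training set only and then reports `accuracy(threshold, test)`, so the score reflects data the model has not seen.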
- Which attributes impact the flight delay?
- What are the main reasons for negative feedback from passengers?
- Is there any relation between variables that causes the flight delay?
- What kind of offers can be provided for particular segments of passengers?
- What needs to be introduced to attract new customers?
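For the question about relations between variables, one simple starting point is the Pearson correlation between each numeric attribute and the arrival delay. The sketch below is an assumption about how this could be explored, with hypothetical field and function names. A strong correlation suggests a relation but does not by itself show a cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


def rank_by_correlation(records, target="arrival_delay",
                        features=("departure_delay", "distance")):
    """Rank features by the absolute strength of their correlation with target."""
    target_values = [r[target] for r in records]
    scores = {f: pearson([r[f] for r in records], target_values) for f in features}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

The ranked list shows which attributes move most closely with the delay and are worth examining first in the classification model.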
Module 4: Data Visualization
The extracted knowledge and patterns are visualized using Tableau, a business intelligence tool.
Flight History Analysis Using Hadoop Benefits
- This project identifies the main reasons for flight delays, which is an important factor for the business.
- Major financial losses can be avoided when the system is used in real time.
Software Requirements
- Linux OS
- MySQL
- Hadoop & MapReduce
- Tableau
Hardware Requirements
- Hard Disk – 500 GB or Above
- RAM required – 4 GB or Above
- Processor – Core i3 or Above
Technology Used
- Big Data – Hadoop
- Business Intelligence