This project demonstrates academic and practical skills in data science and machine learning by modeling passenger choice behavior across airlines and airports. It covers end-to-end data analysis, feature engineering, and modeling.
- Goal: Analyze and model the factors influencing passenger choices between different airlines and airports, using real-world survey data.
- Approach: The project follows a typical data science workflow: data cleaning, exploratory data analysis (EDA), feature engineering, and predictive modeling.
- Implementation: All work is performed in Jupyter notebooks for clarity and reproducibility.
- The dataset (`airport.xlsx`) contains survey responses from airline passengers, including demographic information, travel details, and their choices of airline and airport.
- Key columns: `Airport`, `Airline`, `Age`, `Gender`, `Nationality`, `TripPurpose`, `TripDuration`, `ProvinceResidence`, `GroupTravel`, `Destination`, and more.
- Data Cleaning
  - Handle missing values and remove irrelevant or incomplete records.
  - Filter out 'Other' categories to focus on main analysis groups.
- Exploratory Data Analysis (EDA)
  - Summarize data distributions, check for imbalances, and visualize key variables.
  - Calculate and interpret choice probabilities for different passenger segments.
- Feature Engineering
  - Create new features (e.g., binary flags for Korean vs. foreign airlines, categorical encodings).
  - Group and transform variables to prepare for modeling.
- Modeling
  - Prepare data for machine learning models (e.g., logistic regression, classification).
  - Demonstrate model training, evaluation, and interpretation of results.
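The cleaning and feature engineering steps above can be sketched in pandas as follows. This is a minimal illustration, not the notebooks' actual code: the function name `prepare_features`, the `KOREAN_AIRLINES` set, the `IsKoreanAirline` column, and all values in the sample frame are assumptions made for the example. The real survey may encode these columns differently (for instance as numeric codes).

```python
import pandas as pd

# Hypothetical set of Korean carriers; the real grouping is defined in the notebooks.
KOREAN_AIRLINES = {"Korean Air", "Asiana"}


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Clean survey rows and derive model-ready features (illustrative only)."""
    required = ["Airline", "Age", "TripPurpose"]
    # Data cleaning: drop incomplete records and the catch-all 'Other' category.
    clean = df.dropna(subset=required)
    clean = clean[clean["Airline"] != "Other"].copy()
    # Feature engineering: binary flag for Korean vs. foreign airlines.
    clean["IsKoreanAirline"] = clean["Airline"].isin(KOREAN_AIRLINES).astype(int)
    # Categorical encoding: one-hot encode the trip purpose.
    dummies = pd.get_dummies(clean["TripPurpose"], prefix="TripPurpose", dtype=int)
    return pd.concat([clean[["Age", "IsKoreanAirline"]], dummies], axis=1)


# Made-up rows standing in for the survey data.
sample = pd.DataFrame({
    "Airline": ["Korean Air", "Other", "Delta", None, "Asiana"],
    "Age": [34, 51, 28, 45, 39],
    "TripPurpose": ["Leisure", "Business", "Business", "Leisure", "Leisure"],
})
features = prepare_features(sample)
print(features)
```

The resulting frame is fully numeric, so it could be passed to a classifier such as logistic regression, with `IsKoreanAirline` serving as the target in an airline-choice model.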
A key aspect of this project is the careful, data-driven approach to defining and analyzing groupings (e.g., airline types, passenger segments):
- Explicit Group Definitions: Groupings such as "Korean vs. Foreign Airlines" are defined based on domain knowledge and project goals, with clear code and comments explaining the rationale.
- Flexible, Reusable Analysis Functions: The project includes modular functions (e.g., `choiceProb`) that allow for flexible analysis of choice probabilities across any grouping variable, making the workflow adaptable to different questions and datasets.
- Data-Driven Grouping Decisions: Groupings are informed by exploratory analysis. Choice probabilities are calculated and visualized before group definitions are finalized, so that feature engineering is grounded in the data.
- Transparency and Interpretability: All grouping logic is clearly documented, and outputs are designed to be interpretable, supporting both reproducibility and clear communication of results.
This rigorous approach demonstrates not only technical proficiency but also a strong analytical mindset and attention to best practices in data science.
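As an illustration of the reusable choice-probability analysis described above, the sketch below computes the share of each choice within each group. It is a hypothetical stand-in, not the project's `choiceProb` function: the name `choice_prob`, its parameters, and the sample values are assumptions made for this example.

```python
import pandas as pd


def choice_prob(df: pd.DataFrame, group_col: str, choice_col: str = "Airline") -> pd.DataFrame:
    """Return the share of each choice within each group; every row sums to 1."""
    # normalize="index" divides each row of the contingency table by its row total.
    return pd.crosstab(df[group_col], df[choice_col], normalize="index")


# Made-up rows standing in for the survey data.
sample = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "M", "F"],
    "Airline": ["Korean", "Foreign", "Korean", "Korean", "Foreign", "Korean"],
})
probs = choice_prob(sample, "Gender")
print(probs)
```

Because the grouping column is a parameter, the same call works for any segment, for example `choice_prob(df, "TripPurpose")` or `choice_prob(df, "Nationality", choice_col="Airport")`.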
- Data cleaning and preprocessing with pandas
- Exploratory data analysis and visualization
- Feature engineering and transformation
- Handling missing data and categorical variables
- Building and evaluating machine learning models
- Writing modular, readable, and reproducible code in Jupyter notebooks
- Install dependencies: `pip install -r requirements.txt`
- Open the Jupyter notebooks (`Airline_Modelling.ipynb`, `Airport_Modelling.ipynb`, or `AirlineModel_refactored.ipynb`).
- Run cells step by step to follow the workflow from raw data to modeling and results.
- This project is intended as a demonstration of academic and practical skills in data science and machine learning, not as a production system.
- The code is modular and well-commented for clarity and learning purposes.
For questions or collaboration, please contact the project author.