[April 2023 MCM] Problem Y: Understanding Used Sailboat Prices, three complete papers with code

1 Problem statement

2023 MCM Problem Y: Understanding Used Sailboat Prices

Like many luxury items, the value of a sailboat changes with age and market conditions. The attached "2023_MCM_Problem_Y_Boats.xlsx" file includes data for approximately 3,500 sailboats, 36 to 56 feet long, advertised for sale in Europe, the Caribbean, and the United States in December 2020. A boating enthusiast provided the data to COMAP. Like most real-world datasets, it may have missing data or other issues that require some data cleaning before analysis. The Excel file includes two tabs, one for monohulls and one for catamarans. The columns in each tab are:

  • Make : The manufacturer name of the ship.

  • Variant : A ship name that identifies a specific model.

  • Length (ft) : The length of the boat in feet.

  • Geographic Region : The geographic region where the ship is located (Caribbean, Europe, America).

  • Country/Region/State : The specific country/region/state where the vessel is located.

  • Listing Price (USD) : The advertised price to purchase the ship in USD.

  • Year : The year the ship was built.

For a given make, variant and year, there are many other sources besides the provided Excel files that can provide a detailed description of the characteristics of a particular sailboat. You may supplement the provided dataset with any additional data of your choice; however, you must include the data in "2023_MCM_Problem_Y_Boats.xlsx" in your modeling. Be sure to fully identify and document the source of any supplementary data used. Sailboats are often sold through brokers. To gain a better understanding of the sailboat market, a Hong Kong (SAR) sailboat broker commissioned your team to prepare a report on the pricing of used sailboats. This broker expects you to:

  • Develop a mathematical model that explains the listing price of each sailboat in the provided spreadsheet. Include any predictors you find useful. You may use other sources for additional characteristics of a given sailboat (such as beam, draft, displacement, rigging, sail area, hull material, engine hours, sleeping capacity, headroom, electronics, etc.) as well as economic data by year and region. Identify and describe all data sources used. Include a discussion of the precision of your price estimate for each sailboat variant.

  • Use your model to explain the effect, if any, of region on listing prices. Discuss whether any regional effect is consistent across all sailboat variants. Address the practical and statistical significance of any regional effects noted.

  • Discuss how your modeling of the given geographic regions can be useful in the Hong Kong (SAR) market. Select an informative subset of sailboats from the provided spreadsheet, split into monohulls and catamarans. Find comparable listing price data for this subset from the Hong Kong (SAR) market. Model the regional effect (if any) of Hong Kong (SAR) on the price of each sailboat in the subset. Is the effect the same for catamarans and monohulls?

  • Identify and discuss any other interesting and informative inferences or conclusions your team draws from the data.

  • Prepare a one- to two-page report for the Hong Kong (SAR) sailboat broker. Include a few well-chosen charts to help the broker understand your conclusions.

Your PDF solution of no more than 25 pages in total should include:

  • A one-page Summary Sheet that clearly describes your approach to the problem and the most important conclusions drawn from the analysis in the context of the problem.

  • Table of contents.

  • Your complete solution.

  • A one- to two-page report for the broker.

Appendix

Data file: 2023_MCM_Problem_Y_Boats.xlsx

Tabs: Monohulled Sailboats, Catamarans

Data file column descriptions

  • Make : The manufacturer name of the ship.

  • Variant : A ship name that identifies a specific model.

  • Length (ft) : The length of the boat in feet.

  • Geographic Region : The geographic region where the ship is located (Caribbean, Europe, America).

  • Country/Region/State : The specific country/region/state where the vessel is located.

  • Listing Price (USD) : The advertised price to purchase the ship in USD.

  • Year : The year the ship was built.
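The column list above suggests a simple cleaning pass before any modeling. The following is a minimal sketch, assuming each spreadsheet row has already been loaded into a dictionary keyed by those column names. The helper names (`parse_number`, `clean_text`, `clean_rows`), the sample rows, and the rule of dropping rows without a usable price, length, or year are illustrative assumptions, not part of the problem statement.

```python
import math
from typing import Optional


def parse_number(value) -> Optional[float]:
    """Convert a cell such as '$250,000' or 45 to a float; return None if unusable."""
    if value is None:
        return None
    text = str(value).replace("$", "").replace(",", "").strip()
    try:
        number = float(text)
    except ValueError:
        return None
    # Reject NaN and infinity, which float() accepts as text.
    return number if math.isfinite(number) else None


def clean_text(value) -> str:
    """Return a stripped string, mapping a missing cell to an empty string."""
    return "" if value is None else str(value).strip()


def clean_rows(rows):
    """Normalize text fields and keep only rows with valid numeric fields.

    Assumed rule: rows missing a usable price, length, or year are dropped.
    """
    cleaned = []
    for row in rows:
        price = parse_number(row.get("Listing Price (USD)"))
        length = parse_number(row.get("Length (ft)"))
        year = parse_number(row.get("Year"))
        if price is None or price <= 0 or length is None or year is None:
            continue
        cleaned.append({
            "Make": clean_text(row.get("Make")),
            "Variant": clean_text(row.get("Variant")),
            "Length (ft)": length,
            "Geographic Region": clean_text(row.get("Geographic Region")),
            "Country/Region/State": clean_text(row.get("Country/Region/State")),
            "Listing Price (USD)": price,
            "Year": int(year),
        })
    return cleaned


# Hypothetical sample rows, not taken from the real spreadsheet.
sample = [
    {"Make": " Bavaria ", "Variant": "Cruiser 46", "Length (ft)": 46,
     "Geographic Region": "Europe", "Country/Region/State": "Croatia",
     "Listing Price (USD)": "$250,000", "Year": 2015},
    {"Make": "Lagoon", "Variant": "450", "Length (ft)": 45,
     "Geographic Region": "Caribbean", "Country/Region/State": "Bahamas",
     "Listing Price (USD)": None, "Year": 2016},
]
rows = clean_rows(sample)
print(len(rows), rows[0]["Make"], rows[0]["Listing Price (USD)"])
```

Whether to drop or impute incomplete rows is a modeling decision; dropping is used here only because it is the simplest rule to show.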

Glossary

  • Beam : The width of the boat at its widest point.

  • Broker : A person or firm that helps people sell or buy a sailboat.

  • Catamarans : A type of multihull boat with two parallel hulls of equal size.

  • Displacement : The weight of the water a ship displaces.

  • Draft : The minimum water depth required for the ship to float without touching the bottom.

  • Engine Hours : The number of hours the ship's engines have run since new.

  • Headroom : The height at which you can stand in the cabin.

  • Hull : The main body or outer shell of a boat or other watercraft, including the bottom, sides, and decks.

  • Hull Materials : The material used to make the hull. Materials used include fiberglass, steel, wood and composites.

  • Listing Price : The price requested by the seller. The boat may be sold for a different price.

  • Make : The manufacturer of the sailboat.

  • Monohull Sailboats : Sailing boats with only one hull, usually centered on a heavy keel (central blade).

  • Rigging : The system of ropes, cables, and pulleys used to support and control the sails, rudder, and other systems of a sailboat.

  • Sail Area : The total surface area of the sails when fully raised.

  • Variant : The name of a particular model of sailboat. For example, "Sun Odyssey 54 DS".

2 Papers

2.1 Paper 1: Second-Hand Yacht Market Research (27 pages)

As the economy develops, the second-hand yacht market is booming, but prices vary from region to region. To keep buyers and sellers informed, we use a PLSR-GA-BP model to predict prices for different yacht sizes and analyze regional effects using parametric tests.

For Problem 1, we first reconstruct the data along two dimensions, regional effects and hull properties, and then clean it, filling missing and abnormal values by cubic spline interpolation. Next, PLSR (partial least squares regression) is used to analyze the importance of each indicator. For monohull yachts, the most influential indicator is displacement (0.773) and the least influential is GDP (0.008); for catamaran yachts, the most influential is years of use (0.537) and the least influential is total logistics cost (0.003). The PLSR-GA-BP model is then used for prediction: PLSR produces the main prediction, and a GA-BP network (a back-propagation neural network tuned by a genetic algorithm) fitted to the PLSR residual sequence corrects it. The final performance of the model is: RMSE=0.019, MAPE=0.154, R2=0.844 for monohull yachts; RMSE=0.028, MAPE=0.211, R2=0.837 for catamaran yachts.
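The three metrics quoted above are not defined in the abstract. As a reference, here is a minimal sketch of their standard definitions in plain Python. The function names and the toy numbers are illustrative; whether the paper computes the metrics on normalized prices is not stated in the abstract.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.075 means 7.5%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy prices (hypothetical), in thousands of USD.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 330.0, 380.0]
print(rmse(actual, predicted), mape(actual, predicted), r2(actual, predicted))
```

Note that RMSE carries the unit of the target while MAPE and R2 are unit-free, so RMSE values are only comparable between models evaluated on the same price scale.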

For Problem 2, we first merge the data for the two yacht types and split it by region, then apply one-way analysis of variance to the prices in each region. The result, P=0.003<0.05, indicates that prices differ across regions. We then use the Kendall consistency test to analyze the agreement between the four regional attributes and price changes: P=0.000<0.05, so the result is credible, and the consistency coefficient of 0.996 shows that our regional attributes are the main factor behind regional price differences. We then analyze the possible reasons for these differences. We also study how the regional effect changes the hull hardware indicators: only sail area shows no regional variability, while the remaining five indicators all do. These differences are mainly determined by the geographical environment of each region.
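Kendall's coefficient of concordance W measures how closely several rankings of the same items agree, from 0 (no agreement) to 1 (identical rankings). A minimal sketch of the tie-free formula W = 12S / (m^2 (n^3 - n)) follows, where m is the number of rankings, n is the number of items, and S is the sum of squared deviations of the items' rank sums from their mean. The function name and the sample rankings are hypothetical and do not reproduce the paper's data or its 0.996.

```python
def kendalls_w(rankings):
    """Kendall's W for m rankings of the same n items (no tie correction).

    rankings: list of m lists, each a permutation of the ranks 1..n.
    """
    m = len(rankings)
    n = len(rankings[0])
    # Total rank received by each item across all rankings.
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((total - mean_sum) ** 2 for total in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical example: three attributes each rank the same four regions.
sample_rankings = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
w = kendalls_w(sample_rankings)
print(round(w, 4))
```

A significance test for W (the source of the reported P value) is usually based on a chi-square approximation, which this sketch does not include.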

For Problem 3, we collect relevant data from Hong Kong, simulate the regional effect of the Hong Kong market, and select yacht models that meet the requirements, such as the Bavaria Cruiser 46 (monohull yacht) and the Lagoon 450 (catamaran yacht). The corresponding data are fed into the PLSR-GA-BP model for training, and the fitted curve is shown in Figure 7.2. The test results are MAPE=0.188, RMSE=0.026, R2=0.881 for the Bavaria Cruiser 46 and RMSE=0.041, MAPE=0.174, R2=0.904 for the Lagoon 450. These good test results indicate that our analysis of regional effects is practical.

For Problem 4, we mined further information from the intercontinental distribution of orders and the distribution of yacht prices by continent. We found that the production of catamarans has increased year by year, while that of monohulls has decreased year by year.

Finally, we performed a sensitivity analysis of the PLSR-GA-BP model by adding noise to the two most important factors affecting yacht prices. MAPE and RMSE varied by less than 10%, so our model is robust. We conclude the paper with a letter to the Hong Kong Regional Director.

2.2 Paper 2: Research on Sailboat Price Prediction Based on Polynomial Regression (26 pages)

Summary

As sailing grows in popularity, more and more people take up and fall in love with the sport, and the consumer market for sailboats is expanding accordingly. How to price sailboats reasonably is a question that sailboat dealers need to focus on.
This article examines how a sailboat's own characteristics, such as year, size, draft, sail area, and displacement, together with the regional factor of GDP, affect local sailboat pricing. First, we collected sailboat data from sailboatdata.com and combined it with the GDP data of the various regions to form a sailboat feature matrix. Principal component analysis is then used to reduce the dimensionality of the feature matrix; experiments show that only two principal components are needed to cover 99.8% of the information in all features. Finally, we use polynomial regression to train a regression function that predicts sailboat prices, which gives the relationship between each feature and price; the prediction accuracy reaches 98.4%.

Regression models were established for the different types of sailboats. By comparing the principal component weights and polynomial regression coefficients of the models for the different types, we found that the influence of regional factors on different types of sailboats is similar.

By collecting GDP and sailboat selling price data for Hong Kong and applying the same polynomial regression model, we find that our model is also applicable in Hong Kong, and that the influence of regional factors on monohulls and catamarans is the same.

From the weights that link the principal components computed in the model to the original features, we can see that features such as draft and GDP have a considerable impact on sailboat pricing, while the impact of the year of manufacture is negligible.
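To make the regression step concrete, here is a minimal sketch of least-squares polynomial regression on a single feature, solved through the normal equations in plain Python. It is an illustration only: the paper regresses on principal components of several features, a step this sketch omits, and the function names and toy data are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Return coefficients c[0..degree] minimizing sum (y - sum_k c[k] * x**k)**2."""
    size = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][k] = xs[i] ** k.
    vtv = [[sum(x ** (j + k) for x in xs) for k in range(size)] for j in range(size)]
    vty = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(size)]
    return solve_linear_system(vtv, vty)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Toy data generated from price = 1 + 2*age + 3*age^2 (hypothetical).
ages = [0.0, 1.0, 2.0, 3.0, 4.0]
prices = [1 + 2 * a + 3 * a ** 2 for a in ages]
coeffs = fit_polynomial(ages, prices, degree=2)
print([round(c, 6) for c in coeffs])
```

The normal equations become ill-conditioned for high degrees or unscaled features, so a real pipeline would typically standardize the inputs first or use a library least-squares solver.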

Finally, according to the results of mathematical modeling, we put forward corresponding suggestions to sailing dealers in Hong Kong.

Keywords: sailboat pricing, principal component analysis, polynomial regression

2.3 Paper 3: Second-Hand Sailboat Market: Factor Analysis and Pricing Model (35 pages)

The value of a used item is often influenced by a number of factors, and a used sailboat is no exception. The purpose of this paper is to build a sailboat pricing model to evaluate the influence of different factors on the pricing of second-hand sailboats. Studying this issue can provide market participants with more reliable price references, thereby improving the efficiency of transactions in the overall market.

For the factor analysis model, we obtain additional data from relevant websites and run correlation analysis on it. To explore the relationship between categorical variables and price, we used ANOVA, and the results showed that all categorical variables had a significant effect on price. For continuous variables, we used Pearson correlation analysis; the results show that some continuous variables are linearly correlated with price, while others may have non-linear relationships with it.
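For reference, the two statistics named above can be computed directly. The sketch below implements the one-way ANOVA F statistic and the Pearson correlation coefficient in plain Python. The function names and the sample prices are hypothetical, and converting F to a p-value additionally requires the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
import math


def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical listing prices (thousands of USD) grouped by a categorical variable.
price_groups = [
    [100.0, 200.0, 300.0],
    [200.0, 300.0, 400.0],
    [500.0, 600.0, 700.0],
]
f_stat = anova_f(price_groups)

# Hypothetical continuous variable (length in feet) against price.
lengths = [38.0, 42.0, 46.0, 50.0]
length_prices = [150.0, 210.0, 260.0, 330.0]
r = pearson_r(lengths, length_prices)
print(f_stat, round(r, 4))
```

A Pearson coefficient near zero only rules out a linear relationship, which is why the abstract leaves room for non-linear effects.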

For the used sailboat pricing model, we split the data using five-fold cross-validation and tuned 8 different models using Bayesian optimization. Because the extremely randomized trees (ERT) algorithm performed best in the evaluation, ERT is selected as the used sailboat pricing model. Furthermore, we compute feature importances using the Gradient Boosted Decision Tree (GBDT) algorithm and compare them with those from the ERT algorithm. After the model was established, we calculated the importance of each influencing factor and found that beam had the greatest impact on price; the influence of the other factors is shown in Table 6. Finally, we calculated the prediction accuracy for each category and found that 276 categories had a prediction accuracy above 70%.
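Five-fold cross-validation splits the data into five parts and lets each part serve once as the held-out test set. A minimal sketch of the index split in plain Python follows. The function name `k_fold_indices` and the fixed seed are illustrative assumptions, and the model fitting step is intentionally left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(23, k=5))
for train, test in folds:
    print(len(train), len(test))
```

Each candidate model would be fitted on `train` and scored on `test` for every fold, and the averaged score is what a hyperparameter search such as Bayesian optimization would try to improve.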

For the Hong Kong market model, we add Hong Kong listings to our dataset, one-hot encode the Hong Kong region, and retrain the model. We then use ANOVA to explore the relationship between geographic region and listing price with Hong Kong included. The results show that geographic region still has a significant impact on listing prices after the Hong Kong data are added. The impact of Hong Kong on prices is 0.0038, which is smaller than that of the United States and Europe. Finally, we found that Hong Kong has a greater impact on catamaran prices than on monohull prices.
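One-hot encoding turns the categorical region column into one 0/1 indicator per region, so adding Hong Kong simply adds one more indicator column. A minimal sketch in plain Python follows; the function name, the region labels, and their ordering are assumptions for illustration.

```python
def one_hot(values, categories=None):
    """Encode each value as a list of 0/1 indicators, one per category."""
    if categories is None:
        # Default to the distinct values in sorted order.
        categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


# Hypothetical region labels for five listings.
regions = ["Europe", "USA", "Caribbean", "Hong Kong", "Europe"]
categories, encoded = one_hot(regions)
print(categories)
print(encoded)
```

Tree-based models such as ERT can use all indicator columns as they are; for a linear model, one column is usually dropped to avoid perfect collinearity with the intercept.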

We also drew other useful conclusions. We explored the relationship between geographic region and listing price for each sailboat variant through analysis of variance, and the statistical results showed that 18% of the brands were significantly related to region. However, across the overall second-hand sailboat market this proportion is relatively small, and more than 80% of sailboat brands are not affected by region. Finally, we wrote a report presenting the results of the data analysis and the relevant conclusions to brokers.

Keywords: ANOVA, Gradient Boosted Decision Trees, Extremely Randomized Trees, Representation

3 How to obtain the papers

See the bottom of the article, or private message me

https://zhuanlan.zhihu.com/p/631524325

Origin blog.csdn.net/weixin_43935696/article/details/130809365