Analyzing Airbnb Austin using Data Science

Patrick Bloomingdale
Feb 8, 2021

In this blog post, I use the Cross-Industry Standard Process for Data Mining (CRISP-DM) to analyze the Airbnb dataset for Austin, Texas.

The detailed analysis, including all code and data, is available in my GitHub repository.

I used the Austin Airbnb listing and review datasets published by Inside Airbnb: http://insideairbnb.com/get-the-data.html.

Picture of Downtown Austin

Question 1: What is the average price per month and per day of the week?

Average price per month, Austin Texas
Average price per weekday, Austin Texas

Question 1: Answer
1. The average price per month is highest in September and lowest in December.
2. The average price per day is highest on Friday and Saturday.
3. The average price per day is lowest early in the week (Monday, Tuesday, and Wednesday).
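
Here is a minimal sketch of one way to compute these averages with pandas. It is not the exact code from my repository. It assumes a calendar-style table with one row per listing per date, a `date` column, and a `price` column stored as text such as "$1,250.00". The rows below are made-up sample values, not real Austin data.

```python
import pandas as pd

# Made-up sample rows shaped like a calendar file (assumed columns: listing_id, date, price).
calendar = pd.DataFrame({
    "listing_id": [1, 1, 1, 2, 2, 2],
    "date": ["2021-09-03", "2021-09-06", "2021-12-07",
             "2021-09-04", "2021-12-06", "2021-12-10"],
    "price": ["$200.00", "$120.00", "$100.00", "$1,250.00", "$90.00", "$150.00"],
})

# Parse dates and strip "$" and "," so price becomes a float.
calendar["date"] = pd.to_datetime(calendar["date"])
calendar["price"] = (
    calendar["price"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)

# Average price per calendar month (1 = January ... 12 = December).
avg_by_month = calendar.groupby(calendar["date"].dt.month)["price"].mean()

# Average price per weekday, reordered Monday through Sunday (days with no rows become NaN).
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]
avg_by_weekday = (
    calendar.groupby(calendar["date"].dt.day_name())["price"]
    .mean()
    .reindex(weekday_order)
)

print(avg_by_month)
print(avg_by_weekday)
```

Using `reindex` keeps the weekdays in calendar order. Without it, `groupby` would sort the day names alphabetically, which makes the weekly pattern harder to read in a plot.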

Question 2: What are the top five property types by ratings and number of reviews?

Top five properties by ratings and reviews

Question 2: Answer
Looking at listings with review scores between 90 and 100, Entire house was rated the highest and had the most reviews, followed by Entire guesthouse, Entire apartment, Private room in house, and Entire condominium.
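
The sketch below shows one possible approach. It assumes the listings table has `property_type`, `review_scores_rating`, and `number_of_reviews` columns. It keeps listings rated 90 to 100 and then, for each property type, reports the mean rating and the total number of reviews. Ranking by total reviews is my own choice for this illustration, and the rows are invented sample values.

```python
import pandas as pd

# Invented sample rows (assumed column names from a listings file).
listings = pd.DataFrame({
    "property_type": [
        "Entire house", "Entire house", "Entire guesthouse", "Entire apartment",
        "Private room in house", "Entire condominium", "Entire apartment", "Tent",
    ],
    "review_scores_rating": [98, 95, 97, 92, 94, 91, 85, 100],
    "number_of_reviews": [120, 80, 60, 40, 30, 20, 200, 5],
})

# Keep only listings scored between 90 and 100, inclusive.
high_rated = listings[listings["review_scores_rating"].between(90, 100)]

# Mean rating and total reviews per property type, ranked by total reviews.
top_five = (
    high_rated.groupby("property_type")
    .agg(mean_rating=("review_scores_rating", "mean"),
         total_reviews=("number_of_reviews", "sum"))
    .sort_values("total_reviews", ascending=False)
    .head(5)
)
print(top_five)
```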

Question 3: What is the average review score rating by neighborhood?

Question 3: Answer

The five neighborhoods with the highest average review score rating were Clarksville and Rollingwood at 100%, followed by Dripping Springs (99.5%), Bee Cave (99.3%), and Del Valle (99.0%).
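
A per-neighborhood average can be sketched as follows. The `neighborhood` column name is hypothetical and stands in for whichever location field is used. The ratings are invented. The sketch also reports how many rated listings each average is based on, because a neighborhood with only one or two listings can reach a perfect score easily.

```python
import pandas as pd

# Invented sample; "neighborhood" is a hypothetical column name.
listings = pd.DataFrame({
    "neighborhood": ["Clarksville", "Clarksville", "Rollingwood", "Bee Cave",
                     "Bee Cave", "Del Valle", "Downtown", "Downtown"],
    "review_scores_rating": [100, 100, 100, 99, 98, 99, 95, None],
})

# Mean rating and number of rated listings per neighborhood (missing ratings are skipped).
by_neighborhood = (
    listings.groupby("neighborhood")["review_scores_rating"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(by_neighborhood.head(5))
```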

Question 4: What are the factors that influence price and how relevant are they?

To answer this question, I built a linear model:

  1. Split the data into training and test sets.
  2. Fit a linear model (lm_model) on X_train and y_train.
  3. Use lm_model to make predictions on X_test.
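
The three steps above can be sketched with scikit-learn's `train_test_split` and `LinearRegression`. This is an illustration under my own assumptions and not a copy of the repository code. The features (`bedrooms`, `beds`) and prices are synthetic, and the 30% test split is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: two hypothetical features and a noisy price.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, size=n),
    "beds": rng.integers(1, 8, size=n),
})
y = 50 + 40 * X["bedrooms"] + 10 * X["beds"] + rng.normal(0, 30, size=n)

# 1. Split the data into training and test sets (30% held out).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 2. Fit the linear model on the training data.
lm_model = LinearRegression()
lm_model.fit(X_train, y_train)

# 3. Predict on the held-out test features and score the predictions.
y_test_preds = lm_model.predict(X_test)
test_r2 = r2_score(y_test, y_test_preds)
print(f"R-squared on {len(y_test)} test rows: {test_r2:.2f}")
```

Scoring on `X_test` rather than `X_train` shows how well the model generalizes to listings it has not seen.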

I then examined the size of the model's coefficients to see how each variable affects price.
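
One way to compare coefficient sizes is sketched below on synthetic data with hypothetical feature names. Raw coefficients depend on each variable's units, so this sketch standardizes the features first. Each coefficient then means "change in price per one standard deviation of the feature." The standardization step is my own addition for the illustration and is not necessarily what the original analysis did.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data with hypothetical features on different scales.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, size=n).astype(float),
    "beds": rng.integers(1, 8, size=n).astype(float),
    "review_scores_rating": rng.uniform(80, 100, size=n),
})
y = (30 + 45 * X["bedrooms"] + 8 * X["beds"]
     + 0.5 * X["review_scores_rating"] + rng.normal(0, 25, size=n))

# Standardize each feature to mean 0 and standard deviation 1.
X_std = (X - X.mean()) / X.std()

lm_model = LinearRegression().fit(X_std, y)

# Rank features by the absolute size of their coefficient.
coefs = pd.DataFrame({"feature": X.columns, "coef": lm_model.coef_})
coefs["abs_coef"] = coefs["coef"].abs()
coefs = coefs.sort_values("abs_coef", ascending=False).reset_index(drop=True)
print(coefs)
```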

Question 4: Answer

I examined 12 variables that might influence price.
The model's r-squared score was 0.15 on 3,092 values, so it explains only about 15% of the variance in price. Based on the weight coefficients, bedrooms, review score data, and beds have an impact on predicting price.

Size of the coefficients
