Thetazero Pubs

Predicting star ratings on Yelp with multivariable linear regression

Haris Berbic


Restaurants in the Yelp! dataset are frequently reviewed: tourists and locals check online whether a place is worth eating at.

The task is to predict the star rating a restaurant receives, given a user's average stars. We use multivariable linear regression for this task.
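Written as an equation, the two-predictor model in the `lm` call under Methods and data is sketched below for review i. The β coefficients are what `lm` estimates by ordinary least squares; the symbol names here are illustrative, not taken from the original analysis.

```latex
\begin{aligned}
\hat{y}_i &= \beta_0 + \beta_1\,\mathrm{user\_avg\_stars}_i + \beta_2\,\mathrm{business\_avg\_stars}_i \\
(\hat\beta_0, \hat\beta_1, \hat\beta_2) &= \arg\min_{\beta_0, \beta_1, \beta_2} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\end{aligned}
```

Here y_i is the observed star rating of review i and n is the number of reviews in the final dataset.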

The dataset comprises three tables that cover 11,537 businesses, 8,282 check-ins, 43,873 users, and 229,907 reviews. The entire dataset as well as details about the dataset are available on the Yelp website.

Question of Interest

Apply a multiple linear regression model to the Yelp dataset to predict restaurant star ratings.

Why we care:

  • Predicted star ratings for restaurants can be used to suggest improvements to businesses.

Methods and data

Several data transformations were applied to the datasets to derive a final dataset with useful features.

  • The original data were merged per user, review, and business
  • Only the target features were extracted, to obtain a tidy dataset
  • Multiple linear regression predicts stars from user_avg_stars and business_avg_stars
  • Confidence and prediction intervals were tested
# Regress stars.y on the user's and the business's average star ratings
fit <- lm(stars.y ~ user_avg_stars + business_avg_stars, data = reg_data)
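A minimal, self-contained sketch of the merge, tidy, fit, and interval steps listed above, written in R to match the `lm` call. The tiny `review`, `user`, and `business` tables and their column names (`user_id`, `business_id`, `stars`) are hypothetical stand-ins for the Yelp tables, not the author's data or code. Because these toy tables have no clashing column names, the response is plain `stars` rather than the `stars.y` that a merge suffix presumably produced in the original.

```r
# Hypothetical toy tables standing in for the Yelp review, user and business data
review <- data.frame(
  user_id     = c("u1", "u1", "u2", "u2", "u3", "u3", "u4", "u4"),
  business_id = c("b1", "b2", "b1", "b3", "b2", "b3", "b1", "b2"),
  stars       = c(5, 3, 4, 2, 4, 3, 5, 4)
)
user <- data.frame(
  user_id        = c("u1", "u2", "u3", "u4"),
  user_avg_stars = c(4.2, 3.1, 3.6, 4.5)
)
business <- data.frame(
  business_id        = c("b1", "b2", "b3"),
  business_avg_stars = c(4.4, 3.7, 2.6)
)

# Merge reviews with user and business averages (inner joins on the id columns)
reg_data <- merge(review, user, by = "user_id")
reg_data <- merge(reg_data, business, by = "business_id")

# Keep only the target features to get a tidy modelling table
reg_data <- reg_data[, c("stars", "user_avg_stars", "business_avg_stars")]

# Fit the two-predictor linear model
fit <- lm(stars ~ user_avg_stars + business_avg_stars, data = reg_data)

# 95% confidence and prediction intervals for a hypothetical new user/business pair
new_pair <- data.frame(user_avg_stars = 4.0, business_avg_stars = 3.5)
conf_int <- predict(fit, newdata = new_pair, interval = "confidence", level = 0.95)
pred_int <- predict(fit, newdata = new_pair, interval = "prediction", level = 0.95)
print(conf_int)
print(pred_int)
```

`predict()` returns the point estimate (`fit`) with lower and upper bounds (`lwr`, `upr`). The prediction interval is always at least as wide as the confidence interval, because it adds the residual variance of an individual rating to the uncertainty in the fitted mean.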



  • The model-based approach gave satisfactory results (low error rates)
  • Linear modeling of the rating based on average star ratings gives a mean R² of 0.2211, i.e. the predictors explain about 22% of the variance in star ratings
  • The model could be improved by adding new features
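To show where an R² figure like the one above comes from, here is a sketch in R that fits the same two-predictor formula to synthetic data and computes R² both by hand, as 1 − SS_res / SS_tot, and via `summary(fit)$r.squared`. The sample size, coefficients, and noise level are made-up assumptions, not Yelp estimates, and the simulated ratings are continuous rather than whole stars.

```r
# Synthetic data (hypothetical coefficients and noise, not fitted Yelp values)
set.seed(1)
n <- 200
user_avg_stars     <- runif(n, 1, 5)
business_avg_stars <- runif(n, 1, 5)
stars <- 0.5 + 0.4 * user_avg_stars + 0.4 * business_avg_stars + rnorm(n, sd = 1)
reg_data <- data.frame(stars, user_avg_stars, business_avg_stars)

fit <- lm(stars ~ user_avg_stars + business_avg_stars, data = reg_data)

# R^2 = 1 - SS_res / SS_tot
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((reg_data$stars - mean(reg_data$stars))^2)
r2_manual <- 1 - ss_res / ss_tot

# The same quantity as reported by summary() for an lm fit with an intercept
r2_lm <- summary(fit)$r.squared
cat(sprintf("manual R^2 = %.4f, summary(fit) R^2 = %.4f\n", r2_manual, r2_lm))
```

For a model with an intercept the two values agree up to floating-point error. A "mean R²" would average such values over several fits, for example across validation folds.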
Copyright © 2016 All Rights Reserved.