Let’s Find the Top-Rated Hyderabadi Biryani Restaurants! | Zomato Hyderabad Restaurants EDA
Zomato Hyderabad Restaurants EDA
This is my first ever EDA notebook. I hope you enjoy it. Feel free to comment with your thoughts; any feedback is welcome :)
Dataset link: kaggle.com/code/itsraghul/zomato-restaurant..
0. Import
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd
To see multiple outputs for a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
Peeping into the data
df = pd.read_csv("HyderabadResturants.csv")
df
df.sort_values(['ratings'], ascending=False, inplace=True)
df
Top rated restaurants in Hyderabad
We see bad data, i.e. ‘New’ & ‘-’, in the ratings column, so we need to clean it out.
df['ratings'].head(10)
# Check the unique values in the column to see which ones are non-numeric.
df['ratings'].unique()
# Check how many times each unique value occurs.
df['ratings'].value_counts()
Removing rows with ratings as ‘New’ & ‘-’
df = df[df['ratings'] != 'New']
df = df[df['ratings'] != '-']
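As an alternative to filtering out each bad value by name, pandas can turn everything non-numeric into NaN in one step. Here is a minimal sketch on a made-up four-row table (only the column name 'ratings' comes from our data; the rows are invented for illustration). This approach would also catch any other stray text values we may not have spotted.

```python
import pandas as pd

# Toy data, invented for illustration; only the 'ratings' column name matches the notebook.
toy = pd.DataFrame({
    "names": ["A", "B", "C", "D"],
    "ratings": ["4.5", "New", "-", "3.9"],
})

# errors="coerce" replaces anything that cannot be parsed as a number with NaN
toy["ratings"] = pd.to_numeric(toy["ratings"], errors="coerce")

# Drop the rows whose rating could not be parsed
cleaned = toy.dropna(subset=["ratings"])
print(cleaned)
```

The trade-off is that coercion silently discards every unparseable value, so it is still worth looking at `unique()` first, as we did above.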
We have cleaned the data a little, and that is enough to get started.
Now we have the ratings in descending order, so we can see the top restaurants in Hyderabad.
df['ratings'].value_counts()
%matplotlib notebook
fig = plt.figure()
sub_plot = fig.add_subplot(1,1,1)
sub_plot.plot(df['ratings'].astype('str').value_counts(), 'ko-')
Changing the data type
df['ratings'] = pd.to_numeric(df['ratings'])
Ratings vs Price.
Are top-rated restaurants pricey?
df['ratings'].corr(df['price for one'])
Output: 0.022768433064483028
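A value this close to zero only tells us there is no linear relationship, because `corr` uses the Pearson method by default. As a rough cross-check, we could also compare it with a rank-based (Spearman) correlation, which is just the Pearson correlation of the ranks. The sketch below uses a made-up five-row table (the numbers are invented; only the column names match our data).

```python
import pandas as pd

# Toy data, invented for illustration; column names mirror the notebook's.
toy = pd.DataFrame({
    "ratings": [3.5, 3.8, 4.0, 4.2, 4.5],
    "price for one": [100, 400, 150, 300, 200],
})

# Pearson correlation (the default of Series.corr): measures linear association
pearson = toy["ratings"].corr(toy["price for one"])

# Spearman correlation computed as Pearson on the ranks: measures monotonic association
spearman = toy["ratings"].rank().corr(toy["price for one"].rank())

print(round(pearson, 3), round(spearman, 3))
```

If both values are near zero on the real data, that would support the conclusion that price and rating are unrelated here.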
Top rated hotels graph
%matplotlib notebook
fig2 = plt.figure()
sub_plot2 = fig2.add_subplot(1,1,1)
sub_plot2.scatter(df['ratings'], df['price for one'])
Since the correlation between them is close to zero (about 0.02), the scatter plot does not show any clear shape or pattern.
Plotting the names of the top 8 hotels with their ratings
fig3 = plt.figure(figsize=(7,4))
plot1 = fig3.add_subplot(1,1,1)
plot1.plot(df['names'][:8], df['ratings'][:8])
plt.xticks(rotation=10, ha='right')
But these are the top restaurants across all cuisines, and most of them are sweet shops. Let’s look at the top-rated restaurants cuisine by cuisine.
df['cuisine'].value_counts()
There is a problem here: each row can list several cuisines, so the same cuisine shows up in many different combinations. Let’s find all the unique cuisines by splitting each value on ‘,’.
val = list(df['cuisine'])
unique_cuisines = dict()
for i in val:
    tmp = i.split(',')
    for j in tmp:
        if j not in unique_cuisines:
            unique_cuisines[j] = 1
        else:
            unique_cuisines[j] += 1
# Sort by count, in descending order
unique_cuisines = (sorted(unique_cuisines.items(), key=lambda item: item[1], reverse=True))
unique_cuisines
In unique_cuisines we see that ‘Desserts’ appears 166 times, which is why sweet shops dominate the top-rated restaurants.
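The counting loop above can also be written with pandas string methods. This is only a sketch on a made-up four-row table (the rows are invented; only the 'cuisine' column name comes from our data). Note that it strips the whitespace around each cuisine, which the loop does not do, so the counts on the real data could differ slightly.

```python
import pandas as pd

# Toy data, invented for illustration; only the 'cuisine' column name matches the notebook.
toy = pd.DataFrame({
    "cuisine": [
        "Biryani, North Indian",
        "Desserts",
        "Desserts, Bakery",
        "North Indian, Biryani",
    ],
})

cuisine_counts = (
    toy["cuisine"]
    .str.split(",")   # one list of cuisines per restaurant
    .explode()        # one row per (restaurant, cuisine) pair
    .str.strip()      # so ' Biryani' and 'Biryani' count as the same cuisine
    .value_counts()   # counts, sorted in descending order
)
print(cuisine_counts)
```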
Now, let’s answer the burning question
Top-rated Biryani restaurants!!
To achieve this, we need to filter the cuisine column in our data.
# One filter is enough: 'Biryani' also matches ' Biryani' (with a leading space)
biryani = df[df['cuisine'].str.contains(pat='Biryani')]
biryani
Now we have all the restaurants that serve the famous Hyderabadi Biryani.
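A slightly more defensive version of this filter is sketched below on a made-up table (names, cuisines, and ratings are invented for illustration). It assumes the cuisine column might contain missing values or different capitalisation, which we have not checked in the real data: `case=False` ignores capitalisation, and `na=False` treats a missing cuisine as "no match" instead of producing a NaN in the mask.

```python
import pandas as pd

# Toy data, invented for illustration; column names mirror the notebook's.
toy = pd.DataFrame({
    "names": ["A", "B", "C", "D"],
    "cuisine": ["Biryani, North Indian", "Desserts", None, "Chinese, biryani"],
    "ratings": [4.1, 4.6, 3.9, 4.3],
})

# Boolean mask: True where the cuisine text mentions biryani in any capitalisation
mask = toy["cuisine"].str.contains("biryani", case=False, na=False)

# Keep the matching rows and put the best-rated ones first
biryani_toy = toy[mask].sort_values("ratings", ascending=False)
print(biryani_toy)
```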
%matplotlib notebook
fig4 = plt.figure(figsize=(7,4))
plt1 = fig4.add_subplot(1,1,1)
plt.subplots_adjust(bottom=0.3)
plt1.plot(biryani['names'][:10],biryani['ratings'][:10])
plt.xticks(rotation=14)
The drawback of this plot is that it shows only a few restaurants. To see all the top-rated Biryani restaurants, let’s sort the full table instead.
biryani = biryani.sort_values('ratings', ascending=False)
biryani
Now you can select your favorite restaurant & have biryani.
You can extend this to find the top-rated Fast Food, North Indian, and other restaurants.
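One way to do that is to wrap the filter-and-sort steps in a small helper. The function name `top_rated` and the toy table below are made up for illustration; the sketch assumes the ratings column has already been converted to numbers.

```python
import pandas as pd

def top_rated(frame, cuisine, n=10):
    """Return the n best-rated rows whose 'cuisine' text mentions `cuisine`.

    Hypothetical helper, not part of the original notebook.
    """
    # regex=False treats the cuisine name as plain text, not as a pattern
    mask = frame["cuisine"].str.contains(cuisine, case=False, na=False, regex=False)
    return frame[mask].sort_values("ratings", ascending=False).head(n)

# Toy data, invented for illustration; column names mirror the notebook's.
toy = pd.DataFrame({
    "names": ["A", "B", "C", "D", "E"],
    "cuisine": ["Fast Food, Burger", "North Indian", "Pizza, Fast Food", "Fast Food", "Biryani"],
    "ratings": [4.0, 4.5, 4.4, 3.8, 4.7],
})

fast_food = top_rated(toy, "Fast Food", n=2)
print(fast_food)
```

On the real data, a call such as `top_rated(df, 'North Indian')` would follow the same pattern.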
I hope you have gained some insights into Hyderabad’s restaurant data.
If you liked the content, please hit the “Clap” button and connect with me:
LinkedIn: linkedin.com/in/bomma-pranay
GitHub: github.com/Bomma-Pranay
By Bomma Pranay, a Data Science Enthusiast