Let’s Find the Top-Rated Hyderabadi Biryani Restaurants! | Zomato Hyderabad Restaurants EDA

Zomato Hyderabad Restaurants EDA

This is my first ever EDA notebook. I hope you enjoy it. Feel free to comment with your thoughts; any feedback is welcome :)

0. Import

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd

To see the output of every expression in a cell, not just the last one

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

Peeping into the data

df = pd.read_csv("HyderabadResturants.csv")
df

df.sort_values(['ratings'], ascending=False, inplace=True)
df

Top rated restaurants in Hyderabad

The ratings column contains non-numeric values, namely ‘New’ and ‘-’, so we need to clean it.
df['ratings'].head(10)

# Check the unique values to see which other entries are non-numeric.
df['ratings'].unique()

# Count how often each value occurs.
df['ratings'].value_counts()

Removing rows with ratings as ‘New’ & ‘-’

df = df[df['ratings'] != 'New']
df = df[df['ratings'] != '-']
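
The two filters above can also be written as a single step. This is a minimal sketch on a toy DataFrame (the rows in `sample` are made up for illustration and are not from the real CSV):

```python
import pandas as pd

# Toy stand-in for the Zomato data (hypothetical rows, not from the real file)
sample = pd.DataFrame({
    "names": ["A", "B", "C", "D"],
    "ratings": ["4.5", "New", "-", "3.9"],
})

# Keep only rows whose rating is not one of the placeholder strings
placeholders = ["New", "-"]
cleaned = sample[~sample["ratings"].isin(placeholders)]
print(cleaned)
```

If more placeholder strings turn up in `unique()`, they can simply be appended to the `placeholders` list.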

The data is now a little cleaner, which is enough to get us started.

The ratings are sorted in descending order, so the top restaurants in Hyderabad appear first.
df['ratings'].value_counts()

%matplotlib notebook
fig = plt.figure()
sub_plot = fig.add_subplot(1,1,1)
sub_plot.plot(df['ratings'].astype('str').value_counts(), 'ko-')

Changing the data type

df['ratings'] = pd.to_numeric(df['ratings'])
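
If the column might hold other unexpected strings, the conversion itself can do the cleaning. A small sketch, assuming a made-up Series `raw` rather than the real column: `errors="coerce"` turns anything unparseable into NaN, which can then be dropped.

```python
import pandas as pd

# Hypothetical raw ratings, including the two placeholder strings
raw = pd.Series(["4.5", "New", "-", "3.9"], name="ratings")

# Unparseable entries ('New', '-') become NaN instead of raising an error
numeric = pd.to_numeric(raw, errors="coerce")

# Drop the rows that could not be converted
numeric = numeric.dropna()
print(numeric)
```

The trade-off is that coercion silently hides any value that fails to parse, so it is worth checking `unique()` first, as done earlier.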

Ratings vs Price.

Are top-rated restaurants pricey?

df['ratings'].corr(df['price for one'])

Output: 0.022768433064483028
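
For reference, `Series.corr` computes the Pearson coefficient by default. It ranges from -1 to 1, and a value near 0 means there is almost no linear relationship. The sketch below reproduces the calculation by hand on made-up numbers (the `toy` values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical ratings and prices, not taken from the real dataset
toy = pd.DataFrame({
    "ratings": [4.5, 4.2, 3.9, 4.0, 3.5],
    "price for one": [200, 450, 150, 300, 250],
})

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the column means
dx = toy["ratings"] - toy["ratings"].mean()
dy = toy["price for one"] - toy["price for one"].mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# The same quantity via pandas
r_pandas = toy["ratings"].corr(toy["price for one"])
print(r_manual, r_pandas)
```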

Top rated hotels graph

%matplotlib notebook
fig2 = plt.figure()

sub_plot2 = fig2.add_subplot(1,1,1)
sub_plot2.scatter(df['ratings'].astype('str'),df['price for one'])

Since the correlation is close to zero, the scatter plot does not show any clear shape or pattern.

Plotting top-8 hotels names with ratings

fig3 = plt.figure(figsize=(7,4))
plot1 = fig3.add_subplot(1,1,1)
plot1.plot(df['names'][:8], df['ratings'][:8])
plt.xticks(rotation=10, ha='right')
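
Restaurant names are categories, so a horizontal bar chart is often easier to read than a line plot here. A rough sketch with made-up names and ratings (the `top` DataFrame is hypothetical); inside a notebook the `matplotlib.use("Agg")` line can be dropped:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical top-rated rows, already sorted by rating
top = pd.DataFrame({
    "names": ["Shop A", "Shop B", "Shop C"],
    "ratings": [4.9, 4.8, 4.7],
})

fig, ax = plt.subplots(figsize=(7, 4))
ax.barh(top["names"], top["ratings"])  # one horizontal bar per restaurant
ax.invert_yaxis()                      # put the highest-rated restaurant at the top
ax.set_xlabel("Rating")
fig.tight_layout()
```

Horizontal bars also leave room for long names, so the tick labels do not need rotating.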

But this gives us the top restaurants across all cuisines, and most of them are sweet shops. Let’s look at the top-rated restaurants cuisine by cuisine.

df['cuisine'].value_counts()

There is a problem here: each row can list several cuisines, so the same cuisine is spread across many different combinations. Let’s find the individual cuisines by splitting on ‘,’.

val = list(df['cuisine'])
unique_cuisines = dict()
for i in val:
    tmp = i.split(',')
    for j in tmp:
        if j not in unique_cuisines:
            unique_cuisines[j] = 1
        else:
            unique_cuisines[j] += 1
# Sort by count, descending
unique_cuisines = sorted(unique_cuisines.items(), key=lambda item: item[1], reverse=True)
unique_cuisines
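
The same count can be sketched with pandas string methods instead of a loop. This version runs on a toy `cuisine` column (made-up rows) and also strips the spaces left after each comma, so its counts can differ from the loop above, which treats ‘ Desserts’ and ‘Desserts’ as separate keys:

```python
import pandas as pd

# Hypothetical cuisine strings in the same comma-separated style
toy = pd.DataFrame({
    "cuisine": [
        "Biryani, North Indian",
        "Desserts",
        "Biryani, Desserts, Chinese",
        "Biryani",
    ]
})

cuisine_counts = (
    toy["cuisine"]
    .str.split(",")   # each row becomes a list of cuisines
    .explode()        # one cuisine per row
    .str.strip()      # remove the space left after each comma
    .value_counts()   # count occurrences, sorted descending
)
print(cuisine_counts)
```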

In unique_cuisines, ‘Desserts’ appears 166 times, which is why the top-rated restaurants are mostly sweet shops.

Now, let’s answer the burning question

Top-rated Biryani restaurants!!

To achieve this, we need to filter the cuisine column in our data

# One substring match covers 'Biryani' at the start of the list as well as after a comma
biryani = df[df['cuisine'].str.contains(pat='Biryani')]
biryani
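
A slightly more defensive version of this filter, sketched on toy rows (the `toy` data is made up): `case=False` also matches ‘biryani’ in lowercase, and `na=False` treats missing cuisine values as non-matches instead of producing NaN in the mask.

```python
import pandas as pd

# Hypothetical rows, including one with a missing cuisine
toy = pd.DataFrame({
    "names": ["P", "Q", "R", "S"],
    "cuisine": ["Biryani, Chinese", "North Indian, biryani", "Desserts", None],
})

# Boolean mask: True where the cuisine string mentions biryani in any case
mask = toy["cuisine"].str.contains("Biryani", case=False, na=False)
biryani_toy = toy[mask]
print(biryani_toy)
```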

Now we have all the restaurants which serve the famous Hyderabadi Biryani

%matplotlib notebook
fig4 = plt.figure(figsize=(7,4))

plt1 = fig4.add_subplot(1,1,1)
plt.subplots_adjust(bottom=0.3)

plt1.plot(biryani['names'][:10],biryani['ratings'][:10])
plt.xticks(rotation=14)

The drawback of this plot is that it shows only a few restaurants. To see all the top-rated Biryani restaurants, let’s sort the full table instead.

biryani = biryani.sort_values('ratings', ascending=False)
biryani

Now you can select your favorite restaurant & have biryani.

You can extend the same approach to top-rated Fast Food, North Indian, and other restaurants.
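
One way to reuse the filter for any cuisine is a small helper. The function name `top_rated` and the toy rows below are hypothetical, and the sketch assumes the ratings column has already been converted to numbers:

```python
import re

import pandas as pd


def top_rated(data, cuisine, n=10):
    """Return the n highest-rated rows whose cuisine text mentions `cuisine`."""
    # re.escape keeps any special characters in the cuisine name literal
    mask = data["cuisine"].str.contains(re.escape(cuisine), case=False, na=False)
    return data[mask].nlargest(n, "ratings")


# Hypothetical rows for illustration only
toy = pd.DataFrame({
    "names": ["A", "B", "C", "D"],
    "cuisine": ["Fast Food, Burger", "North Indian", "Fast Food", "Biryani, Fast Food"],
    "ratings": [4.1, 4.6, 4.4, 3.8],
})

result = top_rated(toy, "Fast Food", n=2)
print(result)
```

Calling `top_rated(df, "North Indian")` on the cleaned data would then give the equivalent table for another cuisine.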

I hope you have gained some insights into Hyderabad’s restaurant data.

If you liked the content, please hit the “Clap” button and connect with me:

LinkedIn: linkedin.com/in/bomma-pranay | GitHub: github.com/Bomma-Pranay

By Bomma Pranay, a Data Science Enthusiast