Akbar's Portfolio

Simulating data using Python

Start by launching your Python environment and importing the packages below.

import pandas as pd
import numpy as np

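Because we are simulating survey data for a hypothetical list of brands, we need to account for each respondent's halo effect: a per-respondent offset that shifts all of that respondent's ratings up or down. The function below generates one halo value per respondent.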

def generateHalo(mean = 5, stdev = 5, size = 1000): # Generate halo effect for each respondent
    rng = np.random.default_rng(2021) # Set seed only for this function and not globally
    halo = rng.normal(loc = mean, scale = stdev, size = size)
    return halo
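
To see what the halo does, here is a minimal sketch, assuming two otherwise independent rating columns. The names noise_a and noise_b, the noise scale of 3, and the seed 0 are illustrative and not part of the original code. Adding the same per-respondent halo to both columns should make them positively correlated, which is the effect the halo is meant to simulate.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed, not from the original

size = 1000
halo = rng.normal(loc=5, scale=5, size=size)  # same defaults as generateHalo

# Two independent noise columns standing in for two rating questions
noise_a = rng.normal(loc=0, scale=3, size=size)
noise_b = rng.normal(loc=0, scale=3, size=size)

# Correlation between the two columns before and after adding the shared halo
corr_without = np.corrcoef(noise_a, noise_b)[0, 1]
corr_with = np.corrcoef(noise_a + halo, noise_b + halo)[0, 1]

print(f"correlation without halo: {corr_without:.2f}")
print(f"correlation with halo:    {corr_with:.2f}")
```

The wider the halo is relative to the per-question noise, the more strongly the columns move together.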

Next, we create scores (ratings) for a single aspect of a brand.

def generateScores(halo, scoreRange = (0, 10)): # Generate scores for 1 column
    # Note: these draws use the global NumPy random state, so they change on every run unless it is seeded
    mean = np.random.uniform(scoreRange[0], scoreRange[1]) # Set mean
    stdev = np.random.uniform(scoreRange[0], scoreRange[1]) # Set standard deviation
    scores = np.random.normal(loc = mean, scale = stdev, size = len(halo))
    scores = np.floor(scores + halo) # Add halo effect and floor to whole numbers (still stored as floats)
    scores = np.clip(scores, scoreRange[0], scoreRange[1]) # Limit values to be within range
    return scores
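
The next step calls a generateBrandScores function that does not appear in this post. The sketch below is one hypothetical way it could look, assuming it builds one score column per aspect and adds a brand label. The ASPECTS list is taken from the column names in the output table at the end of the post; everything else about this function is an assumption, not the original implementation.

```python
import numpy as np
import pandas as pd

# Column names taken from the output table in this post
ASPECTS = ['perform', 'leader', 'latest', 'fun', 'serious',
           'bargain', 'value', 'trendy', 'rebuy']

def generateScores(halo, scoreRange=(0, 10)):
    # Same logic as the function above, repeated so this sketch runs on its own
    mean = np.random.uniform(scoreRange[0], scoreRange[1])
    stdev = np.random.uniform(scoreRange[0], scoreRange[1])
    scores = np.random.normal(loc=mean, scale=stdev, size=len(halo))
    scores = np.floor(scores + halo)
    return np.clip(scores, scoreRange[0], scoreRange[1])

def generateBrandScores(halo, brand, aspects=ASPECTS):
    # Hypothetical reconstruction: one score column per aspect, plus the brand label
    df = pd.DataFrame({aspect: generateScores(halo) for aspect in aspects})
    df['brand'] = brand
    return df

# Same halo parameters as generateHalo's defaults
halo = np.random.default_rng(2021).normal(loc=5, scale=5, size=1000)
dfBrand = generateBrandScores(halo, brand='a')
print(dfBrand.head())
```

Because every aspect column reuses the same halo array, each row (respondent) carries its offset across all nine ratings.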

Then, we create scores (ratings) for all brands.

def generateBrandSurvey(halo, brands = ['a','b','c','d','e','f','g','h','i','j']): # Generate scores for all brands
    # generateBrandScores is not shown in this post; it must return one brand's scores as a DataFrame
    frames = [generateBrandScores(halo, brand = brand) for brand in brands]
    df = pd.concat(frames, axis = 0, ignore_index = True) # Stack all brands and renumber the rows
    return df

Finally, we combine these functions and generate the simulated data.

def main():
    halo = generateHalo()
    df = generateBrandSurvey(halo)
    return df

if __name__ == '__main__':
    df = main()
    print(df.head()) # Preview the first five rows

Here are the first five rows of the output:

perform	leader	latest	fun	serious	bargain	value	trendy	rebuy	brand
10.0	10.0	6.0	10.0	5.0	0.0	10.0	10.0	7.0	a
8.0	8.0	5.0	3.0	0.0	0.0	4.0	7.0	7.0	a
4.0	9.0	1.0	0.0	10.0	0.0	3.0	10.0	10.0	a
10.0	10.0	10.0	10.0	0.0	10.0	10.0	10.0	10.0	a
10.0	10.0	6.0	10.0	0.0	9.0	10.0	10.0	10.0	a
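
A quick sanity check on this preview: because a halo with mean 5 and standard deviation 5 is added to every score before clipping to the 0 to 10 range, values may pile up at the two boundaries. The sketch below counts how many of the scores in the five rows above sit at 0 or 10. The variable names are illustrative, and five rows are far too few to judge the full dataset, so treat it as a starting point only.

```python
import io
import pandas as pd

# The five preview rows shown above, copied as whitespace-separated text
raw = """perform leader latest fun serious bargain value trendy rebuy brand
10.0 10.0 6.0 10.0 5.0 0.0 10.0 10.0 7.0 a
8.0 8.0 5.0 3.0 0.0 0.0 4.0 7.0 7.0 a
4.0 9.0 1.0 0.0 10.0 0.0 3.0 10.0 10.0 a
10.0 10.0 10.0 10.0 0.0 10.0 10.0 10.0 10.0 a
10.0 10.0 6.0 10.0 0.0 9.0 10.0 10.0 10.0 a
"""

sample = pd.read_csv(io.StringIO(raw), sep=r'\s+')
scores = sample.drop(columns='brand')  # keep only the numeric rating columns

# Flag every score that was clipped to (or landed exactly on) a boundary
atBoundary = scores.isin([0, 10])
nBoundary = int(atBoundary.to_numpy().sum())
share = nBoundary / scores.size

print(f"{nBoundary} of {scores.size} scores sit at 0 or 10 ({share:.0%})")
```

If the share stays high on the full dataset, lowering the mean or stdev passed to generateHalo is one way to spread the ratings more evenly across the range.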