Simulating data using Python

Start by launching your Python environment and importing the packages below.

 import pandas as pd
 import numpy as np


Since we are creating a survey dataset on a hypothetical list of brands, we need to account for each respondent's halo effect: a per-respondent offset that is added to every rating that respondent gives. The function below generates one halo value per respondent.

 def generateHalo(mean = 5, stdev = 5, size = 1000): # Generate halo effect for each respondent
  rng = np.random.default_rng(2021) # Set seed only for this function and not globally
  halo = rng.normal(loc = mean, scale = stdev, size = size)
  return halo
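
Because the seed is set inside the function, every call with the same arguments returns exactly the same halo values. The short check below is only a sketch to illustrate that behaviour; the function body is repeated so the snippet runs on its own, and the names `first` and `second` are ours.

```python
import numpy as np

def generateHalo(mean=5, stdev=5, size=1000):
    # Same function as above, repeated so this snippet is self-contained.
    rng = np.random.default_rng(2021)  # Fresh generator with a fixed seed on every call
    return rng.normal(loc=mean, scale=stdev, size=size)

first = generateHalo()
second = generateHalo()

print(first.shape)                     # (1000,)
print(np.array_equal(first, second))   # True: both calls start from the same seed
```

This is convenient for reproducibility, but it also means that calling the function twice does not give two independent sets of respondents.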


Next, we create scores (ratings) for each aspect of a brand.

 def generateScores(halo, scoreRange = (0, 10)): # Generate scores for 1 column
  mean = np.random.uniform(scoreRange[0], scoreRange[1]) # Set mean
  stdev = np.random.uniform(scoreRange[0], scoreRange[1]) # Set standard deviation
  scores = np.random.normal(loc = mean, scale = stdev, size = len(halo))
  scores = np.floor(scores + halo) # Add halo effect and use floor to get integers
  scores = np.clip(scores, scoreRange[0], scoreRange[1]) # Limit values to be within range
  return scores
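
To see what the last two steps of this function do, here is a small worked example on three made-up values (the array `raw` is hypothetical and stands in for `scores + halo`).

```python
import numpy as np

raw = np.array([-2.3, 4.7, 12.1])    # Hypothetical "score + halo" values
floored = np.floor(raw)              # Round down: [-3., 4., 12.]
clipped = np.clip(floored, 0, 10)    # Force into the 0-10 range: [0., 4., 10.]

print(clipped)        # [ 0.  4. 10.]
print(clipped.dtype)  # float64
```

Note that `np.floor` returns floats, which is why the ratings appear as 10.0 rather than 10. Clipping also pushes every out-of-range value onto the boundaries, so expect many ratings of exactly 0 or 10.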


Then, we create scores (ratings) for all brands.

 def generateBrandSurvey(halo, brands = ['a','b','c','d','e','f','g','h','i','j']): # Generate scores for all brands
  df = pd.DataFrame()
  for i in brands:
   dfBrand = generateBrandScores(halo, brand = i)
   df = pd.concat([df, dfBrand], axis = 0)
  df = df.reset_index(drop = True)
  return df
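
Note that generateBrandSurvey calls generateBrandScores, which is not defined in this article. The snippet below is only a sketch of what such a helper might look like, assuming it builds one column per aspect with generateScores and adds a brand label. The aspect names are taken from the output columns shown at the end of this article; the `ASPECTS` constant and the `aspects` parameter are our own assumptions.

```python
import numpy as np
import pandas as pd

# Aspect names taken from the output columns shown in this article.
ASPECTS = ['perform', 'leader', 'latest', 'fun', 'serious',
           'bargain', 'value', 'trendy', 'rebuy']

def generateScores(halo, scoreRange=(0, 10)):
    # Same logic as generateScores above, repeated so this sketch runs on its own.
    mean = np.random.uniform(scoreRange[0], scoreRange[1])
    stdev = np.random.uniform(scoreRange[0], scoreRange[1])
    scores = np.random.normal(loc=mean, scale=stdev, size=len(halo))
    scores = np.floor(scores + halo)
    return np.clip(scores, scoreRange[0], scoreRange[1])

def generateBrandScores(halo, brand, aspects=ASPECTS):
    # Hypothetical helper: one column of scores per aspect, plus a brand label.
    dfBrand = pd.DataFrame({aspect: generateScores(halo) for aspect in aspects})
    dfBrand['brand'] = brand
    return dfBrand

# Quick check with the same halo settings as generateHalo.
halo = np.random.default_rng(2021).normal(loc=5, scale=5, size=1000)
dfBrand = generateBrandScores(halo, brand='a')
print(dfBrand.shape)  # (1000, 10): nine aspect columns plus 'brand'
```

With a helper like this in place, generateBrandSurvey stacks one such block of rows per brand, so ten brands and 1000 respondents would give 10,000 rows.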


Finally, we combine these functions and generate the simulated data.

 def main():
  halo = generateHalo()
  df = generateBrandSurvey(halo)
  return df

 if __name__ == '__main__':
  df = main()
 df.head()


Here are the first five rows of the output:

 perform  leader  latest   fun  serious  bargain  value  trendy  rebuy  brand
    10.0    10.0     6.0  10.0      5.0      0.0   10.0    10.0    7.0      a
     8.0     8.0     5.0   3.0      0.0      0.0    4.0     7.0    7.0      a
     4.0     9.0     1.0   0.0     10.0      0.0    3.0    10.0   10.0      a
    10.0    10.0    10.0  10.0      0.0     10.0   10.0    10.0   10.0      a
    10.0    10.0     6.0  10.0      0.0      9.0   10.0    10.0   10.0      a