To generate a synthetic dataset with the Faker library for the previous 101 visualization examples, we'll write a Python script that fills each of the specified columns with random values. Because every value is randomly generated, the dataset is artificial and does not represent any real-world data.
First, make sure the Faker library is installed. The script also imports pandas, so install both with pip:
bash code
pip install Faker pandas
Let's generate the dataset with the required columns:
python code
import pandas as pd
import random
from faker import Faker
from datetime import datetime, timedelta
# Set random seeds for reproducibility
# (Faker keeps its own generator, so it is seeded separately from `random`)
random.seed(42)
Faker.seed(42)
# Initialize Faker and other necessary variables
fake = Faker()
start_date = datetime(2020, 1, 1)
end_date = datetime(2022, 1, 1)
# Create empty lists to store the generated data
order_ids = []
customer_ids = []
product_ids = []
purchase_dates = []
product_categories = []
quantities = []
total_sales = []
genders = []
marital_statuses = []
price_per_unit = []
customer_types = []
ages = [] # New list to store ages
# Number of rows (data points) to generate
num_rows = 10000
# Generate the dataset
for _ in range(num_rows):
    order_ids.append(fake.uuid4())
    customer_ids.append(fake.uuid4())
    product_ids.append(fake.uuid4())
    # Pick a random purchase date between start_date and end_date (inclusive)
    purchase_date = start_date + timedelta(
        days=random.randint(0, (end_date - start_date).days))
    purchase_dates.append(purchase_date)
    product_categories.append(fake.random_element(
        elements=('Electronics', 'Clothing', 'Books', 'Home', 'Beauty')))
    quantities.append(random.randint(1, 10))
    total_sales.append(random.uniform(10, 500))
    # Only 'Male' and 'Female' will be added
    genders.append(fake.random_element(elements=('Male', 'Female')))
    marital_statuses.append(fake.random_element(
        elements=('Single', 'Married', 'Divorced', 'Widowed')))
    price_per_unit.append(random.uniform(5, 50))
    customer_types.append(fake.random_element(
        elements=('New Customer', 'Returning Customer')))
    # Generate random ages between 18 and 80
    ages.append(random.randint(18, 80))
# Create a DataFrame from the generated lists
df = pd.DataFrame({
    'Order_ID': order_ids,
    'Customer_ID': customer_ids,
    'Product_ID': product_ids,
    'Purchase_Date': purchase_dates,
    'Product_Category': product_categories,
    'Quantity': quantities,
    'Total_Sales': total_sales,
    'Gender': genders,
    'Marital_Status': marital_statuses,
    'Price_Per_Unit': price_per_unit,
    'Customer_Type': customer_types,
    'Age': ages  # Add the 'Age' column to the DataFrame
})
# Save the DataFrame to a CSV file
df.to_csv('ecommerce_sales.csv', index=False)
# Display the first few rows of the generated dataset
print(df.head())
This code builds a DataFrame with the columns 'Order_ID', 'Customer_ID', 'Product_ID', 'Purchase_Date', 'Product_Category', 'Quantity', 'Total_Sales', 'Gender', 'Marital_Status', 'Price_Per_Unit', 'Customer_Type', and 'Age', and saves it to ecommerce_sales.csv. You can now use the generated dataset for visualization and analysis, and apply the previous 101 visualization examples to it. Remember that this dataset is synthetic and should only be used for learning or testing. For real-world analysis, use genuine and representative data.
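One thing to keep in mind: the script draws 'Quantity', 'Price_Per_Unit', and 'Total_Sales' independently, so the total will not equal quantity times price. If a chart needs those three columns to agree, the total can be derived instead of drawn. The following is a minimal sketch of that idea using only the standard library; the helper name `make_order_amounts` and the rounding to two decimals are assumptions made for illustration and are not part of the original script.

```python
import random


def make_order_amounts(rng):
    """Return (quantity, price_per_unit, total_sales) for one synthetic order.

    Hypothetical helper: the total is computed from the other two values
    instead of being drawn independently.
    """
    quantity = rng.randint(1, 10)                  # same range as the script
    price_per_unit = round(rng.uniform(5, 50), 2)  # same range, rounded to cents
    total_sales = round(quantity * price_per_unit, 2)
    return quantity, price_per_unit, total_sales


# A dedicated generator keeps this example reproducible on its own
rng = random.Random(42)
rows = [make_order_amounts(rng) for _ in range(5)]

for quantity, price_per_unit, total_sales in rows:
    print(quantity, price_per_unit, total_sales)
```

In the generation loop, the same approach would mean drawing the quantity and the price first, then appending their product to total_sales for that row.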