Mimesis

Description

Being able to generate realistic data is incredibly valuable when you need to test a project without touching the real data. I worked in the healthcare industry as a Python programmer, and two questions always came up: “How do we test our code without working with real data?” and “How do we generate this data?” Fortunately, I never had to worry about that data myself, since my code only audited permissions. I did have to write a data generator for that team, though. If only I had seen this first…

Mimesis is a Python library that generates fake data through providers, each covering a different kind of data. In this tutorial I will walk through using it.

Installation

We will use pip to install mimesis. Use virtual environments as needed.

pip install mimesis

Data Providers

All of the data providers we have access to are listed in the Mimesis documentation.
A few that stand out to me:

  • USASpecProvider - SSNs and USPS tracking numbers
  • Address - addresses
  • Finance - banking data including cryptocurrency
  • Datetime - time info
  • Food - food info
  • Code - ISBNs or IMEIs

Warning on Providers

The documentation warns against creating many provider instances: the more providers a program creates, the worse its performance gets. Create each provider once and reuse it.

Person Example

Here is some code to create a person provider.

import mimesis

myperson = mimesis.Person()

Now we can call the provider’s methods to generate data. For example, each call to age() returns a new random age.

myperson.age()

Schema

We need to decide on the schema we want to emulate. Let’s use the following.

firstname,lastname,address,email

Generate Sample Data

First we need to create an Address provider to generate addresses.

addr_prov = mimesis.Address()

To generate this data we can run the following.

firstname = myperson.first_name()
lastname = myperson.last_name()
address = addr_prov.address()
email = myperson.email()

Let’s generate a full dataset by looping until we have the number of rows we want. The next bit of code will give us 1000 rows of data!


# set the number of values we want
data_size = 1000

# iterate and create data rows
for i in range(data_size):
    firstname = myperson.first_name()
    lastname = myperson.last_name()
    address = addr_prov.address()
    email = myperson.email()
    print(f'{firstname},{lastname},{address},{email}')

Scope

This library has incredible promise to help generate tons of sample data for any testing need. Here are the docs to get you started on using it for yourself.

Be sure to read the docs, as some providers have nuances. Some can be customized in different ways to better suit your needs.