Mimesis
Description
Generated data is incredibly valuable in the programming world whenever you need to test a project without using the real data. I worked in the healthcare industry as a Python programmer, and two questions came up constantly: “How do we test our code without working with real data?” and “How do we generate this data?” Fortunately, I never had to worry about that data myself, since my code only audited permissions. I did have to write a data generator for that team, though. If only I had seen this first… Mimesis is a Python library you can use to generate data for many different types of objects. In this tutorial I will walk through using this library.
Installation
We will use pip to install mimesis. Use virtual environments as needed.
pip install mimesis
Data Providers
All of the available data providers are listed in the Mimesis documentation.
A few that stand out to me:
- USASpecProvider - SSNs and USPS tracking numbers
- Address - addresses
- Finance - banking data including cryptocurrency
- Datetime - time info
- Food - food info
- Code - ISBNs or IMEIs
Warning on Providers
The documentation warns against creating many providers. The more providers a program creates, the worse its performance gets.
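One way to follow that advice is to create each provider once and reuse it everywhere. The sketch below is only an illustration and does not come from the Mimesis docs: get_provider is a hypothetical helper that caches one instance per provider class, and FakeProvider is a stand-in so the example runs without Mimesis installed. In real code you would pass a class such as mimesis.Person instead.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_provider(provider_cls):
    """Return one shared instance of provider_cls (hypothetical helper)."""
    # lru_cache keys on the class object, so each class is instantiated once.
    return provider_cls()


class FakeProvider:
    """Stand-in for a provider class such as mimesis.Person (assumption)."""


first = get_provider(FakeProvider)
second = get_provider(FakeProvider)
print(first is second)  # True
```

Every caller that asks for the same class gets the same object back, so the number of providers stays fixed no matter how many places need one.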
Person Example
Here is some code to create a person provider.
import mimesis
myperson = mimesis.Person()
Now we can call different methods of the provider to generate data. For example, each call to the age() function returns a new random age.
myperson.age()
Schema
We need to determine what the schema is that we want to emulate. Let’s use the following.
firstname,lastname,address,email
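One way to make this schema explicit in code is a mapping from column name to a zero-argument function that produces one value. This is just a sketch: SCHEMA and build_row are hypothetical names, and the lambdas return fixed placeholder strings so the example runs on its own. They would be swapped for provider methods once the providers exist.

```python
# Column name -> zero-argument callable that returns one value.
# The lambdas are placeholders (assumption); swap in provider methods.
SCHEMA = {
    "firstname": lambda: "Ada",
    "lastname": lambda: "Lovelace",
    "address": lambda: "12 Example Street",
    "email": lambda: "ada@example.com",
}


def build_row(schema):
    """Call every generator once and return a dict for one row."""
    return {column: make_value() for column, make_value in schema.items()}


row = build_row(SCHEMA)
print(",".join(row))  # firstname,lastname,address,email
```

Keeping the columns in one place means adding or reordering a field is a one-line change.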
Generate Sample Data
First we need to create an Address provider to generate addresses.
addr_prov = mimesis.Address()
To generate this data we can run the following.
firstname = myperson.first_name()
lastname = myperson.last_name()
address = addr_prov.address()
email = myperson.email()
Let’s generate a full dataset by looping until we have the number of rows we want. The next bit of code will give us 1000 rows of data!
# set the number of values we want
data_size = 1000
# iterate and create data rows
for i in range(data_size):
    firstname = myperson.first_name()
    lastname = myperson.last_name()
    address = addr_prov.address()
    email = myperson.email()
    print(f'{firstname},{lastname},{address},{email}')
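Joining fields with commas by hand breaks if a generated value ever contains a comma itself. A hedged alternative is Python's csv module, which quotes such fields for you. In this sketch write_rows and sample_row are hypothetical names, and sample_row returns fixed values so the code runs without Mimesis. In the tutorial's code, each row would come from myperson and addr_prov instead.

```python
import csv
import io

FIELDS = ["firstname", "lastname", "address", "email"]


def sample_row():
    """Placeholder row (assumption); replace with provider calls."""
    return ["Ada", "Lovelace", "12 Example Street, Apt 4", "ada@example.com"]


def write_rows(out, make_row, count):
    """Write a header plus count generated rows to the file-like object out."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for _ in range(count):
        writer.writerow(make_row())


buffer = io.StringIO()
write_rows(buffer, sample_row, 3)

# Reading the text back shows the comma inside the address survived.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(len(rows))   # 4 (header + 3 rows)
print(rows[1][2])  # 12 Example Street, Apt 4
```

To write to a real file instead of an in-memory buffer, open it with newline="" as the csv documentation recommends and pass the file object as out.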
Scope
This library has incredible promise for generating tons of sample data for any testing need. The Mimesis docs will get you started on using it for yourself.
Be sure to read the docs, as some of the providers have nuances, and some can be customized in different ways to better suit your needs.