lk-geimfari / mimesis

Mimesis is a powerful Python library that empowers developers to generate massive amounts of synthetic data efficiently.

Home Page: https://mimesis.name

License: MIT License

Python 99.77% Shell 0.14% Makefile 0.09%

mimesis fake data generator fixtures dummy schema testing python json-generator

mimesis's Introduction

Mimesis: The Fake Data Generator

Documentation: https://mimesis.name/

Mimesis (/mɪˈmiːsɪs/) is a robust data generator for Python that can produce a wide range of fake data in various languages.

The key features are:

Multilingual: Supports 35 different locales.
Extensibility: Supports custom data providers and custom field handlers.
Ease of use: Features a simple design and clear documentation for straightforward data generation.
Performance: Widely recognized as the fastest data generator among Python solutions.
Data variety: Includes various data providers designed for different use cases.
Schema-based generators: Offers schema-based data generators to effortlessly produce data of any complexity.
Intuitive: Great editor support. Fully typed, thus autocompletion almost everywhere.

Installation

To install mimesis, use pip:

~ pip install mimesis

Mimesis 11.1.0 is the last release that supports Python 3.8 and 3.9. Install that specific version if you need to run on either of them.

Documentation

You can find the complete documentation on Read the Docs.

You can improve it by sending pull requests to this repository.

Usage

The library is exceptionally user-friendly, and it only requires you to import a Data Provider object that corresponds to the desired data type.

For instance, the Person provider can be imported to access personal information, including name, surname, email, and other related fields:

from mimesis import Person
from mimesis.locales import Locale

person = Person(Locale.EN)

person.full_name()
# Output: 'Brande Sears'

person.email(domains=['example.com'])
# Output: a random address ending in '@example.com'

person.email(domains=['mimesis.name'], unique=True)
# Output: a unique random address ending in '@mimesis.name'

person.telephone(mask='1-4##-8##-5##3')
# Output: '1-436-896-5213'

License

Mimesis is licensed under the MIT License. See LICENSE for more information.

mimesis's Issues

Add json-minifier

@caspian-seagull It would be great if you could write a gulp task that minifies all *.json files in data/*locale*/. All dist files will be saved in release/data in the root of elizabeth.

Date and time formats are missing for ko and jp locales.

The formats are missing from data/ko/datetime.json and data/jp/datetime.json. Example from the en locale:

"formats": { "date": "%m/%d/%Y", "time": "%H:%M:%S" },

Add support of custom providers.

We need to implement something like this:

>>> from elizabeth import Generic

>>> generic = Generic('en')

>>> class SomeProvider():
        def hello(self):
            return "Hello!"

>>> class Another():
        def bye(self):
            return "Bye!"

>>> generic.add_provider(SomeProvider)
>>> generic.add_provider(Another)

>>> generic.someprovider.hello()
>>> generic.another.bye()
# Hello!
# Bye!
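
One way the requested behaviour could work is sketched below. The Generic class here is a hypothetical stand-in written only for illustration, not the library's implementation; it assumes each provider is exposed under its lowercased class name, as in the example above.

```python
class Generic:
    """Hypothetical stand-in for the real Generic class."""

    def add_provider(self, provider_cls):
        # Expose an instance under the lowercased class name,
        # e.g. SomeProvider -> generic.someprovider.
        name = provider_cls.__name__.lower()
        setattr(self, name, provider_cls())


class SomeProvider:
    def hello(self):
        return "Hello!"


class Another:
    def bye(self):
        return "Bye!"


generic = Generic()
generic.add_provider(SomeProvider)
generic.add_provider(Another)

print(generic.someprovider.hello())  # Hello!
print(generic.another.bye())  # Bye!
```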

Add abbreviation to State field

Add option for abbreviation of state name i.e. something like:

address.state(abbrev=True)

to return 'WA' for state, vs. 'Washington'.

Probably want to do something similar for states/provinces in other countries?
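
A minimal sketch of the idea, assuming the locale data maps full state names to abbreviations. The STATES sample and the standalone state() function are hypothetical and only illustrate the abbrev flag.

```python
import random

# Tiny illustrative sample; a real locale file would hold the full list.
STATES = {"Washington": "WA", "Oregon": "OR", "Texas": "TX"}


def state(abbrev=False, rng=random):
    """Return a random state name, or its abbreviation if abbrev is True."""
    name = rng.choice(sorted(STATES))
    return STATES[name] if abbrev else name


print(state())             # a full name such as 'Washington'
print(state(abbrev=True))  # an abbreviation such as 'WA'
```

Keeping the name and the abbreviation in one mapping would let other countries reuse the same flag for provinces or regions.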

"Big list of naughty strings" support

I really like the "Big naughty strings" collection. Is it possible to support them?

They would be useful for generating bad user input.

Proposal - store generated object field values to generate dependent fields

This is a proposal.

Right now I type

from elizabeth import Personal
p = Personal('en')
print( p.age() )
print( p.age() )

And get this output:

25
40

This happens because age is generated on each request and is not stored in the object p. What if I want to add a field such as child_count or work experience that depends on the previously generated age value?
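
The proposal could be sketched as follows. This Personal class is a hypothetical illustration, not the library's class: it remembers the last generated age so that a dependent field can reuse it. The work_experience method and its working_start_age parameter are assumptions made for the example.

```python
import random


class Personal:
    """Hypothetical provider that remembers the last generated age."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._age = None

    def age(self, minimum=16, maximum=66):
        # Generate a fresh age and remember it for dependent fields.
        self._age = self._rng.randint(minimum, maximum)
        return self._age

    def work_experience(self, working_start_age=22):
        # Reuse the stored age; generate one first if none exists yet.
        age = self._age if self._age is not None else self.age()
        return max(age - working_start_age, 0)


p = Personal(seed=1)
age = p.age()
experience = p.work_experience()
print(age, experience)
```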

Add Title/Prefix, Suffix options to Personal

As an example, have something like person.title() that would randomly pull from values such as 'Dr.', 'Sir', 'Honorable' (or ' '), or person.prefix(gender=) that returns 'Mr.', 'Mrs.', 'Ms.' as appropriate.

Also, similar for suffix to the surname... either person.surname(suffix=True) or person.suffix() to return values such as 'Sr', 'Jr', 'III', 'PhD', etc.

Add department to Business

Add department() to Business().

Example:

>>> from elizabeth import Business

>>> business = Business('en')
>>> business.department()
'Sports & Outdoors'

Add the ability to test all locales at once.

We need to test all locales at once without manually editing tests.py. By default, tests.py only tests the en locale; to check other locales, we have to change the value of LANG in tests.py by hand, which is not good. One of the best solutions is a pytest.fixture.

So if anyone can help us with this problem, please let me know. Thanks!
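
A parametrized fixture along these lines might do it. This is only a sketch, assuming pytest is installed; the LOCALES list is a hypothetical subset, and the test body is a placeholder for real provider checks.

```python
import pytest

# Hypothetical subset of locale codes; the real list would come from the project.
LOCALES = ["en", "de", "ru"]


@pytest.fixture(params=LOCALES)
def locale(request):
    # pytest runs every test that uses this fixture once per locale code.
    return request.param


def test_locale_code_is_known(locale):
    # Placeholder check; a real test would build a provider for this locale.
    assert locale in LOCALES
```

With this in place, a single pytest run reports one result per locale, so no value has to be edited by hand.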

Issue with side-panel in docs.

Look here, please.

Python2 support

As we discussed earlier adding legacy python version support is not much of an effort.
The main difficulty is the manual labour of ensuring that every .decode() and .encode() call is in place.

However, there are several questions to answer:

Is python2 support even needed?
How important is it?
How should it be tested?

What do you think?

Add __str__ to providers.

How it will look:

>>> from elizabeth import Personal
>>> p = Personal('pt-br')
>>> p
'Personal:pt-br:Brazilian Portuguese'

>>> Personal('en-gb')
'Personal:en-gb:British English'

Example available here

Support for Declension

Support for declension. This is very useful for Russian and some other languages.

I can suggest some examples for Russian, if that would be helpful.

For example, from address.json:

"suffix": [
      "Аллея",
      "ул."
    ]

If I add бульвар (Boulevard) to the suffix list, then Авангардная from the streets list will be incorrect: it should be Авангардный.

To update README.md

TODO:

Refactoring
Beautify
Delete excess things

Add range to Datetime

Please add a feature to Datetime so as to be able to generate dates within a given range e.g. for birthdates, where dates that are too old or too young aren't terribly useful.

Possibly merge some ideas from radar (https://github.com/barseghyanartur/radar) ...which then with some help from str() can generate dates suitable for using in a test DB such as sqlite:

>>> str(radar.random_date(start='1960-01-01', stop='2000-12-31'))
'1985-11-28'
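
For comparison, the same effect can be sketched with only the standard library. The random_date function below is a hypothetical helper written for this example, not part of radar or of this project.

```python
import random
from datetime import date, timedelta


def random_date(start, stop, rng=random):
    """Return a random date between start and stop, both inclusive."""
    span_days = (stop - start).days
    return start + timedelta(days=rng.randint(0, span_days))


birthdate = random_date(date(1960, 1, 1), date(2000, 12, 31))
print(str(birthdate))  # ISO format (YYYY-MM-DD), usable in a SQLite test DB
```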

Test Path in MS Windows.

The Path class was added today; it provides methods and properties for generating dummy paths. I have only tested it on Linux. It would be great if someone could run all the tests on MS Windows.

Usage

>>> from elizabeth import Path

>>> path = Path()

>>> path.root
/
>>> path.home
/home/

>>> path.user(gender='female')
/home/mariko

>>> path.users_folder(user_gender='male')
/home/john/Documents

>>> path.dev_dir()
/home/fidelia/Development/Erlang

# etc.

Update docs/guide.rst

Version 0.1.9 brought many changes, so we need to update the guide.

Compatibility issue with utils.download_image()

I am trying to write a test to cover situations where unverified_ctx is true and the build is failing on 3.3 and 3.4 with the following error:

E AttributeError: 'module' object has no attribute '_create_unverified_context'

I did some research and discovered that ssl._create_unverified_context was renamed from ssl._create_stdlib_context in 3.4, but does not exist in 3.3.

Changing _create_unverified_context to _create_stdlib_context will allow the build to pass on 3.4, but it will still fail on 3.3.

See PEP 476 for info on ssl._create_unverified_context and ssl._create_stdlib_context.
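
One possible workaround is to look the helper up at runtime instead of referring to it directly. The sketch below is an assumption about how that could be done, not the project's fix; make_unverified_context is a hypothetical name, and both ssl attributes are private, so they may be absent on some Python versions.

```python
import ssl


def make_unverified_context():
    """Return an SSL context from a private ssl helper, or None."""
    # Private helpers: present only on some Python versions.
    for name in ("_create_unverified_context", "_create_stdlib_context"):
        factory = getattr(ssl, name, None)
        if factory is not None:
            return factory()
    # Neither helper exists; the caller falls back to default behaviour.
    return None


context = make_unverified_context()
print(type(context).__name__)
```

Skipping certificate verification is only reasonable in tests or for throwaway downloads.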

Question: Subclassing a locale

I thought about creating a de-ch locale. As de-ch is just a special version of de, it would be nice to subclass the de locale and replace only the needed fields instead of copying the whole data and maintaining a separate data set. Is this possible?
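
If locale data is held as nested dictionaries, one conceivable approach is to overlay a small set of overrides on the base locale. The sketch below only illustrates that idea; merge_locale and the sample data are hypothetical, and nothing here implies the library supports this.

```python
def merge_locale(base, overrides):
    """Return a copy of base with overrides applied recursively."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = merge_locale(merged[key], value)
        else:
            merged[key] = value
    return merged


# Made-up data; real locale files are much larger.
de = {"address": {"country": "Deutschland", "suffix": ["Straße"]}}
de_ch = merge_locale(de, {"address": {"country": "Schweiz"}})
print(de_ch)
```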

pip installation error

Collecting elizabeth
  Using cached elizabeth-0.3.15.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-5xf3_d/elizabeth/setup.py", line 3, in <module>
        from elizabeth import __version__, \
      File "elizabeth/__init__.py", line 25, in <module>
        from elizabeth.core import *
      File "elizabeth/core/__init__.py", line 1, in <module>
        from elizabeth.core.providers import (
      File "elizabeth/core/providers.py", line 34, in <module>
        from elizabeth.core import interdata as common
      File "elizabeth/core/interdata/__init__.py", line 16
    SyntaxError: Non-ASCII character '\xd0' in file elizabeth/core/interdata/__init__.py on line 16, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-5xf3_d/elizabeth/

I get this error when I try to install elizabeth, both in a venv and globally.

test_internet: dubious regexp

tests/test_data/test_internet.py includes the following regexp fragment:

[$-_@.&+]

But $-_ is a character range that includes digits, uppercase letters and a bunch of punctuation characters.
You probably wanted this instead:

[$_@.&+-]

The dubious regexp was found using pydiatra.
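
The difference between the two character classes can be checked directly with the standard re module; the sample characters are chosen only for illustration.

```python
import re

dubious = re.compile(r"[$-_@.&+]")   # "$-_" is a range from "$" to "_"
intended = re.compile(r"[$_@.&+-]")  # trailing "-" is a literal hyphen

# Characters that fall inside the accidental "$" to "_" range.
for char in ("7", "Q", "<"):
    assert dubious.fullmatch(char)
    assert not intended.fullmatch(char)

# The hyphen matches both: as a range member in one, a literal in the other.
assert dubious.fullmatch("-") and intended.fullmatch("-")
```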

Images

Could Elizabeth generate random images? I did not find any such feature (only personal.avatar, which returns a link).

Do you have any such plans?

Address.street_address() not working

Trying to use church, ran into a snag with the following:

>>> from church import Address

>>> address = Address('en')

>>> address.street_address()
Traceback (most recent call last):
  File "<input>", line 1, in <module>
AttributeError: 'Address' object has no attribute 'street_address'

...which seems pretty much identical to what is shown here (http://church.readthedocs.io/en/latest/guide.html#address):

address = Address('en')
...
# Get a random address.
# 786 Clinton Lane
street_address = address.street_address()

Thanks!

Problem during installation

Error message:

$ pip install elizabeth
Collecting elizabeth
  Using cached elizabeth-0.3.11.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-lElGl2/elizabeth/setup.py", line 3, in <module>
        import elizabeth
      File "elizabeth/__init__.py", line 27, in <module>
        from elizabeth.core import *
      File "elizabeth/core/__init__.py", line 1, in <module>
        from .elizabeth import (
      File "elizabeth/core/elizabeth.py", line 38, in <module>
        from . import interdata as common
      File "elizabeth/core/interdata/__init__.py", line 5, in <module>
        from .code import *
      File "elizabeth/core/interdata/code.py", line 56
    SyntaxError: Non-ASCII character '\xc4' in file elizabeth/core/interdata/code.py on line 56, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

I suspect that the encoding: utf-8 declaration was left out of the comment on the first line.

Update documentation.

Documentation is outdated. We should fix it immediately.

Add image downloader

If we want to save avatars on our local machine, then we should have that opportunity.

It will look like this:

>>> from elizabeth import Personal
>>> from elizabeth.utils import download_image

>>> p = Personal('en')
>>> avatar_url = p.avatar()
>>> avatar = download_image(avatar_url, save_path)

Add middle_name or patronymic_name.

We need middle_name or patronymic_name in Personal() or in the builtin providers.

Problems with unicode on Windows.

JeStoneDev from Habr reports the following:

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:01:18) [MSC v.1900 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from elizabeth import Personal
>>> user = Personal('is')
>>> for _ in range(0, 9):
...     print(user.full_name(gender='male'))
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\Users\mainj\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xf0' in position 5: character maps to <undefined>

So if you have a machine running Windows 10, please try to fix it.
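
Until the root cause is fixed, a caller could work around the error by replacing characters the console encoding cannot represent. The console_safe helper below is a hypothetical sketch, and the sample name is made up for illustration.

```python
import sys


def console_safe(text, encoding=None):
    """Replace characters the console encoding cannot represent."""
    encoding = encoding or sys.stdout.encoding or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


# cp437 (the code page in the traceback) has no "ð", so it becomes "?".
safe_name = console_safe("Davíð", encoding="cp437")
```

This loses information, so it is a stopgap for printing rather than a fix for the generated data.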

Data which must be added

We need to add all data to version 0.4.0, namely:

nl/text.json:88:	"Test"

is/personal.json:4491:	"Test"

cs/food.json:3:	"Test"
cs/food.json:6:	"Test"
cs/food.json:9:	"Test"
cs/food.json:12:	"Test"
cs/food.json:15:	"Test"
cs/personal.json:3:	"Test"
cs/personal.json:6:	"Test"
cs/personal.json:1992:	"Test"
cs/personal.json:1995:	"Test"
cs/personal.json:1998:	"Test"
cs/personal.json:2001:	"Test"
cs/personal.json:2351:	"Test"
cs/personal.json:2354:	"Test"
cs/personal.json:2359:	"Test"
cs/personal.json:2362:	"Test"
cs/personal.json:2398:	"Test"
cs/text.json:119:	"Test"
cs/text.json:122:	"Test"
cs/text.json:126:	"Test"
cs/science.json:3:	"Test"

da/personal.json:12:	"Test"
da/text.json:93:	"Test"
da/text.json:96:	"Test"
da/text.json:100:	"Test"
da/science.json:3:	"Test"

pl/personal.json:8316:	"Test"
pl/text.json:99:	"Test"

es/personal.json:886:	"Test"
es/personal.json:1171:	"Test"
es/personal.json:1174:	"Test"
es/personal.json:1179:	"Test"
es/personal.json:1182:	"Test"
es/text.json:98:	"Test"
es/address.json:382:	"Test"
es/science.json:3:	"Test",
es/science.json:4:	"Test"

If you see your locale in the list, please help us. It's really important to add all of this data.

Check correctness of all data for all locales.

We support 33 languages, and it would be great if native speakers of each one would check the correctness of the data for their own language.

For example, I'm Russian and I'm sure the data for that language is correct. But we also want to be sure about German (de, de-ch), Italian (it) and the other languages.

Checked locales:

Chinese support

Hello, could Chinese be supported alongside the other locales?

Typo in German locale

https://github.com/lk-geimfari/church/blob/master/church/data/de/company#L7

It should be "Deutsche Telekom"; see https://en.wikipedia.org/wiki/Deutsche_Telekom

Are these entries added manually or is there maybe a bug in your retrieval system?

Required docs [Sphinx]

Documentation is badly needed, because at the moment we only have a small guidebook.

Add a generator of numbers.

Floats
Integers

Move away from unstructured data storage.

I think that using text files for storage is not the best solution. We can use JSON for structured data storage.

Example:

.
├── personal.json
├── business.json
├── datetime.json
├── food.json
├── address.json
├── science.json
└── text.json

We need to jsonify all data for all locales. @sobolevn what do you think about this idea?

Add customization for international data.

For example, suppose we want to generate emails with custom domains.

>>> def email(gender, *args):
       # ...

>>> for i in range(0, 4):
       Personal('en').email()

# random addresses on the default domains
# ...

>>> domains = ["@example.com"]

>>> for i in range(0, 5):
        Personal('en').email(gender="female", domains=domains)

# random addresses ending in '@example.com'
# ...

I.e., the user can generate emails with whichever domains they want.
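
A minimal sketch of such a function is shown below. The standalone email function, its username parameter and the DEFAULT_DOMAINS list are all hypothetical; they only illustrate how a domains argument could override built-in defaults.

```python
import random

DEFAULT_DOMAINS = ["@example.org", "@example.net"]  # hypothetical defaults


def email(username, domains=None, rng=random):
    """Build an address from username and a randomly chosen domain."""
    domain = rng.choice(domains or DEFAULT_DOMAINS)
    # Accept domains written with or without the leading "@".
    return f"{username}@{domain.lstrip('@')}"


address = email("jane", domains=["@example.com"])
print(address)  # jane@example.com
```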

Run tests on macOS

I think it should work without problems, but I need to make sure.

Attempting install, getting SyntaxError Non-ASCII character error

On Mac OSX 10.12.2 using zsh, I run:

$ pip install elizabeth

And get the following error message:

Collecting elizabeth
  Using cached elizabeth-0.3.4.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/vg/c9ncn1fs5xzdqccf9mxf24vr0000gn/T/pip-build-lnl_QD/elizabeth/setup.py", line 3, in <module>
        import elizabeth
      File "elizabeth/__init__.py", line 10, in <module>
        from elizabeth.core import *
      File "elizabeth/core/__init__.py", line 1, in <module>
        from .elizabeth import (
      File "elizabeth/core/elizabeth.py", line 35, in <module>
        from . import interdata as common
      File "elizabeth/core/interdata.py", line 720
    SyntaxError: Non-ASCII character '\xe2' in file elizabeth/core/interdata.py on line 720, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
    
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/vg/c9ncn1fs5xzdqccf9mxf24vr0000gn/T/pip-build-lnl_QD/elizabeth/

Rewrite tests using pytest

Our current tests are too complicated and we need to fix that. I suggest rewriting the tests using the pytest testing framework.

If someone is experienced with pytest, I would like to hear where we could start.

Wrong display of a French address

>>> from church import Address
>>> addr = Address('fr')
>>> addr.address()
'371 Bezout Rue du'

The output should be:
'371 Rue du Bezout'
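
One way to get the expected order would be a per-locale format template. The sketch below is an assumption about a possible fix, not the project's implementation; ADDRESS_FORMATS and format_address are hypothetical names.

```python
# Hypothetical per-locale templates controlling the order of address parts.
ADDRESS_FORMATS = {
    "en": "{number} {name} {suffix}",
    "fr": "{number} {suffix} {name}",
}


def format_address(locale, number, name, suffix):
    """Assemble a street address in the order the locale expects."""
    return ADDRESS_FORMATS[locale].format(number=number, name=name, suffix=suffix)


print(format_address("fr", 371, "Bezout", "Rue du"))  # 371 Rue du Bezout
print(format_address("en", 786, "Clinton", "Lane"))   # 786 Clinton Lane
```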

Auto fetch science.json\article from Wikipedia

science.json\article is a list of some articles from Wikipedia.

You could fetch it automatically from Wikidata by group, for example during an Elizabeth generation session.

Is this a good idea?

Update logo.

It's done!

Add builtins specific data providers.

Every language has specific data that only suits that language. For example, SSN for en (USA) or CPF for pt-br. CPF is only useful for Brazilians.

If users want to use these providers, they must import them explicitly.

Here's how it will look:

>>> from elizabeth import Generic
>>> from elizabeth.builtins import Brazil

>>> generic = Generic('pt-br')

>>> class BrazilProvider(Brazil):
        class Meta:
            name = "brazil_provider"

>>> generic.add_provider(BrazilProvider)
>>> generic.brazil_provider.cpf()
'001.137.297-40'

Remove useless methods from the providers.

So, we need to find and remove useless methods.

For example:
I once added the names of scientists to science.json, but now I'm not sure that these data can be useful.
So, what do you think about it?