joke2k / faker
Faker is a Python package that generates fake data for you.
Home Page: https://faker.readthedocs.io
License: MIT License
I am currently using a wrapper for fake-factory to be able to choose the output, but it would be great if it became part of the fake-factory core.
This is the script I have in my path: https://gist.github.com/makefu/9101269
usage:
$ LANG=de_DE.utf-8 faker address
Davide-Kaul-Weg 175
94892 Königs Wusterhausen
faker has so many great new changes in git, I think you guys should release all of them onto pypi soon, perhaps after pulling in the pull request with the docs.
The provider data and provider logic are pretty tightly intertwined.
It'd be nice if they were separated out--then it'd be a lot easier to port some of the other provider lists out there.
For example, look at how ForgeryPy structures the data separate from the logic--ForgeryPy dictionaries are the equivalent of Faker's Providers: https://github.com/tomekwojcik/ForgeryPy/tree/master/forgery_py/dictionaries
He's got a generic loader that kicks in when a custom function isn't defined for a provider.
That project seems relatively abandoned, so it'd be nice to pull that clean functionality into this project.
It'd also probably make it easier for people to localize their providers because they just change the data files without having to think about the attached python code.
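The separation of data and logic described above could look roughly like the sketch below. It is not Faker's or ForgeryPy's code: the `DataProvider` class, the inline JSON, and the formatter names are all hypothetical, and a real layout would presumably read one data file per locale.

```python
import json
import random


class DataProvider(object):
    """Hypothetical provider whose values come from plain data, not code."""

    def __init__(self, data, rng=None):
        # `data` maps a formatter name to a list of candidate values.
        self.data = data
        self.rng = rng or random.Random()

    def __getattr__(self, name):
        # Generic loader: any key present in the data becomes a formatter.
        # __getattr__ only runs when normal attribute lookup has failed.
        data = self.__dict__.get("data", {})
        if name in data:
            return lambda: self.rng.choice(data[name])
        raise AttributeError(name)


# Stand-in for the contents of a per-locale data file (assumed format).
raw = '{"first_name": ["Anna", "Bela"], "last_name": ["Kovacs", "Nagy"]}'
provider = DataProvider(json.loads(raw))
print(provider.first_name(), provider.last_name())
```

With this shape, localizing a provider would mean editing only the data, while a provider that needs custom logic could still define an ordinary method that takes priority over the generic loader.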
I wanted some fake time series data for a project and couldn't find anything suitable for my needs.
Is something like this in the scope of this project ?
See http://cbsg.sourceforge.net/cgi-bin/live for an example.
"miscelleneous" is not a word. It should be "miscellaneous".
Minor problem, but inconvenient when integrating in apps that follow PEP8 more closely.
Last release was in March. Perhaps a new release would be in order to make people using pypi get it as well? Would be appreciated! Keeps the ecosystem going and all that.
Executing pip install fake-factory leads to:
http://pastebin.com/Vy9erGF0
Windows 7 x64, python 2.7.4, pip 1.2.1
Running the tests fails on my Xubuntu 14.04 virtual machine (32-bit with Python 2.7.6) due to a ValueError: timestamp out of range for platform time_t in L246 of faker/providers/date_time.py; see below for the output:
$ python setup.py test
running test
running egg_info
writing dependency_links to fake_factory.egg-info/dependency_links.txt
writing fake_factory.egg-info/PKG-INFO
writing top-level names to fake_factory.egg-info/top_level.txt
reading manifest file 'fake_factory.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'fake_factory.egg-info/SOURCES.txt'
running build_ext
test_add_provider_gives_priority_to_newly_added_provider (faker.tests.FactoryTestCase) ... ok
test_command (faker.tests.FactoryTestCase) ... 6588 Shasta Locks
South Tamikaville, CO 72509-4971
ok
test_documentor (faker.tests.FactoryTestCase) ... ERROR
test_format_calls_formatter_on_provider (faker.tests.FactoryTestCase) ... ok
test_format_transfers_arguments_to_formatter (faker.tests.FactoryTestCase) ... ok
test_get_formatter_returns_callable (faker.tests.FactoryTestCase) ... ok
test_get_formatter_returns_correct_formatter (faker.tests.FactoryTestCase) ... ok
test_get_formatter_throws_exception_on_incorrect_formatter (faker.tests.FactoryTestCase) ... ok
test_magic_call_calls_format (faker.tests.FactoryTestCase) ... ok
test_magic_call_calls_format_with_arguments (faker.tests.FactoryTestCase) ... ok
test_parse_returns_same_string_when_it_contains_no_curly_braces (faker.tests.FactoryTestCase) ... ok
test_parse_returns_string_with_tokens_replaced_by_formatters (faker.tests.FactoryTestCase) ... ok
======================================================================
ERROR: test_documentor (faker.tests.FactoryTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/mdxs/dev/gh/faker/faker/tests.py", line 65, in test_documentor
print_doc()
File "/home/mdxs/dev/gh/faker/faker/cli.py", line 77, in print_doc
formatters = doc.get_formatters(with_args=True, with_defaults=True)
File "/home/mdxs/dev/gh/faker/faker/documentor.py", line 28, in get_formatters
(provider, self.get_provider_formatters(provider, **kwargs))
File "/home/mdxs/dev/gh/faker/faker/documentor.py", line 78, in get_provider_formatters
example = self.generator.format(name)
File "/home/mdxs/dev/gh/faker/faker/generator.py", line 56, in format
return self.get_formatter(formatter)(*args, **kwargs)
File "/home/mdxs/dev/gh/faker/faker/providers/date_time.py", line 246, in date_time_ad
return datetime.fromtimestamp(random.randint(-62135600400, int(time())))
ValueError: timestamp out of range for platform time_t
----------------------------------------------------------------------
Ran 12 tests in 0.201s
FAILED (errors=1)
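One way to sidestep the platform time_t limit shown in the traceback is to build the result with timedelta arithmetic instead of datetime.fromtimestamp. The sketch below is an assumption about how that could be done, not the project's fix. It returns a naive UTC-based value rather than local time, and it derives the lower bound from datetime.min instead of reusing the constant from the traceback.

```python
import random
import time
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
# Seconds from the epoch back to 0001-01-01 00:00:00, the earliest datetime.
MIN_TIMESTAMP = int((datetime.min - EPOCH).total_seconds())


def date_time_ad(rng=random):
    """Return a random naive datetime between year 1 and now (UTC-based)."""
    timestamp = rng.randint(MIN_TIMESTAMP, int(time.time()))
    # Adding a timedelta to a datetime does not go through the C time_t
    # conversion, so very old dates also work on 32-bit platforms.
    return EPOCH + timedelta(seconds=timestamp)


print(date_time_ad())
```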
Why not store all data in files?
See person.py in https://gist.github.com/kissarat/755a2d39546dc828ae37
You can use dump.py to make the conversion easy.
If you want to store data this way, I would convert the project.
There is no need to implement child classes when the implementation has nothing specific to a locale; data files alone may be enough.
Hi,
We're using this currently in our tests to generate test data. However we'd also like to use it to generate sample HTML pages (blog posts - for example). For this it would be great if faker could have an image provider (or maybe a file provider as a lower level).
Would you be averse to this idea? If not I'm more than happy to work on the provider and submit a pull request.
Cheers,
Ben
I think that the current way of documenting everything on Github only doesn't scale very well. I suggest you put the docs onto readthedocs.
.prefix (and .suffix) can occasionally return a tuple of values instead of a single value when prefixes_male and prefixes_female (or suffixes_*) are present in the provider.
See here for the code responsible.
I wasn't sure if this was intentional (it's documented to do so -- then again, the documentation is autogenerated, isn't it?), so I didn't make a PR yet, but it's certainly counterintuitive.
How do I submit a new provider for inclusion into a future build?
With system python:
☄ python --version
Python 2.7.7
☄ which faker
/usr/local/bin/faker
☄ python -c "import faker; print faker.VERSION"
0.4.2
☄ faker address -r 10
PSC 7159, Box 2889
APO AP 50457
PSC 5924, Box 3842
APO AA 79576-2701
PSC 4394, Box 0547
APO AA 13834-3973
PSC 1353, Box 2874
APO AE 17295
PSC 8492, Box 6715
APO AE 89299-8347
PSC 0676, Box 5745
APO AA 45384
PSC 7082, Box 0817
APO AE 39616
PSC 9015, Box 5179
APO AP 79298
PSC 3885, Box 3107
APO AA 97447
PSC 3078, Box 3599
APO AE 16713-0587
In virtualenv:
☄ python --version
Python 3.4.1
☄ which faker
/Users/kyl/Code/Playground/faker/.venv/bin/faker
☄ python -c "import faker; print(faker.VERSION)"
0.4.2
☄ faker address -r 10
94283 Jewell Shoal Suite 192
West Cade, TN 16897-7888
93143 Runolfsdottir Summit Suite 471
Lilliamouth, KS 80170-8892
PSC 5138, Box 8808
APO AE 12600-9380
787 Rohan Drive Apt. 652
Port Ebertport, FL 84541-9565
12609 Gulgowski Club
Waelchihaven, VT 93071
Unit 6204 Box 4740
DPO AA 61620-2499
0791 Daxton Avenue
Chaneltown, TN 87248-1822
6046 Emard Camp
Lennyborough, FM 79310
83026 Kane Shore
Lake Casie, SD 63881-1429
881 Davis Walks Suite 491
McKenziehaven, TX 35051-3973
In the using from shell section of the docs, I understand how to display the result of a fake. There is an example:
$ python -m faker address
However, it is not clear to me how to give a provider's name, for example 'Lorem' (should that be lowercase 'lorem'?), and display all of the provider's fakes. It would be good if there was an example provided.
As far as I can see, fake.first_name() can return either a male or female first name. Do you plan to distinguish between them? Something like fake.first_name(gender='male'), where the default value could be 'any'.
I ask it because I want to add support for Hungarian names. I have an up-to-date list with all the Hungarian names, put in two files: males and females. I could put them in two sets, or I could add them in one set.
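The gender parameter proposed above could be sketched as follows. The function signature, the 'any' default, and the tiny name lists are illustrative assumptions taken from the question, not an existing Faker API.

```python
import random

# Illustrative data only; a real provider would load the full name lists.
FIRST_NAMES_MALE = ("Bence", "Levente", "Gergely")
FIRST_NAMES_FEMALE = ("Anna", "Hanna", "Luca")


def first_name(gender="any", rng=random):
    """Pick a first name, optionally restricted to one gender."""
    pools = {
        "male": FIRST_NAMES_MALE,
        "female": FIRST_NAMES_FEMALE,
        "any": FIRST_NAMES_MALE + FIRST_NAMES_FEMALE,
    }
    if gender not in pools:
        raise ValueError("gender must be 'male', 'female' or 'any'")
    return rng.choice(pools[gender])


print(first_name(), first_name(gender="male"), first_name(gender="female"))
```

Keeping the male and female names in two separate sets, as in this sketch, leaves both options open: the combined pool is just their concatenation.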
It would be useful to have a job provider together with the company provider. If anyone could point me to a good list, I would work on it.
I got the pip install to finish, but Python won't recognize anything from the faker library when I try to use it.
Hello, I noticed in faker/Providers/De_de/internet.py in the _to_ascii method, the capital O is missing an umlaut.
It should be: ('Ö', 'Oe')
Currently:
replacements = (
    ('ä', 'ae'), ('Ä', 'Ae'),
    ('ö', 'oe'), ('O', 'Oe'),
    ('ü', 'ue'), ('Ü', 'Ue'),
    ('ß', 'ss')
)
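For comparison, here is a standalone sketch of the transliteration with the corrected 'Ö' entry. The function name `to_ascii` and the loop are assumptions about how the replacements are applied; only the table itself comes from the report.

```python
# -*- coding: utf-8 -*-

REPLACEMENTS = (
    ("ä", "ae"), ("Ä", "Ae"),
    ("ö", "oe"), ("Ö", "Oe"),  # corrected: capital O with umlaut
    ("ü", "ue"), ("Ü", "Ue"),
    ("ß", "ss"),
)


def to_ascii(string):
    """Replace German umlauts and sharp s with ASCII equivalents."""
    for search, replace in REPLACEMENTS:
        string = string.replace(search, replace)
    return string


print(to_ascii("Österreich"))  # Oesterreich
print(to_ascii("Otto"))        # Otto (a plain O is left untouched)
```

With the uncorrected ('O', 'Oe') entry, every plain capital O would be rewritten instead, so "Otto" would come out as "Oetto" while "Ö" stayed non-ASCII.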
It would be great that if faker was initialized with only a locale and no territory, that it would use a sensible default.
For example I currently have to do the following if using something such as "en" instead of "en_US".
from faker import Factory
from faker import AVAILABLE_LOCALES
locale = 'en'
if locale not in AVAILABLE_LOCALES:
locale = next(l for l in AVAILABLE_LOCALES if l.startswith(locale))
factory = Factory.create(locale)
This happens when using dynamic mock data in local development where django sets the locale to "en" because we do not define territories.
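A fallback along these lines could live in one helper. The sketch below is only an assumption about what a "sensible default" might mean: the `PREFERRED_TERRITORY` table, the function name, and the sample locale list are hypothetical, and the real list would come from AVAILABLE_LOCALES.

```python
# Hypothetical preferences for languages with several territories.
PREFERRED_TERRITORY = {"en": "en_US", "pt": "pt_BR"}


def resolve_locale(requested, available, default="en_US"):
    """Map a bare language code such as 'en' to an available full locale."""
    if requested in available:
        return requested
    language = requested.split("_")[0]
    preferred = PREFERRED_TERRITORY.get(language)
    if preferred in available:
        return preferred
    # Otherwise take the first locale of that language, in sorted order.
    for candidate in sorted(available):
        if candidate.split("_")[0] == language:
            return candidate
    return default


sample_locales = ["de_DE", "en_GB", "en_US", "fr_FR", "pt_BR", "pt_PT"]
print(resolve_locale("en", sample_locales))  # en_US
print(resolve_locale("de", sample_locales))  # de_DE
```

Unlike the next(...) expression in the workaround above, this version returns a default instead of raising StopIteration when no locale matches.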
Currently, every time a provider is added, we need to update the lists in __init__.
This is error-prone and it would be more sustainable if we could discover providers automatically.
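Automatic discovery could be built on the standard library's pkgutil module. The sketch below is one possible approach, not the project's implementation. It lists the submodules of any importable package, and the stdlib json package stands in for a providers package so the example runs on its own.

```python
import importlib
import pkgutil


def discover_modules(package_name):
    """Return the dotted names of all direct submodules of a package."""
    package = importlib.import_module(package_name)
    return sorted(
        "{0}.{1}".format(package_name, name)
        for _finder, name, _is_package in pkgutil.iter_modules(package.__path__)
    )


# The stdlib json package stands in for a providers package here.
print(discover_modules("json"))
```

Each discovered name could then be imported with importlib.import_module and checked for a provider class, so adding a new module would not require touching a hand-maintained list.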
I wanted to add a method to BaseProvider that allows for sampling n unique elements.
There are situations in which I want to grab several random things, but I want those results to be unique. I just forked and added this to my own fork, but I wanted to run it by you before making a pull request.
# in faker/providers/__init__.py BaseProvider
@classmethod
def random_sample(cls, array=('a','b','c'), number=2):
""" Returns $number unique elements from $array"""
return random.sample(array, number)
I may be missing something but I don't think faker spits out random genders in the person provider. While trivial to write, I think this should still be included in faker.
It would be useful to have a parameter that would disallow a set of characters from a provider's output.
# Don't use outputs that have /, %, or &
fake.bs(disallowed_characters=['/', '%', '&'])
The use case I ran into was that we needed fake strings that could safely be put into URIs and therefore cannot contain /.
Thoughts on this?
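Until something like disallowed_characters exists, the same effect can be approximated outside the library by retrying a generator. This is a hedged sketch: `without_characters` and `max_attempts` are hypothetical names, and the fixed sequence of values stands in for calls such as fake.bs().

```python
def without_characters(generate, disallowed, max_attempts=100):
    """Call generate() until the result contains none of the characters."""
    banned = set(disallowed)
    for _ in range(max_attempts):
        value = generate()
        if not banned.intersection(value):
            return value
    raise ValueError("no acceptable value after %d attempts" % max_attempts)


# A fixed sequence stands in for a fake data generator here.
values = iter(["input/output", "R&D synergies", "seamless portals"])
result = without_characters(lambda: next(values), ["/", "%", "&"])
print(result)  # seamless portals
```

Retrying keeps the distribution of the remaining outputs unchanged, at the cost of extra calls when many candidates contain a banned character.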
This is what a user sees on https://pypi.python.org/pypi/fake-factory:
This is caused by PyPI not understanding Markdown.
You can use pandoc to convert Markdown to ReStructuredText that PyPI understands.
fake.timezone() sometimes throws an exception, possibly when a country doesn't have any timezones defined:
>>> from faker import Faker
>>> f = Faker()
>>> f.timezone()
'Africa/Mogadishu'
>>> f.timezone()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/vagrant/.python/lib/python3.3/site-packages/faker/providers/date_time.py", line 378, in timezone
return cls.random_element(cls.countries)['timezones'].pop(0)
This is with Python 3.3 using fake-factory 0.4.0 from pypi.
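The traceback above is cut off before the exception itself, so the cause is a guess. One plausible reading is that pop(0) removes the entry from a shared list, so repeated calls can leave a country with no timezones. The sketch below shows a non-mutating alternative under that assumption, with a made-up `COUNTRIES` list in place of the provider's data.

```python
import random

# Stand-in data; the second entry mimics a country with no timezones left.
COUNTRIES = [
    {"name": "Somalia", "timezones": ["Africa/Mogadishu"]},
    {"name": "Example without zones", "timezones": []},
]


def timezone(countries=COUNTRIES, rng=random):
    """Pick a timezone without modifying the shared country data."""
    candidates = [country for country in countries if country["timezones"]]
    country = rng.choice(candidates)
    # Index instead of pop(0), so the list is the same on the next call.
    return country["timezones"][0]


print([timezone() for _ in range(3)])
```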
Factory Boy provides easy replacement for fixtures. It allows for an easy definition of factories, various build factories, factory inheritance etc.
It has a FuzzyAttribute mechanism which suits faker perfectly.
see #106 (diff)
I have an idea, but I'm not sure it is the simplest: the current profile.py becomes something like "internal_profile.py", its methods are renamed "internal_simple_profile()" and "internal_profile()", and it is removed from the list of standard providers. Then we would have a standard profile.py that simply calls self.generator.internal_profile(). For each locale, we would then be able to add more logic, for example to customize field names and, if needed, values.
Do you think there would be a simpler way to do it?
I'm having problems installing Faker 0.4.2 on Python 3.4.2:
$ pip install fake-factory
Collecting fake-factory
Using cached fake-factory-0.4.2.tar.gz
Traceback (most recent call last):
File "<string>", line 20, in <module>
File "/private/var/folders/98/hxvgjtd93ql1s1c4695y6w2h0000gq/T/pip-build-e5pfmuys/fake-factory/setup.py", line 9, in <module>
NEWS = open(os.path.join(here, 'CHANGELOG.rst')).read()
File "/Users/pedro.teixeira/.virtualenvs/cave/bin/../lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 100: ordinal not in range(128)
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 20, in <module>
File "/private/var/folders/98/hxvgjtd93ql1s1c4695y6w2h0000gq/T/pip-build-e5pfmuys/fake-factory/setup.py", line 9, in <module>
NEWS = open(os.path.join(here, 'CHANGELOG.rst')).read()
File "/Users/pedro.teixeira/.virtualenvs/cave/bin/../lib/python3.4/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 100: ordinal not in range(128)
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/98/hxvgjtd93ql1s1c4695y6w2h0000gq/T/pip-build-e5pfmuys/fake-factory
Downloading/unpacking fake-factory from https://pypi.python.org/packages/source/f/fake-factory/fake-factory-0.4.1.tar.gz#md5=27ac002a6f3a4b46d8996b5ef6ad5a7c
Downloading fake-factory-0.4.1.tar.gz (306kB): 306kB downloaded
Running setup.py egg_info for package fake-factory
Traceback (most recent call last):
File "<string>", line 16, in <module>
File "/Users/gkisel/.virtualenvs/faker/build/fake-factory/setup.py", line 9, in <module>
NEWS = open(os.path.join(here, 'CHANGELOG.rst')).read()
IOError: [Errno 2] No such file or directory: '/Users/gkisel/.virtualenvs/faker/build/fake-factory/CHANGELOG.rst'
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 16, in <module>
File "/Users/gkisel/.virtualenvs/faker/build/fake-factory/setup.py", line 9, in <module>
NEWS = open(os.path.join(here, 'CHANGELOG.rst')).read()
IOError: [Errno 2] No such file or directory: '/Users/gkisel/.virtualenvs/faker/build/fake-factory/CHANGELOG.rst'
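Both setup.py failures above come from the same line, open(os.path.join(here, 'CHANGELOG.rst')).read(): one from decoding with the ASCII default, the other from the file being absent. A more tolerant reader might look like the sketch below; `read_optional` is a hypothetical helper, not what the project's setup.py does.

```python
import io
import os
import tempfile


def read_optional(path, default=u""):
    """Read a UTF-8 text file, or return `default` if it does not exist."""
    if not os.path.exists(path):
        return default
    # io.open accepts `encoding` on both Python 2 and Python 3, so the
    # result does not depend on the locale's default codec.
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


# Demonstration with a temporary file that contains non-ASCII text.
demo_path = os.path.join(tempfile.mkdtemp(), "CHANGELOG.rst")
with io.open(demo_path, "w", encoding="utf-8") as handle:
    handle.write(u"Added de_DE addresses (Königs Wusterhausen)\n")

print(read_optional(demo_path))
print(repr(read_optional(demo_path + ".missing")))
```

Returning an empty string only hides a missing file. If the changelog is meant to ship in the source distribution, it would also need to be included in the package manifest.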
pip installation under Python 3 fails:
$ python --version
Python 3.3.5
$ pip install faker
Downloading/unpacking faker
Downloading Faker-0.0.4.tar.gz
Running setup.py (path:/home/abcde/temp/faker_test/env3/build/faker/setup.py) egg_info for package faker
Traceback (most recent call last):
File "<string>", line 17, in <module>
File "/home/abcde/temp/faker_test/env3/build/faker/setup.py", line 5, in <module>
import faker
File "./faker/__init__.py", line 11, in <module>
import data
ImportError: No module named 'data'
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 17, in <module>
File "/home/abcde/temp/faker_test/env3/build/faker/setup.py", line 5, in <module>
import faker
File "./faker/__init__.py", line 11, in <module>
import data
ImportError: No module named 'data'
----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /home/abcde/temp/faker_test/env3/build/faker
Storing debug log for failure in /home/abcde/.pip/pip.log
The docs mention being able to call the seed() method so you can use a generated dataset as part of a unit test.
Due to the way Faker uses the random module, this use case is a bit fragile. Any modification to the data requested, or any outside use of the random module during generation, will cause the dataset to diverge.
Here is a quick script demonstrating the problem along with a couple of potential solutions:
import random
from faker import Faker
fake = Faker()
# initial run
fake.seed(1234)
print fake.name()
print fake.name()
print fake.name()
# repeated run with same data
fake.seed(1234)
print fake.name()
print fake.name()
print fake.name()
# adding new fake calls prevents us from getting the same names we had originally
fake.seed(1234)
print fake.name(), fake.email()
print fake.name(), fake.email()
print fake.name(), fake.email()
# One way is to implement a preserve/restore mechanism so that the user can get back to the previous trail of data
fake.seed(1234)
print fake.name()
r = random.getstate()
print fake.email()
random.setstate(r)
print fake.name()
r = random.getstate()
print fake.email()
random.setstate(r)
print fake.name()
r = random.getstate()
print fake.email()
random.setstate(r)
# A similar problem arises if the program using faker happens to use a non-instance random call during generation.
# The best way to prevent this issue is to have faker use an instance of random rather than the module version.
# If faker used an instance version of random, you could also resolve the original problem by using different faker instances
fake.seed(1234)
fake2 = Faker()
fake2.seed(1234)
print fake.name(), fake2.email()
print fake.name(), fake2.email()
print fake.name(), fake2.email()
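The instance-based approach suggested in the script's last comments can be illustrated without Faker at all. In the sketch below, `MiniFaker` is a hypothetical stand-in whose only purpose is to show that two generators holding their own random.Random instances do not disturb each other or the module-level state.

```python
import random


class MiniFaker(object):
    """Toy generator that owns a private random.Random instance."""

    NAMES = ("Ada", "Grace", "Linus", "Guido")
    DOMAINS = ("example.com", "example.org")

    def __init__(self, seed=None):
        self.random = random.Random(seed)

    def seed(self, value):
        self.random.seed(value)

    def name(self):
        return self.random.choice(self.NAMES)

    def email(self):
        user = self.random.choice(self.NAMES).lower()
        return "{0}@{1}".format(user, self.random.choice(self.DOMAINS))


names = MiniFaker(1234)
emails = MiniFaker(1234)

first_run = [names.name() for _ in range(3)]

# Repeat the run while also drawing emails and using the module-level
# generator; neither touches the state held by `names`.
names.seed(1234)
second_run = []
for _ in range(3):
    second_run.append(names.name())
    emails.email()
    random.random()

print(first_run == second_run)  # True
```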
What do you think about adding a coding style guide/standard for this project?
I can see that the style differs a lot from file to file. As a result, a lot of cleanup work is needed.
Provide a way for users to add their own custom providers at runtime.
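The test log earlier on this page mentions a test named test_add_provider_gives_priority_to_newly_added_provider, so a mechanism of this kind may already exist. The standalone sketch below only illustrates the idea. `MiniGenerator` and both provider classes are hypothetical and do not reproduce Faker's code.

```python
class MiniGenerator(object):
    """Toy generator that looks formatters up on registered providers."""

    def __init__(self):
        self.providers = []

    def add_provider(self, provider):
        # Newest provider first, so it wins when names collide.
        self.providers.insert(0, provider)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for formatter names.
        for provider in self.__dict__.get("providers", []):
            if hasattr(provider, name):
                return getattr(provider, name)
        raise AttributeError(name)


class PetProvider(object):
    def pet(self):
        return "cat"


class DogProvider(object):
    def pet(self):
        return "dog"


generator = MiniGenerator()
generator.add_provider(PetProvider())
print(generator.pet())  # cat
generator.add_provider(DogProvider())
print(generator.pet())  # dog
```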
I just received this feedback talking about the original Faker in PHP:
"I would recommend some sort of distinguishing name then. They both have the same name, that is going to be really confusing. Even something like FakerPy or something."
I think it makes sense and FakerPy is a good option.
It would be nice if one could generate random usernames and passwords too. I have a tool for that (https://github.com/jabbalaci/jabbapylib/blob/master/jabbapylib/apps/userpass.py) that I use for online registrations.
If you like the idea, I can make a pull request to integrate it into faker.
Currently, the en_US.person provider contains a long list of names, many of which are actually pretty rare in the US (e.g. 'Eusebio' or 'Filiberto').
We can populate the list using data from http://ssa.gov/oact/babynames/decades/names2000s.html (or any other decade).
Related: #69
I've added fake-factory to ohloh.net at https://www.ohloh.net/p/fake-factory to keep some statistics on the code base and to allow contributors to claim/track their commits.
At the moment, there is no "Manager" ... Who should register as a project manager? Someone who works on the project. Ideally the owner, founder, lead developer, or release manager.
So I guess either @joke2k or @fcurella should claim that role by clicking on the "Become the first manager for fake-factory" on the https://www.ohloh.net/p/fake-factory page.
You can also check out http://www.fakenamegenerator.com/ too to get some new ideas. You can select your gender, name set and country, and it generates a complete fake identity. Maybe some parts of it could be integrated in faker too.
The en_US phone_number() provider includes formats that can generate invalid phone numbers (i.e. numbers which can't be parsed as standard US numbers by phonenumbers.py):
import phonenumbers
from faker import Faker
faker = Faker()
number = faker.phone_number()
# Raises NumberParseException when the number cannot be parsed as a US number
phonenumbers.parse(number, 'US')
The above code will raise a NumberParseException if the phone number is generated using the first format, '+##(#)##########', with an invalid country code (e.g. +08(1)111111111). One possibility is to force this format to always use a valid country code following the +. However, because other providers/localizations can already be used to generate specific international number formats, including leading country codes, I think it'd be simpler to only include valid US numbers in the en_US provider. In this case, would it be easiest to simply remove the '+##(#)##########' formats from the provider?
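Dropping the international format would leave only patterns without a leading + country code, which is the part the report identifies as the trigger. A sketch of that idea follows. The `US_FORMATS` tuple is hypothetical and does not reproduce the provider's real list, and `numerify` here is a local helper, not a library call.

```python
import random

# Hypothetical US-only patterns; '#' is a placeholder for one digit.
US_FORMATS = (
    "(###)###-####",
    "###-###-####",
    "###.###.####",
)


def numerify(pattern, rng=random):
    """Replace every '#' in the pattern with a random digit."""
    return "".join(
        str(rng.randint(0, 9)) if char == "#" else char for char in pattern
    )


def phone_number(rng=random):
    """Return a number built from one of the US-only patterns."""
    return numerify(rng.choice(US_FORMATS), rng)


print(phone_number())
```

Purely random digits can still produce numbers that a stricter validity check would reject, so this only removes the country-code problem described above.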
All date_time fakers generate values within the last given time period (e.g. the past month counting back from now), not within 'this' time period (i.e. the current month).
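The distinction can be made concrete with a helper that stays inside the current calendar month. This is a sketch of the requested behaviour, not an existing formatter. The name `date_time_this_month` and its `now` parameter are assumptions.

```python
import random
from datetime import datetime, timedelta


def date_time_this_month(now=None, rng=random):
    """Return a random datetime between the start of this month and now."""
    now = now or datetime.now()
    start = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((now - start).total_seconds())
    return start + timedelta(seconds=rng.randint(0, elapsed))


fixed_now = datetime(2014, 7, 15, 12, 0, 0)
print(date_time_this_month(fixed_now))
```

A "last month" style generator would instead subtract a fixed span from now, which can reach back into the previous calendar month.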
It's uncommon to use an apostrophe in an email address. It would be good not to use last names with an apostrophe in the generation process.
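Instead of excluding such last names, the apostrophe could be stripped when the address is built. The sketch below assumes a simple first.last scheme. `email_local_part` is a hypothetical helper, and its whitelist also drops any other character outside a-z, 0-9, dot, underscore, and hyphen.

```python
import re


def email_local_part(first_name, last_name):
    """Build a conservative local part such as 'shaun.ohara'."""
    raw = "{0}.{1}".format(first_name, last_name).lower()
    # Keep only characters that are unremarkable in email addresses.
    return re.sub(r"[^a-z0-9._-]", "", raw)


print(email_local_part("Shaun", "O'Hara") + "@example.com")
```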
Hi,
There are duplicate formats in https://github.com/joke2k/faker/blob/master/faker/providers/en_US/phone_number.py
E.g. '+##(#)##########' appears twice, as does '0##########'.
Will be happy to submit a PR for a fix myself. Wanted to confirm the duplicate is a bug.
Best,
Anthony
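Removing the duplicates while keeping the original order is a small operation. In the sketch below the two duplicated patterns are the ones named above, while '###-###-####' and the helper name `unique_in_order` are added purely for illustration.

```python
def unique_in_order(items):
    """Return the items as a tuple with later duplicates removed."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return tuple(result)


formats = (
    "+##(#)##########",
    "0##########",
    "+##(#)##########",
    "###-###-####",
    "0##########",
)
print(unique_in_order(formats))
```

If formats are picked uniformly at random, a duplicated entry is twice as likely to be chosen, so deduplicating also changes how often each pattern appears.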
$ pip install fake-factory==0.3
Downloading/unpacking fake-factory==0.3
Downloading fake-factory-0.3.tar.gz (86kB): 86kB downloaded
Running setup.py egg_info for package fake-factory
Installing collected packages: fake-factory
Found existing installation: fake-factory 0.2
Uninstalling fake-factory:
Successfully uninstalled fake-factory
Running setup.py install for fake-factory
Successfully installed fake-factory
Cleaning up...
$ python
Python 2.7.5+ (default, Jun 2 2013, 13:26:34)
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from faker import Factory
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named faker
>>>
Version 0.2 works great.
This error occurs when attempting to use the password method on a Factory object.
Python 2.7.6 (default, Feb 26 2014, 12:07:17)
[GCC 4.8.2 20140206 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from faker import Factory
>>> fake = Factory.create()
>>> fake.password()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Generator' object has no attribute 'password'