dbf's Issues

Request: Ability to read dBase IV or convert from IV to III

We've come across a strange issue migrating some customer data. We're trying to read DBF files produced by dBase IV, which seem to be unsupported by dbf. I can see why, given their age.

Does anybody have any recommendations on how we might convert a DBF file from dBase IV to III so it can be read by the dbf library?
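In case it's useful to anyone else, one crude workaround I've seen is rewriting the version byte at offset 0 of the DBF header so the file claims to be dBase III. This is only a sketch, not a tested fix: it can only work when the table has no memo fields and uses only dBase III field types, so back up the file first (the filenames are placeholders):

import shutil

src, dst = "customer_iv.dbf", "customer_iii.dbf"  # placeholder filenames
shutil.copyfile(src, dst)
with open(dst, "r+b") as fh:
    fh.seek(0)
    fh.write(b"\x03")  # 0x03 marks a dBase III table without memo fields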

ASCII code page 855

Hi Ethan,

First of all, amazing job! This library is like gold for me!
I work at UNICEF in Ukraine, and unfortunately the Cyrillic alphabet here is slightly different. I see your library supports codepage 866, but in Ukraine they use codepage 855. Is there a way for you to add support for it? Thanks!
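A possible stopgap until it's supported, mirroring the approach from the "Add Latin Codepage?" issue further down: Python itself ships a cp855 codec, so registering it in dbf's codepage table under a spare language-driver id might work. This is an untested assumption about dbf's internals:

import dbf

# assumption: dbf keeps its codepage registry in a module-level code_pages
# dict keyed by the DBF header's language-driver byte; 0xf2 is assumed unused
dbf.code_pages[0xf2] = ('cp855', 'Cyrillic (DOS)')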

CSV Headers with formatting characters enclosing the column name

I was wondering if there is a way to get rid of the formatting characters that are added to the headers when exporting a DBF database to CSV?

(b'Name',4)
(b'Address',7)
and so on...

I assume these extra characters are meant to apply bold styling for the number of characters the string contains, or something like that, but since I'm trying to load those CSVs into another database, they are causing me problems. Field names with no extras would be awesome...

Great work, by the way, it has helped me a lot!
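Until there's a built-in option, a workaround sketch that sidesteps the issue by writing the CSV with the standard library instead (the filenames are placeholders):

import csv
import dbf

table = dbf.Table("example.dbf")  # placeholder filename
table.open(dbf.READ_ONLY)
with open("example.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(table.field_names)  # plain names, no b'...' artifacts
    for record in table:
        writer.writerow([record[name] for name in table.field_names])
table.close()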

Issue with records

This isn't really a bug, but behavior different from what I expected.
I have some DBF files from an old CRM; when I print all the records, I get a different result than when I open the file in LibreOffice.
I noticed this because I have a column that should have unique values, and LibreOffice shows it correctly.
With dbf, though, I see duplicates, and the duplication really is present in the DBF file.
It's possible the CRM does something to mark rows as discarded, because the correct data is what LibreOffice shows.
One interesting thing I noticed: in the DBF file, each duplicated row ends with an *, which in DBF basically marks a deleted row.
Is there a way to skip deleted rows?
Never mind, found it myself:
I can use is_deleted.
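For anyone landing here later, a minimal sketch of the fix (placeholder filename):

import dbf

table = dbf.Table("crm.dbf")  # placeholder filename
table.open(dbf.READ_ONLY)
for record in table:
    if dbf.is_deleted(record):
        continue  # rows flagged with '*' in the file
    print(record)
table.close()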

Convert "\x00...\x00" strings to None

I found str values filled with \x00...\x00 in some DBFs, which I suppose should be converted to None (I don't know the DBF spec; I'm just guessing, since the data source I downloaded the file from also shared a CSV version, and these values are NULL in the CSV).

The code to reproduce is below. I could not upload the DBF file here, but you can download it from my website.

from pprint import pprint

import dbf

dbf_filename = "vw_infracao_poligono_publico.dbf"
encoding = "utf8"
table = dbf.Table(dbf_filename, codepage=encoding, on_disk=True)
field_names = table.field_names
table.open()
for record in table:
    row = {field_name: record[field_name] for field_name in field_names}
    if row["CODIGOSOLI"].strip() == "autoinfracaosga_URX8ECKUFY6305":
        pprint(row)

The result is (check the value for NUMERO_AUT):

{'CODIGOSOLI': 'autoinfracaosga_URX8ECKUFY6305                                                                                                                                                                                                                                ',
 'DATA_AUTUA': datetime.datetime(2017, 5, 8, 3, 0),
 'DESCRICAO': 'Por desmatar a corte raso área total de 4,3ha de vegetação '
              'nativa em RL sem autorização do órgão '
              'competente.                                                                                                                                           ',
 'DT_ABERTUR': datetime.datetime(2017, 5, 15, 16, 32, 28, 943000),
 'MOTIVO_AUT': 'Desmatamento.                                                                                                                                                                                                                                                 ',
 'NUMERO_AUT': '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
 'NUMERO_SEI': '202000017012601                                                                                                                                                                                                                                               '}

I've tested some solutions and the fastest one is this:

for record in table:
    row = {}
    for field_name in field_names:
        value = record[field_name]
        # the cheap endpoint checks short-circuit before the full set() scan
        if isinstance(value, str) and value and value[0] == value[-1] == "\x00" and set(value) == {"\x00"}:
            value = None
        row[field_name] = value
    if row["CODIGOSOLI"].strip() == "autoinfracaosga_URX8ECKUFY6305":
        pprint(row)

... and the result:

{'CODIGOSOLI': 'autoinfracaosga_URX8ECKUFY6305                                                                                                                                                                                                                                ',
 'DATA_AUTUA': datetime.datetime(2017, 5, 8, 3, 0),
 'DESCRICAO': 'Por desmatar a corte raso área total de 4,3ha de vegetação '
              'nativa em RL sem autorização do órgão '
              'competente.                                                                                                                                           ',
 'DT_ABERTUR': datetime.datetime(2017, 5, 15, 16, 32, 28, 943000),
 'MOTIVO_AUT': 'Desmatamento.                                                                                                                                                                                                                                                 ',
 'NUMERO_AUT': None,
 'NUMERO_SEI': '202000017012601                                                                                                                                                                                                                                               '}

Tables of type (50, 'Visual FoxPro (VarChar, VarBinary, or BLOB enabled)') not supported

Hello, first of all thank you very much for this very useful library. When I try to copy the structure of a FoxPro table, it gives me the following error:

dbf.exceptions.DbfError: Tables of type (50, 'Visual FoxPro (VarChar, VarBinary, or BLOB enabled)') not supported.

Is there any chance that this will be supported in the future? Thank you very much!

dbf.Table() Overwrites existing files if 'field_specs' are specified

The dbf module should warn the user if they are about to overwrite an existing file when calling dbf.Table().
What I tried to do was open my dBase III file in memory:

myDatabaseFile = "path-to-my-file"
testTable = dbf.Table(filename=myDatabaseFile, on_disk=False).open(dbf.READ_ONLY)
print("Number of records on this file : " + str(len(testTable)))

This threw an error :

dbf.DbfError: field list must be specified for memory tables

What I did not know was that this module (probably) does support loading a file into memory, but only through create_index(), so I foolishly added the field specs to the arguments and reran the test. And lo and behold:

Number of records on this file : 0

Turns out I nuked an entire 56K-record database in an instant; the database file itself was deleted from that folder. Fortunately I had several backups.
Still, I think the READ_ONLY flag is very deceptive, since the code will delete or overwrite an existing file without warning (depending on how you set on_disk) if field_specs are specified. READ_ONLY should also raise an error when field_specs are given, since the developer will have to populate the database anyway.
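Until the library guards against this, a defensive pattern on the caller's side is to pass field specs only when the file does not already exist. A sketch, with placeholder names:

import os
import dbf

filename = "mydata.dbf"  # placeholder
if os.path.exists(filename):
    table = dbf.Table(filename)  # no field specs, so nothing gets clobbered
    table.open(dbf.READ_ONLY)
else:
    table = dbf.Table(filename, "name C(25); age N(3,0)")  # hypothetical structure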

Incomplete DBF loading

Hi,

table = dbf.Table(filename=DBF_FILE, on_disk=True)

When I open a DBF table (link below), it does not load all the columns (only 47 instead of 58). If I specify the column names myself, it deletes all the data in the DBF. I tried specifying the codepage, but it didn't help. Am I missing something, or is there something wrong with this particular DBF file? Thanks

SFGX_test.zip

Exclusively opening a file

When working through the dBASE environment, there is an option to open a file "exclusively", so that other applications wait on the released lock before altering the table's content.

Is there an option to get/check/release this lock through this module? Do you have any ideas on how we might approach this? Perhaps I could contribute such functionality, if you have any tips/ideas on how to pursue the problem.

Thanks for an amazing package!
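Not an answer, but maybe a starting point: as far as I know the module doesn't expose dBASE-style exclusive opens, so one could fall back on advisory OS locks on the .dbf file. A rough POSIX sketch; note it only coordinates processes that also take the lock, and won't stop a dBASE application that doesn't:

import fcntl

with open("table.dbf", "r+b") as fh:  # placeholder filename
    fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until other holders release
    try:
        pass  # open and modify the table here
    finally:
        fcntl.flock(fh, fcntl.LOCK_UN)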

Bulk insertion speed

Hello

With this code I can insert 900 rows/sec on an SSD.
On a network folder it inserts 200 rows/sec.

for i in range(50000):
    data = (str(i), str(i) + '.jpeg', 'Lorem ipsum dolor sit amet', '{GCF2807B-E86B-CAC7-5F21-4B18C46ADDD0}', '441000', '4637070', '5', '600', '600', '791', '6', '41', '27', '0', '0', '0', '1.5', '1753322669', '0', '440998', '443998', '4634072', '4637072', '440998', '443998', '4634072', '4637072', 0, 1)
    mydbf.append(data) # write line per line

This doesn't seem faster :

metadata.append(multiple=50000) # 7s
for i, rec in enumerate(dbf.Process(metadata)):
    data = (str(i), str(i) + '.jpeg', 'Lorem ipsum dolor sit amet', '{GCF2807B-E86B-CAC7-5F21-4B18C46ADDD0}', '441000', '4637070', '5', '600', '600', '791', '6', '41', '27', '0', '0', '0', '1.5', '1753322669', '0', '440998', '443998', '4634072', '4637072', '440998', '443998', '4634072', '4637072', 0, 1)
    rec[:] = data

Is there any sub-second bulk insert method?

Conversion fails when dbf file contains numbers stored in scientific notation

I don't know how the original file was generated, but according to the internet, a DBF file may contain numbers stored in scientific notation.
In such cases, the following error can be observed:

ValueError: invalid literal for int() with base 10: b'1.E+3'

This comes from the retrieve_numeric() function in tables.py, and can be fixed by wrapping the string in float() before converting to int:

        if fielddef[DECIMALS] == 0:
            # float() first, so values like b'1.E+3' survive the conversion to int
            return string and int(float(string)) or 0
        else:
            return string and float(string) or 0.0

Iteration seems not to work, but the table's `record_length` is OK

Thanks for the module and your work!

I cannot read the dbf file's content according to the documentation. Is there any further documentation?

trade = dbf.Table('table.dbf')
trade.open()
# length ok
print(trade.record_length)
# output nothing
for r in trade:
    print(r)
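One thing worth double-checking, on the assumption that record_length is the size of one record in bytes rather than the number of records (that's how I read the docs): len(table) gives the record count, so if it prints 0 the loop will output nothing.

import dbf

trade = dbf.Table('table.dbf')
trade.open(dbf.READ_ONLY)
print(len(trade))           # number of records
print(trade.record_length)  # bytes per record, not a count
for r in trade:
    print(r)
trade.close()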

Reindex seems not to work

Hi all,

Thanks for the package.
I'm using the dbf package in Python for reading and writing FoxPro dbf files. It works fine except when I try to update a dbf file with associated .cdx files. After adding the new records I call the reindex function, but it seems not to work: if I open the dbf from another application, the new records are not visible. To see the new records I have to open the dbf in Microsoft FoxPro and run REINDEX there; after that, the new records are available. Please, could you take a look at the reindex function, or tell us how we can reindex the dbf file from Python?

Thanks in advance,
Best regards
Fran

Incorrect handling of Clipper dbf file with long character field

I'm trying to write Python code to manipulate Clipper dbf tables from an old project of mine. An error is thrown when trying to open one of the files. It has one field (among 38) that is a long character field, with a length of 256. The total record length is 629.

When opening with dbf_type=None (the default), I get this error:

dbf.BadDataError: Header shows record length of 629, but calculated record length is 373

When opening with dbf_type='clp', it opens properly. However, when accessing table.current_record, I get this error:

dbf.BadDataError: record data is not the correct length (should be 1257, not 629)

Here is the stack trace:

Traceback (most recent call last):
  File "/Users/bill/PycharmProjects/EZ DryClean/src/dbf/taxcalc.py", line 75, in <module>
    main()
  File "/Users/bill/PycharmProjects/EZ DryClean/src/dbf/taxcalc.py", line 21, in main
    dbf_customer.showHead()
  File "/Users/bill/PycharmProjects/EZ DryClean/src/dbf/Clipper.py", line 21, in showHead
    print(self.table.current_record)
  File "/Users/bill/PycharmProjects/EZ DryClean/venv/lib/python3.9/site-packages/dbf/__init__.py", line 2946, in current_record
    return self[index]
  File "/Users/bill/PycharmProjects/EZ DryClean/venv/lib/python3.9/site-packages/dbf/__init__.py", line 5472, in __getitem__
    return self._table[value]
  File "/Users/bill/PycharmProjects/EZ DryClean/venv/lib/python3.9/site-packages/dbf/__init__.py", line 5220, in __getitem__
    maybe = Record(recnum=index, layout=meta, kamikaze=bytes, _fromdisk=True)
  File "/Users/bill/PycharmProjects/EZ DryClean/venv/lib/python3.9/site-packages/dbf/__init__.py", line 3085, in __new__
    raise BadDataError("record data is not the correct length (should be %r, not %r)" %
dbf.BadDataError: record data is not the correct length (should be 1257, not 629)

The showHead() method referenced above:

    def showHead(self, rows: int = 2):
        self.table.goto('top')
        if self.table.bof:
            self.table.skip()
        print(f'Showing {rows} rows for {self.table.filename}')
        print('---------------------------------')
        for i in range(rows):
            print(self.table.current_record)
            self.table.skip()
        print('=================================')

The relevant lines from taxcalc.py:

    dbf_orders = DB('../../data/ORDERS.DBF')
    dbf_orders.showHead()
    dbf_customer = DB('../../data/CUSTOMER.DBF', dbf_type='clp')
    dbf_customer.showHead()

ORDERS.DBF does not have any long character fields, and everything works properly for it. CUSTOMER.DBF has the one long character field (256).

I've been trying to find the bug and fix it, but haven't been able to isolate it yet.

DeprecationWarning importing ABCs from collections

I'm currently using dbf with Python 3.8 and received this warning:

DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.9 it will stop working

Are there plans to adapt for Python 3.9?

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3: character maps to <undefined>

Looping over the records in the file straatnm.dbf gives UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3: character maps to <undefined>.
You can download the dBASE files from this location: https://downloadagiv.blob.core.windows.net/crab-stratenlijst/dBASE/CRAB_stratenlijst.zip

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/administrator/.local/lib/python2.7/site-packages/dbf/__init__.py", line 3263, in __str__
    result.append("%3d - %-10s: %r" % (seq, field, self[field]))
  File "/home/administrator/.local/lib/python2.7/site-packages/dbf/__init__.py", line 3194, in __getitem__
    return self.__getattr__(item)
  File "/home/administrator/.local/lib/python2.7/site-packages/dbf/__init__.py", line 3161, in __getattr__
    value = self._retrieve_field_value(name)
  File "/home/administrator/.local/lib/python2.7/site-packages/dbf/__init__.py", line 3361, in _retrieve_field_value
    datum = retrieve(record_data, fielddef, self._meta.memo, self._meta.decoder)
  File "/home/administrator/.local/lib/python2.7/site-packages/dbf/__init__.py", line 4145, in retrieve_character
    data = fielddef[CLASS](...)
  File "/usr/lib/python2.7/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 3: character maps to <undefined>

Add more examples

Problem Creating a DBF with an Integer Field

Hi,
I wrote the following routine to create a new DBF file from an existing structure-extended DBF file.

def create_dbf_from(struct_file, new_file):
    db = dbf.Table(struct_file)
    db.open(mode=dbf.READ_ONLY)
    structure_str = ""
    reccount = 0
    for record in db:
        if reccount > 0:
            structure_str += "; " + record.field_name.rstrip()
            reccount += 1
        else:
            structure_str += record.field_name.rstrip()
            reccount = 1
        if record.field_type in ("N", "B", "F"):
            structure_str += " " + record.field_type + "(" + str(record.field_len) + "," + str(record.field_dec) + ")"
        elif record.field_type == "C":
            structure_str += " " + record.field_type + "(" + str(record.field_len) + ")"
        else:
            structure_str += " " + record.field_type

    db.close()
    print(structure_str)
    newdb = dbf.Table(new_file, structure_str)
    newdb.close()

It works except when the structure-extended file contains an Integer field. I then tried just entering the following commands into Python:

import dbf
new_table = dbf.Table('new_file_name.dbf', 'field1 I; field2 N(5,2)')

I get the following error message:
File C:\ProgramData\anaconda3\lib\site-packages\dbf\__init__.py:5585 in __init__
    self.add_fields(field_specs)

File C:\ProgramData\anaconda3\lib\site-packages\dbf\__init__.py:5888 in add_fields
    raise FieldSpecError("Unknown field type: %s" % field_type)

FieldSpecError: Unknown field type: FieldType.INTEGER

Is it something that I am doing, or does dbf have an issue creating an integer field?

TIA,

Jeff
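A hunch, in case it's useful: I (Integer) is a Visual FoxPro field type, and dbf.Table defaults to a dBase III table, which doesn't have it. Asking for a VFP table explicitly may be all that's needed, assuming your dbf version supports it:

import dbf

# request a Visual FoxPro table so the 'I' field type is recognized
new_table = dbf.Table('new_file_name.dbf', 'field1 I; field2 N(5,2)', dbf_type='vfp')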

Delimiter for export

Is there any way to export to CSV using a delimiter other than a comma (for example, the vertical bar |)?
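I don't know whether the built-in export supports it, but going through the standard library's csv module works: csv.writer accepts any single-character delimiter, so the same export loop as in the CSV-headers issue above applies, just with delimiter="|":

import csv

with open("out.csv", "w", newline="") as fh:  # placeholder filename
    writer = csv.writer(fh, delimiter="|")
    writer.writerow(["field1", "field2"])  # hypothetical header
    writer.writerow(["value1", "value2"])  # hypothetical row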

VFP memo fields don't store anything

I am using dbf to create new tables for VFP and then push data into those newly created tables.
When I use VFP to look at the memo fields, they look like they should contain data, but viewing them shows nothing.
Normally an empty memo field is labeled "memo" in lower case, but these show "Memo", which should mean there is data there; clicking one returns nothing, though.
If I create those same tables as db3 instead, the memo fields view fine in VFP.
Viewed through Python, the memo fields do have data, so I'm not sure why the VFP tables' memo fields don't work while the db3 tables' do. I am using VFP 9, not 6; I wouldn't think that matters, but I don't know it for a fact.

Start value of fields of table types vfp/clp

Hi
I recently stumbled upon a DBF file which was readable but did not yield the records I expected. After some investigation, I found that the fields all had the same start value (resulting in total gibberish in the records).
By modifying the code to not use the start value from the field spec, but instead the offset that gets built up during field initialization, I was able to "solve" my issue. Db3Table already works this way; VfpTable and ClpTable do not.

Is this intended (and therefore a bug in my DBF file), or a potential bug in the library?

diff --git a/dbf/__init__.py b/dbf/__init__.py
index 5dc5080..81ed089 100755
--- a/dbf/__init__.py
+++ b/dbf/__init__.py
@@ -6769,3 +6769,3 @@ class ClpTable(Db3Table):
                 raise BadDataError("Unknown field type: %s" % type)
-            start = unpack_long_int(fieldblock[12:16])
+            start = offset
             length = fieldblock[16]
@@ -7101,3 +7101,3 @@ class VfpTable(FpTable):
                 raise BadDataError("Unknown field type: %s" % type)
-            start = unpack_long_int(fieldblock[12:16])
+            start = offset
             length = fieldblock[16]

Example of module

I just want to see code for the following (see the sketch below for item 1):

1. How to get the record count of: (A) a table, (B) a filtered set of rows, (C) a conditional index.
2. Relating two tables, i.e. how do I set up a relation between two tables based on an indexed key (SET RELATION TO ... INTO a, as it works in FoxPro)?
3. A FoxPro-style BROWSE grid/window, which offers many features.
4. Generating a text file as output / report formatting.

Why am I asking for this? Because if I get the above code in action, it will not only boost my Python learning, but also get me off of legacy MS tech.
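For item 1, here is a sketch of what is doable with the library as-is; the filename and the state field are hypothetical, and items 3 and 4 (BROWSE windows, report forms) are outside the library's scope as far as I know:

import dbf

table = dbf.Table("example.dbf")  # hypothetical file
table.open(dbf.READ_ONLY)

# (A) total record count
print(len(table))

# (B) count of rows passing a filter
print(sum(1 for rec in table if rec.state.strip() == "TX"))

# (C) an index ordered on a key function; len() gives its size
by_state = table.create_index(lambda rec: rec.state)
print(len(by_state))

table.close()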

Add Latin Codepage? [feature request]

Any interest in adding a Latin codepage?
I added latin-1, a.k.a. ISO/IEC 8859-1 or cp819 (https://en.wikipedia.org/wiki/ISO/IEC_8859-1), since I had a number of characters that weren't translating to ASCII or UTF.
I just stuck it on the end of the code_pages definition:

dbf/__init__.py:8364    0xf1 : ('latin-1', 'Latin Western Europe')

OS: Windows Server 2008 R2
Python 2.7 (I know)

It's working for me, and I'm happy to submit a pull request: https://github.com/wolviex/dbf
I also put some comments in the README regarding Table arguments, and won't take it personally if you'd rather do it differently =)

Thanks!

Add as_dict method to dbf.Record

I'd like to propose adding an as_dict method to the dbf.Record class. It would be helpful when we don't know a DBF's field names in advance and don't want to check table.field_names and clean up strings.

I can create a PR with docs; I've already tested the following code:

    def as_dict(self, strip_strings=False):
        row = {}
        for field_name in self._meta.fields:
            value = self[field_name]
            if isinstance(value, str) and strip_strings:
                value = value.strip()
            row[field_name] = value
        return row

strip_strings is very useful for me (I use it almost every time I read a DBF), since the string values always have trailing spaces.
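For reviewers, usage would look something like this (placeholder filename):

table = dbf.Table("example.dbf")
table.open(dbf.READ_ONLY)
rows = [record.as_dict(strip_strings=True) for record in table]
table.close()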

test_mismatched_extensions test fails on case-insensitive filesystems

I tried to have a look through the logic, but the overall intent is a bit over my head, as I'm not familiar with the code and only use this package indirectly. It fails on, e.g., macOS. Test log:

======================================================================
FAIL: test_mismatched_extensions (__main__.TestDbfFunctions.test_mismatched_extensions)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/private/tmp/nix-build-python3.11-dbf-0.99.9.drv-0/dbf-0.99.9/dbf/test.py", line 4822, in test_mismatched_extensions
    self.assertEqual(table._meta.memoname, new_memo_name)
AssertionError: '/private/tmp/nix-build-python3.11-dbf-0.99.9.drv-0/tmpedpvv9u9/temptable.dbt' != '/private/tmp/nix-build-python3.11-dbf-0.99.9.drv-0/tmpedpvv9u9/temptable.Dbt'
- /private/tmp/nix-build-python3.11-dbf-0.99.9.drv-0/tmpedpvv9u9/temptable.dbt
?                                                                          ^
+ /private/tmp/nix-build-python3.11-dbf-0.99.9.drv-0/tmpedpvv9u9/temptable.Dbt
?                                                                          ^
