GithubHelp home page GithubHelp logo

Comments (25)

steven-esser avatar steven-esser commented on May 28, 2024

For now, we can keep this script in the etc/scripts/ directory, like scancode does.

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

@majurg Strange, still not able to invoke help:

(deltacode) Thu Mar 29, 2018 05:52 PM  /c/code/nexb/dev/deltacode JMH (75-use-json-to-csv-script)
$ etc/scripts/json2csv -h
bash: etc/scripts/json2csv: No such file or directory

(deltacode) Thu Mar 29, 2018 05:52 PM  /c/code/nexb/dev/deltacode JMH (75-use-json-to-csv-script)
$ etc/scripts/json2csv.py -h
etc/scripts/json2csv.py: line 25: from: command not found
etc/scripts/json2csv.py: line 26: from: command not found
etc/scripts/json2csv.py: line 27: from: command not found
etc/scripts/json2csv.py: line 29: import: command not found
etc/scripts/json2csv.py: line 30: from: command not found
etc/scripts/json2csv.py: line 31: import: command not found
etc/scripts/json2csv.py: line 32: import: command not found
etc/scripts/json2csv.py: line 34: import: command not found
etc/scripts/json2csv.py: line 37: import: command not found
etc/scripts/json2csv.py: line 40: syntax error near unexpected token `('
etc/scripts/json2csv.py: line 40: `def load_deltas(json_input):'

(deltacode) Thu Mar 29, 2018 05:52 PM  /c/code/nexb/dev/deltacode JMH (75-use-json-to-csv-script)
$

Hmmm. I'll figure it out . . . .

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

BTW, the syntax error seems erroneous. It purports to cover the first method, which looks OK to me:

from __future__ import print_function
from __future__ import absolute_import
from __future__ import unicode_literals

import codecs
from collections import OrderedDict
import json
import os

import click
click.disable_unicode_literals_warning = True

import unicodecsv


def load_deltas(json_input):
    """
    Return a list of scan results loaded from a json_input, either in
    DeltaCode standard JSON format.
    """
    with codecs.open(json_input, 'rb', encoding='utf-8') as jsonf:
        delta = jsonf.read()

    delta_results = json.loads(delta, object_pairs_hook=OrderedDict)

    delta_results = delta_results['deltas']

    return delta_results

We see the syntax error only when running etc/scripts/json2csv.py -h -- the ScanCode source for our work says the command should be run without the .py extension: etc/scripts/json2csv -h

from deltacode.

steven-esser avatar steven-esser commented on May 28, 2024

Is the script able to be run correctly from the cli at all? If it is we shouldn’t worry about getting -h to work right this second if that is the case

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

No -- I get a similar result without the -h:

(deltacode) Fri Mar 30, 2018 08:48 AM  /c/code/nexb/dev/deltacode JMH (75-use-json-to-csv-script)
$ etc/scripts/json2csv
bash: etc/scripts/json2csv: No such file or directory

(deltacode) Fri Mar 30, 2018 08:49 AM  /c/code/nexb/dev/deltacode JMH (75-use-json-to-csv-script)
$

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

Using a full path works:

(deltacode) Fri Mar 30, 2018 10:11 AM  /c/code/nexb/dev/deltacode JMH (75-use-json-to-csv-script)
$ python etc/scripts/json2csv.py
Usage: json2csv.py [OPTIONS] JSON_INPUT CSV_OUTPUT

Error: Missing argument "json_input".

(deltacode) Fri Mar 30, 2018 10:11 AM  /c/code/nexb/dev/deltacode JMH (75-use-json-to-csv-script)
$ python etc/scripts/json2csv.py -h
Usage: json2csv.py [OPTIONS] JSON_INPUT CSV_OUTPUT

  Convert a DeltaCode JSON file to a CSV.

Options:
  -h, --help  Show this message and exit.

(deltacode) Fri Mar 30, 2018 10:11 AM  /c/code/nexb/dev/deltacode JMH (75-use-json-to-csv-script)
$ python etc/scripts/json2csv.py  /c/code/nexb/dev/deltacode_tests/scan_1_file_moved_20180328.json /c/code/nexb/dev/deltacode_tests/scan_1_file_moved_20180330.csv

(deltacode) Fri Mar 30, 2018 10:12 AM  /c/code/nexb/dev/deltacode JMH (75-use-json-to-csv-script)
$

Sample output (though with unwanted unicode prefix and quotation marks):

score,factors,new,old
0,[u'moved'],b/a4.py,a/a4.py

from deltacode.

steven-esser avatar steven-esser commented on May 28, 2024

@johnmhoran We need to handle factors better. something like ' '.format(factors)

meaning all we need to do is construct a string from the list of factors, where each factor is separated by a space.

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

@majurg Tried a number of approaches. This works on my initial one-factor test: ('factors', ' '.join(factors)).

        yield OrderedDict([
            ('score', score),
            # ('factors', factors),  # unicode and quotes
            # ('factors', ''.format(factors)),  # empty
            # ('factors', ' '.format(factors)),  # empty
            # ('factors', '{}'.format(factors)),  # unicode and quotes
            # ('factors', u'{}'.format(factors)),  # unicode and quotes
            # ('factors', u'{:s}'.format(factors)),  # unicode and quotes
            # ('factors', u"{}".format(factors)),  # unicode and quotes
            # ('factors', u"{0}".format(factors)),  # unicode and quotes
            ('factors', ' '.join(factors)),
            ('new', new),
            ('old', old),
        ])

from deltacode.

steven-esser avatar steven-esser commented on May 28, 2024

that looks good

i always mix up format and join

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

👍

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

@majurg Our current CSV output headers are Factors,Score,Path,Name,Type,Size,Old Path while our new json2csv headers are score,factors,new,old. Do we want to use the structure and content of the current CSV headers, switch to the new headers, or adopt some other approach?

from deltacode.

steven-esser avatar steven-esser commented on May 28, 2024

Let’s copy them for now; forgot about those additions

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

Thanks @majurg , will do.

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

@majurg I've been trying to call our json2csv.py methods from the cli() method inside cli.py. To do so I'm trying to import json2csv (or alternatively one of its methods) from the sibling directory, etc/scripts/ (cli.py is inside src/deltacode/).

deltacode
    etc
        scripts
            json2csv.py
    src
        deltacode
            cli.py

Have tried a number of approaches, but the typical error has been a variation on this message: ImportError: No module named etc.scripts.json2csv. A few: ValueError: Attempted relative import beyond toplevel package. I get the feeling that using from . . . for importing is limited to the src directory.

I thought ScanCode might have had a similar need and could provide an example, but I've not yet found one. What's the proper way to import and call a json2csv method from inside cli.py?

from deltacode.

pombredanne avatar pombredanne commented on May 28, 2024

@johnmhoran the json2csv script is not in a proper python module available in the path.
json2csv as it is cannot be imported in the deltacode modules as is. You need to move json2csv in a proper python module or move its core code in such module: for now, json2csv can import code from deltacode, but not the other way around so this needs a tiny bit of module design to decide which feature lives where

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

Thanks @pombredanne . I'll coordinate with @majurg on the design work. Perhaps we need to move json2csv.py into src/deltacode/, where cli.py and its siblings currently live (and move the companion test, test_json2csv.py, into our tests/ directory).

from deltacode.

steven-esser avatar steven-esser commented on May 28, 2024

@johnmhoran I think we should proceed with removing the csv code on the deltacode side (the -c option and relating functions) and focus on just supporting a json output.

For displaying info in a csv (for audit reasons or otherwise), we can just use the json2csv script. We will have more control over what the csv output can be without the overhead of supporting it in deltacode proper.

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

@majurg Just to be clear, we want to retain the json2csv structure we started with inside etc/scripts/, right?

FYI, after saving that structure in a commit, I moved json2csv.py into src/deltacode/ (and moved the related test file and data). The CSV conversion was available with this command: python src/deltacode/json2csv.py /c/code/nexb/dev/deltacode_tests/scan_sorted01_20180328.json /c/code/nexb/dev/deltacode_tests/scan_sorted01_20180401.csv

I then worked on calling json2csv.json_delta_to_csv() from cli(), trying various approaches using a temporary file to first create JSON output and then pass that output to our new conversion function. The closest I got used the code excerpted below, which throws the error ValueError: Unterminated string starting at: line 142 column 17 (char 4233) when a deltacode -n [new_JSON_path] -o [old_JSON_path] -c [CSV_output_path] command is run.

. . .
from deltacode import json2csv
from tempfile import NamedTemporaryFile

. . .

def cli(new, old, csv_file, json_file, all_delta_types):

. . .

    options = OrderedDict([
        ('new_scan_path', new),
        ('old_scan_path', old),
        ('--all-delta-types', all_delta_types)
    ])

    # do the delta
    deltacode = DeltaCode(new, old, options)

    # output to csv
    if csv_file:
        # write_csv(deltacode, csv_file, all_delta_types)
        temp_file = NamedTemporaryFile(mode='w+t', delete=False)
        write_json(deltacode, temp_file, all_delta_types)
        json2csv.json_delta_to_csv(temp_file.name, csv_file)
    # generate JSON output
    else:
        write_json(deltacode, json_file, all_delta_types)

Have not been able to figure out the source of the error (or, more importantly, the fix). The error seems to be triggered by this line inside json2csv.load_deltas(): delta_results = json.loads(delta, object_pairs_hook=OrderedDict)

from deltacode.

steven-esser avatar steven-esser commented on May 28, 2024

@johnmhoran Yes, we want to keep that old structure in etc/scripts/. There should no longer be errors for you once you revert back.

I see what you were trying to do, and you are going about it the right way. It is hard to tell the source of the error without seeing the actual stack trace in full. It could be as simple as a ./configure.bat.

In any case, we no longer want this csv stuff in src/deltacode/cli once the script is moved back and working.

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

Thanks @majurg . Understood and will do, and will then work on adding tests. (BTW, had already run ./configure.bat, to no avail. ;-[ )

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

@majurg Do we want the main DeltaCode help to say anything about the new json2csv script? Currently that info is available only by running python etc/scripts/json2csv.py -h.

from deltacode.

steven-esser avatar steven-esser commented on May 28, 2024

No, but we will mention it somewhere in the Docs outside the codebase.

from deltacode.

johnmhoran avatar johnmhoran commented on May 28, 2024

👍

from deltacode.

steven-esser avatar steven-esser commented on May 28, 2024

comments left on the latest commit. you can open a PR once you make the changes.

from deltacode.

steven-esser avatar steven-esser commented on May 28, 2024

Merged #94, closing.

from deltacode.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.