

johnmhoran commented on May 28, 2024

@majurg Taking a break; I'll then add several more tests for the license_changes key/value pair, e.g., for a scan with no licenses key/value pair and for modified Delta objects with no license changes.

from deltacode.

steven-esser commented on May 28, 2024

First and foremost, we will need to update our Scan object (#2) to hold scancode option information.

Additionally, we will need to modify our File object to hold some or all of the scancode license information, like license key, expression, owner etc.

Finally, the meat of the work is a post-determine_delta() function that runs only across the modified category of Delta objects. We may also want to change our 'deltas' data structure from a list to a dictionary.
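
A minimal sketch of such a post-determine_delta() pass, assuming deltas has already become a dictionary of lists keyed by category and that each Delta exposes old and new File objects with a license_expression attribute. The File/Delta stand-ins and find_license_changes() are hypothetical illustrations, not DeltaCode's actual API.

```python
from collections import OrderedDict, namedtuple

# Hypothetical stand-ins for DeltaCode's File and Delta objects.
File = namedtuple('File', ['path', 'license_expression'])
Delta = namedtuple('Delta', ['category', 'old', 'new'])


def find_license_changes(deltas):
    """
    Return the modified Delta objects whose license expression differs
    between the old and new File. Only the 'modified' category is scanned;
    the deltas themselves stay in their 'modified' list.
    """
    changed = []
    for delta in deltas.get('modified', []):
        if delta.old.license_expression != delta.new.license_expression:
            changed.append(delta)
    return changed


deltas = OrderedDict([
    ('added', [Delta('added', None, File('a.c', 'mit'))]),
    ('removed', []),
    ('modified', [
        Delta('modified', File('b.c', 'mit'), File('b.c', 'gpl-2.0')),
        Delta('modified', File('c.c', 'apache-2.0'), File('c.c', 'apache-2.0')),
    ]),
    ('unmodified', []),
])

print([d.new.path for d in find_license_changes(deltas)])  # ['b.c']
```

Keying by category means the pass never has to filter added, removed, or unmodified deltas out of a flat list.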

mjherzog commented on May 28, 2024

The requested enhancement is to evaluate whether a modified file includes a license change. This would not change its status as a "modified" file. We need some other way to show that a file is modified but has no license change, based on the same license expression in both files. We could compare the sets of detected licenses, but comparing just the license expressions would probably be more efficient.
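
The two comparison strategies can be sketched as follows. This assumes each File carries one combined license expression string plus a list of detected license entries with a key field; both shapes, and the two helper functions, are assumptions for illustration rather than ScanCode's documented output format.

```python
def same_license_expression(old_expression, new_expression):
    # Cheapest check: one string comparison per modified file.
    # None (no license detected) compares equal to None.
    return old_expression == new_expression


def same_detected_licenses(old_licenses, new_licenses):
    # Compare the *set* of detected license keys, ignoring order,
    # line positions, and repeated detections of the same license.
    old_keys = {entry['key'] for entry in old_licenses or []}
    new_keys = {entry['key'] for entry in new_licenses or []}
    return old_keys == new_keys


old = [{'key': 'mit', 'start_line': 1, 'end_line': 20}]
new = [
    {'key': 'mit', 'start_line': 3, 'end_line': 22},  # header moved down
    {'key': 'mit', 'start_line': 40, 'end_line': 40},
]
print(same_detected_licenses(old, new))                     # True
print(same_license_expression('mit', 'mit OR apache-2.0'))  # False
```

Note that a plain string comparison treats "mit AND apache-2.0" and "apache-2.0 AND mit" as different, so expressions may need normalizing before they are compared.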

steven-esser commented on May 28, 2024

Yes, the naming is something I will discuss with @johnmhoran. There will likely be some changes as we drill down into specific categories of deltas.

johnmhoran commented on May 28, 2024

@majurg I've added the entirety of the ScanCode licenses value to the File object, and it's now also displayed as part of the OrderedDict returned by the DeltaCode object's to_dict() method. The order of the key/value pairs inside the licenses field, however, differs from its order in a ScanCode scan, e.g., start_line and end_line are no longer adjacent to one another.

This should not affect processing, but IMHO it makes the data less readable for users. I think this could readily be corrected by using an OrderedDict for the licenses, but perhaps that would be overkill. What do you think? And should I commit and push my initial work on this issue before you take the time to consider the question?
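
One way to keep the ScanCode field order, assuming the scan is loaded with the standard-library json module: pass object_pairs_hook=OrderedDict so every JSON object, including each entry in licenses, keeps its original key order. (On Python 3.7 and later a plain dict already preserves insertion order; the hook matters mainly on older interpreters such as Python 2.) The scan fragment below is hand-written and abbreviated.

```python
import json
from collections import OrderedDict

# Hand-written, abbreviated scan fragment for illustration only.
scan_text = '''
{"path": "src/main.c",
 "licenses": [{"key": "mit", "score": 100.0,
               "start_line": 1, "end_line": 20}]}
'''

# object_pairs_hook builds an OrderedDict for every JSON object,
# so nested license entries keep their on-disk key order.
scan = json.loads(scan_text, object_pairs_hook=OrderedDict)
license_entry = scan['licenses'][0]
print(list(license_entry))  # ['key', 'score', 'start_line', 'end_line']
```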

johnmhoran commented on May 28, 2024

@majurg One other thought: while I'm working on the modified category, shall I change unchanged to unmodified throughout the DeltaCode codebase? Or does this merit a separate issue?

steven-esser commented on May 28, 2024

Either way is fine. Another possible solution would be to create a new License class.

You can make that change at any time; that's fine.

johnmhoran commented on May 28, 2024

Thanks, @majurg -- I'll try the License class approach (and will change unchanged to unmodified).

johnmhoran commented on May 28, 2024

@majurg While I'm able to add the ScanCode licenses value as a new File object attribute using setattr(self, 'licenses', dictionary.get('licenses')), I've tried a wide range of alternative structures using a new License class that accepts a dictionary, but have not yet succeeded in adding the licenses value with this separate class. This is an example of the most common error I've encountered:

TypeError: <deltacode.models.License instance at 0x0386CA80> is not JSON serializable

Despite extensive research and testing, I've been unable to figure out how to resolve it. Perhaps we can have a brief discussion this morning when we're both in the office.

steven-esser commented on May 28, 2024

I'm assuming you are using print statements (printing out the JSON) to test your results.

Two options:

  1. ignore this for now and do all your testing and debugging via the test suite.

  2. modify the JSON output function to account for this new Licenses field. IMO this is not the best option, as we do not really want to output this license information at all; we just want to use it for comparing two File objects.
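
For option 2, a minimal sketch: the standard-library json.dumps() accepts a default callable that is invoked for any object it cannot serialize natively. This assumes the License object has a to_dict() method; the License class and to_serializable() helper here are hypothetical stand-ins.

```python
import json


class License(object):
    # Hypothetical minimal License holding a few ScanCode fields.
    def __init__(self, key, start_line, end_line):
        self.key = key
        self.start_line = start_line
        self.end_line = end_line

    def to_dict(self):
        return {'key': self.key, 'start_line': self.start_line,
                'end_line': self.end_line}


def to_serializable(obj):
    # Called by json.dumps only for objects it cannot encode itself.
    if hasattr(obj, 'to_dict'):
        return obj.to_dict()
    raise TypeError('%r is not JSON serializable' % (obj,))


data = {'path': 'a.c', 'licenses': [License('mit', 1, 20)]}

try:
    json.dumps(data)  # no hook: raises TypeError, as described above
except TypeError as error:
    print('without default:', error)

print(json.dumps(data, default=to_serializable, sort_keys=True))
```

The hook only changes serialization; the License objects remain available in memory for comparing two File objects.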

johnmhoran commented on May 28, 2024

45 tests pass, 2 fail, both throwing variations on TypeError: <deltacode.models.License instance at 0x046D1508> is not JSON serializable at this line: json_output = json.dumps(dict_output). Testing by running some deltacode command-line tests produces the same error.

steven-esser commented on May 28, 2024

What tests, specifically?

johnmhoran commented on May 28, 2024

    def test_DeltaCode_json_file_added(self):
        new_scan = self.get_test_loc('deltacode/new_added1.json')
        old_scan = self.get_test_loc('deltacode/old_added1.json')

        result = DeltaCode(new_scan, old_scan)
        dict_output = result.to_dict()
        json_output = json.dumps(dict_output)
        loaded_json = json.loads(json_output)

        assert loaded_json['deltas_count'] == 9
        assert loaded_json['deltacode_stats']['added'] == 1
        assert loaded_json['deltacode_stats']['modified'] == 0
        assert loaded_json['deltacode_stats']['removed'] == 0
        assert loaded_json['deltacode_stats']['unchanged'] == 8

    def test_DeltaCode_json_file_modified(self):
        new_scan = self.get_test_loc('deltacode/new_modified1.json')
        old_scan = self.get_test_loc('deltacode/old_modified1.json')

        result = DeltaCode(new_scan, old_scan)
        dict_output = result.to_dict()
        json_output = json.dumps(dict_output)
        loaded_json = json.loads(json_output)

        assert loaded_json['deltas_count'] == 8
        assert loaded_json['deltacode_stats']['added'] == 0
        assert loaded_json['deltacode_stats']['modified'] == 1
        assert loaded_json['deltacode_stats']['removed'] == 0
        assert loaded_json['deltacode_stats']['unchanged'] == 7

steven-esser commented on May 28, 2024

@johnmhoran Why do we dump as JSON, then load that same JSON back into a dict?

johnmhoran commented on May 28, 2024

@majurg Looks like that's a result of adapting the tests from the time when the to_dict() method was to_json() and produced JSON output. I made the following change to both failing tests and they now pass:

        result = DeltaCode(new_scan, old_scan)
        # dict_output = result.to_dict()
        # json_output = json.dumps(dict_output)
        # loaded_json = json.loads(json_output)

        loaded_json = result.to_dict()

However, when printing json.dumps(data) or selecting the JSON output option, I still receive a variation of this error: TypeError: <deltacode.models.License instance at 0x038E3210> is not JSON serializable.

Clearly I need to fix this, and I'll need your guidance to understand it first. I also need a way to test the new License class. Visualizing it in some way would also be helpful -- right now, when I print data defined like this: data = delta.to_dict(), the output for the licenses field looks like this: 'licenses': <deltacode.models.License instance at 0x03E31A80>. Not what we want.

steven-esser commented on May 28, 2024

@johnmhoran Yes, it will always fail to serialize to JSON because we do not have a to_dict function for the License object. That's simple enough to add; then call License.to_dict() in the appropriate location in the DeltaCode object's to_dict().

Adding a __repr__ method to our License object will stop the default representation, with its memory address, from being printed.

As far as tests go, they should be similar to the File object tests we have.
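
A sketch of what that could look like, assuming License is built from one entry of the ScanCode licenses list and File from one scanned-file dictionary. The attribute names and the choice of fields are illustrative assumptions, not DeltaCode's actual models. Defaulting a missing or empty licenses value to an empty list also covers scans run without license detection.

```python
import json


class License(object):
    def __init__(self, dictionary=None):
        dictionary = dictionary or {}
        self.key = dictionary.get('key')
        self.score = dictionary.get('score')
        self.start_line = dictionary.get('start_line')
        self.end_line = dictionary.get('end_line')

    def to_dict(self):
        # Plain dict so the result can go straight into json.dumps().
        return {
            'key': self.key,
            'score': self.score,
            'start_line': self.start_line,
            'end_line': self.end_line,
        }

    def __repr__(self):
        # Replaces the default '<... instance at 0x...>' output.
        return 'License(key=%r, start_line=%r, end_line=%r)' % (
            self.key, self.start_line, self.end_line)


class File(object):
    def __init__(self, dictionary):
        self.path = dictionary.get('path')
        # A scan without license detection may have no 'licenses' key.
        entries = dictionary.get('licenses') or []
        self.licenses = [License(entry) for entry in entries]

    def to_dict(self):
        # Delegate to License.to_dict() so the output is JSON-safe.
        return {
            'path': self.path,
            'licenses': [lic.to_dict() for lic in self.licenses],
        }


f = File({'path': 'a.c', 'licenses': [
    {'key': 'mit', 'score': 100.0, 'start_line': 1, 'end_line': 20}]})
print(f.licenses)               # [License(key='mit', start_line=1, end_line=20)]
print(json.dumps(f.to_dict()))  # serializes without a TypeError
```

Tests for License could then mirror the File tests: build an object from a dictionary and assert on to_dict() and repr().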

johnmhoran commented on May 28, 2024

@majurg Thanks for your help earlier. I've revised the License constructor, added to_dict() methods for the File and License classes, and referred to the File to_dict() method inside the DeltaCode to_dict() method. Not only do all 47 tests now pass, but both the printed JSON and the generated JSON files throw no errors, look good, and include the ScanCode licenses key/value pairs in each of the old and new Delta objects.

Pushing shortly, and then I'll start working on some new tests for the latest set of changes.

steven-esser commented on May 28, 2024

@johnmhoran The next step should be updating our DeltaCode.deltas field to a dictionary where the keys are 'added', 'removed', 'modified', and 'unmodified', and the values are lists of the corresponding Delta objects.
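
A rough sketch of a determine_delta() that fills such a dictionary, assuming old and new files are indexed by path and that a file counts as modified when a sha1 field differs. Both the input shape and the sha1 test are assumptions; DeltaCode's real matching logic may differ. Per-category counts then fall out of len().

```python
from collections import OrderedDict


def determine_delta(old_files, new_files):
    """
    Group (old, new) pairs by category. `old_files` and `new_files`
    map path -> file dict with a 'sha1' key (an assumption).
    """
    deltas = OrderedDict([
        ('added', []),
        ('removed', []),
        ('modified', []),
        ('unmodified', []),
    ])
    for path, new in new_files.items():
        old = old_files.get(path)
        if old is None:
            deltas['added'].append((None, new))
        elif old['sha1'] != new['sha1']:
            deltas['modified'].append((old, new))
        else:
            deltas['unmodified'].append((old, new))
    for path, old in old_files.items():
        if path not in new_files:
            deltas['removed'].append((old, None))
    return deltas


old_files = {'a.c': {'sha1': '1'}, 'b.c': {'sha1': '2'}, 'c.c': {'sha1': '3'}}
new_files = {'a.c': {'sha1': '1'}, 'b.c': {'sha1': '9'}, 'd.c': {'sha1': '4'}}

deltas = determine_delta(old_files, new_files)
stats = OrderedDict((category, len(items)) for category, items in deltas.items())
print(dict(stats))  # {'added': 1, 'removed': 1, 'modified': 1, 'unmodified': 1}
```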

johnmhoran commented on May 28, 2024

Excellent, @majurg. I'll get started on it.

johnmhoran commented on May 28, 2024

@majurg I was about to push my commit when I reread your instruction and realized that I misinterpreted the goal. Rather than modifying the DeltaCode.deltas field, I've modified the deltas key in the dictionary generated by DeltaCode.to_dict() to group Delta objects by a category-based key (and modified generate_csv() and the test helper function to accommodate the changes).

I'll need to revert all those changes and tackle DeltaCode.deltas instead but, before I do, I want to confirm that that's what I should do.

steven-esser commented on May 28, 2024

Yes, the deltas field should be changed; this will subsequently require an update of to_dict and perhaps generate_csv as well.
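
If deltas becomes a dictionary of lists, a CSV writer can iterate over it category by category. A sketch, assuming each delta is an (old, new) pair of file dictionaries with a path key; deltas_to_rows() and the column names are hypothetical, not the actual output of DeltaCode's generate_csv().

```python
import csv
import io
from collections import OrderedDict


def deltas_to_rows(deltas):
    # Flatten {category: [(old, new), ...]} into one row per delta,
    # keeping the category as an explicit column.
    for category, pairs in deltas.items():
        for old, new in pairs:
            yield OrderedDict([
                ('category', category),
                ('old_path', old['path'] if old else ''),
                ('new_path', new['path'] if new else ''),
            ])


deltas = OrderedDict([
    ('added', [(None, {'path': 'd.c'})]),
    ('removed', [({'path': 'c.c'}, None)]),
    ('modified', [({'path': 'b.c'}, {'path': 'b.c'})]),
    ('unmodified', []),
])

output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=['category', 'old_path', 'new_path'])
writer.writeheader()
writer.writerows(deltas_to_rows(deltas))
print(output.getvalue())
```

Making the category a column preserves the information that the old flat list carried implicitly in each Delta.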

steven-esser commented on May 28, 2024

We can have a session as well, if you want.

johnmhoran commented on May 28, 2024

@majurg Sorry about misinterpreting what really was a clear instruction. I've been so focused on the JSON deltas key and its Delta objects that when I saw deltas, that's the association I automatically made.

I think I have a good sense of how to approach DeltaCode.deltas/determine_delta() but a brief session would be helpful -- I'm available whenever it's convenient for you.

johnmhoran commented on May 28, 2024

@majurg Given the goals we discussed yesterday, is there any reason for us to retain the license_changes key/value pair in our determine_delta() OrderedDict?

    deltas = OrderedDict([
        ('added', []),
        ('removed', []),
        ('modified', []),
        ('unmodified', []),
        ('license_changes', [])
    ])

steven-esser commented on May 28, 2024

No, you can remove that.

johnmhoran commented on May 28, 2024

Thanks, will do.

johnmhoran commented on May 28, 2024

@majurg I think this is ready for a new PR -- let me know if you agree and if so I'll open the PR.

steven-esser commented on May 28, 2024

@johnmhoran Comments left on the latest commit.

steven-esser commented on May 28, 2024

@johnmhoran you can open a PR when you are ready so this can get merged in.

johnmhoran commented on May 28, 2024

@majurg Just saw your message. Excellent. PR en route.

What would you like me to focus my efforts on for the rest of the week? I'm available to discuss via uberconf at your convenience.

steven-esser commented on May 28, 2024

Merged #22, closing.
