Comments (31)
@majurg Taking a break, will then add several more tests for the license_changes key/value pair, e.g., no licenses key/value present in scan and no license changes in modified Delta objects.
from deltacode.
First and foremost, we will need to update our Scan object (#2) to hold scancode option information.
Additionally, we will need to modify our File object to hold some or all of the scancode license information, like license key, expression, owner, etc.
Finally, the meat of the work is a post-determine_delta() function that runs across only the modified category of Delta objects. We may want to change our 'deltas' data structure from a list to a dictionary as well.
from deltacode.
The requested enhancement is to evaluate whether a modified file includes a license change. This would not change its status as a "modified" file. We need some other way to show it is modified, but that there was no license change - based on the same license expression in both files. We could possibly compare the set of detected licenses, but using just license expression would probably be more efficient.
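A minimal sketch of that comparison, assuming each file carries a list of ScanCode-style license dicts with a key field. The File class and has_license_change() helper below are hypothetical stand-ins, not DeltaCode's actual models, and the sketch compares sets of license keys; if a single license expression string were available, a plain equality check on it would be simpler.

```python
class File(object):
    """Hypothetical stand-in for a DeltaCode File: a path plus license dicts."""

    def __init__(self, path, licenses=None):
        self.path = path
        self.licenses = licenses or []

    def license_keys(self):
        # Reduce the detected licenses to a set of keys so that ordering
        # and duplicate detections do not affect the comparison.
        return {lic.get('key') for lic in self.licenses}


def has_license_change(new_file, old_file):
    """Return True if the set of detected license keys differs."""
    return new_file.license_keys() != old_file.license_keys()


old = File('a.py', [{'key': 'mit'}])
same = File('a.py', [{'key': 'mit'}, {'key': 'mit'}])
changed = File('a.py', [{'key': 'gpl-2.0'}])
```

Here has_license_change(same, old) is False even though the file content may differ, while has_license_change(changed, old) is True; in both cases the file would still be categorized as modified.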
from deltacode.
Yes, the naming is something I will discuss with @johnmhoran. There will likely be some changes as we drill down into specific categories of deltas.
from deltacode.
@majurg I've added the entirety of the ScanCode licenses value to the File object, and it's now displayed as well as part of the OrderedDict returned by the DeltaCode object's to_dict() method. The order of the key/value pairs inside the licenses field, however, differs from its order in a ScanCode scan, e.g., start_line and end_line are no longer adjacent to one another.
This should not affect processing, but makes the data less readable by users IMHO. I think this could readily be corrected by using an OrderedDict for the licenses, but perhaps that would be overkill. What do you think? And should I commit and push my initial work on this issue before you take the time to consider the question?
from deltacode.
@majurg One other thought: while I'm working on the modified category, shall I change unchanged to unmodified throughout the DeltaCode codebase? Or does this merit a separate issue?
from deltacode.
Either way is fine. Another possible solution would be to create a new License class.
You can make that change at any time, that's fine.
from deltacode.
Thanks, @majurg -- I'll try the License class approach (and will change unchanged to unmodified).
from deltacode.
@majurg While I'm able to add the ScanCode licenses value as a new File object attribute using setattr(self, 'licenses', dictionary.get('licenses')), I've tried a wide range of alternative structures using a new License class that accepts a dictionary, but have not yet succeeded in adding the licenses value with this separate class. This is an example of the most common error I've encountered:
TypeError: <deltacode.models.License instance at 0x0386CA80> is not JSON serializable
Despite extensive research and testing, I've been unable to figure out how to resolve it. Perhaps we can have a brief discussion this morning when we're both in the office.
from deltacode.
I am assuming you are using print statements/printing out the JSON to test your results.
Two options:
- Ignore this for now and do all your testing and debugging via the test suite.
- Modify the JSON output function to account for this new licenses field. IMO this is not the best, as we do not really want to output this license information at all; we just want to use it for comparing two File objects.
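For illustration only, the second option could look roughly like the following, using the default parameter of json.dumps from the standard library. The License class and encode_license() function here are hypothetical and reduced to a single key attribute; they are not DeltaCode's actual code.

```python
import json


class License(object):
    """Hypothetical License holding only a license key."""

    def __init__(self, key):
        self.key = key


def encode_license(obj):
    # json.dumps calls this hook only for objects it cannot serialize itself.
    if isinstance(obj, License):
        return {'key': obj.key}
    raise TypeError('%r is not JSON serializable' % (obj,))


json_output = json.dumps({'licenses': License('mit')}, default=encode_license)
```

Without the default hook, the same json.dumps call raises a TypeError like the one quoted above.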
from deltacode.
45 tests pass, 2 fail, both throwing variations on TypeError: <deltacode.models.License instance at 0x046D1508> is not JSON serializable at this line: json_output = json.dumps(dict_output). Testing by running some deltacode command-line tests produces the same error.
from deltacode.
Which tests specifically?
from deltacode.
    def test_DeltaCode_json_file_added(self):
        new_scan = self.get_test_loc('deltacode/new_added1.json')
        old_scan = self.get_test_loc('deltacode/old_added1.json')
        result = DeltaCode(new_scan, old_scan)
        dict_output = result.to_dict()
        json_output = json.dumps(dict_output)
        loaded_json = json.loads(json_output)
        assert loaded_json['deltas_count'] == 9
        assert loaded_json['deltacode_stats']['added'] == 1
        assert loaded_json['deltacode_stats']['modified'] == 0
        assert loaded_json['deltacode_stats']['removed'] == 0
        assert loaded_json['deltacode_stats']['unchanged'] == 8

    def test_DeltaCode_json_file_modified(self):
        new_scan = self.get_test_loc('deltacode/new_modified1.json')
        old_scan = self.get_test_loc('deltacode/old_modified1.json')
        result = DeltaCode(new_scan, old_scan)
        dict_output = result.to_dict()
        json_output = json.dumps(dict_output)
        loaded_json = json.loads(json_output)
        assert loaded_json['deltas_count'] == 8
        assert loaded_json['deltacode_stats']['added'] == 0
        assert loaded_json['deltacode_stats']['modified'] == 1
        assert loaded_json['deltacode_stats']['removed'] == 0
        assert loaded_json['deltacode_stats']['unchanged'] == 7
from deltacode.
@johnmhoran why do we dump as json, then load that same json back to a dict?
from deltacode.
@majurg Looks like that's a result of adapting the tests from the time when the to_dict() method was to_json() and produced JSON output. I made the following change to both failing tests and they now pass:
    result = DeltaCode(new_scan, old_scan)
    # dict_output = result.to_dict()
    # json_output = json.dumps(dict_output)
    # loaded_json = json.loads(json_output)
    loaded_json = result.to_dict()
However, when printing json.dumps(data) or selecting the JSON output option, I still receive a variation of this error: TypeError: <deltacode.models.License instance at 0x038E3210> is not JSON serializable.
Clearly I need to fix this, and I'll need your guidance to understand it first. I also need a way to test the new License class. Visualizing it in some way would also be helpful -- right now, when I print data defined like this: data = delta.to_dict(), the output for the licenses field looks like this: 'licenses': <deltacode.models.License instance at 0x03E31A80>. Not what we want.
from deltacode.
@johnmhoran Yes, it will always fail to serialize to JSON because we do not have a to_dict function for the License object. It's simple enough to add one, and then call License.to_dict() in the appropriate location in the DeltaCode object's to_dict().
Adding a __repr__ function to our License object stops the default "instance at 0x..." representation from being printed.
As far as tests go, they should be similar to the File object tests we have.
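A rough sketch of that approach, under stated assumptions: the License and File classes below are simplified, hypothetical versions (only key and score are kept, and the constructor signatures are guesses), not the actual DeltaCode models.

```python
import json
from collections import OrderedDict


class License(object):
    """Hypothetical License built from one ScanCode-style license dict."""

    def __init__(self, dictionary=None):
        dictionary = dictionary or {}
        self.key = dictionary.get('key')
        self.score = dictionary.get('score')

    def to_dict(self):
        # Plain containers only, so json.dumps can serialize the result.
        return OrderedDict([('key', self.key), ('score', self.score)])

    def __repr__(self):
        # Readable output instead of the default instance-at-address form.
        return 'License(key=%r, score=%r)' % (self.key, self.score)


class File(object):
    """Hypothetical File that wraps its license dicts in License objects."""

    def __init__(self, dictionary):
        self.path = dictionary.get('path')
        self.licenses = [License(l) for l in dictionary.get('licenses', [])]

    def to_dict(self):
        # Delegate to License.to_dict() so no License instance leaks out.
        return OrderedDict([
            ('path', self.path),
            ('licenses', [l.to_dict() for l in self.licenses]),
        ])


scan_file = File({'path': 'a.py', 'licenses': [{'key': 'mit', 'score': 100.0}]})
json_output = json.dumps(scan_file.to_dict())
```

Because every to_dict() returns only dicts, lists, and scalars, json.dumps() no longer encounters a License instance, and printing a License shows its fields.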
from deltacode.
@majurg Thanks for your help earlier. I've revised the License constructor, added to_dict() methods for the File and License classes, and referred to the File to_dict() method inside the DeltaCode to_dict() method. Not only do all 47 tests now pass, but both the printed JSON and the generated JSON files throw no errors, look good, and include the ScanCode licenses key/value pairs in each of the old and new Delta objects.
Pushing shortly, and then I'll start working on some new tests for the latest set of changes.
from deltacode.
@johnmhoran The next step should be updating our DeltaCode.deltas field to a dictionary where the keys are 'added', 'removed', 'modified', 'unmodified', and the values are lists of the corresponding Delta objects.
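One way to picture the proposed structure, as a sketch only: the Delta class and group_deltas() helper below are hypothetical and assume each Delta exposes a category attribute matching one of the four keys.

```python
from collections import OrderedDict

CATEGORIES = ('added', 'removed', 'modified', 'unmodified')


class Delta(object):
    """Hypothetical Delta with just a category and a path."""

    def __init__(self, category, path):
        self.category = category
        self.path = path


def group_deltas(deltas):
    """Group a flat list of Delta objects into a dict keyed by category."""
    # Pre-populate every category so that empty ones still appear as [].
    grouped = OrderedDict((category, []) for category in CATEGORIES)
    for delta in deltas:
        grouped[delta.category].append(delta)
    return grouped


deltas = group_deltas([
    Delta('added', 'a.py'),
    Delta('modified', 'b.py'),
    Delta('unmodified', 'c.py'),
    Delta('modified', 'd.py'),
])
```

With this layout, a post-processing step that only cares about modified files can iterate over deltas['modified'] directly instead of filtering the whole list.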
from deltacode.
Excellent, @majurg . I'll get started on it.
from deltacode.
@majurg I was about to push my commit when I reread your instruction and realized that I misinterpreted the goal. Rather than modifying the DeltaCode.deltas field, I've modified the deltas key in the dictionary generated by DeltaCode.to_dict() to group Delta objects by a category-based key (and modified generate_csv() and the test helper function to accommodate the changes).
I'll need to revert all those changes and tackle DeltaCode.deltas instead but, before I do, I want to confirm that that's what I should do.
from deltacode.
Yes, the deltas field should be changed; this will subsequently require an update of to_dict and perhaps generate_csv as well.
from deltacode.
We can have a session as well, if you want.
from deltacode.
@majurg Sorry about misinterpreting what really was a clear instruction. I've been so focused on the JSON deltas key and its Delta objects that when I saw deltas, that's the association I automatically made.
I think I have a good sense of how to approach DeltaCode.deltas/determine_delta() but a brief session would be helpful -- I'm available whenever it's convenient for you.
from deltacode.
@majurg Given the goals we discussed yesterday, is there any reason for us to retain the license_changes key/value pair in our determine_delta() OrderedDict?
deltacode/src/deltacode/__init__.py, lines 49 to 55 in 196fabd
from deltacode.
No, you can remove that.
from deltacode.
Thanks, will do.
from deltacode.
@majurg I think this is ready for a new PR -- let me know if you agree and if so I'll open the PR.
from deltacode.
@johnmhoran comments left on latest commit
from deltacode.
@johnmhoran you can open a PR when you are ready so this can get merged in.
from deltacode.
@majurg Just saw your message. Excellent. PR en route.
What would you like me to focus my efforts on for the rest of the week? I'm available to discuss via uberconf at your convenience.
from deltacode.
merged #22, closing
from deltacode.