Artanias commented on August 29, 2024

It looks scary, but it parses.

>>> import pandas as pd
>>> data = pd.DataFrame({"date": "01/09/2023 19:20:44", "first_path": "https://github.com/Artanias/games/blob/master/Sudoku/Model_sudoku_tf.py", "second_path": "https://github.com/Artanias/games/blob/master/Sudoku/Model_num_tf.py", "first_heads": [["Assign", "Assign", "Assign", "Assign", "Assign", "Assign", "Assign", "Expr", "Expr", "Expr"]], "second_heads": [["Assign", "Assign", "Assign", "Assign", "Assign", "Assign", "Assign", "Assign", "Expr", "Expr", "Expr"]], "jakkar": 1.0, "operators": 0.875, "keywords": 1.0, "literals": 0.9230769230769231, "weighted_average": 0.9632867132867134, "first_modify_date": "2020-07-25T11:52:12Z", "second_modify_date": "2020-08-04T07:40:29Z", "struct_similarity": 0.8804347826086957, "compliance_matrix": [[[[9, 9], [9, 13], [9, 13], [9, 17], [9, 20], [9, 20], [9, 20], [9, 20], [7, 18], [7, 30], [7, 9]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 17], [11, 19], [11, 19], [17, 17], [16, 21], [16, 21], [16, 21], [13, 24], [11, 22], [11, 34], [7, 17]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 18], [12, 19], [12, 19], [13, 22], [13, 25], [13, 25], [13, 25], [18, 20], [8, 26], [12, 34], [7, 18]], [[7, 18], [9, 20], [9, 20], [11, 22], [13, 23], [13, 23], [13, 23], [9, 27], [16, 16], [14, 30], [7, 16]], [[7, 30], [11, 30], [11, 30], [11, 34], [16, 32], [16, 32], [16, 32], [12, 36], [14, 30], [28, 28], [7, 28]], [[7, 9], [7, 13], [7, 13], [7, 17], [7, 20], [7, 20], [7, 20], [7, 20], [7, 16], [7, 28], [7, 7]]]]})
>>> data.to_csv('data.csv', sep=';')
>>> pd.read_csv('data.csv', sep=';', index_col=0)
                  date                                         first_path  ... struct_similarity                                  compliance_matrix
0  01/09/2023 19:20:44  https://github.com/Artanias/games/blob/master/...  ...          0.880435  [[[9, 9], [9, 13], [9, 13], [9, 17], [9, 20], ...

[1 rows x 14 columns]
>>> pd.read_csv('data.csv', sep=';', index_col=0)['compliance_matrix'][0]
'[[[9, 9], [9, 13], [9, 13], [9, 17], [9, 20], [9, 20], [9, 20], [9, 20], [7, 18], [7, 30], [7, 9]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 17], [11, 19], [11, 19], [17, 17], [16, 21], [16, 21], [16, 21], [13, 24], [11, 22], [11, 34], [7, 17]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 18], [12, 19], [12, 19], [13, 22], [13, 25], [13, 25], [13, 25], [18, 20], [8, 26], [12, 34], [7, 18]], [[7, 18], [9, 20], [9, 20], [11, 22], [13, 23], [13, 23], [13, 23], [9, 27], [16, 16], [14, 30], [7, 16]], [[7, 30], [11, 30], [11, 30], [11, 34], [16, 32], [16, 32], [16, 32], [12, 36], [14, 30], [28, 28], [7, 28]], [[7, 9], [7, 13], [7, 13], [7, 17], [7, 20], [7, 20], [7, 20], [7, 20], [7, 16], [7, 28], [7, 7]]]'
>>> import json
>>> json.loads(pd.read_csv('data.csv', sep=';', index_col=0)['compliance_matrix'][0])
[[[9, 9], [9, 13], [9, 13], [9, 17], [9, 20], [9, 20], [9, 20], [9, 20], [7, 18], [7, 30], [7, 9]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 13], [13, 13], [13, 13], [11, 19], [12, 21], [12, 21], [12, 21], [12, 21], [9, 20], [11, 30], [7, 13]], [[9, 17], [11, 19], [11, 19], [17, 17], [16, 21], [16, 21], [16, 21], [13, 24], [11, 22], [11, 34], [7, 17]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 20], [12, 21], [12, 21], [16, 21], [20, 20], [20, 20], [20, 20], [13, 27], [13, 23], [16, 32], [7, 20]], [[9, 18], [12, 19], [12, 19], [13, 22], [13, 25], [13, 25], [13, 25], [18, 20], [8, 26], [12, 34], [7, 18]], [[7, 18], [9, 20], [9, 20], [11, 22], [13, 23], [13, 23], [13, 23], [9, 27], [16, 16], [14, 30], [7, 16]], [[7, 30], [11, 30], [11, 30], [11, 34], [16, 32], [16, 32], [16, 32], [12, 36], [14, 30], [28, 28], [7, 28]], [[7, 9], [7, 13], [7, 13], [7, 17], [7, 20], [7, 20], [7, 20], [7, 20], [7, 16], [7, 28], [7, 7]]]

I think we could play around with the column types, and then maybe we won't need json.loads at all.
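One way to skip the separate json.loads call is the `converters` argument of `pd.read_csv`, which parses the list-valued columns while the file is read. This is a minimal sketch on a shortened one-row frame (the values are illustrative, not a real report). Note that `first_heads` is written by `to_csv` as a Python repr with single quotes, which is not valid JSON, so it needs `ast.literal_eval` instead:

```python
import ast
import io
import json

import pandas as pd

# A one-row frame shaped like the report above (values shortened for brevity).
data = pd.DataFrame({
    "first_path": ["Model_sudoku_tf.py"],
    "first_heads": [["Assign", "Expr"]],
    "compliance_matrix": [[[[9, 9], [7, 18]], [[7, 18], [16, 16]]]],
})

csv_text = data.to_csv(sep=";")  # list cells are written via str(), i.e. as Python reprs

# Parse the list-valued columns while reading instead of calling json.loads afterwards.
converters = {
    # str() of a list of strings uses single quotes, which json.loads rejects
    "first_heads": ast.literal_eval,
    # nested lists of ints happen to be valid JSON
    "compliance_matrix": json.loads,
}
restored = pd.read_csv(io.StringIO(csv_text), sep=";", index_col=0, converters=converters)

print(restored["first_heads"][0])        # ['Assign', 'Expr']
print(restored["compliance_matrix"][0])  # [[[9, 9], [7, 18]], [[7, 18], [16, 16]]]
```

`ast.literal_eval` would handle both columns; `json.loads` is used for `compliance_matrix` only because that is what the session above already does.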

For now, there are the following fields:

>>> data.columns
Index(['date', 'first_path', 'second_path', 'first_heads', 'second_heads',
       'jakkar', 'operators', 'keywords', 'literals', 'weighted_average',
       'first_modify_date', 'second_modify_date', 'struct_similarity',
       'compliance_matrix'],
      dtype='object')

date - date of the check; may be useful for caching, since stale data is exactly the problem there. Maybe we should also store a hash.
first_path, second_path - paths to the first and second work respectively, either on the local system or on GitHub.
jakkar, operators, keywords, literals - results of the fast metrics.
weighted_average - weighted average of the fast metrics.
first_modify_date, second_modify_date - commit date from git; not implemented for local files, since a check can run without git and that would need a tricky check.
first_heads, second_heads - names of the top-level objects of the first and second script respectively (e.g. function names); useless on their own, but useful together with the result in compliance_matrix.
struct_similarity - result of comparing the structures of the two works.
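The date of a check alone may not tell whether a cached result is stale; the hash idea from the `date` description could look roughly like this. A minimal sketch, assuming the cache stores a SHA-256 of each work's contents next to the result; `file_hash`, `CachedCheck`, and `is_fresh` are hypothetical names, not part of codeplag:

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 of the file contents (hypothetical helper, not codeplag's API)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


@dataclass
class CachedCheck:
    """Hypothetical cache entry: a stored result plus the hashes it was computed from."""
    first_path: Path
    second_path: Path
    first_hash: str
    second_hash: str
    struct_similarity: float


def is_fresh(entry: CachedCheck) -> bool:
    """Reuse the cached result only if neither work changed since it was stored."""
    return (
        file_hash(entry.first_path) == entry.first_hash
        and file_hash(entry.second_path) == entry.second_hash
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp, "a.py"), Path(tmp, "b.py")
        a.write_text("x = 1\n")
        b.write_text("x = 2\n")
        entry = CachedCheck(a, b, file_hash(a), file_hash(b), 0.88)
        print(is_fresh(entry))  # True
        a.write_text("x = 3\n")  # the first work is edited after the check
        print(is_fresh(entry))  # False
```

Comparing content hashes rather than dates would also cover local files, for which `first_modify_date`/`second_modify_date` are not filled in.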

FYI, @mirrin00, @zmm.

from code-plagiarism.

Artanias commented on August 29, 2024

Note that the current approach of saving to JSON won't bloat memory, but it will bloat the number of files in a single directory, which is also bad. CSV, on the other hand, will gradually fill up memory over time. So for now CSV is also only an interim option until something better (a database).

For now, I'll make it configurable: either JSON or CSV.
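A rough sketch of the JSON-or-CSV choice, showing the trade-off described above: one small file per compared pair versus one growing file. `save_record`, `FIELDS`, and the JSON file naming scheme are hypothetical; only the `codeplag_report.csv` name and the `;` separator match the reports shown in this thread:

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical subset of the report fields described earlier in the thread.
FIELDS = ["date", "first_path", "second_path", "weighted_average", "struct_similarity"]


def save_record(record: dict, reports: Path, fmt: str) -> Path:
    """Save one check result in the chosen format and return the file it went to.

    fmt="json": one small file per compared pair (many files in one directory).
    fmt="csv":  one row appended to a single, ever-growing report file.
    """
    reports.mkdir(parents=True, exist_ok=True)
    if fmt == "json":
        # Hypothetical naming scheme; real paths may collide and would need e.g. a hash.
        stem = f"{Path(record['first_path']).stem}__{Path(record['second_path']).stem}"
        target = reports / f"{stem}.json"
        target.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
        return target
    if fmt == "csv":
        target = reports / "codeplag_report.csv"
        write_header = not target.exists()
        with target.open("a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter=";")
            if write_header:
                writer.writeheader()
            writer.writerow(record)
        return target
    raise ValueError(f"unknown report format: {fmt!r}")


if __name__ == "__main__":
    record = {
        "date": "09/09/2023 14:37:55",
        "first_path": "src/codeplag/consts.py",
        "second_path": "src/codeplag/consts.tmp.py",
        "weighted_average": 1.0,
        "struct_similarity": 1.0,
    }
    with tempfile.TemporaryDirectory() as tmp:
        print(save_record(record, Path(tmp), "json").name)  # consts__consts.tmp.json
        print(save_record(record, Path(tmp), "csv").name)   # codeplag_report.csv
```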

Artanias commented on August 29, 2024

@mirrin00, @zmm, here are two checks, one with an empty CSV cache and one with a filled one:

root@380c03d96540:/usr/src/codeplag# codeplag --verbose check --extension py --mode one_to_one --directories src/ src/codeplag/ src/webparsers/
[WARNING] 14:37 - Env file not found or not a file. Trying to get token from environment.
[DEBUG] 14:37 - Starting codeplag util ...
[DEBUG] 14:37 - Mode: one_to_one; Extension: py.
[INFO] 14:37 - Starting searching for plagiarism ...
[DEBUG] 14:37 - Getting works features from src
[DEBUG] 14:37 - Getting works features from src/codeplag
[DEBUG] 14:37 - Getting works features from src/webparsers
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/codeplag/consts.py
src/codeplag/consts.tmp.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity   100.00%    100.00%   100.00%   100.00%           100.00%

AdditionalMetrics:  Structure
Similarity            100.00%

           AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign     57.89%     57.89%     73.68%    100.00%     73.68%     57.89%     70.00%     57.89%     32.50%     57.89%     52.00%     52.00%     52.00%     62.96%     32.69%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     73.33%     73.33%     93.33%     70.00%     93.33%     73.33%    100.00%     73.33%     36.11%     73.33%     61.90%     61.90%     61.90%     60.00%     25.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     32.35%     32.35%     37.14%     32.50%     37.14%     32.35%     36.11%     32.35%    100.00%     32.35%     55.88%     55.88%     55.88%     51.28%     37.70%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     44.00%     44.00%     56.00%     62.96%     56.00%     44.00%     60.00%     44.00%     51.28%     44.00%     69.23%     69.23%     69.23%    100.00%     50.00%
AnnAssign     22.00%     22.00%     25.49%     32.69%     25.49%     22.00%     25.00%     22.00%     37.70%     22.00%     38.00%     38.00%     38.00%     50.00%    100.00% 

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/codeplag/consts.py
src/codeplag/consts.tmp.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity   100.00%    100.00%   100.00%   100.00%           100.00%

AdditionalMetrics:  Structure
Similarity            100.00%

           AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign     57.89%     57.89%     73.68%    100.00%     73.68%     57.89%     70.00%     57.89%     32.50%     57.89%     52.00%     52.00%     52.00%     62.96%     32.69%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     73.33%     73.33%     93.33%     70.00%     93.33%     73.33%    100.00%     73.33%     36.11%     73.33%     61.90%     61.90%     61.90%     60.00%     25.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     32.35%     32.35%     37.14%     32.50%     37.14%     32.35%     36.11%     32.35%    100.00%     32.35%     55.88%     55.88%     55.88%     51.28%     37.70%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     44.00%     44.00%     56.00%     62.96%     56.00%     44.00%     60.00%     44.00%     51.28%     44.00%     69.23%     69.23%     69.23%    100.00%     50.00%
AnnAssign     22.00%     22.00%     25.49%     32.69%     25.49%     22.00%     25.00%     22.00%     37.70%     22.00%     38.00%     38.00%     38.00%     50.00%    100.00% 

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/webparsers/async_github_parser.py
src/webparsers/github_parser.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity    73.53%     80.17%    67.57%    79.41%            74.72%

AdditionalMetrics:  Structure
Similarity             55.83%

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/webparsers/async_github_parser.py
src/webparsers/github_parser.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity    73.53%     80.17%    67.57%    79.41%            74.72%

AdditionalMetrics:  Structure
Similarity             55.83%

++++++++++++++++++++++++++++++++++++++++
[DEBUG] 14:38 - Time for all 6.30 s
[INFO] 14:38 - Ending searching for plagiarism ...
[DEBUG] 14:38 - Saving report to the file '/usr/src/codeplag/reports/codeplag_report.csv'
root@380c03d96540:/usr/src/codeplag# codeplag --verbose check --extension py --mode one_to_one --directories src/ src/codeplag/ src/webparsers/
[WARNING] 14:38 - Env file not found or not a file. Trying to get token from environment.
[DEBUG] 14:38 - Starting codeplag util ...
[DEBUG] 14:38 - Mode: one_to_one; Extension: py.
[INFO] 14:38 - Starting searching for plagiarism ...
[DEBUG] 14:38 - Getting works features from src
[DEBUG] 14:38 - Getting works features from src/codeplag
[DEBUG] 14:38 - Getting works features from src/webparsers
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/codeplag/consts.py
src/codeplag/consts.tmp.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity   100.00%    100.00%   100.00%   100.00%           100.00%

AdditionalMetrics:  Structure
Similarity            100.00%

           AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign     57.89%     57.89%     73.68%    100.00%     73.68%     57.89%     70.00%     57.89%     32.50%     57.89%     52.00%     52.00%     52.00%     62.96%     32.69%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     73.33%     73.33%     93.33%     70.00%     93.33%     73.33%    100.00%     73.33%     36.11%     73.33%     61.90%     61.90%     61.90%     60.00%     25.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     32.35%     32.35%     37.14%     32.50%     37.14%     32.35%     36.11%     32.35%    100.00%     32.35%     55.88%     55.88%     55.88%     51.28%     37.70%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     44.00%     44.00%     56.00%     62.96%     56.00%     44.00%     60.00%     44.00%     51.28%     44.00%     69.23%     69.23%     69.23%    100.00%     50.00%
AnnAssign     22.00%     22.00%     25.49%     32.69%     25.49%     22.00%     25.00%     22.00%     37.70%     22.00%     38.00%     38.00%     38.00%     50.00%    100.00% 

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/codeplag/consts.py
src/codeplag/consts.tmp.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity   100.00%    100.00%   100.00%   100.00%           100.00%

AdditionalMetrics:  Structure
Similarity            100.00%

           AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign  AnnAssign
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign     57.89%     57.89%     73.68%    100.00%     73.68%     57.89%     70.00%     57.89%     32.50%     57.89%     52.00%     52.00%     52.00%     62.96%     32.69%
AnnAssign     78.57%     78.57%    100.00%     73.68%    100.00%     78.57%     93.33%     78.57%     37.14%     78.57%     65.00%     65.00%     65.00%     56.00%     25.49%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     73.33%     73.33%     93.33%     70.00%     93.33%     73.33%    100.00%     73.33%     36.11%     73.33%     61.90%     61.90%     61.90%     60.00%     25.00%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     32.35%     32.35%     37.14%     32.50%     37.14%     32.35%     36.11%     32.35%    100.00%     32.35%     55.88%     55.88%     55.88%     51.28%     37.70%
AnnAssign    100.00%    100.00%     78.57%     57.89%     78.57%    100.00%     73.33%    100.00%     32.35%    100.00%     57.89%     57.89%     57.89%     44.00%     22.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     57.89%     57.89%     65.00%     52.00%     65.00%     57.89%     61.90%     57.89%     55.88%     57.89%    100.00%    100.00%    100.00%     69.23%     38.00%
AnnAssign     44.00%     44.00%     56.00%     62.96%     56.00%     44.00%     60.00%     44.00%     51.28%     44.00%     69.23%     69.23%     69.23%    100.00%     50.00%
AnnAssign     22.00%     22.00%     25.49%     32.69%     25.49%     22.00%     25.00%     22.00%     37.70%     22.00%     38.00%     38.00%     38.00%     50.00%    100.00% 

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/webparsers/async_github_parser.py
src/webparsers/github_parser.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity    73.53%     80.17%    67.57%    79.41%            74.72%

AdditionalMetrics:  Structure
Similarity             55.83%

++++++++++++++++++++++++++++++++++++++++
                                        
++++++++++++++++++++++++++++++++++++++++
May be similar:
src/webparsers/async_github_parser.py
src/webparsers/github_parser.py


FastMetrics:  JAKKAR  OPERATORS  KEYWORDS  LITERALS  WEIGHTED_AVERAGE
Similarity    73.53%     80.17%    67.57%    79.41%            74.72%

AdditionalMetrics:  Structure
Similarity             55.83%

++++++++++++++++++++++++++++++++++++++++
[DEBUG] 14:38 - Time for all 0.66 s
[INFO] 14:38 - Ending searching for plagiarism ...
[DEBUG] 14:38 - Nothing new to save to the csv report.
  1. As you can see, the time for repeated checks dropped sharply even on such a simple example (6.30 s on the first run vs 0.66 s on the second). Extracting the information from the files, which we'd also like to optimize, doesn't take much time; there is even a dedicated study: https://github.com/OSLL/code-plagiarism/blob/main/docs/notebooks/time_survey.py.ipynb. For 1000 lines of meaningful code, extraction takes 0.14 s on average.
  2. That option together with mode one_to_one achieves exactly what was requested for running on a folder of folders. The only change in this branch's output is that works are no longer checked against themselves. There is still some duplication in stdout, but there won't be any in the CSV.
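The stdout duplication is presumably caused by the overlapping directories (`src/` already contains `src/codeplag/` and `src/webparsers/`), so the same pair gets collected twice. Below is a minimal sketch of how pairs could be deduplicated and how pairs already present in the report could be skipped, which is what the "Nothing new to save to the csv report" line suggests. It assumes `one_to_one` compares every collected work with every other and that paths are collected in one consistent relative form; `unique_pairs` and `new_pairs` are hypothetical helpers:

```python
from __future__ import annotations

from itertools import combinations
from pathlib import Path


def unique_pairs(works: list[Path]) -> list[tuple[Path, Path]]:
    """Each unordered pair of distinct works exactly once.

    The same file can be collected several times when the given directories
    overlap (e.g. src/ and src/codeplag/), so duplicates are dropped first.
    """
    unique = sorted(set(works))
    # combinations() never pairs an element with itself and never yields (b, a) after (a, b)
    return list(combinations(unique, 2))


def new_pairs(
    pairs: list[tuple[Path, Path]], saved: set[frozenset[Path]]
) -> list[tuple[Path, Path]]:
    """Drop pairs that already have a row in the report, regardless of order."""
    return [pair for pair in pairs if frozenset(pair) not in saved]


if __name__ == "__main__":
    collected = [
        Path("src/codeplag/consts.py"),      # found via src/
        Path("src/codeplag/consts.tmp.py"),
        Path("src/codeplag/consts.py"),      # found again via src/codeplag/
        Path("src/codeplag/consts.tmp.py"),
    ]
    pairs = unique_pairs(collected)
    print(len(pairs))  # 1
    saved = {frozenset(pairs[0])}
    print(new_pairs(pairs, saved))  # [] -> nothing new to save
```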
root@380c03d96540:/usr/src/codeplag# batcat reports/codeplag_report.csv 
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       │ File: reports/codeplag_report.csv
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   │ ;date;first_modify_date;second_modify_date;first_path;second_path;first_heads;second_heads;jakkar;operators;keywords;literals;weighted_average;struct_similarity;compliance_matrix
   2   │ 0;09/09/2023 14:37:55;;;src/codeplag/consts.py;src/codeplag/consts.tmp.py;['AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAss
       │ ign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign'];['AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign
       │ ', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign', 'AnnAssign'];1.0;1.0;1.0;1.0;1.0;1.0;[[[11, 11], [11, 11], [11, 14], [11, 19], [11, 14], [11, 11], [11, 15], [11, 11], [11, 34], [11, 11], [11, 
       │ 19], [11, 19], [11, 19], [11, 25], [11, 50]], [[11, 11], [11, 11], [11, 14], [11, 19], [11, 14], [11, 11], [11, 15], [11, 11], [11, 34], [11, 11], [11, 19], [11, 19], [11, 19], [11, 25], [11, 50]], [
       │ [11, 14], [11, 14], [14, 14], [14, 19], [14, 14], [11, 14], [14, 15], [11, 14], [13, 35], [11, 14], [13, 20], [13, 20], [13, 20], [14, 25], [13, 51]], [[11, 19], [11, 19], [14, 19], [19, 19], [14, 19
       │ ], [11, 19], [14, 20], [11, 19], [13, 40], [11, 19], [13, 25], [13, 25], [13, 25], [17, 27], [17, 52]], [[11, 14], [11, 14], [14, 14], [14, 19], [14, 14], [11, 14], [14, 15], [11, 14], [13, 35], [11,
       │  14], [13, 20], [13, 20], [13, 20], [14, 25], [13, 51]], [[11, 11], [11, 11], [11, 14], [11, 19], [11, 14], [11, 11], [11, 15], [11, 11], [11, 34], [11, 11], [11, 19], [11, 19], [11, 19], [11, 25], [
       │ 11, 50]], [[11, 15], [11, 15], [14, 15], [14, 20], [14, 15], [11, 15], [15, 15], [11, 15], [13, 36], [11, 15], [13, 21], [13, 21], [13, 21], [15, 25], [13, 52]], [[11, 11], [11, 11], [11, 14], [11, 1
       │ 9], [11, 14], [11, 11], [11, 15], [11, 11], [11, 34], [11, 11], [11, 19], [11, 19], [11, 19], [11, 25], [11, 50]], [[11, 34], [11, 34], [13, 35], [13, 40], [13, 35], [11, 34], [13, 36], [11, 34], [34
       │ , 34], [11, 34], [19, 34], [19, 34], [19, 34], [20, 39], [23, 61]], [[11, 11], [11, 11], [11, 14], [11, 19], [11, 14], [11, 11], [11, 15], [11, 11], [11, 34], [11, 11], [11, 19], [11, 19], [11, 19], 
       │ [11, 25], [11, 50]], [[11, 19], [11, 19], [13, 20], [13, 25], [13, 20], [11, 19], [13, 21], [11, 19], [19, 34], [11, 19], [19, 19], [19, 19], [19, 19], [18, 26], [19, 50]], [[11, 19], [11, 19], [13, 
       │ 20], [13, 25], [13, 20], [11, 19], [13, 21], [11, 19], [19, 34], [11, 19], [19, 19], [19, 19], [19, 19], [18, 26], [19, 50]], [[11, 19], [11, 19], [13, 20], [13, 25], [13, 20], [11, 19], [13, 21], [1
       │ 1, 19], [19, 34], [11, 19], [19, 19], [19, 19], [19, 19], [18, 26], [19, 50]], [[11, 25], [11, 25], [14, 25], [17, 27], [14, 25], [11, 25], [15, 25], [11, 25], [20, 39], [11, 25], [18, 26], [18, 26],
       │  [18, 26], [25, 25], [25, 50]], [[11, 50], [11, 50], [13, 51], [17, 52], [13, 51], [11, 50], [13, 52], [11, 50], [23, 61], [11, 50], [19, 50], [19, 50], [19, 50], [25, 50], [50, 50]]]
   3   │ 1;09/09/2023 14:38:01;;;src/webparsers/async_github_parser.py;src/webparsers/github_parser.py;['AnnAssign', 'AsyncGithubParser'];['AnnAssign', 'AnnAssign', 'GitHubParser'];0.7352941176470589;0.801652
       │ 8925619835;0.6756756756756757;0.7941176470588235;0.7472148198934783;0.5582665695557174;[[[11, 11], [11, 11], [11, 2210]], [[8, 2037], [8, 2037], [1521, 2723]]]
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
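Comparing the stdout tables with the `compliance_matrix` column suggests that each displayed cell is 100 * a / b for the stored pair [a, b] (e.g. [11, 14] -> 78.57%, [11, 19] -> 57.89%, [14, 15] -> 93.33%). Assuming that reading is right, a small helper (hypothetical, not part of codeplag) can rebuild the labelled table from a report row, with `first_heads` as the row labels and `second_heads` as the column labels:

```python
import pandas as pd


def compliance_table(first_heads, second_heads, matrix):
    """Rebuild the percentage table printed to stdout from a compliance_matrix.

    Assumes each cell [a, b] is displayed as 100 * a / b, which matches the
    values in the outputs above (e.g. [11, 14] -> 78.57%).
    """
    percent = [[100 * a / b for a, b in row] for row in matrix]
    return pd.DataFrame(percent, index=first_heads, columns=second_heads)


if __name__ == "__main__":
    # Row 2 of the CSV report: async_github_parser.py vs github_parser.py
    table = compliance_table(
        ["AnnAssign", "AsyncGithubParser"],
        ["AnnAssign", "AnnAssign", "GitHubParser"],
        [[[11, 11], [11, 11], [11, 2210]], [[8, 2037], [8, 2037], [1521, 2723]]],
    )
    print(table.round(2))
```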
