
Merging Łukasz' fork (facade, 54 comments, closed)

brianwarner commented on May 28, 2024
Merging Łukasz' fork

from facade.

Comments (54)

lukaszgryglicki commented on May 28, 2024

Merging most of your changes will be flawless.
I'm installing facade from a Dockerfile, and each time I rebuild the image everything is regenerated from scratch.
I also need a totally automatic setup of EVERYTHING, without ANY user intervention.
This is why I've added some files, most notably utilities/automatic_setup.py.
It takes its command line arguments one by one and treats them as the answers to the questions asked by setup.py.
See how they're used in the Dockerfile and, in particular, in facade_setup.sh, on this line:

python automatic_setup.py c yes facade facade localhost facade no yes admin [email protected] admin admin || exit 1
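automatic_setup.py itself isn't shown in this thread. As a rough illustration of the technique described (command line arguments consumed one by one as answers to setup.py's prompts), here is a minimal Python 3 sketch that temporarily swaps out builtins.input. The helper names are hypothetical, and the real setup.py is Python 2 code that prompts with raw_input, so an actual port would patch that instead.

```python
import builtins
from collections import deque


def make_scripted_input(answers):
    """Return an input() replacement that hands out pre-supplied answers in order."""
    queue = deque(answers)

    def scripted_input(prompt=""):
        if not queue:
            # Fail loudly instead of silently waiting for a human.
            raise RuntimeError("ran out of scripted answers at prompt: %r" % prompt)
        answer = queue.popleft()
        # Echo prompt and answer so the unattended setup log stays readable.
        print("%s%s" % (prompt, answer))
        return answer

    return scripted_input


def run_with_answers(setup_main, answers):
    """Run setup_main() with builtins.input replaced, restoring it afterwards."""
    original = builtins.input
    builtins.input = make_scripted_input(answers)
    try:
        return setup_main()
    finally:
        builtins.input = original
```

Called as run_with_answers(main, sys.argv[1:]), each positional argument answers one prompt, in order, which is the same contract as the facade_setup.sh line above.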

I think I'll manage it without major problems.

brianwarner commented on May 28, 2024

As for automatic_setup.py, I wonder if a new command line flag for headless input would solve that issue gracefully. If the current version of setup.py doesn't receive user input for the database info, it just creates random strings and uses those for the db's name, db username, and db password. Perhaps a flag could simply suppress the raw_input prompts? I think it would work, so long as you have a secure way to pass it the root password and your preferred website credentials.
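One way such a flag could look, as a hedged sketch: --headless, the FACADE_MYSQL_ROOT_PASSWORD variable and the helper names are all hypothetical, and random_token only mimics the random-credential fallback described above. Reading the root password from the environment rather than from argv keeps it out of `ps` output, which is one possible answer to the "secure way" question.

```python
import argparse
import os
import secrets


def random_token(length=16):
    """Random hex string, mimicking setup.py's fallback of random db credentials."""
    return secrets.token_hex(length // 2)


def ask(question, default, headless):
    """Prompt interactively, or silently return the default in headless mode."""
    if headless:
        return default
    answer = input("%s [%s]: " % (question, default)).strip()
    return answer or default


def collect_db_settings(argv=None):
    parser = argparse.ArgumentParser(description="facade setup (sketch)")
    parser.add_argument("--headless", action="store_true",
                        help="never prompt; use random or environment-supplied values")
    args = parser.parse_args(argv)

    return {
        "db_name": ask("Database name", random_token(), args.headless),
        "db_user": ask("Database user", random_token(), args.headless),
        "db_pass": ask("Database password", random_token(), args.headless),
        # Hypothetical variable name; the secret never appears on the command line.
        "root_pass": os.environ.get("FACADE_MYSQL_ROOT_PASSWORD", ""),
    }
```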

lukaszgryglicki commented on May 28, 2024

I'll check, but if I don't expose any port other than the website port from inside the Docker container, then this is not a problem at all.

lukaszgryglicki commented on May 28, 2024

Currently I have problems making any python-mysqldb call because of utf8mb4.
Investigating.
It seems the Debian Jessie image used in the Dockerfile ships too old a python-mysqldb (1.2.3, while at least 1.2.5 is needed).
Merging of the files is ready.

lukaszgryglicki commented on May 28, 2024

Status update:

utf8mb4 works on Jessie when using:
  • MySQL server 5.7
  • python-mysqldb 1.2.7

To do so, I had to add a dedicated MySQL APT source (to force Jessie not to use its default mysql-5.5):

# Initial update apt-get
RUN apt-key adv --keyserver pgp.mit.edu --recv-keys 5072E1F5
RUN echo 'deb http://repo.mysql.com/apt/debian jessie mysql-5.7' > /etc/apt/sources.list.d/mysql.list
RUN apt-get update

And configure Debian backports so that a newer python-mysqldb can be installed:

RUN echo 'deb http://ftp.debian.org/debian jessie-backports main' > /etc/apt/sources.list.d/sources.list
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y `cat requirements_stable.txt`
RUN DEBIAN_FRONTEND=noninteractive apt-get -t jessie-backports install -y `cat requirements_backports.txt`

The file requirements_backports.txt contains only python-mysqldb, moved out of requirements_stable.txt.

Now that I have updated my cncf/gitdm email mappings, a few entries have longer affiliations for which VARCHAR(64) is not enough, so I had to change it to VARCHAR(128) in automatic_setup.py too.

They caused

python import_gitdm_configs.py -a ../cncf-config/aliases -e ../cncf-config/email-map -e ../cncf-config/domain-map -e ../cncf-config/group-map

to fail.
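Before widening the columns, it can help to find which mapping entries actually overflow. A small Python sketch, assuming the gitdm email-map style of one "address affiliation" pair per line, with # comments and an optional trailing "< YYYY-MM-DD" end date; the default limit of 64 matches the old VARCHAR(64). too_long_affiliations is a hypothetical helper, not part of facade or gitdm.

```python
import re

# Assumed line format: "<address-or-domain> <affiliation...> [< YYYY-MM-DD]"
LINE_RE = re.compile(r"^(\S+)\s+(.+?)(?:\s*<\s*\d{4}-\d{2}-\d{2})?\s*$")


def too_long_affiliations(lines, limit=64):
    """Yield (line_number, address, affiliation) whose affiliation exceeds `limit` chars."""
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = LINE_RE.match(line)
        if match and len(match.group(2)) > limit:
            yield number, match.group(1), match.group(2)
```

Running it over the email-map and domain-map files with limit=64 lists exactly the entries that would make the import fail on the old schema.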

lukaszgryglicki commented on May 28, 2024

Now I have this issue (after importing the CNCF gitdm settings and attempting to start facade):

* Running facade-worker.py
Traceback (most recent call last):
  File "facade-worker.py", line 1355, in <module>
    fill_empty_affiliations()
  File "facade-worker.py", line 906, in fill_empty_affiliations
    cursor.execute(reset_committer, (discover_alias(changed_alias['alias'],changed_alias['alias'])))
TypeError: discover_alias() takes exactly 1 argument (2 given)
Facade complete

I'll see what's up there.
...
It's a typo:

cursor.execute(reset_committer, (discover_alias(changed_alias['alias'],changed_alias['alias'])))

instead of

cursor.execute(reset_committer, (discover_alias(changed_alias['alias']),changed_alias['alias']))
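To make the bracket mistake concrete, here is a tiny runnable reproduction. discover_alias below is a hypothetical stand-in with the same one-argument signature as in the traceback; the real lookup is not shown in this thread.

```python
def discover_alias(email):
    """Hypothetical stand-in: map an alias email to its canonical address."""
    canonical = {"old@example.com": "new@example.com"}
    return canonical.get(email, email)


changed_alias = {"alias": "old@example.com"}

# Buggy: the misplaced ")" passes BOTH values to discover_alias(),
# which accepts only one, so Python raises TypeError.
try:
    discover_alias(changed_alias['alias'], changed_alias['alias'])
except TypeError as exc:
    print("buggy call:", exc)

# Fixed: discover_alias() gets one argument, and the tuple carries the
# two query parameters expected by cursor.execute().
params = (discover_alias(changed_alias['alias']), changed_alias['alias'])
print(params)
```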

lukaszgryglicki commented on May 28, 2024

Importing CSV files (exported from the old version) is not working.
It does not work with the new version either: I added a project, exported the configuration to CSV, removed that project, and then imported the CSV file back, and it did not work.
Apache logs:

[Wed Jun 21 09:15:14.411012 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: confirmimport_github in /var/www/html/manage.php on line 486, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.411017 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: import_repos in /var/www/html/manage.php on line 529, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.411022 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: confirmnew_alias in /var/www/html/manage.php on line 551, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.411027 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: new_alias in /var/www/html/manage.php on line 582, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.411032 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: delete_alias in /var/www/html/manage.php on line 601, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.411037 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: confirmnew_affiliation in /var/www/html/manage.php on line 617, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.411042 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: new_affiliation in /var/www/html/manage.php on line 664, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.411051 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: delete_affiliation in /var/www/html/manage.php on line 693, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.411056 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: export_projects_csv in /var/www/html/manage.php on line 708, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.411061 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: import_projects_csv in /var/www/html/manage.php on line 726, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.411066 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: export_repos_csv in /var/www/html/manage.php on line 768, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:14.419474 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: repo in /var/www/html/repositories.php on line 17, referer: http://85.222.70.18:8888/configure
[Wed Jun 21 09:15:17.818091 2017] [:error] [pid 46] [client 172.17.0.1:50222] PHP Notice:  Undefined index: repo in /var/www/html/repositories.php on line 17, referer: http://85.222.70.18:8888/repositories

I don't have time to debug all of this, so I'll add my repos manually via SQL.

lukaszgryglicki commented on May 28, 2024

I managed to import the repositories from an SQL dump, but you need to look into why importing is broken.
My facade is here: http://85.222.70.18:8888
You can even SSH into it:

ssh -o "Port=2222" [email protected]

The password is "facade".
Once inside the shell, you can become root with:
sudo bash
(password "facade").
Then go to /facade or /var/www/html (which is a copy of /facade) and figure out what is happening.

To log in to the web management UI, use:
login "admin"
password "admin"

But please let me know before you do anything, because I need to finish a single facade run on all 69 of my repos and save a DB dump after that.
It takes hours to finish.

I'll post an update when this single analysis finishes.

Summing up:
I've detected 3 issues that you should fix (imho):

  • Change most VARCHAR(64) columns to VARCHAR(128) (I've just replaced all of them).
  • facade-worker.py, line 906: a typo (a misplaced ")" makes it call discover_alias() with two arguments).
  • Broken CSV import: it does not work for me at all, and the Apache logs say that you're referring to nonexistent $_POST[...] indices in PHP.

Also add a note to README.md that utf8mb4 requires:

  • MySQL > 5.5 (I've used 5.7)
  • python-mysqldb >= 1.2.5 (I've used 1.2.7)
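A startup check along these lines could turn the README note into a clear error message instead of a failing query. This is a sketch: the version strings would come from something like SELECT VERSION() for the server and MySQLdb.__version__ for the driver, and the helper names are hypothetical.

```python
import re


def parse_version(text):
    """Turn '5.7.18-log' or '1.2.5c1' into an int tuple such as (5, 7, 18)."""
    return tuple(int(n) for n in re.findall(r"\d+", text.split("-")[0])[:3])


def utf8mb4_requirements_met(server_version, driver_version):
    """README rules: MySQL newer than 5.5 and python-mysqldb >= 1.2.5."""
    server_ok = parse_version(server_version)[:2] > (5, 5)
    driver_ok = parse_version(driver_version) >= (1, 2, 5)
    return server_ok and driver_ok
```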

It also seems to be terribly slow...
It has been running "* Filling empty affiliations" for almost 3 hours.
I know this is 70 repos and 150M lines of code, but...

The CNCF gitdm fork, started on the same 70 repos, finishes in about 3 minutes.

brianwarner commented on May 28, 2024

Thanks for all of this!

  • VARCHAR(64) -> VARCHAR(128) is fine with me. I hadn't hit any email addresses quite that long yet, but there's no harm in doing this.
  • Thanks for the catch on line 906, I'll push that fix.
  • The undefined index notices appear because I explicitly check for POST variables which may or may not be defined. This is because I don't want there to be a default action for manage.php, i.e., without the right args it should just exit. It occurs to me that using isset() would prevent it from throwing the notice. However, this shouldn't be enough to cause the CSV import to fail. It was working for me before I pushed it, but I will check again today to be sure. Something doesn't sound right.
  • I'll add the README info as well.

lukaszgryglicki commented on May 28, 2024

Not sure why, but facade is still processing affiliations; either it is that slow, or there is some infinite loop.
I don't want to Ctrl+C it because I've already waited so long.
I'll leave it running...
:/

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

BTW: my log level is one above Error; where are the logs?
Maybe I'll find something there....

brianwarner commented on May 28, 2024

You want the contents of utility_log in the database, although it also echoes anything it's logging to STDERR as well. If you run at Debug level, it'll tell you which person it's working on.

I hadn't added any sort of log viewer because up until this point I thought I was the only person who needed that, and I always have a database connection open :-) It would be both useful and trivial to add though, and I'll put it on my list.

lukaszgryglicki commented on May 28, 2024

No, I've already figured that out.
I was just running DESC on all the tables...
Hmm, I see one problem:

mysql> select * from analysis_data limit 1;
+----------+------------------------------------------+--------------+--------------------+--------------------+-------------+--------------------+----------------+---------------------+--------------------+----------------+-----------------------+-------+---------+------------+--------------------+----------------------------+
| repos_id | commit                                   | author_name  | author_raw_email   | author_email       | author_date | author_affiliation | committer_name | committer_raw_email | committer_email    | committer_date | committer_affiliation | added | removed | whitespace | filename           | date_attempted             |
+----------+------------------------------------------+--------------+--------------------+--------------------+-------------+--------------------+----------------+---------------------+--------------------+----------------+-----------------------+-------+---------+------------+--------------------+----------------------------+
|        1 | 2fd41d83e9ddeb700170afbc77de7d3cb3c99eb0 | Daniel Smith | [email protected] | [email protected] | 2016-05-26  | NULL               | Daniel Smith   | [email protected]  | [email protected] | 2016-05-27     | NULL                  |    12 |      12 |          4 | mungers/e2e/e2e.go | 2017-06-21 09:46:19.543024 |
+----------+------------------------------------------+--------------+--------------------+--------------------+-------------+--------------------+----------------+---------------------+--------------------+----------------+-----------------------+-------+---------+------------+--------------------+----------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from analysis_data where author_affiliation is not null;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.52 sec)

mysql> select count(*) from analysis_data where author_affiliation is not null;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.50 sec)

mysql> select count(*) from analysis_data where committer_affiliation is not null;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.51 sec)

mysql> select count(*) from analysis_data;
+----------+
| count(*) |
+----------+
|  1182343 |
+----------+
1 row in set (0.56 sec)

mysql> 

It seems to have stalled...
I need to add debug output to see where it halted.

And after stopping it, I see:

Something is already running, aborting maintenance and analysis

I need to dive into it.
I see I need to use -f and manually set the status in the DB back to Idle.

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

OK, why call db.commit() inside the loop?
I think you should commit in batches, say once after the whole loop.
It should speed things up a lot.
Alternatively, around the loop:
begin() before it
rollback() on error
commit() when everything succeeded
I've managed to work through it by trial and error, each time running this in mysql:
update settings set value = 'Idle' where setting = 'utility_status';
and then:
python facade-worker.py -f
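The manual unlock step can be wrapped in a small helper. It uses the exact statement quoted above; the function name is hypothetical. It should only be run when no other facade-worker.py process is actually active, since this status setting is what stops two runs from overlapping ("Something is already running").

```python
def reset_utility_status(db):
    """Set facade's utility_status back to 'Idle' after an interrupted run.

    `db` is any DB-API connection (MySQLdb in facade). The statement has no
    parameters, so placeholder style does not matter.
    """
    cursor = db.cursor()
    cursor.execute("UPDATE settings SET value = 'Idle' WHERE setting = 'utility_status'")
    db.commit()
    return cursor.rowcount  # number of rows changed
```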

Each update takes about 1 second?? WTF?
It should take 20 ms max.

lukaszgryglicki commented on May 28, 2024

Hmm, crazy... do you have indices on the columns used in queries?
Both:

UPDATE analysis_data SET committer_affiliation = NULL WHERE committer_email LIKE CONCAT('%','[email protected]')

and

UPDATE analysis_data SET committer_affiliation = NULL WHERE committer_email = '[email protected]';

Each takes 1 s in mysql.
Have you considered using a single transaction for batch processing:
http://www.zyxware.com/articles/2599/how-to-enable-transactions-with-mysql-and-python

Say, for this entire loop: for changed_affiliation in changed_affiliations:
Before the loop do:

db.autocommit(False)
c = db.cursor()

Use c inside the loop, but do NOT call commit() on each iteration.
Call commit once after the loop:

db.commit()
db.autocommit(True)
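A sketch of the single-transaction pattern proposed here, written against the generic DB-API calls (cursor(), execute(), commit(), rollback()) that MySQLdb also provides. It assumes autocommit is off on the connection, as with db.autocommit(False) above. The helper name is hypothetical; note that MySQLdb uses %s placeholders, while the sqlite3 module handy for local testing uses ?.

```python
def run_in_one_transaction(db, statement, rows):
    """Execute `statement` once per parameter tuple, committing only at the end.

    Any exception rolls back everything done inside the batch, so a failed
    run leaves the table exactly as it was.
    """
    cursor = db.cursor()
    try:
        for params in rows:
            cursor.execute(statement, params)
        db.commit()  # one commit for the whole batch
    except Exception:
        db.rollback()
        raise
```

cursor.executemany(statement, rows) would express the same loop in one call; either way, the point is a single commit per batch instead of one per row.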

ANYWAY: the problem IS the missing index on the committer_email and author_email columns.
Probably indices are missing everywhere!

lukaszgryglicki commented on May 28, 2024

I would add indices on ALL analysis_data columns that are used in any WHERE clause (i.e., used for searching).
It will speed things up 100x or more, imho.
The indices should be created in setup.py.

brianwarner commented on May 28, 2024

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

Well, let's leave transactions aside... indices are the problem, see this:

mysql> ALTER TABLE analysis_data ADD INDEX (committer_email);
Query OK, 0 rows affected (9.61 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> UPDATE analysis_data SET committer_affiliation = NULL WHERE committer_email LIKE CONCAT('%','[email protected]');
Query OK, 0 rows affected (1.13 sec)
Rows matched: 0  Changed: 0  Warnings: 0

mysql> UPDATE analysis_data SET committer_affiliation = NULL WHERE committer_email = '[email protected]';
Query OK, 0 rows affected (0.00 sec)
Rows matched: 0  Changed: 0  Warnings: 0

mysql> UPDATE analysis_data SET committer_affiliation = NULL WHERE committer_email like '[email protected]';
Query OK, 0 rows affected (0.03 sec)
Rows matched: 0  Changed: 0  Warnings: 0

mysql> UPDATE analysis_data SET committer_affiliation = NULL WHERE committer_email like '%[email protected]';
Query OK, 0 rows affected (1.24 sec)
Rows matched: 0  Changed: 0  Warnings: 0

I've added the index. Searching using the index is superfast, but using LIKE '%[email protected]' is not.
Why is LIKE used there?

lukaszgryglicki commented on May 28, 2024

I'll give indices a try, but why do you use LIKE?
It will be 1000x slower, because a LIKE pattern with a leading % cannot use the index (imho; I need to google that).
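The index/LIKE behaviour measured above can be reproduced without a MySQL server. This sketch uses Python's built-in sqlite3 module as a stand-in, so the plan text is SQLite's, not MySQL's; the underlying principle is the same for B-tree indexes in both: an equality test can seek into the index, while a pattern that starts with % has no fixed prefix to seek on and forces a full scan. On MySQL itself, the equivalent check is putting EXPLAIN in front of the query.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE analysis_data (committer_email TEXT, committer_affiliation TEXT)")
conn.execute("CREATE INDEX i_committer_email ON analysis_data (committer_email)")

exact = query_plan(conn, "SELECT * FROM analysis_data WHERE committer_email = 'dev@example.com'")
suffix = query_plan(conn, "SELECT * FROM analysis_data WHERE committer_email LIKE '%dev@example.com'")
print(exact)   # a SEARCH step using index i_committer_email
print(suffix)  # a full SCAN of analysis_data
```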

brianwarner commented on May 28, 2024

I think that's already fixed upstream. It should be an exact match.

lukaszgryglicki commented on May 28, 2024

OK so I'll check upstream.

brianwarner commented on May 28, 2024

However, for domain matching I used LIKE because it should also match subdomains, e.g., ibm.com should also match us.ibm.com.
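The subdomain rule can be stated precisely in a few lines of Python. Note that a plain suffix match, which is what LIKE CONCAT('%', domain) amounts to, also accepts hosts such as notibm.com. email_in_domain is a hypothetical illustration of the intended semantics, not facade's code.

```python
def email_in_domain(email, domain):
    """True if the email's host is `domain` itself or one of its subdomains.

    Requiring the match to start right after '@' or at a '.' boundary avoids
    the false positive of a bare suffix test ('user@notibm.com' vs 'ibm.com').
    """
    host = email.rsplit("@", 1)[-1].lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)
```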

lukaszgryglicki commented on May 28, 2024

OK, I'm talking about developers' emails.
You should also consider adding an INDEX on every column, in every table, that is used for searching.
I don't know your project that well.
My suggestion is to:
search for all query conditions in all PY and PHP files (look for WHERE, HAVING etc.);
for every column used there, add
ALTER TABLE table_name ADD INDEX (column_name); to setup.py.
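A rough way to automate that search: scan every .py and .php file for simple WHERE/AND/OR/HAVING conditions and collect the column names. This is a regex heuristic written for illustration (the names are hypothetical), so treat its output as a starting list to review, not a complete inventory.

```python
import re
from pathlib import Path

# Heuristic: a column counts as "searched" if it appears as `name <op>` right
# after WHERE/AND/OR/HAVING. It misses qualified names (t.col), JOIN ... ON
# conditions and SQL assembled dynamically.
CONDITION_RE = re.compile(
    r"\b(?:WHERE|AND|OR|HAVING)\s+`?([A-Za-z_][A-Za-z0-9_]*)`?\s*"
    r"(?:=|<=|>=|<>|!=|<|>|\bLIKE\b|\bIN\b|\bIS\b)",
    re.IGNORECASE,
)


def searched_columns(text):
    """Return the set of column names used in simple WHERE/HAVING conditions."""
    return set(CONDITION_RE.findall(text))


def scan_sources(root, suffixes=(".py", ".php")):
    """Collect searched column names across all matching files under `root`."""
    columns = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            columns |= searched_columns(path.read_text(errors="ignore"))
    return columns
```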

lukaszgryglicki commented on May 28, 2024

I took a look at facade-worker.py.
Three tables will need indices:
  • analysis_data: *_email, *_raw_email, *_affiliation, *_date, repos_id, commit
  • aliases: alias
  • affiliations: domain
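The analysis_data list above translates directly into named index statements, in the same i_<column> style used later in this thread. A minimal sketch; the helper names are hypothetical, and *_raw_email is read as author_raw_email/committer_raw_email.

```python
ANALYSIS_DATA_COLUMNS = [
    "repos_id", "commit",
    "author_raw_email", "author_email", "author_date", "author_affiliation",
    "committer_raw_email", "committer_email", "committer_date", "committer_affiliation",
]


def index_statements(table, columns, prefix="i_"):
    """One named single-column index per column, e.g. `i_author_email`."""
    return [
        "ALTER TABLE `%s` ADD INDEX `%s%s` (`%s`)" % (table, prefix, col, col)
        for col in columns
    ]


def create_indices(cursor, table, columns):
    """Run the statements with a DB-API cursor (e.g. one from MySQLdb)."""
    for statement in index_statements(table, columns):
        cursor.execute(statement)
```

Backquoting every identifier keeps a column named commit safe even though COMMIT is also an SQL keyword.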

I also suggest this change (then it can use the indices, 100-1000x faster):

                set_author_to_null = ("UPDATE analysis_data SET author_affiliation = NULL "
-                       "WHERE author_email LIKE CONCAT('%%',%s)")
+                       "WHERE author_email = %s")
 
                cursor.execute(set_author_to_null, (changed_affiliation['domain'], ))
                db.commit()
 
                set_committer_to_null = ("UPDATE analysis_data SET committer_affiliation = NULL "
-                       "WHERE committer_email LIKE CONCAT('%%',%s)")
+                       "WHERE committer_email = %s")
 
                cursor.execute(set_committer_to_null, (changed_affiliation['domain'], ))
                db.commit()
@@ -881,13 +881,13 @@ def fill_empty_affiliations():
                        changed_alias['alias'])
 
                set_author_to_null = ("UPDATE analysis_data SET author_affiliation = NULL "
-                       "WHERE author_raw_email LIKE CONCAT('%%',%s)")
+                       "WHERE author_raw_email = %s")
 
                cursor.execute(set_author_to_null,(changed_alias['alias'], ))
                db.commit()
 
                set_committer_to_null = ("UPDATE analysis_data SET committer_affiliation = NULL "
-                       "WHERE committer_raw_email LIKE CONCAT('%%',%s)")
+                       "WHERE committer_raw_email = %s")
 
                cursor.execute(set_committer_to_null, (changed_alias['alias'], ))
                db.commit()

I'll test this in Docker now.
I'll add code to setup.py to create the indices.

lukaszgryglicki commented on May 28, 2024

This is what I've added to setup.py

        # `analysis_data` table indices:
        cursor.execute("alter table analysis_data add index(repos_id)")
        cursor.execute("alter table analysis_data add index(commit)")
        cursor.execute("alter table analysis_data add index(author_raw_email)")
        cursor.execute("alter table analysis_data add index(author_email)")
        cursor.execute("alter table analysis_data add index(author_date)")
        cursor.execute("alter table analysis_data add index(author_affiliation)")
        cursor.execute("alter table analysis_data add index(committer_raw_email)")
        cursor.execute("alter table analysis_data add index(committer_email)")
        cursor.execute("alter table analysis_data add index(committer_date)")
        cursor.execute("alter table analysis_data add index(committer_affiliation)")
(...)
        # `aliases` table indices: 
        cursor_people.execute("alter table aliases add index(canonical)")
        cursor_people.execute("alter table aliases add index(alias)")
(...)
                # `affiliations` table indices: 
                cursor_people.execute("alter table affiliations add index(domain)")
                cursor_people.execute("alter table affiliations add index(affiliation)")

And this :D :P

                        "('[email protected]','The Linux Foundation','2011-01-06'),"
+                       "('[email protected]','CNCF','2017-03-01'),"
                        "('[email protected]','IBM','2006-05-20')")

The actual syntax will be a bit different; the final versions will live here:
https://github.com/lukaszgryglicki/facade/blob/master/utilities/automatic_setup.py
https://github.com/lukaszgryglicki/facade/blob/master/utilities/facade-worker.py

lukaszgryglicki commented on May 28, 2024

Actually, aliases and affiliations already have indices because of the UNIQUE constraints in their table definitions.
So only analysis_data needs new indices.

lukaszgryglicki commented on May 28, 2024

YES FINALLY, I see about 300x speedup on affiliations computing!!

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

Not yet committed.

lukaszgryglicki commented on May 28, 2024

Computing affiliations: Completed in 0:09:31 instead of infinity?? (today I gave up after about 6 hours).
That's on a big data set (and there are 8800+ affiliations in the email mapping):
[Screenshot 2017-06-21 at 19.49.16]

You need to take a look at these two files:
https://github.com/lukaszgryglicki/facade/blob/master/utilities/automatic_setup.py
https://github.com/lukaszgryglicki/facade/blob/master/utilities/facade-worker.py

The most important part is:

        # `analysis_data` table indices:
        cursor.execute("alter table `analysis_data` add index `i_repos_id` (repos_id)")
        db.commit()
        cursor.execute("alter table `analysis_data` add index `i_commit` (commit)")
        db.commit()
        cursor.execute("alter table `analysis_data` add index `i_author_raw_email` (author_raw_email)")
        db.commit()
        cursor.execute("alter table `analysis_data` add index `i_author_email` (author_email)")
        db.commit()
        cursor.execute("alter table `analysis_data` add index `i_author_date` (author_date)")
        db.commit()
        cursor.execute("alter table `analysis_data` add index `i_author_affiliation` (author_affiliation)")
        db.commit()
        cursor.execute("alter table `analysis_data` add index `i_committer_raw_email` (committer_raw_email)")
        db.commit()
        cursor.execute("alter table `analysis_data` add index `i_committer_email` (committer_email)")
        db.commit()
        cursor.execute("alter table `analysis_data` add index `i_committer_date` (committer_date)")
        db.commit()
        cursor.execute("alter table `analysis_data` add index `i_committer_affiliation` (committer_affiliation)")
        db.commit()

And in the second file I replaced LIKE CONCAT('%%',%s) with = %s; this is good imho, since it is exact email matching there.

Data IS already committed.

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

Now, when I run python facade-worker.py -r (the last stage, manually), I see:

* Option set: rebuilding caches.
* Running facade-worker.py
* Caching unknown affiliations and web data for display
* Deleting old cached unknown affiliations and web data
* Deleting old cached unknown affiliations and web data (complete)
Traceback (most recent call last):
  File "facade-worker.py", line 1358, in <module>
    rebuild_unknown_affiliation_and_web_caches()
  File "facade-worker.py", line 1140, in rebuild_unknown_affiliation_and_web_caches
    cursor.execute(cache_projects_by_month)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 226, in execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorvalue
_mysql_exceptions.InternalError: (3, "Error writing file '/tmp/MYpaFxqR' (Errcode: 28 - No space left on device)")

I give up for today.
And I DO have space left in /tmp/: over 2G!

lukaszgryglicki commented on May 28, 2024

Give my indices a shot; you will be scared how much faster it is :P

brianwarner commented on May 28, 2024

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

This is NOT a disk space problem. Never.
I wonder why the error message says that.
As for CSV imports... I'll merge your fixes.
I just took my project + repos config from my previous mysqldump, so no problem.
I imported it manually via mysql < blablabla.sql

lukaszgryglicki commented on May 28, 2024

Gotta go, see you tomorrow.
I've been working for 13.5 hours already and I'm dead tired.

lukaszgryglicki commented on May 28, 2024

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

It would also be nice if you added percentage statistics to Facade some day, something like what gitdm outputs.
We're using a custom gitdm for CNCF (Kubernetes repos), here:
https://github.com/cncf/gitdm
Especially a report like this one:
https://github.com/cncf/gitdm/blob/master/all.txt
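gitdm's report layout isn't reproduced here; as a hedged sketch of just the core calculation, this turns (affiliation, lines added) rows, such as those from SELECT author_affiliation, added FROM analysis_data, into per-affiliation totals and percentages. affiliation_shares is a hypothetical helper.

```python
from collections import Counter


def affiliation_shares(rows):
    """Given (affiliation, lines_added) rows, return [(affiliation, lines, percent)].

    NULL affiliations are grouped as "(Unknown)"; results are sorted by
    line count, largest first.
    """
    totals = Counter()
    for affiliation, added in rows:
        totals[affiliation or "(Unknown)"] += added
    grand_total = sum(totals.values())
    return [
        (name, lines, 100.0 * lines / grand_total if grand_total else 0.0)
        for name, lines in totals.most_common()
    ]
```

Printing each tuple with a format such as "%-30s %10d %6.2f%%" gives a gitdm-like table.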

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

Now I reran facade (a full run) on all the data, 157M lines of code:

Completed in 2:34:57
Facade complete

[Five screenshots, 2017-06-22, 03.15.21 to 03.16.55]

brianwarner commented on May 28, 2024

That's great! I'm working on the indices now. I did a bit of reading, and will share what I learned and how I'm using it to sanity check.

Single column indexing seems pretty straightforward, but multi-column gets a little more interesting. The order of the columns in each INDEX statement affects whether a given index will be used. In particular, if you use a multi-column index (e.g., INDEX (col1,col2,col3)), the index will be used for queries that include all three columns in the WHERE clause, or a subset of the columns starting from the left. So this will cover (col1), (col1,col2), and (col1,col2,col3). It cannot be used for (col2) or (col2,col3), and for (col1,col3) only the col1 part of the index helps.
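The leftmost-prefix rule can be modelled in a few lines. usable_prefix is a hypothetical toy that counts how many leading index columns are matched by equality conditions; it deliberately ignores range conditions, ORDER BY and the optimizer's cost-based choices.

```python
def usable_prefix(index_columns, where_columns):
    """Number of leading columns of a composite index usable for equality lookups."""
    used = 0
    for column in index_columns:
        if column not in where_columns:
            break  # the first gap ends the usable prefix
        used += 1
    return used
```

For INDEX (col1,col2,col3) this yields 1 for {col1}, 3 for all three columns, 0 for {col2, col3}, and 1 for {col1, col3}.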

As such, I'm going through and making sure the order of the columns in the existing UNIQUE index columns match what is in the queries. I've found a few cases where I'll have to reorder them. I'm also going through each SELECT statement to add appropriate multi-column INDEXes. I am trying to minimize the number of new columns if at all possible.

After I do this in facade-worker.py, I'll move on to the web code. I'm putting these in the setup.py CREATE statements, but will send you a list of ALTER statements for your existing tables that will match what I'm putting in setup.py (if you don't want to rebuild the data)

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

Also, creating the final cache was quite slow; it uses 4 or 5 complex report-like queries.
But for me it is fast enough now (2.5 hours for all of my data, for all time).

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

I would give a name to every index,
like: alter table tbname add index iname(col1, col2, ...);

brianwarner commented on May 28, 2024

lukaszgryglicki commented on May 28, 2024

It is useful when you go to mysql> and run:
show index from table_name;

brianwarner commented on May 28, 2024

brianwarner commented on May 28, 2024

Cleaning up this open but dormant ticket.
