hazyresearch / fonduer

A knowledge base construction engine for richly formatted data

Home Page: https://fonduer.readthedocs.io/

License: MIT License

Python 73.91% Shell 0.05% Makefile 0.06% HTML 25.86% Dockerfile 0.12%
multimodality machine-learning knowledge-base-construction

fonduer's Introduction

Fonduer


Fonduer is a Python package and framework for building knowledge base construction (KBC) applications from richly formatted data.

Note that Fonduer is still actively under development, so feedback and contributions are welcome. Submit bugs in the Issues section or feel free to submit your contributions as a pull request.

Getting Started

Check out our Getting Started Guide to get up and running with Fonduer.

Learning how to use Fonduer

The Fonduer tutorials cover the Fonduer workflow, showing how to extract relations from hardware datasheets and scientific literature.

Reference

Fonduer: Knowledge Base Construction from Richly Formatted Data (blog):

@inproceedings{wu2018fonduer,
  title={Fonduer: Knowledge Base Construction from Richly Formatted Data},
  author={Wu, Sen and Hsiao, Luke and Cheng, Xiao and Hancock, Braden and Rekatsinas, Theodoros and Levis, Philip and R{\'e}, Christopher},
  booktitle={Proceedings of the 2018 International Conference on Management of Data},
  pages={1301--1316},
  year={2018},
  organization={ACM}
}

Acknowledgements

Fonduer leverages the work of Emmental and Snorkel.

fonduer's People

Contributors

annelhote, bhancock8, hiromuhota, j-rausch, kaikun213, lilacpps, lukehsiao, nicholaschiang, payalbajaj, senwu, wajdikhattel, yasushimiyata


fonduer's Issues

Insufficient checking for missing PDFs in parser

Describe the bug
Insufficient missing_pdf checking in parser.py

# Add visual attributes
filename = self.pdf_path + document.name
missing_pdf = (
    not os.path.isfile(self.pdf_path)
    and not os.path.isfile(filename + ".pdf")
    and not os.path.isfile(filename + ".PDF")
    and not os.path.isfile(filename)
)
if missing_pdf:
    logger.error("Visual parsing failed: pdf files are required")

For example, this misses the case where pdf_path points to an existing file that is HTML rather than a PDF.

Expected behavior
If a user is using HTML as input, but has visual parsing enabled, they should get an error describing that PDFs are missing.

Additional context
This could cause users who are not using PDFs as input to hit errors during a visual parse without a useful error message indicating that a PDF is missing. See HazyResearch/fonduer-tutorials#12.
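
A minimal sketch of a stricter check, assuming the same pdf_path and document.name values as the snippet above (the helper names are hypothetical):

import os

def looks_like_pdf(path):
    # Check the magic bytes instead of trusting the extension: a real PDF
    # starts with b"%PDF", so an existing-but-HTML file is rejected.
    try:
        with open(path, "rb") as f:
            return f.read(4) == b"%PDF"
    except OSError:
        return False

def resolve_pdf(pdf_path, doc_name):
    # Return a usable PDF path for this document, or None if missing.
    candidates = [
        pdf_path,  # pdf_path may point directly at a single file
        pdf_path + doc_name + ".pdf",
        pdf_path + doc_name + ".PDF",
        pdf_path + doc_name,
    ]
    for path in candidates:
        if os.path.isfile(path) and looks_like_pdf(path):
            return path
    return None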

Setup mailing list for discussions

Rather than keeping technical discussion about Fonduer stuck in GitHub Issues, we should use a mailing list (more permanence and searchability).

Make fonduer a pip-installable package

Once we have cleaned up the environment vars and dependencies, there should be nothing stopping us from improving usability and making fonduer a pip-installable package.

Transistor Image Tutorial - UnicodeEncodeError from CorpusParser

Hello,

First off, thanks for this interesting library. I have only gone through it a little bit but am quite excited to explore its full capabilities.

I am going through the transistor_image_tutorial at the moment. Running the lines:

corpus_parser = Parser(structural=True, lingual=True, visual=True, pdf_path=pdf_path, flatten=[])
%time corpus_parser.apply(doc_preprocessor, parallelism=PARALLEL)

I get the following UnicodeEncodeError.

(Screenshot: UnicodeEncodeError traceback from the transistor_image_tutorial.)

As a result, when I execute the next line, I get

Documents: 0
Sentences: 0
Figures: 0

I was wondering if there is a workaround for this or if I needed to do something first to avoid the error and get the same result as the original transistor_image_tutorial file. Thank you.

Integrate new parser to support pdftotree output

TODO:

  • Create simple, verifiable test data for unit testing the parser
  • Get numbers from old parser for comparison
  • Test current version of new parser
  • Rewrite to fix the mismatches. That is, build the document model and run each paragraph through spaCy for phrases, rather than chunking the whole document as was done for CoreNLP.

Switch to spaCy as the default parser

Support using spaCy as the lingual parser for the old parser (i.e. the one that does not support pdftotree output).

TODO:

  • Upgrade to spaCy 2.x (#9)
  • Compare features pre and post spaCy
  • Check the visual linker for mismatches. Update: see #12. However, it looks like we don't have unicode issues.

Document Preprocessor for PDF Documents

Hello,

I was wondering if there is a document preprocessor for PDF documents. I tried using the DocPreprocessor class, but it looks like the parse_file function has not been built out yet. In the tutorial notebooks, it looks like HTML files and PDF files are processed together, but I was wondering how I would go about parsing sentences for PDF files only. If there is some documentation I should be referring to, please let me know. Thanks.
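
One possible workaround (not an official API; import paths and the pdftotree.parse signature vary by version) is to convert each PDF to HTML with pdftotree first, then feed the HTML to HTMLDocPreprocessor while keeping the PDFs around for visual parsing:

import os
import pdftotree
from fonduer.parser.preprocessors import HTMLDocPreprocessor

pdf_dir, html_dir = "data/pdf/", "data/html/"
for name in os.listdir(pdf_dir):
    if name.lower().endswith(".pdf"):
        out = os.path.join(html_dir, os.path.splitext(name)[0] + ".html")
        pdftotree.parse(os.path.join(pdf_dir, name), html_path=out)

doc_preprocessor = HTMLDocPreprocessor(html_dir)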

Fonduer max_storage_temp_tutorial error while parsing html files

To Reproduce

While doing max_storage_temp_tutorial tutorial in the attached Jupyter notebook, I get an error while trying to execute the following code:

corpus_parser = Parser(structural=True, lingual=True, visual=True, pdf_path=pdf_path)
%time corpus_parser.apply(doc_preprocessor, parallelism=PARALLEL)

Expected behavior
To complete parsing without any issue

Error Logs/Screenshots
The following is the error that I got:

UnicodeEncodeError: 'ascii' codec can't encode character '\uf0b7' in position 6282: ordinal not in range(128)

The following is the complete error stacktrace

[INFO] fonduer.utils.udf - Clearing existing...
[INFO] fonduer.utils.udf - Running UDF...
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<timed eval> in <module>()

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/fonduer/utils/udf.py in apply(self, xs, clear, parallelism, progress_bar, count, **kwargs)
     48         self.logger.info("Running UDF...")
     49         if parallelism is None or parallelism < 2:
---> 50             self.apply_st(xs, progress_bar, clear=clear, count=count, **kwargs)
     51         else:
     52             self.apply_mt(xs, parallelism, clear=clear, **kwargs)

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/fonduer/utils/udf.py in apply_st(self, xs, progress_bar, count, **kwargs)
     81 
     82         # Commit session and close progress bar if applicable
---> 83         udf.session.commit()
     84         if pb:
     85             pb.close()

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/orm/session.py in commit(self)
    941                 raise sa_exc.InvalidRequestError("No transaction is begun.")
    942 
--> 943         self.transaction.commit()
    944 
    945     def prepare(self):

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/orm/session.py in commit(self)
    465         self._assert_active(prepared_ok=True)
    466         if self._state is not PREPARED:
--> 467             self._prepare_impl()
    468 
    469         if self._parent is None or self.nested:

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/orm/session.py in _prepare_impl(self)
    445                 if self.session._is_clean():
    446                     break
--> 447                 self.session.flush()
    448             else:
    449                 raise exc.FlushError(

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/orm/session.py in flush(self, objects)
   2252         try:
   2253             self._flushing = True
-> 2254             self._flush(objects)
   2255         finally:
   2256             self._flushing = False

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/orm/session.py in _flush(self, objects)
   2378         except:
   2379             with util.safe_reraise():
-> 2380                 transaction.rollback(_capture_exception=True)
   2381 
   2382     def bulk_save_objects(

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py in __exit__(self, type_, value, traceback)
     64             self._exc_info = None   # remove potential circular references
     65             if not self.warn_only:
---> 66                 compat.reraise(exc_type, exc_value, exc_tb)
     67         else:
     68             if not compat.py3k and self._exc_info and self._exc_info[1]:

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/util/compat.py in reraise(tp, value, tb, cause)
    247         if value.__traceback__ is not tb:
    248             raise value.with_traceback(tb)
--> 249         raise value
    250 
    251 else:

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/orm/session.py in _flush(self, objects)
   2342             self._warn_on_events = True
   2343             try:
-> 2344                 flush_context.execute()
   2345             finally:
   2346                 self._warn_on_events = False

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py in execute(self)
    384                 while set_:
    385                     n = set_.pop()
--> 386                     n.execute_aggregate(self, set_)
    387         else:
    388             for rec in topological.sort(

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py in execute_aggregate(self, uow, recs)
    666                              [self.state] +
    667                              [r.state for r in our_recs],
--> 668                              uow)
    669 
    670     def __repr__(self):

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py in save_obj(base_mapper, states, uowtransaction, single)
    179         _emit_insert_statements(base_mapper, uowtransaction,
    180                                 cached_connections,
--> 181                                 mapper, table, insert)
    182 
    183     _finalize_insert_update_commands(

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py in _emit_insert_statements(base_mapper, uowtransaction, cached_connections, mapper, table, insert, bookkeeping)
    828 
    829             c = cached_connections[connection].\
--> 830                 execute(statement, multiparams)
    831 
    832             if bookkeeping:

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/engine/base.py in execute(self, object, *multiparams, **params)
    946             raise exc.ObjectNotExecutableError(object)
    947         else:
--> 948             return meth(self, multiparams, params)
    949 
    950     def _execute_function(self, func, multiparams, params):

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/sql/elements.py in _execute_on_connection(self, connection, multiparams, params)
    267     def _execute_on_connection(self, connection, multiparams, params):
    268         if self.supports_execution:
--> 269             return connection._execute_clauseelement(self, multiparams, params)
    270         else:
    271             raise exc.ObjectNotExecutableError(self)

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_clauseelement(self, elem, multiparams, params)
   1058             compiled_sql,
   1059             distilled_params,
-> 1060             compiled_sql, distilled_params
   1061         )
   1062         if self._has_events or self.engine._has_events:

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1198                 parameters,
   1199                 cursor,
-> 1200                 context)
   1201 
   1202         if self._has_events or self.engine._has_events:

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _handle_dbapi_exception(self, e, statement, parameters, cursor, context)
   1414                 )
   1415             else:
-> 1416                 util.reraise(*exc_info)
   1417 
   1418         finally:

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/util/compat.py in reraise(tp, value, tb, cause)
    247         if value.__traceback__ is not tb:
    248             raise value.with_traceback(tb)
--> 249         raise value
    250 
    251 else:

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/engine/base.py in _execute_context(self, dialect, constructor, statement, parameters, *args)
   1168                         statement,
   1169                         parameters,
-> 1170                         context)
   1171             elif not parameters and context.no_parameters:
   1172                 if self.dialect._has_events:

~/anaconda3/envs/fonduer/lib/python3.6/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py in do_executemany(self, cursor, statement, parameters, context)
    681             extras.execute_batch(cursor, statement, parameters)
    682         else:
--> 683             cursor.executemany(statement, parameters)
    684 
    685     @util.memoized_instancemethod

UnicodeEncodeError: 'ascii' codec can't encode character '\uf0b7' in position 6282: ordinal not in range(128)

Environment (please complete the following information):

  • OS: Ubuntu 16.04 (Bash on Windows)
  • PostgreSQL Version: 9.5.13
  • Poppler Utils Version: 0.41.0-0ubuntu1.7
  • Fonduer Version: 0.2.3

Additional context
I have used the corpus I downloaded using the download_data.sh script in the folder.
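
One likely culprit (an assumption, not confirmed in this thread) is a PostgreSQL database or connection using an ASCII encoding, so the insert fails when a non-ASCII character like '\uf0b7' reaches psycopg2. A minimal sketch of a check and workaround (database name illustrative):

from sqlalchemy import create_engine

engine = create_engine("postgresql://localhost/max_storage_temp")
with engine.connect() as conn:
    # A UTF-8 capable database reports UTF8 here; SQL_ASCII is trouble.
    print(conn.execute("SHOW server_encoding").scalar())

# Forcing the client encoding on the connection is one workaround:
engine = create_engine(
    "postgresql://localhost/max_storage_temp", client_encoding="utf8"
)
# If the database itself was created as SQL_ASCII, recreate it:
#   createdb max_storage_temp --encoding=UTF8 --template=template0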

Eliminate global variable use from meta.py

Right now, snorkel uses some hacky global variables in meta.py. This poses two main issues:

  1. It forces the user to set the SNORKELDB environment variable before importing
  2. It executes code merely by being imported. For example, just importing fonduer will immediately create a snorkel.db file. We should say no to import side-effects.

Perhaps this should be some sort of Session class instead, with attributes that are accessed when needed, rather than using these global variables throughout the codebase.
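
A rough sketch of what such a Session class could look like (names and parameters are illustrative, not an existing fonduer API):

class FonduerSession:
    # Holds connection state explicitly instead of module-level globals.
    # Nothing touches the database at import time; the engine is created
    # lazily on first use.

    def __init__(self, conn_string):
        self.conn_string = conn_string
        self._engine = None

    @property
    def engine(self):
        if self._engine is None:
            from sqlalchemy import create_engine
            self._engine = create_engine(self.conn_string)
        return self._engine

# The user passes the connection string explicitly, so no SNORKELDB
# environment variable is read (and no snorkel.db created) at import time.
session = FonduerSession("postgresql://localhost/fonduer")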

Using Oracle in place of Postgres for fonduer

Is your feature request related to a problem? Please describe.
I have an Oracle instance and want to use it instead of Postgres.

Describe the solution you'd like
What changes are required to make the switch, and is there an existing config or notes that can help make the change? From my understanding, SQLAlchemy supports both, and we just have to make sure any Postgres-specific initializations/imports are handled.
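
In principle the swap is just a different SQLAlchemy URL (a sketch with illustrative credentials, assuming the cx_Oracle driver is installed); the real work is the Postgres-specific pieces, e.g. the ARRAY columns used by the phrase table, which Oracle has no direct equivalent for:

from sqlalchemy import create_engine

# PostgreSQL (current default)
pg_engine = create_engine("postgresql://user:pass@localhost:5432/fonduer")

# Oracle via the cx_Oracle dialect
ora_engine = create_engine(
    "oracle+cx_oracle://user:pass@host:1521/?service_name=ORCL"
)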

Remove phantomjs blob from git repo

We have a leftover commit from migrating from the snorkel repo containing a reference to the phantomjs binary that someone accidentally committed.

$ git-sizer 
Processing blobs: 10705                        
Processing trees: 13264                        
Processing commits: 4381                        
Matching commits to trees: 4381                        
Processing annotated tags: 1                        
Processing references: 70                        
| Name                         | Value     | Level of concern               |
| ---------------------------- | --------- | ------------------------------ |
| Biggest objects              |           |                                |
| * Trees                      |           |                                |
|   * Maximum entries      [1] |  2.83 k   | *                              |
| * Blobs                      |           |                                |
|   * Maximum size         [2] |  64.8 MiB | ******                         |
|                              |           |                                |
| History structure            |           |                                |
| * Maximum tag depth      [3] |     1     | *                              |
|                              |           |                                |
| Biggest checkouts            |           |                                |
| * Maximum path length    [4] |   119 B   | *                              |

[1]  d23d5cc63594e508c00ec026a2c85458f032a059 (refs/remotes/origin/newftrs:examples/old/gene_phen_relation_example/data)
[2]  d72e801ce9c8681a006994da60b76a516f7f1853 (refs/remotes/origin/fonduer_parser:snorkel/contrib/fonduer/phantomjs/bin/phantomjs)
[3]  0c6fa1639c65cd73d81e72c47db5b4c826b21008 (refs/tags/v0.4.alpha)
[4]  22ffd4e03fca0fcdb79add2578d8432e2075803e (8a3bf199abb2b05aba34447a7c5b3286cafb964a^{tree})

We should get rid of that blob completely from the repo, most likely using BFG.

Refactor codebase into submodules for each pipeline phase

Currently, Fonduer is kind of a monolithic package, where all database tables are created on init. In order to make development easier, we want to split Fonduer into submodules, each of which handles a single task in the pipeline:

  1. parser
  2. candidates
  3. featurization
  4. supervision
  5. learning
  6. utils

Database tables will only be created in the initialization of each of these independent modules.

TODO:

  • Reorganize files as is, just fixing imports and setting up new directories
  • Use all absolute imports
  • Split up the initialization performed by Meta into the respective pipeline phases
  • Define each phase's API. Expose submodules, rather than all functions, in the root fonduer package
  • Simplify the source files where possible

Separate workers for parsing and database insertions

Is your feature request related to a problem? Please describe.
Decouple UDF processes from the backend/database session.
Right now, when we run UDFRunner.apply_mt(), we create a number of UDF worker processes. These processes all own an sqlalchemy Session object and add/commit to the database at the end of their respective parsing loop.

Describe the solution you'd like
Make the UDF processes backend-agnostic, e.g. by having a set of separate BackendWorker processes handle the insertion of sentences. One possible way: connect the output_queue of UDF to the input of the BackendWorker processes, which receive Sentence lists and handle the sqlalchemy commits.

This will not fully decouple UDF from the backend, because the parser returns sqlalchemy-specific Sentence objects, but it could be one step towards that goal.

Additional context
This feature request refers to decoupling of parsing and backend.
There's likely more coupling with the backend later in the processing pipeline.
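
A minimal sketch of the proposed decoupling, assuming UDF workers put parsed Sentence lists on a queue (names here are illustrative, not the existing UDFRunner API):

import multiprocessing as mp

def backend_worker(output_queue, conn_string):
    # Owns the only sqlalchemy Session; UDF workers never touch the DB.
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    session = sessionmaker(bind=create_engine(conn_string))()
    while True:
        sentences = output_queue.get()
        if sentences is None:  # sentinel: no more work
            break
        session.add_all(sentences)
        session.commit()

# Wiring: each UDF worker calls output_queue.put(sentence_list); one (or
# a few) BackendWorker processes drain the queue and do all inserts.
output_queue = mp.Queue()
writer = mp.Process(
    target=backend_worker,
    args=(output_queue, "postgresql://localhost/fonduer"),
)
writer.start()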

Error with fontconfig during poppler installation

If you get errors during poppler installation with pkg-config or fontconfig, make sure that you have the following two packages installed on your system.

sudo apt-get install pkg-config libfontconfig1-dev

Move all snorkel code directly into Fonduer.

Remove snorkel subdirectory.

The motivation here is that the line between importing from snorkel or fonduer is a little blurry, and kind of unnecessary. It may simplify the code if we just absorb all the snorkel files into fonduer directly.

Add data model, matchers, preprocessors to docs

We need better docs for the data model. For example, given a candidate in the lf_helpers, how do I view the sentence that one of its mentions is in? What attributes does a Span have? These are frequent questions that can be answered by reading the code directly, but it would be much more user-friendly if we had docs for them.

Related: #32

Host documentation on readthedocs

There are a few parts to this task.

  • Setup readthedocs so that documentation is auto built
  • Fix the import errors with readthedocs
  • Go through the code and improve docstrings throughout (this will be ongoing...)

The first part is easy; the second will take time.

[Error]CalledProcessError in visual.py

Hello,
In fonduer-tutorials, after running cell:

%time corpus_parser.apply(doc_preprocessor, parallelism=PARALLEL)

I got:

  File "/home/hagen/git/fonduer/fonduer/visual.py", line 59, in extract_pdf_words
    shell=True)
......
subprocess.CalledProcessError: Command 'pdftotext -f 1 -l 1 -bbox-layout 'data/pdf/DISES00616-1.pdf' -' returned non-zero exit status 99.

When I add a space between '-bbox' and '-layout' in line 59 of visual.py:

html_content = subprocess.check_output(
    "pdftotext -f {} -l {} -bbox -layout '{}' -".format(
        str(i), str(i), self.pdf_file),
    shell=True)

I then repeatedly got:

RuntimeError: Words could not be extracted from PDF: data/pdf/DISES00616-1.pdf

Best regards.

Add japanese tokenization support

Is your feature request related to a problem? Please describe.
My documents are written in Japanese, which is not supported by spaCy and hence not by Fonduer.

Describe the solution you'd like
According to spaCy, tokenization of Japanese and other languages has alpha support.
Please support these languages if tokenization alone is more useful to Fonduer than nothing.

Can Fonduer extract arbitrary tabular data?

I have a requirement where a table has three columns: the first column holds each row's main header, and the second and third columns hold that row's respective values (the second and third columns need a small regular-expression function to filter digits, etc.). All data are text.

I want Fonduer to consider each row as one candidate. How can I do that?
In some cases a row has multiple paragraphs (sub-rows); there I want Fonduer to give me each paragraph in that row as a separate row, with the main header (paragraph) concatenated to each sub-row.
Can Fonduer do this type of task?

While trying to understand the Fonduer flow, I found that it stores data as phrases in the "Phrase" table, keeping full track of them. But how will this help solve the issue above?

Thanks in advance.
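
Not an official answer, but one plausible direction in the Fonduer of this era is to combine a matcher over the first column with a regex matcher over the value columns, using the row/column metadata the Phrase table already tracks (a sketch; import paths and attribute names, e.g. span.sentence vs. span.parent, vary by version):

from fonduer import candidate_subclass
from fonduer.matchers import LambdaFunctionMatcher, RegexMatchSpan

# A binary relation per row: (row header, value in the same row).
RowValue = candidate_subclass("RowValue", ["header", "value"])

def in_first_column(span):
    phrase = span.sentence  # the Phrase this span comes from
    return phrase.cell is not None and phrase.col_start == 0

header_matcher = LambdaFunctionMatcher(func=in_first_column)

# Digit-bearing spans for the second/third columns.
value_matcher = RegexMatchSpan(rgx=r"\d+(\.\d+)?", longest_match_only=True)

# A throttler can then keep only pairs whose phrases share row_start,
# which effectively yields one candidate per row.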

psycopg2 error in python 3.6

On the fonduer_parser branch, Travis-CI is failing in Python 3.6. Specifically, build number 16.

If you look at those logs, python2 and python3.5 are failing, but for an unrelated reason (an assertion is failing). However, python3.6 is failing due to a ProgrammingError:

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <sqlalchemy.dialects.postgresql.psycopg2.PGDialect_psycopg2 object at 0x7fc90e7cc6a0>
cursor = <cursor object at 0x7fc8e2189ce0; closed: -1>
statement = 'INSERT INTO phrase (lemmas, pos_tags, ner_tags, dep_parents, dep_labels, row_start, row_end, col_start, col_end, posi...xt)s, %(words)s, %(char_offsets)s, %(entity_cids)s, %(entity_types)s, %(abs_char_offsets)s, %(table_id)s, %(cell_id)s)'
parameters = ({'abs_char_offsets': [3726, 3733, 3741, 3745, 3755, 3769, ...], 'bottom': [[207.15112578]], 'cell_id': 4852, 'char_of...m': [-inf, -inf, -inf, -inf, -inf, -inf, ...], 'cell_id': 5269, 'char_offsets': [0, 4, 10, 16, 17, 27, ...], ...}, ...)
context = <sqlalchemy.dialects.postgresql.psycopg2.PGExecutionContext_psycopg2 object at 0x7fc8df4a6208>
    def do_executemany(self, cursor, statement, parameters, context=None):
        if self.psycopg2_batch_mode:
            extras = self._psycopg2_extras()
            extras.execute_batch(cursor, statement, parameters)
        else:
>           cursor.executemany(statement, parameters)
E           psycopg2.ProgrammingError: ARRAY types double precision and numeric[] cannot be matched
E           LINE 1: ...ty'::float, 'Infinity'::float, 'Infinity'::float, ARRAY[364....
E                                                                        ^
../../../virtualenv/python3.6.3/lib/python3.6/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py:683: ProgrammingError
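
The error message suggests some rows mix 'Infinity' floats with plain integer coordinates, so psycopg2 renders inconsistent array literals (double precision vs. numeric[]). One defensive fix (a sketch, not the actual patch) is to coerce every visual coordinate array to floats before insertion:

def as_float_array(values):
    # Ensures psycopg2 renders a consistent double precision[] literal,
    # instead of ARRAY[364, ...] (numeric[]) next to 'Infinity'::float.
    return None if values is None else [float(v) for v in values]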

Improving parser performance

The current Parser takes quite a bit of time to process a large corpus of documents (~25min for 100 PDF datasheets on a consumer laptop). It would be nice to see what can be done to improve performance.

Profiling Setup

I did some quick profiling of the e2e hardware tutorial using Fonduer v0.2.3, with two modifications:

max_docs = 10

and using only a single thread:

import cProfile

corpus_parser = Parser(structural=True, lingual=True, visual=True, pdf_path=pdf_path)
cProfile.runctx('corpus_parser.apply(doc_preprocessor, parallelism=1)', globals(), locals(), 'profile.prof')
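
The tables below were then read out of the saved profile, roughly as follows (standard-library pstats; a sketch of the calls, not the exact session):

import pstats

stats = pstats.Stats("profile.prof")
stats.sort_stats("cumulative").print_stats(50)  # Top 50 Cumulative Time
stats.sort_stats("tottime").print_stats(50)     # Top 50 Total Time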

Top 50 Cumulative Time

   List reduced from 1834 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      3/1    0.000    0.000  127.815  127.815 {built-in method builtins.exec}
      2/1    0.001    0.000  127.801  127.801 /home/lwhsiao/repos/fonduer/fonduer/utils/udf.py:31(apply)
      2/1    0.000    0.000  127.781  127.781 /home/lwhsiao/repos/fonduer/fonduer/utils/udf.py:57(apply_st)
        2    0.001    0.000  126.855   63.428 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py:909(commit)
      3/2    0.000    0.000  126.855   63.427 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py:464(commit)
      3/2    0.000    0.000  126.634   63.317 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py:433(_prepare_impl)
        8    0.077    0.010  126.634   15.829 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py:2220(flush)
        1    0.029    0.029  126.556  126.556 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/session.py:2271(_flush)
        1    0.014    0.014  126.068  126.068 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py:369(execute)
        6    0.001    0.000  119.647   19.941 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py:658(execute_aggregate)
        6    0.033    0.005  119.642   19.940 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py:131(save_obj)
       78    1.160    0.015  119.413    1.531 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py:799(_emit_insert_statements)
    16198    0.142    0.000  113.913    0.007 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py:882(execute)
    16192    0.102    0.000  113.730    0.007 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/sql/elements.py:267(_execute_on_connection)
    16192    0.686    0.000  113.628    0.007 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py:1016(_execute_clauseelement)
    16198    0.908    0.000  112.330    0.007 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py:1111(_execute_context)
21342/18958    0.118    0.000   80.031    0.004 /home/lwhsiao/repos/fonduer/fonduer/parser/parser.py:550(_parse_node)
11230/5902    0.204    0.000   79.332    0.013 /home/lwhsiao/repos/fonduer/fonduer/parser/parser.py:570(parse)
    18958    0.088    0.000   78.291    0.004 /home/lwhsiao/repos/fonduer/fonduer/parser/parser.py:419(_parse_paragraph)
    11227    0.428    0.000   67.047    0.006 /home/lwhsiao/repos/fonduer/fonduer/parser/parser.py:322(_parse_sentence)
        9    0.000    0.000   57.452    6.384 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py:678(do_executemany)
        9   57.031    6.337   57.452    6.384 {method 'executemany' of 'psycopg2.extensions.cursor' objects}
    10670    0.324    0.000   50.675    0.005 nn_parser.pyx:326(__call__)
    10670    0.088    0.000   47.024    0.004 nn_parser.pyx:727(get_batch_model)
    16193    0.074    0.000   46.940    0.003 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/engine/default.py:508(do_execute)
    16197   46.179    0.003   46.868    0.003 {method 'execute' of 'psycopg2.extensions.cursor' objects}
80025/16005    0.720    0.000   44.993    0.003 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/api.py:58(begin_update)
    10670    0.110    0.000   37.959    0.004 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/api.py:277(begin_update)
    58685    0.959    0.000   30.840    0.001 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/_classes/layernorm.py:50(begin_update)
    11227    0.611    0.000   24.486    0.002 /home/lwhsiao/repos/fonduer/fonduer/parser/spacy_parser.py:121(parse)
    42680    0.229    0.000   23.576    0.001 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/_classes/resnet.py:17(begin_update)
5912/5902    0.013    0.000   23.415    0.004 /home/lwhsiao/repos/fonduer/fonduer/parser/parser.py:124(apply)
     5902    0.003    0.000   23.271    0.004 /home/lwhsiao/repos/fonduer/fonduer/parser/visual_linker.py:30(parse_visual)
       10    0.012    0.001   21.981    2.198 /home/lwhsiao/repos/fonduer/fonduer/parser/visual_linker.py:47(extract_pdf_words)
    16005    0.464    0.000   20.554    0.001 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/api.py:365(uniqued_fwd)
     5335    0.051    0.000   20.365    0.004 pipeline.pyx:425(__call__)
     5335    0.073    0.000   20.089    0.004 pipeline.pyx:437(predict)
101365/10670    0.224    0.000   20.016    0.002 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/_classes/model.py:155(__call__)
    10670    0.106    0.000   19.680    0.002 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/api.py:291(predict)
       74    0.005    0.000   19.545    0.264 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/bs4/__init__.py:87(__init__)
    58685    1.543    0.000   19.542    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/_classes/maxout.py:66(begin_update)
32010/5335    0.183    0.000   18.830    0.004 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/api.py:53(predict)
    85360   17.542    0.000   17.542    0.000 ops.pyx:333(batch_dot)
      128    0.001    0.000   15.675    0.122 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/bs4/builder/_htmlparser.py:192(prepare_markup)
       64    0.003    0.000   15.673    0.245 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/bs4/dammit.py:344(__init__)
      128    0.001    0.000   15.662    0.122 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/bs4/dammit.py:240(encodings)
       64    0.001    0.000   15.656    0.245 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/bs4/dammit.py:33(chardet_dammit)
       64    0.003    0.000   15.654    0.245 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/chardet/__init__.py:24(detect)
       64    0.003    0.000   15.635    0.244 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/chardet/universaldetector.py:111(feed)
    10670    2.077    0.000   14.986    0.001 nn_parser.pyx:387(parse_batch)

Top 50 Total Time

   List reduced from 1834 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        9   57.031    6.337   57.452    6.384 {method 'executemany' of 'psycopg2.extensions.cursor' objects}
    16197   46.179    0.003   46.868    0.003 {method 'execute' of 'psycopg2.extensions.cursor' objects}
    85360   17.542    0.000   17.542    0.000 ops.pyx:333(batch_dot)
   901615    7.821    0.000    7.821    0.000 {method 'reduce' of 'numpy.ufunc' objects}
    10670    6.399    0.001    6.399    0.001 {built-in method numpy.core.multiarray.dot}
  4904371    3.915    0.000    3.915    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/chardet/codingstatemachine.py:66(next_state)
   586850    3.229    0.000    8.996    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/numpy/core/fromnumeric.py:2456(prod)
    80025    2.345    0.000    3.893    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/numpy/core/_methods.py:86(_var)
      808    2.241    0.003    2.241    0.003 {method 'findall' of '_sre.SRE_Pattern' objects}
       62    2.096    0.034    4.827    0.078 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/chardet/utf8prober.py:57(feed)
    10670    2.077    0.000   14.986    0.001 nn_parser.pyx:387(parse_batch)
    64020    1.875    0.000    6.569    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/_classes/hash_embed.py:48(begin_update)
       62    1.820    0.029    2.005    0.032 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/chardet/charsetprober.py:103(filter_with_english_letters)
   458810    1.725    0.000    9.098    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/mem.py:28(__getitem__)
    58685    1.543    0.000   19.542    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/_classes/maxout.py:66(begin_update)
       50    1.488    0.030    2.805    0.056 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/chardet/mbcharsetprober.py:61(feed)
    16192    1.446    0.000    3.478    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/engine/default.py:595(_init_compiled)
    80025    1.425    0.000    1.443    0.000 ops.pyx:419(maxout)
   458810    1.313    0.000   10.790    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/describe.py:35(__get__)
   631016    1.310    0.000    1.310    0.000 {built-in method numpy.core.multiarray.array}
      868    1.206    0.001    3.469    0.004 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/chardet/sbcharsetprober.py:77(feed)
   108507    1.205    0.000    2.466    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/sync.py:16(populate)
    80025    1.194    0.000    1.194    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/_classes/layernorm.py:102(_forward)
       78    1.160    0.015  119.413    1.531 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py:799(_emit_insert_statements)
    80025    1.141    0.000    2.187    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/numpy/core/_methods.py:53(_mean)
   186725    1.087    0.000    1.166    0.000 ops.pyx:168(asarray)
    32440    0.975    0.000    1.746    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py:380(_collect_insert_commands)
    58685    0.959    0.000   30.840    0.001 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/_classes/layernorm.py:50(begin_update)
       94    0.956    0.010    0.956    0.010 {method 'read' of '_io.BufferedReader' objects}
    16198    0.908    0.000  112.330    0.007 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py:1111(_execute_context)
    64020    0.908    0.000    2.544    0.000 ops.pyx:452(seq2col)
   736230    0.801    0.000    0.801    0.000 {method 'reshape' of 'numpy.ndarray' objects}
   128040    0.775    0.000    2.938    0.000 ops.pyx:158(allocate)
   298389    0.772    0.000    1.465    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/attributes.py:699(set)
       74    0.757    0.010    0.757    0.010 {built-in method posix.read}
    58685    0.756    0.000    3.397    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/_classes/layernorm.py:70(_begin_update_scale_shift)
80025/16005    0.720    0.000   44.993    0.003 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/api.py:58(begin_update)
   311497    0.708    0.000    2.042    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/unitofwork.py:193(get_attribute_history)
    80025    0.689    0.000    7.659    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/thinc/neural/_classes/layernorm.py:81(_get_moments)
    16192    0.686    0.000  113.628    0.007 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/engine/base.py:1016(_execute_clauseelement)
   101365    0.649    0.000    0.649    0.000 {built-in method numpy.core.multiarray.concatenate}
    32362    0.637    0.000    2.376    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/persistence.py:1168(_postfetch)
1862054/1861624    0.631    0.000    0.786    0.000 {built-in method builtins.isinstance}
   108741    0.614    0.000    0.614    0.000 {method 'search' of '_sre.SRE_Pattern' objects}
    11227    0.611    0.000   24.486    0.002 /home/lwhsiao/repos/fonduer/fonduer/parser/spacy_parser.py:121(parse)
       74    0.588    0.008    0.588    0.008 {built-in method _posixsubprocess.fork_exec}
    48585    0.577    0.000    0.577    0.000 {built-in method _codecs.utf_8_decode}
519893/519599    0.549    0.000    0.586    0.000 {built-in method builtins.hasattr}
       10    0.534    0.053    1.217    0.122 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/chardet/sjisprober.py:56(feed)
   390715    0.527    0.000    0.579    0.000 /home/lwhsiao/repos/tutorials/.venv/lib/python3.6/site-packages/sqlalchemy/orm/state.py:665(_modified_event)

Clean up featurization code

There appear to be several bugs in the featurization code. Examples below.

In content_features.py:

  • We use global names like Mention, Indicator, etc. from treedlib without importing them.
  • span.parent should probably be span.sentence

In structural_features.py

  • unary_tdl_feats should be unary_strlib_feats, I believe

only parsing first page of 136 page pdf

When running the code in the fonduer-tutorials max_storage_temp_tutorial notebook to parse the pdf linked to below, only the first page is parsed.
I only get 19 sentences and all of them are from the first page.

To Reproduce
Steps to reproduce the behavior:

  1. download https://www.theice.com/publicdocs/regulatory_filings/ICUS_Rules_Clean_up.pdf .
  2. convert the pdf to html with poppler's pdftohtml.
  3. rename html file to match pdf file.
  4. run through code the same way as the fonduer-tutorials max_storage_temp_tutorial notebook.

Expected behavior
Get a list of all the sentences in the 136 page document.

Environment (please complete the following information):

  • OS: Ubuntu 16.04 running on Windows Subsystem for Linux
  • PostgreSQL Version: 10+192.pgdg16.04+1
  • Poppler Utils Version: 0.41.0-0ubuntu1.7
  • Fonduer Version: 0.2.3

Enforce match between DB and candidate encodings

Suggest putting a check in the baseline Fonduer doc preprocessor that ensures the database encoding matches the candidate text encoding; otherwise you'll end up with DB errors when running the featurizers.
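
A minimal sketch of such a check (names are illustrative), run once before parsing:

def check_encoding(session, corpus_encoding="UTF8"):
    # Fail fast if the database encoding differs from the encoding of
    # the candidate text, instead of erroring during featurization.
    db_encoding = session.execute("SHOW server_encoding").scalar()
    if db_encoding.upper() != corpus_encoding.upper():
        raise RuntimeError(
            "Database encoding %s != corpus encoding %s"
            % (db_encoding, corpus_encoding)
        )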

Switch README to RST.

The standard format for PyPI is reStructuredText, not Markdown. It would simplify things by letting us cut out the conversion using pandoc.

Generating/checking matching pdf file paths for visual linking outside of parser

Is your feature request related to a problem? Please describe.
At the moment we use HtmlDocPreprocessor to separately generate pre-processed documents that are fed into the parser.

If we want to extract visual features, we currently need a corresponding PDF file for each input document. Fetching the PDF file path currently happens inside the parser, which is initialized with a pdf_path argument. This couples the parser with input data generation. Furthermore, we can only test whether a matching PDF file exists when the ParserUDF.apply() method is called, because we have no knowledge of the HTML input files beforehand.

Describe the solution you'd like
Have a (separate) generator that handles generation and checking of the matching pdf file paths, which are fed into the parser.apply() method, e.g. parser.apply((doc,text), pdf_path, **kwargs).

Describe alternatives you've considered
Extend HtmlDocPreprocessor to return tuples of three values (doc,text,pdf_path), if a visual_linking_pdf_path is provided.

Additional context
One thing to consider is that there are also other ways of visual linking that would not require PDF files in the future.
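
A sketch of the proposed generator (the wrapper name and tuple shape are illustrative):

import os

def with_pdf_paths(doc_preprocessor, pdf_dir):
    # Yields (doc, text, pdf_path) tuples, where pdf_path is None when
    # no matching PDF exists -- checked up front, outside the parser.
    for doc, text in doc_preprocessor:
        pdf_path = os.path.join(pdf_dir, doc.name + ".pdf")
        yield doc, text, pdf_path if os.path.isfile(pdf_path) else None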

Word mismatch between HTML and PDF for visual linker

In the test md document we use an ordered HTML list, which renders as numbers in the PDF. This is causing a mismatch of words when doing the visual parse. The lists of words in that document are:

HTML PDF
Sample Sample
Markdown Markdown
This This
is is
some some
basic basic
, ,
sample sample
markdown markdown
. .
Second Second
Heading Heading
Unordered Unordered
lists lists
, ,
and and
: :
One 1
Two .
Three One
More 2
Blockquote .
And Two
bold 3
, .
italics Three
, More
and Blockquote
even And
italics bold
and ,
later italics
. ,
Even and
bold even
strikethrough italics
. and
A later
link bold
to .
somewhere Even
. strikethrough
Here .
is A
a link
table to
Name somewhere
Lunch .
order Here
Spicy is
Owes a
Joan table
saag Name
paneer Lunch
medium order
$ Spicy
11 Owes
Sally Joan
vindaloo saag
mild paneer
$ medium
14 $11
Erin Sally
lamb vindaloo
madras mild
HOT $14
$ Erin
5 lamb
Or madras
inline HOT
code $5
like Or
var inline
foo code
= like
'", 'var
bar foo
'", '=
; 'bar';
. .
Or Or
an an
image image
of of
bears bears
The The
end end
... ...

Notice that in the PDF words, the 1, 2, and 3 appear, whereas they do not in the HTML.

Ideally this will be resolved when we switch to pdftotree and the new parser.

Error when copying feature updates to Postgresql database

Hello,
I am getting this error when applying the batch feature annotator:
CalledProcessError: Command 'cat /tmp/reg_topic_reg_topic_feature_0_*.tsv | psql reg_topic -p 5432 -c "COPY reg_topic_feature_updates(candidate_id, keys, values) FROM STDIN" --set=ON_ERROR_STOP=true' returned non-zero exit status 1.

This is the output I got:

[INFO] fonduer.udf - Clearing existing...
[INFO] fonduer.udf - Running UDF...
[INFO] fonduer.async_annotations - Copying reg_topic_feature_updates to postgres

Thank you in advance.

Version conflicts in dependencies

tensorboard 1.8.0 has requirement bleach==1.5.0, but you'll have bleach 2.1.3 which is incompatible.
bleach 2.1.3 has requirement html5lib!=1.0b1,!=1.0b2,!=1.0b3,!=1.0b4,!=1.0b5,!=1.0b6,!=1.0b7,!=1.0b8,>=0.99999999pre, but you'll have html5lib 0.9999999 which is incompatible.

bleach 2.1.3 is installed by jupyter. We also had

error: html5lib 1.0b8 is installed but html5lib==0.9999999 is required by {'tensorboard'}

which will hopefully get resolved in the next release of tensorboard [1].

Split tutorials into a separate repo

This separate repo can then simply

pip install fonduer

(once we get that working) and it will be a nice clean boilerplate for future fonduer apps.

New parser needs to handle angle brackets

With the INFNS19372-1.pdf document, after passing through pdftotree we have the following issue.

<section_header char='  F o r   f u r t h e r   i n f o r m a t i o n   o n t e
c h n o l o g y ,   d e l i v e r y   t e r m s   a n d   c o n d i t i o n s
a n d   p r i c e s , p l e a s e   c o n t a c t   t h e   n e a r e s t   I n
f i n e o n   T e c h n o l o g i e s   O f f i c e   ( < w w w . i n f i n e o
    n . c o m > ) . ' , [leaving off coordinates for brevity...] '>F or further
information on technology, delivery terms and conditions and prices, please
contact the nearest Infineon Technologies Office ( <www.infineon.com>).
</section_header>

This <www.infineon.com> is being parsed as an html_tag, rather than as part of the content.

As a related issue, we are currently swapping all ' for " in pdftotree, and swapping those back in the new parser. In both these cases, we want to treat these as normal strings, but because we're using standard parsing tools these can be problematic.

Should we instead just HTML escape these values? E.g., to &lt;, &gt;, &quot;?

[1] https://stackoverflow.com/questions/44430571/how-to-get-text-in-angle-brackets-with-lxml-or-bs
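
For reference, the standard library already round-trips this, which is what makes escaping attractive (a sketch, not the actual pdftotree change):

import html

s = "(<www.infineon.com>) and a 'quoted' value"
escaped = html.escape(s, quote=True)  # -> &lt;...&gt; and &#x27;...&#x27;
assert html.unescape(escaped) == s    # lossless round trip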

Move BatchFeatureAnnotator

Currently this is in async_annotators within the supervision directory. It should be split out under features.

Make fonduer a pip-installable package?

This should be very doable, as it just requires the user to install phantomjs and poppler themselves, which seems reasonable.

The other main thing would be to actually use proper import paths so we don't have a bunch of environment variables that we need to modify.

Wrong NER tag

I'm not really sure, but it looks like a bug unless I'm missing something.

Describe the bug

test_parser.py::test_parse_md_details tests NER tags as below:

assert header.ner_tags == ["ORG", "ORG"]

where header.words == ['Sample', 'Markdown'], but neither 'Sample' nor 'Markdown' is ORG (organization).

To Reproduce

You can confirm the right NER tag as follows:

import spacy
nlp = spacy.load('en')
doc = nlp(u'Apple is looking at buying U.K. startup for $1 billion')
print("Recognized number of NER: %d" % len(doc.ents))
"""This should print
Recognized number of NER: 3
"""

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

"""This should print  
Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY
"""

doc = nlp(u'Sample Markdown')
print("Recognized number of NER: %d" % len(doc.ents))
"""This should print
Recognized number of NER: 0
"""

for token in doc:
    print(token.text, token.pos_, token.dep_, token.ent_type_ if token.ent_type_ else "O")

"""This should print  
Sample NOUN compound O
Markdown PROPN ROOT O
"""

Expected behavior

The test should instead assert the following, which should pass.

assert header.ner_tags == ["O", "O"]

Environment (please complete the following information):

  • Fonduer Version: 0.3.0 (d5c1e9b)
  • spacy: 2.0.12

[Errno 32] Broken pipe for Parser in parallel execution on OSX

Hi,

In fonduer-tutorials, after running cell:

corpus_parser = OmniParser(structural=True, lingual=True, visual=True, pdf_path=pdf_path)
%time corpus_parser.apply(doc_preprocessor, parallelism=PARALLEL)

whenever PARALLEL is smaller than max_docs, I get:

Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/multiprocessing/queues.py", line 240, in _feed
    send_bytes(obj)
  File "/anaconda3/lib/python3.6/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/anaconda3/lib/python3.6/multiprocessing/connection.py", line 398, in _send_bytes
    self._send(buf)
  File "/anaconda3/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

Otherwise (with PARALLEL greater than or equal to max_docs), the result is empty tables in PostgreSQL.
With parallelism turned off, it works.

Best regards
