Comments (2)
Output from CoreNLP on the simple documents:
---------------------------------------- Captured log call -----------------------------------------
[DEBUG] Starting new HTTP connection (1): 127.0.0.1
[DEBUG] http://127.0.0.1:12345 "POST /?properties=%7B%22annotators%22:%20%22tokenize,ssplit,pos,lemma,depparse,ner%22,%22outputFormat%22:%20%22json%22,%22tokenize.options%22:%22escapeForwardSlashAsterisk=false,asciiQuotes=false,unicodeQuotes=false,normalizeOtherBrackets=false,ptb3Ellipsis=false,normalizeParentheses=false,normalizeCurrency=false,unicodeEllipsis=false,latexQuotes=false,normalizeSpace=false,strictTreebank3=true,ptb3Dashes=false,normalizeFractions=false%22,%22ssplit.htmlBoundariesToDiscard%22:%20%22NB%22%7D HTTP/1.1" 200 31713
[DEBUG] http://127.0.0.1:12345 "POST /?properties=%7B%22annotators%22:%20%22tokenize,ssplit,pos,lemma,depparse,ner%22,%22outputFormat%22:%20%22json%22,%22tokenize.options%22:%22escapeForwardSlashAsterisk=false,asciiQuotes=false,unicodeQuotes=false,normalizeOtherBrackets=false,ptb3Ellipsis=false,normalizeParentheses=false,normalizeCurrency=false,unicodeEllipsis=false,latexQuotes=false,normalizeSpace=false,strictTreebank3=true,ptb3Dashes=false,normalizeFractions=false%22,%22ssplit.htmlBoundariesToDiscard%22:%20%22NB%22%7D HTTP/1.1" 200 62272
[DEBUG] Doc: diseases
[DEBUG] Phrase: Types of viruses, coughs, and colds
[DEBUG] Phrase: Here isa line break
[DEBUG] Phrase: I don't have Brain Canceror the hiccups
[DEBUG] Phrase: See Table 1 Below.
[DEBUG] Phrase: Common Ailments
[DEBUG] Phrase: In between the tables there is a nasty case of heart attack
[DEBUG] Phrase: And here is a final sentence with warts.
[DEBUG] Phrase: Table 1: Infectious diseases and where to find them.
[DEBUG] Phrase: Table 2: Three ways to get Pneumonia and how much they cost.
[DEBUG] Phrase: Disease
[DEBUG] Phrase: Location
[DEBUG] Phrase: Year
[DEBUG] Phrase: Polio and BC546 is -55OC cold.
[DEBUG] Phrase: -Dublin to Milwaukee
[DEBUG] Phrase: 2001
[DEBUG] Phrase: I don't like TIPL761 or Chicken Pox or pizza.
[DEBUG] Phrase: Shingles is also bad.
[DEBUG] Phrase: whooping cough
[DEBUG] Phrase: 2009
[DEBUG] Phrase: Scurvy
[DEBUG] Phrase: Annapolis
[DEBUG] Phrase: Junction and Storage Temperature -55 to 150 o ?
[DEBUG] Phrase: C
[DEBUG] Phrase: Problem
[DEBUG] Phrase: Cause
[DEBUG] Phrase: Cost
[DEBUG] Phrase: Arthritis
[DEBUG] Phrase: Pokemon Go
[DEBUG] Phrase: Free
[DEBUG] Phrase: Yellow
[DEBUG] Phrase: Fever
[DEBUG] Phrase: Unicorns
[DEBUG] Phrase: $17.75
[DEBUG] Phrase: Hypochondria
[DEBUG] Phrase: Fear
[DEBUG] Phrase: $100
[DEBUG] Doc: md
[DEBUG] Phrase: Sample Markdown
[DEBUG] Phrase: This is some basic, sample markdown.
[DEBUG] Phrase: Second Heading
[DEBUG] Phrase: Unordered lists, and:
[DEBUG] Phrase: One
[DEBUG] Phrase: Two
[DEBUG] Phrase: Three
[DEBUG] Phrase: More
[DEBUG] Phrase: Blockquote
[DEBUG] Phrase: And
[DEBUG] Phrase: bold
[DEBUG] Phrase: ,
[DEBUG] Phrase: italics
[DEBUG] Phrase: , and even
[DEBUG] Phrase: italics and later
[DEBUG] Phrase: .
[DEBUG] Phrase: Even
[DEBUG] Phrase: bold
[DEBUG] Phrase: strikethrough
[DEBUG] Phrase: .
[DEBUG] Phrase: A link
[DEBUG] Phrase: to somewhere.
[DEBUG] Phrase: Here is a table
[DEBUG] Phrase: Or inline code like
[DEBUG] Phrase: var foo = 'bar';
[DEBUG] Phrase: .
[DEBUG] Phrase: Or an image of bears
[DEBUG] Phrase: The end ...
[DEBUG] Phrase: Name
[DEBUG] Phrase: Lunch order
[DEBUG] Phrase: Spicy
[DEBUG] Phrase: Owes
[DEBUG] Phrase: Joan
[DEBUG] Phrase: saag paneer
[DEBUG] Phrase: medium
[DEBUG] Phrase: $11
[DEBUG] Phrase: Sally
[DEBUG] Phrase: vindaloo
[DEBUG] Phrase: mild
[DEBUG] Phrase: $14
[DEBUG] Phrase: Erin
[DEBUG] Phrase: lamb madras
[DEBUG] Phrase: HOT
[DEBUG] Phrase: $5
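The `properties` value in the two POST lines of the log is percent-encoded JSON, which makes the tokenizer settings hard to read. A minimal sketch for decoding it with the Python standard library follows; the variable names are illustrative, and the encoded string is copied from the log above.

```python
import json
from urllib.parse import unquote

# Percent-encoded `properties` query value, copied from the POST lines in the log.
encoded = (
    "%7B%22annotators%22:%20%22tokenize,ssplit,pos,lemma,depparse,ner%22,"
    "%22outputFormat%22:%20%22json%22,"
    "%22tokenize.options%22:%22escapeForwardSlashAsterisk=false,asciiQuotes=false,"
    "unicodeQuotes=false,normalizeOtherBrackets=false,ptb3Ellipsis=false,"
    "normalizeParentheses=false,normalizeCurrency=false,unicodeEllipsis=false,"
    "latexQuotes=false,normalizeSpace=false,strictTreebank3=true,ptb3Dashes=false,"
    "normalizeFractions=false%22,"
    "%22ssplit.htmlBoundariesToDiscard%22:%20%22NB%22%7D"
)

# Undo the percent-encoding, then parse the resulting JSON object.
props = json.loads(unquote(encoded))

# tokenize.options is itself a comma-separated list of key=value pairs.
tokenize_options = dict(
    pair.split("=", 1) for pair in props["tokenize.options"].split(",")
)

for key, value in props.items():
    if key != "tokenize.options":
        print("{}: {}".format(key, value))
for key, value in sorted(tokenize_options.items()):
    print("tokenize.options.{}: {}".format(key, value))
```

Read this way, the request enables `tokenize,ssplit,pos,lemma,depparse,ner`, turns off every tokenizer normalization option except `strictTreebank3`, and sets `ssplit.htmlBoundariesToDiscard` to `NB`.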
CoreNLP is splitting text with different inline formatting (e.g. italics, bold, etc.) into separate phrases.
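One way to see why inline formatting could lead to fragments like these: if text is collected per text node, every inline tag boundary ends a fragment. The sketch below is not Fonduer's parser and does not call CoreNLP. It uses only Python's standard `html.parser`, and both the `TextNodeCollector` class and the sample HTML are hypothetical, written to resemble the "md" document in the log.

```python
from html.parser import HTMLParser


class TextNodeCollector(HTMLParser):
    """Collect each run of text between two tags as its own fragment."""

    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_data(self, data):
        # Called once per text run; a new run starts at every tag boundary.
        text = data.strip()
        if text:
            self.fragments.append(text)


# Hypothetical HTML, similar to what a Markdown renderer might emit.
sample_html = "<p>And <strong>bold</strong>, <em>italics</em>, and even more.</p>"

collector = TextNodeCollector()
collector.feed(sample_html)
collector.close()
fragments = collector.fragments

for fragment in fragments:
    print("Phrase: {}".format(fragment))
```

If each fragment is sent to the sentence splitter on its own, a single sentence can never come back as one phrase. Whether that is what happens in the run logged above is an assumption, not something the log confirms.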
Inspecting 5 candidates at the end of the stg_temp_max tutorial, using the code:
from fonduer.features import features

# Select candidates whose first span starts with 'BC856' and whose second span is '150'.
# train_cands comes from the tutorial notebook.
cand = []
for i, c in enumerate(train_cands):
    if c[0].get_span().startswith('BC856') and c[1].get_span() == '150':
        print("###", i)
        cand.append(c)
print("Candidates: {}".format(len(cand)))

# Write every feature of each selected candidate to a log file.
with open('scapy_log_features.txt', 'w') as log:
    for c in cand:
        log.write("Candidate: {}\n".format(c))
        for f in features.get_all_feats([c]):
            log.write(" Feature: {}\n".format(f))