fastai / course-nlp

A Code-First Introduction to NLP course

Home Page: https://www.fast.ai/2019/07/08/fastai-nlp/

Languages: Jupyter Notebook 99.73%, Python 0.27%
Topics: data-science, machine-learning, nlp, python

course-nlp's Introduction

Welcome to fastai


Installing

You can use fastai without any installation by using Google Colab. In fact, every page of this documentation is also available as an interactive notebook - click “Open in colab” at the top of any page to open it (be sure to change the Colab runtime to “GPU” to have it run fast!). See the fast.ai documentation on Using Colab for more information.

You can install fastai on your own machines with conda (highly recommended), as long as you’re running Linux or Windows (NB: Mac is not supported). For Windows, please see “Running on Windows” for important notes.

We recommend using miniconda (or miniforge). First install PyTorch using the conda line shown here, and then run:

conda install -c fastai fastai

To install with pip, use: pip install fastai.

If you plan to develop fastai yourself, or want to be on the cutting edge, you can use an editable install (if you do this, you should also use an editable install of fastcore to go with it.) First install PyTorch, and then:

git clone https://github.com/fastai/fastai
pip install -e "fastai[dev]"

Learning fastai

The best way to get started with fastai (and deep learning) is to read the book, and complete the free course.

To see what’s possible with fastai, take a look at the Quick Start, which shows how to use around 5 lines of code to build an image classifier, an image segmentation model, a text sentiment model, a recommendation system, and a tabular model. For each of the applications, the code is much the same.
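For instance, here is a minimal sketch of the image-classifier case in roughly that many lines, in the style of the fastai docs; the Pets dataset and the exact arguments are illustrative choices, not the Quick Start verbatim:

from fastai.vision.all import *

path = untar_data(URLs.PETS)/'images'    # sample dataset: Oxford-IIIT Pets

def is_cat(f): return f[0].isupper()     # in this dataset, cat filenames are capitalized
dls = ImageDataLoaders.from_name_func(path, get_image_files(path), valid_pct=0.2,
                                      label_func=is_cat, item_tfms=Resize(224))

learn = vision_learner(dls, resnet34, metrics=error_rate)  # pretrained resnet
learn.fine_tune(1)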

Read through the Tutorials to learn how to train your own models on your own datasets. Use the navigation sidebar to look through the fastai documentation. Every class, function, and method is documented here.

To learn about the design and motivation of the library, read the peer-reviewed paper.

About fastai

fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying patterns of many deep learning and data processing techniques in terms of decoupled abstractions. These abstractions can be expressed concisely and clearly by leveraging the dynamism of the underlying Python language and the flexibility of the PyTorch library. fastai includes:

  • A new type dispatch system for Python along with a semantic type hierarchy for tensors
  • A GPU-optimized computer vision library which can be extended in pure Python
  • An optimizer which refactors out the common functionality of modern optimizers into two basic pieces, allowing optimization algorithms to be implemented in 4–5 lines of code (see the sketch after this list)
  • A novel 2-way callback system that can access any part of the data, model, or optimizer and change it at any point during training
  • A new data block API
  • And much more…
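As a rough illustration of the optimizer design mentioned above, the idea is that the update rule itself is a tiny "stepper" function, while parameter iteration and bookkeeping live elsewhere. Here is a minimal sketch in plain PyTorch; the names sgd_step and opt_step are illustrative, not fastai's actual API:

import torch

# one piece: a stepper that applies the update rule to a single parameter
def sgd_step(p, lr):
    p.data.add_(p.grad.data, alpha=-lr)

# the other piece: a loop that walks the parameters and applies each stepper
def opt_step(params, steppers, lr):
    for p in params:
        if p.grad is not None:
            for step in steppers: step(p, lr)

Composing a new optimizer then amounts to writing a new stepper rather than a whole optimizer class.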

fastai is organized around two main design goals: to be approachable and rapidly productive, while also being deeply hackable and configurable. It is built on top of a hierarchy of lower-level APIs which provide composable building blocks. This way, a user wanting to rewrite part of the high-level API or add particular behavior to suit their needs does not have to learn how to use the lowest level.

Layered API

Migrating from other libraries

It’s very easy to migrate from plain PyTorch, Ignite, or any other PyTorch-based library, or even to use fastai in conjunction with other libraries. Generally, you’ll be able to use all your existing data processing code, but will be able to reduce the amount of code you require for training, and more easily take advantage of modern best practices. Migration guides from several popular libraries are available in the documentation to help you on your way.

Windows Support

Due to Python multiprocessing issues on Jupyter and Windows, the num_workers argument of DataLoader is reset to 0 automatically to avoid Jupyter hanging. This makes tasks such as computer vision in Jupyter on Windows many times slower than on Linux. This limitation doesn’t exist if you use fastai from a script.
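For example, in a script you can request worker processes explicitly when building your DataLoaders (a sketch, assuming dblock is a DataBlock you have already defined and source is your data path):

dls = dblock.dataloaders(source, num_workers=8)  # honored in a script; Jupyter on Windows forces 0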

See this example to fully leverage the fastai API on Windows.

We recommend using Windows Subsystem for Linux (WSL) instead – if you do that, you can use the regular Linux installation approach, and you won’t have any issues with num_workers.

Tests

To run the tests in parallel, launch:

nbdev_test

For all the tests to pass, you’ll need to install the dependencies specified as part of dev_requirements in settings.ini:

pip install -e .[dev]

Tests are written using nbdev; for example, see the documentation for test_eq.
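For reference, test_eq comes from fastcore and simply asserts equality with a readable failure message; a tiny illustrative example:

from fastcore.test import test_eq

test_eq([i*2 for i in range(3)], [0, 2, 4])  # passes silently; raises AssertionError on failure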

Contributing

After you clone this repository, make sure you have run nbdev_install_hooks in your terminal. This installs Jupyter and git hooks to automatically clean, trust, and fix merge conflicts in notebooks.

After making changes in the repo, you should run nbdev_prepare and make any necessary changes in order to pass all the tests.

Docker Containers

Official Docker containers for this project can be found here.

course-nlp's People

Contributors

jcatanza, jph00, racheltho, sgugger


course-nlp's Issues

Kernel dies when reading data

Device: Mac M1 Pro

I have been practicing through Colab so far, but I wanted to run fastai locally after I recently came across PyTorch's GPU support.

Creating the blocks succeeds, but the kernel dies when I load the dataset from the block.

Code:

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                 item_tfms=Resize(460),
                 batch_tfms=aug_transforms(size=244, min_scale=0.75))

No problems until here.

dls = pets.dataloaders(path/'images')  # when I run this line, the kernel dies

(Screenshot attached: 2022-08-13, 3:58 PM)

I also adjusted the memory capacity, but nothing changed.

fastai 2.7.9 pypi_0 pypi
fastbook 0.0.26 pypi_0 pypi
torch 1.13.0.dev20220807 pypi_0 pypi
torchaudio 0.14.0.dev20220603 pypi_0 pypi
torchvision 0.14.0.dev20220807 pypi_0 pypi
These are the versions I installed using conda.

What can I do? Or is the support still unstable?

Problem with get_wiki (I think because of possible changes to wiki_extractor)

I am trying to rerun the https://github.com/fastai/course-nlp/blob/master/nn-vietnamese.ipynb Vietnamese notebook and am getting a file-not-found error at

get_wiki(path,lang)

This seems to be the case with any language. A manual check revealed that the text directory did not have an AA\wiki_00.

I don't know what the problem here is.

FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\\.fastai\data\viwiki\text\AA\wiki_00'

`BrokenProcessPool` Error in the `3-logreg-nb-imdb.ipynb` notebook

In the 3-logreg-nb-imdb.ipynb notebook from the Code-first Introduction to Natural Language Processing course, a call to TextList.from_folder() throws a BrokenProcessPool error. I am running Windows 10 64-bit.

Has anyone else encountered this problem and been able to solve it?

reviews_full = (TextList.from_folder(path)                   # grab all the text files in path
                .split_by_folder(valid='test')               # split by train and valid folder (keeps only 'train' and 'test', so no need to filter)
                .label_from_folder(classes=['neg', 'pos']))  # label them all with their folders

Below is the full error message:


BrokenProcessPool Traceback (most recent call last)
in
3 .split_by_folder(valid='test')
4 #split by train and valid folder (that only keeps 'train' and 'test' so no need to filter)
----> 5 .label_from_folder(classes=['neg', 'pos']))
6 #label them all with their folders

~\Anaconda3\envs\fastai\lib\site-packages\fastai\data_block.py in _inner(*args, **kwargs)
478 self.valid = fv(*args, from_item_lists=True, **kwargs)
479 self.__class__ = LabelLists
--> 480 self.process()
481 return self
482 return _inner

~\Anaconda3\envs\fastai\lib\site-packages\fastai\data_block.py in process(self)
532 "Process the inner datasets."
533 xp,yp = self.get_processors()
--> 534 for ds,n in zip(self.lists, ['train','valid','test']): ds.process(xp, yp, name=n)
535 #progress_bar clear the outputs so in some case warnings issued during processing disappear.
536 for ds in self.lists:

~\Anaconda3\envs\fastai\lib\site-packages\fastai\data_block.py in process(self, xp, yp, name, max_warn_items)
712 p.warns = []
713 self.x,self.y = self.x[~filt],self.y[~filt]
--> 714 self.x.process(xp)
715 return self
716

~\Anaconda3\envs\fastai\lib\site-packages\fastai\data_block.py in process(self, processor)
82 if processor is not None: self.processor = processor
83 self.processor = listify(self.processor)
---> 84 for p in self.processor: p.process(self)
85 return self
86

~\Anaconda3\envs\fastai\lib\site-packages\fastai\text\data.py in process(self, ds)
295 tokens = []
296 for i in progress_bar(range(0,len(ds),self.chunksize), leave=False):
--> 297 tokens += self.tokenizer.process_all(ds.items[i:i+self.chunksize])
298 ds.items = tokens
299

~\Anaconda3\envs\fastai\lib\site-packages\fastai\text\transform.py in process_all(self, texts)
118 if self.n_cpus <= 1: return self._process_all_1(texts)
119 with ProcessPoolExecutor(self.n_cpus) as e:
--> 120 return sum(e.map(self._process_all_1, partition_by_cores(texts, self.n_cpus)), [])
121
122 class Vocab():

~\Anaconda3\envs\fastai\lib\concurrent\futures\process.py in _chain_from_iterable_of_lists(iterable)
474 careful not to keep references to yielded objects.
475 """
--> 476 for element in iterable:
477 element.reverse()
478 while element:

~\Anaconda3\envs\fastai\lib\concurrent\futures\_base.py in result_iterator()
584 # Careful not to keep a reference to the popped future
585 if timeout is None:
--> 586 yield fs.pop().result()
587 else:
588 yield fs.pop().result(end_time - time.monotonic())

~\Anaconda3\envs\fastai\lib\concurrent\futures\_base.py in result(self, timeout)
430 raise CancelledError()
431 elif self._state == FINISHED:
--> 432 return self.__get_result()
433 else:
434 raise TimeoutError()

~\Anaconda3\envs\fastai\lib\concurrent\futures\_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result

BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Further experimentation shows that the command sometimes succeeds without throwing the BrokenProcessPool error.

This is still a problem that should be addressed.
Breaking down the command that generates reviews_full into its three separate parts shows that the third part is the origin of the BrokenProcessPool error:

reviews_full0 = TextList.from_folder(path)
reviews_full1 = reviews_full0.split_by_folder(valid='test')
reviews_full = reviews_full1.label_from_folder(classes=['neg', 'pos'])

Lesson 10 notebooks: `bunzip` throws an error when unzipping `.bz2` files

On a Windows 10 64-bit machine:

bunzip throws "EOFError: Compressed file ended before the end-of-stream marker was reached" when processing these files:
viwiki-latest-pages-articles.xml.bz2
trwiki-latest-pages-articles.xml.bz2

(Screenshot attached: bunzip_error)

Windows version of 7-zip throws a similar error

Note 1: A valid .xml format file is still saved.

Note 2: The problem was resolved when I downloaded the files directly from https://archive.org/details/wikipediadumps

Version issue in nn-vietnamese.ipynb

There are some version issues in the "nn-vietnamese.ipynb" file, so most of the functionality does not work properly. Could you please provide an updated version?

WikiText-103 pickled file path is different than in notebook

In the notebook 5-nn-imdb.ipynb, executing the cell with

wiki_itos = pickle.load(open(Config().model_path()/'wt103-1/itos_wt103.pkl', 'rb'))

returns a FileNotFoundError: [Errno 2] No such file or directory, because the directory structure is incorrect. On my test run (using Google Colab notebooks with a GPU runtime), the file is in the directory wt103-fwd rather than wt103-1.

The line

wiki_itos = pickle.load(open(Config().model_path()/'wt103-fwd/itos_wt103.pkl', 'rb'))

loads correctly for me.

wrong fine_tuned_encoder being loaded in "nn-vietnamese-bwd" notebook?

It seems that the wrong encoder is being loaded in the nn-vietnamese-bwd.ipynb notebook ("Classifier" section):

learn_c = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5, metrics=[accuracy, f1]).to_fp16()
learn_c.load_encoder(f'{lang}fine_tuned_enc')
learn_c.freeze()

I think the backwards encoder should be loaded here, changing the line to:
learn_c.load_encoder(f'{lang}fine_tuned_enc_bwd')

TextBunch and TextList creation fails

In notebooks 3 and 5, creating TextBunch and TextList objects fails with this error:

TypeError: blank() got an unexpected keyword argument 'disable'

I suspect this is due to a change in the spacy.blank() function, which no longer supports the disable argument. Could you specify which version of spaCy was used in the notebooks?

full paste of the error:
https://pastebin.com/jZdTKzsh
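Until the notebook authors confirm the version, one way to check what you have installed (my assumption is that the notebooks targeted spaCy 2.x, since spaCy 3 removed several 2.x keyword arguments):

import spacy
print(spacy.__version__)  # assumption: the notebooks predate spaCy 3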

License?

EDIT: rewording because I was confused about who the author was :|

Do you have a license you'd like to use for this? I'm happy to submit a pull request with the appropriate license.

I'm kicking the tires on this now in a Gigantum project, but before I share it, it'd be good to know what your wishes are.

BrokenPipeError in Lesson 12 notebook 7-seq2seq-translation.ipynb

On my Windows 10 64-bit system, the command

xb,yb = next(iter(data.valid_dl))

in the section labeled "Our Model"

fails with


BrokenPipeError Traceback (most recent call last)
in
----> 1 xb,yb = next(iter(data.valid_dl))

~\Anaconda3\envs\fastai\lib\site-packages\fastai\basic_data.py in __iter__(self)
73 def __iter__(self):
74 "Process and returns items from DataLoader."
---> 75 for b in self.dl: yield self.proc_batch(b)
76
77 @classmethod

~\Anaconda3\envs\fastai\lib\site-packages\torch\utils\data\dataloader.py in __iter__(self)
276 return _SingleProcessDataLoaderIter(self)
277 else:
--> 278 return _MultiProcessingDataLoaderIter(self)
279
280 @property

~\Anaconda3\envs\fastai\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
680 # before it starts, and __del__ tries to join but will get:
681 # AssertionError: can only join a started process.
--> 682 w.start()
683 self.index_queues.append(index_queue)
684 self.workers.append(w)

~\Anaconda3\envs\fastai\lib\multiprocessing\process.py in start(self)
110 'daemonic processes are not allowed to have children'
111 _cleanup()
--> 112 self._popen = self._Popen(self)
113 self._sentinel = self._popen.sentinel
114 # Avoid a refcycle if the target function holds an indirect

~\Anaconda3\envs\fastai\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):

~\Anaconda3\envs\fastai\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):

~\Anaconda3\envs\fastai\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
87 try:
88 reduction.dump(prep_data, to_child)
---> 89 reduction.dump(process_obj, to_child)
90 finally:
91 set_spawning_popen(None)

~\Anaconda3\envs\fastai\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #

BrokenPipeError: [Errno 32] Broken pipe

BrokenProcessPool

Hi!
I have been training a language model from Wikipedia in order to create a text classifier in fastai. I have been using Google Colab for it, but after a few minutes of training, the process stops with the following error:


get_wiki(path, lang)
dest = split_wiki(path, lang)

bs = 64
data = (TextList.from_folder(dest)
        .split_by_rand_pct(0.1, seed=42)
        .label_for_lm()
        .databunch(bs=bs, num_workers=1))
data.save('tmp_lm')

BrokenProcessPool Traceback (most recent call last)
in ()
1 bs=64
2 data = (TextList.from_folder(dest)
----> 3 .split_by_rand_pct(0.1, seed=42)
4 .label_for_lm()
5 .databunch(bs=bs, num_workers=1))

9 frames
/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in _inner(*args, **kwargs)
478 self.valid = fv(*args, from_item_lists=True, **kwargs)
479 self.__class__ = LabelLists
--> 480 self.process()
481 return self
482 return _inner

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in process(self)
532 "Process the inner datasets."
533 xp,yp = self.get_processors()
--> 534 for ds,n in zip(self.lists, ['train','valid','test']): ds.process(xp, yp, name=n)
535 #progress_bar clear the outputs so in some case warnings issued during processing disappear.
536 for ds in self.lists:

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in process(self, xp, yp, name, max_warn_items)
712 p.warns = []
713 self.x,self.y = self.x[~filt],self.y[~filt]
--> 714 self.x.process(xp)
715 return self
716

/usr/local/lib/python3.6/dist-packages/fastai/data_block.py in process(self, processor)
82 if processor is not None: self.processor = processor
83 self.processor = listify(self.processor)
---> 84 for p in self.processor: p.process(self)
85 return self
86

/usr/local/lib/python3.6/dist-packages/fastai/text/data.py in process(self, ds)
295 tokens = []
296 for i in progress_bar(range(0,len(ds),self.chunksize), leave=False):
--> 297 tokens += self.tokenizer.process_all(ds.items[i:i+self.chunksize])
298 ds.items = tokens
299

/usr/local/lib/python3.6/dist-packages/fastai/text/transform.py in process_all(self, texts)
118 if self.n_cpus <= 1: return self._process_all_1(texts)
119 with ProcessPoolExecutor(self.n_cpus) as e:
--> 120 return sum(e.map(self._process_all_1, partition_by_cores(texts, self.n_cpus)), [])
121
122 class Vocab():

/usr/lib/python3.6/concurrent/futures/process.py in _chain_from_iterable_of_lists(iterable)
364 careful not to keep references to yielded objects.
365 """
--> 366 for element in iterable:
367 element.reverse()
368 while element:

/usr/lib/python3.6/concurrent/futures/_base.py in result_iterator()
584 # Careful not to keep a reference to the popped future
585 if timeout is None:
--> 586 yield fs.pop().result()
587 else:
588 yield fs.pop().result(end_time - time.monotonic())

/usr/lib/python3.6/concurrent/futures/_base.py in result(self, timeout)
430 raise CancelledError()
431 elif self._state == FINISHED:
--> 432 return self.__get_result()
433 else:
434 raise TimeoutError()

/usr/lib/python3.6/concurrent/futures/_base.py in __get_result(self)
382 def __get_result(self):
383 if self._exception:
--> 384 raise self._exception
385 else:
386 return self._result

BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.


I tried to solve this by varying the value of bs (64, 32, 16) and also changing the value of num_workers, but it still fails.

What happens is that RAM usage starts to grow until, at some point, execution of the script stops.

Details of the Google Colab Machine:

GPU Machine, RAM: 25.51 GB, Disk: 358.27 GB.

Is there any way to run it in that environment?

Best Regards!

SpaCy Lemmatizer use in Lesson 2

In 2-svd-nmf-topic-modeling.ipynb under the section Spacy you use:

from spacy.lemmatizer import Lemmatizer
lemmatizer = Lemmatizer()
[lemmatizer.lookup(word) for word in word_list]

Unfortunately this creates an empty lemmatizer that always returns its input unchanged, which may give the wrong impression.

Instead you should use something like:

nlp = spacy.load("en_core_web_sm")
lemmatizer = nlp.Defaults.create_lemmatizer()
[lemmatizer.lookup(word) for word in word_list]

Also the command to download the English model at the start of this section is written as:
spacy -m download en_core_web_sm
when it should either be python -m spacy download en_core_web_sm or spacy download en_core_web_sm

Thanks

OSError in split_wiki

When the title contains '?' (or a backslash in Windows), an OSError is raised:

OSError: [Errno 22] Invalid argument
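A possible workaround (a sketch of my own, not code from nlputils) is to sanitize titles before using them as filenames:

import re

def safe_title(title):
    # replace characters that are invalid in Windows filenames
    return re.sub(r'[\\/:*?"<>|]', '_', title)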

Quick fix if you get an error when assigning wiki_itos in 5-nn-imdb.ipynb

At least in the Google Colab version of the doc, the line below points to a file that doesn't exist and thus causes an error:

wiki_itos = pickle.load(open(Config().model_path()/'wt103-1/itos_wt103.pkl', 'rb'))

The quick fix is to change the directory name as shown below:

wiki_itos = pickle.load(open(Config().model_path()/'wt103-fwd/itos_wt103.pkl', 'rb'))

Hope this helps!

3-logreg-nb-imdb

In notebook 3 I got an error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-e25b28312e1c> in <module>
      1 from fastai import *
      2 from fastai.text import *
----> 3 from fastai.utils.mem import GPUMemTrace #call with mtrace

ModuleNotFoundError: No module named 'fastai.utils'

Notebook 7 links to another fastai notebook, resulting in a 404 error

Currently reviewing notebook 7 (https://github.com/fastai/course-nlp/blob/master/7-seq2seq-translation.ipynb), and there are a few places that link to another fastai notebook.

At the beginning of notebook 7 it says:

This notebook is modified from [this one](https://github.com/fastai/fastai_docs/blob/master/dev_course/dl2/translation.ipynb) created by Sylvain Gugger.

And later:

[fast.ai implementation of seq2seq with QRNNs](https://github.com/fastai/fastai_docs/blob/master/dev_course/dl2/translation.ipynb)

I see the fastai_docs repo is now fastai_dev, which is changing rapidly right now. Perhaps this is the notebook that should be referenced?

https://github.com/fastai/fastai_dev/blob/master/dev_nb/translation_pretrained.ipynb

Kernel dies when reading the data.

The kernel dies every time I execute the line below in notebook 7-seq2seq-translation.ipynb:

with open(path/'giga-fren.release2.fixed.fr') as f: fr = f.read().split('\n')

Kindly help.

UnicodeEncodeError when using split_wiki

On Windows, a UnicodeEncodeError is raised when using split_wiki:

----> 1 dest = split_wiki(path,lang)

~\projects\forks\course-nlp\nlputils.py in split_wiki(path, lang, encoding)
     46             if f: f.close()
     47             f = (dest/f'{title}.txt').open('w')
---> 48         else: f.write(l)
     49     f.close()
     50     return dest

~\AppData\Local\Continuum\anaconda3\envs\fastai-v1-py37\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode character '\u0103' in position 21: character maps to <undefined>
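A likely fix, based on the traceback above showing the file opened with the Windows default cp1252 codec, is to pass an explicit UTF-8 encoding when opening the destination file (an assumption, not a confirmed patch):

f = (dest/f'{title}.txt').open('w', encoding='utf-8')  # force UTF-8 instead of the cp1252 default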

Nlputils

This code in nlputils does not work:

os.system("python -m wikiextractor.WikiExtractor --processes 4 --no_templates " +
f"--min_text_length 1800 --filter_disambig_pages --log_file log -b 100G -q {xml_fn}")
shutil.move(str(path/'text/AA/wiki_00'), str(path/name))
shutil.rmtree(path/'text')
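One way to debug this (a sketch of my own, not the repository's code) is to replace os.system with subprocess.run, so a failing WikiExtractor call raises immediately instead of leaving shutil.move to fail later on a missing wiki_00:

import subprocess

# check=True raises CalledProcessError if WikiExtractor exits non-zero
subprocess.run(
    ["python", "-m", "wikiextractor.WikiExtractor", "--processes", "4",
     "--no_templates", "--min_text_length", "1800", "--filter_disambig_pages",
     "--log_file", "log", "-b", "100G", "-q", str(xml_fn)],
    check=True)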
