miguelgfierro / pybase

Stars: 26 · Watchers: 3 · Forks: 12 · Size: 2.32 MB

Codebase for Python

Home Page: https://miguelgfierro.com

License: Other

JavaScript 0.70% HTML 0.77% CSS 0.50% Python 93.30% Jupyter Notebook 4.74%
python codebase programming-tools

pybase's Introduction

Hello there 👋, my name is Miguel Fierro.

🤖 I lead the Personalization team at Microsoft. We are a team of Data Scientists and Software Engineers working on Recommendation Systems, NLP, Computer Vision and other Machine Learning solutions.

💬 In addition, I help people understand and apply AI. Whether you want to switch your career to Data Science, land a job in a big tech company, grow your Data Science career, or apply AI to your business, I can help you. ➡️➡️ Join my email list ⬅️⬅️

💻 I'm a maintainer of Recommenders, the top open source repository in Recommendation Systems. I have also contributed to the deep learning frameworks MXNet and CNTK. In this repo you can find a portfolio of machine learning projects. Finally, outside machine learning, I have built my own blog from scratch, which looks like a LaTeX paper.

⚡ Fun fact: The picture in my profile is the HOAP-3 humanoid robot. I did my PhD thesis with it. Here is a fun video of HOAP dancing.

pybase's People

Contributors

miguelgfierro, simonyansenzhao, trellixvulnteam

pybase's Issues

BUG simplex

=================================== FAILURES ===================================
_______ [doctest] pybase.optimization.downhill_simplex.optimize_function _______
017     
018     Returns:
019         np.array: Result of the optimization.
020         float: Value of function at minimum.
021     
022     Examples:
023         >>> from .functions import rosenbrock
024         >>> x0 = np.array([0, 0, 0, 0, 0])
025         >>> xopt, fopt = optimize_function(rosenbrock, x0)
026         >>> xopt # Real solution [1,1,1,1,1]
Expected:
    array([0.9999974 , 0.99999158, 0.99998042, 0.9999658 , 0.99993196])
Got:
    array([1.00002005, 1.00004272, 1.00005929, 1.0000984 , 1.00020735])
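
The expected and actual vectors in this failure both sit close to the real solution [1,1,1,1,1] and differ only in the trailing digits, which is drift that a doctest comparing printed floats cannot absorb. A minimal sketch of a tolerance-based check follows. The helper name `close_to` and the tolerance of 1e-3 are illustrative assumptions and are not part of pybase.

```python
import math


def close_to(values, target, abs_tol=1e-3):
    """Return True if every value is within abs_tol of target."""
    return all(math.isclose(v, target, abs_tol=abs_tol) for v in values)


# The two vectors reported in the doctest failure.
expected = [0.9999974, 0.99999158, 0.99998042, 0.9999658, 0.99993196]
got = [1.00002005, 1.00004272, 1.00005929, 1.0000984, 1.00020735]

# An exact comparison fails, but both runs are within tolerance of the solution.
print(expected == got)
print(close_to(expected, 1.0), close_to(got, 1.0))
```

In the doctest itself the same idea could be expressed with NumPy as `np.allclose(xopt, 1, atol=1e-3)`, whose printed result does not depend on the last digits of the optimizer output.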

error with newer opencv with cv2.findContours

_____________________________________ [doctest] pybase.image_base.opencv_features.largest_contour ______________________________________
006
007     Args:
008         mask (np.array): Binary image.
009
010     Returns:
011         np.array: Array of points.
012
013     Examples:
014         >>> mask = cv2.imread('share/Lenna_mask.png', 0)
015         >>> cnts = largest_contour(mask)
UNEXPECTED EXCEPTION: ValueError('not enough values to unpack (expected 3, got 2)',)
Traceback (most recent call last):

  File "/home/miguel/anaconda/envs/codebase/lib/python3.6/doctest.py", line 1330, in __run
    compileflags, 1), test.globs)

  File "<doctest pybase.image_base.opencv_features.largest_contour[1]>", line 1, in <module>

  File "/home/miguel/repos/pybase/image_base/opencv_features.py", line 19, in largest_contour
    _, contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

ValueError: not enough values to unpack (expected 3, got 2)
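
The traceback shows a three-value unpacking of `cv2.findContours`, which matches OpenCV 3.x. OpenCV 4.x (like 2.x) returns only two values, contours and hierarchy. One way to support both is to index from the end of the returned tuple. The helper name `extract_contours` below is hypothetical, and the tuples in the demonstration are plain stand-ins for real OpenCV results.

```python
def extract_contours(find_contours_result):
    """Return the contour list from a cv2.findContours result.

    OpenCV 3.x returns (image, contours, hierarchy), while 2.x and 4.x
    return (contours, hierarchy). In both layouts the contours are the
    second-to-last element.
    """
    return find_contours_result[-2]


# Stand-in tuples that mimic the two return layouts.
opencv3_style = ("image", ["contour_a", "contour_b"], "hierarchy")
opencv4_style = (["contour_a", "contour_b"], "hierarchy")

print(extract_contours(opencv3_style) == extract_contours(opencv4_style))

# With OpenCV installed, the call in largest_contour could then read:
#   contours = extract_contours(
#       cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE))
```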

Memory usage by a program

$ python
Python 3.5.6 |Anaconda, Inc.| (default, Jun  4 2021, 13:57:47)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import psutil
>>> import os
>>> psutil.Process(os.getpid()).memory_info().rss
13553664
>>> int(psutil.Process(os.getpid()).memory_info().rss)/1024**2 #MB
12.92578125
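
The session above can be wrapped in a small reusable function. This is a sketch: the names `bytes_to_mib` and `process_memory_mib` are assumptions made for illustration, and `psutil` is imported lazily so the conversion helper works even where it is not installed.

```python
import os


def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024**2 bytes)."""
    return n_bytes / 1024 ** 2


def process_memory_mib(pid=None):
    """Resident set size of a process in MiB (defaults to this process)."""
    import psutil  # third-party dependency, imported only when needed

    process = psutil.Process(os.getpid() if pid is None else pid)
    return bytes_to_mib(process.memory_info().rss)


# The RSS value from the session above.
print(bytes_to_mib(13553664))  # 12.92578125
```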

Transform the type of a column in pyspark

from pyspark.sql.types import IntegerType
train = train.withColumn("ProductId", train["ProductId"].cast(IntegerType()))
test = test.withColumn("ProductId", test["ProductId"].cast(IntegerType()))

review segmentation utils

def test_apply_mask_to_image(test_image):
    img = cv2.imread(test_image["lenna"])
    mask = cv2.imread(test_image["lenna_mask"], 0)
    masked = apply_mask_to_image(img, mask)

    assert np.count_nonzero(masked == 0) == 416754
    assert np.count_nonzero(masked != 0) == 369678

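416754 and 369678 look too large for a single-channel mask. Their sum is 786432 = 512 × 512 × 3, so the assertions appear to count elements across all three channels of the masked image rather than pixels.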

download_from_google_drive

def download_from_google_drive(file_id, file_name):
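    # Download a file from a Google Drive link.
    # The lines starting with "!" are IPython shell escapes, so this function
    # only runs inside a Jupyter/IPython session with curl and awk available.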
    !rm -f ./cookie
    !curl -c ./cookie -s -L "https://drive.google.com/uc?export=download&id={file_id}" > /dev/null
    confirm_text = !awk '/download/ {print $NF}' ./cookie
    confirm_text = confirm_text[0]
    !curl -Lb ./cookie "https://drive.google.com/uc?export=download&confirm={confirm_text}&id={file_id}" -o {file_name}

Review newest errors in tests

$ pytest --doctest-modules --continue-on-collection-errors --durations 0
================================================================================================ test session starts ================================================================================================
platform linux -- Python 3.8.13, pytest-7.1.1, pluggy-1.0.0
rootdir: /home/miguel/run3x/pybase
collected 210 items / 6 errors

api/flask_basic.py ..                                                                                                                                                                                         [  0%]
api/flask_json.py ...                                                                                                                                                                                         [  2%]
data_structures/abstract.py .                                                                                                                                                                                 [  2%]
data_structures/aliases.py ..                                                                                                                                                                                 [  3%]
data_structures/binary_heaps.py .                                                                                                                                                                             [  4%]
data_structures/binary_search_tree.py .                                                                                                                                                                       [  4%]
data_structures/binary_tree.py .                                                                                                                                                                              [  5%]
data_structures/data_types.py .                                                                                                                                                                               [  5%]
data_structures/deque.py .                                                                                                                                                                                    [  6%]
data_structures/dictionary.py .                                                                                                                                                                               [  6%]
data_structures/exceptions.py ..                                                                                                                                                                              [  7%]
data_structures/factory_classmethod.py .                                                                                                                                                                      [  8%]
data_structures/factory_dependency_injection.py ..                                                                                                                                                            [  9%]
data_structures/factory_dictionary.py .                                                                                                                                                                       [  9%]
data_structures/generator.py .                                                                                                                                                                                [ 10%]
data_structures/graph.py .                                                                                                                                                                                    [ 10%]
data_structures/graph_search.py ..                                                                                                                                                                            [ 11%]
data_structures/hash_table.py .                                                                                                                                                                               [ 11%]
data_structures/linked_list.py .                                                                                                                                                                              [ 12%]
data_structures/list_manipulation.py ............                                                                                                                                                             [ 18%]
data_structures/list_search.py ...                                                                                                                                                                            [ 19%]
data_structures/list_sort.py .....                                                                                                                                                                            [ 21%]
data_structures/method_resolution_order.py .....                                                                                                                                                              [ 24%]
data_structures/overload.py ...                                                                                                                                                                               [ 25%]
data_structures/queue.py .                                                                                                                                                                                    [ 26%]
data_structures/stack.py .                                                                                                                                                                                    [ 26%]
database/sqlite/create_table.py .                                                                                                                                                                             [ 27%]
database/sqlite/insert_values.py ..                                                                                                                                                                           [ 28%]
image_base/conversion.py ........                                                                                                                                                                             [ 31%]
image_base/opencv_features.py F                                                                                                                                                                               [ 32%]
image_base/opencv_segmentation.py .FF..                                                                                                                                                                       [ 34%]
image_base/opencv_transformation.py ........                                                                                                                                                                  [ 38%]
image_base/pil_transformation.py .....                                                                                                                                                                        [ 40%]
image_base/skimage_transformation.py .                                                                                                                                                                        [ 41%]
io_base/argument_io.py .                                                                                                                                                                                      [ 41%]
io_base/csv_io.py ..                                                                                                                                                                                          [ 42%]
io_base/dask_io.py ..                                                                                                                                                                                         [ 43%]
io_base/fastparquet_io.py ..                                                                                                                                                                                  [ 44%]
io_base/file_io.py ....                                                                                                                                                                                       [ 46%]
io_base/hdf5_io.py ..                                                                                                                                                                                         [ 47%]
io_base/json_io.py ..                                                                                                                                                                                         [ 48%]
io_base/numpy_io.py ..                                                                                                                                                                                        [ 49%]
io_base/opencv_io.py F..                                                                                                                                                                                      [ 50%]
io_base/pandas_io.py ....                                                                                                                                                                                     [ 52%]
io_base/pickle_io.py ..                                                                                                                                                                                       [ 53%]
io_base/pil_io.py ...                                                                                                                                                                                         [ 55%]
io_base/pyspark_io.py FFFFF                                                                                                                                                                                   [ 57%]
io_base/scikit_image_io.py ...                                                                                                                                                                                [ 59%]
io_base/yaml_io.py ..                                                                                                                                                                                         [ 60%]
log_base/formatting.py ....                                                                                                                                                                                   [ 61%]
log_base/logger.py .                                                                                                                                                                                          [ 62%]
log_base/timer.py .                                                                                                                                                                                           [ 62%]
machine_learning/activations.py .......                                                                                                                                                                       [ 66%]
machine_learning/dataset_split.py ..                                                                                                                                                                          [ 67%]
machine_learning/k_nearest_neighbor.py .                                                                                                                                                                      [ 67%]
machine_learning/metrics.py ..F........                                                                                                                                                                       [ 72%]
numpy_base/array_evaluation.py .....                                                                                                                                                                          [ 75%]
numpy_base/array_manipulation.py ....                                                                                                                                                                         [ 77%]
optimization/differential_evolution.py .                                                                                                                                                                      [ 77%]
optimization/downhill_simplex.py .                                                                                                                                                                            [ 78%]
optimization/functions.py .                                                                                                                                                                                   [ 78%]
pandas_base/apply_functions.py ...                                                                                                                                                                            [ 80%]
pandas_base/clean.py .....                                                                                                                                                                                    [ 82%]
pandas_base/conversion.py ......                                                                                                                                                                              [ 85%]
pandas_base/value_selection.py .................                                                                                                                                                              [ 93%]
pyspark_base/spark_conf.py F                                                                                                                                                                                  [ 93%]
system/paths.py ........                                                                                                                                                                                      [ 97%]
test/papermill_test.py ..                                                                                                                                                                                     [ 98%]
url_base/download_file.py ..                                                                                                                                                                                  [ 99%]
url_base/url_common.py .                                                                                                                                                                                      [100%]

====================================================================================================== ERRORS =======================================================================================================
_____________________________________________________________________________________ ERROR collecting io_base/azure_blob_io.py _____________________________________________________________________________________
io_base/azure_blob_io.py:10: in <module>
    from azure.storage.blob import BlockBlobService
E   ModuleNotFoundError: No module named 'azure'
_____________________________________________________________________________________ ERROR collecting numpy_base/benchmark.py ______________________________________________________________________________________
numpy_base/benchmark.py:3: in <module>
    from numba import vectorize
E   ModuleNotFoundError: No module named 'numba'
_____________________________________________________________________________________ ERROR collecting optimization/pytorch.py ______________________________________________________________________________________
optimization/pytorch.py:1: in <module>
    import torch
E   ModuleNotFoundError: No module named 'torch'
____________________________________________________________________________________ ERROR collecting optimization/tensorflow.py ____________________________________________________________________________________
optimization/tensorflow.py:1: in <module>
    import tensorflow as tf
E   ModuleNotFoundError: No module named 'tensorflow'
_________________________________________________________________________________________ ERROR collecting system/memory.py _________________________________________________________________________________________
system/memory.py:4: in <module>
    from numba import cuda
E   ModuleNotFoundError: No module named 'numba'
______________________________________________________________________________________ ERROR collecting system/system_info.py _______________________________________________________________________________________
system/system_info.py:11: in <module>
    from numba import cuda
E   ModuleNotFoundError: No module named 'numba'
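
The six collection errors above all come from modules that import an optional package (azure, numba, torch, tensorflow) at the top level. One possible way to keep the run clean when those packages are absent is a root `conftest.py` that fills pytest's `collect_ignore` list. The sketch below is not the project's actual configuration; the helper `missing_modules` is hypothetical, and the file paths are copied from the errors above.

```python
# conftest.py at the repository root (illustrative sketch)
import importlib.util

# Optional dependency -> modules that import it at the top level,
# taken from the collection errors in the log.
OPTIONAL_DEPENDENCIES = {
    "azure": ["io_base/azure_blob_io.py"],
    "numba": [
        "numpy_base/benchmark.py",
        "system/memory.py",
        "system/system_info.py",
    ],
    "torch": ["optimization/pytorch.py"],
    "tensorflow": ["optimization/tensorflow.py"],
}


def _is_installed(package):
    """True if the top-level package can be found without importing it."""
    return importlib.util.find_spec(package) is not None


def missing_modules(dependencies, is_installed=_is_installed):
    """List the files whose optional dependency is not installed."""
    ignored = []
    for package, files in dependencies.items():
        if not is_installed(package):
            ignored.extend(files)
    return ignored


# pytest skips every path listed here during collection.
collect_ignore = missing_modules(OPTIONAL_DEPENDENCIES)
```

The alternative is simply to install the missing packages in the test environment; ignoring the files trades coverage for a quieter run.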
===================================================================================================== FAILURES ======================================================================================================
____________________________________________________________________________ [doctest] pybase.image_base.opencv_features.largest_contour ____________________________________________________________________________
006
007     Args:
008         mask (np.array): Binary image.
009
010     Returns:
011         np.array: Array of points.
012
013     Examples:
014         >>> mask = cv2.imread('share/Lenna_mask.png', 0)
015         >>> cnts = largest_contour(mask)
UNEXPECTED EXCEPTION: ValueError('not enough values to unpack (expected 3, got 2)')
Traceback (most recent call last):
  File "/home/miguel/anaconda/envs/pybase/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pybase.image_base.opencv_features.largest_contour[1]>", line 1, in <module>
  File "/home/miguel/run3x/pybase/image_base/opencv_features.py", line 19, in largest_contour
    _, contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
ValueError: not enough values to unpack (expected 3, got 2)
/home/miguel/run3x/pybase/image_base/opencv_features.py:15: UnexpectedException
___________________________________________________________________________ [doctest] pybase.image_base.opencv_segmentation.bounding_box ____________________________________________________________________________
027     Args:
028         mask (np.array): Binary image.
029         max_contours (int): Maximum number of contours to consider for computing the bounding box.
030
031     Returns:
032         tuple: A tuple of integers defining x, y, width and height.
033
034     Examples:
035         >>> mask = cv2.imread('share/Lenna_mask.png', 0)
036         >>> bounding_box(mask)
UNEXPECTED EXCEPTION: ValueError('not enough values to unpack (expected 3, got 2)')
Traceback (most recent call last):
  File "/home/miguel/anaconda/envs/pybase/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pybase.image_base.opencv_segmentation.bounding_box[1]>", line 1, in <module>
  File "/home/miguel/run3x/pybase/image_base/opencv_segmentation.py", line 39, in bounding_box
    _, cnts, hierarchy = cv2.findContours(
ValueError: not enough values to unpack (expected 3, got 2)
/home/miguel/run3x/pybase/image_base/opencv_segmentation.py:36: UnexpectedException
______________________________________________________________________ [doctest] pybase.image_base.opencv_segmentation.color_clustering_kmeans ______________________________________________________________________
123     Args:
124         img (np.array): An image.
125         n_clusters (int): Number of clusters.
126
127     Returns:
128         list: A list of segmented masks.
129
130     Examples:
131         >>> img = cv2.imread('share/home.jpg')
132         >>> mask_list = color_clustering_kmeans(img, n_clusters=4, n_jobs=-1, n_init=10, max_iter=100)
UNEXPECTED EXCEPTION: TypeError("__init__() got an unexpected keyword argument 'n_jobs'")
Traceback (most recent call last):
  File "/home/miguel/anaconda/envs/pybase/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pybase.image_base.opencv_segmentation.color_clustering_kmeans[1]>", line 1, in <module>
  File "/home/miguel/run3x/pybase/image_base/opencv_segmentation.py", line 141, in color_clustering_kmeans
    model = KMeans(n_clusters=n_clusters, **kwargs).fit(reshaped)
TypeError: __init__() got an unexpected keyword argument 'n_jobs'
/home/miguel/run3x/pybase/image_base/opencv_segmentation.py:132: UnexpectedException
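
The TypeError above indicates that the installed scikit-learn `KMeans` no longer accepts `n_jobs`, so the simplest fix is to drop that argument from the doctest. If `color_clustering_kmeans` should stay tolerant of stale keyword arguments, one option is to filter `**kwargs` against the constructor signature. The helper `supported_kwargs` and the class `FakeKMeans` below are hypothetical and only illustrate the idea.

```python
import inspect


def supported_kwargs(callable_obj, kwargs):
    """Keep only the keyword arguments that callable_obj declares."""
    params = inspect.signature(callable_obj).parameters
    # A **kwargs parameter means the callable accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


class FakeKMeans:
    """Stand-in with a constructor that, like the failing one, lacks n_jobs."""

    def __init__(self, n_clusters=8, *, n_init=10, max_iter=300):
        self.n_clusters = n_clusters
        self.n_init = n_init
        self.max_iter = max_iter


requested = {"n_jobs": -1, "n_init": 10, "max_iter": 100}
model = FakeKMeans(n_clusters=4, **supported_kwargs(FakeKMeans, requested))
print(supported_kwargs(FakeKMeans, requested))  # {'n_init': 10, 'max_iter': 100}
```

Silently discarding arguments can hide typos, so logging the dropped names may be worth adding in real code.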
___________________________________________________________________________________ [doctest] pybase.io_base.opencv_io.read_image ___________________________________________________________________________________
028
029     Args:
030         filename (str): Name of the file.
031         is_color (bool): Read the image in color.
032
033     Returns:
034         np.array: An image.
035
036     Examples:
037         >>> img = read_image('share/Lenna.png')
UNEXPECTED EXCEPTION: TypeError("Argument 'flags' must be integer, not bool")
Traceback (most recent call last):
  File "/home/miguel/anaconda/envs/pybase/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pybase.io_base.opencv_io.read_image[0]>", line 1, in <module>
  File "/home/miguel/run3x/pybase/io_base/opencv_io.py", line 47, in read_image
    return cv2.imread(filename, is_color)
TypeError: Argument 'flags' must be integer, not bool
/home/miguel/run3x/pybase/io_base/opencv_io.py:37: UnexpectedException
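
The failure above comes from passing the boolean `is_color` directly as the `flags` argument of `cv2.imread`, which this OpenCV build rejects. A minimal sketch of a fix maps the boolean to an integer flag first. The helper name `imread_flag` is an assumption; the values 1 and 0 correspond to `cv2.IMREAD_COLOR` and `cv2.IMREAD_GRAYSCALE`.

```python
def imread_flag(is_color=True):
    """Translate the boolean is_color into an integer flag for cv2.imread."""
    # 1 is cv2.IMREAD_COLOR and 0 is cv2.IMREAD_GRAYSCALE.
    return 1 if is_color else 0


def read_image(filename, is_color=True):
    """Read an image, always passing an int (never a bool) as flags."""
    import cv2  # imported lazily so imread_flag works without OpenCV

    return cv2.imread(filename, imread_flag(is_color))


print(imread_flag(True), imread_flag(False))  # 1 0
```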
_________________________________________________________________________________ [doctest] pybase.io_base.pyspark_io.read_csv_file _________________________________________________________________________________
069 Read a csv file using PySpark.
070
071     Args:
072         filename (str): Name of the file.
073
074     Returns:
075         spark.DataFrame: An dataframe.
076
077     Examples:
078         >>> import pyspark.sql.types as sptypes
UNEXPECTED EXCEPTION: ModuleNotFoundError("No module named 'pyspark'")
Traceback (most recent call last):
  File "/home/miguel/anaconda/envs/pybase/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pybase.io_base.pyspark_io.read_csv_file[0]>", line 1, in <module>
ModuleNotFoundError: No module named 'pyspark'
/home/miguel/run3x/pybase/io_base/pyspark_io.py:78: UnexpectedException
________________________________________________________________________________ [doctest] pybase.io_base.pyspark_io.read_csv_folder ________________________________________________________________________________
097
098     Args:
099         folder (str): Folder path.
100
101     Returns:
102         spark.DataFrame: An dataframe.
103
104     Examples:
105         >>> path = os.path.join("share", "traj_spark")
106         >>> df = read_csv_folder(spark, path, header=True, inferSchema=True)
UNEXPECTED EXCEPTION: AttributeError("'NoneType' object has no attribute 'read'")
Traceback (most recent call last):
  File "/home/miguel/anaconda/envs/pybase/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pybase.io_base.pyspark_io.read_csv_folder[1]>", line 1, in <module>
  File "/home/miguel/run3x/pybase/io_base/pyspark_io.py", line 112, in read_csv_folder
    return spark.read.csv(folder, **kwargs)
AttributeError: 'NoneType' object has no attribute 'read'
/home/miguel/run3x/pybase/io_base/pyspark_io.py:106: UnexpectedException
_________________________________________________________________________________ [doctest] pybase.io_base.pyspark_io.save_csv_file _________________________________________________________________________________
050     Args:
051         dataframe (spark.DataFrame): A dataframe.
052         filename (str): Name of the file.
053
054     Examples:
055         >>> if os.path.isfile("df_spark.csv"):
056         ...     os.remove("df_spark.csv")
057         >>> columns = ['id', 'dogs', 'cats']
058         >>> vals = [(1, 2, 0), (2, 0, 1)]
059         >>> df = spark.createDataFrame(vals, columns)
UNEXPECTED EXCEPTION: AttributeError("'NoneType' object has no attribute 'createDataFrame'")
Traceback (most recent call last):
  File "/home/miguel/anaconda/envs/pybase/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pybase.io_base.pyspark_io.save_csv_file[3]>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'createDataFrame'
/home/miguel/run3x/pybase/io_base/pyspark_io.py:59: UnexpectedException
________________________________________________________________________________ [doctest] pybase.io_base.pyspark_io.save_csv_folder ________________________________________________________________________________
006
007     Args:
008         dataframe (spark.DataFrame): A dataframe.
009         folder (str): Folder path.
010
011     Examples:
012         >>> shutil.rmtree("test_spark", ignore_errors=True)
013         >>> columns = ['id', 'dogs', 'cats']
014         >>> vals = [(1, 2, 0), (2, 0, 1)]
015         >>> df = spark.createDataFrame(vals, columns)
UNEXPECTED EXCEPTION: AttributeError("'NoneType' object has no attribute 'createDataFrame'")
Traceback (most recent call last):
  File "/home/miguel/anaconda/envs/pybase/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pybase.io_base.pyspark_io.save_csv_folder[3]>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'createDataFrame'
/home/miguel/run3x/pybase/io_base/pyspark_io.py:15: UnexpectedException
_____________________________________________________________________________ [doctest] pybase.io_base.pyspark_io.save_csv_folder_1file _____________________________________________________________________________
027
028     Args:
029         dataframe (spark.DataFrame): A dataframe.
030         folder (str): Folder path.
031
032     Examples:
033         >>> shutil.rmtree("test_spark_one", ignore_errors=True)
034         >>> columns = ['id', 'dogs', 'cats']
035         >>> vals = [(1, 2, 0), (2, 0, 1)]
036         >>> df = spark.createDataFrame(vals, columns)
UNEXPECTED EXCEPTION: AttributeError("'NoneType' object has no attribute 'createDataFrame'")
Traceback (most recent call last):
  File "/home/miguel/anaconda/envs/pybase/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pybase.io_base.pyspark_io.save_csv_folder_1file[3]>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'createDataFrame'
/home/miguel/run3x/pybase/io_base/pyspark_io.py:36: UnexpectedException
____________________________________________________________________ [doctest] pybase.machine_learning.metrics.classification_metrics_multilabel ____________________________________________________________________
076         y_pred (list or np.array): Predicted labels.
077         labels (list): Label index or name.
078
079     Returns:
080         dict: Dictionary with metrics.
081
082     Examples:
083         >>> y_true = [0,1,2,0,1]
084         >>> y_pred = [0,1,0,1,1]
085         >>> result = classification_metrics_multilabel(y_true, y_pred, [0,1,2])
UNEXPECTED EXCEPTION: TypeError('f1_score() takes 2 positional arguments but 3 positional arguments (and 1 keyword-only argument) were given')
Traceback (most recent call last):
  File "/home/miguel/anaconda/envs/pybase/lib/python3.8/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest pybase.machine_learning.metrics.classification_metrics_multilabel[2]>", line 1, in <module>
  File "/home/miguel/run3x/pybase/machine_learning/metrics.py", line 92, in classification_metrics_multilabel
    m_f1 = f1_score(y_true, y_pred, labels, average="weighted")
TypeError: f1_score() takes 2 positional arguments but 3 positional arguments (and 1 keyword-only argument) were given
/home/miguel/run3x/pybase/machine_learning/metrics.py:85: UnexpectedException
__________________________________________________________________________________ [doctest] pybase.pyspark_base.spark_conf.spark ___________________________________________________________________________________
032         >>> config = {"spark.executor.cores": "8"}
033         >>> config.update({"spark.executor.memory": "16g"})
034         >>> config.update({"spark.memory.fraction": "0.9"})
035         >>> config.update({"spark.memory.stageFraction": "0.3"})
036         >>> config.update({"spark.executor.instances": 1})
037         >>> config.update({"spark.executor.heartbeatInterval": "36000s"})
038         >>> config.update({"spark.network.timeout": "10000000s"})
039         >>> config.update({"spark.driver.maxResultSize": "50g"})
040         >>> spark = spark(config=config) # doctest: +SKIP
041         >>> spark is not None
Expected:
    True
Got:
    False

/home/miguel/run3x/pybase/pyspark_base/spark_conf.py:41: DocTestFailure
================================================================================================= warnings summary ==================================================================================================
../../anaconda/envs/pybase/lib/python3.8/site-packages/fastparquet/util.py:20
  /home/miguel/anaconda/envs/pybase/lib/python3.8/site-packages/fastparquet/util.py:20: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    PANDAS_VERSION = LooseVersion(pd.__version__)

notebooks/notebook_memory_management.py:35
  /home/miguel/run3x/pybase/notebooks/notebook_memory_management.py:35: UserWarning: Not running on notebook
    warnings.warn("Not running on notebook")

../../anaconda/envs/pybase/lib/python3.8/site-packages/ansiwrap/core.py:6
  /home/miguel/anaconda/envs/pybase/lib/python3.8/site-packages/ansiwrap/core.py:6: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

test/pytest_general.py:10
  /home/miguel/run3x/pybase/test/pytest_general.py:10: PytestUnknownMarkWarning: Unknown pytest.mark.system - is this a typo?  You can register custom marks to avoid this warning - for details, see https://docs.pytest.org/en/stable/how-to/mark.html
    @pytest.mark.system

data_structures/list_manipulation.py::pybase.data_structures.list_manipulation.split_list
  /home/miguel/run3x/pybase/data_structures/list_manipulation.py:203: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    splits = splits.round().astype(np.int)

image_base/skimage_transformation.py::pybase.image_base.skimage_transformation.resize_image
  /home/miguel/run3x/pybase/image_base/skimage_transformation.py:25: FutureWarning: The use of this function is discouraged as its behavior may change dramatically in scikit-image 1.0. This function will be removed in scikit-image 1.0.
    return convert(img_new, dtype=img.dtype)

test/papermill_test.py::test_notebook_runs
test/papermill_test.py::test_notebook_fails
  /home/miguel/anaconda/envs/pybase/lib/python3.8/site-packages/traitlets/config/configurable.py:85: DeprecationWarning: Passing unrecognized arguments to super(PapermillNotebookClient).__init__(input_path='test/papermill_notebook.ipynb').
  object.__init__() takes exactly one argument (the instance to initialize)
  This is deprecated in traitlets 4.2.This error will be raised in a future release of traitlets.
    super(Configurable, self).__init__(**kwargs)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================================================================================= slowest durations =================================================================================================
7.01s call     optimization/differential_evolution.py::pybase.optimization.differential_evolution.optimize_function
3.10s call     test/papermill_test.py::test_notebook_fails
2.47s call     test/papermill_test.py::test_notebook_runs
2.00s call     log_base/timer.py::pybase.log_base.timer.Timer
1.64s call     machine_learning/k_nearest_neighbor.py::pybase.machine_learning.k_nearest_neighbor.knn
0.76s call     image_base/opencv_segmentation.py::pybase.image_base.opencv_segmentation.grabcut_rect
0.45s call     io_base/pil_io.py::pybase.io_base.pil_io.read_image_url
0.33s call     io_base/opencv_io.py::pybase.io_base.opencv_io.read_image_url
0.33s call     io_base/scikit_image_io.py::pybase.io_base.scikit_image_io.read_image_url
0.26s call     url_base/download_file.py::pybase.url_base.download_file.maybe_download
0.25s call     image_base/opencv_segmentation.py::pybase.image_base.opencv_segmentation.grabcut_mask
0.25s call     url_base/download_file.py::pybase.url_base.download_file.download_path
0.15s call     image_base/opencv_transformation.py::pybase.image_base.opencv_transformation.convert_to_colorspace
0.13s call     image_base/opencv_transformation.py::pybase.image_base.opencv_transformation.normalize_image
0.10s call     io_base/pandas_io.py::pybase.io_base.pandas_io.save_to_sqlite
0.08s call     io_base/dask_io.py::pybase.io_base.dask_io.read_csv
0.07s call     database/sqlite/create_table.py::pybase.database.sqlite.create_table.create_table
0.06s call     image_base/conversion.py::pybase.image_base.conversion.image_cv2pil
0.05s call     optimization/downhill_simplex.py::pybase.optimization.downhill_simplex.optimize_function
0.04s call     image_base/skimage_transformation.py::pybase.image_base.skimage_transformation.resize_image
0.04s call     io_base/pandas_io.py::pybase.io_base.pandas_io.read_csv
0.04s call     pandas_base/conversion.py::pybase.pandas_base.conversion.add_row
0.04s call     image_base/conversion.py::pybase.image_base.conversion.image_cv2plt
0.03s call     io_base/scikit_image_io.py::pybase.io_base.scikit_image_io.read_image
0.03s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.symmetric_difference
0.02s call     io_base/opencv_io.py::pybase.io_base.opencv_io.save_image
0.02s call     pandas_base/conversion.py::pybase.pandas_base.conversion.replace_column_values
0.02s call     image_base/conversion.py::pybase.image_base.conversion.image_pil2plt
0.02s call     image_base/conversion.py::pybase.image_base.conversion.image_pil2scipy_array
0.02s call     image_base/conversion.py::pybase.image_base.conversion.image_plt2pil
0.02s call     io_base/pil_io.py::pybase.io_base.pil_io.save_image
0.02s call     image_base/conversion.py::pybase.image_base.conversion.image_scipy_numpy2pil
0.02s call     image_base/conversion.py::pybase.image_base.conversion.image_plt2cv
0.02s call     io_base/scikit_image_io.py::pybase.io_base.scikit_image_io.save_image
0.02s call     pandas_base/conversion.py::pybase.pandas_base.conversion.split_text_in_column
0.02s call     pandas_base/conversion.py::pybase.pandas_base.conversion.convert_cols_numeric_to_categorical
0.02s call     api/flask_basic.py::pybase.api.flask_basic.hello_user
0.02s call     io_base/pandas_io.py::pybase.io_base.pandas_io.read_from_sqlite
0.02s call     database/sqlite/insert_values.py::pybase.database.sqlite.insert_values.insert_row
0.02s call     image_base/conversion.py::pybase.image_base.conversion.image_pil2cv
0.02s call     database/sqlite/insert_values.py::pybase.database.sqlite.insert_values.insert_csv
0.02s teardown url_base/url_common.py::pybase.url_base.url_common.get_image_name
0.02s call     pandas_base/conversion.py::pybase.pandas_base.conversion.convert_related_cols_categorical_to_numeric
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.split_rows_by_condition
0.01s call     pandas_base/clean.py::pybase.pandas_base.clean.replace_nan
0.01s call     image_base/pil_transformation.py::pybase.image_base.pil_transformation.normalize_image
0.01s call     machine_learning/metrics.py::pybase.machine_learning.metrics.classification_metrics_binary_prob
0.01s call     image_base/pil_transformation.py::pybase.image_base.pil_transformation.resize_image
0.01s call     pandas_base/apply_functions.py::pybase.pandas_base.apply_functions.apply_function_elementwise_dataframe
0.01s call     io_base/dask_io.py::pybase.io_base.dask_io.save_csv
0.01s call     pandas_base/conversion.py::pybase.pandas_base.conversion.convert_cols_categorical_to_numeric
0.01s call     pandas_base/clean.py::pybase.pandas_base.clean.drop_duplicates
0.01s call     image_base/pil_transformation.py::pybase.image_base.pil_transformation.equalize_image
0.01s call     log_base/logger.py::pybase.log_base.logger.setup_logger
0.01s call     image_base/opencv_segmentation.py::pybase.image_base.opencv_segmentation.apply_mask_to_image
0.01s call     image_base/opencv_transformation.py::pybase.image_base.opencv_transformation.convert_to_binary
0.01s call     machine_learning/dataset_split.py::pybase.machine_learning.dataset_split.split_train_val_test
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.intersection
0.01s call     image_base/pil_transformation.py::pybase.image_base.pil_transformation.convert_to_grayscale
0.01s call     image_base/pil_transformation.py::pybase.image_base.pil_transformation.crop_image
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.get_random_number_of_rows
0.01s call     image_base/opencv_transformation.py::pybase.image_base.opencv_transformation.resize_image
0.01s call     io_base/fastparquet_io.py::pybase.io_base.fastparquet_io.read_file
0.01s call     image_base/opencv_transformation.py::pybase.image_base.opencv_transformation.convert_to_grayscale
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.set_value_where_condition
0.01s call     pandas_base/clean.py::pybase.pandas_base.clean.remove_nan
0.01s call     image_base/opencv_transformation.py::pybase.image_base.opencv_transformation.crop_image
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.select_any_cols_where_operation_on_value
0.01s call     machine_learning/metrics.py::pybase.machine_learning.metrics.classification_metrics_binary
0.01s call     pandas_base/clean.py::pybase.pandas_base.clean.drop_columns
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.get_random_fraction_of_rows
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.select_all_cols_where_operation_on_value
0.01s call     numpy_base/array_manipulation.py::pybase.numpy_base.array_manipulation.concatenate_arrays
0.01s call     io_base/fastparquet_io.py::pybase.io_base.fastparquet_io.save_file
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.set_value_where_multiple_condition
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.select_values_by_range
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.select_rows_where_list_equal
0.01s call     pandas_base/clean.py::pybase.pandas_base.clean.drop_rows
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.select_values_by_index
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.select_cols_with_nan
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.select_all_columns_except_some
0.01s call     pandas_base/apply_functions.py::pybase.pandas_base.apply_functions.apply_function_on_axis_dataframe
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.select_rows_where_value_equal
0.01s call     api/flask_basic.py::pybase.api.flask_basic.hello_world
0.01s call     numpy_base/array_evaluation.py::pybase.numpy_base.array_evaluation.array_difference
0.01s call     pandas_base/value_selection.py::pybase.pandas_base.value_selection.select_cols_without_nan

(544 durations < 0.005s hidden.  Use -vv to show these durations.)
============================================================================================== short test summary info ==============================================================================================
FAILED image_base/opencv_features.py::pybase.image_base.opencv_features.largest_contour
FAILED image_base/opencv_segmentation.py::pybase.image_base.opencv_segmentation.bounding_box
FAILED image_base/opencv_segmentation.py::pybase.image_base.opencv_segmentation.color_clustering_kmeans
FAILED io_base/opencv_io.py::pybase.io_base.opencv_io.read_image
FAILED io_base/pyspark_io.py::pybase.io_base.pyspark_io.read_csv_file
FAILED io_base/pyspark_io.py::pybase.io_base.pyspark_io.read_csv_folder
FAILED io_base/pyspark_io.py::pybase.io_base.pyspark_io.save_csv_file
FAILED io_base/pyspark_io.py::pybase.io_base.pyspark_io.save_csv_folder
FAILED io_base/pyspark_io.py::pybase.io_base.pyspark_io.save_csv_folder_1file
FAILED machine_learning/metrics.py::pybase.machine_learning.metrics.classification_metrics_multilabel
FAILED pyspark_base/spark_conf.py::pybase.pyspark_base.spark_conf.spark
ERROR io_base/azure_blob_io.py - ModuleNotFoundError: No module named 'azure'
ERROR numpy_base/benchmark.py - ModuleNotFoundError: No module named 'numba'
ERROR optimization/pytorch.py - ModuleNotFoundError: No module named 'torch'
ERROR optimization/tensorflow.py - ModuleNotFoundError: No module named 'tensorflow'
ERROR system/memory.py - ModuleNotFoundError: No module named 'numba'
ERROR system/system_info.py - ModuleNotFoundError: No module named 'numba'
=============================================================================== 11 failed, 199 passed, 8 warnings, 6 errors in 25.80s ========================================
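
The `classification_metrics_multilabel` failure in the log comes from passing `labels` as the third positional argument to `f1_score`. The error message indicates that, in the installed scikit-learn version, `labels` is keyword-only. A minimal sketch of the keyword call, reusing the doctest inputs (the variable names outside the doctest are illustrative):

```python
from sklearn.metrics import f1_score

# Same inputs as the failing doctest.
y_true = [0, 1, 2, 0, 1]
y_pred = [0, 1, 0, 1, 1]
labels = [0, 1, 2]

# Pass `labels` by keyword; the positional form raises the TypeError shown in the log.
m_f1 = f1_score(y_true, y_pred, labels=labels, average="weighted")
print(m_f1)
```

The same keyword form would presumably be needed for any other scikit-learn metric in `metrics.py` that receives `labels` positionally.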

uncompress tar

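import os
import tarfile

def extract_tar_gz(filename, path):
    # Bind the opened archive to `tar`; without `as tar` the name is undefined.
    with tarfile.open(filename, "r:gz") as tar:
        tar.extractall(path=path)
    return os.path.join(path, filename.split(".tar.gz")[0])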
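
One caveat with the return value: if `filename` contains a directory component, joining it onto `path` produces a misleading result (and an absolute `filename` overrides `path` entirely). Below is a hedged sketch that uses only the archive's base name. It assumes the archive unpacks into a top-level folder named after the archive, which the snippet above also seems to assume. The temporary files and names in the round trip are purely illustrative.

```python
import os
import tarfile
import tempfile


def extract_tar_gz(filename, path):
    """Extract a .tar.gz archive into `path` and return the expected output folder."""
    with tarfile.open(filename, "r:gz") as tar:
        tar.extractall(path=path)
    # Use only the archive's base name so directories in `filename`
    # do not leak into (or override) the returned path.
    folder_name = os.path.basename(filename).split(".tar.gz")[0]
    return os.path.join(path, folder_name)


# Round trip in a temporary directory (illustrative names only).
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "data")
    os.makedirs(src)
    with open(os.path.join(src, "a.txt"), "w") as f:
        f.write("hello")

    archive = os.path.join(tmp, "data.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, arcname="data")

    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    extracted = extract_tar_gz(archive, out_dir)
    extracted_ok = os.path.isfile(os.path.join(extracted, "a.txt"))
```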

BUG not possible to get the filename where the data is downloaded

def _save_image(raw_image, image_type, save_directory):
    os.makedirs(save_directory, exist_ok=True)
    extension = image_type if image_type else "jpg"
    file_name = str(uuid.uuid4().hex) + "." + extension
    save_path = os.path.join(save_directory, file_name)
    with open(save_path, "wb+") as image_file:
        image_file.write(raw_image)

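The file name is a random UUID instead of a name related to the downloaded file, and `_save_image` does not return the path it writes to, so the caller cannot locate the saved image.

Related to #17.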
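
One possible direction, sketched under assumptions: return the path that was written and let the caller optionally supply a meaningful name. The `file_stem` parameter and the public name `save_image` are hypothetical and not part of the existing code.

```python
import os
import tempfile
import uuid


def save_image(raw_image, image_type, save_directory, file_stem=None):
    """Write image bytes to disk and return the path that was written."""
    os.makedirs(save_directory, exist_ok=True)
    extension = image_type if image_type else "jpg"
    # `file_stem` is a hypothetical parameter; fall back to a random name if absent.
    stem = file_stem if file_stem else uuid.uuid4().hex
    save_path = os.path.join(save_directory, stem + "." + extension)
    with open(save_path, "wb") as image_file:
        image_file.write(raw_image)
    return save_path


# Usage with placeholder bytes in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(b"fake-bytes", "png", tmp, file_stem="cat_001")
    saved_name = os.path.basename(saved)
    saved_exists = os.path.isfile(saved)
```

Returning `save_path` keeps the random-name behaviour as the default, so existing callers would not need to change.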

cooccurrence

import numpy as np
import pandas as pd
from scipy import sparse

#df = pd.DataFrame({"users": [0,1,2,0,0,1], "items":[0,1,0,0,0,0]}) #ok
#df = pd.DataFrame({"users": [0,1,2,0,0,1], "items":[1,1,0,0,0,0]}) #ok
df = pd.DataFrame({"users": [0,1,1,0,0,1], "items":[0,1,1,0,0,0]})

print(df["users"].nunique())
print(df["items"].nunique())
df

user_item_hits = sparse.coo_matrix(
    (np.repeat(1, df.shape[0]), (df["users"], df["items"])),
    shape=(df["users"].nunique(), df["items"].nunique()),
).tocsr()
user_item_hits.toarray()

item_cooccurrence = user_item_hits.transpose().dot(user_item_hits)
item_cooccurrence.toarray()
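
A possible explanation, offered as an assumption since the issue does not state the expected result: repeated (user, item) pairs are summed when the COO matrix is converted to CSR, so the product counts interactions rather than distinct users. The sketch below compares the summed version with a binarized one on the same data. It uses plain NumPy arrays instead of the DataFrame and `max() + 1` for the shape, both of which are illustrative choices.

```python
import numpy as np
from scipy import sparse

# Same data as the third DataFrame above.
users = np.array([0, 1, 1, 0, 0, 1])
items = np.array([0, 1, 1, 0, 0, 0])

hits = sparse.coo_matrix(
    (np.ones(len(users), dtype=np.int64), (users, items)),
    shape=(users.max() + 1, items.max() + 1),
).tocsr()  # duplicate (user, item) entries are summed during this conversion

# Binarize: every stored entry becomes 1, i.e. "user interacted with item".
binary = hits.copy()
binary.data[:] = 1

counts_cooc = hits.T.dot(hits).toarray()      # weighted by repeated interactions
binary_cooc = binary.T.dot(binary).toarray()  # number of users sharing both items

print(counts_cooc)
print(binary_cooc)
```

With the binarized matrix, the diagonal gives the number of distinct users per item and the off-diagonal entries give the number of users who interacted with both items.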
