Comments (8)
@brylie Thanks for your feedback!
We have been actively developing the SFrame for over 4 years now. There are many reasons why we like SFrame.
- It's out of core, so you can work with really large datasets
- Lazy evaluation lets you work interactively even on really large datasets
- It's fast, and its compression techniques let you do more with less
- Because it is written in C++, we can support multiple languages in the future
- It's parallel
- Built-in visualization tools that can handle a lot of data (in a streaming way, so you can see your plots right away)
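To make the lazy-evaluation and out-of-core points above a bit more concrete, here is a toy sketch in plain Python. It is not how SFrame is implemented (SFrame is written in C++); `LazyColumn` and its methods are hypothetical names used only for illustration. The idea is that transformations are recorded rather than run, and values are streamed one at a time when a result is actually requested.

```python
import itertools
from typing import Callable, Iterable, Iterator, List, Optional


class LazyColumn:
    """A toy lazily evaluated column (hypothetical, not the SFrame API)."""

    def __init__(self,
                 source: Callable[[], Iterable[int]],
                 ops: Optional[List[Callable[[int], int]]] = None):
        self._source = source        # factory, so the data can be re-read
        self._ops = list(ops or [])  # pending element-wise transforms

    def apply(self, fn: Callable[[int], int]) -> "LazyColumn":
        # Returns immediately: the transform is recorded, no data is touched.
        return LazyColumn(self._source, self._ops + [fn])

    def __iter__(self) -> Iterator[int]:
        # Stream one value at a time so memory use does not grow with the data.
        for value in self._source():
            for fn in self._ops:
                value = fn(value)
            yield value

    def head(self, n: int = 5) -> List[int]:
        # Only the first n values are ever computed.
        return list(itertools.islice(self, n))

    def total(self) -> int:
        # Forces a full pass over the data, still one value at a time.
        return sum(self)


# A "large" column that is never materialized in memory.
col = LazyColumn(lambda: range(10_000_000))
scaled = col.apply(lambda x: x * 2).apply(lambda x: x + 1)
print(scaled.head(3))
```

Calling `head` here only evaluates the first three rows, which is why an interactive session can stay responsive even when the underlying data is large.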
We love the pandas project, have met with its creators and contributors many times, and have a lot of respect for it. For that reason, Pandas and SFrame are fully interoperable: to_dataframe converts an SFrame to a pandas DataFrame, and the SFrame constructor accepts a pandas DataFrame.
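A minimal sketch of that round trip, assuming both `turicreate` and `pandas` are installed. The helper name `pandas_roundtrip` and the sample data are hypothetical; the only library calls used are the `SFrame` constructor and `SFrame.to_dataframe()`.

```python
def pandas_roundtrip(df):
    """Convert a pandas DataFrame to an SFrame and back to pandas."""
    # Imported lazily so this snippet can be loaded without turicreate installed.
    import turicreate as tc

    sf = tc.SFrame(df)        # construct an SFrame from a pandas DataFrame
    return sf.to_dataframe()  # convert the SFrame back to a pandas DataFrame


if __name__ == "__main__":
    import pandas as pd

    # Illustrative data only.
    original = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
    print(pandas_roundtrip(original))
```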
What can help is the following:
- We can provide more clarification in the user guide on the differences between Pandas and SFrame
- Add a user guide chapter on inter-op between Pandas and SFrame
from turicreate.
@brylie To clarify re: SFrame project status, the SFrame codebase is still under active development; however, it's no longer developed or released as its own project. Development of SFrame has been folded into turicreate and is ongoing here in this repo.
Ah, it wasn't clear at first glance that SFrame is part of Turi Create. With such a tight relationship between SFrame and Turi Create, will there also be pandas support?
We can provide more clarification in the user guide on the differences between Pandas and SFrame
At the risk of drifting off topic, what is the overlap between SFrame and pandas, and what are significant differences?
For context, I am coming from a JavaScript background where there is a lot of churn and bikeshedding. I am concerned, in general, when there is duplication among open source projects where resources are somewhat scarce. One thing I have appreciated about Python data science tools is that ecosystem matters, meaning projects seem to build on common foundations, share consistent APIs, and attain higher levels of usability and abstraction than would otherwise be possible in a more fragmented environment.
Also, how does SFrame compare with the design goals and foundations of Dask? E.g.
Dask is composed of ... "Big Data" collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments.
Ah, it wasn't clear at first glance that SFrame is part of Turi Create. With such a tight relationship between SFrame and Turi Create, will there also be pandas support?
As I mentioned, SFrame and Pandas data frames are already deeply compatible. Having the APIs work with SFrames allows us to push a lot of optimizations into the model creation process that would not be possible with Pandas.
At the risk of drifting off topic, what is the overlap between SFrame and pandas, and what are significant differences?
I tried outlining the key differences above. The overlap in terms of functionality is pretty high. The SFrame is pretty full featured and we plan to keep building on it and adding more things as we find gaps. The APIs are pretty similar so it should be quite natural to use them.
Also, how does SFrame compare with the design goals and foundations of Dask? E.g.
Dask is another great project that we are aware of. It started around a year after the SFrame, and at its core is a task scheduler that can help scale up Pandas using pure Python primitives. It has some of the advantages of SFrame (parallelism, scale, interop with pandas, etc.) but not all of them (can be extended to multiple languages, lazy evaluation, compression, etc.).
@srikris thanks for your clarifications. It would be interesting to read a more thorough article outlining the above concepts, similarities, differences - as well as perhaps a plan to be good stewards of the data science ecosystem. Thanks for your innovative work :-)
I would just like to clarify that my intention here is not to diminish this project or the talent of its contributors.
For a slightly broader context, companies like Apple, Facebook, Google, and Microsoft exert a lot of influence over the developer community, with starry-eyed developers eager to try the latest offerings. In the sometimes competitive platform landscape (iOS, Android, and cloud services such as Azure, GCE, and AWS), as well as in an effort to attract talented developers, the large players often release open source platforms and frameworks to exert some leverage over the ecosystem. This is somewhat evidenced by the JavaScript frontend ecosystem, where many options with significantly overlapping purpose and design compete for adoption and mindshare. Competition comes at the cost of fragmentation and somewhat subverts efforts to standardize technologies in a vendor-neutral, cross-cutting manner (e.g. web standards).
I recognize that evolution takes diversity and redundancy. I am just concerned about varying motives (both pragmatic and competitive), and for the overall health and cooperation of the open source community.
How does SFrame relate to Apache Arrow? Might there be some parallel goals between Arrow and SFrame that might serve as a broader foundation for Turi Create? E.g.
The Arrow memory format supports zero-copy reads for lightning-fast data access without serialization overhead. It is also focused on supporting a wide variety of industry-standard programming languages. Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics.