Thanks for your feedback.
Are you able to submit your data so I can test this?
Attached
test-data.csv.zip
Thanks. Now you need to help me understand the code that produces the problem.
I've tried the following code and it seems ok:
const dataForge = require("data-forge");
require("data-forge-fs");
const df = await dataForge.readFile("./test-data.csv").parseCSV();
display(df.detectTypes());
Please let me know what changes you would make to that code to produce the problem.
I used Data-Forge Notebook to test that code.
This is what the result looks like in Data-Forge Notebook:
const { fromCSV } = require("data-forge");
// csvInput holds the raw CSV text of test-data.csv
const dataFrame = fromCSV(csvInput, { dynamicTyping: true });
console.log(dataFrame.detectTypes().toString());
__index__  Type    Frequency  Column
---------  ------  ---------  ------
0          string  8.1        phone
1          object  91.9       phone
const updatedDf = dataFrame.transformSeries({ ['phone']: value => value && `+1 ${value}` });
console.log(updatedDf.detectTypes().toString());
__index__  Type       Frequency  Column
---------  ---------  ---------  ------
0          string     8.1        phone
1          undefined  91.9       phone
Seems like there's a discrepancy between how types are evaluated in fromCSV and after a transformSeries. I have also tried something like:
dataFrame.transformSeries({ ['phone']: value => value ? `+1 ${value}` : null });
I think your use of transformSeries is wrong.
I suspect it should look like this:
dataFrame.transformSeries({ phone: value => value ? `+1 ${value}` : null });
Also I'm not really sure why you are outputting null as value.
Typically Data-Forge treats undefined as an 'empty' value. But I don't think it does anything special with null.
I think the main problem here is that PapaParse (the CSV parser that Data-Forge uses) is returning null values for empty fields when dynamicTyping is enabled. I might be able to fix this by making Data-Forge treat nulls the same as undefined.
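This may also explain the "object" rows in the detectTypes output earlier in the thread: in JavaScript, typeof null is "object", so a column of mostly empty (null) fields would plausibly be reported that way. The sketch below is a simplified, hypothetical stand-in for dynamic typing, assuming empty fields become null as described above; dynamicType and countTypes are illustrative names, not PapaParse or Data-Forge APIs.

```javascript
// Simplified stand-in for dynamic typing (NOT PapaParse's real code):
// an empty field becomes null, numeric-looking text becomes a number,
// and anything else stays a string.
function dynamicType(field) {
    if (field === "") {
        return null; // empty field -> null, as described in the thread
    }
    const asNumber = Number(field);
    return Number.isNaN(asNumber) ? field : asNumber;
}

// Tally the JavaScript type of every value using typeof.
function countTypes(values) {
    const counts = {};
    for (const value of values) {
        const type = typeof value; // note: typeof null === "object"
        counts[type] = (counts[type] || 0) + 1;
    }
    return counts;
}

// Hypothetical phone column where most fields are empty.
const rawPhones = ["555-0100", "", "", "555-0199", ""];
const typed = rawPhones.map(dynamicType);
console.log(typed);             // [ '555-0100', null, null, '555-0199', null ]
console.log(countTypes(typed)); // { string: 2, object: 3 }
```

The nulls are counted as "object" purely because of how typeof treats null, which matches the shape of the detectTypes output above.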
I need to think this through.
For now, though, you might use this workaround: replace null values with undefined, and it might work the way you expect.
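A minimal sketch of that workaround, assuming the rows are available as plain objects (for example from dataFrame.toArray()). nullsToUndefined is a hypothetical helper name, not part of Data-Forge.

```javascript
// Hypothetical helper: copy each row, replacing every null field with
// undefined so that empty CSV fields look "empty" to Data-Forge.
// Assumption: rows are plain objects keyed by column name.
function nullsToUndefined(rows) {
    return rows.map(row => {
        const cleaned = {};
        for (const [column, value] of Object.entries(row)) {
            cleaned[column] = value === null ? undefined : value;
        }
        return cleaned;
    });
}

// Hypothetical rows as they might look after dynamicTyping.
const rows = [
    { city: "Springfield", phone: 5550100 },
    { city: "Shelbyville", phone: null },
];
console.log(nullsToUndefined(rows));
```

The same per-value mapping could presumably be applied to a single column with transformSeries, e.g. dataFrame.transformSeries({ phone: value => value === null ? undefined : value }), rather than rebuilding every row.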
In the meantime I'll see if I can fix this properly.
@smohiuddin Just letting you know I've just published Data-Forge version 1.7.7 and it now treats null values like it does undefined. This should solve your problem.
Please try it out and let me know if it works for you.
Thanks for this fix. After further investigation, it appears that Data-Forge only checks the first element of a column to determine its type, rather than using the distribution reported by detectTypes.
I am attaching the same data set as before, except that I added a string as the first element of the zip column and a number as the first element of the city state column.
Data-Forge now detects these column types as string and number, respectively.
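To illustrate the difference being described, here is a library-free sketch contrasting a type guessed from the first non-empty value with one guessed from the most common type across all rows. It is not Data-Forge's implementation; the function names and the sample zip values are hypothetical.

```javascript
// Treat null and undefined as "empty" and skip them when guessing.
function isEmpty(value) {
    return value === null || value === undefined;
}

// Cheap guess: the first non-empty value decides the whole column.
function typeFromFirstValue(values) {
    const first = values.find(value => !isEmpty(value));
    return first === undefined ? "undefined" : typeof first;
}

// Thorough guess: count every type and take the most common one.
// This reads every row, which is why it costs more on large data sets.
function typeFromAllValues(values) {
    const counts = new Map();
    for (const value of values) {
        if (isEmpty(value)) continue;
        const type = typeof value;
        counts.set(type, (counts.get(type) || 0) + 1);
    }
    let best = "undefined";
    let bestCount = 0;
    for (const [type, count] of counts) {
        if (count > bestCount) {
            best = type;
            bestCount = count;
        }
    }
    return best;
}

// Hypothetical zip column whose first entry is a string.
const zip = ["N/A", 90210, 10001, 60601];
console.log(typeFromFirstValue(zip)); // prints: string
console.log(typeFromAllValues(zip));  // prints: number
```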
Ah, well that's something you can already change!
When you construct your dataframe you should use the considerAllRows option:
const dataframe = new DataFrame({ values: [...], considerAllRows: true });
The reason it doesn't do that by default is because it can be quite expensive to compute for large data sets!
Makes sense!
What's the best way to do that when using readFile or parseCSV, since those options aren't available there?
I was about to say you shouldn't have this problem if you are using readFile/parseCSV, but then I realised it's again related to the use of dynamicTyping.
You might be best to remove the dynamicTyping field, because it doesn't seem to play nicely with your data (this is a feature of PapaParse and I don't have much control over it).
If you remove it, you can type your data manually.
See the example on the landing page, where it calls parseInts and parseDates to manually parse the data types of columns.
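As a library-free illustration of manual typing: without dynamicTyping every field stays a string, and the caller names the columns to convert, which is the idea behind parseInts and parseDates. parseColumns, the "int"/"date" labels and the sample columns are all hypothetical.

```javascript
// Hypothetical helper: convert only the named columns of string rows.
// columnTypes maps a column name to "int" or "date"; other columns are untouched.
function parseColumns(rows, columnTypes) {
    return rows.map(row => {
        const parsed = { ...row };
        for (const [column, kind] of Object.entries(columnTypes)) {
            const text = row[column];
            if (text === undefined || text === "") {
                parsed[column] = undefined; // keep empty fields empty
            } else if (kind === "int") {
                parsed[column] = parseInt(text, 10);
            } else if (kind === "date") {
                parsed[column] = new Date(text);
            }
        }
        return parsed;
    });
}

// Hypothetical string rows, as a CSV parser returns them without dynamicTyping.
const stringRows = [
    { id: "17", signupDate: "2019-03-01", city: "Beverly Hills" },
    { id: "", signupDate: "2019-04-15", city: "Chicago" },
];
const typedRows = parseColumns(stringRows, { id: "int", signupDate: "date" });
console.log(typedRows[0].id); // prints: 17
```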
The issue is that we want to auto-detect the types, as we may not know them in advance. Can the Data-Forge constructor accept the output of readFileSync?
This is a change that could definitely be made. Would you like to have a go at it? Be great to have you as a contributor.
I think you would need to make a change to the fromCSV function in Data-Forge and the parseCSV function in Data-Forge FS.
If you are interested in doing this maybe we can discuss it more in the Data-Forge Gitter channel?
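One possible user-land approach to auto-detection, without changing the library, is to parse without dynamicTyping so every field stays a string, then infer each column's type from all of its rows. The sketch below is hypothetical (autoType and looksNumeric are illustrative names) and only distinguishes numbers from strings.

```javascript
// A value counts as numeric if it is non-blank and parses to a finite number.
function looksNumeric(text) {
    return text.trim() !== "" && Number.isFinite(Number(text));
}

// Infer types from ALL rows: a column is numeric only if every non-empty
// value looks numeric. Empty strings become undefined rather than null.
function autoType(rows) {
    const columns = new Set(rows.flatMap(row => Object.keys(row)));
    const numericColumns = new Set();
    for (const column of columns) {
        const nonEmpty = rows
            .map(row => row[column])
            .filter(text => text !== undefined && text !== "");
        if (nonEmpty.length > 0 && nonEmpty.every(looksNumeric)) {
            numericColumns.add(column);
        }
    }
    return rows.map(row => {
        const typed = {};
        for (const column of columns) {
            const text = row[column];
            if (text === undefined || text === "") {
                typed[column] = undefined;
            } else {
                typed[column] = numericColumns.has(column) ? Number(text) : text;
            }
        }
        return typed;
    });
}

// Hypothetical string rows.
const raw = [
    { population: "3990456", cityState: "Los Angeles, CA" },
    { population: "", cityState: "Chicago, IL" },
    { population: "2320268", cityState: "Houston, TX" },
];
const typedRows = autoType(raw);
console.log(typedRows.map(row => typeof row.population)); // [ 'number', 'undefined', 'number' ]
```

One caveat with any "looks numeric" rule: codes such as zip codes or phone numbers can pass it, and converting them to numbers drops leading zeros, so such columns may need to be excluded explicitly.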
Closing due to no activity. Please reopen if this issue still needs attention.