absaoss / fixed-width
License: Apache License 2.0
Improve or add code coverage support to be able to measure code quality.
New ability to measure current code coverage as one of the QA metrics.
Add a GitHub Action to check the coverage of changed files.
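One possible setup is sketched below. It assumes the sbt-scoverage plugin has already been added to project/plugins.sbt (no plugin version is specified here). With these settings, the coverage report step fails when total statement coverage drops below a threshold. The value 80 is a placeholder, not a threshold the project has chosen.

```scala
// build.sbt fragment (sketch; assumes the sbt-scoverage plugin is enabled)
// Make the coverageReport task fail if total statement coverage is below the minimum.
coverageMinimumStmtTotal := 80 // placeholder threshold, in percent
coverageFailOnMinimum := true
```

Coverage would then be collected with sbt clean coverage test coverageReport. A changed-files check would be a separate CI step that reads the generated report, for example in a GitHub Actions workflow.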
Add support for Spark 3.2.1.
If a special column, whose name is specified by the Spark setting spark.sql.columnNameOfCorruptRecord, exists in the schema, it is used to log parsing errors.
If a column with the name specified in spark.sql.columnNameOfCorruptRecord exists in the schema, use it for logging parsing errors.
Related to and dependent on #1
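Below is a minimal, Spark-free sketch of the idea. Column, parseLine and the Map-based row are hypothetical helpers, not the data source's real API. When any field fails to convert, every data column is set to null. The raw line is kept only if the schema contains the corrupt-record column; Spark's default name for that column is _corrupt_record.

```scala
// Sketch only: Column, parseLine and the Map-based row are hypothetical,
// not the fixed-width data source's actual classes.
object CorruptRecordSketch {
  // A column definition: name, start offset, width, and a converter that may throw.
  final case class Column(name: String, start: Int, width: Int, convert: String => Any)

  // Parses one fixed-width line. On any conversion failure, every data column is
  // set to null and, if the schema has the corrupt-record column, the raw line goes there.
  def parseLine(line: String,
                columns: Seq[Column],
                schemaFields: Set[String],
                corruptColumn: String = "_corrupt_record"): Map[String, Any] = {
    try {
      val values: Map[String, Any] = columns.map { c =>
        val raw = line.substring(c.start, math.min(c.start + c.width, line.length))
        c.name -> c.convert(raw.trim)
      }.toMap
      if (schemaFields.contains(corruptColumn)) values + (corruptColumn -> null) else values
    } catch {
      case _: Exception =>
        val nulls: Map[String, Any] = columns.map(c => c.name -> (null: Any)).toMap
        if (schemaFields.contains(corruptColumn)) nulls + (corruptColumn -> line) else nulls
    }
  }
}
```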
If the value of the compulsory width metadata is not a string, it is not recognized. An integer value should be accepted too.
Define a schema where the width metadata value of a field is an integer, not a string (e.g. 5 instead of "5"). Processing fails with an exception: "Unable to parse metadata: width of column:..."
An integer definition of width is accepted.
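The sketch below shows a tolerant width parser. It assumes the metadata lookup has already produced a raw value and does not call Spark's Metadata API. Spark metadata stores JSON integers as Long, so Long is handled alongside Int and numeric strings. The error text only mirrors the issue's wording and is illustrative.

```scala
import scala.util.Try

// Sketch: `raw` stands for the value read from a field's "width" metadata entry.
object WidthMetadataSketch {
  def parseWidth(column: String, raw: Any): Int = {
    val width: Option[Int] = raw match {
      case i: Int                  => Some(i)
      case l: Long if l.isValidInt => Some(l.toInt) // integral JSON values arrive as Long
      case s: String               => Try(s.trim.toInt).toOption
      case _                       => None
    }
    // A width must be a positive number of characters.
    width.filter(_ > 0).getOrElse(
      throw new IllegalArgumentException(s"Unable to parse metadata: width of column: $column"))
  }
}
```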
Options are passed to Spark data sources as key-value pairs. It is very easy to make a typo in an option name. If the data source has a default behaviour for this option (e.g. the option is not mandatory), the mistyped option is not recognized and is silently ignored.
It would be very useful if the data source could detect unrecognized options.
Add an option, .option("pedantic", true), which, when enabled, fails the Spark application if at least one unrecognized option is passed to the source.
As an inspiration for a possible implementation, Cobrix uses a wrapper around the incoming options. All configuration is read from that wrapper instead of the original configuration. The wrapper records which options are queried at least once, and at any point it can return the list of options that haven't been used.
An advantage of such an implementation is that dependencies between options (e.g. an option that is only used when some other option is enabled) are tracked automatically.
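Below is a minimal sketch of the option-tracking wrapper described above. TrackingOptions, unusedKeys and validate are hypothetical names, not Cobrix's actual classes. Spark commonly treats data source option keys case-insensitively, so the sketch lower-cases them.

```scala
import scala.collection.mutable

// Wraps the raw key-value options and remembers every key that was looked up.
final class TrackingOptions(options: Map[String, String]) {
  private val normalized: Map[String, String] =
    options.map { case (k, v) => k.toLowerCase -> v }
  private val accessed = mutable.Set.empty[String]

  def get(key: String): Option[String] = {
    val k = key.toLowerCase
    accessed += k // recorded even if the option is absent
    normalized.get(k)
  }

  def getOrElse(key: String, default: => String): String = get(key).getOrElse(default)

  // Keys that were passed in but never queried by the reader.
  def unusedKeys: Set[String] = normalized.keySet -- accessed

  // In pedantic mode, fail if any passed option was never read.
  def validate(pedantic: Boolean): Unit =
    if (pedantic && unusedKeys.nonEmpty)
      throw new IllegalArgumentException(
        s"Unrecognized option(s): ${unusedKeys.toSeq.sorted.mkString(", ")}")
}
```

The reader would query every option through the wrapper and call validate only after all configuration has been read. That way, options that are consulted only conditionally are counted correctly.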
Description
build.sbt is not ready for cross-compiled, signed releases. Figure out the best way to do it; reuse the approach from Cobrix.
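A possible shape for the cross-building part is shown below. It assumes the sbt-pgp plugin handles signing. The Scala versions are placeholders rather than the project's chosen matrix, and Cobrix's actual build may be organized differently.

```scala
// build.sbt fragment (sketch): versions are placeholders
lazy val scala212 = "2.12.15"
lazy val scala213 = "2.13.8"

ThisBuild / crossScalaVersions := Seq(scala212, scala213)
ThisBuild / scalaVersion := scala212

// Spark is supplied by the cluster at runtime, so it is marked Provided and not bundled.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.1" % Provided
```

With sbt-pgp enabled, sbt +publishSigned runs the signed publish once for each Scala version listed in crossScalaVersions.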
When dateFormat is specified, it represents either a date or a date + time. This effectively makes it impossible to have both timestamp and date columns within the same file.
Add a timestampFormat parameter, used similarly to dateFormat, and make each of them apply exclusively to timestamp and date columns respectively.
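The sketch below illustrates the proposed separation. The column-type tags are hypothetical stand-ins for Spark's DateType and TimestampType. java.time patterns stand in for Spark's datetime patterns, which are similar but not guaranteed identical.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter

// Hypothetical column-type tags; a real reader would inspect the Spark schema.
sealed trait TemporalType
case object DateCol extends TemporalType
case object TimestampCol extends TemporalType

object TemporalParsingSketch {
  // dateFormat applies only to date columns and timestampFormat only to timestamp
  // columns, so one file can contain both kinds with different patterns.
  def parse(value: String, tpe: TemporalType, dateFormat: String, timestampFormat: String): Any =
    tpe match {
      case DateCol      => LocalDate.parse(value.trim, DateTimeFormatter.ofPattern(dateFormat))
      case TimestampCol => LocalDateTime.parse(value.trim, DateTimeFormatter.ofPattern(timestampFormat))
    }
}
```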
Spark has the mode setting that describes the reader's behavior in case of a parsing error.
Reflect the PERMISSIVE and FAILFAST settings in case of parsing errors.
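The sketch below shows one way the two modes could diverge when a field fails to convert. ParseMode and convertRow are hypothetical names. Here "best effort" is assumed to mean keeping the fields that converted and nulling the ones that did not; Spark's built-in sources also report errors through their own exception types, which this sketch does not reproduce.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical mode tags mirroring Spark's PERMISSIVE and FAILFAST.
sealed trait ParseMode
case object Permissive extends ParseMode
case object FailFast extends ParseMode

object ParseModeSketch {
  // Converts each raw field. On error, FAILFAST throws immediately, while PERMISSIVE
  // keeps the fields that did convert and replaces the failing ones with null.
  def convertRow(fields: Seq[String], converters: Seq[String => Any], mode: ParseMode): Seq[Any] =
    fields.zip(converters).map { case (raw, convert) =>
      Try(convert(raw)) match {
        case Success(v) => v
        case Failure(e) =>
          mode match {
            case FailFast   => throw new IllegalStateException(s"Malformed field '$raw'", e)
            case Permissive => null
          }
      }
    }
}
```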
FAILFAST - in case of a parsing error, the process immediately fails with an exception
PERMISSIVE - the parsing continues with best-effort results
Add code coverage support to be able to measure code quality.
New ability to measure current code coverage as one of the QA metrics.
Values in fixed-width format columns often have trailing spaces, which can prevent parsing of non-string types.
Add support for a column metadata setting that controls trimming of the column values, overriding the global setting.
Suggested metadata field name: trim.
Accepted type: boolean, or a string convertible to boolean.
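Below is a minimal sketch of resolving the proposed trim metadata. The metadata lookup is abstracted as an Option[Any], and resolveTrim and fieldValue are hypothetical names. Rejecting unparseable strings is an assumed design choice; the issue does not specify it. String.trim removes leading as well as trailing spaces, which may be broader than the issue strictly needs.

```scala
object TrimMetadataSketch {
  // Resolves the per-column "trim" metadata value: a Boolean, or a String that
  // reads as a boolean. A missing value falls back to the global setting.
  def resolveTrim(metadataValue: Option[Any], globalTrim: Boolean): Boolean =
    metadataValue match {
      case None             => globalTrim
      case Some(b: Boolean) => b
      case Some(s: String) =>
        s.trim.toLowerCase match {
          case "true"  => true
          case "false" => false
          case other   => throw new IllegalArgumentException(s"Invalid trim value: '$other'")
        }
      case Some(other) =>
        throw new IllegalArgumentException(s"Unsupported trim value type: $other")
    }

  // Applies the resolved setting to a raw fixed-width field value.
  def fieldValue(raw: String, trim: Boolean): String = if (trim) raw.trim else raw
}
```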