typecobolteam / typecobol

TypeCobol is an Incremental Cobol parser for IBM Enterprise Cobol 6 for zOS syntax. TypeCobol is also an extension of Cobol 85 language which can then be converted to Cobol85.

License: Other

C# 80.40% ANTLR 9.10% COBOL 10.39% Smalltalk 0.01% C++ 0.04% PHP 0.01% PowerShell 0.04%
cobol parser transpiler cobol85 incremental-cobol-parser languageserver

typecobol's Introduction

TypeCobol

TypeCobol is two things:

  • An open source Cobol 85 incremental parser (+Typedef of Cobol 2002)
  • An extension of Cobol 85 language (named TypeCobol) which can then be converted to Cobol 85
    • Like TypeScript with JavaScript

Open source parser

Our parser is based on IBM Enterprise Cobol 5.1 for zOS syntax. We'll certainly implement IBM Enterprise Cobol 6 in 2018/19.

This parser can be used :

Extension of Cobol 85 syntax

TypeCobol extends Cobol 85 with the following features:

  • Type mechanism (like TypeDef of Cobol 2002)
    • TypeCobol comes with intrinsic types: Boolean, Date, ...
  • Procedures:
    • A procedure looks like a nested program but has a shorter syntax, and its parameters are clearly categorized as input, in-out or output
    • The arguments passed by the caller must match the procedure signature
    • Procedure overloading is supported
  • Operator ::, which qualifies a variable starting from the topmost variable
    • Same behavior as the OF and IN operators, but you have to start with the parent variable
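
To illustrate how the :: operator relates to OF, here is a minimal Python sketch that rewrites a top-down :: chain as a bottom-up OF chain. It is not TypeCobol's actual Codegen; the function name and the sample variable names are hypothetical.

```python
def to_cobol85_qualification(name: str) -> str:
    """Rewrite a top-down `::` chain as a bottom-up COBOL `OF` chain.

    Hypothetical helper for illustration only, not part of TypeCobol.
    """
    parts = [part.strip() for part in name.split("::")]
    if any(not part for part in parts):
        raise ValueError(f"empty segment in qualified name: {name!r}")
    # `::` starts with the parent, OF starts with the child, so reverse.
    return " OF ".join(reversed(parts))


print(to_cobol85_qualification("Customer::Address::City"))
# City OF Address OF Customer
```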

TypeCobol code is then translated to Cobol 85 compliant with IBM Enterprise Cobol 5.1 for zOS syntax.

Integration with IDE

We provide minimal integration with RDZ.

We also have an integration with RDZ and our LanguageServer. This is still a work in progress and the RDZ plugin is currently private. Maybe this will change in the future.

The LanguageServer allows us to provide:

  • Errors in real time as you type your code
  • Code completion for Type, variables, procedures and operator ::
  • Go to a definition of a variable

Project status and documentation

This project is currently maintained by 4 people, and our company has been using it since July 2017.

The documentation is still very limited. If you are interested don't hesitate to contact us so we can give you more information.

Architecture overview

Visual Studio solution and projects

The best way to test this project is to download and install both tools (for free) on your local machine, log in to GitHub from Visual Studio Team Explorer, then refresh this page and click the Open in Visual Studio button, which should appear on the right of the repository. This action clones the solution into your local Git repository and opens it in Visual Studio.

The solution contains these projects :

  • TypeCobol is the main project, it implements a complete Cobol compiler front-end
  • TypeCobol.Test provides unit tests which can be launched from the Test Explorer in Visual Studio
  • Codegen provides the mechanism to transform TypeCobol code to Cobol 85
  • TypeCobol.LanguageServer implements the LanguageServer protocol
  • TypeCobol.Transform is useful to store the TypeCobol source code and the generated Cobol85 code into one single file.

Dependencies on third party libraries

The following libraries are included in the Visual Studio projects by the NuGet package manager:

  • ANTLR 4 : The C# target of the ANTLR 4 parser generator for Visual Studio 2010+ projects.

  • ANTLR 4 Runtime : The runtime library for parsers generated by the C# target of ANTLR 4. This package supports projects targeting .NET 2.0 or newer, and built using Visual Studio 2008 or newer.

  • Reactive Extensions - Main Library : Reactive Extensions Main Library combining the interfaces, core, LINQ, and platform services libraries. The Reactive Extensions (Rx) is a library to compose asynchronous and event-based programs using observable collections and LINQ-style query operators.

  • System.Collections.Immutable : This package provides collections that are thread safe and guaranteed to never change their contents, also known as immutable collections. Like strings, any methods that perform modifications will not change the existing instance but instead return a new instance. For efficiency reasons, the implementation uses a sharing mechanism to ensure that newly created instances share as much data as possible with the previous instance while ensuring that operations have a predictable time complexity.

typecobol's People

Contributors

brochato, cobarzanc, collarbe, delevoye, delmasgu, efr15, fm-117, grespise, laabidihend, lanoydo, laurentprudhon, maxime645, mayanje, osmedile, parandba, prudholu, reydelpa, rooksdo, smedilol, tenember, vavans, wiztigers

typecobol's Issues

DEBUG MODE

Each COBOL program has to be valid whether or not the DEBUG flag is active.
Such a DEBUG flag should be added to each token in the first phase and checked in the second phase.

Beginner's Guide

Create a wiki page containing a crash course on github and the project.

ON ERROR

Is the following syntax valid COBOL ?

READ fileName
   NOT AT END
      * imperative statements *
   AT END
      * imperative statements *
   AT END
      * imperative statements *
   NOT AT END
      * imperative statements *

The spec says AT? END must come first (if present), that there must be only one, and that it can be followed by at most one NOT AT? END.
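
The rule as read above can be sketched as a small check over the sequence of end-clauses. This is a Python illustration only, not the project's grammar or checker; the function name and the string labels are assumptions.

```python
def check_read_clauses(clauses):
    """Return error messages for a sequence of READ end-clauses.

    `clauses` holds "AT END" / "NOT AT END" strings in source order.
    Hypothetical helper: it encodes the reading of the spec given in
    this issue, nothing more.
    """
    errors = []
    if clauses.count("AT END") > 1:
        errors.append("AT END may appear only once")
    if clauses.count("NOT AT END") > 1:
        errors.append("NOT AT END may appear only once")
    if "AT END" in clauses and "NOT AT END" in clauses:
        # Compare the first occurrence of each clause.
        if clauses.index("NOT AT END") < clauses.index("AT END"):
            errors.append("AT END must come before NOT AT END")
    return errors


# The sequence from the question violates all three constraints.
for message in check_read_clauses(["NOT AT END", "AT END", "AT END", "NOT AT END"]):
    print(message)
```

Under this reading, the READ statement in the question would be rejected, while a lone AT END, a lone NOT AT END, or AT END followed by NOT AT END would pass.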

Upgrade test engine

  • One must be able to just put a .cbl source file somewhere under TypeCobol.Test/Compiler/Parser/Samples and a .txt expected results file somewhere under TypeCobol.Test/Compiler/Parser/ResultFiles.
  • The unit test engine should find the two files (provided the result file has the same filename without extension as the source file), parse the .cbl file and compare it against the contents of the .txt file.
  • The contents of the expected result files should not be too hard to write, and should be as readable as possible. In particular, expected results should be written/read sequentially, line by line.
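
The discovery and comparison steps described above could look roughly like the following Python sketch. The real test engine is C#; the function names here are hypothetical, and the directory arguments stand in for the Samples and ResultFiles folders.

```python
from pathlib import Path


def pair_samples(samples_root, results_root):
    """Pair every .cbl file with the .txt file that shares its base name.

    Both trees are searched recursively, so files may sit anywhere below
    the two roots. Returns (pairs, sources_without_expected_result).
    """
    results = {path.stem: path for path in Path(results_root).rglob("*.txt")}
    pairs, missing = [], []
    for source in sorted(Path(samples_root).rglob("*.cbl")):
        expected = results.get(source.stem)
        if expected is None:
            missing.append(source)
        else:
            pairs.append((source, expected))
    return pairs, missing


def first_difference(actual_lines, expected_lines):
    """Return the 1-based number of the first differing line, or None."""
    for number, (actual, expected) in enumerate(
        zip(actual_lines, expected_lines), start=1
    ):
        if actual != expected:
            return number
    if len(actual_lines) != len(expected_lines):
        # One side ran out first: the difference is the next line.
        return min(len(actual_lines), len(expected_lines)) + 1
    return None
```

Comparing line by line keeps failure reports short: the test can print only the first differing line number instead of dumping both files.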

Memory leak on syntax errors

I was testing ADD statements that should be in error. I tested just writing "ADD" in my test file, ran the tests, and got an OutOfMemoryException after a few moments.

This seems to happen each time I write a line that is unsupported by the grammar.
Parsing of multi-line instructions that are syntactically correct does terminate, though.

Worth investigating ! ಠ_ಠ

.dll Purity Check

Check that TypeCobol binaries contain neither disk access nor network access.

MnemonicForEnvironmentName can use the same name as an EnvironmentName

In Cobol, it's possible to define the mnemonic SYSIN which refers to environmentName SYSOUT.
The parser (CodeElementBuilder) can't know whether the name encountered after the keyword "upon" in a display statement refers to a MnemonicForEnvironmentName or an EnvironmentName.

The current code assumes the opposite.

"Invisible" exceptions

If a statement to be tested throws an exception somewhere in the parser (currently: NotImplementedExceptions), it is not directly seen by the test runner.
For example : you have a clean test, you add a statement at the end that provokes an exception, but your test will be a success nevertheless.

Identifiers beginning with digits

BLOCKS ISSUE #23

Seen in real-life code: PERFORM 0000-INIT-STD, as the first line of a PROCEDURE DIVISION.
Are identifier names allowed to begin with digits ?
If yes, in which context(s) ? Always ?

Error if 2 or more consecutive line continuations.

In fixed length format COBOL source files, each line with a hyphen in the indicator area (column 7) is appended to the previous one. When two such lines are consecutive, parsing fails with an extraneous input error.

For example, these lines provoke the error :

061020                 MOVE 'Lorem ipsum dolor sit amet, consectetur adi0000000
061030-               'piscing elit, sed do eiusmod tempor incididunt ut0000000
061040-               'labore et dolore magna aliqua                 '  0000000
061050                                          TO  SOMEWHERE           0000000
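
For reference, the expected merging of consecutive continuation lines can be sketched in Python as follows. This is not TypeCobol's scanner. It assumes the standard fixed-format layout (columns 1-6 sequence number, column 7 indicator, columns 8-72 program text) and handles only the alphanumeric-literal case shown above; the function name is hypothetical.

```python
def merge_continuations(lines):
    """Merge fixed-format continuation lines into logical lines.

    Simplified sketch: only literal continuation is handled. A continued
    literal runs up to column 72 of the previous line, and the
    continuation line resumes right after its reopening quote.
    """
    logical = []
    for raw in lines:
        padded = raw.ljust(72)
        indicator = padded[6]      # column 7
        text = padded[7:72]        # columns 8-72
        if indicator == "-" and logical:
            continuation = text.lstrip()
            if continuation[:1] in ("'", '"'):
                continuation = continuation[1:]  # drop the reopening quote
            # Appending to the last logical line works for any number
            # of consecutive continuation lines.
            logical[-1] += continuation
        else:
            logical.append(text)
    return logical
```

Because each continuation line is appended to the last logical line rather than to the last physical line, two or more consecutive continuation lines need no special case.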

Orphan END-STATEMENT

BLOCKS ISSUE #23

It seems one can find statement scope terminators (END-IF, END-ADD and so on) anywhere in the code.
I've seen real-life examples of IF condition THEN statements END-IF END-IF, and the program compiled. I suppose the "orphaned" statement terminator is treated like a CONTINUE statement and has thus no effect.

However, is it possible to find such an orphan statement in the middle of another statement ?
For example between two nested statements, like:
IF condition THEN statement1 END-ADD statement2 END-IF
Or even in the middle of a statement, like:
ADD x TO y END-IF GIVING z

Intrinsic functions

Implement all intrinsic functions.
They may have to be sorted by return type to allow proper substitution as identifiers.

For example, these do not pass:

MOVE FUNCTION RANDOM (x) TO x.
MOVE FUNCTION CURRENT-DATE TO x.
MOVE FUNCTION COS(x) TO x.
MOVE FUNCTION MAX ( x y z ) TO x.

These DO pass, but I'm afraid it's because they are reserved words:

MOVE FUNCTION WHEN-COMPILED TO x.
MOVE FUNCTION RANDOM TO x.

Newline in fixed length COBOL source files

Seen in a production source file : some lines contain newline characters (although normally these cannot exist in such sources, it seems this one was manually/hexadecimally edited).
This ruins the indexing, and results in missing PeriodSeparator errors on the first "half" of the line (the second "half" being after the newline character).

Unit Tests must be locale invariant

Whereas the compiler should output messages in the user's locale, we cannot allow unit tests to depend on their execution environment. In particular, current scanner tests compare numbers, separators and similar values against their string equivalents in the French locale.

All tests should run in an invariant culture, so their results are predictable.
Hint: add Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture; before any test runs.

Conditional Statement > Imperative Statement

p280:

A DELIMITED SCOPE statement uses an explicit scope terminator to turn a conditional statement into an imperative statement. The resulting imperative statement can then be nested. (...) Unless explicitly specified otherwise, a delimited scope statement can be specified wherever an imperative statement is allowed by the rules of the language.

Since 65d0242 however, conditional statements are allowed as imperative statements without restriction.
This should be fixed.

Identifiers

Because they are so widely used everywhere in the grammar, identifiers must be properly parsed.
These are not CodeElements, but nodes of our symbol tree that have a lot of other nodes (subscripts, reference modifiers, ...) under them.

Two-steps parsing: CodeElements > Instructions

Parsing is done in two steps:

  • Firstly, only unit CodeElements are recognized, thus the following are individually matched:
    • instruction start
    • condition clause
    • imperative instructions to be executed
    • instruction end
  • Secondly, unit CodeElements are grouped to build the whole COBOL instruction.

I have lost some of this separation of responsibilities during my work of the last few weeks.
Let's clean up things.
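
The intended separation can be sketched in Python as below. This is a toy illustration, not the project's C# implementation: it handles only IF ... END-IF, takes pre-split text fragments as input, and all class and function names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodeElement:
    kind: str   # "START", "CONDITION", "STATEMENT" or "END"
    text: str


@dataclass
class Instruction:
    start: CodeElement
    children: List[object] = field(default_factory=list)
    end: Optional[CodeElement] = None


def recognize(fragments):
    """Step 1: classify each fragment on its own, without any grouping."""
    elements, previous_kind = [], None
    for text in fragments:
        if text == "IF":
            kind = "START"
        elif text == "END-IF":
            kind = "END"
        elif previous_kind == "START":
            kind = "CONDITION"
        else:
            kind = "STATEMENT"
        elements.append(CodeElement(kind, text))
        previous_kind = kind
    return elements


def group(elements):
    """Step 2: nest the unit CodeElements into whole instructions."""
    root, stack = [], []
    for element in elements:
        target = stack[-1].children if stack else root
        if element.kind == "START":
            instruction = Instruction(start=element)
            target.append(instruction)
            stack.append(instruction)
        elif element.kind == "END":
            if not stack:
                # Simplification: orphan terminators are rejected here.
                raise ValueError("END without matching START")
            stack.pop().end = element
        else:
            target.append(element)
    if stack:
        # Simplification: period-terminated IF is not modeled.
        raise ValueError("missing END")
    return root
```

Keeping `recognize` free of any nesting logic is the point of the split: step 1 can be rerun on a single changed line, and only step 2 has to look at the surrounding elements.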

ParserTests failures

  • Check_EntryCodeElements continued failure is due to a "no viable alternative to EOF" error. This seems related to this issue; however we DO have cobolCodeElements: codeElement* EOF; as our start rule so I'm scratching my fluffy head, here.
  • TestParser.Check_ParserIntegration() fails because it tests type compatibility during a move statement. However, TypeChecker is commented out. Moreover, TypeChecker commented code makes intensive use of unimplemented classes and parameters, even taking into account the fact that I branched the current HEAD of TypeCobol.

Link Diagnostics to CodeElements

Diagnostics is currently a member of SyntaxDocument.
As errors are to be linked at least to problematic text lines, it should be a member of CodeElement.

ZERO handling

See ef784cb
The ZERO/ZEROS/ZEROES figurative constant must be properly handled in the ADD statement.
