
sillsdev / machine


Machine is a natural language processing library for .NET that is focused on providing tools for processing resource-poor languages.

License: MIT License

C# 91.07% PLSQL 2.74% TeX 5.80% Shell 0.09% Python 0.15% PowerShell 0.02% CMake 0.04% C++ 0.05% C 0.03%
language-translation machine-translation natural-language-processing

machine's Introduction


Machine for .NET

Machine is a natural language processing library. It is specifically focused on providing tools and techniques that are useful for processing languages that are very resource-poor. The library is also useful as a foundation for building more advanced language processing techniques. The library currently only provides a basic set of algorithms, but the goal is to include many more in the future.

Features

Translation

Machine provides a set of translation engines. It currently includes an SMT engine based on a fork of Thot and a rule-based engine based on the HermitCrab morphological parser.

Word Alignment

Machine provides implementations of many common statistical word alignment models, such as IBM models 1-4, HMM, and FastAlign. These models are implemented in the Thot library.

Morphology

Machine contains a rule-based morphological/phonological parser called HermitCrab.

Feature Structures

Machine provides a flexible implementation of feature structures with efficient unification, subsumption, and priority union operations. Feature values can be atomic symbols, strings, or variables.

Annotations

An annotation is a tagged portion of data with its associated metadata. The metadata for an annotation is represented as a feature structure, which is essentially a set of feature-value pairs. Annotations can also be hierarchical; an annotation can contain other annotations. Annotations are normally used on textual data, but Machine can support annotations on any type of data.

Patterns

Machine contains a regex-like pattern matching engine. Machine is different from most pattern matching engines, which specify patterns that match strings of characters. Instead, Machine can specify patterns that match annotations on data. An annotation describes the metadata for a part of the data. Data can be tagged in any way that is desired. For example, all the words in a document can be tagged with their part of speech. Because Machine works on metadata instead of the underlying data, it provides a very powerful, flexible pattern matching capability that is difficult to duplicate with normal regular expressions. Machine compiles patterns into a format that allows for efficient matching (in most cases, linear in the number of annotations on the input).

A pattern in Machine supports many of the features that normal regular expressions support, such as alternation, repetition, Kleene star, optionality, capturing groups, etc. It does not support backtracking. As mentioned earlier, the patterns are not matched against characters, but instead against feature structures, since this is how annotations are represented. Machine does not check for exact matches between feature structures, but uses an operation called unification. Unification is a way of combining two feature structures, but only if they are compatible. Two feature structures are not compatible if they have contradictory values for the same feature. An annotation matches a feature structure constraint in a pattern if the feature structures can be unified. Machine patterns handle matching of hierarchical annotations by searching for matches in a depth-first manner.
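
To make the idea of unification concrete, here is a minimal sketch in plain C#. It is not Machine's feature structure API: the flat `Dictionary<string, string>` representation and the `Unify` helper are assumptions made for illustration, and real feature structures can also be nested and contain variables.

```csharp
using System;
using System.Collections.Generic;

// Toy feature structure: a flat map from feature name to atomic value.
// Returns the merged structure, or null when the inputs disagree on a feature.
static Dictionary<string, string>? Unify(
    IReadOnlyDictionary<string, string> a,
    IReadOnlyDictionary<string, string> b)
{
    var result = new Dictionary<string, string>(a);
    foreach (var (feature, value) in b)
    {
        if (result.TryGetValue(feature, out string? existing) && existing != value)
            return null; // contradictory values: unification fails
        result[feature] = value;
    }
    return result;
}

var annotation = new Dictionary<string, string> { ["pos"] = "noun", ["num"] = "sg" };
var constraint = new Dictionary<string, string> { ["pos"] = "noun" };
var conflicting = new Dictionary<string, string> { ["pos"] = "verb" };

Console.WriteLine(Unify(annotation, constraint) is not null);  // True
Console.WriteLine(Unify(annotation, conflicting) is not null); // False
```

In this toy model, the annotation matches the first constraint because nothing contradicts it, even though the annotation carries an extra feature; that is the difference between unification and an exact-match test.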

Patterns are represented as finite state automata (FSA). FSAs provide a natural model for the type of regular languages that Machine patterns represent. In addition, FSAs can be determinized so that pattern matching can be performed efficiently.

Rules

Machine also provides a rules module, which can be used to specify rules for manipulating annotated data. Pattern rules provide a mechanism for modifying parts of data that match the specified pattern. Rule application behavior is specified as code. Pattern rules can be applied iteratively or simultaneously. Rules can be aggregated using rule batches and rule cascades. Rule batches can be used to apply a set of rules disjunctively. Rule cascades can be used to apply multiple rules in successive order.

Statistical Methods

Probability Distributions

Machine includes various methods for estimating probability distributions from observed data. The current discounting techniques include Witten-Bell, Simple Good-Turing, maximum likelihood, and Lidstone.
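
As an illustration of what one of these estimators computes, the sketch below implements the textbook Lidstone formula in plain C#. The `Lidstone` helper and its parameters are hypothetical and are not Machine's API; with `gamma` set to 0 the estimate reduces to maximum likelihood.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Lidstone estimate: P(x) = (count(x) + gamma) / (N + gamma * B),
// where N is the total number of observations and B the number of possible outcomes.
static double Lidstone(IReadOnlyDictionary<string, int> observed, string outcome, double gamma, int bins)
{
    int total = observed.Values.Sum();
    observed.TryGetValue(outcome, out int count); // count stays 0 for unseen outcomes
    return (count + gamma) / (total + gamma * bins);
}

var counts = new Dictionary<string, int> { ["a"] = 3, ["b"] = 1 };
const int vocabularySize = 3; // assumed set of possible outcomes: a, b, c

Console.WriteLine(Lidstone(counts, "a", 0.0, vocabularySize)); // maximum likelihood: 3/4
Console.WriteLine(Lidstone(counts, "c", 1.0, vocabularySize)); // add-one smoothing: 1/7
```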

n-gram Model

Machine includes a generic n-gram model implementation. The n-gram model is smoothed using Modified Kneser-Ney smoothing.

Clustering

Machine provides implementations of various clustering algorithms. These include density-based algorithms, such as DBSCAN and OPTICS, and hierarchical algorithms, such as UPGMA and Neighbor-joining.

Sequence Alignment

Pairwise

Pairwise sequence alignment is implemented using a dynamic programming approach similar to most common implementations of the Levenshtein distance. It supports substitution, insertion, deletion, expansion, and compression. It also supports the following alignment modes: global, local, half-local, and semi-global.
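
For readers unfamiliar with the dynamic programming approach mentioned above, this is a sketch of the classic Levenshtein distance in C#. It covers only global alignment with unit costs for substitution, insertion, and deletion; it does not show expansion, compression, or the other alignment modes, and it is not taken from Machine's implementation.

```csharp
using System;

// Classic dynamic-programming edit distance with unit costs for
// substitution, insertion, and deletion (global alignment only).
static int EditDistance(string source, string target)
{
    var cost = new int[source.Length + 1, target.Length + 1];
    for (int i = 0; i <= source.Length; i++)
        cost[i, 0] = i; // delete every source symbol
    for (int j = 0; j <= target.Length; j++)
        cost[0, j] = j; // insert every target symbol

    for (int i = 1; i <= source.Length; i++)
    {
        for (int j = 1; j <= target.Length; j++)
        {
            int substitution = cost[i - 1, j - 1] + (source[i - 1] == target[j - 1] ? 0 : 1);
            int deletion = cost[i - 1, j] + 1;
            int insertion = cost[i, j - 1] + 1;
            cost[i, j] = Math.Min(substitution, Math.Min(deletion, insertion));
        }
    }
    return cost[source.Length, target.Length];
}

Console.WriteLine(EditDistance("kitten", "sitting")); // 3
```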

Multiple

The implementation of multiple sequence alignment is based on the CLUSTAL W algorithm.

Stemming

Machine provides an unsupervised stemming algorithm specifically designed for resource-poor languages. The stemmer is trained using a list of words either derived from a corpus or a lexicon. The algorithm can also be used to identify possible affixes. It is based on the unsupervised stemming algorithm proposed in Harald Hammarström's doctoral dissertation.

Installation

Machine is available as a set of NuGet packages:

Machine is also available as a command-line tool that can be installed as a .NET tool.

dotnet tool install -g SIL.Machine.Tool

Tutorials

If you would like to find out more about how to use Machine, check out the tutorial Jupyter notebooks:

Development

CSharpier

All C# code should be formatted using CSharpier. The best way to enable support for CSharpier is to install the appropriate IDE extension and configure it to format on save.

Development locally

  • Install MongoDB 6.0 and MongoDB Compass and run MongoDB on localhost:27017
  • Create the following folders:
    • C:\var\lib\machine\data
    • C:\var\lib\machine\machine
  • Set the following environment variables:
    • ASPNETCORE_ENVIRONMENT=Development
  • Open "Machine.sln" and debug the ApiServer
  • You are now running the complete environment, with every component under the debugger and MongoDB exposed.

Develop with serval

  • Install https://github.com/sillsdev/serval in an adjacent folder
  • Follow the instructions in serval for development
  • To debug machine and machine_job together, launch "DockerComb" in VSCode

machine's People

Contributors

andrewdt97, ddaspit, enkidu93, javamonkey79, johnml1135, mhosken, rmunn

machine's Issues

Corpus is not there after adding through API

You can verify this on the Swagger API:

  • post: /corpora (creates a new corpus)
  • get: /corpora (lists all available corpora)

The only problem is, the corpus just created doesn't show up on the list!

RefId's don't get populated for Paratext Projects

This is what is returned from pretranslations:

  {
    "textId": "3JN",
    "refs": [],
    "translation": "To Gaius, whom I love in the truthI, the elder, write to you:"
  },
  {
    "textId": "3JN",
    "refs": [],
    "translation": "My dear friend, I pray that you may be perfect in everything, healthy and sound in every way."
  },

The Refs should be populated with the verse reference.

NMT API MVP

The NMT is "primarily" implemented in machine.

  • Current state:
    • no translate sentence (storing and serving the pre-translations)
    • Need to add "give me a chapter" - this is all @ddaspit.
  • Get dev talking to real clearML server
  • Get qa talking to real clearML server
  • Get something building on the swagger API
  • Add some Cucumber tests for NMT (how can I make them small? Including translate verse - can we find something "deterministic enough"? - or to make sure it worked? A BLEU score?)
  • Test all endpoints for NMT - no train-segment, no word-graph, no translate sentence, only translate from verse reference
  • Document these limitations in the swagger API and the HTTP responses when querying
  • Test limitations with cucumber

S3 remove sync write (async only)

From @ddaspit's review:

Calling async methods from a synchronous method is not safe unless you are very careful. Since the S3 SDK doesn't provide synchronous implementations of their methods, I think it would be better to not provide an implementation for the Write method. I don't think we need it. We can simply throw a NotSupportedException.

public override void Write(byte[] buffer, int offset, int count)
{
    using Stream inputStream = new MemoryStream(buffer, offset, count);
    using var transferUtility = new TransferUtility(_client);
    var uploadRequest = new TransferUtilityUploadRequest
    {
        BucketName = _bucketName,
        InputStream = inputStream,
        Key = _key,
        PartSize = count
    };
    transferUtility.Upload(uploadRequest);
}

@ddaspit - is this called anywhere? It is used to Initialize a BufferedStream which has a sync write - we would just need to be careful to not use it - and to have the other in-memory and local drive implementations not include a sync write/read.
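
One way the suggestion from the review could look is sketched below. The `AsyncOnlyWriteStream` class and its `uploadAsync` delegate are hypothetical stand-ins (the delegate takes the place of the real S3 upload call so the example runs without the AWS SDK); this is not the code that was merged.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Usage: the async path works, the sync path fails fast.
long uploaded = 0;
await using var stream = new AsyncOnlyWriteStream((chunk, _) =>
{
    uploaded += chunk.Length;
    return Task.CompletedTask;
});
await stream.WriteAsync(new byte[] { 1, 2, 3 }, 0, 3);
Console.WriteLine(uploaded); // 3
try
{
    stream.Write(new byte[] { 1 }, 0, 1);
}
catch (NotSupportedException e)
{
    Console.WriteLine(e.Message);
}

// Hypothetical write-only stream; uploadAsync stands in for the S3 upload call.
sealed class AsyncOnlyWriteStream : Stream
{
    private readonly Func<ReadOnlyMemory<byte>, CancellationToken, Task> _uploadAsync;

    public AsyncOnlyWriteStream(Func<ReadOnlyMemory<byte>, CancellationToken, Task> uploadAsync) =>
        _uploadAsync = uploadAsync;

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    // No sync-over-async: callers must use WriteAsync.
    public override void Write(byte[] buffer, int offset, int count) =>
        throw new NotSupportedException("Synchronous writes are not supported; use WriteAsync.");

    public override Task WriteAsync(byte[] buffer, int offset, int count, CancellationToken cancellationToken) =>
        _uploadAsync(new ReadOnlyMemory<byte>(buffer, offset, count), cancellationToken);
}
```

As the question above points out, a wrapping BufferedStream still exposes a synchronous write path, so callers would need to write and flush asynchronously for this approach to hold.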

Throw error for unbuilt engine

This test fails:

  Failed CircuitousRouteGetWordGraphAsync [3 s]
  Error Message:
     Expected: <Serval.Client.ServalApiException>
  But was:  null

  Stack Trace:
     at Serval.E2ETests.ServalApiTests.CircuitousRouteGetWordGraphAsync() in /home/runner/work/serval/serval/tests/Serval.E2ETests/ServalApiTests.cs:line 155
   at NUnit.Framework.Internal.TaskAwaitAdapter.GenericAdapter`1.BlockUntilCompleted()
   at NUnit.Framework.Internal.MessagePumpStrategy.NoMessagePumpStrategy.WaitForCompletion(AwaitAdapter awaiter)
   at NUnit.Framework.Internal.AsyncToSyncAdapter.Await(Func`1 invoke)
   at NUnit.Framework.Internal.Commands.TestMethodCommand.RunTestMethod(TestExecutionContext context)
   at NUnit.Framework.Internal.Commands.TestMethodCommand.Execute(TestExecutionContext context)
   at NUnit.Framework.Internal.Commands.BeforeAndAfterTestCommand.<>c__DisplayClass1_0.<Execute>b__0()
   at NUnit.Framework.Internal.Commands.DelegatingTestCommand.RunTestMethodInThreadAbortSafeZone(TestExecutionContext context, Action action)

1)    at Serval.E2ETests.ServalApiTests.CircuitousRouteGetWordGraphAsync() in /home/runner/work/serval/serval/tests/Serval.E2ETests/ServalApiTests.cs:line 155
   at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)

We need to return the proper error code, saying in effect "the engine needs to be built first." This check was in Serval but needs to be moved to machine.

Dynamic provisioning of Job server on AWS (faster build)

Enable the machine service to provision dynamic k8s/AWS resources for building an SMT engine. Realistically, even with 50 projects each building every week, if each project only takes 20 minutes to build, we have < 20 hours of utilization per week (~10%). It would be cheaper to use dynamic resources, both to get faster build times and to save money.

This is mainly to have the builds be faster and concurrent. To have results in 15 minutes rather than 2 hours.
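
The utilization estimate above can be checked with a few lines of arithmetic. The numbers are the ones quoted in this issue; the only added assumption is that utilization is measured against a full 168-hour week on a single always-on server.

```csharp
using System;

const int projects = 50;            // from the estimate above
const double buildMinutes = 20;     // per project, per week
const double hoursPerWeek = 7 * 24; // 168

double busyHours = projects * buildMinutes / 60; // about 16.7
double utilization = busyHours / hoursPerWeek;   // about 0.099

Console.WriteLine($"{busyHours:F1} busy hours per week, {utilization:P0} utilization");
```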

Database migration - TrainSize to TrainCorpusSize

Here is the error:

Fri, Jul 22 2022 4:18:22 pm | Connection id "0HMJC3RKQD719", Request id "0HMJC3RKQD719:00000002": An unhandled exception was thrown by the application.
Fri, Jul 22 2022 4:18:22 pm | System.FormatException: Element 'trainSize' does not match any field or property of class SIL.Machine.WebApi.Models.TranslationEngine.
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Bson.Serialization.BsonClassMapSerializer`1.DeserializeClass(BsonDeserializationContext context)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Bson.Serialization.BsonClassMapSerializer`1.Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Bson.Serialization.IBsonSerializerExtensions.Deserialize[TValue](IBsonSerializer`1 serializer, BsonDeserializationContext context)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Bson.Serialization.Serializers.EnumerableSerializerBase`2.Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Bson.Serialization.IBsonSerializerExtensions.Deserialize[TValue](IBsonSerializer`1 serializer, BsonDeserializationContext context)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.Core.Operations.AggregateOperation`1.CursorDeserializer.Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Bson.Serialization.IBsonSerializerExtensions.Deserialize[TValue](IBsonSerializer`1 serializer, BsonDeserializationContext context)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.Core.Operations.AggregateOperation`1.AggregateResultDeserializer.Deserialize(BsonDeserializationContext context, BsonDeserializationArgs args)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Bson.Serialization.IBsonSerializerExtensions.Deserialize[TValue](IBsonSerializer`1 serializer, BsonDeserializationContext context)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ProcessResponse(ConnectionId connectionId, CommandMessage responseMessage)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1.ExecuteAsync(IConnection connection, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.Core.Servers.Server.ServerChannel.ExecuteProtocolAsync[TResult](IWireProtocol`1 protocol, ICoreSession session, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.Core.Operations.RetryableReadOperationExecutor.ExecuteAsync[TResult](IRetryableReadOperation`1 operation, RetryableReadContext context, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.Core.Operations.ReadCommandOperation`1.ExecuteAsync(RetryableReadContext context, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.Core.Operations.AggregateOperation`1.ExecuteAsync(RetryableReadContext context, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.Core.Operations.AggregateOperation`1.ExecuteAsync(IReadBinding binding, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.OperationExecutor.ExecuteReadOperationAsync[TResult](IReadBinding binding, IReadOperation`1 operation, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.MongoCollectionImpl`1.ExecuteReadOperationAsync[TResult](IClientSessionHandle session, IReadOperation`1 operation, ReadPreference readPreference, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.MongoCollectionImpl`1.AggregateAsync[TResult](IClientSessionHandle session, PipelineDefinition`2 pipeline, AggregateOptions options, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at MongoDB.Driver.IAsyncCursorSourceExtensions.ToListAsync[TDocument](IAsyncCursorSource`1 source, CancellationToken cancellationToken)
Fri, Jul 22 2022 4:18:22 pm | at SIL.Machine.WebApi.DataAccess.MongoRepository`1.GetAllAsync(Expression`1 filter, CancellationToken cancellationToken) in /app/src/SIL.Machine.WebApi/DataAccess/MongoRepository.cs:line 59
Fri, Jul 22 2022 4:18:22 pm | at SIL.Machine.WebApi.Services.TranslationEngineService.GetAllAsync(String owner) in /app/src/SIL.Machine.WebApi/Services/TranslationEngineService.cs:line 94
Fri, Jul 22 2022 4:18:22 pm | at SIL.Machine.WebApi.Controllers.TranslationEnginesController.GetAllAsync() in /app/src/SIL.Machine.WebApi/Controllers/TranslationEnginesController.cs:line 40
Fri, Jul 22 2022 4:18:22 pm | at lambda_method612(Closure , Object )
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask`1 actionResultValueTask)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeInnerFilterAsync>g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeNextExceptionFilterAsync>g__Awaited|26_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow(ExceptionContextSealed context)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|20_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Authorization.Policy.AuthorizationMiddlewareResultHandler.HandleAsync(RequestDelegate next, HttpContext context, AuthorizationPolicy policy, PolicyAuthorizationResult authorizeResult)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
Fri, Jul 22 2022 4:18:22 pm | at NSwag.AspNetCore.Middlewares.SwaggerUiIndexMiddleware.Invoke(HttpContext context)
Fri, Jul 22 2022 4:18:22 pm | at NSwag.AspNetCore.Middlewares.RedirectToIndexMiddleware.Invoke(HttpContext context)
Fri, Jul 22 2022 4:18:22 pm | at NSwag.AspNetCore.Middlewares.OpenApiDocumentMiddleware.Invoke(HttpContext context)
Fri, Jul 22 2022 4:18:22 pm | at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext]

This was caused by the rename of TrainSize to TrainCorpusSize, which crashes the server when it reads old data. We should be able to deal gracefully with old data and give more intelligent errors than HTTP 500 - Strict-Transport-Security.
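
One possible way to tolerate old documents is to upgrade legacy field names before deserializing. The sketch below shows the idea on a plain dictionary; the `renamedFields` map and the `Upgrade` helper are hypothetical and do not use the MongoDB driver.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical map of legacy field names to their current names.
var renamedFields = new Dictionary<string, string> { ["trainSize"] = "trainCorpusSize" };

// Returns a copy of the document with legacy field names replaced.
Dictionary<string, object> Upgrade(IReadOnlyDictionary<string, object> document)
{
    var upgraded = new Dictionary<string, object>();
    foreach (var (name, value) in document)
    {
        string currentName = renamedFields.TryGetValue(name, out string? renamed) ? renamed : name;
        upgraded[currentName] = value;
    }
    return upgraded;
}

var oldDocument = new Dictionary<string, object> { ["name"] = "engine1", ["trainSize"] = 1200 };
var newDocument = Upgrade(oldDocument);

Console.WriteLine(newDocument.ContainsKey("trainCorpusSize")); // True
Console.WriteLine(newDocument.ContainsKey("trainSize"));       // False
```

The MongoDB C# driver's BsonIgnoreExtraElements attribute is another option for avoiding the crash, although it would skip the old value rather than carry it over to the new field.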

Refine S3 bucket usage

A collection of comments

  • When the stream buffer is opened, it uses 100 MB. Three are opened at the same time per ClearML job while uploading files. This should be OK, but with 8 jobs going at the same time that is 2.4 GB, which is a bit much. Can we:
    • Have "WriteFilesAsync" open only one at a time?
    • Use the low-level implementation?
    • Throw more memory at it? It is now at 2.5 GB.
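
The first option, limiting how many uploads hold a buffer at once, could be sketched with a SemaphoreSlim. The `UploadAsync` function and the limit of 2 are assumptions for illustration; the real upload and the right limit depend on the memory budget.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int maxConcurrentUploads = 2; // assumed limit; tune to the memory budget
using var gate = new SemaphoreSlim(maxConcurrentUploads);
int active = 0;
int peak = 0;
var sync = new object();

// Stand-in for an upload that holds a large buffer while it runs.
async Task UploadAsync(int id)
{
    await gate.WaitAsync();
    try
    {
        lock (sync)
        {
            active++;
            peak = Math.Max(peak, active);
        }
        await Task.Delay(50); // simulated transfer
    }
    finally
    {
        lock (sync)
        {
            active--;
        }
        gate.Release();
    }
}

// Start 8 uploads at once; the semaphore keeps at most 2 in flight.
await Task.WhenAll(Enumerable.Range(0, 8).Select(UploadAsync));
Console.WriteLine($"peak concurrent uploads: {peak}");
```

The reported peak never exceeds `maxConcurrentUploads`, so peak buffer memory is bounded by the limit times the buffer size rather than by the number of jobs.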

No ClearML agents available - add queue to message

Here is the message received:

Value: [ var='B0' metric='{app="machine-engine", container="machine-engine", container_id="54811b781af216d756574e2155336981ab41c2eeb45118369d8673140fff3da5", host="kb3", log="\x1b[41m\x1b[30mfail\x1b[39m\x1b[22m\x1b[49m: Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService[103]
 Health check ClearML Health Check with status Unhealthy completed after 127.112ms with message 'No ClearML agents are available'
", namespace="serval", pod="machine-engine-7b8558b94c-gpcl9", pod_id="0fbfc55d-e88b-4222-83fa-1eabbada9f85", pod_template_hash="7b8558b94c",

It would have been nice to have the queue that had no agents on it printed here - as it would help with diagnosis.

Machine S3 health - warn on 1 missing, fail on >86 seconds down?

S3 buckets only claim 99.9% uptime, which allows around 86 seconds of downtime per day. Now, a few things that I can foresee:

  1. All S3 communication is long-running and can wait for a few minutes without hurting anything
  2. We should be able to live with down time of at least 86 contiguous seconds (0.1% of the day) without erroring out any jobs - gracefully retry for at least that long (90 seconds? 2 minutes? 5 minutes?)
  3. While we can start giving "warning" logs and health check failures when the S3 bucket is down for a few seconds, we should only start registering "failures" in the log (which sends alerts to Google chat) if it is down for over 86 seconds (or 90 seconds? or 2 minutes? or 5 minutes?), so as not to spam Google chat.
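
The warn-then-fail policy in the list above could be expressed as a small classification function. This is a sketch under stated assumptions: the 90-second grace period is one of the candidate values floated above, and the status names simply mirror the three values of ASP.NET Core's HealthStatus enum.

```csharp
using System;

// Threshold is an assumption; the issue leaves the exact grace period open.
TimeSpan gracePeriod = TimeSpan.FromSeconds(90);

// Maps how long S3 has been unreachable to a health status.
string Classify(TimeSpan? downFor)
{
    if (downFor is null)
        return "Healthy";
    return downFor.Value < gracePeriod ? "Degraded" : "Unhealthy";
}

Console.WriteLine(Classify(null));                      // Healthy
Console.WriteLine(Classify(TimeSpan.FromSeconds(10)));  // Degraded
Console.WriteLine(Classify(TimeSpan.FromSeconds(120))); // Unhealthy
```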

Add versioning to docker files

It was half there - but didn't work.
Docker image tags should follow nuget.
Right now, use a dev tag and just do it on release (not master).

pod errors after helm install

After setting everything up, running kubectl get pods shows the following errors:

PS C:\WINDOWS\system32> kubectl get pods
NAME                                  READY   STATUS             RESTARTS         AGE
machine-job-server-7d895b8874-5k7jg   0/1     CrashLoopBackOff   25 (4m37s ago)   108m
machine-server-78fd7665cb-5xtm5       0/1     CrashLoopBackOff   25 (4m48s ago)   108m
mongo-5cdbb46856-d7d6d                1/1     Running            0                108m
PS C:\WINDOWS\system32> kubectl describe pod machine-job-server-7d895b8874-5k7jg
Name:         machine-job-server-7d895b8874-5k7jg
Namespace:    default
Priority:     0
Node:         minikube/192.168.49.2
Start Time:   Mon, 16 May 2022 17:08:25 -0700
Labels:       io.kompose.service=machine-job-server
              pod-template-hash=7d895b8874
Annotations:  kompose.cmd: C:\Users\johnm\Documents\repos\machine\docker\development\kompose.exe convert -c --volumes hostPath
              kompose.version: 1.26.0 (40646f47)
Status:       Running
IP:           172.17.0.6
IPs:
  IP:           172.17.0.6
Controlled By:  ReplicaSet/machine-job-server-7d895b8874
Containers:
  machine-job-server:
    Container ID:  docker://836a4966152b0bd069e7d8e719720d974220d2a76c2aff9c01f5ff887d1d222f
    Image:         ghcr.io/sillsdev/machine:latest
    Image ID:      docker-pullable://ghcr.io/sillsdev/machine@sha256:49f92a2bc838a9e0e9097e147fd584ec8dc8d81cfdd675a824e40b6395d4c53e
    Port:          <none>
    Host Port:     <none>
    Command:
      dotnet
      /app/SIL.Machine.WebApi.JobServer.dll
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      error while creating mount source path '/host/engines': mkdir /host: file exists

Word alignment and word glossing generation API service

The guts of this are already in Machine; it just needs an API in front of it.
It may be very helpful for NMT translations.
Functionality:
Machine Alignment service:

  • Create an alignment using a specific backend tech (FastAlign, HMM, etc.)
  • Retrieve the alignment
    • would we save the model?

Glossing Corpus Type and auto-creation service

  • Create glossing - auto-create a corpus of glosses/keyterms
  • Allow user to update the keyterms (corpus)
  • Update the glosses created, except if they have been manually updated? What mechanism could be used for this?

Hangfire server crashed - and didn't restart

Here is the error:

crit: Microsoft.Extensions.Hosting.Internal.Host[10]
      The HostOptions.BackgroundServiceExceptionBehavior is configured to StopHost. A BackgroundService has thrown an unhandled exception, and the IHost instance is stopping. To avoid this behavior, configure this to Ignore; however the BackgroundService will not be restarted.
      System.Net.Http.HttpRequestException: Resource temporarily unavailable (api.sil.hosted.allegro.ai:443)
       ---> System.Net.Sockets.SocketException (11): Resource temporarily unavailable
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
         at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|277_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
         --- End of inner exception stack trace ---
         at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(HttpRequestMessage request)
         at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
         at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at Microsoft.Extensions.Http.Logging.LoggingHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
         at Microsoft.Extensions.Http.Logging.LoggingScopeHttpMessageHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
         at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
         at SIL.Machine.AspNetCore.Services.ClearMLAuthenticationService.AuthorizeAsync(CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/ClearMLAuthenticationService.cs:line 81
         at SIL.Machine.AspNetCore.Services.ClearMLAuthenticationService.ExecuteAsync(CancellationToken stoppingToken) in /app/src/SIL.Machine.AspNetCore/Services/ClearMLAuthenticationService.cs:line 47
         at Microsoft.Extensions.Hosting.Internal.Host.TryExecuteBackgroundServiceAsync(BackgroundService backgroundService)

And then, there was a steady stream of failing health checks:

	
{
  "log": "\u001b[41m\u001b[30mfail\u001b[39m\u001b[22m\u001b[49m: Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService[103]\n      Health check Hangfire with status Unhealthy completed after 0.9272ms with message 'There are no Hangfire servers running.'\n",
  "stream": "stdout",
  "time": "2023-08-03T01:18:43.492804482Z",
  "type": "fail",
  "source": "Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService"
}

We should:

  1. Check the underlying issue
  2. Make sure that the service auto-restarts
  3. Set up an alert on the failing health check
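
One way to keep a transient network failure from stopping the host is to retry inside the background loop instead of letting the exception escape. The sketch below is an assumption about how that could look, not the actual ClearMLAuthenticationService code; `AuthorizeWithRetryAsync` and the delay values are hypothetical.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Keeps calling authorizeAsync, retrying transient HTTP failures with a capped
// exponential backoff instead of letting the exception escape.
static async Task AuthorizeWithRetryAsync(
    Func<CancellationToken, Task> authorizeAsync,
    TimeSpan initialDelay,
    TimeSpan maxDelay,
    CancellationToken cancellationToken)
{
    TimeSpan delay = initialDelay;
    while (true)
    {
        try
        {
            await authorizeAsync(cancellationToken);
            return;
        }
        catch (HttpRequestException e)
        {
            Console.WriteLine($"Authorization failed ({e.Message}); retrying in {delay.TotalMilliseconds} ms");
            await Task.Delay(delay, cancellationToken);
            delay = delay * 2 < maxDelay ? delay * 2 : maxDelay;
        }
    }
}

// Simulated endpoint that is unavailable for the first two attempts.
int attempts = 0;
await AuthorizeWithRetryAsync(
    _ =>
    {
        attempts++;
        if (attempts < 3)
            throw new HttpRequestException("Resource temporarily unavailable");
        return Task.CompletedTask;
    },
    TimeSpan.FromMilliseconds(10),
    TimeSpan.FromMilliseconds(100),
    CancellationToken.None);
Console.WriteLine($"succeeded after {attempts} attempts"); // succeeded after 3 attempts
```

Retrying addresses only the crash itself; the restart policy and the alert from the list above would still be needed for failures that never recover.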

Failure - paratext backup settings file

From this error:

Value: [ var='B0' metric='{app="machine-job", container="machine-job", container_id="46f655e7e06d58a24c56db4bd52291042a698781f49ea1ce236e8728381d3bcd", host="kb3", log="\x1b[41m\x1b[30mfail\x1b[39m\x1b[22m\x1b[49m: SIL.Machine.AspNetCore.Services.ClearMLNmtEngineBuildJob[0]
      Build faulted (64f9faa3304d4eb4f261a899) because of exception ArgumentException:The project backup does not contain a settings file. (Parameter 'fileName').
      System.ArgumentException: The project backup does not contain a settings file. (Parameter 'fileName')
         at SIL.Machine.Corpora.ParatextBackupTextCorpus..ctor(String fileName, Boolean includeMarkers) in /app/src/SIL.Machine/Corpora/ParatextBackupTextCorpus.cs:line 25
         at SIL.Machine.AspNetCore.Services.CorpusService.CreateTextCorpus(IReadOnlyList`1 files) in /app/src/SIL.Machine.AspNetCore/Services/CorpusService.cs:line 8
         at SIL.Machine.AspNetCore.Services.ClearMLNmtEngineBuildJob.<>c__DisplayClass9_0.<g__ProcessRowsAsync|0>d.MoveNext() in /app/src/SIL.Machine.AspNetCore/Services/ClearMLNmtEngineBuildJob.cs:line 261
      --- End of stack trace from previous location ---
         at SIL.Machine.AspNetCore.Services.ClearMLNmtEngineBuildJob.<>c__DisplayClass9_0.

What is causing this?

Delete failing with files in directory

{"log":"\u001B[41m\u001B[30mfail\u001B[39m\u001B[22m\u001B[49m: Grpc.AspNetCore.Server.ServerCallHandler[6]
      Error when executing service method 'Delete'.
      System.IO.IOException: Directory not empty : '/var/lib/machine/engines/64e6752b6f044714a98cf08f/tm'
         at System.IO.FileSystem.RemoveDirectoryInternal(DirectoryInfo directory, Boolean recursive, Boolean throwOnTopLevelDirectoryNotFound)
         at System.IO.FileSystem.RemoveDirectory(String fullPath, Boolean recursive)
         at System.IO.Directory.Delete(String path, Boolean recursive)
         at SIL.Machine.AspNetCore.Services.ThotSmtModelFactory.Cleanup(String engineId) in /app/src/SIL.Machine.AspNetCore/Services/ThotSmtModelFactory.cs:line 71
         at SIL.Machine.AspNetCore.Services.SmtTransferEngineState.DeleteDataAsync() in /app/src/SIL.Machine.AspNetCore/Services/SmtTransferEngineState.cs:line 69
         at SIL.Machine.AspNetCore.Services.SmtTransferEngineService.DeleteAsync(String engineId, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/SmtTransferEngineService.cs:line 60
         at SIL.Machine.AspNetCore.Services.SmtTransferEngineService.DeleteAsync(String engineId, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/SmtTransferEngineService.cs:line 60
         at SIL.Machine.AspNetCore.Services.ServalTranslationEngineServiceV1.Delete(DeleteRequest request, ServerCallContext context) in /app/src/SIL.Machine.AspNetCore/Services/ServalTranslationEngineServiceV1.cs:line 33
         at Grpc.Shared.Server.UnaryServerMethodInvoker`3.ResolvedInterceptorInvoker(TRequest resolvedRequest, ServerCallContext resolvedContext)
         at Grpc.Shared.Server.UnaryServerMethodInvoker`3.ResolvedInterceptorInvoker(TRequest resolvedRequest, ServerCallContext resolvedContext)
         at SIL.Machine.AspNetCore.Services.UnimplementedInterceptor.UnaryServerHandler[TRequest,TResponse](TRequest request, ServerCallContext context, UnaryServerMethod`2 continuation) in /app/src/SIL.Machine.AspNetCore/Services/UnimplementedInterceptor.cs:line 21
         at Grpc.Shared.Server.InterceptorPipelineBuilder`2.<>c__DisplayClass5_0.<<UnaryPipeline>b__1>d.MoveNext()
      --- End of stack trace from previous location ---
         at Grpc.Shared.Server.InterceptorPipelineBuilder`2.<>c__DisplayClass5_0.<<UnaryPipeline>b__1>d.MoveNext()
      --- End of stack trace from previous location ---
         at Grpc.AspNetCore.Server.Internal.CallHandlers.UnaryServerCallHandler`3.HandleCallAsyncCore(HttpContext httpContext, HttpContextServerCallContext serverCallContext)
         at Grpc.AspNetCore.Server.Internal.CallHandlers.ServerCallHandlerBase`3.<HandleCallAsync>g__AwaitHandleCall|8_0(HttpContextServerCallContext serverCallContext, Method`2 method, Task handleCall)
","stream":"stdout","time":"2023-08-23T22:30:47.507725379Z","type":"fail","source":"Grpc.AspNetCore.Server.ServerCallHandler"}
--

Likely a flag is not being passed. Needs more investigation.

Text corpus accepting unique identifiers (verse references)

If a line in a text corpus file contains a \t (tab) character, everything before the tab is treated as the unique identifier of the sentence (with surrounding whitespace trimmed). For example:

GEN 1:1<\t>In the beginning ...
GEN 1:2<\t>And the earth was ...

This enables keyterms/glosses functionality in SMT and/or NMT. It also allows small updates to be sent to the API backend without having to create, zip, and send the whole Bible as a Paratext project.
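
A minimal sketch of the parsing rule described above, in Python for brevity. The function name `parse_corpus_line` is hypothetical, and splitting on the first tab only is an assumption (the description does not say what happens when a line contains several tabs):

```python
from typing import Optional, Tuple


def parse_corpus_line(line: str) -> Tuple[Optional[str], str]:
    """Split one corpus line into (identifier, text).

    Hypothetical helper: if the line contains a tab, everything before the
    first tab is the whitespace-trimmed identifier; otherwise the line has
    no identifier and is returned as plain text.
    """
    line = line.rstrip("\r\n")
    if "\t" not in line:
        return None, line
    ref, text = line.split("\t", 1)  # assumption: only the first tab separates
    return ref.strip(), text


if __name__ == "__main__":
    for raw in ["GEN 1:1\tIn the beginning ...", "A line without an identifier"]:
        print(parse_corpus_line(raw))
```

Lines without a tab keep working as before, so existing plain text corpora would not need to change.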

Error copying to stream

The first time a job was created, the following error appeared in machine.job:

fail: Hangfire.AutomaticRetryAttribute[0]
      Failed to process the job '64814c54761d983ab9eaeeb2': an exception occurred.
      System.Net.Http.HttpRequestException: Error while copying content to a stream.
       ---> System.IO.IOException: Unable to write data to the transport connection: Broken pipe.
       ---> System.Net.Sockets.SocketException (32): Broken pipe
         --- End of inner exception stack trace ---
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
         at System.Net.Security.SslStream.<WriteSingleChunk>g__CompleteWriteAsync|182_1[TIOAdapter](ValueTask writeTask, Byte[] bufferToReturn)
         at System.Net.Security.SslStream.WriteAsyncChunked[TIOAdapter](TIOAdapter writeAdapter, ReadOnlyMemory`1 buffer)
         at System.Net.Security.SslStream.WriteAsyncInternal[TIOAdapter](TIOAdapter writeAdapter, ReadOnlyMemory`1 buffer)
         at System.Net.Http.HttpConnection.WriteAsync(ReadOnlyMemory`1 source, Boolean async)
         at System.Net.Http.HttpContent.<CopyToAsync>g__WaitAsync|56_0(ValueTask copyTask)
         --- End of inner exception stack trace ---
         at System.Net.Http.HttpContent.<CopyToAsync>g__WaitAsync|56_0(ValueTask copyTask)
         at System.Net.Http.HttpConnection.SendRequestContentAsync(HttpRequestMessage request, HttpContentWriteStream stream, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
         at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at SIL.Machine.AspNetCore.Services.S3AuthHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/S3AuthHandler.cs:line 36
         at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
         at SIL.Machine.AspNetCore.Services.S3FileStorage.UploadPartAsync(String key, String uploadId, Int32 partNumber, Byte[] buffer, Int32 count) in /app/src/SIL.Machine.AspNetCore/Services/S3FileStorage.cs:line 206
         at SIL.Machine.AspNetCore.Services.S3WriteStream.WriteAsync(Byte[] buffer, Int32 offset, Int32 count, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/S3WriteStream.cs:line 53
         at System.IO.BufferedStream.WriteToUnderlyingStreamAsync(ReadOnlyMemory`1 buffer, CancellationToken cancellationToken, Task semaphoreLockTask)
         at System.Text.Json.JsonSerializer.WriteStreamAsync[TValue](Stream utf8Json, TValue value, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken)
         at System.Text.Json.JsonSerializer.WriteStreamAsync[TValue](Stream utf8Json, TValue value, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken)
         at System.Text.Json.JsonSerializer.WriteStreamAsync[TValue](Stream utf8Json, TValue value, JsonTypeInfo jsonTypeInfo, CancellationToken cancellationToken)
         at SIL.Machine.AspNetCore.Services.ClearMLNmtEngineBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/ClearMLNmtEngineBuildJob.cs:line 293
         at SIL.Machine.AspNetCore.Services.ClearMLNmtEngineBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/ClearMLNmtEngineBuildJob.cs:line 299
         at SIL.Machine.AspNetCore.Services.ClearMLNmtEngineBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/ClearMLNmtEngineBuildJob.cs:line 299
         at SIL.Machine.AspNetCore.Services.ClearMLNmtEngineBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/ClearMLNmtEngineBuildJob.cs:line 299
         at SIL.Machine.AspNetCore.Services.ClearMLNmtEngineBuildJob.RunAsync(String engineId, String buildId, IReadOnlyList`1 corpora, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/ClearMLNmtEngineBuildJob.cs:line 58
         at SIL.Machine.AspNetCore.Services.ClearMLNmtEngineBuildJob.RunAsync(String engineId, String buildId, IReadOnlyList`1 corpora, CancellationToken cancellationToken) in /app/src/SIL.Machine.AspNetCore/Services/ClearMLNmtEngineBuildJob.cs:line 234
         at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

Let's fix this.

Train corpus Pair

I am having some issues with adding a training segment pair: the alignment probabilities do change, but not in the way I would expect. Here is the failing test case:
Scenario: Add training segment
Given a new SMT engine for John from es to en
When a new text corpora named C1 for John
And 1JN.txt, 2JN.txt, 3JN.txt are added to corpora C1 in es and en
And the engine is built for John
And the translation for John for "ungidos mundo" is "ungidos world"
And a translation for John is added with "unction world" for "ungidos mundo"
Then the translation for John for "ungidos mundo" should be "unction world"
The final translation is "ungidos world" again, even though the translation pair was added.

Needs Version Numbers

In SIL.Machine.1.0.0-build00034, the AssemblyFileVersion of both SIL.Collections.dll and SIL.Machine.dll is 1.0.0.0. In future builds, this number should be updated with each build (e.g. 1.0.0.35) so that Windows patch installers (such as the one we use for FieldWorks) can update these DLLs.
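
One way the build number could be folded into the file version, sketched in Python. The helper `file_version` is hypothetical, and it assumes the package version always has the form `<major>.<minor>.<patch>-build<number>` seen above; how the real build scripts stamp the assemblies is not shown here:

```python
import re


def file_version(package_version: str) -> str:
    """Map a package version like '1.0.0-build00034' to '1.0.0.34'.

    Hypothetical helper; the input format is an assumption based on the
    single example in this issue.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-build(\d+)", package_version)
    if match is None:
        raise ValueError(f"unexpected version format: {package_version!r}")
    # int() drops the zero padding of the build counter.
    major, minor, patch, build = (int(part) for part in match.groups())
    return f"{major}.{minor}.{patch}.{build}"


if __name__ == "__main__":
    print(file_version("1.0.0-build00034"))
```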

Load testing SMT and NMT in preparation for 50 projects

Here are some ways to increase the number of simultaneous users on Machine:

  • Throw more memory at Machine? How much do we have? 32GB? How long will this last for? Assuming 300MB/project, we could support 100 simultaneous projects...
  • If we can't get more memory, the C# service can monitor how much memory is in use and, when it gets too high, offload the least recently used model.
  • If that is too slow, we can then look at horizontal scaling...
  • Pull the machine inferencing to a different microservice?
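
A rough sketch of the "offload the least recently used model" idea, in Python for brevity. The class `ModelCache` and its methods are hypothetical, model sizes are passed in by the caller rather than measured, and the 300 MB figure is only the estimate quoted above:

```python
from collections import OrderedDict
from typing import List


class ModelCache:
    """Tracks loaded models against a memory budget (least recently used first out)."""

    def __init__(self, budget_mb: int):
        self.budget_mb = budget_mb
        self.used_mb = 0
        self._models = OrderedDict()  # engine_id -> size_mb, oldest first

    def touch(self, engine_id: str, size_mb: int) -> List[str]:
        """Mark a model as used, loading it if needed; return the ids to offload."""
        if engine_id in self._models:
            self._models.move_to_end(engine_id)  # now the most recently used
            return []
        self._models[engine_id] = size_mb
        self.used_mb += size_mb
        evicted = []
        # Offload the oldest models until usage is back under the budget,
        # but never the model that was just requested.
        while self.used_mb > self.budget_mb and len(self._models) > 1:
            old_id, old_size = self._models.popitem(last=False)
            self.used_mb -= old_size
            evicted.append(old_id)
        return evicted


if __name__ == "__main__":
    # Upper bound on 300 MB models in 32 GB, ignoring everything else on the host.
    print((32 * 1024) // 300)
```

The upper bound in the usage lines ignores the operating system and the service itself, which is consistent with the rounder estimate of 100 projects.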

Cancellation token - inconsistent SMT training state

As per Damien:

cancellationToken.ThrowIfCancellationRequested();

Cancellation tokens should not be passed to any calls after this line. It can result in an inconsistent model state.

@ddaspit can you clarify? If we don't throw in the main for loop (below), how will we cancel in the middle of a job?

foreach (TrainSegmentPair segmentPair in segmentPairs)
{
    await smtModel.TrainSegmentAsync(
        segmentPair.Source,
        segmentPair.Target,
        cancellationToken: cancellationToken
    );
    cancellationToken.ThrowIfCancellationRequested();
}

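One possible reading of the advice, sketched in Python rather than C#: check for cancellation only at the boundary between segment pairs, and do not hand the cancellation signal to the update itself, so that each update either runs to completion or never starts. The names `train_all`, `train_segment`, and `Cancelled` are hypothetical, and this is not confirmed as the intended fix:

```python
import threading
from typing import Callable, Iterable, Tuple


class Cancelled(Exception):
    """Raised when training stops at a segment boundary."""


def train_all(
    segment_pairs: Iterable[Tuple[str, str]],
    train_segment: Callable[[str, str], None],
    cancel: threading.Event,
) -> int:
    """Train pairs one by one, checking for cancellation only between pairs."""
    trained = 0
    for source, target in segment_pairs:
        if cancel.is_set():
            raise Cancelled(f"cancelled after {trained} segment pairs")
        # The cancellation signal is deliberately not passed in, so a
        # started update always finishes and the model stays consistent.
        train_segment(source, target)
        trained += 1
    return trained
```

With this shape a job can still be cancelled in the middle of the loop, just never in the middle of a single model update.
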
New engine type - NmtSmall

A new engine type for training an NMT model, using the absolute smallest model we can find: one that trains in under an hour.

Add support for running NMT training stage on Hangfire

Currently, ClearML is required to train an NMT engine. If we add a Hangfire build job that executes the training stage of the NMT build, then an entire NMT build can be executed locally for development and testing purposes.

S3 hosted machine and ClearML integration

This is an interesting endeavor - we want to move data from the hosted Machine API to the ClearML server. S3 is the chosen solution for this and it has a few parts:

  • Use the aqua-ml-data s3 bucket
  • Use these naming conventions:
    • /machine-qa
    • /machine-production
    • /machine-production/parents/parent1
    • /machine-production/jobs/<build_id>
  • Use environment variables to set the path. Use secrets (ClearML and Rancher) for the S3 bucket credentials.

The work that needs to be done:

  • Put at least one parent on the S3 bucket
  • Update machine.py to use S3 bucket data for an experiment, just like silnlp
  • Update Machine to understand S3 bucket data, push the right (preprocessed) text files to the correct job, and read the correct "back translation"

S3 bucket - lockdown when creds are exposed

If S3 credentials are exposed, a partial lockdown is placed on the account that prevents deletion of data. This caused our system to crash. We should not fail a build if we cannot delete the files at the end: log the error, but do not fail. This will prevent catastrophic downtime for Serval if it happens again.
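
A minimal sketch of the "log the error, but do not fail" behaviour, in Python for brevity. `run_build`, `train`, and `cleanup` are hypothetical stand-ins for the build job, the training step, and the S3 file deletion; the actual build job code is not shown here:

```python
import logging
from typing import Callable

logger = logging.getLogger("build")


def run_build(train: Callable[[], None], cleanup: Callable[[], None]) -> bool:
    """Run a build; return True if cleanup also succeeded, False if it was skipped.

    A training failure still propagates and fails the build. Only the
    final cleanup is treated as best effort.
    """
    train()
    try:
        cleanup()
    except Exception:
        # Log with the traceback, but do not fail a build that already succeeded.
        logger.exception("Could not delete build files; leaving them in place")
        return False
    return True
```

Files left behind this way would still need to be removed later, for example by a periodic sweep once the account is unlocked.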
