sassoftware / sas-airflow-provider

Apache Airflow Provider for creating tasks in Airflow to execute SAS Studio Flows and Jobs.

License: Apache License 2.0

Python 100.00%

sas-airflow-provider's Issues

Feature Request: Submit to SAS Viya Batch process

The SAS Viya Batch API behaves slightly differently from the SAS Job Execution service.
As a batch scheduler, I would like to schedule some jobs to use the SAS Batch API instead of the Job Execution service.

Connection always fails to connect to Viya 4 (with Azure AD) from Airflow

Hi experts,

May I know the steps to obtain the correct token and permissions to access Viya (e.g., to create a compute session and then run code) from Airflow? The Viya environment uses Azure AD for login, while the Airflow webserver uses the default (admin/admin).
I have tried many ways to create the Viya access_token, but when I start a DAG in Airflow, errors like the following are always returned:

This is the defined connection:
[screenshot: the defined connection]

Below are the steps I tried to create the access_token, along with the error messages:

  1. Sample 1:

    [ERROR MSG]
      File "/home/airflow/.local/lib/python3.8/site-packages/sas_airflow_provider/util/util.py", line 186, in create_or_connect_to_session
        raise RuntimeError(f"Failed to create session: {response.text}")
    RuntimeError: Failed to create session: {"version":2,"httpStatusCode":500,"errorCode":30081,"message":"Invalid user: "scim5.idp"","details":["traceId: 077fd31fc37606ae","path: /launcher/processes","path: /compute/contexts/4d13c061-10a8-4419-8e5c-bf9017d97d97/sessions","correlator: e8c3e59a-1048-4ea7-a4a0-e96ea1ea3b25"]}

[Steps to get access_token]
BEARER_TOKEN=$(curl -sk -X POST "${INGRESS_URL}/SASLogon/oauth/clients/consul?callback=false&serviceId=scim5.idp" \
  -H "X-Consul-Token: $CONSUL_TOKEN" | awk -F: '{print $2}' | awk -F\" '{print $2}')
echo "The registration access-token is: " ${BEARER_TOKEN}

curl -k -X POST "${INGRESS_URL}/SASLogon/oauth/clients" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BEARER_TOKEN" \
  -d '{
    "client_id": "scim5.idp",
    "client_secret": "idpsecret",
    "authorities": ["SCIM"],
    "authorized_grant_types": ["client_credentials"],
    "access_token_validity": 473040000
  }'

ACCESS_TOKEN=$(curl -sk -X POST "${INGRESS_URL}/SASLogon/oauth/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials" \
  -u "scim5.idp:idpsecret" | awk -F: '{print $2}' | awk -F\" '{print $2}')
echo "The client access-token is: " ${ACCESS_TOKEN};

  2. Sample 2:

    [ERROR MSG]
    [2024-05-07, 13:51:14 CST] {taskinstance.py:1937} ERROR - Task failed with exception
    Traceback (most recent call last):
      File "/home/airflow/.local/lib/python3.8/site-packages/sas_airflow_provider/operators/sas_studio.py", line 164, in execute
        compute_session = create_or_connect_to_session(self.connection,
      File "/home/airflow/.local/lib/python3.8/site-packages/sas_airflow_provider/util/util.py", line 166, in create_or_connect_to_session
        raise RuntimeError(f"Find context named {context_name} failed: {response.status_code}")
    RuntimeError: Find context named SAS Studio compute context failed: 403

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/airflow/.local/lib/python3.8/site-packages/sas_airflow_provider/operators/sas_studio.py", line 200, in execute
        raise AirflowException(f"SASStudioOperator error: {str(e)}")
    airflow.exceptions.AirflowException: SASStudioOperator error: Find context named SAS Studio compute context failed: 403

[Steps to get access_token]
BEARER_TOKEN=$(curl -sk -X POST "${INGRESS_URL}/SASLogon/oauth/clients/consul?callback=false&serviceId=scim4.idp" \
  -H "X-Consul-Token: $CONSUL_TOKEN" | awk -F: '{print $2}' | awk -F\" '{print $2}')
echo "The registration access-token is: " ${BEARER_TOKEN}

curl -k -X POST "${INGRESS_URL}/SASLogon/oauth/clients" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BEARER_TOKEN" \
  -d '{
    "client_id": "scim4.idp",
    "client_secret": "idpsecret",
    "scope": ["openid"],
    "authorized_grant_types": ["authorization_code","client_credentials","refresh_token"],
    "redirect_uri": "urn:ietf:wg:oauth:2.0:oob",
    "access_token_validity": 473040000,
    "refresh_token_validity": 473040000
  }'

authorization_code: https://xxx.xx.xx.com/SASLogon/oauth/authorize?client_id=scim4.idp&response_type=code

ACCESS_TOKEN=$(curl -k -X POST "${INGRESS_URL}/SASLogon/oauth/token" \
  -H "Accept: application/json" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=authorization_code&code=${authorization_code}" \
  -u 'scim4.idp:idpsecret')
echo "The client access-token is: " ${ACCESS_TOKEN};

  3. Sample 3:
    [ERROR MSG]
    RuntimeError: Failed to create session: {"version":2,"httpStatusCode":500,"errorCode":30175,"message":"Unable to generate a new OAuth token for current user","details":["traceId: 91de15f83df9aa89","path: /launcher/processes","path: /compute/contexts/4d13c061-10a8-4419-8e5c-bf9017d97d97/sessions","correlator: 8005f2e2-37b3-4e8b-b772-1fc59884c456"]}

[Steps to get access_token]
BEARER_TOKEN=$(curl -sk -X POST "https://${INGRESS_URL}/SASLogon/oauth/token" \
  -u "sas.cli:" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=password&username=sasboot&password=lnxsas")
echo "The registration access-token is: " ${BEARER_TOKEN}

curl -k -X POST "${INGRESS_URL}/SASLogon/oauth/clients" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $BEARER_TOKEN" \
  -d '{
    "client_id": "scim7.idp",
    "client_secret": "idpsecret",
    "scope": ["openid"],
    "authorized_grant_types": ["authorization_code"],
    "redirect_uri": "urn:ietf:wg:oauth:2.0:oob",
    "access_token_validity": 473040000,
    "refresh_token_validity": 31622400
  }'

authorization_code: https://xxx.xx.xx.com/SASLogon/oauth/authorize?client_id=scim7.idp&response_type=code

ID_TOKEN=$(curl -k -X POST "${INGRESS_URL}/SASLogon/oauth/token" \
  -H "Accept: application/json" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=authorization_code&code=dWKOg2_ZVDTX_KDEN_O-eGBpxmbnpIDx" \
  -u 'scim7.idp:idpsecret')
echo "The client access-token is: " ${ID_TOKEN};

Log Not Fully Downloaded When the Log Is Large

If a piece of code takes less time to run than its log takes to download, the SASStudioOperator will not finish downloading the log. This is due to how the method _run_job_and_wait works: it loops until the status of the job changes out of running or pending (ignoring the unknown-status logic for a moment), and on each iteration it calls the method stream_log, which downloads the log for Airflow. If the job completes before the log has been fully downloaded, the log is left incomplete.

The SAS code to replicate this is rather simple: it intentionally writes a huge number of NOTEs to the log to see how far the log can be pushed before breaking. If the uncommented 500000 iterations do not reproduce the issue, just change the number to something higher and try again.

%macro test_log_lines();
    /* %do i = 1 %to 250000; */
    %do i = 1 %to 500000;
    /* %do i = 1 %to 1000000; */
    /* %do i = 1 %to 2000000; */
    /* %do i=1 %to 40000000; */
    /* %do i=1 %to 80000000; */
        %put NOTE: hi mom &i.!;

        /* data _null_;
            sleep(1);
        run; */
    %end;
%mend test_log_lines;

%test_log_lines;

The solution is to add a second check after the while loop on line 347, testing whether num_log_lines < job['logStatistics']['lineCount']. This check should itself be a while loop, replacing the logic on line 379, because that code only grabs the last 99999 lines of the log, not all of the remaining lines; a sketch of this follows.
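A minimal sketch of the proposed post-loop drain, assuming a fetch callable that wraps the provider's log-retrieval request; this is not the provider's actual code, just an illustration of the idea:

from typing import Callable, List

def drain_remaining_log(job: dict, num_log_lines: int,
                        fetch: Callable[[int], List[str]]) -> int:
    # Keep pulling log lines after the job has left running/pending, until
    # the local count catches up with the server-side line count.
    while num_log_lines < job["logStatistics"]["lineCount"]:
        lines = fetch(num_log_lines)  # fetch lines starting at the current offset
        if not lines:
            break  # guard against looping forever if the server returns nothing
        for line in lines:
            print(line)  # hand each line to the Airflow task log
        num_log_lines += len(lines)
    return num_log_lines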

Lastly, here is a screenshot of the log from the code above, as shown in the Airflow UI; you can see how the log abruptly stops.
[screenshot: Airflow UI task log ending abruptly]

session.verify got overwritten by env var

Problem:

In our Airflow worker pod, we specify the env var REQUESTS_CA_BUNDLE. This causes the SAS Studio Flow operator to fail to honor the extra field of the Airflow Connection, {"ssl_certificate_verification": false}, which should skip cert verification.

As the log below shows, TLS verification is confirmed to be turned off, and the access token is even obtained from SASLogon ("Get oauth token"). But the call to the SAS Studio REST endpoint fails.

[2024-06-26, 17:08:58 UTC] {sas.py:52} INFO - TLS verification is turned off
[2024-06-26, 17:08:58 UTC] {sas.py:62} INFO - Creating session for connection named sas_default to host https://d21670.ingress-nginx.miadmin-01-m1.irm.sashq-d.openstack.sas.com/
[2024-06-26, 17:08:58 UTC] {sas.py:82} INFO - Get oauth token (see README if this crashes)
[2024-06-26, 17:08:59 UTC] {sas_studioflow.py:90} INFO - Generate code for Studio Flow: /Users/miadmin/TestFlow.flw
[2024-06-26, 17:08:59 UTC] {logging_mixin.py:188} INFO - Code Generation for Studio Flow without Compute session
[2024-06-26, 17:08:59 UTC] {taskinstance.py:441} ▼ Post task execution logs
[2024-06-26, 17:08:59 UTC] {taskinstance.py:2905} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1060, in _validate_conn
    conn.connect()
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib64/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib64/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib64/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/sas/.local/lib/python3.8/site-packages/requests/adapters.py", line 564, in send
    resp = conn.urlopen(
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 801, in urlopen
    retries = retries.increment(
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/util/retry.py", line 594, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='d21670.ingress-nginx.miadmin-01-m1.irm.sashq-d.openstack.sas.com', port=443): Max retries exceeded with url: /studioDevelopment/code (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/sas/.local/lib/python3.8/site-packages/sas_airflow_provider/operators/sas_studioflow.py", line 91, in execute
    code = _generate_flow_code(
  File "/home/sas/.local/lib/python3.8/site-packages/sas_airflow_provider/operators/sas_studioflow.py", line 199, in _generate_flow_code
    response = session.post(uri, json=req)
  File "/home/sas/.local/lib/python3.8/site-packages/sas_airflow_provider/hooks/sas.py", line 112, in <lambda>
    session.post = lambda *args, **kwargs: requests.Session.post(  # type: ignore
  File "/home/sas/.local/lib/python3.8/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/requests/adapters.py", line 595, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='d21670.ingress-nginx.miadmin-01-m1.irm.sashq-d.openstack.sas.com', port=443): Max retries exceeded with url: /studioDevelopment/code (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/sas/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 400, in wrapper
    return func(self, *args, **kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/sas_airflow_provider/operators/sas_studioflow.py", line 124, in execute
    raise AirflowException(f"SASStudioFlowOperator error: {str(e)}")
airflow.exceptions.AirflowException: SASStudioFlowOperator error: HTTPSConnectionPool(host='d21670.ingress-nginx.miadmin-01-m1.irm.sashq-d.openstack.sas.com', port=443): Max retries exceeded with url: /studioDevelopment/code (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)')))
[2024-06-26, 17:08:59 UTC] {taskinstance.py:1206} INFO - Marking task as FAILED. dag_id=MySASStudioFlowOperatorDAG, task_id=sas_studio_test_flow, run_id=manual__2024-06-26T17:08:55.695486+00:00, execution_date=20240626T170855, start_date=20240626T170858, end_date=20240626T170859
[2024-06-26, 17:08:59 UTC] {standard_task_runner.py:110} ERROR - Failed to execute job 6 for task sas_studio_test_flow (SASStudioFlowOperator error: HTTPSConnectionPool(host='d21670.ingress-nginx.miadmin-01-m1.irm.sashq-d.openstack.sas.com', port=443): Max retries exceeded with url: /studioDevelopment/code (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)'))); 14161)
[2024-06-26, 17:08:59 UTC] {local_task_job_runner.py:240} INFO - Task exited with return code 1
[2024-06-26, 17:08:59 UTC] {taskinstance.py:3498} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2024-06-26, 17:08:59 UTC] {local_task_job_runner.py:222} ▲▲▲ Log group end

Root Cause

In the first REST call, the boolean verify value is passed explicitly to requests.post, and it works as expected.

response = requests.post(
    f"{self.host}/SASLogon/oauth/token",
    data=payload,
    verify=self.cert_verify,
    headers=my_headers,
    timeout=http_timeout
)

In the second REST call, verify is not passed to the requests.* functions; instead, only Session.verify is set.

# set to false if using self-signed certs
session.verify = self.cert_verify
# prepend the root url for all operations on the session, so that consumers can just provide
# resource uri without the protocol and host
root_url = self.host
session.get = lambda *args, **kwargs: requests.Session.get(  # type: ignore
    session, urllib.parse.urljoin(root_url, args[0]), *args[1:], **kwargs
)
session.post = lambda *args, **kwargs: requests.Session.post(  # type: ignore
    session, urllib.parse.urljoin(root_url, args[0]), *args[1:], **kwargs
)
session.put = lambda *args, **kwargs: requests.Session.put(  # type: ignore
    session, urllib.parse.urljoin(root_url, args[0]), *args[1:], **kwargs
)
session.delete = lambda *args, **kwargs: requests.Session.delete(  # type: ignore
    session, urllib.parse.urljoin(root_url, args[0]), *args[1:], **kwargs
)
return session

This is a long-standing, known quirk of the Python requests library: when REQUESTS_CA_BUNDLE is set and no per-request verify is given, the environment value overrides Session.verify. People keep losing hours to it, so it would be better to fix it in our code, or at least make the two REST calls behave consistently (either both fail or both succeed).
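One possible fix, sketched here rather than taken from the provider: requests only consults REQUESTS_CA_BUNDLE when no per-request verify is supplied, so passing it explicitly on every call sidesteps the override.

import requests

def make_session(cert_verify: bool) -> requests.Session:
    session = requests.Session()
    session.verify = cert_verify
    original_request = session.request

    def request_with_verify(method, url, **kwargs):
        # An explicit per-request verify always wins over REQUESTS_CA_BUNDLE.
        kwargs.setdefault("verify", cert_verify)
        return original_request(method, url, **kwargs)

    session.request = request_with_verify
    # Alternative: session.trust_env = False disables all environment lookups
    # (proxies as well as CA bundles), which may be broader than desired.
    return session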

Doesn't work for Airflow from the Azure Marketplace

This doesn't appear to work with Airflow on Ubuntu 20.04.2 LTS from the Azure Marketplace. After installation, the SAS provider does not appear in the connections drop-down list.

Are there any prerequisites for this Airflow provider, e.g., a particular version of Python?
Are there any suggested troubleshooting notes for novices new to cloud/Python/Airflow?

Feature Request: Support Queues

When a site has SAS Workload Orchestrator licensed, a new feature called Queues is available. The SAS Batch CLI can submit jobs to specific queues like so:

sas-viya --profile myserver batch jobs submit-pgm --pgm-path ./test.sas  --context default --queue-name default

The CLI uses the /batch interface path, however.

Could the same queue name optionally be specified in the Airflow provider, so that a queue can be chosen by the code in the DAGs? A hypothetical sketch follows.
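A hypothetical sketch of what the requested option could look like; queue_name does not exist today, and apart from task_id the parameter names here are illustrative rather than the provider's exact API:

from sas_airflow_provider.operators.sas_studio import SASStudioOperator

run_flow = SASStudioOperator(
    task_id="run_flow",
    path="/Public/Airflow/demo_studio_flow_1.flw",  # illustrative parameter
    compute_context="default",                      # mirrors the CLI's --context
    queue_name="default",                           # hypothetical option mirroring --queue-name
)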

SASComputeDeleteSession session_name parameter name

It looks like the parameter to specify the session name is "session_name" in the class header. However, it does not work; the parameter that is actually honored seems to be "compute_session_name".

To be aligned with SASComputeCreateSession, "session_name" would be preferable. An illustration follows.
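An illustration of the mismatch, based purely on the behavior reported above (import elided; task_id is standard Airflow):

# Documented in the class header, but reportedly ignored:
delete1 = SASComputeDeleteSession(task_id="delete_session", session_name="my_session")
# What reportedly works today:
delete2 = SASComputeDeleteSession(task_id="delete_session", compute_session_name="my_session")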

DAG fails with "The folder could not be found.","path: /studioDevelopment/code" after successful authentication to Viya environment

Viya version: Stable 2024.01
Release: 20240127.1706321865048
Airflow version: 2.8.1

Here's the log
SAS-airflow-DAG_error_09022024.txt

The flow is created with SAS Studio and saved at a location which is the dags_folder in airflow.cfg.

Error snippet:
[2024-02-09T02:04:57.580-0500] {sas.py:82} INFO - Get oauth token (see README if this crashes)
[2024-02-09T02:04:58.013-0500] {logging_mixin.py:188} INFO - A new unnamed compute session will be created
[2024-02-09T02:05:17.564-0500] {logging_mixin.py:188} INFO - Compute session 9e39aa3d-1cfa-4d54-a1d7-5bfc8cb75d67-ses0000 created
[2024-02-09T02:05:17.564-0500] {sas_studio.py:183} INFO - Generate code for Studio object: /data/viyauser0/flows/simple-flow1.flw
[2024-02-09T02:05:17.564-0500] {sas_studio.py:296} INFO - Code generation for Studio object stored in Content
[2024-02-09T02:05:17.724-0500] {taskinstance.py:2698} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/airflow-venv/lib64/python3.11/site-packages/sas_airflow_provider/operators/sas_studio.py", line 184, in execute
    res = self._generate_object_code()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/airflow-venv/lib64/python3.11/site-packages/sas_airflow_provider/operators/sas_studio.py", line 312, in _generate_object_code
    raise RuntimeError(f"Code generation failed: {response.text}")
RuntimeError: Code generation failed: {"version":2,"httpStatusCode":500,"errorCode":124303,"message":"Error encountered during code generation.","details":["cause: The folder could not be found.","path: /studioDevelopment/code","correlator: a97ad249-13d7-4037-a06a-8116dda914c3"],"errors":[{"version":2,"httpStatusCode":404,"errorCode":5121,"message":"The folder could not be found."}]}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/airflow-venv/lib64/python3.11/site-packages/airflow/models/taskinstance.py", line 433, in _execute_task
    result = execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/airflow-venv/lib64/python3.11/site-packages/sas_airflow_provider/operators/sas_studio.py", line 208, in execute
    raise AirflowException(f"SASStudioOperator error: {str(e)}")
airflow.exceptions.AirflowException: SASStudioOperator error: Code generation failed: {"version":2,"httpStatusCode":500,"errorCode":124303,"message":"Error encountered during code generation.","details":["cause: The folder could not be found.","path: /studioDevelopment/code","correlator: a97ad249-13d7-4037-a06a-8116dda914c3"],"errors":[{"version":2,"httpStatusCode":404,"errorCode":5121,"message":"The folder could not be found."}]}
[2024-02-09T02:05:17.728-0500] {taskinstance.py:1138} INFO - Marking task as FAILED. dag_id=simple-flow1, task_id=simple-flow1.flw, execution_date=20240209T070454, start_date=20240209T070457, end_date=20240209T070517
[2024-02-09T02:05:17.728-0500] {sas_studio.py:249} INFO - Deleting session with id 9e39aa3d-1cfa-4d54-a1d7-5bfc8cb75d67-ses0000
[2024-02-09T02:05:18.104-0500] {sas_studio.py:252} INFO - Compute session succesfully deleted

allways_reuse_session=True does not reuse a session

I've been trying to use allways_reuse_session=True (shouldn't it be spelled always_reuse_session?) for a simple DAG that has 3 SASStudioOperator tasks connected in series.
It appears that each of them creates a new SAS session (a new Compute pod), so the behavior is no different from not setting allways_reuse_session (allways_reuse_session=False).
Moreover, when allways_reuse_session=True, the 3 compute sessions are not properly deleted.

Feature Request: Support SCIM/External Authentication via Refresh Tokens

Please add the ability to provide a refresh token for obtaining new authentication tokens in a SCIM/external-authentication environment. When authentication is done this way, the user can't obtain an authentication token via user/password. The only way is to obtain an access code and provide it to the SASLogon service; at that point the user receives both a currently valid authentication token and a refresh token. The refresh token is valid for 90 days by default and can be used to obtain subsequent authentication tokens. Adding the ability to pass a refresh token to the provider would therefore let it generate new authentication tokens for users in these sorts of environments.
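A minimal sketch of the refresh-token grant against SASLogon; client_id and client_secret refer to a client registered with the refresh_token grant type, and this is standard OAuth2, not current provider code:

import requests

def refresh_access_token(host: str, client_id: str, client_secret: str,
                         refresh_token: str, verify: bool = True) -> str:
    response = requests.post(
        f"{host}/SASLogon/oauth/token",
        data={"grant_type": "refresh_token", "refresh_token": refresh_token},
        auth=(client_id, client_secret),
        headers={"Accept": "application/json"},
        verify=verify,
        timeout=60,
    )
    response.raise_for_status()
    # The response carries a fresh access_token (and usually a new refresh_token).
    return response.json()["access_token"]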

Respect TLS security

There are several instances throughout the code where TLS security measures are bypassed. These should be rectified to improve the project's security posture:

f"{self.host}/SASLogon/oauth/token", data=payload, verify=False, headers=my_headers

urllib3.disable_warnings(InsecureRequestWarning)
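A sketch of the suggested hardening, reusing the cert_verify setting referenced elsewhere in this provider (see the session.verify issue above); this is an assumption about shape, not the project's actual code:

import urllib3
from urllib3.exceptions import InsecureRequestWarning

def configure_tls(cert_verify: bool) -> bool:
    # Only silence the warning when the user has explicitly opted out of
    # verification; never disable it unconditionally.
    if not cert_verify:
        urllib3.disable_warnings(InsecureRequestWarning)
    return cert_verify  # pass this as verify= on each request instead of hard-coding False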

Passing the Airflow Context variables INTO the SAS execution.

Apologies if I've missed how this already works/how to do it.

When I look at the log of a Flow that has executed in a DAG, I can see the following environment variables:

AIRFLOW_CTX_DAG_OWNER=***
AIRFLOW_CTX_DAG_ID=ABCD
AIRFLOW_CTX_TASK_ID=ABCD_1.1.flw
AIRFLOW_CTX_EXECUTION_DATE=2023-05-18T00:49:12.460154+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=manual__2023-05-18T00:49:12.460154+00:00

in the log, just before the SAS aspects start appearing.

I want to access these, in the SAS code that gets executed, with %sysget; however, they don't seem to exist there, presumably because they are not passed down into the SAS execution context.

I know there's support for passing in arbitrary env vars, but I would expect these "Airflow Context" ones to be standard, passed by default.

This would allow the instance of the code to properly contextualize itself, knowing:

  • that it's executing "in batch" under Airflow (as opposed to interactively, in Studio)
  • the discrete value of AIRFLOW_CTX_DAG_RUN_ID, which could be used to update a log table of execution timings, or even to establish a shared discrete "context" between all flows in the DAG. This would allow clean separation between concurrent runs and better integration with the Airflow API for things like DAG triggering, housekeeping, and monitoring.

It looks like the standard (?) Python operator supports this with a parameter: provide_context=True.

Adding this to the SAS operators would really open up some rich possibilities; a hypothetical sketch follows.
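A hypothetical sketch of the requested behavior: the operator could assemble the AIRFLOW_CTX_* variables from the task context and merge them into the env vars it already sends to the SAS session. This is not current provider behavior; the attribute names follow Airflow's task-instance API.

def build_airflow_ctx_env(context: dict) -> dict:
    # Collect the same AIRFLOW_CTX_* variables shown in the log excerpt above.
    ti = context["ti"]
    return {
        "AIRFLOW_CTX_DAG_ID": ti.dag_id,
        "AIRFLOW_CTX_TASK_ID": ti.task_id,
        "AIRFLOW_CTX_TRY_NUMBER": str(ti.try_number),
        "AIRFLOW_CTX_DAG_RUN_ID": context["run_id"],
        "AIRFLOW_CTX_EXECUTION_DATE": context["ts"],
    }

On the SAS side, %sysget(AIRFLOW_CTX_DAG_RUN_ID) would then resolve as expected.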

Feature Request: Templating

Airflow provides the XCom mechanism to communicate small amounts of data between tasks. To move data from one task into a Studio Flow or Job Execution, we could enable templating for the operator parameters: for the SASJobExecutionOperator, the "parameters" parameter could be templated, and for the SASStudioFlowOperator, the "env_vars" parameter could be templated.

This would enable sas-airflow-provider users to use templated values for those two parameters, meaning that data could be passed from one task into a sas-airflow-provider task, and then into the Studio Flow or Job Execution via "env_vars" or "parameters". A hypothetical sketch follows.
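A hypothetical sketch of what templated env_vars could enable; this assumes env_vars were added to the operator's template_fields, which is not the case today:

from sas_airflow_provider.operators.sas_studioflow import SASStudioFlowOperator

run_flow = SASStudioFlowOperator(
    task_id="run_flow",
    # Jinja pulls the value an upstream task pushed to XCom.
    env_vars={"UPSTREAM_VALUE": "{{ ti.xcom_pull(task_ids='extract') }}"},
    # ...remaining required parameters elided...
)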

Error Handling when job state is "cancelled"

Hi all, we have stumbled across some SAS Studio flows that have built-in logic to abort. That means the job state is set to "canceled" rather than "failed". Perhaps we should add error handling that takes this job state into account: right now, if a flow ends in a canceled state, Airflow marks it as successful in the web interface.

A potential fix would be to raise a new Airflow exception when job_state == "canceled" (note that "cancelled" with a double "l" won't be accepted as a valid job state). A minimal sketch follows.
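A minimal sketch of the proposed check; job_state here stands for the final state string the provider reads back from the service:

from airflow.exceptions import AirflowException

def check_final_state(job_state: str, task_id: str) -> None:
    # "canceled" (single "l") is the spelling the service accepts as a job state.
    if job_state == "canceled":
        raise AirflowException(f"SAS job for task {task_id} was canceled")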


error: HTTPSConnectionPool - Failed to establish a new connection: [Errno 110] Connection timed out

SAS Viya information:

Version: Long-Term Support 2023.03
Release: 20230620.1687221111013

I am currently trying to execute an example DAG provided in the Git repository. I made sure that /Public/Airflow/demo_studio_flow_1.flw exists within the environment, yet I am still unable to run the DAG successfully.

The abridged log from the run attempt is attached.

20230622-airflow-dag-failure.txt

I've run the command the Airflow provider is trying to execute via Postman, and I can see the following:

URL: {{protocol}}://{{url}}/studioDevelopment/code
Body:
{
  "reference": {
    "type": "content",
    "path": "/Public/Airflow/demo_studio_flow_1.flw",
    "mediaType": "application/vnd.sas.dataflow"
  },
  "initCode": true,
  "wrapperCode": false
}

And I get the appropriate response:

{
  "code": ""
}

Any guidance on how to navigate this would be appreciated.
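For anyone reproducing this outside Postman, here is a minimal sketch of the equivalent call; the host and bearer-token handling are assumptions, while the body matches the request shown above:

import requests

def generate_flow_code(host: str, token: str) -> dict:
    req = {
        "reference": {
            "type": "content",
            "path": "/Public/Airflow/demo_studio_flow_1.flw",
            "mediaType": "application/vnd.sas.dataflow",
        },
        "initCode": True,
        "wrapperCode": False,
    }
    response = requests.post(
        f"{host}/studioDevelopment/code",
        json=req,
        headers={"Authorization": f"Bearer {token}"},
        timeout=60,  # surfaces the [Errno 110] timeout quickly instead of hanging
    )
    response.raise_for_status()
    return response.json()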
