
meltanolabs / tap-facebook


Singer tap for extracting data from the Facebook Marketing API

Home Page: https://pypi.org/project/meltano-tap-facebook/

License: Other

Python 100.00%
facebook-marketing-api meltano singer-sdk singer-tap

tap-facebook's Introduction

tap-facebook

Singer tap for extracting data from the Facebook Marketing API.

Built with the Meltano Singer SDK.

Capabilities

  • catalog
  • state
  • discover
  • about
  • stream-maps
  • schema-flattening

Settings

| Setting              | Required | Default | Description |
|----------------------|----------|---------|-------------|
| access_token         | True     | None    | The token to authenticate against the API service. |
| api_version          | False    | v16.0   | The API version to request data from. |
| account_id           | True     | None    | Your Facebook Account ID. |
| start_date           | False    | None    | The earliest record date to sync. |
| end_date             | False    | None    | The latest record date to sync. |
| stream_maps          | False    | None    | Config object for stream maps capability. For more information check out Stream Maps. |
| stream_map_config    | False    | None    | User-defined config values to be used within map expressions. |
| flattening_enabled   | False    | None    | 'True' to enable schema flattening and automatically expand nested properties. |
| flattening_max_depth | False    | None    | The max depth to flatten schemas. |

A full list of supported settings and capabilities is available by running: tap-facebook --about

Installation

pipx install git+https://github.com/MeltanoLabs/tap-facebook.git

Configuration

Meltano Variables

The following config values must be set to use the tap with Meltano. They can be set in meltano.yml, via meltano config tap-facebook set --interactive, or via the environment variables noted below.

  • access_token: access token (environment variable: TAP_FACEBOOK_ACCESS_TOKEN)
  • start_date: start date
  • end_date: end date
  • account_id: account ID (environment variable: TAP_FACEBOOK_ACCOUNT_ID)
  • api_version: API version

Elastic License 2.0

The licensor grants you a non-exclusive, royalty-free, worldwide, non-sublicensable, non-transferable license to use, copy, distribute, make available, and prepare derivative works of the software.

Attribution Window

The attribution window is the time period during which conversions can be credited to an ad. It can range from 1 to 7 days for both clicks and views.

  • action_attribution_windows: this parameter can be added to the request params. It takes a list of values from 1d to 7d for clicks and views (for example, 7d_click or 1d_view). It is set in the get_url_params function of the ads insights stream.
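As a rough illustration of the parameter described above, here is a minimal sketch that validates a list of windows and attaches it to a params dict, in the shape a get_url_params implementation might return. The helper name `build_insights_params` and the restriction to 1d-7d values (taken from the description above) are assumptions; this is not the tap's actual code, and how the list is serialized onto the request is left to the HTTP client.

```python
# Hypothetical helper: not the tap's real get_url_params.
# Allowed values follow the "1d-7d clicks and views" range described above.
VALID_WINDOWS = {f"{days}d_{kind}" for days in range(1, 8) for kind in ("click", "view")}


def build_insights_params(windows, base_params=None):
    """Return a copy of base_params with action_attribution_windows set.

    windows: iterable of strings such as "7d_click" or "1d_view".
    """
    params = dict(base_params or {})
    windows = list(windows)
    invalid = [w for w in windows if w not in VALID_WINDOWS]
    if invalid:
        raise ValueError(f"Unsupported attribution windows: {invalid}")
    params["action_attribution_windows"] = windows
    return params
```

For example, `build_insights_params(["7d_click", "1d_view"], {"level": "ad"})` returns `{"level": "ad", "action_attribution_windows": ["7d_click", "1d_view"]}`.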

Authentication

A Facebook access token is required to make API requests. (See the Facebook API docs for more info.)

Usage

API Limitation - Rate Limits

Hitting the rate limit for the Facebook API while making requests will return the following error:

400 Client Error: b'{"error":{"message":"(#80004) There have been too many calls to this ad-account. Wait a bit and try again

This error is handled with the backoff library: the tap waits a random amount of time before calling the API again.
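The tap's exact backoff settings aren't shown in this README. As a rough illustration, here is a stdlib-only sketch of exponential backoff with full jitter, the same strategy the backoff library's `backoff.on_exception(backoff.expo, ...)` decorator applies by default. `RateLimitError`, `call_with_backoff`, and the default limits are hypothetical.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception for the (#80004) "too many calls" error."""


def call_with_backoff(func, max_tries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call func(), retrying on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_tries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_tries - 1:
                raise  # out of attempts: surface the error
            # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)] seconds.
            delay = random.uniform(0, min(cap, base * 2 ** attempt))
            sleep(delay)
```

Randomizing the delay keeps several clients that hit the limit at the same moment from retrying in lockstep.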

Executing the Tap Directly

tap-facebook --version
tap-facebook --help
tap-facebook --config CONFIG --discover > ./catalog.json
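CONFIG above is the path to a JSON file holding the settings from the Settings table. A minimal sketch for generating one from the TAP_FACEBOOK_ACCESS_TOKEN and TAP_FACEBOOK_ACCOUNT_ID environment variables listed under Meltano Variables; the `write_config` helper and its optional `start_date` argument are illustrative, not part of the tap.

```python
import json
import os
from typing import Mapping, Optional


def write_config(path: str, env: Mapping[str, str] = os.environ,
                 start_date: Optional[str] = None) -> dict:
    """Write a tap config file from environment variables and return the config."""
    config = {
        # Required settings.
        "access_token": env["TAP_FACEBOOK_ACCESS_TOKEN"],
        "account_id": env["TAP_FACEBOOK_ACCOUNT_ID"],
    }
    if start_date is not None:
        # Optional: earliest record date to sync, e.g. "2023-01-01T00:00:00Z".
        config["start_date"] = start_date
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return config
```

After `write_config("config.json")`, the file can be passed as `tap-facebook --config config.json --discover > ./catalog.json`.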

Contributing

This project uses parent-child streams. Learn more about them in the Singer SDK documentation.
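In the Singer SDK, a child stream declares `parent_stream_type`, and the parent's `get_child_context()` builds the context handed to the child for each parent record. The plain-Python sketch below does not use the SDK; it only shows that flow with hypothetical parent and child records. The `parent_id` context key and the function name are illustrative.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def sync_parent_child(
    parent_records: Iterable[dict],
    fetch_children: Callable[[dict], List[dict]],
) -> Iterator[Tuple[str, dict]]:
    """Yield ("parent", record) and ("child", record) pairs in sync order."""
    for parent in parent_records:
        yield ("parent", parent)
        # Build a per-parent context, as get_child_context() would in the SDK.
        context = {"parent_id": parent["id"]}
        for child in fetch_children(context):
            yield ("child", child)
```

Each child request is scoped by its parent's context, so children are synced once per parent record.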

Initialize your Development Environment

pipx install poetry
poetry install

Create and Run Tests

Create tests within the tap_facebook/tests subfolder and then run:

poetry run pytest

You can also test the tap-facebook CLI interface directly using poetry run:

poetry run tap-facebook --help

Testing with Meltano

Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.

Your project comes with a custom meltano.yml project file already created. Open the meltano.yml and follow any "TODO" items listed in the file.

Next, install Meltano (if you haven't already) and any needed plugins:

# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-facebook
meltano install

Now you can test and orchestrate using Meltano:

# Test invocation:
meltano invoke tap-facebook --version
# OR run a test `elt` pipeline:
meltano elt tap-facebook target-jsonl

SDK Dev Guide

See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.

tap-facebook's People

Contributors

davert0, dependabot[bot], dhrruvrm, edgarrmondragon, eryanrm, monikarm, neilgorman104, pnadolny13, pre-commit-ci[bot]


tap-facebook's Issues

MeltanoLabs/tap-facebook vs. singer-io/tap-facebook

I was wondering if, in the README.md, an outline could be provided of the goals/purpose of this (relatively new) MeltanoLabs/tap-facebook tap, in comparison/contrast to singer-io/tap-facebook. The latter has been around for much longer, but at a high level it seems they both aim to produce streams of Ads, Ad Insights, and other Facebook Ad data. Are there particular use cases where one is preferable over the other? Is there a reason a decision was made to start fresh with MeltanoLabs/tap-facebook rather than iterate on singer-io/tap-facebook?

feat: Add more ad_insights stream(s)

One of the main uses for this tap is extracting insights reports; other variants already implement many default report variations for this.

Failed Validating Type For `account_id` - Not of type 'integer', 'null'

Hello! I'm trying to use this tap for the first time, and hitting an error when trying to run a complete workflow that validates the schema with meltano run tap-facebook target-jsonl.

Note that for privacy, I've changed my actual FB account id in the snippet below to 1234567890, but otherwise left the log intact:

The specific error:

jsonschema.exceptions.ValidationError: '1234567890' is not of type 'integer', 'null' cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl

My meltano.yml:

version: 1
default_environment: dev
project_id: <redacted>
environments:
- name: dev
- name: staging
- name: prod
plugins:
  extractors:
  - name: tap-facebook
    variant: meltanolabs
    pip_url: git+https://github.com/MeltanoLabs/tap-facebook.git
    config:
      start_date: '2023-01-01T00:00:00Z'
      end_date: '2023-02-01T00:00:00Z'
      api_version: v16.0
    select:
    - ads.*
  loaders:
  - name: target-jsonl
    variant: andyh1203
    pip_url: target-jsonl

The full log:

2023-05-16T22:14:29.530100Z [warning  ] No state was found, complete import.
2023-05-16T22:14:30.052435Z [info     ] 2023-05-16 18:14:30,052 | INFO     | tap-facebook         | Skipping deselected stream 'adaccounts'. cmd_type=elb consumer=False name=tap-facebook producer=True stdio=stderr string_id=tap-facebook
2023-05-16T22:14:30.052694Z [info     ] 2023-05-16 18:14:30,052 | INFO     | tap-facebook         | Skipping deselected stream 'adimages'. cmd_type=elb consumer=False name=tap-facebook producer=True stdio=stderr string_id=tap-facebook
2023-05-16T22:14:30.052765Z [info     ] 2023-05-16 18:14:30,052 | INFO     | tap-facebook         | Skipping deselected stream 'adlabels'. cmd_type=elb consumer=False name=tap-facebook producer=True stdio=stderr string_id=tap-facebook
2023-05-16T22:14:30.052892Z [info     ] 2023-05-16 18:14:30,052 | INFO     | tap-facebook         | Beginning incremental sync of 'ads'... cmd_type=elb consumer=False name=tap-facebook producer=True stdio=stderr string_id=tap-facebook
2023-05-16T22:14:30.052954Z [info     ] 2023-05-16 18:14:30,052 | INFO     | tap-facebook         | Tap has custom mapper. Using 1 provided map(s). cmd_type=elb consumer=False name=tap-facebook producer=True stdio=stderr string_id=tap-facebook
2023-05-16T22:14:31.372900Z [info     ] 2023-05-16 18:14:31,371 | INFO     | singer_sdk.metrics   | METRIC: {"type": "timer", "metric": "http_request_duration", "value": 1.311472, "tags": {"stream": "ads", "endpoint": "/ads?fields=['id', 'account_id', 'adset_id', 'campaign_id', 'bid_type', 'bid_info', 'status', 'updated_time', 'created_time', 'name', 'effective_status', 'last_updated_by_app_id', 'source_ad_id', 'creative', 'tracking_specs', 'conversion_specs', 'recommendations', 'configured_status', 'conversion_domain', 'bid_amount']", "http_status_code": 200, "status": "succeeded"}} cmd_type=elb consumer=False name=tap-facebook producer=True stdio=stderr string_id=tap-facebook
2023-05-16T22:14:31.402002Z [info     ] 2023-05-16 18:14:31,401 | WARNING  | tap-facebook         | Properties ('tracking_specs.post.wall',) were present in the 'ads' stream but not found in catalog schema. Ignoring. cmd_type=elb consumer=False name=tap-facebook producer=True stdio=stderr string_id=tap-facebook
2023-05-16T22:14:31.402750Z [info     ] Traceback (most recent call last): cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.402899Z [info     ]   File "/Users/robg/eventbrite/data-pipelines/docker/meltano/.meltano/loaders/target-jsonl/venv/bin/target-jsonl", line 8, in <module> cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.403205Z [info     ]     sys.exit(main())           cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.403338Z [info     ]   File "/Users/robg/eventbrite/data-pipelines/docker/meltano/.meltano/loaders/target-jsonl/venv/lib/python3.10/site-packages/target_jsonl.py", line 92, in main cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.403766Z [info     ]     state = persist_messages(  cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.403914Z [info     ]   File "/Users/robg/eventbrite/data-pipelines/docker/meltano/.meltano/loaders/target-jsonl/venv/lib/python3.10/site-packages/target_jsonl.py", line 54, in persist_messages cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.404096Z [info     ]     validators[o['stream']].validate((o['record'])) cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.404188Z [info     ]   File "/Users/robg/eventbrite/data-pipelines/docker/meltano/.meltano/loaders/target-jsonl/venv/lib/python3.10/site-packages/jsonschema/validators.py", line 130, in validate cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.404352Z [info     ]     raise error                cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.404484Z [info     ] jsonschema.exceptions.ValidationError: '1234567890' is not of type 'integer', 'null' cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.404568Z [info     ]                                cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.404644Z [info     ] Failed validating 'type' in schema['properties']['account_id']: cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.404717Z [info     ]     {'type': ['integer', 'null']} cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.404786Z [info     ]                                cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.404857Z [info     ] On instance['account_id']:     cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.404925Z [info     ]     '1234567890'        cmd_type=elb consumer=True name=target-jsonl producer=False stdio=stderr string_id=target-jsonl
2023-05-16T22:14:31.422833Z [error    ] Loader failed
2023-05-16T22:14:31.423336Z [error    ] Block run completed.           block_type=ExtractLoadBlocks err=RunnerError('Loader failed') exit_codes={<PluginType.LOADERS: 'loaders'>: 1} set_number=0 success=False

The issue seems to be that this line defines account_id as IntegerType, but it is coming in as a string in the response. Is this an issue with my setup or configuration? I'm running on Python 3.10.2.

bug: Since we have ODAX enforcement from v17, this field is useless and should always be assumed true

Originally discussed in https://meltano.slack.com/archives/C01TCRBBJD7/p1691683828313669?thread_ts=1691550548.963669&cid=C01TCRBBJD7

It was then brought up again in https://meltano.slack.com/archives/C01TCRBBJD7/p1692969672434419


singer_sdk.exceptions.FatalAPIError: 400 Client Error: b'{"error":{"message":"(#12) Since we have ODAX enforcement from v17, this field is useless and should always be assumed true is deprecated for versions v17.0 and higher","type":"OAuthException","code":12,"fbtrace_id":"AzatFQb1xy081BM4_4Pt_nt"}}' (Reason: Bad Request) for path: /v17.0/me/adaccounts

The workaround was to exclude this field using select criteria:

    select:
    - '*.*'
    - '!adaccounts.has_advertiser_opted_in_odax'

Related to #90: we should remove this field when API v17 or later is used. If we're not cutting over to v17 immediately, we could wrap it in a conditional that checks the API version.
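A minimal sketch of that version check, assuming the requested fields are a plain list of names and api_version uses the "v16.0" format from the Settings table. `parse_version` and `adaccount_fields` are hypothetical helpers, not functions in the tap.

```python
from typing import Iterable, List, Tuple

DEPRECATED_FROM_V17 = "has_advertiser_opted_in_odax"


def parse_version(api_version: str) -> Tuple[int, int]:
    """Turn a version string like "v16.0" into (16, 0)."""
    major, minor = api_version.lstrip("v").split(".")
    return int(major), int(minor)


def adaccount_fields(all_fields: Iterable[str], api_version: str) -> List[str]:
    """Drop the ODAX field for API v17.0 and later; keep everything for older versions."""
    if parse_version(api_version) >= (17, 0):
        return [f for f in all_fields if f != DEPRECATED_FROM_V17]
    return list(all_fields)
```

Comparing parsed tuples instead of raw strings keeps a future "v100.0" from sorting before "v17.0".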

bug: custom audiences `rule` is causing error

We should exclude the rule property from the stream. Requesting it makes the API return a 500 error with a message along the lines of "Please reduce the amount of data...". The tap then retries the identical request, which keeps failing, so it ends up in a long backoff loop.
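Besides dropping the field, the retry loop itself could stop early. The backoff library's `on_exception` accepts a `giveup` callable that receives the raised exception; a predicate like the one below could be called from it once the status code and message have been pulled out of the exception (that extraction is left out here). The exact Facebook message is only paraphrased in this issue, so the substring match and the `should_give_up` name are assumptions.

```python
# Hypothetical predicate: message fragments that won't change on an identical retry.
NON_RETRYABLE_MESSAGES = ("reduce the amount of data",)


def should_give_up(status_code: int, message: str) -> bool:
    """Return True when retrying the same request is pointless."""
    text = message.lower()
    return status_code == 500 and any(fragment in text for fragment in NON_RETRYABLE_MESSAGES)
```

Rate-limit errors such as (#80004) still return False, so they keep going through the normal backoff path.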

Rate limit errors

Facebook's rate limits are quite low for apps in "Development" mode and getting approved to come out of development mode is not trivial. This makes it difficult to use Meltano with a new app.

Would it be possible to implement a rate limit backoff? It is possible to see how close you are to the rate limit using a header that FB provides in responses from the API:

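import json
import requests

def check_limit(account_number, access_token):
    # Query the insights endpoint; only the rate-limit usage header is read here.
    check = requests.get('https://graph.facebook.com/v5.0/' + account_number + '/insights',
                         params={'access_token': access_token})
    # The header value is JSON: {business_id: [{"type": ..., "total_time": <percent>, ...}]}
    usage_json = json.loads(check.headers['x-business-use-case-usage'])
    usage = max(entry['total_time'] for entries in usage_json.values() for entry in entries)
    print('\tRate limit for account %s threshold: %d%%' % (account_number, usage))
    return usage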
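Building on that header, a backoff could pause before Facebook starts rejecting calls. This is a minimal sketch assuming the documented X-Business-Use-Case-Usage shape, where call_count, total_cputime, and total_time are percentages of the limit and estimated_time_to_regain_access is in minutes. The 75% threshold, the 60-second pause, and the function name are hypothetical choices.

```python
import json


def seconds_until_safe(header_value: str, threshold: float = 75.0, pause: float = 60.0) -> float:
    """Suggest how long to sleep based on an X-Business-Use-Case-Usage header value."""
    usage = json.loads(header_value)
    wait = 0.0
    for entries in usage.values():
        for entry in entries:
            minutes = entry.get("estimated_time_to_regain_access", 0)
            if minutes:
                # Already throttled: Facebook reports the remaining time in minutes.
                wait = max(wait, minutes * 60.0)
            elif max(entry.get(k, 0) for k in ("call_count", "total_cputime", "total_time")) >= threshold:
                # Close to the limit: slow down before calls are rejected.
                wait = max(wait, pause)
    return wait
```

The tap would call this after each response and sleep for the returned number of seconds when it is non-zero.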

Parallelize async insight report jobs

The tap gets reports using the async job workflow but we only kick off one job right now and wait for it to complete. We can kick off many of them at once and then poll for completion, even keeping them in order so the stream is sorted if we wanted. Since several other jobs will be processing while we poll for the first one, I'd expect all of them to be ready to extract once the first one is done. This would allow us to increase throughput significantly.
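The proposal above can be sketched as "submit everything, then poll in order". The callables stand in for the real Facebook SDK calls, and the status strings are assumed to match the async job status values ("Job Completed", "Job Failed", "Job Skipped"); `run_report_jobs` and its parameters are hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator, List


def run_report_jobs(
    params_list: Iterable[dict],
    start_job: Callable[[dict], str],
    job_status: Callable[[str], str],
    fetch_results: Callable[[str], list],
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[list]:
    """Start all report jobs up front, then yield results in submission order."""
    # Kick off every job first so Facebook can process them concurrently.
    job_ids: List[str] = [start_job(params) for params in params_list]
    # Wait for each job in the order it was submitted, keeping output sorted.
    for job_id in job_ids:
        while True:
            status = job_status(job_id)
            if status == "Job Completed":
                break
            if status in ("Job Failed", "Job Skipped"):
                raise RuntimeError(f"Report job {job_id} ended with status {status!r}")
            sleep(poll_interval)
        yield fetch_results(job_id)
```

Later jobs keep running on Facebook's side while the first one is polled, so they are often already complete when their turn comes.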

Missing config `required=True`

I see in the README that access_token and account_id are required, but they aren't defined that way in tap.py. Update them to include the required=True parameter.

Improving the descriptions would help too.
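With the Singer SDK, marking a config property `required=True` lets the SDK reject a config that omits it when the config is validated. The stdlib sketch below only illustrates that check; `missing_required_settings` is a hypothetical helper, not part of the tap or the SDK.

```python
from typing import Iterable, List, Mapping


def missing_required_settings(
    config: Mapping[str, object],
    required: Iterable[str] = ("access_token", "account_id"),
) -> List[str]:
    """Return the required setting names that are absent or empty in config."""
    return [name for name in required if not config.get(name)]
```

An empty result means both settings the README marks as required are present.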

Upgrade Default Version to v17

Since June 14, 2023, a cursor-paging bug in FB API version 16 occasionally throws facebook_business.exceptions.FacebookBadObjectError: Bad data to set object data, which traces back to line 99 of https://github.com/facebook/facebook-python-business-sdk/blob/main/facebook_business/adobjects/abstractobject.py. The tap hits the API, receives all the batches of records, and processes all of them successfully except the last batch, which is redirected to a "Facebook: Error" page, so there is no JSON to parse, only HTML.

This is a known bug. Others have been having this issue since June 14:
https://developers.facebook.com/support/bugs/638894768267963/?join_id=f387c33d3149118

The FB API team pushed a bug fix to facebook-python-business-sdk on June 23: "Fix Cursor Paging Issue on Duplicating Fields".
This fix was merged into main on June 23 as version 17.0.2.

To avoid this issue, I recommend upgrading the tap's default API version to v17.
