
8-bit-sheep / googleAnalyticsR


Use the Google Analytics API from R

Home Page: https://8-bit-sheep.com/googleAnalyticsR/

License: Other

Languages: R 34.00%, HTML 64.43%, Jupyter Notebook 1.45%, Dockerfile 0.11%, JavaScript 0.01%
Topics: analytics, api, google, googleanalyticsr, googleauthr, r

googleAnalyticsR's People

Contributors: antoinesoetewey, gronerik, hidekoji, ironistm, j450h1, jdeboer, kiendang, loganek, maegan-whytock, markedmondson1234, nick-holt, octaviancorlade, papageorgiou, ricardopinto, terashim, zjuul, zselinger


googleAnalyticsR's Issues

MCF reports not working (v4)

Hello,
It seems the package won't recognize MCF dimensions and metrics: it prepends the "ga:" prefix instead of "mcf:".
I'm not sure if this is just a bug or whether MCF support requires more work.
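A minimal base-R sketch of prefix handling that would respect an existing "mcf:" prefix instead of forcing "ga:" (the helper name is made up, this is not the package's actual code):

```r
# Hypothetical prefix helper: only add "ga:" when no ga:/mcf: prefix is
# already present, so MCF names pass through untouched.
add_api_prefix <- function(x, prefix = "ga:") {
  has_prefix <- grepl("^(ga|mcf):", x)
  x[!has_prefix] <- paste0(prefix, x[!has_prefix])
  x
}

add_api_prefix(c("sessions", "mcf:totalConversions"))
# → "ga:sessions" "mcf:totalConversions"
```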

502 timeout - Problem fetching data with version 0.3.0.9000

The query:

se <- segment_element("transactionRevenue", 
                      operator = "GREATER_THAN", 
                      type = "METRIC", 
                      comparisonValue = 0, 
                      scope = "SESSION") 
sv_simple <- segment_vector_simple(list(list(se)))
seg_defined <- segment_define(list(sv_simple))
segment4 <- segment_ga4("simple", user_segment = seg_defined)

ga_conversion_paths.df  <- google_analytics_4(
  ga_id,
  date_range = as.character(date_range),
  metrics = c('hits','transactionRevenue'),
  dimensions =  c("dimension18","dimension17", "eventAction", "eventLabel","segment","dimension8"),
  filtersExpression = "ga:eventLabel!~^:;ga:dimension18!=false;ga:eventCategory==clientid",
  segments = segment4,
  anti_sample = T,
  anti_sample_batches = 15
)

The response:

anti_sample set to TRUE. Mitigating sampling via multiple API calls.
Finding how much sampling in data request...
Downloaded [10] rows from a total of [258213].
Data is sampled, based on 53.8% of sessions.
Calculated [3] batches are needed to download approx. [309856] rows unsampled.
Anti-sample call covering 14 days: 2016-10-01, 2016-10-14
Request Status Code: 502
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^

New ga_auth() should return the token invisibly

This may confuse users:

> ga_auth()

Auto-auth - .httr-oauth
Authenticated
<Token>
<oauth_endpoint>
 authorize: https://accounts.google.com/o/oauth2/auth
 access:    https://accounts.google.com/o/oauth2/token
 validate:  https://www.googleapis.com/oauth2/v1/tokeninfo
 revoke:    https://accounts.google.com/o/oauth2/revoke
<oauth_app> google
  key:    289759286325-da3fr5kq4nl4nkhmhs2uft776kdsggbo.apps.googleusercontent.com
  secret: <hidden>
<credentials> access_token, token_type, expires_in, refresh_token

---

Error when anti-sample includes 0-row data.frames: names are all NULL

Treat this better:

Error in fetch_google_analytics_4(requests, merge = TRUE) : 
  List of dataframes have non-identical column names. Got NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL 
9.
stop("List of dataframes have non-identical column names. Got ", 
    paste(lapply(out, function(x) names(x)), collapse = " ")) 
8.
fetch_google_analytics_4(requests, merge = TRUE) 
7.
google_analytics_4(viewId = viewId, date_range = c(x$start_date, 
    x$end_date), metrics = metrics, dimensions = dimensions, 
    dim_filters = dim_filters, met_filters = met_filters, filtersExpression = filtersExpression, 
    order = order, segments = segments, pivots = pivots, cohorts = cohorts,  ... 
6.
FUN(X[[i]], ...) 
5.
lapply(new_date_ranges, function(x) {
    if (x$range_date > 1) {
        myMessage("Anti-sample call covering ", x$range_date, 
            " days: ", x$start_date, ", ", x$end_date, level = 3) ... 
4.
anti_sample(viewId = viewId, date_range = date_range, metrics = metrics, 
    dimensions = dimensions, dim_filters = dim_filters, met_filters = met_filters, 
    filtersExpression = filtersExpression, order = order, segments = segments, 
    pivots = pivots, cohorts = cohorts, metricFormat = metricFormat,  ... 
3.
googleAnalyticsR::google_analytics_4(viewId = id, date_range = c(start, 
    end), metrics = metrics, dimensions = dimensions, filtersExpression = filters, 
    segments = segment, anti_sample = anti_sample, max = max_results) at misc.R#188
2.
iihGoogleAnalytics(id = ga_viewId, start = as.Date("2016-01-01"), 
    end = Sys.Date(), metrics = c("adClicks", "CTR", "CPC", "sessions", 
        "bounces", "adCost", "goalCompletionsAll", "goal9Completions", 
        "goal11Completions"), dimensions = c("date", "campaign",  ... at xxx_functions.R#213
1.
SEM_data(ga_viewId = ga_viewId) 

Multiple filters return an error

gadata <- google_analytics(id = XXXXXX, 
                       start=start.date, end=end.date, 
                       metrics = c("uniquePageviews"), 
                       dimensions = c("pageTitle","date","channelGrouping"),
                       filters = c("ga:channelGrouping%3D%3DSocial","ga:channelGrouping%3D%3DDirect")

)

returns:

Error in checkGoogleAPIError(req) : 
  JSON fetch error: Invalid value 'c("ga:channelGrouping==Social", "ga:channelGrouping==Direct").
  Values must match the following regular expression: 'ga:.+'
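For what it's worth, in the v3 Core Reporting API the OR of two filters is written inside a single string separated by "," (";" means AND), rather than as an R character vector. A sketch of the reported query with that change (the id and dates remain placeholders from the original):

```r
# OR filters are comma-separated inside one string in the v3 API;
# a character vector of length > 1 trips the ga:.+ validation above.
filters <- paste("ga:channelGrouping==Social",
                 "ga:channelGrouping==Direct",
                 sep = ",")

# gadata <- google_analytics(id = XXXXXX,
#                            start = start.date, end = end.date,
#                            metrics = c("uniquePageviews"),
#                            dimensions = c("pageTitle", "date", "channelGrouping"),
#                            filters = filters)
```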

Error using google_analytics_account_list()

Windows 10 64 bit.
R 3.3.2
RStudio Version 1.0.44

The code:
account_list = google_analytics_account_list()

returns the following error:

Error : df is not a data frame
API Data failed to parse.  Returning parsed from JSON content.
                    Use this to test against your data_parse_function.

How can it be solved?

Error when using list_goals function

I ran this query to generate a list of goals:

list_goals(accountId = accountId, webPropertyId = uaCode_sb, profileId = "~all", start.index = NULL, max.results = NULL)

It returns the following error:

> Error: Variables must be length 1 or 8.
> Problem variables: 'id', 'accountId', 'webPropertyId', 'internalWebPropertyId', 'profileId', 'name', 'value', 'active', 'type', 'created', 'updated', 'urlDestinationDetails.url', 'urlDestinationDetails.caseSensitive', 'urlDestinationDetails.matchType', 'urlDestinationDetails.firstStepRequired', 'eventDetails.useEventValue'

Similar queries for custom metrics, custom datasources, etc. work fine. Not sure this is a bug, but I don't understand how to solve this based on the error message.

max_results documentation not up to date?

The documentation for google_analytics() states the default value for max_results is -1:
https://github.com/MarkEdmondson1234/googleAnalyticsR_public/blob/master/man/google_analytics.Rd#L10
However, looking into the code, it is 100:
https://github.com/MarkEdmondson1234/googleAnalyticsR_public/blob/master/R/getData.R#L35
Explicitly setting max_results to -1 doesn't lead to the described behaviour of automatically querying all data:
https://github.com/MarkEdmondson1234/googleAnalyticsR_public/blob/master/man/google_analytics.Rd#L29
It leads to an error:
JSON fetch error: Invalid value '-1' for max-results parameter. Valid values are between 0 and 10000.
This makes perfect sense, because no special behaviour for max_results = -1 seems to be implemented.

Another thing I noticed is that for values > 10000, all data is automatically fetched. I just saw that this is documented as a TODO in the code ;) https://github.com/MarkEdmondson1234/googleAnalyticsR_public/blob/master/R/fetch_functions.R#L2

Batching is also used for exactly 10000, so the following condition could probably be safely changed to max_results <= 10000 (i.e. not < 10000):
https://github.com/MarkEdmondson1234/googleAnalyticsR_public/blob/master/R/getData.R#L83
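The arithmetic behind that suggestion, as a sketch (the 10000-row request limit comes from the API error quoted above; the function name is made up):

```r
req_row_limit <- 10000  # per-request cap reported by the API error above

# number of paged requests needed to fetch max_results rows
batches_needed <- function(max_results) ceiling(max_results / req_row_limit)

batches_needed(10000)  # 1  -> a single request suffices at exactly 10000
batches_needed(10001)  # 2  -> batching is only required above the limit
```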

Duplicate data

Hi Mark,

Another issue using the v4 API. The following query returns 126,399 rows. It is expected to return 26,399 rows. As far as I can tell, it's just repeating some of the result rows. The rows that I checked looked to have the correct data.

data.googleAnalyticsR <- google_analytics_4(ga_id, 
                                 dimensions=c('ga:month', "ga:year", "ga:landingPagePath"), 
                                 date_range=c("2015-04-01", "2015-04-30"),
                                 metrics = c('ga:sessions', "ga:bounceRate", "ga:avgSessionDuration", "ga:pageviewsPerSession"),
                                 max=1000000)

When I ran the equivalent queries on RGA and RGoogleAnalytics, I got the expected behavior. These are, I believe, using the v3 API.

tmp.query.list <- Init(start.date = "2015-04-01",
                       end.date = "2015-04-30",
                       dimensions=c('ga:month', "ga:year", "ga:landingPagePath"), 
                       metrics = c('ga:sessions', "ga:bounceRate", "ga:avgSessionDuration", "ga:pageviewsPerSession"),
                       max.results = 1000000,
                       table.id = "ga:XXXXXX")
tmp.query <- QueryBuilder(tmp.query.list)
data.RGoogleAnalytics <- GetReportData(tmp.query, token, split_daywise = F, delay = 0)

data.RGA <- get_ga(profileId = "ga:XXXXXX",
                  dimensions=c('ga:month', "ga:year", "ga:landingPagePath"), 
                  metrics = c('ga:sessions', "ga:bounceRate", "ga:avgSessionDuration", "ga:pageviewsPerSession"),
                  start.date='2015-04-01',
                  end.date='2015-04-30',
                  max=1000000
)

Thanks and let me know if you need any more sleuthing. Happy to help--it's the least I can do.

Best,
David

multi_account_batching = TRUE does not refresh OAuth token

It looks like multi_account_batching does not refresh the OAuth token. If I start an R session with a query where multi_account_batching = TRUE, I get an error. If I run a standard google_analytics query, the console prints "Auto-refreshing stale OAuth token." After that, I have no errors with multi_account_batching = TRUE.
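Until that is fixed, the behaviour described above suggests a workaround: run a cheap standard query first, so the token refresh happens before the batched call. A sketch; ga_id and the batched call are placeholders:

```r
library(googleAnalyticsR)
ga_auth()

ga_id <- 123456789  # placeholder view id

# throwaway one-day fetch: triggers "Auto-refreshing stale OAuth token."
# if the cached token has gone stale
invisible(google_analytics(ga_id,
                           start = Sys.Date() - 1, end = Sys.Date() - 1,
                           metrics = "sessions"))

# the batched query should now run against a fresh token
# google_analytics(ga_id, ..., multi_account_batching = TRUE)
```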

Very big downloads failing v4

Something happens in the second 50,000-row batch that gives a "503: service error"; see the output below.

> gaaa <- getGoogleAnalytics(config, historic = FALSE, auth_file = "ga.httr-oauth")
## getGoogleAnalytics
anti_sample set to TRUE. Mitigating sampling via multiple API calls.
Finding how much sampling in data request...
Auto-refreshing stale OAuth token.
Downloaded [10] rows from a total of [601845].
No sampling found, returning call
Downloaded [50000] rows from a total of [601845].
Request Status Code: 503
Trying again: 1 of 5
Trying again: 2 of 5

Check pivot gav4 parsing

Under some circumstances the parsing looks to fail. Example response structure:

list(structure(list(columnHeader = structure(list(dimensions = list(
    "ga:eventLabel"), metricHeader = structure(list(metricHeaderEntries = list(
    structure(list(name = "ga:users", type = "INTEGER"), .Names = c("name", 
    "type"))), pivotHeaders = list(structure(list(totalPivotGroupsCount = 1L), .Names = "totalPivotGroupsCount"))), .Names = c("metricHeaderEntries", 
"pivotHeaders"))), .Names = c("dimensions", "metricHeader")), 
    data = structure(list(rows = list(structure(list(dimensions = list(
        "http://producer.imglobal.com/producerdocuments.ashx?a=524451&documentid=2645"), 
        metrics = list(structure(list(values = list("1"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("http://www.expatriatehealthcare.com/Broker/WFTI"), 
        metrics = list(structure(list(values = list("1"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("http://www.internationalrail.com/"), 
        metrics = list(structure(list(values = list("2"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("http://www.piau-engaly.com/"), 
        metrics = list(structure(list(values = list("1"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("http://www.saintlary.com/hiver/index.php"), 
        metrics = list(structure(list(values = list("1"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("http://www.travel-claims.net/"), 
        metrics = list(structure(list(values = list("1"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://medicaltravelcompared.co.uk/affiliate/q/rothwelltowler"), 
        metrics = list(structure(list(values = list("1"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://producer.imglobal.com/international-insurance-plans.aspx?imgac=524451"), 
        metrics = list(structure(list(values = list("3"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://quote.freespirittravelinsurance.com/a/3145"), 
        metrics = list(structure(list(values = list("2"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://secure.guestfirst.co.uk/"), 
        metrics = list(structure(list(values = list("1"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://uk.trustpilot.com/review/www.world-first.co.uk"), 
        metrics = list(structure(list(values = list("1"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://www.imgeurope.co.uk/purchase/Quote/GLOBAL_FUSION/pre-quote?imgac=524451"), 
        metrics = list(structure(list(values = list("2"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://www.imgeurope.co.uk/purchase/Quote/globehopper_group/pre-quote?imgac=524451"), 
        metrics = list(structure(list(values = list("2"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://www.imgeurope.co.uk/purchase/quote/globehopper_multitrip_group?imgac=524451"), 
        metrics = list(structure(list(values = list("1"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://www.imgeurope.co.uk/purchase/Quote/globehopper_multitrip/pre-quote?imgac=524451"), 
        metrics = list(structure(list(values = list("9"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://www.imgeurope.co.uk/purchase/quote/globehopper_platinum?imgac=524451"), 
        metrics = list(structure(list(values = list("3"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    )), structure(list(dimensions = list("https://www.imgeurope.co.uk/purchase/Quote/globehopper/pre-quote?imgac=524451"), 
        metrics = list(structure(list(values = list("33"), pivotValueRegions = list(
            structure(list(), .Names = character(0)))), .Names = c("values", 
        "pivotValueRegions")))), .Names = c("dimensions", "metrics"
    ))), totals = list(structure(list(values = list("65"), pivotValueRegions = list(
        structure(list(), .Names = character(0)))), .Names = c("values", 
    "pivotValueRegions"))), rowCount = 17L, minimums = list(structure(list(
        values = list("1"), pivotValueRegions = list(structure(list(), .Names = character(0)))), .Names = c("values", 
    "pivotValueRegions"))), maximums = list(structure(list(values = list(
        "33"), pivotValueRegions = list(structure(list(), .Names = character(0)))), .Names = c("values", 
    "pivotValueRegions")))), .Names = c("rows", "totals", "rowCount", 
    "minimums", "maximums"))), .Names = c("columnHeader", "data"
)))

Auto-authentication JSON file not working / Account list parsing fail

Hi,
maybe I'm just not fully understanding the authentication process, but as I followed the setup, the only thing I should need to do is place the .json from the Google APIs Admin Console and point to it with the global environment variable.

Sys.setenv(GA_AUTH_FILE = "/Users/michaelsinner/rStudio/test/auth/myAuth.json")

When I try to run a simple test code I get this error:
No authorization yet in this session!
NOTE: a .httr-oauth file exists in current working directory.
Run gar_auth() to use the credentials cached for this session.
Token doesn't exist
Error in acc_sum() : Invalid Token
In addition: Warning message:
In checkTokenAPI(shiny_access_token) : Invalid local token

What am I missing? Is there an issue with the auto authentication with JSON file?
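One thing worth checking (an assumption on my part, not confirmed from the package code): the GA_AUTH_FILE variable is read when the package attaches, so it has to be set before library(googleAnalyticsR). A service-account JSON can also be authenticated explicitly via googleAuthR:

```r
# set the env var *before* attaching, so auto-auth can see it
Sys.setenv(GA_AUTH_FILE = "/Users/michaelsinner/rStudio/test/auth/myAuth.json")
library(googleAnalyticsR)

# or authenticate the service-account JSON explicitly
# (the service account's email must also be added as a user on the GA view)
# googleAuthR::gar_auth_service(json_file = Sys.getenv("GA_AUTH_FILE"))
```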

Only returning 10,000 rows when samplingLevel is set to "WALK" and only 1,000 rows when segment used

When trying to use the walk option for samplingLevel, only 10,000 rows are returned for each API call.

full <- google_analytics(profile, start = singleStart ,end = singleEnd, metrics= "ga:sessions,ga:itemRevenue", dimensions = "ga:date,ga:dimension14,ga:dimension16,ga:medium,ga:source,ga:userType", samplingLevel = "WALK", max_results = 100000)


When adding a segment, the total drops to 1,000 per API call.

segment <- google_analytics(profile, start = singleStart ,end = singleEnd, metrics= "ga:sessions,ga:itemRevenue", dimensions = "ga:date,ga:dimension14,ga:dimension16,ga:medium,ga:source,ga:userType", segment = "sessions::condition::!ga:eventAction=@Create Account", samplingLevel = "WALK", max_results = 100000)


Is my script correct or is there another way to walk through the results?

object 'out' not found

I tried an anti-sampling query, but ran into this issue:

ga_auth()
unsampled_data_fetch <- google_analytics_4(ga_id, 
                                         date_range = c("2015-01-01","2015-06-21"), 
                                         metrics = c("users","sessions","bounceRate"), 
                                         dimensions = c("date","landingPagePath","source"),
                                         anti_sample = TRUE)

....

anti_sample set to TRUE. Mitigating sampling via multiple API calls.
Finding how much sampling in data request...
Downloaded [10] rows from a total of [49581].
Data is sampled, based on 1.1% of sessions. Use argument anti_sample = TRUE to request unsampled data.
Finding number of sessions for anti-sample calculations...
Downloaded [172] rows from a total of [172].
Calculated [102] batches are needed to download approx. [59497] rows unsampled.
Attempting hourly anti-sampling...
Finding number of hourly sessions for anti-sample calculations...
Downloaded [24] rows from a total of [24].
Anti-sample call covering 24 hours: 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Error in google_analytics_4(viewId = viewId, date_range = c(the_day, the_day),  : 
  object 'out' not found

Do you have any ideas about the issue?

Use gar_batch_walk with multi-account fetching

From Jimmy Glenn, he uses batching to speed up multi-account fetching in v3:

[.....] we have over 100 GA properties. We have a rollup property, but we're not on GA premium. Being able to run the same query on each property in a batch format is much faster. For a simple pageview query, batching takes 15 - 20 seconds versus 50 seconds using google_analytics calls. That time savings adds up when pulling traffic to each site by article or section.

 ga_pars <- list(ids = ga_views$gaId[1], 
               'start-date' = start_date, 'end-date' = end_date,
                metrics = 'ga:users,ga:sessions',
                output = 'json')
 # converting dates to character and URLencoding
ga_pars <- lapply(ga_pars, as.character) %>% 
                            lapply(., function(x) URLencode(x, reserved = T))

f <- gar_api_generator("https://www.googleapis.com/analytics/v3/data/ga",
                       "GET", 
                       pars_args = ga_pars, 
                       data_parse_function = parse_google_analytics)
output <- gar_batch_walk(f, 
                        walk_vector = ga_views$gaId, 
                        gar_pars = ga_pars, 
                        pars_walk = 'ids', 
                        data_frame_output = FALSE)

results <- lapply(output, plyr::ldply) %>% plyr::ldply()

Error: could not find function "dynamicSegment"

The following code returns:

Error: could not find function "dynamicSegment"

It's weird. googleAnalyticsR is loaded, and the help is showing, but R is saying the function doesn't exist. (Put aside that I might not have the syntax right yet... but it seems like there's something more fundamental I'm missing here.)

library(googleAnalyticsR)
segments_list <- list(
  segment_ga4("All Visits","gaid::-1"),
  dynamicSegment("Non-Paid",sessionSegment = "sessions::condition::ga:channelGrouping=~(Organic.Search)|(Direct)|(Referral)"),
  dynamicSegment("Paid",sessionSegment = "sessions::condition::ga:channelGrouping=~(Paid.Search)|(Display)|(Video)|(Social)|(Email)|(Other)")
)
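dynamicSegment is not an exported googleAnalyticsR function, which would explain the error; the v4 segment helpers used elsewhere in these issues (segment_element, segment_vector_simple, segment_define, segment_ga4) can express the same segments. A hedged sketch of the "Non-Paid" segment with those helpers (argument names may differ by package version):

```r
library(googleAnalyticsR)

# regex condition on channelGrouping, built from the v4 helper chain
non_paid_element <- segment_element(
  "channelGrouping",
  operator    = "REGEXP",
  type        = "DIMENSION",
  expressions = "(Organic Search)|(Direct)|(Referral)"
)
non_paid <- segment_ga4(
  "Non-Paid",
  session_segment = segment_define(list(
    segment_vector_simple(list(list(non_paid_element)))
  ))
)

segments_list <- list(segment_ga4("All Visits", segment_id = "gaid::-1"),
                      non_paid)
```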

Issue with googleAnalytics Authentication

Hello Mark,
I'm struggling to figure out how googleAuthR works. I've written a script to compare Google and Adobe. I'm scheduling the script for every week. When I run the script live myself in R, the authentication works (obviously as I get the pop up to authorize my Google account). However when I schedule the script to run, I have authentication issues. Not sure what I'm doing wrong or if it is the order I have the code in.

I've run the script and have the .httr-oauth file saved in the same folder as the R script. Do I have the order wrong? Should there be something in the gar_auth()? Did I screw up the options? Thanks.

library(googleAnalyticsR)
library(googleAuthR)

gar_auth()

options(googleAuthR.client_id = "uxxxx.apps.googleusercontent.com")
options(googleAuthR.client_secret = "xxxx")
options(googleAuthR.scopes.selected = "https://www.googleapis.com/auth/analytics")
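The ordering likely does matter (an assumption based on googleAuthR reading its options at auth time): set the client options before calling gar_auth(), otherwise the cached .httr-oauth token is created under the default client. A reordered sketch of the script above:

```r
library(googleAnalyticsR)
library(googleAuthR)

# set the client and scope *before* authenticating
options(googleAuthR.client_id = "uxxxx.apps.googleusercontent.com")
options(googleAuthR.client_secret = "xxxx")
options(googleAuthR.scopes.selected = "https://www.googleapis.com/auth/analytics")

gar_auth()  # now caches .httr-oauth for the client/scope set above
```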

Add some more segment examples

  1. You have to set "segment" as a dimension; this doesn't seem natural and isn't explicitly mentioned in the docs.
  2. There is a full example of an AND segment, but not of an IF statement.
  3. The operators have specific names. Referring to a list of them would be helpful.

pageTitle won't work as dimension in query

gadata <- google_analytics(id = XXXXXX, 
                       start=start.date, end=end.date, 
                       metrics = c("uniquePageviews"), 
                       dimensions = c("pageTitle","date","channelGrouping"))

head(gadata)

  pageTitle       date  channelGrouping uniquePageviews
1 (not set) 2015-03-01           Direct              95
2 (not set) 2015-03-01 Email Newsletter             190
3 (not set) 2015-03-01   Organic Search             475
4 (not set) 2015-03-01      Paid Search             285
5 (not set) 2015-03-01         Referral              95
6 (not set) 2015-03-02           Direct             285

WALK not working if only one day

Not working the first time:

google_analytics(gaId,
                        start = "2016-01-25",
                        end = "2016-01-25",
                        metrics = c("pageviews"),
                        dimensions = c("dimension3", "pagePath"),
                        samplingLevel = "WALK",
                        filters = "ga:dimension3!~_scUid",
                        max_results = 20000)

Doesn't fetch dates correctly:

raw <- google_analytics(gaId,
                        start = "2016-01-25",
                        end = "2016-01-26",
                        metrics = c("pageviews"),
                        dimensions = c("dimension3", "pagePath"),
                        samplingLevel = "WALK",
                        filters = "ga:dimension3!~_scUid",
                        max_results = 20000)

Anti-sampling

The v4 GA quotas will make this a lot easier for less complicated fetches.

The traditional per-day fetch to avoid sampling only works in most cases, as it ensures the API fetches are below the sampling limit; it breaks when that is not the case, and is very inefficient if the data is only lightly sampled.

The API reports how much of the data is sampled, so take this from the first call to calculate the batch sizes, and use that to split the data into non-sampled calls instead.
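A sketch of that calculation with assumed names (the real implementation will differ):

```r
# first call reports the total rows and the fraction of sessions sampled;
# estimate the true row count from those
approx_unsampled_rows <- function(total_rows, sample_rate) {
  ceiling(total_rows / sample_rate)
}

# split into enough date-range batches that each call stays unsampled
batches_needed <- function(total_rows, sample_rate, rows_per_batch = 1e5) {
  ceiling(approx_unsampled_rows(total_rows, sample_rate) / rows_per_batch)
}

approx_unsampled_rows(100, 0.5)   # → 200
batches_needed(100000, 0.5, 1e5)  # → 2
```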

Error in google_analytics_account_list()

Running account_list <- google_analytics_account_list() gives an error:

request: https://www.googleapis.com/analytics/v3/management/accountSummaries/
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match

Traceback shows:

traceback()
7: stop("numbers of columns of arguments do not match")
6: rbind(deparse.level, ...)
5: f(init, x[[i]])
4: Reduce(rbind, listNameToDFCol(wp_prep, "accountId"))
3: data_parse_function(req$content, ...)
2: acc_sum()
1: google_analytics_account_list()

Retrieving actual data works fine, including the google_analytics_meta() function.

Batch fail when querying non-golden data

Downloaded [21189] rows from a total of [21207].
Error in seq.default(from = 50000, to = all_rows, by = reqRowLimit) : 
  wrong sign in 'by' argument

It looks like the total row field changes when you are querying data from today, possibly due to non-golden data.
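A minimal guard for that failure mode (a sketch, not the package's code; the 50000 start and 10000 step are placeholders inferred from the traceback):

```r
# seq() errors when `to` < `from` with a positive `by`; return an empty
# vector instead when the reported total drops below the first batch start
safe_batch_starts <- function(all_rows, from = 50000, by = 10000) {
  if (all_rows < from) return(numeric(0))
  seq(from = from, to = all_rows, by = by)
}

safe_batch_starts(21207)   # numeric(0), no "wrong sign in 'by'" error
safe_batch_starts(75000)   # 50000 60000 70000
```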

samplingLevel = "WALK" - error

The below query works and returns sampled results:

start.date <-  "2015-05-01"
end.date <- "2015-11-10"

ga.data <- google_analytics(id = 93625103, 
                        start=start.date, end=end.date, 
                        metrics = c("uniquePageviews"), 
                        dimensions = c("pageTitle","date","channelGrouping"),
                        filters = "ga:pageTitle%3D%3DXXXXX,ga:pageTitle%3D%3DXXXXX;ga:country%3D%3DNetherlands;ga:deviceCategory%3D%3Ddesktop",
                        max=100000)

However when I add samplingLevel = "WALK" to query I get:

Request to profileId:  ()
Error in if (x$kind == "analytics#gaData") { : argument is of length zero

Cohort example ignores first cohort

Dear Mark,

First of all, thank you for writing this great R package! I am delighted to be able to use the API v4 directly from R.

I tried out the cohort example from the vignette, but it didn't behave the way I expected.

## first make a cohort group
cohort4 <- make_cohort_group(list("cohort 1" = c("2015-08-01", "2015-08-01"), 
                                "cohort 2" = c("2015-07-01","2015-07-01")))

## then call cohort report.  No date_range and must include metrics and dimensions
##   from the cohort list
cohort_example <- google_analytics_4(ga_id, 
                                     dimensions=c('cohort'), 
                                     cohort = cohort4, 
                                     metrics = c('cohortTotalUsers'))

This only returns a value for cohort 2. My first thought was that we didn't have data for cohort 1, but if I swapped the date ranges, I got only data for 2015-08-01. In fact, if I added a third row, I get the last two:

## first make a cohort group
cohort4 <- make_cohort_group(list("cohort 1" = c("2015-08-01", "2015-08-01"), 
                                "cohort 2" = c("2015-07-01","2015-07-01"),
                                "cohort 3" = c("2015-08-01", "2015-08-01")))

## then call cohort report.  No date_range and must include metrics and dimensions
##   from the cohort list
cohort_example <- google_analytics_4(ga_id, 
                                     dimensions=c('cohort'), 
                                     cohort = cohort4, 
                                     metrics = c('cohortTotalUsers'))

The above returns cohorts 2 and 3. Seems to be a simple off-by-one. I am using R 3.2.4 and I installed via install.packages today.

ga_v4_objects.R / order_type

In the order_type function should this line

testthat::expect_type(field, character)

be

testthat::expect_type(field, "character")

Can't seem to get a query with this option to work.

Better error messages when using v3 syntax in v4

Make it easier to port google_analytics calls to google_analytics_4

Things like filters get evaluated to filtersExpression when you should be using dim_filters or met_filters

Move start and end to date_range

Set max_results to max

etc.
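A hedged side-by-side sketch of the ports listed above (the view id, dates and filter are placeholders):

```r
library(googleAnalyticsR)

ga_id <- 123456789  # placeholder view id

# v3 style:
# google_analytics(id = ga_id,
#                  start = "2016-01-01", end = "2016-01-31",
#                  metrics = "sessions", dimensions = "date",
#                  filters = "ga:medium==organic",
#                  max_results = 10000)

# v4 equivalent: filters -> dim_filters, start/end -> date_range,
# max_results -> max
df <- dim_filter("medium", "EXACT", "organic")
google_analytics_4(ga_id,
                   date_range  = c("2016-01-01", "2016-01-31"),
                   metrics     = "sessions",
                   dimensions  = "date",
                   dim_filters = filter_clause_ga4(list(df)),
                   max         = 10000)
```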

URLencode filters

When running a query with the filter "ga:pageTitle=@at&T", it looks like the filter is not URL-encoded.
I added this to line 15 of google_analytics and it appears to work:

if (!is.null(filters)) {
  filters <- utils::URLencode(filters, reserved = TRUE)
}

There's probably a much more elegant solution out there, not sure I'm proficient enough to find it.
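For reference, this is what the proposed line produces for the reported filter (base R only):

```r
filters <- "ga:pageTitle=@at&T"
utils::URLencode(filters, reserved = TRUE)
# → "ga%3ApageTitle%3D%40at%26T"
```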

Document single filters syntax better - Dimension filter error

When running the following code:

df <- dim_filter(dimension="ga:campaign",operator="REGEXP",expressions="welcome")
data_fetch <- google_analytics_4(ga_id,date_range = c("2016-01-01","2016-12-31"),
                                 metrics = c("ga:itemRevenue","ga:itemQuantity"),
                                 dimensions = c("ga:campaign","ga:transactionId","ga:dateHour","ga:productBrand","ga:productName"),
                                 dim_filters = df,
                                 anti_sample = TRUE)

It has the following error:

Error in checkGoogleAPIError(req) :
JSON fetch error: Invalid JSON payload received. Unknown name "dimension_name" at 'report_requests[0].dimension_filter_clauses': Cannot find field.
Invalid JSON payload received. Unknown name "not" at 'report_requests[0].dimension_filter_clauses': Cannot find field.
Invalid value at 'report_requests[0].dimension_filter_clauses.operator' (TYPE_ENUM), "REGEXP"
Invalid JSON payload received. Unknown name "expressions" at 'report_requests[0].dimension_filter_clauses': Cannot find field.
Invalid JSON payload received. Unknown name "case_sensitive" at 'report_requests[0].dimension_filter_clauses': Cannot find field.
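The payload errors suggest the dim_filter object was sent without its wrapping clause; wrapping it in filter_clause_ga4() (as other queries in this issue list do) may fix it. A sketch, with ga_id as a placeholder:

```r
library(googleAnalyticsR)

ga_id <- 123456789  # placeholder view id

df <- dim_filter(dimension = "campaign", operator = "REGEXP",
                 expressions = "welcome")
fc <- filter_clause_ga4(list(df))  # the wrapper the API payload expects

data_fetch <- google_analytics_4(ga_id,
                                 date_range  = c("2016-01-01", "2016-12-31"),
                                 metrics     = c("itemRevenue", "itemQuantity"),
                                 dimensions  = c("campaign", "transactionId",
                                                 "dateHour", "productBrand",
                                                 "productName"),
                                 dim_filters = fc,
                                 anti_sample = TRUE)
```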

Splitting data into multiple rows when using anti_sample=true

I run the following code with or without anti_sample:

cf <- dim_filter("dimension7","EXACT",campaign_name,not = FALSE)
fc <- filter_clause_ga4(list(cf))
gaDataFunnel_dev1 <- google_analytics_4(viewId, 
                                    date_range = c(dateStart, dateYesterday),
                                    dimensions=c("deviceCategory", "dimension7"), 
                                    metrics = c("uniquePageviews"),
                                    order = order_type("deviceCategory", "ASCENDING"),
                                    dim_filters = fc,
                                    anti_sample = TRUE)

The code with anti_sample = TRUE returns the same values separated into two rows.

Without anti_sample = TRUE, the table is no longer split into multiple rows.

This seems to have started only in the past few days; it didn't happen before.
I'm running version 0.3.0 of the package.
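Until the split is fixed, a base-R workaround sketch (toy data stands in for the returned frame): drop exact duplicate rows, or re-aggregate when one key is genuinely split across rows.

```r
# toy frame standing in for the split anti_sample output
gaDataFunnel_dev1 <- data.frame(
  deviceCategory  = c("desktop", "desktop", "mobile"),
  dimension7      = c("camp-a", "camp-a", "camp-a"),
  uniquePageviews = c(10, 10, 5)
)

# exact duplicate rows: drop them
deduped <- unique(gaDataFunnel_dev1)

# genuine splits of one key: sum the metric back together
merged <- aggregate(uniquePageviews ~ deviceCategory + dimension7,
                    data = gaDataFunnel_dev1, FUN = sum)
```

Which of the two applies depends on whether the duplicated rows carry the full metric value or a split of it; check a few rows against the unsplit output first.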
