
Comments (34)

bcalvert531 avatar bcalvert531 commented on September 25, 2024 7

I've been wrestling with this myself. I think that the explanation of TIB given in the book is confused, and conflates the underlying tick-level data with the tick imbalance bars we're constructing out of it. Hopefully writing my thoughts out here will clarify my confusion and help some other folks.

Like @GerardBCN says above, the point of tick-imbalance bars is to construct bars/"buckets" containing the same amount of information about tick imbalance (i.e., the presence of informed traders) across varying time/amounts of ticks.

First we construct the b_t indicators for each individual tick, showing 1 if the price went up since the previous tick and -1 if the price went down, corresponding to a buyer or seller taking the tick from the market maker.

The price imbalance at tick T is the sum of b_t (tick buyer/seller signs) from 1 to T. This returns a signed int. (Note: it's not specified where our sequence of ticks 1...T starts -- the beginning of our data series, or the end of the last sampled tick imbalance bar? It can only really make sense as the latter, as far as I can tell.)

We then sample a tick imbalance bar as soon as the (absolute value of the) imbalance at tick T exceeds the expected imbalance across previously sampled tick imbalance bars. LdP defines the expected imbalance as: the expected number of ticks per TIB * abs(P[b=1] - P[b=-1]). Here, abs(P[b=1] - P[b=-1]) is the fraction of the tick bar that should constitute the imbalance. E.g., if we have 100 ticks and the tick imbalance is -10 (45 buy ticks, 55 sell ticks), then abs(0.45 - 0.55) = abs(-0.1) = 0.1, so the expected tick imbalance is 100 * 0.1 = 10 = abs(imbalance). If our sequence of ticks 1...T has an abs(imbalance) > 10 (in this example), we sample a bar.
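The mechanics described above can be checked with a short sketch (function and variable names are mine, not from the book):

```python
import numpy as np

def tick_rule(prices):
    """b_t = +1 if the price ticked up, -1 if down; an unchanged price
    inherits the previous sign (the standard tick-rule convention)."""
    b = np.sign(np.diff(np.asarray(prices, dtype=float)))
    for i in range(1, len(b)):
        if b[i] == 0:
            b[i] = b[i - 1]
    return b

# Worked example from the text: 100 ticks, 45 buys and 55 sells
b = np.array([1] * 45 + [-1] * 55)
imbalance = b.sum()                   # theta_T = -10
expected = len(b) * abs(0.45 - 0.55)  # E[T] * |P[b=1] - P[b=-1]| = 10
# abs(imbalance) >= expected, so a bar would be sampled at this tick
```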

So, as far as I can tell, the sequence of ticks 1...T is the sequence of ticks starting from the end of the previous bar. The expectations E(T) and E(theta_T) (i.e. expected imbalance), and the probabilities P[b_t=1] and P[b_t=-1] need to be referring to previously sampled tick imbalance bars.

LdP says that we can estimate E(T) by taking the EWMA of the # of ticks per bar across our previous bars. He says that we can find P[b_t=1] "as an exponentially weighted moving average of b_t values from prior bars". Conceptually, this makes sense, since we're trying to find the expected (absolute) % of ticks that are imbalanced, which we then multiply by the expected # of ticks to find the quantity Theta_T needs to exceed to trigger a bar sampling. But the tick imbalance bars don't record individual b_ts, so we'd be taking the EWMA over the ticks, unlike with the expected # of ticks per bar, where we EWMA over previously sampled bars. We could record the numbers of b_t=1 and b_t=-1 ticks for each bar, so P[b_t=1] = #[b_t=1] / # ticks; but at that point we can just store the actual imbalance per tick bar and take the EWMA over that, right? Doesn't that route us around all the need to calculate the expected tick count and imbalance ratios?
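The alternative suggested here (skip the probability bookkeeping and EWMA the realized absolute imbalance of prior bars directly) could look like this sketch; the function name and the default `alpha` are my own choices, not anything from the book:

```python
def threshold_from_prior_bars(bar_imbalances, alpha=0.05):
    """Recursive (adjust=False) EWMA of the |realized imbalance| of prior
    bars, used directly as the sampling threshold for the next bar."""
    t = abs(bar_imbalances[0])
    for x in bar_imbalances[1:]:
        t = alpha * abs(x) + (1 - alpha) * t
    return t
```

With `alpha=0.5` and realized bar imbalances [-10, 12, -8], the running threshold goes 10, 11, 9.5.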

The issue that I run into then is setting a reasonable default parameter for the expected tick imbalance, which as noted above should be washed out by the actual buy/sell dynamics.

I don't know if I feel any less confused, or if this clarified anything for anyone, but I want to stop thinking about this for a while. Generally, I think the results @GerardBCN are getting look correct.

from adv_fin_ml_exercises.

GerardBCN avatar GerardBCN commented on September 25, 2024 4

Hi guys,

I've also been exploring tick-imbalance bars, in my case applied to the BTC-USD pair. See the results in the attached plot. At the beginning I was quite skeptical because it looked to me like sampling was not happening often enough (it sampled 7050 candles in ~6 years of trading data, which gives a mean of 3.2 candles per day), especially when the book mentions that the micro-structure of price is the real gold mine these days... so I was expecting more sampling, to be able to capture changes of "micro-trends".

However, when taking a closer look at the results, it kind of looks like it is doing the job right. De Prado says in the book that each of these information-driven bars should be understood as a bucket containing an equal amount of information. What one can see is that in periods in which the price goes sideways and there's general uncertainty (a lack of "informed" trading) there's much less sampling of bars, and sampling quickly reacts to changes in the trend.

BTC_TIB

Any thoughts? Does this make sense to you?


erodgithub avatar erodgithub commented on September 25, 2024 2

My implementation seems to be wrong. It seems like the estimated E[T] value increases greatly over time, so the first couple of bars happen within seconds or minutes of each other while the later ones are hours or days long depending on how much data I'm using. Any idea why that would be?

Sorry I'm not providing very much information here, but I'd be happy to discuss if any of you would like it.


Alcester avatar Alcester commented on September 25, 2024 1

Thanks Peter,

@aldebaransearch, I am just trying to understand TIBs; I have never used them yet. In my free time I wrote some code based on my understanding of TIBs (my first time with Python, please do not judge the quality of the code :) ), so now I wanted to compare it with someone. You were very kind to show me your progress; in the next days I will go deeper into this code, using yours too.

I share my code if you are interested; do not rely on it much, because it was written in 2 days based on what I understood...
tib_test.txt

Best, Massimiliano


GerardBCN avatar GerardBCN commented on September 25, 2024 1

I think my problem is this particular definition:

(2P[bt = 1]-1) [is calculated] as an exponentially weighted moving average of bt values from prior bars

I actually do the ewma of P[bt=1] from previous bars. I just don't see how one can take an ewma of a sequence of 1 and -1 values (or what the output of that would be). To me it makes sense to say: hey, in the last bars the probability of finding bt = 1 was 0.8, 0.7, 0.9, 0.7, 0.8, 0.9, etc., so the expectation is that P[bt=1] should also be quite high, i.e. the ewma of the sequence of probabilities, which is:

ewma([0.8, 0.7, 0.9, 0.7, 0.8, 0.9], window=3)[-1]

0.840625
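For what it's worth, the 0.840625 above matches a recursive EWMA with alpha = 0.5 (pandas `span=3` with `adjust=False`), which can be verified directly:

```python
import pandas as pd

probs = [0.8, 0.7, 0.9, 0.7, 0.8, 0.9]
# span=3 -> alpha = 2/(3+1) = 0.5; adjust=False is the recursive form
ewma_last = pd.Series(probs).ewm(span=3, adjust=False).mean().iloc[-1]
# -> 0.840625, the value quoted above
```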


BlackArbsCEO avatar BlackArbsCEO commented on September 25, 2024

@mysl good catch. I mistakenly left some of the additional code in a different notebook. I will try to update today. In the mean time, if you have an implementation for computing the imbalance bars then we should compare notes.


mysl avatar mysl commented on September 25, 2024

@BlackArbsCEO, yeah, I do have an implementation, not very efficient though; I can try putting it in a gist later. Actually there is something from the book confusing me, i.e. when computing the exponential moving average of the volume imbalance, for example, I am not sure if I should use the absolute value or the signed value.


BlackArbsCEO avatar BlackArbsCEO commented on September 25, 2024

@mysl, implementing the imbalance bar, I find the definition of T confusing, because he uses it as the index of the theta_T series but then states that E[T] is the expected size of the tick bar. Does he mean size as in the number of ticks in the tick-imbalance bar? If so, how is that value initialized? My current implementation assumes that E[T] is literally the ewma of the range(len(theta_T)) index, which I then use in the computation of the actual T* values. The computed T* values then appear reasonable (ranging from ~1-40 ticks), but I think I may have misunderstood or overlooked something.


BlackArbsCEO avatar BlackArbsCEO commented on September 25, 2024

closing until someone posts a better implementation for the imbalance bars.

closed ef219be


mysl avatar mysl commented on September 25, 2024

@BlackArbsCEO , sorry, busy with other stuff lately.

Does he mean the size as in the number of ticks in the tick-imbalance bar?

my understanding is yes, and my assumption is that we can initialize it with a somewhat "reasonable" value; the initial value would then be forgotten as more data comes in, and after a warm-up period its impact would be negligible.
I am simply using ewma(v) = lambda*new_v + (1-lambda)*v to calculate the ewma.
The following is a VIB plot generated from two days' ticks with this approach; it looks relatively reasonable and robust to different initial values. It seems more sensitive to the lambda value, which controls the forgetting rate.
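The update rule quoted above, as a self-contained sketch (variable names are mine):

```python
def ewma_update(prev, new_value, lam):
    """One step of the recursive EWMA: lam weights the newest observation,
    (1 - lam) the running estimate; a larger lam forgets faster."""
    return lam * new_value + (1 - lam) * prev

# e.g. updating an expected-ticks-per-bar estimate of 100 with a new bar of 80
est = ewma_update(100.0, 80.0, lam=0.1)   # -> 98.0
```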

However, per my simple tests, it does not seem to have significantly better statistical properties than regular time bars, such as being more stationary or having returns closer to normal, as claimed in the book. So maybe my understanding or implementation is wrong.
new bitmap image


aldebaransearch avatar aldebaransearch commented on September 25, 2024

I'm sorry for being a little late on this thread, but I hope that you, @BlackArbsCEO , @mysl and @erodgithub are still listening.

I have also played around with the different methods from chapter 2. I can only support your findings; the resulting bars are more stationary, more uniformly distributed across time etc. There will be a difference across different markets/instruments.

Going along the thoughts of chapter 5, I do not really find it too worrying though. As long as the resulting bars are just stationary "enough" I like the idea of only creating bars that are relevant for the problem at hand. For instance I have a strategy that really suffers from adverse selection when the tick imbalance is large, hence I think (dream :) ), that bars sampled more often when tick imbalance changes, could help me.

However, I find it problematic that the methods seem super sensitive to initial conditions. Take tick imbalance bars as an example: I find certain levels of initial tick imbalance, combined with certain decay factors for the ewma weighting, that quite sharply divide parameters into 2 territories: one that produces many bars only a few ticks big for the entire length of history, and one that runs wild like @erodgithub describes above, such that the length of the bars rapidly grows to several days and finally infinity. Have you observed the same behavior?

I would love to share thoughts and ideas related to the book and hope you would like the same.


aldebaransearch avatar aldebaransearch commented on September 25, 2024

From #1 (comment) above:

Does he mean the size as in the number of ticks in the tick-imbalance bar?

my understanding is yes, and my assumption is that we can initialize with somewhat "reasonable" value, then the initial value would be forgotten as more data comes in, and after a warm up period, the impact would be neglected.

That is my understanding too


p1m3nt avatar p1m3nt commented on September 25, 2024

The smoothing method is one of many factors that affect the sensitivity of bars to the initial condition; state-space filters such as the Kalman filter give better control and yield robust stability regardless of the initial condition.
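For concreteness, here is a minimal 1-D random-walk Kalman filter (state x_t = x_{t-1} + w, observation y_t = x_t + v) that could smooth a per-bar statistic such as T. This is a sketch of the general idea only: the noise parameters q and r are arbitrary illustration values, not anything @p1m3nt specified.

```python
def kalman_1d(observations, q=1.0, r=10.0, p0=1e6):
    """Scalar random-walk Kalman filter; returns the filtered estimates."""
    x, p = observations[0], p0
    out = []
    for y in observations:
        p = p + q               # predict: state variance grows by q
        k = p / (p + r)         # Kalman gain
        x = x + k * (y - x)     # update toward the new observation
        p = (1 - k) * p
        out.append(x)
    return out
```

Unlike a fixed-lambda EWMA, the effective gain k adapts: it is large while the state is uncertain and settles to a steady value, which is one reason such filters can be less sensitive to the initial condition.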


aldebaransearch avatar aldebaransearch commented on September 25, 2024

@BlackArbsCEO, I have now committed a python file bars.py based on my understanding of the chapter 2. Sorry it took me so long!

@p1m3nt, I can see what you are hinting at, but after experimenting with designs, I have great trouble making a Kalman filter work better than the exponential smoothers. Results are still very unstable with regard to the parameters of the filter. Have you implemented a version of a Kalman filter that assumes any particular dynamics of the system? For the imbalance bars, my Kalman filter models the average time of a bar, T, and the imbalance. I have no velocity components and no interaction between the 2 variables.

@erodgithub, have you found a way around the increasing duration of bars? I am pretty sure it is the same thing we are struggling with.

Best to all of you!


rspadim avatar rspadim commented on September 25, 2024

I don't know if it helps; here is an example of a Kalman filter:
KalmanExample (1).xlsx


Alcester avatar Alcester commented on September 25, 2024

@BlackArbsCEO, @mysl, @erodgithub, @aldebaransearch, have any of you implemented a complete version of the TIB algorithm? I'm still struggling to understand some parts of it.


aldebaransearch avatar aldebaransearch commented on September 25, 2024

@Alcester, I never had super success with the tick imbalance or other imbalance bars either. As I described above, the imbalance always ends up exploding for me in terms of the "typical" imbalance (the product of the ewma-weighted ticks per bar and the ewma-weighted imbalance always grows). The idea of using the Kalman filter from @p1m3nt above did not really make the difference (and sorry @rspadim for not getting back to say thanks for the Kalman examples; my problem is not the general understanding of various Kalman models, rather figuring out which one you or @p1m3nt might have been successful with in this context). I did add my code in a pull request, but apparently it never made it through. So here you have my snippet:

```python
import numpy as np
from numba import jit

@jit(nopython=True)
def tib(b_t, initial_imbalance, alpha):
    weighted_count = 0          # denominator for normalization of EWMAs
    weighted_sum_T = 0          # numerator for EWMA of bar duration
    weighted_sum_imbalance = 0  # numerator for EWMA of bar imbalance

    out = np.zeros(b_t.shape)
    dummy = 0
    imbalance = initial_imbalance
    T = 0

    for i in range(len(b_t)):
        dummy += b_t[i]
        T += 1
        if abs(dummy) >= imbalance:
            out[i] = 1
            weighted_sum_T = T + (1 - alpha) * weighted_sum_T
            weighted_sum_imbalance = dummy / (1.0 * T) + (1 - alpha) * weighted_sum_imbalance
            weighted_count = 1 + (1 - alpha) * weighted_count
            ewma_T = weighted_sum_T / weighted_count
            ewma_imbalance = weighted_sum_imbalance / weighted_count
            imbalance = ewma_T * abs(ewma_imbalance)
            dummy = 0
            T = 0

    return out
```

which I use like this:

```python
tib(np.where(data['side'] == 'Buy', 1, -1), 100, .98)
```

However, I have had more luck building tick runs bars, and they work really nicely for my application. The volume runs and dollar runs bars also have the problem of suddenly exploding. I should actually try with the data @BlackArbsCEO hosts on his website; I am using tick data from BitMEX on a perpetual BTC swap, and I would not be surprised if the crypto data turned out to be not as "well-behaved" as "old school" financial products. The tick runs bar code that I use looks like this:

```python
@jit(nopython=True)
def trb(property, alpha, initial_run_length, weighted_count=0, weighted_sum_T=0,
        weighted_sum_up=0, T=0, dummy_up=0):
    out = np.zeros(property.shape)
    run_length = initial_run_length
    last_i = 0

    for i in range(len(property)):
        if property[i] > 0:
            dummy_up += 1
        T += 1
        dummy = max(dummy_up, T - dummy_up)   # length of the dominant run side
        if dummy >= run_length:
            out[i] = 1
            weighted_sum_T = T + (1 - alpha) * weighted_sum_T
            weighted_sum_up = dummy_up / (1.0 * T) + (1 - alpha) * weighted_sum_up
            weighted_count = 1 + (1 - alpha) * weighted_count
            ewma_T = weighted_sum_T / weighted_count
            ewma_up = weighted_sum_up / weighted_count
            run_length = ewma_T * max(ewma_up, 1 - ewma_up)
            dummy_up = 0
            T = 0
            last_i = i

    return out, last_i, [run_length, weighted_count, weighted_sum_T, weighted_sum_up, T, dummy_up]
```

where property is the np.where(data['side'] == 'Buy', 1, -1) from above; initial_run_length=150 and alpha=0.98 work for me.

Use it for what it is worth. It would be great to know if you end up making some of it work...

Best, Peter


pkdmark avatar pkdmark commented on September 25, 2024

Hi,

I have also tried creating TIBs, and I have been quite careful to make sure my implementation models the definition he describes in the book. I find that E[T] just explodes: if it takes 4 ticks to reach an initial imbalance limit set to 4 (so b = 1 1 1 1 for the first 4 ticks), it could take 20 ticks to observe the next imbalance of 4. Since ewma(b) is between -1 and 1, the new E[imbalance] = E[T]*|ewma(b)| becomes, for example, 20 * 0.8, so the expected imbalance increases and it takes exponentially longer to reach it. I'm not sure the concept is solid :/ If someone has a solution I'd really love to hear it.
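The feedback loop described here is quadratic: for a symmetric ±1 random walk, the mean time to first hit ±k grows like k², so every time the threshold ratchets up, the next bar takes disproportionately longer. A toy simulation (illustration only, not anyone's implementation) makes this visible:

```python
import numpy as np

def mean_hitting_time(k, n_trials=300, max_steps=5_000, seed=42):
    """Mean number of fair +/-1 ticks until the running sum first reaches +/-k."""
    rng = np.random.default_rng(seed)
    times = []
    for _ in range(n_trials):
        theta = np.cumsum(rng.choice([-1, 1], size=max_steps))
        times.append(np.argmax(np.abs(theta) >= k) + 1)  # first index reaching +/-k
    return float(np.mean(times))

# theory: E[hitting time of +/-k] = k**2 for a driftless walk,
# so quadrupling the threshold multiplies the expected wait by ~16
```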

Sample of the exploding with initial E[imbalance] = 5 (seemed reasonable)


```
The initial expected length of the bar is 5.0
The initial expected signal is [ 0.99525]
The initial expected imbalance is [ 4.97624999]

The imbalance has exceeded the expected imbalance
[ 5.] > [[ 4.97624999]]
It took 5 tick signals for the imbalance to exceed E[imbalance]

The new expected bar size is [ 5.]
The new expected signal is [[ 0.90499941]]
The new expected imbalance is [[ 4.52499703]]

The imbalance has exceeded the expected imbalance
[ 5.] > [[ 4.52499703]]
It took 9 tick signals for the imbalance to exceed E[imbalance]

The new expected bar size is [ 8.80047506]
The new expected signal is [[ 0.90025]]
The new expected imbalance is [[ 7.92262767]]

The imbalance has exceeded the expected imbalance
[ 8.] > [[ 7.92262767]]
It took 28 tick signals for the imbalance to exceed E[imbalance]

The new expected bar size is [ 27.04013775]
The new expected signal is [[ 0.99975003]]
The new expected imbalance is [[ 27.03337856]]

The imbalance has exceeded the expected imbalance
[ 28.] > [[ 27.03337856]]
It took 76 tick signals for the imbalance to exceed E[imbalance]

The new expected bar size is [ 73.55202142]
The new expected signal is [[-0.90475]]
The new expected imbalance is [[ 66.54619138]]

The imbalance has exceeded the expected imbalance
[ 67.] > [[ 66.54619138]]
It took 2629 tick signals for the imbalance to exceed E[imbalance]

The new expected bar size is [ 2501.227639]
The new expected signal is [[-0.9002375]]
The new expected imbalance is [[ 2251.69892038]]

The imbalance has exceeded the expected imbalance
[ 2252.] > [[ 2251.69892038]]
It took 439962 tick signals for the imbalance to exceed E[imbalance]
```


Herrsosa avatar Herrsosa commented on September 25, 2024

Hi,

thank you to everyone for providing examples so far. Based on the code that others have posted and the definition in the book, I have been trying to create tick imbalance bars too. With the data available to me, the bars do not tend to blow out after a short period of adjustment in the beginning.
I'm new to programming, so sorry for the inefficient code.

```python
import numpy as np

def TIB(df, column, initial_imbalance, alpha):
    weighted_sum_T = 0
    weighted_sum_prob = 0
    df["delta_p"] = df[column].pct_change()
    imbalance = initial_imbalance
    b_t = np.zeros(len(df["delta_p"]))
    b_t[0] = 0
    indx = []
    T = 0

    for i in range(1, len(df["delta_p"])):
        if df["delta_p"][i] == 0:
            b_t[i] = b_t[i - 1]      # unchanged price inherits the last sign
        else:
            b_t[i] = np.abs(df["delta_p"][i]) / df["delta_p"][i]
        T += 1
        if abs(b_t.sum()) >= imbalance:
            indx.append(i)
            weighted_sum_T = alpha * T + (1 - alpha) * weighted_sum_T
            weighted_sum_prob = alpha * sum(x > 0 for x in b_t) / T + (1 - alpha) * weighted_sum_prob
            imbalance = weighted_sum_T * abs(2 * weighted_sum_prob - 1)
            T = 0
            b_t = np.zeros(len(df["delta_p"]))

    return indx
```

Best, Nils



pkdmark avatar pkdmark commented on September 25, 2024

Hi Herrsosa,

Sorry, I accidentally deleted it when I tried to edit a mistake. tl;dr: I said I wasn't sure the observed imbalance exceeding the expected imbalance was the correct condition, as it should be theta_t > E_0[theta_t].

Could you explain why you set the condition abs(theta-T) >= E[T] * abs(2P[b=1]-1)? What was theta-T meant to represent (unless it was just a mistake)?


Herrsosa avatar Herrsosa commented on September 25, 2024

Hi pkdmark,

what I was trying to do is find a solution for the last equation on page 29 (a TIB is defined as a "T*-continuous subset of ticks such that [this equation] is met").

Earlier on the page it is stated that E_0[theta_T] = E_0[T](2P[b_t = 1] − 1).
As I said, I overlooked the fact that (2P[b=1]-1) may be estimated as an exponentially weighted moving average of b_t. Therefore, I mistakenly tried to estimate this term.

Sorry if my explanation is not clear, and thanks for your feedback. Maybe I just did not understand the equation correctly.

Best,
Nils


rspadim avatar rspadim commented on September 25, 2024

maybe you are sampling wrong


vgajinov avatar vgajinov commented on September 25, 2024

Regarding what T represents, from the book:

"The idea behind tick imbalance bars (TIBs) is to sample bars whenever tick imbalances exceed our expectations. We wish to determine the tick index, T, such that the accumulation of signed ticks (signed according to the tick rule) exceeds a given threshold. "

where the tick rule applies to a sequence {b_t}, t = 1, ..., T

So T represents the duration of one bar!


vgajinov avatar vgajinov commented on September 25, 2024

@GerardBCN
You have assumed right as it is the same thing. Here is how I see it:

E0[θT] = E0[T](P[bt = 1] − P[bt = −1])
P[bt = 1] + P[bt = −1] = 1  =>  P[bt = −1] = 1 − P[bt = 1]

==>

E0[θT] = E0[T](2P[bt = 1] − 1)

where P[bt = 1] = sum(bt = 1) / (sum(bt = 1) + sum(bt = −1)) = sum(bt = 1) / T

So EWMA(bt=1) is equivalent to finding EWMA(P(bt=1)).
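The equivalence hinges on linearity: within a bar, mean(b_t) = 2·P[b_t = 1] − 1 exactly, so any linear smoother (including an EWMA) over per-bar means agrees with 2·EWMA(P) − 1. A quick check on some made-up bars:

```python
import numpy as np

# Hypothetical per-bar tick-sign sequences, just for the identity check
bars = [np.array([1, 1, -1, 1]), np.array([-1, -1, 1, -1, 1]), np.array([1, -1])]
for b in bars:
    p_up = (b == 1).mean()                     # P[b_t = 1] within the bar
    assert abs(b.mean() - (2 * p_up - 1)) < 1e-12
```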

However, I have interpreted this sentence from the book

" In practice, we can estimate (2P[bt = 1] − 1) as an exponentially weighted moving average of bt
values from prior bars" as

ewma(current_imbalance / current_T)


vgajinov avatar vgajinov commented on September 25, 2024

I also find that the definition in the book has a problem with either exploding or vanishing ewma values, which leads to the bar duration constantly growing or going down to 1. In the latter case you would basically be sampling at every tick. I also found some comments on Quantopian about this problem, albeit without any suggested solution.

I personally have adapted the condition for the bar termination to be

θ_T >= max( E0[T] *abs(2P[bt=1] − 1), initial_imbalance)

where initial imbalance is the value of imbalance I pass to the function (can be tick, volume or dollar)

I wonder if someone has got this right. Please post the solution if you did!
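The floored condition described above can be expressed as a one-liner (a sketch; the function and argument names are mine):

```python
def bar_threshold(ewma_T, ewma_p_up, initial_imbalance):
    """Expected-imbalance threshold with a floor, as described above:
    theta_T must reach max(E0[T] * |2 P[b_t=1] - 1|, initial_imbalance)."""
    return max(ewma_T * abs(2.0 * ewma_p_up - 1.0), initial_imbalance)

# When P[b_t = 1] drifts toward 0.5 the book's threshold collapses to 0,
# which would trigger a bar on every tick; the floor prevents that.
```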


mangleddata avatar mangleddata commented on September 25, 2024

> I've been wrestling with this myself. I think that the explanation of TIB given in the book is confused [...] — quoting @bcalvert531's full comment above.

I am reading and re-reading this and find it clearer than what I've read in the book. Thanks! I would appreciate any code you could share.


BlackArbsCEO avatar BlackArbsCEO commented on September 25, 2024

@mangleddata take a look at the mlfinlab github project. they have notebooks as well as source code for their implementation


chfreundchen avatar chfreundchen commented on September 25, 2024

Another two cents on the topic, regarding stability:
I think the key problem is that the formula in the book takes into account drift, but not variance. To be more precise: Imagine that the probability for an uptick is 0.5. Accordingly, the drift (i.e. average uptick over time) is zero (since half the time you'd expect an uptick, and half the time you'd expect a downtick). It follows that the expected time to hit a barrier theta goes towards infinity as P goes to 0.5.
What is the problem with that? Well imagine the tick sequence as a random walk. Even if the probability to go up is 0.5, the walk will inevitably (i.e., with probability 1) hit plus or minus theta in finite time. Why? Because the random walk does not have zero variance (in which case it would be a straight line with slope P_up-P_down). Once you account for that, you get decent behavior.
So, modifying the original idea, what you could do for TIBs is (for each bar):

  1. Given past data, calculate the probability of a tick being an uptick, P_up.
  2. From P_up, derive the trend (as before, we will call it m) AND the variance S, which of course is a function of the number of ticks.
  3. For your target T (which can still be an EWMA of past Ts), calculate the barrier theta for which it holds: theta(T, p_up) such that P(s_T − mT > X·S or s_T − mT < −X·S) > Y, where s_T is the tick imbalance sequence and X and Y are parameters of your choosing (and may actually be quite closely related, come to think of it).
     In words: given a probability p_up and an expected time T for your next bar, build a channel around your sequence s_t that slopes with the trend according to p_up, and finish the bar once you break out of the channel.
     See picture: on random data (jagged line) there is a channel (blue dashed lines) that is sloped based on p_up and adjusts based on the trend. Even if p_up is 0.5, you still do not get an infinite time-to-barrier.

TIB


mangleddata avatar mangleddata commented on September 25, 2024

@mangleddata take a look at the mlfinlab github project. they have notebooks as well as source code for their implementation

Thanks for the reference. I gave it a try with different bar types (different imbalance bars, run bars), but I am unable to reproduce some of the claims related to autocorrelation I read. A time-based bar seems to show less autocorrelation than any of these bars (I tried an equity like INTC, with tick data for every day over a 30-day period). I will keep trying. If you have any advice, I would be happy to take it. Possibly I am missing something, if this is not expected.


MaticConradi avatar MaticConradi commented on September 25, 2024

... a channel (blue dashed lines) that is sloped based on p_up, and it adjusts based on the trend ...

@chfreundchen I think what you're suggesting is a pretty good take on this issue. While I agree with the general approach, I have a hard time understanding how you would determine the slope of the channel. You would need to determine a price delta by which the channel ascends/descends on each bar, but you didn't touch on how that would be derived from a percentage value.


chfreundchen avatar chfreundchen commented on September 25, 2024

We are still talking about tick bars here, just as in the original example, i.e. we count each downward price move as -1 and each upward price move as +1. Then the slope of the channel is still determined by the expectation, 2p_up − 1.
Of course you can also go for bars that incorporate the size of the price change. In that case you calculate the average size of upward and downward moves, respectively, say s_up and s_down, and then compute the expectation as p_up*s_up + p_down*s_down. You will have to adjust the calculation of the variance in a similar manner.


Taats avatar Taats commented on September 25, 2024

I haven't quite followed the whole conversation, but I'm also trying to figure out how to successfully create imbalance bars. I've stumbled upon this blog post from a guy in China, and thanks to Google Translate it became readable. Just wanted to share it with you lot : ) https://cloud.tencent.com/developer/article/1457661


vesl avatar vesl commented on September 25, 2024

@mangleddata take a look at the mlfinlab github project. they have notebooks as well as source code for their implementation

Mlfinlab works on batches of ticks and restarts from the initial parameters at each batch to reset the ewma.
But it's still the same problem in my opinion; the batch size is a pretty arbitrary value.

Probably we should find a way to dynamically set the batch size, or to dynamically increase or decrease the expected value.

Let me be more precise about the second point:
If the bar at T-1 was imbalanced on buys and the current bar is imbalanced on sells, then we can say this is a new trend, and it's probably not legitimate to build expected values from previous bars.

