Hello! I apologize if this is a topic that has already been covered here (I looked in

Working with multidimensional curves in DLIS files (dlisio issue, closed)

lucasblanes commented on August 17, 2024



Comments (5)

achaikou commented on August 17, 2024

> The frame.curves() function is very slow to use in a for loop.

I think you shouldn't call frame.curves() inside the loop. Call it just once, after the user indicates they want to load any curve, cache the result (all_curves = frame.curves()), and use that full all_curves ndarray as the source to populate your 1-D pandas frame if you need to.

A single call to frame.curves() loads all the curves in the frame. Extracting the curves separately, channel by channel (1-D, 1-D, 1-D, ...), with frame.channels[i].curves() is very likely to be much slower:

> Due to the memory-layout of dlis-files, reading a single channel from disk and reading the entire frame is almost equally fast. That means reading channels from the same frame one-by-one with this method is way slower than reading the entire frame with Frame.curves() and then indexing on the channels-of-interest.

I believe we actually read all the curves in the frame anyway, even if just one was requested; we just return only that one curve's values.
So a single call to frame.curves() seems to be the only sensible option in your case.
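
A minimal sketch of the "call it once, then cache" advice above. It assumes only that the frame object has a curves() method returning a numpy structured array whose fields are named after the channels, as dlisio's Frame.curves() does. The CachedFrameCurves class and the FakeFrame stand-in are hypothetical, added so the example runs without a DLIS file.

```python
import numpy as np


class CachedFrameCurves:
    """Call frame.curves() at most once and serve channels from the cached array.

    Hypothetical helper: `frame` is anything with a curves() method that
    returns a numpy structured array (e.g. a dlisio Frame).
    """

    def __init__(self, frame):
        self._frame = frame
        self._curves = None  # filled on first access, not at construction

    def all(self):
        # The expensive read happens here, and only the first time
        if self._curves is None:
            self._curves = self._frame.curves()
        return self._curves

    def channel(self, name):
        # Index the cached structured array by channel mnemonic
        return self.all()[name]


class FakeFrame:
    """Stand-in for a dlisio Frame so the sketch runs without a DLIS file."""

    def __init__(self):
        self.calls = 0

    def curves(self):
        self.calls += 1
        dtype = [('FRAMENO', 'i4'), ('TDEP', 'f4'), ('IMG', 'f4', (3,))]
        data = np.zeros(2, dtype=dtype)
        data['FRAMENO'] = [1, 2]
        data['TDEP'] = [1000.0, 1000.5]
        data['IMG'] = [[1, 2, 3], [4, 5, 6]]  # one 3-element sample per frame
        return data


frame = FakeFrame()
cache = CachedFrameCurves(frame)
tdep = cache.channel('TDEP')  # triggers the single read
img = cache.channel('IMG')    # served from the cache, no new read
print(frame.calls, tdep.shape, img.shape)  # 1 (2,) (2, 3)
```

Nothing is read until the first channel() call, so the user who never asks for curves pays nothing; every later lookup is plain array indexing on the cached result.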


lucasblanes commented on August 17, 2024

I tried it here and it worked perfectly! I now recover all curves from an image-log DLIS file in 0.62 minutes, compared with 18 minutes before. Thank you very much!


lucasblanes commented on August 17, 2024

Just for the record, and for other colleagues, this is the code I use to extract these curves:

import numpy as np

def summary_curve_all(df_in, frame_in, nan_value=-999.25):
    # One call reads every channel of the frame into a structured ndarray;
    # its first field is the frame number (FRAMENO), which is skipped
    curves = frame_in.curves()
    names = curves.dtype.names[1:]

    all_curves = []
    all_mins   = []
    all_maxs   = []
    all_means  = []
    all_median = []

    # df_in is expected to have one row per channel, in the frame's order
    for i, name in enumerate(names):
        # Float copy of this channel (1-D or N-D), so NaN can be stored
        curve = np.array(curves[name], dtype=np.float64)
        # Mask the null value BEFORE converting units; once scaled,
        # nulls no longer compare equal to nan_value
        curve[curve == nan_value] = np.nan
        if df_in.loc[i, 'Unidade'] == 'meters':
            curve = curve * 0.00254
        all_curves.append(             curve)
        all_mins.append(  np.nanmin(   curve))
        all_maxs.append(  np.nanmax(   curve))
        all_means.append( np.nanmean(  curve))
        all_median.append(np.nanmedian(curve))

    df_in['Curvas']  = all_curves
    df_in['Mínimo']  = all_mins
    df_in['Máximo']  = all_maxs
    df_in['Média']   = all_means
    df_in['Mediana'] = all_median
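
The null-masking order matters whenever a unit conversion is applied: -999.25 * 0.00254 is no longer -999.25, so masking after scaling silently keeps the nulls in the statistics. Below is a hypothetical per-channel helper, a sketch assuming the same -999.25 null value and an optional scale factor, that masks first and then computes the NaN-aware statistics on a 1-D or N-D array.

```python
import numpy as np


def summarize_channel(curve, nan_value=-999.25, scale=None):
    """Return (cleaned curve, min, max, mean, median) for a 1-D or N-D channel.

    Hypothetical helper. `scale` is an optional unit-conversion factor,
    such as the 0.00254 used for 'meters' channels in the code above.
    """
    # Float copy: leaves the cached frame array untouched and can hold NaN
    curve = np.array(curve, dtype=np.float64)
    # Mask the null value first; after scaling it would no longer match
    curve[curve == nan_value] = np.nan
    if scale is not None:
        curve = curve * scale
    stats = (np.nanmin(curve), np.nanmax(curve),
             np.nanmean(curve), np.nanmedian(curve))
    return (curve,) + stats


raw = np.array([[1000.0, -999.25], [2000.0, 3000.0]])
curve, lo, hi, mean, median = summarize_channel(raw, scale=0.001)
print(lo, hi, mean, median)
```

The statistics are taken over every element of an N-D channel, which matches what np.nanmin and friends do in the loop above when no axis is given.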


achaikou commented on August 17, 2024

Hi!

No, I don't think this question has been asked before.

You are right, it does not seem possible to easily read all multidimensional curves into the same pandas DataFrame. From here:

> Note that pandas (and CSV) only supports scalar sample values. I.e. frames containing one or more channels that have non-scalar sample values cannot be converted to pandas.DataFrame or CSV directly.

I am not a pandas specialist, so I don't know what the possible workarounds are.
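
One possible workaround, sketched here under assumptions rather than taken from dlisio: expand every N-D channel into one scalar column per sample element, so the result contains only 1-D arrays, which pd.DataFrame accepts. The flatten_channels function and its NAME[k] column naming are hypothetical; the input is assumed to be the structured array returned by frame.curves(), with FRAMENO skipped.

```python
import numpy as np


def flatten_channels(curves, skip=('FRAMENO',)):
    """Expand a structured curves array into a dict of 1-D columns.

    Hypothetical helper. Scalar channels are kept as-is; each N-D channel
    NAME becomes one column per sample element: NAME[0], NAME[1], ...
    """
    columns = {}
    for name in curves.dtype.names:
        if name in skip:
            continue
        values = curves[name]
        if values.ndim == 1:
            columns[name] = values
            continue
        # Collapse all per-sample dimensions into a single element axis
        n_elements = int(np.prod(values.shape[1:]))
        flat = values.reshape(values.shape[0], n_elements)
        for k in range(n_elements):
            columns[f'{name}[{k}]'] = flat[:, k]
    return columns


dtype = [('FRAMENO', 'i4'), ('TDEP', 'f4'), ('IMG', 'f4', (2, 2))]
curves = np.zeros(3, dtype=dtype)
curves['IMG'] = np.arange(12, dtype='f4').reshape(3, 2, 2)
cols = flatten_channels(curves)
print(list(cols))  # ['TDEP', 'IMG[0]', 'IMG[1]', 'IMG[2]', 'IMG[3]']
```

For image logs with hundreds of elements per sample this produces hundreds of columns, so keeping the N-D channels as plain ndarrays may be the more practical choice there.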

However, you say that a numpy array is enough.
frame.curves() already returns a numpy.ndarray. Is there something preventing you from using it directly, without any additional conversion?


lucasblanes commented on August 17, 2024

Thanks for the answer, Alena. I am currently able to save the numpy array returned by frame.curves() in a pandas DataFrame cell.

The problem is that I do this iteratively, which is quite time-consuming for DLIS files that contain many multidimensional arrays, such as image logs or wireline formation tests. Here is my code:

import numpy as np

def summary_curve_values(df_in, curve_index='Curvas', unit_index='Unidade', nan_value=-999.25, verbose=True):
    values = []
    mins   = []
    maxs   = []
    means  = []
    median = []
    for i in range(len(df_in)):
        if verbose:
            print('Starting curve ' + str(i+1) + ' of ' + str(len(df_in)) + '.')
        # Each cell holds a stored, not-yet-called curves method; calling it
        # reads from disk, and doing that once per curve makes the loop slow
        curve = df_in.loc[i, curve_index]()
        if 'int' in str(curve.dtype):
            curve = curve.astype(np.float64)  # integers cannot hold NaN
        # Mask nulls before the unit conversion, or they will not match nan_value
        curve[curve == nan_value] = np.nan
        if df_in.loc[i, unit_index] == 'meters':
            curve = curve * 0.00254
        values.append(curve)
        mins.append(  np.nanmin( curve))
        maxs.append(  np.nanmax( curve))
        means.append( np.nanmean(curve))
        median.append(np.nanmedian(curve))
    df_in['Curvas']  = values
    df_in['Mínimo']  = mins
    df_in['Máximo']  = maxs
    df_in['Média']   = means
    df_in['Mediana'] = median

A note: I first store the frame.curves object in the DataFrame cell and call it as frame.curves() only if the user wants to load the curves. I do this because I first show the DLIS information to the user, and the function above runs only if they choose to load the curves, so the code spends no time loading curves the user does not need.

In short, my problem is now efficiency. The frame.curves() function is very slow to use in a for loop, and I need faster code. Does anyone have an idea for using pd.DataFrame(frame.curves()) to extract the 1-D curves and then looping only over the N-D curves to get the multidimensional ones?
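
A minimal sketch of that split, assuming the structured array returned by a single frame.curves() call: channels declared with a per-sample shape have a non-None subdtype in the array's dtype, so they can be separated from the scalar ones without reading anything from disk again. split_channels is a hypothetical helper name.

```python
import numpy as np


def split_channels(curves, skip=('FRAMENO',)):
    """Split a structured curves array into scalar and multidimensional channels.

    Hypothetical helper. Returns (scalar, multi): two dicts mapping channel
    name -> ndarray. `scalar` holds 1-D arrays (one value per sample);
    `multi` holds the N-D arrays that pandas cannot take as columns.
    """
    scalar, multi = {}, {}
    for name in curves.dtype.names:
        if name in skip:
            continue
        # A field declared with a shape, e.g. ('IMG', 'f4', (4,)), has a subdtype
        if curves.dtype[name].subdtype is None:
            scalar[name] = curves[name]
        else:
            multi[name] = curves[name]
    return scalar, multi


dtype = [('FRAMENO', 'i4'), ('TDEP', 'f4'), ('GR', 'f4'), ('IMG', 'f4', (4,))]
curves = np.zeros(5, dtype=dtype)
scalar, multi = split_channels(curves)
print(sorted(scalar), sorted(multi), multi['IMG'].shape)  # ['GR', 'TDEP'] ['IMG'] (5, 4)
```

Passing `scalar` to pd.DataFrame should then work, while `multi` keeps each image or formation-test channel as its own ndarray, all from the one read of the frame.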

