I am new to Python. I am trying to predict "time_to_failure" for the given "acoustic_data" in the test CSV files using the CatBoost algorithm.

Python
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor

def catbostregtest(X_train, y_train):
    # submission format
    submission = pd.read_csv('sample_submission.csv', index_col='seg_id')
    # prepare test data: one row of features per test segment
    rows = []
    for seg_id in submission.index:
        seg = pd.read_csv('test/' + seg_id + '.csv')
        rows.append(gen_features(seg['acoustic_data']))
    X_test = pd.DataFrame(rows)
    # model of choice here
    model = CatBoostRegressor(iterations=10000, loss_function='MAE', boosting_type='Ordered')
    model.fit(X_train, y_train)
    y_hat = model.predict(X_test)    #error line
    # write submission file
    submission['time_to_failure'] = y_hat
    submission.to_csv('submissionCAT.csv')
    print(model.best_score_)

The function "catbostregtest" raises an error with the following error log:
Traceback (most recent call last):
  File "E:\dir\code.py", line 290, in <module>
    main()
  File "E:\dir\code.py", line 230, in main
    catbostregtest(X_train, y_train)
  File "E:\dir\code.py", line 175, in catbostregtest
    y_hat = model.predict(X_test)
  File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\catboost\core.py", line 4365, in predict
    return self._predict(data, "RawFormulaVal", ntree_start, ntree_end, thread_count, verbose, 'predict')
  File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\catboost\core.py", line 1854, in _predict
    predictions = self._base_predict(data, prediction_type, ntree_start, ntree_end, thread_count, verbose)
  File "C:\Users\xyz\AppData\Local\Continuum\anaconda3\lib\site-packages\catboost\core.py", line 1271, in _base_predict
    return self._object._base_predict(pool, prediction_type, ntree_start, ntree_end, thread_count, verbose)
  File "_catboost.pyx", line 4015, in _catboost._CatBoost._base_predict
  File "_catboost.pyx", line 4020, in _catboost._CatBoost._base_predict
CatBoostError: c:/goagent/pipelines/buildmaster/catboost.git/catboost/libs/data/model_dataset_compatibility.cpp:236: Feature 0 from pool must be mean.
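
The message "Feature 0 from pool must be mean" appears to say that the model was fitted on data whose first column was named "mean", while the data passed to predict has a different name in that position. A minimal sketch of a check for this, in plain Python: the helper name find_feature_name_mismatches and the sample column lists are made up for illustration and are not part of CatBoost or of the code in this question.

```python
from itertools import zip_longest


def find_feature_name_mismatches(train_columns, test_columns):
    """Hypothetical helper: list positions where column names differ.

    Returns (position, train_name, test_name) tuples. A missing column on
    either side shows up as None.
    """
    mismatches = []
    pairs = zip_longest(train_columns, test_columns, fillvalue=None)
    for position, (train_name, test_name) in enumerate(pairs):
        # Compare as strings so that 0 and "0" count as the same name.
        if train_name is None or test_name is None or str(train_name) != str(test_name):
            mismatches.append((position, train_name, test_name))
    return mismatches


# Assumed example: named columns at training time, default integer
# column labels at prediction time.
train_columns = ["mean", "std", "min", "max"]
test_columns = [0, 1, 2, 3]

for position, train_name, test_name in find_feature_name_mismatches(train_columns, test_columns):
    print(f"Feature {position}: model expects {train_name!r}, test data has {test_name!r}")
```

With real data, the two arguments would be list(X_train.columns) and list(X_test.columns). If the lists differ only in naming and the column order is known to be the same, assigning X_test.columns = X_train.columns before calling predict is one possible fix.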


This is the gen_features function:
Python
def gen_features(X):
    strain = []
    strain.append(X.mean())
    strain.append(X.std())
    strain.append(X.min())
    strain.append(X.max())
    strain.append(X.kurtosis())
    strain.append(X.skew())
    strain.append(np.quantile(X,0.01))
    strain.append(np.quantile(X,0.05))
    strain.append(np.quantile(X,0.95))
    strain.append(np.quantile(X,0.99))
    strain.append(np.abs(X).max())
    strain.append(np.abs(X).mean())
    strain.append(np.abs(X).std())
    return pd.Series(strain)
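
One way to rule out a column naming mismatch between training and test data is to label every feature inside the feature function, so that both frames always get identical column names. The following is only a sketch, assuming pandas and NumPy are installed. The function name gen_named_features, the names in FEATURE_NAMES and the sample segment are illustrative and not taken from the original code.

```python
import numpy as np
import pandas as pd

# Illustrative feature names (an assumption); the order matches the
# order in which gen_features appends its values.
FEATURE_NAMES = [
    "mean", "std", "min", "max", "kurtosis", "skew",
    "q01", "q05", "q95", "q99",
    "abs_max", "abs_mean", "abs_std",
]


def gen_named_features(X):
    """Compute the same statistics as gen_features, but with a labeled index."""
    abs_x = np.abs(X)
    values = [
        X.mean(), X.std(), X.min(), X.max(), X.kurtosis(), X.skew(),
        np.quantile(X, 0.01), np.quantile(X, 0.05),
        np.quantile(X, 0.95), np.quantile(X, 0.99),
        abs_x.max(), abs_x.mean(), abs_x.std(),
    ]
    return pd.Series(values, index=FEATURE_NAMES)


# Small made-up segment standing in for seg['acoustic_data'].
segment = pd.Series([1, 4, -2, 7, 3, -5, 6, 0], dtype=np.int16)
row = gen_named_features(segment)

# Building the frame from a list of labeled rows keeps the column names.
X = pd.DataFrame([row, row])
print(list(X.columns) == FEATURE_NAMES)
```

If both X_train and X_test are built from rows produced by the same labeled function, their column names and order should agree by construction.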


The functions are called from the main function:

Python
def main():
    # read the training data in chunks of 150,000 rows, one chunk per training sample
    train1 = pd.read_csv('train.csv', iterator=True, chunksize=150_000,
                         dtype={'acoustic_data': np.int16, 'time_to_failure': np.float64})
    features = []
    targets = []
    for df in train1:
        features.append(gen_features(df['acoustic_data']))
        # the target is the time to failure at the end of the chunk
        targets.append(df['time_to_failure'].values[-1])
    X_train = pd.DataFrame(features)
    y_train = pd.Series(targets)
    catbostregtest(X_train, y_train)


Here is the structure of the train.csv file
train — ImgBB[^]
Here is the structure of the sample_submission.csv file
submittion — ImgBB[^]
Here is the structure of one of the test CSV files.
test — ImgBB[^]
How can I fix the error that occurs when making predictions with the CatBoost model? You can download the project and run it in the Spyder IDE from this link: Link

What I have tried:

I have tried the procedures described at these links:
Usage examples - CatBoost. Documentation[^]
python - Catboost Regression. Function Extrapolation - Stack Overflow[^]
Updated 12-Mar-20 2:04am
Comments
Richard MacCutchan 12-Mar-20 5:00am    
" CatBoostError: c:/goagent/pipelines/buildmaster/catboost.git/catboost/libs/data/model_dataset_compatibility.cpp:236: Feature 0 from pool must be mean."

That is the key error message.
Member 8840306 12-Mar-20 5:20am    
Which line in the code causes the error? What does this "key error message" show?
Richard MacCutchan 12-Mar-20 5:22am    
Look at the error messages; they give the file and line number.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


