Hey Guys,

I am receiving telemetry data at the rate of about 50 records/second.
Each record is variable length, with an average length of about 80 characters.
That leaves roughly 20 milliseconds per record, so I do not have a lot of time to run a binary classification prediction on each one.

I am building a tabular dataset that has six feature columns, and a binary response column (0, 1).
All features have floating point values, and there is no missing data.
All the features should be positive, but this is a necessary condition, not a sufficient one.

I am trying to filter out the obviously bad records, so I can avoid sending this data to the classification code.

What I am trying to do is determine the minimum values of the six features (f1, f2, ... f6), and use them as a filter.
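
To make the idea concrete, here is a minimal sketch of such a filter, assuming one minimum per feature and a record that has already been parsed into six floats. The names `passes_filter` and `MINIMUMS` are hypothetical, and the zeros are placeholders, not real thresholds.

```python
from typing import Sequence

def passes_filter(features: Sequence[float], minimums: Sequence[float]) -> bool:
    """Return True only if every feature is at or above its minimum."""
    if len(features) != len(minimums):
        raise ValueError("expected one minimum per feature")
    return all(f >= m for f, m in zip(features, minimums))

# Placeholder minimums for f1..f6 (assumption: the real values are unknown).
MINIMUMS = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Made-up record, for illustration only.
record = [0.7, 1.2, 3.4, 0.1, 5.0, 2.2]
if passes_filter(record, MINIMUMS):
    pass  # the record would be sent on to the classification code here
```

The check is a single pass over six numbers, so it should add very little to the per-record cost.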

Calculating all six features requires about 400 milliseconds.
 
I have a cost function, C(f1,f2,f3,f4,f5,f6), that computes the actual (ground truth) values for each set of features, but it is compute-intensive (about four seconds).

My question is: How can I calculate the minimum values for f1, f2, ... f6, such that they maximize the prediction accuracy, so I can use these values as a threshold filter?
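
One possible starting point, sketched under strong assumptions: treat each feature on its own, assume records with response 0 are the ones to filter out, and scan the observed values for the threshold that best separates the two classes. The function name `best_minimum` and the sample data are made up for illustration; this ignores interactions between features and is not a substitute for the cost function C.

```python
def best_minimum(values, labels):
    """Return (threshold, accuracy) for the rule 'predict 1 if value >= threshold'.

    Every observed value is tried as a candidate threshold, and the lowest
    one that reaches the highest accuracy is kept.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(values)):
        correct = sum((v >= t) == bool(y) for v, y in zip(values, labels))
        acc = correct / len(values)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

# Made-up values for one feature, with made-up responses.
f1 = [0.1, 0.2, 0.9, 1.1, 1.4, 1.6, 0.3, 1.8]
y = [0, 0, 1, 1, 1, 1, 0, 1]

threshold, accuracy = best_minimum(f1, y)
print(threshold, accuracy)
```

Running this once per feature on a labeled sample would give six candidate minimums to check against held-out data.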

Charles


What I have tried:

I have only tried to guess at the minimum values.
Comments
[no name] 8-Sep-20 23:14pm    
Do a plot of the frequency distribution.
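
As a rough sketch of that suggestion, the counts behind such a plot can be built per response value with the standard library and printed as text bars. The helper names, the bin width, and the sample data are all assumptions for illustration.

```python
from collections import Counter

def frequency_distribution(values, bin_width):
    """Count how many values fall into each bin of width bin_width."""
    counts = Counter(int(v // bin_width) for v in values)
    # Key each bin by its lower edge.
    return {round(b * bin_width, 10): counts[b] for b in sorted(counts)}

def distribution_by_label(feature_values, labels, bin_width):
    """Build one frequency distribution per response value (0 and 1)."""
    by_label = {0: [], 1: []}
    for value, label in zip(feature_values, labels):
        by_label[label].append(value)
    return {label: frequency_distribution(vals, bin_width)
            for label, vals in by_label.items()}

# Made-up values for a single feature, with made-up responses.
f1 = [0.1, 0.2, 0.9, 1.1, 1.4, 1.6, 0.3, 1.8]
y = [0, 0, 1, 1, 1, 1, 0, 1]

for label, dist in distribution_by_label(f1, y, bin_width=0.5).items():
    print(f"response {label}")
    for bin_start, count in dist.items():
        print(f"  {bin_start:4.1f} {'#' * count}")
```

If the two distributions barely overlap for a feature, the gap between them is a natural place to look for a minimum value.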

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


