
I want to experiment with different interpolation techniques. Some determining factors for choosing an appropriate interpolation technique are:

1. Density of data (Checking if the data is dense or sparse)

2. Dimensionality of data (High dimensional or low dimensional)

3. Size of the data (for lower computation time)

Since:

-> Radial basis function (RBF) interpolation is good for sparse & high-dimensional data

-> Cubic spline interpolation is good for dense data

-> Polynomial interpolation is better for small, low-dimensional datasets, as it can fit accurate lines/curves for functions up to degree 3.
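The three techniques above can be tried side by side with SciPy/NumPy. This is a minimal sketch on a 1-D sample (a sine curve is an assumed toy dataset, not from the question):

```python
import numpy as np
from scipy.interpolate import CubicSpline, Rbf

# Toy 1-D data: y = sin(x) sampled at 10 known points
x = np.linspace(0, 2 * np.pi, 10)
y = np.sin(x)
x_new = np.linspace(0, 2 * np.pi, 50)  # points to interpolate at

# Cubic spline: suited to dense, smoothly varying data
spline = CubicSpline(x, y)
y_spline = spline(x_new)

# Radial basis function: handles sparse / scattered samples
rbf = Rbf(x, y)
y_rbf = rbf(x_new)

# Degree-3 polynomial fit: small, low-dimensional datasets
coeffs = np.polyfit(x, y, deg=3)
y_poly = np.polyval(coeffs, x_new)
```

Comparing each result against the true function on held-out points is one way to "test and catalog" which method suits a given dataset.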

My doubts are:

1) Is there a reliable way of checking the density of the interpolation column's data distribution? Some techniques I have identified are:

1: If the proportion of missing/empty values is significant (>= 50%), the data is sparse.

2: If the range of the majority of the data values is small and the standard deviation is small, the data is considered dense; otherwise, sparse.

3: Visualization using a scatter plot.
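The first heuristic above (missing-value fraction) is straightforward to code. A minimal sketch with pandas, where the function name and the 50% threshold are just illustrative choices:

```python
import pandas as pd

def is_sparse(col: pd.Series, missing_threshold: float = 0.5) -> bool:
    """Heuristic: flag a column as sparse when at least
    `missing_threshold` of its values are missing (NaN/None)."""
    # .isna().mean() is the fraction of missing entries
    return bool(col.isna().mean() >= missing_threshold)

# 4 of 6 values missing -> treated as sparse under this rule
col = pd.Series([1.0, None, None, 4.0, None, None])
print(is_sparse(col))  # True
```

As the question notes, the threshold is subjective; it is a tunable parameter, not a hard rule.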

2) Is the following concept regarding high & low dimensionality correct?

If the number of dimensions/variables equals or outnumbers the number of rows, the data is high-dimensional; otherwise, it is low-dimensional.
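That definition (features >= rows, often written p >= n) is a common rule of thumb and translates directly to code. A minimal sketch (the function name is illustrative):

```python
def is_high_dimensional(n_rows: int, n_features: int) -> bool:
    """Heuristic: high-dimensional when features equal or
    outnumber the rows (p >= n)."""
    return n_features >= n_rows

# 150 features but only 100 rows -> high-dimensional
print(is_high_dimensional(n_rows=100, n_features=150))  # True
```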

3) Can I include a rule in my code that if my data is low-dimensional and the number of rows is less than 100,000, then it's a small dataset; otherwise, large?
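Such a rule is easy to encode; the sketch below combines the low-dimensionality condition with the 100,000-row cutoff stated in the question (both the function name and the default threshold are assumptions for illustration):

```python
def is_small_dataset(n_rows: int, n_features: int,
                     row_threshold: int = 100_000) -> bool:
    """Heuristic: 'small' only when the data is low-dimensional
    (features < rows) AND under the row threshold."""
    low_dimensional = n_features < n_rows
    return low_dimensional and n_rows < row_threshold

print(is_small_dataset(n_rows=50_000, n_features=20))   # True
print(is_small_dataset(n_rows=500_000, n_features=20))  # False
```

As with the density check, the cutoff is subjective and should be treated as a tunable parameter, ideally calibrated against measured interpolation runtimes on your hardware.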

**What I have tried:**

I gathered the above points after a lot of research. Do let me know if my above points are right.

Note: I understand that some of the above points, like the size of the dataset and the distinction between sparse and dense data distributions, are subjective, but I want to know if they are accurate to a decent extent.


Comments

[no name]
17-May-23 12:18pm

You test and then catalog the results from which you can draw future conclusions. Machine learning.
