I want to store data of different lengths, read from several files, in a 2D array.

DataFile:
data1.txt
1,2,3,4

data2.txt
1,2

data3.txt
1,2,65,7,8,9,0,5,4,8,3,43
....

Array:
x=numpy.array([])

x[number_of_File][data]:
x[0]----output data1
x[1]----output data2
....

What I have tried:

I can implement this with a list as follows:
Python
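x = []  # one entry per line read; each entry is a list of strings
try:
    # Names of the data files; they could also be read from a .txt document
    nameOfPath = ["data1.txt", "data2.txt", "data3.txt"]  # ..... more file names
    for each_item in nameOfPath:
        with open(each_item, "r") as dataFile:
            x1 = []
            for each_line in dataFile:
                # strip() drops the trailing newline before splitting on commas
                x1.append(each_line.strip().split(","))
            x.extend(x1)  # add this file's rows to the overall list

except IOError as e:
    print(e)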


However, I want to implement this with numpy.array.
I have tried many times, but it does not work.

For example, I used np.vstack:
Python
x = np.vstack([x, x1])


The error says:
all the input array dimensions except for the concatenation axis must match exactly

The arrays x and x1 do not have the same dimensions.
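The rule behind that error can be seen in a minimal reproduction with made-up rows (the values are illustrative only): np.vstack stacks rows of equal length, but raises a ValueError as soon as one row has a different length.

```python
import numpy as np

row_a = np.array([1, 2, 3, 4])   # like data1.txt: 4 values
row_b = np.array([1, 2])         # like data2.txt: 2 values

vstack_failed = False
try:
    np.vstack([row_a, row_b])    # lengths differ along axis 1, so this raises
except ValueError as err:
    vstack_failed = True
    print("vstack failed:", err)

row_c = np.array([5, 6, 7, 8])   # same length as row_a
stacked = np.vstack([row_a, row_c])
print(stacked.shape)             # (2, 4)
```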

So, if I still want to use numpy.array, how should I implement it?

Thank you!

Email: gz.geophysics@outlook.com
Updated 17-Dec-17 2:19am
Comments
Richard MacCutchan 10-Dec-17 7:36am    
The error message is clearly telling you that you cannot do it if the dimensions of the individual arrays are different. You could create some simple arrays first with the data, then normalise them to the length of the longest.
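A minimal sketch of this padding idea, assuming NaN is acceptable as a filler value: each row is copied into a float array as wide as the longest row, and the true lengths are kept alongside so the original data can be recovered. The rows list is hypothetical sample data standing in for the contents of data1.txt to data3.txt.

```python
import numpy as np

# Hypothetical sample data: one list of values per file
rows = [[1, 2, 3, 4], [1, 2], [1, 2, 65, 7, 8, 9, 0, 5, 4, 8, 3, 43]]

width = max(len(r) for r in rows)              # length of the longest row
padded = np.full((len(rows), width), np.nan)   # float array filled with NaN
lengths = np.array([len(r) for r in rows])     # remember each row's real length

for i, r in enumerate(rows):
    padded[i, :len(r)] = r                     # copy the data; the tail stays NaN

print(padded.shape)                  # (3, 12)
print(padded[1, :lengths[1]])        # the original values of the second file
print(np.nanmean(padded, axis=1))    # per-row mean that ignores the NaN padding
```

NaN-aware functions such as np.nanmean skip the padding, so whole-array operations stay vectorized.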
Zhang, G. 10-Dec-17 18:20pm    
In fact, a list can already do this.
However, numpy.array has more attributes and functions available, so I still want to use an array to store the data.
Richard MacCutchan 11-Dec-17 2:57am    
Fine, but you must still follow the rules.
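If the goal is to keep NumPy arrays while allowing rows of different lengths, one option not covered in this thread is a 1-D array with dtype=object whose elements are ordinary NumPy arrays. This is a sketch with hypothetical sample values. Passing ragged lists straight to np.array without dtype=object is rejected by recent NumPy versions, and np.array(rows, dtype=object) would build a 2-D object array if all rows happened to have the same length, so the sketch fills an empty object array element by element.

```python
import numpy as np

# Hypothetical sample data: one list of values per file
rows = [[1, 2, 3, 4], [1, 2], [1, 2, 65, 7, 8, 9, 0, 5, 4, 8, 3, 43]]

# A 1-D container with one slot per file; each slot will hold a float array
x = np.empty(len(rows), dtype=object)
for i, r in enumerate(rows):
    x[i] = np.asarray(r, dtype=float)

print(len(x[0]), len(x[1]), len(x[2]))   # 4 2 12
print(x[2].max())                        # each element is a normal ndarray
```

The trade-off is that operations across rows (for example a column-wise sum) are no longer vectorized; each x[i] has to be processed on its own.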

I have quickly skimmed your code and noticed the following: you are reading a text file and splitting each line on ",". I think you should treat it as a CSV file instead of a plain .txt file, which will make processing easier.

Since you didn't specify what you are trying to achieve with this code, I made a few assumptions.
I see words like DataFile, Data, etc., so I guess you are doing some data analysis. In that case, you should use pandas to read the data into data frames rather than processing it on your own, since there are other concerns such as pre-processing.

Here is one of your example files:
DataFile:
data1.txt
1,2,3,4


I converted it into a CSV file named data.csv.

Here is the code:

Python
import pandas as pd
import numpy as np

if __name__ == "__main__":
    data_frame = pd.read_csv("data.csv", header=None) # since you didn't specify header in your question
    np_array = data_frame.iloc[:, :].values # [:, :] => [rows, columns]
    print(np_array)  # print the numpy array
    print(np_array.ndim)  # print the dimension


which outputs:

[[1 2 3 4]]
2  => dimension


Read more here:
numpy.ndarray.ndim (NumPy v1.13 Manual)

One more thing: NumPy arrays have a method named "reshape" that changes an array to the dimensions you want.
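A short illustration of reshape with made-up values: it rearranges the same number of elements into a new shape, so it can turn a single row into a 2-D array, but it cannot make rows of different lengths equal.

```python
import numpy as np

flat = np.arange(12)              # 12 values: 0, 1, ..., 11
grid = flat.reshape(3, 4)         # the same 12 values as 3 rows x 4 columns
print(grid.shape)                 # (3, 4)

row = np.array([1, 2, 3, 4])
row_2d = row.reshape(1, -1)       # -1 lets NumPy infer the number of columns
print(row_2d.ndim)                # 2

reshape_failed = False
try:
    np.array([1, 2, 3, 4, 5]).reshape(2, -1)   # 5 values cannot fill 2 equal rows
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```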
 
Comments
Zhang, G. 10-Dec-17 17:51pm    
I mean using numpy.array to store two-dimensional data: the first dimension indexes the file (or line) and the second stores the data. The number of values differs from file to file (or line to line).

Another example:
data.csv
1, 2, 3, 4, 5
1, 2, 4
2, 3, 5, 9, 10, 2, 3, 4, 7
1, 2
....

I want to store them in an array x[lineNum][data], so that
len(x[0])=5
len(x[1])=3
len(x[2])=9
len(x[3])=2
....

In the code, we usually use numpy.vstack((x, item)) to add the data of each line to the array.
But vstack requires every line to have the same length; it only works when len(x[0]) = len(x[1]) = len(x[2]) = ... = len(x[N]).

How can I use an array to accomplish this?
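Another option, not proposed in this thread, is to keep every value in one flat 1-D array plus an offsets array marking where each line starts (similar to how compressed sparse row storage works). This sketch uses the four sample lines from the data.csv example above; the helper name row is hypothetical.

```python
import numpy as np

# The four sample lines from the data.csv example
rows = [[1, 2, 3, 4, 5], [1, 2, 4], [2, 3, 5, 9, 10, 2, 3, 4, 7], [1, 2]]

lengths = np.array([len(r) for r in rows])
# All values in a single 1-D float array
values = np.concatenate([np.asarray(r, dtype=float) for r in rows])
# Line i occupies values[offsets[i]:offsets[i + 1]]
offsets = np.concatenate(([0], np.cumsum(lengths)))

def row(i):
    """Hypothetical helper: return line i as a view into the flat array."""
    return values[offsets[i]:offsets[i + 1]]

print(len(row(0)), len(row(1)), len(row(2)), len(row(3)))   # 5 3 9 2
print(values.sum())   # operations over all data stay vectorized

# np.split returns all lines at once, cutting at the interior boundaries
parts = np.split(values, offsets[1:-1])
```

Appending a line is then a matter of extending values and offsets, instead of calling vstack with mismatched shapes.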
I found a solution that accomplishes this:

examples\data1

1   2   3

examples\data2

4   5   6   7   8   9   10

examples\data3

11  12

Python
import numpy as np

x = []
try:
    # Raw strings (r"...") keep the backslashes in the Windows paths literal
    nameOfPath = [r"examples\data1", r"examples\data2",
                  r"examples\data3"]  # Names of pathFile can be defined in a document with .txt format
    for each_item in nameOfPath:
        with open(each_item, "r") as dataFile:
            x1 = []
            for each_line in dataFile:
                # The sample files are space-separated, so split(",") keeps each line as one string
                x1.append(each_line.split(","))
            x.extend(x1)

except IOError as e:
    print(e)

np_x = np.array(x)
print(len(np_x))
print(np_x)


output:

3
[['1   2   3']
 ['4   5   6   7   8   9   10']
 ['11  12']]
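Note that the printed result is a 3 x 1 array of strings: the example files are separated by spaces rather than commas, so split(",") leaves each whole line as a single string. A minimal sketch that parses the numbers instead, assuming the files are whitespace-separated as shown. To stay self-contained it writes copies of the three example files to a temporary directory, so these paths are stand-ins for the real examples\ folder.

```python
import os
import tempfile
import numpy as np

# Recreate the three example files in a temporary directory (illustration only)
contents = ["1   2   3\n", "4   5   6   7   8   9   10\n", "11  12\n"]
tmpdir = tempfile.mkdtemp()
paths = []
for i, text in enumerate(contents, start=1):
    path = os.path.join(tmpdir, "data%d" % i)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

# One object slot per file, each holding a float array of that file's values
np_x = np.empty(len(paths), dtype=object)
for i, path in enumerate(paths):
    with open(path) as f:
        # split() with no argument splits on any run of whitespace
        np_x[i] = np.array(f.read().split(), dtype=float)

print(len(np_x))        # 3
print(np_x[1])
print(np_x[2].sum())
```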
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


