Hello,

I've searched around but not found an answer to this. Perhaps someone can help?

How come:
VB
Dim PointArray() as String
PointArray = System.IO.File.ReadAllLines(MyFileName)

works well and is really fast, but I can't find a similar method for doing the same into a List, as below:
VB
Dim PointList As New List(Of String)
PointList = System.IO.File.ReadAllLines(MyFileName)

I'm trying to use a List rather than an Array, as I believe it's faster than an array when handling data, and MyFileName contains around 300,000 lines.
Thanks

This is the wrong idea.

First of all, System.IO.File.ReadAllLines most likely uses a list internally (or something similar) and then makes an array out of it, for generality and the user's convenience. Why? Because the total number of lines is not known in advance, before reading; resizing an array on every line would be too expensive and plain stupid, and reading the file twice would be just as stupid. So a variable-length collection is used to read the file first. Why convert it to an array? Because an array represents a fixed-length structure, which is the most adequate representation of a file that has been read.

Moreover, you only need a non-fixed (non-array) collection if you plan to change its size: add, insert or remove elements. If not, using one would be just a waste. By the way, if your purpose was to remove some unwanted elements from the data, this would be a really bad approach. Instead, you would do better to filter the data while reading the file line by line, yes, into a list. You are doing something illogical. If you want to modify the data set later on, the most efficient way would be to read line by line, as Richard Deeming advised. If not, the array would be the best for you.
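A minimal VB.NET sketch of filtering while reading line by line. The filter rule (skip blank lines and lines that do not have exactly three comma-separated fields) and the names FilteredReadSketch and ReadValidPointLines are hypothetical, chosen only to illustrate the idea: lines are pulled lazily by File.ReadLines, and only the ones that pass the test are ever stored in the List.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module FilteredReadSketch
    ' Reads the file lazily and keeps only lines that pass the filter,
    ' so rejected lines are never stored at all.
    ' The filter itself (non-blank, exactly three comma-separated fields)
    ' is a hypothetical example.
    Function ReadValidPointLines(fileName As String) As List(Of String)
        Dim result As New List(Of String)()
        For Each line As String In File.ReadLines(fileName)
            If line.Trim().Length = 0 Then Continue For
            If line.Split(","c).Length <> 3 Then Continue For
            result.Add(line)
        Next
        Return result
    End Function

    Sub Main()
        Dim tmp As String = Path.GetTempFileName()
        File.WriteAllLines(tmp, {"-3153.61743,-6115.18164,-3289.31226", "", "garbage", "1,2,3"})
        Dim lines As List(Of String) = ReadValidPointLines(tmp)
        Console.WriteLine(lines.Count) ' 2
        File.Delete(tmp)
    End Sub
End Module
```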

Your performance considerations are not based on anything rational.

Finally, what if your file is too big to even fit in memory? For such cases, a cunning solution exists: you keep the file open and keep some digest of the file in memory. Say, it can memorize the file position of every line, or of some other unit. On top of that, you organize reading pieces of the file on demand, implementing an array-like interface through an indexed property ("this" in C#, a Default Property in VB).
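A rough VB.NET sketch of that idea, with the indexed property written as a Default Property. The class name LineIndexedFile is hypothetical, and it assumes UTF-8 or ASCII text without a byte-order mark and LF or CRLF line endings. Only one Long per line is kept in memory; each line is read back from disk when it is indexed.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

' Hypothetical class: the file stays open, only the byte offset of each
' line is kept in memory, and a line is read from disk on demand.
' Assumes UTF-8/ASCII without a BOM and LF or CRLF endings. Not thread-safe.
Public NotInheritable Class LineIndexedFile
    Implements IDisposable

    Private ReadOnly _stream As FileStream
    Private ReadOnly _lineStarts As New List(Of Long)()

    Public Sub New(fileName As String)
        _stream = New FileStream(fileName, FileMode.Open, FileAccess.Read)
        ' One sequential pass: a line starts at offset 0 and after every LF byte.
        Dim buffer(65535) As Byte
        Dim offset As Long = 0
        Dim atLineStart As Boolean = True
        Dim bytesRead As Integer = _stream.Read(buffer, 0, buffer.Length)
        While bytesRead > 0
            For i As Integer = 0 To bytesRead - 1
                If atLineStart Then
                    _lineStarts.Add(offset + i)
                    atLineStart = False
                End If
                If buffer(i) = 10 Then atLineStart = True
            Next
            offset += bytesRead
            bytesRead = _stream.Read(buffer, 0, buffer.Length)
        End While
    End Sub

    Public ReadOnly Property Count As Integer
        Get
            Return _lineStarts.Count
        End Get
    End Property

    ' Default property: callers can write file(i), as with an array.
    Default Public ReadOnly Property Item(index As Integer) As String
        Get
            Dim start As Long = _lineStarts(index)
            Dim finish As Long = If(index + 1 < _lineStarts.Count, _lineStarts(index + 1), _stream.Length)
            Dim bytes(CInt(finish - start) - 1) As Byte
            _stream.Seek(start, SeekOrigin.Begin)
            Dim total As Integer = 0
            While total < bytes.Length
                Dim n As Integer = _stream.Read(bytes, total, bytes.Length - total)
                If n = 0 Then Exit While
                total += n
            End While
            ' Drop the trailing CR/LF that belongs to the line ending.
            Return Encoding.UTF8.GetString(bytes, 0, total).TrimEnd(ChrW(13), ChrW(10))
        End Get
    End Property

    Public Sub Dispose() Implements IDisposable.Dispose
        _stream.Dispose()
    End Sub
End Class

Module LineIndexDemo
    Sub Main()
        Dim tmp As String = Path.GetTempFileName()
        File.WriteAllLines(tmp, {"-3153.61743,-6115.18164,-3289.31226", "1,2,3", "4,5,6"})
        Using lines As New LineIndexedFile(tmp)
            Console.WriteLine(lines.Count) ' 3
            Console.WriteLine(lines(1))    ' 1,2,3
        End Using
        File.Delete(tmp)
    End Sub
End Module
```

The trade-off is a disk seek per access, so this suits random or sparse access better than repeated full scans.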

—SA
 
Comments
Richard Deeming 22-Oct-15 15:27pm    
Absolutely right on the implementation of ReadAllLines[^] - it calls InternalReadAllLines[^], which adds each line to a List<T> and then calls the list's ToArray method.

So you'd end up with:

* the List<T>'s internal array, which starts with a capacity of 4 and doubles every time you run out of space
- for a list of 1,300,000 items, this will take up 8 MB over and above the space for the strings;

* a copy of the populated portion of the array, returned from ToArray
- another 5 MB for the 1,300,000-item list;

* the internal array for the new List<T> returned from ToList
- this should start with the correct size, as the input implements ICollection<T>, so only another 5 MB.

Using ReadLines().ToList() will only mimic the first step of that, so you'd end up saving 10 MB with one trivial change.
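Those figures can be checked with a little arithmetic, assuming 4-byte object references (a 32-bit process; in a 64-bit process every figure doubles) and 1 MB taken as 10^6 bytes. The first doubling of 4 that reaches 1,300,000 is 4 · 2^19:

```latex
\begin{aligned}
\text{List capacity} &= 4 \cdot 2^{19} = 2{,}097{,}152 \;\ge\; 1{,}300{,}000 \\
\text{List array} &= 2{,}097{,}152 \times 4\ \text{B} \approx 8.4\ \text{MB} \\
\text{ToArray copy} &= 1{,}300{,}000 \times 4\ \text{B} = 5.2\ \text{MB} \\
\text{ToList array} &= 1{,}300{,}000 \times 4\ \text{B} = 5.2\ \text{MB}
\end{aligned}
```

Skipping the ToArray copy and the second List array is where the roughly 10 MB saving comes from.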
Sergey Alexandrovich Kryukov 22-Oct-15 16:35pm    
Great. Thank you for the confirmation. I think I explained the rationale behind it.
—SA
SheepRustler 22-Oct-15 15:31pm    
Yes Richard, you're absolutely correct - using ReadLines instead of ReadAllLines made quite a big difference to the amount of memory consumed (well, at least according to Task Manager). Thanks for the tip; I'd never have realised that.

It still takes hours to complete though :(
Sergey Alexandrovich Kryukov 22-Oct-15 16:36pm    
After this confirmation, will you accept this answer formally?
—SA
Sorry, I just discovered the .ToList option

VB
PointList = System.IO.File.ReadAllLines(MyFileName).ToList

works fine....
 
Comments
Richard Deeming 22-Oct-15 10:59am    
If you're going to call ToList, it would be better to use the ReadLines method, rather than the ReadAllLines method.

ReadAllLines reads all the lines of the file into an array, resizing as it goes. You then copy the entire array into a new List<T>.

ReadLines uses a lazy iterator which never holds more than one line in memory at once. The ToList method then copies each line into a List<T>, which will use a lot less memory.
SheepRustler 22-Oct-15 11:07am    
Thanks Richard, I had no idea about that but as my files are big that's a very helpful tip.
Thanks for replying.
Sergey Alexandrovich Kryukov 22-Oct-15 15:11pm    
You are right. This "solution", though technically correct, still makes no sense. I tried to explain why in Solution 2, please see. (Your advice is credited, of course. :-)
—SA
Matt T Heffron 22-Oct-15 12:10pm    
An equivalent is to use the constructor for List(Of T) that takes the IEnumerable(Of T) directly:
Dim PointList = New List(Of String)(System.IO.File.ReadLines(MyFileName))
SheepRustler 22-Oct-15 12:13pm    
OK, makes sense Matt. Is that more efficient in terms of resources too or just visually better?

Quote:
Philippe,

Here's what I have:

a text file containing a (huge) list of 3D points (X,Y,Z), which I call T1, T2, T3, etc.:
-3153.61743,-6115.18164,-3289.31226
-3256.75195,-6075.70801,-3550.83057
-3775.00977,-5870.36035,-3817.91870
-3890.03223,-5819.92871,-3797.58984
-4134.60156,-5724.16309,-3651.39648
....


For each point (called T) I need to calculate whether any of the other points (called X1, X2, etc.) are
within a cone drawn from T along the vector of T out to an infinite point on that vector; in its 'shadow', if you like.
At the end I need a list of all such points.

Here's what I do:
1) Read the X,Y,Z into three lists (of integers - that's close enough)
2) Calculate the range of each point and its two spherical bearings
3) Sort the list by range so that the points closest to the origin are at the top of the list
4) Create separate lists of the X, Y, Z, Range, Bearing#1, Bearing#2 values of every point
5) Run a loop a bit like this:

For every T
    For every X (from T+1 onwards)
        Calculate vectors from range, bearing#1, bearing#2 for X to see if it's in the cone vector
        If yes, add to a List of Y
    Next X
Next T
Write the Y list to file.


Here's a few things I've tried to speed things up:
1) I determine which 3D quadrant each X is in - if it's not in the same one as the current T, I don't run the vector calcs
2) I tried writing the Y to disk using a StreamWriter every 10, 100, 1000 points of Y, but this was very slow.
3) I tried decreasing the size of the lists using List.RemoveAt(0) each time the T loop was completed; it made no difference to speed
4) To be able to monitor progress (albeit slow!) the loop lives in a BackgroundWorker with a progress bar and a % completed label.

I'm sure there's a clever/fast way to do this, but it escapes me!



You're doing it wrong!

You have to read such data using OleDb!
The idea is:
1) load the data into a DataTable[^] using an OleDbDataReader[^]
2) use LINQ to DataSet[^] to do the calculations
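A rough VB.NET sketch of those two steps. It assumes the Microsoft.Jet.OLEDB.4.0 text driver is installed (which means a 32-bit process), uses a hypothetical file path, and relies on Jet naming the columns F1, F2, F3 when HDR=No. The range calculation is only a placeholder for whatever the LINQ query should really compute; OleDbTextSketch and LoadPoints are hypothetical names.

```vb
Imports System
Imports System.Data
Imports System.Data.OleDb
Imports System.IO
Imports System.Linq

Module OleDbTextSketch
    ' Loads a headerless, comma-delimited text file into a DataTable via the
    ' Jet text driver. Assumes Microsoft.Jet.OLEDB.4.0 is installed and the
    ' process runs as 32-bit (x86). With HDR=No the columns are F1, F2, F3.
    Function LoadPoints(fileName As String) As DataTable
        Dim fullPath As String = Path.GetFullPath(fileName)
        Dim connStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" &
            Path.GetDirectoryName(fullPath) &
            ";Extended Properties=""text;HDR=No;FMT=Delimited"""
        Dim table As New DataTable()
        Using conn As New OleDbConnection(connStr)
            conn.Open()
            Dim sql As String = "SELECT * FROM [" & Path.GetFileName(fullPath) & "]"
            Using cmd As New OleDbCommand(sql, conn)
                Using reader As OleDbDataReader = cmd.ExecuteReader()
                    table.Load(reader)
                End Using
            End Using
        End Using
        Return table
    End Function

    Sub Main()
        ' Hypothetical path; the Jet text driver expects a .txt/.csv-style extension.
        Dim table As DataTable = LoadPoints("C:\data\points.txt")
        ' LINQ to DataSet (needs a reference to System.Data.DataSetExtensions):
        ' placeholder query computing each point's range from the origin.
        Dim ranges = From row In table.AsEnumerable()
                     Let x = Convert.ToDouble(row("F1"))
                     Let y = Convert.ToDouble(row("F2"))
                     Let z = Convert.ToDouble(row("F3"))
                     Select Math.Sqrt(x * x + y * y + z * z)
        Console.WriteLine(ranges.Max())
    End Sub
End Module
```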

Follow this link[^] to see my past answers.

For further information, please see:
Much ADO About Text Files[^]
Textfile connection strings[^]
HOW TO: Use Jet OLE DB Provider 4.0 to Connect to ISAM Databases[^]
Schema.ini File (Text File Driver)[^]
Using OleDb to Import Text Files (tab, CSV, custom)[^]
How To Open Delimited Text Files Using the Jet Provider's Text IIsam[^]
LINQ to DataSet Examples[^]
Queries in LINQ to DataSet[^]
Querying DataSets (LINQ to DataSet)[^]
 
Comments
SheepRustler 22-Oct-15 16:03pm    
Looks interesting, I'll check it out via the links... Thanks
Maciej Los 22-Oct-15 16:05pm    
First of all, check my past answer (which has been updated) to see example code.
Here are some of my thoughts on optimization. I could do more if I knew what the cone-related calculations looked like.
The point is to do as much as possible only once instead of each time through the looping.
(I'm first going to post this in C# because that's how I think, and then I'll add a slightly cleaned up decompilation into VB!)
C#
using System;
using System.Collections.Generic;
using System.IO;
using System.Windows.Media.Media3D;

namespace Stuff
{
  class Program
  {
    static void Main(string[] args){ }
    class Thing : IComparable<Thing>
    {
      private static readonly Vector3DConverter V3Parser = new Vector3DConverter();
      public Thing(Vector3D raw)
      {
        _Raw = raw;
        // Describe the point in Spherical Polar Coordinates
        R = _Raw.Length; 
        // For now, assume R isn't 0
        Theta = Math.Acos(raw.Z / R);
        Phi = Math.Atan2(raw.Y, raw.X);
        // Quadrant: anticlockwise from -pi
        Q = Phi < -Math.PI / 2 ? 0 :
            Phi < 0 ? 1 :
            Phi < Math.PI / 2 ? 2 :
            3;
      }
      public Thing(string textual)
        : this((Vector3D)V3Parser.ConvertFromString(textual))
      { }
      private readonly Vector3D _Raw;
      public Vector3D Raw { get { return _Raw; } }

      public readonly double R;
      public readonly double Theta;
      public readonly double Phi;
      public readonly int Q;

      #region IComparable<Thing> Members
      public int CompareTo(Thing other)
      {
        if (other == null)
          throw new ArgumentNullException("other");
        return R.CompareTo(other.R);
      }
      #endregion

      public override string ToString()
      {
        return V3Parser.ConvertToString(_Raw);
      }

      public static Vector3D operator -(Thing thing1, Thing thing2)
      {
        return thing1._Raw - thing2._Raw;
      }
    }
    static void ConeProblem(string MyFileName)
    {
      List<Thing> AllThings = new List<Thing>();
      foreach (string line in File.ReadLines(MyFileName))
      {
        AllThings.Add(new Thing(line));
      }
      AllThings.Sort();
      List<Thing> SelectedThings = new List<Thing>();
      for (int i = 0; i < AllThings.Count; i++)
      {
        Thing ti = AllThings[i];
        for (int j = i + 1; j < AllThings.Count; j++)
        {
          Thing tj = AllThings[j];
          if (IsInConeVector(ti, tj))
            SelectedThings.Add(tj);
        }
      }
      // Now write out the SelectedThings
    }
    static bool IsInConeVector(Thing t1, Thing t2)
    {
      if (t1.Q != t2.Q)
        return false;
      // You still need to do this (HARD) part.
      return true;
    }
  }
}
VB-ish: (retyped, there may be typos)
VB
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Windows.Media.Media3D

Class Program
 Private Class Thing
  Implements IComparable(Of Thing)
  Private Shared ReadOnly V3Parser As Vector3DConverter = New Vector3DConverter
  Public Sub New(ByVal textual As String)
   Me.New(DirectCast(Thing.V3Parser.ConvertFromString(textual), Vector3D))
  End Sub

  Public Sub New(ByVal raw As Vector3D)
   Me._Raw = raw
   ' Describe the point in Spherical Polar Coordinates
   Me.R = raw.Length
   ' For now, assume R isn't 0
   Me.Theta = Math.Acos(raw.Z / Me.R)
   Me.Phi = Math.Atan2(raw.Y, raw.X)
   ' Quadrant: anticlockwise from -pi
   Me.Q = If((Me.Phi < -Math.PI / 2), 0,
          If((Me.Phi < 0), 1,
          If((Me.Phi < Math.PI / 2), 2, 3)))
  End Sub

  Public Function CompareTo(ByVal other As Thing) As Integer Implements IComparable(Of Thing).CompareTo
   If (other Is Nothing) Then
    Throw New ArgumentNullException("other")
   End If
   Return Me.R.CompareTo(other.R)
  End Function

  Public Shared Operator -(ByVal thing1 As Thing, ByVal thing2 As Thing) As Vector3D
   Return (thing1._Raw - thing2._Raw)
  End Operator

  Public Overrides Function ToString() As String
   Return Thing.V3Parser.ConvertToString(Me._Raw)
  End Function

  Public ReadOnly Property Raw As Vector3D
   Get
    Return Me._Raw
   End Get
  End Property

  Private ReadOnly _Raw As Vector3D
  Public ReadOnly Phi As Double
  Public ReadOnly Q As Integer
  Public ReadOnly R As Double
  Public ReadOnly Theta As Double
 End Class

 Private Shared Sub ConeProblem(ByVal MyFileName As String)
  Dim AllThings As New List(Of Thing)
  Dim line As String
  For Each line In File.ReadLines(MyFileName)
   AllThings.Add(New Thing(line))
  Next
  AllThings.Sort()
  Dim SelectedThings As New List(Of Thing)
  Dim i As Integer
  For i = 0 To AllThings.Count - 1
   Dim ti As Thing = AllThings.Item(i)
   Dim j As Integer
   For j = (i + 1) To AllThings.Count - 1
    Dim tj As Thing = AllThings.Item(j)
    If Program.IsInConeVector(ti, tj) Then
     SelectedThings.Add(tj)
    End If
   Next j
  Next i
  ' Now write out the SelectedThings
 End Sub

 Private Shared Function IsInConeVector(ByVal t1 As Thing, ByVal t2 As Thing) As Boolean
  If (t1.Q <> t2.Q) Then
   Return False
  End If
  ' You still need to do this (HARD) part.
  Return True
 End Function
End Class

To further optimize performance (if you have multiple cores available), and memory a bit, set up the writing of the SelectedThings in a separate thread (BackgroundWorker) and use (something like) a System.Collections.Concurrent.BlockingCollection to safely pass the Things from selection to being written out.
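A minimal VB.NET sketch of that producer/consumer arrangement. It uses Task.Run (.NET 4.5 or later) for the writer rather than a BackgroundWorker, a bounded capacity of 10,000 chosen arbitrarily, and a stand-in selection rule (even numbers) in place of IsInConeVector; WriterThreadSketch and WriteSelected are hypothetical names.

```vb
Imports System
Imports System.Collections.Concurrent
Imports System.IO
Imports System.Threading.Tasks

Module WriterThreadSketch
    ' The selection loop Adds lines to a BlockingCollection while a separate
    ' task writes them to disk. The "selection" (even numbers) is a stand-in.
    Sub WriteSelected(outputFile As String, candidates As Integer())
        ' A bounded capacity keeps memory flat if the writer falls behind:
        ' Add blocks while the queue is full.
        Using queue As New BlockingCollection(Of String)(10000)
            Dim writer As Task = Task.Run(
                Sub()
                    Using sw As New StreamWriter(outputFile)
                        ' Blocks until an item arrives; ends once CompleteAdding
                        ' has been called and the queue is empty.
                        For Each line As String In queue.GetConsumingEnumerable()
                            sw.WriteLine(line)
                        Next
                    End Using
                End Sub)

            For Each c As Integer In candidates
                If c Mod 2 = 0 Then queue.Add(c.ToString())
            Next
            queue.CompleteAdding()
            writer.Wait()
        End Using
    End Sub

    Sub Main()
        Dim tmp As String = Path.GetTempFileName()
        WriteSelected(tmp, {1, 2, 3, 4, 5, 6})
        Console.WriteLine(String.Join(" ", File.ReadAllLines(tmp))) ' 2 4 6
        File.Delete(tmp)
    End Sub
End Module
```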
 
Comments
SheepRustler 23-Oct-15 4:05am    
Wow. Thanks Matt, this is going to take me a while to digest....

Here are the Vector and Cone functions I've been using:

Private Function isLyingInCone(Xx As Integer, Xy As Integer, Xz As Integer, Tx As Integer, Ty As Integer, Tz As Integer, Bx As Integer, By As Integer, Bz As Integer, ConeHalfAngle As Double) As Boolean
' X = coordinates of the point to be tested, T = coordinates of the cone's apex, B = coordinates of the centre of the base circle, ConeHalfAngle in radians

Dim apexToXVect As Integer() = dif(Tx, Ty, Tz, Xx, Xy, Xz) '(X,Y,Z) INTEGER vector between the apex and X, computed as T - X

' Vector between the apex and the circle-centre point, also computed as T - B, so the two reversals cancel in the dot product.
Dim axisVect As Integer() = dif(Tx, Ty, Tz, Bx, By, Bz) 'B is in line with the origin and T, extending out to "infinity and beyond..." :)

' X is lying in the cone only if it's lying in an infinite version of the cone -- one that is not limited by the "round basement".
' Use dotProd() to get the cosine of the angle between apexToXVect and the axis.
' Compare cosines instead of bare angles: a smaller angle means a larger cosine.
Dim cosAngle As Double = dotProd(apexToXVect, axisVect) / (magn(apexToXVect) * magn(axisVect))
Dim isInInfiniteCone As Boolean = cosAngle >= Math.Cos(ConeHalfAngle)

Return isInInfiniteCone 'True if it IS in the infinite cone
End Function

'Vector Calcs:
Private Function dotProd(a As Integer(), b As Integer()) As Long
' Widen to Long before multiplying so large co-ordinates cannot overflow Integer
Return CLng(a(0)) * b(0) + CLng(a(1)) * b(1) + CLng(a(2)) * b(2)
End Function

Private Function dif(Tx As Integer, Ty As Integer, Tz As Integer, Xx As Integer, Xy As Integer, Xz As Integer) As Integer()
'returns an INTEGER array of the difference of each co-ord
Return (New Integer() {Tx - Xx, Ty - Xy, Tz - Xz}) '[X,Y,Z] array of the differences of each INTEGER co-ord in X, Y and Z
End Function

Private Function magn(a As Integer()) As Double
'Returns a single value of the MAGNITUDE of a vector from an INTEGER Array of cords. The SQRT of the sum of Squares
Return CDbl(Math.Sqrt((CLng(a(0)) * CLng(a(0)) + CLng(a(1)) * CLng(a(1)) + CLng(a(2)) * CLng(a(2)))))
End Function

Private Function RectToSphere(ByVal pointA As Point3D) As Point3D
' ----- Convert rectangular 3D coordinates to spherical coordinates.
Dim rho As Double
Dim theta As Double
Dim phi As Double

rho = Math.Sqrt(pointA.X ^ 2 + pointA.Y ^ 2 + pointA.Z ^ 2)
theta = Math.Atan2(pointA.Y, pointA.X)
phi = Math.Acos(pointA.Z / rho)

Return New Point3D(rho, theta, phi)
End Function

Private Function SphereToRect(ByVal pointA As Point3D) As Point3D
' ----- Convert spherical coordinates to rectangular 3D coordinates.
Dim x As Double
Dim y As Double
Dim z As Double

x = pointA.X * Math.Cos(pointA.Y) * Math.Sin(pointA.Z)
y = pointA.X * Math.Sin(pointA.Y) * Math.Sin(pointA.Z)
z = pointA.X * Math.Cos(pointA.Z)
Return New Point3D(x, y, z)
End Function

Public Class PolarToCartesian
Property X As Double
Property Y As Double
Sub New(Radius As Double, AngleDegree As Double)
Dim AngleRadian As Double = AngleDegree * 0.017453292519943295 '2 * Math.PI / 360
X = Radius * Math.Cos(AngleRadian)
Y = Radius * Math.Sin(AngleRadian)
End Sub
End Class

Public Class CartesianToPolar
Property Radius As Double
Property AngleRadian As Double
ReadOnly Property AngleDegree As Double
Get
Return AngleRadian * 57.295779513082323 '360 / (2 * Math.PI), quicker than doing maths
End Get
End Property

Sub New(X As Double, Y As Double)
Radius = (X ^ 2 + Y ^ 2) ^ 0.5
AngleRadian = Math.Atan2(Y, X)
End Sub
End Class
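For reference, the infinite-cone test that isLyingInCone makes can be written as below, with T the apex, B the point on the axis, X the point being tested and α = ConeHalfAngle in radians:

```latex
X \text{ lies in the infinite cone} \iff
\frac{(X - T)\cdot(B - T)}{\lVert X - T\rVert \,\lVert B - T\rVert} \;\ge\; \cos\alpha
```

Negating both vectors, as dif does by returning T - X and T - B, leaves the left-hand side unchanged. Since the loop only ever uses two angles (NearAngle and FarAngle), their cosines could be computed once before the loops instead of once per pair.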
SheepRustler 23-Oct-15 4:18am    
...and these

Public Class Point3D
Public X As Double
Public Y As Double
Public Z As Double

Public Sub New(ByVal xPoint As Double, ByVal yPoint As Double, ByVal zPoint As Double)
Me.X = xPoint
Me.Y = yPoint
Me.Z = zPoint
End Sub

Public Overrides Function ToString() As String
Return X & "," & Y & "," & Z
End Function
End Class
SheepRustler 23-Oct-15 6:58am    
Here are my loops

'ListTx, ListTy & ListTz are the co-ords of each point (three separate lists)
'ListBx, ListBy & ListBz are the co-ords of the direction vector of each cone base (three lists)
'ListTQuad is a list of the quadrants each point is in
'ListTRange is the range of each point
'Thresh is a constant
'ShadowList is a list of co-ords that meet the IsLyingInCone test

For J = 0 To NumPoints - 1 'for each ROW point in the cloud array - treat as a potential obstruction - ** T **
If BackgroundWorker1.CancellationPending Then
e.Cancel = True ' Set Cancel to True
Exit For
End If

'Now we have the coords of T and B, see if X is inside the cone
For K = J + 1 To NumPoints - 1 'for every point in the cloud BELOW this one see if it is a shadow - ** X **
If ListTQuad(K) = ListTQuad(J) Then 'in the same quad so continue seeing if it's a shadow
Dim XTRange As Integer = ListTRange(K) - ListTRange(J)
'now set the occlusion angle according to the range between X & T - long range use 5 deg(fast), short range use 30 deg(slow)
Dim myAngle As Double = If(XTRange < Thresh, NearAngle, FarAngle) 'Define the angle to use based on range, doubles needed as angle is in rads
Try
'do the calculation for each point here
'isLyingInCone(Xx, Xy, Xz (point to be examined), Tx, Ty, Tz (shadow), Bx, By, Bz (cone base co-ords), cone angle)
If isLyingInCone(ListTx(K), ListTy(K), ListTz(K), ListTx(J), ListTy(J), ListTz(J), ListBx(J), ListBy(J), ListBz(J), myAngle) Then 'IN the cone, IN shadow, put in the SHADOW array
ShadowList.Add(ListTx(K) & "," & ListTy(K) & "," & ListTz(K))
'ShadowWriter.WriteLine(ListTx(K) & "," & ListTy(K) & "," & ListTz(K), True) 'TRUE=APPEND write shadows.
End If
Catch ex As Exception
ErrList = ErrList & ex.Message & "IsLyingInCone : Line " & ListSx.Count & vbCrLf
Exit Try
End Try
End If 'The same quadrant
Next 'K
Next 'J
I fear your question is wrong because it is not your real problem.
After reading your comments, my overall impression is that you are trying to optimize the steps of a procedure you built. Tempting as that is, it is the wrong approach.
The problem I see is that you have a naive approach that degrades with the size of the dataset. Your approach scales as O(n^2) or worse.
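To put numbers on that: with the roughly 300,000 points mentioned in the question, the nested T/X loop visits every pair once:

```latex
\binom{n}{2} = \frac{n(n-1)}{2} = \frac{300{,}000 \times 299{,}999}{2} \approx 4.5 \times 10^{10}\ \text{pairs}
```

The same-quadrant check only divides the number of full cone tests by a constant (about 4 if the points are spread evenly over the quadrants), so the total work still grows with n^2: doubling the number of points roughly quadruples the run time.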

Rather than describing what you do in your solution, you should describe the original problem and the reason behind it, and let us come up with an efficient solution.

Handling huge 2D/3D datasets requires some unusual (specialised) techniques; the right combination of techniques depends on exactly what you need to do.

The Algorithms forum http://www.codeproject.com/Forums/326859/Algorithms.aspx[^] may be a better fit for such a question.
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


