0.5 petabytes per week, and you want to process and store it in the cloud?
OK, on a 10 gigabit link (which you won't have) it'll take around 937 days to upload 0.5 petabytes.
Yes, you would need 937 days per week to upload the week's data.
-- The fastest link currently available is 18.2 Gb/s (South Korea); still looking at >400 days.
-- 5G promises 100 Gb/s; that's >90 days per week.
Your figures and information are just ridiculous, just pathetic.
Clearly whoever is coming up with them is totally clueless.
And you want to put it on the cloud, or the clown???
Don't care that you crossed it out.
Asking people for advice?
Provide real and proper information, not this bogus crap.
BINGO THAT!
pestilence [ pes-tl-uhns ] noun
1. a deadly or virulent epidemic disease, especially bubonic plague.
2. something that is considered harmful, destructive, or evil.
Synonyms: pest, plague, people
|
I never thought the word "petabyte" could trouble someone so much.
Please ignore the message if it did not interest you, just like I'm doing with your message now.
And go home and have a chilled beer on me.
Bingo that?
|
Maybe I am doing the numbers wrong, but I get 4.6 days to transfer 0.5 PB at 10 Gb/s.
0.5 PB = 500,000 GB
10 Gb/s = 1.25 GB/s
500,000 / 1.25 = 400,000 (seconds)
400,000 / 86,400 = 4.63 days
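In case anyone wants to re-check that arithmetic at other link speeds, here is a minimal Python sketch (the function name and the set of link speeds are my own, purely illustrative):

```python
# Transfer-time estimate: days needed to move a given volume over a given link.
# Assumes the link runs at its full nominal rate with no protocol overhead.

def transfer_days(volume_pb: float, link_gbps: float) -> float:
    bits = volume_pb * 1e15 * 8          # 1 PB = 1e15 bytes, 8 bits per byte
    seconds = bits / (link_gbps * 1e9)   # link rate in bits per second
    return seconds / 86_400              # 86,400 seconds per day

for gbps in (10, 18.2, 100):
    print(f"0.5 PB at {gbps} Gb/s: {transfer_days(0.5, gbps):.2f} days")
# 10 Gb/s -> 4.63 days, matching the arithmetic above
```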
|
Are your 10 Gb/s GigaBytes? Or GigaBits?
M.D.V.
If something has a solution... why do we have to worry about it? If it has no solution... for what reason do we have to worry about it?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
10 Gb = 10 Gbits (hence small b)
1.25 GB = 10 GBytes (hence big B)
|
I now realize the capitalization in your previous message... (it was late at night when I wrote it)
but I think your comparisons in this message are mixing them up, as I did yesterday.
musefan wrote: 1.25 GB = 10 GBytes (hence big B)
I think you wanted to say
1.25 GB (big B) = 10 Gb (small b)
M.D.V.
If something has a solution... why do we have to worry about it? If it has no solution... for what reason do we have to worry about it?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
I was the solution architect a couple of years ago for a system that stored 2 PB per week of video and telemetry data from self-driving car development.
You are correct that that is a large amount of data: roughly 3.6 GB/s if you think of it as a constant stream. In addition to the challenge of simply storing it, it also has to be simultaneously backed up and analysed.
Data arrives at the data center from the many cars in the field not via cables but on SSD-based cartridges that have to be read in via reader stations. The cartridges are then returned to the field to collect more data.
Storage for production and "backup" is provided by hundreds of Dell EMC Isilon NAS nodes. Analysis of the data is done on more than 300 servers, each with 32 cores and 0.5 TB of RAM.
It sounds mind-blowing, but there really are companies collecting, storing and processing data on that scale.
Andy
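For what it's worth, the constant-stream figure falls straight out of the week length; a quick check (the interpretation of the rounding is my guess, not Andy's):

```python
# 2 PB/week expressed as a constant stream.
rate = 2e15 / (7 * 24 * 3600)        # bytes per second over a week
print(f"{rate / 1e9:.2f} GB/s")      # ~3.31 GB/s with decimal prefixes
# The post's 3.6 GB/s likely reflects binary petabytes or headroom rounding.
```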
|
musefan wrote: That would probably make Youtube one of the most important bits of software in the universe
And everybody knows that's FarceBok.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
I concur. I developed a system for a hospital that recorded all activity from around 70 cameras 24/7, and it never needed more than about 800 GB. Certainly less than a terabyte, never mind petabytes!
- I would love to change the world, but they won’t give me the source code.
|
Is the application 3rd party? If so, ask them for recommendations on how to balance the workload across machines. Most applications that have high processing requirements will support this kind of shared-workload scenario.
If it's in-house, then the developers should have a pretty big say in the best way to maximise performance.
Any serious amount of image processing is best done across multiple machines, but that requires the application to support it.
Storage is a separate issue, so design it as such; you only really need to make sure the connection between the processing machine(s) and the storage machine(s) is fast enough to keep up. Other than that, they are two separate requirements.
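For the in-house case, one common pattern is a shared work queue that any number of processing machines pull from, which keeps processing and storage decoupled. A minimal Python sketch, where process_image() is a hypothetical stand-in for the real application step:

```python
# Work-queue sketch: worker processes pull file paths from a shared queue,
# "process" them, and push results onward; storage stays a separate concern.
from multiprocessing import Process, Queue

def process_image(path: str) -> str:
    return f"processed:{path}"        # placeholder for the real processing step

def worker(jobs: Queue, results: Queue) -> None:
    while True:
        path = jobs.get()
        if path is None:              # sentinel: no more work for this worker
            break
        results.put(process_image(path))

if __name__ == "__main__":
    jobs, results = Queue(), Queue()
    workers = [Process(target=worker, args=(jobs, results)) for _ in range(4)]
    for w in workers:
        w.start()
    n_jobs = 100
    for i in range(n_jobs):           # enqueue incoming files
        jobs.put(f"/data/frame_{i}.img")
    for _ in workers:                 # one sentinel per worker
        jobs.put(None)
    done = [results.get() for _ in range(n_jobs)]
    for w in workers:
        w.join()
    print(f"{len(done)} items processed")
```

The same shape scales out across machines if the in-process queue is swapped for a network-reachable one, with the workers reading from and writing to the storage machines directly.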
|
musefan wrote: Is the application 3rd party?
The client is a subsidiary of one of the top oil & gas companies. They want to work with us on building the application. They've hired people from AMD on their side; I guess this is just for the hardware department, and they also own the AI/ML teams. We are just focusing on the application that collects the data.
Now, as I've noted in my updated OP, the data does seem to be fairly huge. But the intent of the contact person from this company looks to be testing our capacity: he's watching to see whether we'd run away at the scale of the application. We did not run, because we don't know what it means to handle petabytes of data.
|
Considering the data requirements you have, I suggest the following storage system:
1 - transport layer[^]
2 - storage[^] (note hack-proof encryption in progress)
Ravings en masse^
"The difference between genius and stupidity is that genius has its limits." - Albert Einstein
"If you are searching for perfection in others, then you seek disappointment. If you seek perfection in yourself, then you will find failure." - Balboos HaGadol, Mar 2010
|
I have the same system in place for work emails, and can confirm it is very effective
|
Load test it in the cloud, then buy a server with 2x the capacity of the cloud one to cover additional workload growth.
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason?
Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful?
--Zachris Topelius
Training a telescope on one’s own belly button will only reveal lint. You like that? You go right on staring at it. I prefer looking at galaxies.
-- Sarah Hoyt
|
The CERN experiments produce 1 PB/s of data, which is reduced to 1 PB/day for storage (CERN Data Centre passes the 200-petabyte milestone | CERN). This allows them to store the "interesting" results out of 1 billion collision events per second. Are you telling us that your DP and image-processing needs are 10% of CERN's?
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
They are not really comparable though, are they? CERN decided to discard 99.99% of their data; the OP may not choose to discard any of theirs.
Plus, we cannot assume both sets of software are as efficient as they could be. The OP's software could be using really bad compression (or maybe none at all).
I just don't really understand why everyone is trying to argue about the quantity of data. It's not even close to being an impossible amount (given current technology). Also, maybe the numbers are estimates for 5 years from now. You wouldn't want a system that only works for a week, would you...
|
OK, here's my first attempt at an analysis (a rough arithmetic sketch follows the list):
- It is just possible to handle this amount of data with a dedicated 10 Gbps connection (the actual data rate is 6.6 Gbps), but once you take into account framing, collisions, etc., it looks very iffy. [Probably have multiple systems receiving the data]
- The interfaces (NVMe, etc.) can handle this data rate, but building a storage system that can handle this sort of sustained write rate is non-trivial. [Probably use multiple disks running in parallel]
- Once you have the data stored locally, you must read it off the storage at the same rate (otherwise you will eventually run out of space), process it, and store it somewhere else. [The initial processing of this much data would presumably require a massively parallel system, with all the communication and synchronization issues that this entails. Have at least one primary processing node for each receiving system]
- How will secondary, tertiary, etc. processing be done? [Whether you have one secondary processor for one or more primary processors or vice versa depends on the amount of data and the processing required. Again, we have synchronization and communication issues]
- Presentation of the results? [Presumably requires that the results of the processing be sent to a single node. Synchronization, communication issues...]
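A minimal sketch of the rate arithmetic behind the first two points (the per-disk write rate and headroom factor are assumed figures, not from the thread):

```python
# Back-of-envelope rates for 0.5 PB/week of incoming data.
VOLUME_BYTES = 0.5e15                 # 0.5 PB
WEEK_SECONDS = 7 * 24 * 3600          # 604,800 s

rate_bytes = VOLUME_BYTES / WEEK_SECONDS     # ~0.83 GB/s sustained
rate_gbps = rate_bytes * 8 / 1e9             # ~6.6 Gbps, as stated above
print(f"sustained ingest: {rate_bytes / 1e9:.2f} GB/s = {rate_gbps:.1f} Gbps")

# Parallel disks needed to absorb that write rate, assuming (hypothetically)
# 200 MB/s sustained sequential writes per disk and 2x headroom for bursts,
# parity overhead, and concurrent reads by the processing stage.
DISK_WRITE = 200e6
HEADROOM = 2.0
print(f"parallel disks: ~{rate_bytes * HEADROOM / DISK_WRITE:.0f}")  # ~8
```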
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
There are so many questions yet to be answered before jumping into what computer I should buy. Questions like:
- Does all the data from around the world end up in a single data center?
- Does this data center do all the processing?
- Is this really needed, or can processing be distributed around the world?
- Sure, at some point you may need all your data in one location for some kind of analysis. But does this have to be real-time? Do you need "raw" data, or would processed data from remote servers work fine?
I can think of more if I spend some more time on it.
"It is easy to decipher extraterrestrial signals after deciphering Javascript and VB6 themselves.", ISanti[ ^]
|
Well, you obviously need to go distributed and pre-crunch the data locally, before sending it to regional servers for analysis.
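A minimal sketch of what "pre-crunch locally" might look like, assuming (purely for illustration) that raw samples can be reduced to summaries before upload; summarise() is hypothetical:

```python
# Local pre-crunch: reduce raw samples to compact summaries before upload,
# so only a small fraction of the raw volume reaches the regional servers.
import gzip, json, statistics

def summarise(readings: list[float]) -> dict:
    # Hypothetical reduction: keep summary statistics instead of raw samples.
    return {
        "n": len(readings),
        "mean": statistics.fmean(readings),
        "min": min(readings),
        "max": max(readings),
    }

raw = [float(i % 97) for i in range(100_000)]   # stand-in for sensor readings
payload = gzip.compress(json.dumps(summarise(raw)).encode())
print(f"raw: {len(raw) * 8} bytes, uploaded: {len(payload)} bytes")
```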
|
Jörgen Andersson wrote: Well, you obviously need to go distributed and pre-crunch the data locally, before sending it to regional servers for analysis.
I don't know why, but it almost sounds like you are a robot trying to describe the process of eating
|
AMD Threadripper 3990Xs and 6 GB/s tape drives.
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it.
― Confucian Analects: Rules of Confucius about his food
|
You could look at Dell EMC Isilon for the storage. I worked on a system for an automotive company a couple of years ago where they were collecting and analysing 2 PB per week of video and telemetry for self-driving car development.
The Isilon storage is NAS and modular, so you can add to clusters as the requirements grow. It is quite an interesting challenge because at 2 PB per week you have a constant data input stream of, on average, 3.6 GB/s that has to be stored; next to that, backups have to be made, and of course users must be able to access the system for data analysis runs. That's a lot of parallel data movement.
Networking is also a challenge: the initial system, sized for 13 PB, had over one hundred storage nodes, each with 40 Gb/s front-end networking ports to connect to the server farm. The system also has its own private network that supports striping data across nodes for availability and protection from failures.
I was the solution architect for the system. It was one of my last projects before I retired from EMC in 2018.
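A quick sketch of why that node count gives comfortable headroom (node count and port speed are from the post above; treating ingest as evenly spread is my assumption):

```python
# Aggregate vs. required bandwidth for the cluster described above.
NODES = 100                          # "over one hundred storage nodes"
PORT_GBPS = 40                       # 40 Gb/s front-end port per node

ingest_gbps = 2e15 * 8 / (7 * 24 * 3600) / 1e9   # 2 PB/week -> ~26 Gbps
aggregate_gbps = NODES * PORT_GBPS               # 4,000 Gbps front-end total

print(f"ingest:    {ingest_gbps:.0f} Gbps")
print(f"aggregate: {aggregate_gbps} Gbps")
# Raw ingest alone uses under 1% of the aggregate front-end bandwidth, which
# is what leaves room for the backup traffic and analysis reads described.
```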
|
Valuable inputs. Thanks a lot, AndyChisholm.
|
As it's Cheltenham week
Include me ? with Horsy racy maybe a rich way of running ? (11)
"We can't stop here - this is bat country" - Hunter S Thompson - RIP
|
I have no idea where to even start...