|
The frog rule: Time is fun when you're having flies.
|
They would jump at the first opportunity.
Monday starts Diarrhea awareness week, runs until Friday!
JaxCoder.com
|
I toad you not to post this but you shot me down...
If you can't laugh at yourself - ask me and I will do it for you.
|
Not trying to poke fun at friends here.
But it's amusing to see how a few heads get heated and scream "A petabyte per week? You cannot possibly work on such huge projects!" when they clearly do not know what the project is about.
If I am not sure about the project myself, how can you be so sure that the project, or the data it handles, cannot be of that size?
My boss was joking about this attitude with the data scientists he was interviewing for the team.
On their CVs they highlight, in bold, that their current analytics engine runs over "x PB" of data.
Between data scientists and data engineers, it's a scale of expertise. And among the data engineers themselves, each doubts whether the other is really working with data at that scale.
This reminds me of people in the 2000s who bragged about thread pools and out-of-proc distributed architectures.
In my previous post I clearly mentioned that the guy we interacted with might have quoted an outrageous figure just to test our "capacity to think" about the architecture.
And he did send us a couple of research papers that talked about solving the petabyte challenge.
We never knew the exact sources of this data, except for the high-resolution videos.
He said "virtually unlimited storage" on a geographically distributed deployment. Nobody knew what this was about. I posted a question asking: if the requirements are insanely high, what is the recommended approach?
Two possibilities I see:
1. You are not interested in answering the question.
2. You are interested (that's why you replied), but you are quite skeptical about the petabyte figure.
A gentlemanly reply, I should say[^]
Great manners are learned from the few good ones. Thanks, OriginalGriff.
|
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
pestilence [ pes-tl-uh ns ] noun
1. a deadly or virulent epidemic disease, especially bubonic plague.
2. something that is considered harmful, destructive, or evil.
Synonyms: pest, plague, people
|
Well, if it's an interview, the interviewer can always go technical and ask if all the data was.....
Caveat Emptor.
"Progress doesn't come from early risers – progress is made by lazy men looking for easier ways to do things." Lazarus Long
|
I've worked for a few big-data companies, so the numbers didn't particularly worry me -- except that it looked like you might have wanted that much storage on the servers themselves.
I wanna be a eunuchs developer! Pass me a bread knife!
|
About a week ago, a representative from the Norwegian Mapping Authority said in an interview that storing their data in some cloud service was not a viable solution: last year alone they had increased their stored data volume by two petabytes. He did not indicate the current total size.
Lots of people, even computer professionals, are rather careless with large units. Copying a megabyte file is almost instantaneous nowadays. A gigabyte file takes a little while, but not that much time. But each step up is a factor of a thousand: if copying a gigabyte takes 85 seconds, copying a terabyte takes a full day and night, 24 hours. A petabyte copied at the same speed would take almost three years, about a thousand days. A coworker of mine, a program developer for many years and an eager photographer and home-movie maker, bought an 8 TB disk for his home PC, to copy over all the video and photo material that had accumulated over the years in his multi-disk NAS. But it wouldn't copy... Only when he sat down to estimate how long it "ought to" take (in his case: about two days) did he realize that it was copying just fine.
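The copy-time arithmetic above can be sketched in a few lines, assuming the same effective rate quoted in the anecdote (1 GB in 85 seconds, roughly 12 MB/s):

```python
# Back-of-envelope copy times at 1 GB per 85 seconds (~12 MB/s),
# the rate used in the example above. Decimal (SI) units throughout.
GB = 10**9
RATE_BYTES_PER_S = GB / 85  # ~11.8 MB/s

def copy_time_seconds(size_bytes: float) -> float:
    """Time to copy `size_bytes` at the assumed rate."""
    return size_bytes / RATE_BYTES_PER_S

for label, size in [("1 GB", 10**9), ("1 TB", 10**12), ("1 PB", 10**15)]:
    t = copy_time_seconds(size)
    print(f"{label}: {t / 3600:.1f} hours ({t / 86400:.1f} days)")
# A terabyte comes out at ~23.6 hours; a petabyte at ~984 days, i.e. ~2.7 years.
```

Each unit prefix is a factor of 1000, so every line of output is a thousand times the previous one; that is the step people's intuition tends to miss.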
At work, we can have (almost) all the disk space we want, as long as we do not require backup of it. The cost of a raw disk is a drop in the ocean compared to the cost of backup. Ten years ago, we backed up our build servers, all desktop PCs, and a number of central servers. Today, the number of SW developers is an order of magnitude higher, and the disks are at least an order of magnitude bigger. So today, anything that can be restored from the VCS, rebuilt, or reinstalled from the central software repos is no longer backed up. What can be considered duplicates, or recoverable information, we cannot afford to back up; the volumes are too immense. But of course, that requires that we have very reliable backups of the primary copies, and working plans and procedures for recovering data when some secondary copy is lost.
|
Next time the NSA should ask their question somewhere else.
Social Media - A platform that makes it easier for the crazies to find each other.
Everyone is born right handed. Only the strongest overcome it.
Fight for left-handed rights and hand equality.
|
One of our clients is looking for an on-prem solution for their image-intensive application. This would ideally suit the cloud better, but they are adamant about the on-prem route. So let them pay.
What server config would you recommend for a highly intensive graphical workload plus insanely big storage, maybe 0.5 petabytes a week? This involves a high degree of media processing.
-- I see this 0.5 PB per week became the focus of the discussion. ROFL.
It could be a fact; I'm not sure in how many locations, in how many geographies, at what resolution, etc. they would like to capture data. It's up to them. He said the words "virtually unlimited" and gave an example of 0.5 petabytes a week. So maybe he just pushed the limits to see what hardware architecture we might come up with.
So the whole point of this message is: if the requirements are extreme and the client still wants to go on-prem, what would be a better suggestion? It could be about the architecture, or it could be an alternative, hybrid solution.
What kind of hardware architecture comes to your mind for this requirement?
Key parameters below; please add whatever you feel is key for the on-premises servers and clusters.
- CPU has to be extreme
- virtually unlimited disk space
- heavy-duty GPU
I still feel like pushing them into the cloud.
modified 11-Mar-20 10:16am.
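For a rough sense of what 0.5 PB/week means in hardware terms, here is a sketch. The 16 TB drive size and the 1.3x redundancy overhead are my own illustrative assumptions, not numbers from the client:

```python
# Rough on-prem storage sizing for a 0.5 PB/week ingest rate.
# Assumptions (hypothetical): 16 TB drives, 1.3x overhead for
# redundancy (RAID / erasure coding). Decimal (SI) units.
PB = 10**15
TB = 10**12

weekly_ingest = 0.5 * PB
drive_size = 16 * TB
redundancy_factor = 1.3

raw_needed = weekly_ingest * redundancy_factor   # raw disk bytes per week
drives_per_week = raw_needed / drive_size        # drives to rack every week
yearly_pb = weekly_ingest * 52 / PB              # logical data per year

print(f"~{drives_per_week:.0f} drives/week, {yearly_pb:.0f} PB/year logical")
# → roughly 41 drives racked every week, 26 PB of logical data per year.
```

Even under these generous assumptions, that is a continuous hardware procurement and installation pipeline, not a one-off server purchase, which is part of why the figure drew skepticism.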
|
Nand32 wrote: This involves a high degree of media processing.
And you prefer a cloud solution for that?
It may be better to just load up the workstations with memory and lots of cores, and only transfer on save. I can't think of any servers that are particularly optimised for graphical processing, anyway -- they're intended mainly to serve data; doing work on a server is somewhat frowned upon (by me, for one). The typical server graphics card is only really used for KVMs or basic monitors.
The level of storage you require is probably better kept separate from the servers (plural; you'll want two, load-balanced, for that 99.999% SLA feeling), but various HP ProLiant models can take up to eight discs -- and if you want the "cloud experience", you can install Nutanix.
I wanna be a eunuchs developer! Pass me a bread knife!
|
Mark_Wallace wrote: And you prefer a cloud solution for that?
Yep. Like this High Performance Computing | Microsoft Azure[^]
Mark_Wallace wrote: I can't think of any servers that are particularly optimised for graphical processing, anyway -- they're intended to mainly serve data;
Something like this.
NVIDIA GPU Optimized Servers - Thinkmate[^]
The hardware architecture should combine a cluster of nodes that specialize in different tasks. I'm no expert in this, actually; looking to hear from you guys. Thanks.
|
Nand32 wrote: Yep. Like this High Performance Computing | Microsoft Azure[^]
Hmm. With something like that, I'd still want a local, heavy-duty server to channel clients into it, so it looks to me like an additional expense.
Nand32 wrote: Something like this. NVIDIA GPU Optimized Servers - Thinkmate[^]
Live and learn. I'd want to do a lot of testing on how it handles load, how it assigns resources to connected clients, and how it handles clients that ask for too much, before considering going live with it. See if they'll lend you one that you fancy, for a PoC.
The first thing I'd want to know is if you've got 20 clients working on a server, and one of them does something that would freeze, lock-up, or crash a workstation, if the work were being done there, what happens to the other 49 clients?
That's easy (and fun!) to test.
I wanna be a eunuchs developer! Pass me a bread knife!
|
I think I would recommend that the boss go with server consultants like Thinkmate.
|
Mark_Wallace wrote: The first thing I'd want to know is if you've got 20 clients working on a server, and one of them does something that would freeze, lock-up, or crash a workstation, if the work were being done there, what happens to the other 49 clients?
Probably those 30 clients magically appearing out of thin air that crashed the system in the first place... damn those pesky magicians and their mysterious ways.
|
musefan wrote: Probably those 30 clients magically appearing out of thin air that crashed the system in the first place
I see that you've had solid experience in fault-finding.
(In this case, it's easy, because it's my fault)
I wanna be a eunuchs developer! Pass me a bread knife!
|
Mark_Wallace wrote: (In this case, it's easy, because it's my fault)
User error. My experience tells me to look at the user first, and at myself (the code) later (or, more often, not at all).
|
Are you sure those numbers are right?
Because 0.5 Petabyte / week needs insane transfer rates: call it 7 gigabit per second upload speed - and that's some serious bandwidth for cloud!
To be honest, those look like numbers somebody plucked out of the air without thinking too much about them ...
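For the record, the arithmetic behind that "call it 7 gigabit per second" figure, using decimal (SI) units and assuming a perfectly sustained transfer:

```python
# Sustained bandwidth needed to move 0.5 PB every week.
PB = 10**15
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800

bytes_per_s = 0.5 * PB / SECONDS_PER_WEEK  # ~827 MB/s
gbit_per_s = bytes_per_s * 8 / 10**9       # convert to gigabits per second

print(f"{gbit_per_s:.1f} Gbit/s sustained")  # → 6.6 Gbit/s
```

That is ~6.6 Gbit/s around the clock with zero protocol overhead, zero downtime, and zero headroom, so "call it 7" is, if anything, optimistic.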
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
And that's only for continuously transferring it... it needs to be processed, too.
M.D.V.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.
|
OriginalGriff wrote: Because 0.5 Petabyte / week needs insane transfer rates: call it 7 gigabit per second upload speed
It could be true. It could be a cumulative estimate of distributed data; it might not all go into a single server. Some of it is media (i.e. hi-res videos, image snapshots, and mathematical data).
Think of it as data flowing in from the customers' facilities through surveillance cameras deployed across the globe.
It's a deep-learning project for videos captured in real time. I guess the client is close with the estimate.
But I'm not sure if they did the real math to arrive at this number.
|
I'd want to work out the "real math" on that one, and then add a margin on top. You're talking about some serious networking regardless of cloud vs. in-house (and really serious for cloud access), and the infrastructure for that is going to be big money, even ignoring the actual storage and processing hardware. We aren't talking about doing this over a wireless link (or even 5G; though that's technically just about capable, in the real world you'll get nothing like that).
I'd treble-check that the numbers are real before I went any further: a tiny error has major consequences at this kind of scale.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|
I'm dialing this company. They usually put us through a questionnaire to grasp the requirements. I'm going to connect the customer with this company and get it straightened out.
Application-Ready Solutions - Thinkmate[^]
|
Right.
But if those numbers are correct, that's exactly why the cloud is a non-starter: they'd blow their budget on bandwidth before getting anything else done.
|
Yep. You'd be talking about a 10-gigabit EAD leased line, with fibre optic direct to your building. Depending on where you are, that'll be expensive. As in "cheaper to build a new building somewhere else" expensive, I suspect.
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
AntiTwitter: @DalekDave is now a follower!
|