I tried to avoid the Docker hype for a while, as I am not really enthusiastic about Docker for Windows, but as the pressure is mounting I had to give in. Sadly, no one seems to realize how much work will be needed on the Builder side, and also to get the images served properly.
Well, enough whining for now. I had a look at this overview: https://www.slant.co/topics/2436/~docker-image-private-registries
And to me Harbor looks like an interesting choice, but I would like to hear if anyone has had any experience with it in a Windows environment.
Looking forward to your reaction(s)
I guess I would have stopped at "in a Windows environment." My experience with Hyper-V was horrid. Docker running under a VM in Windows was OK. Neither seems like a useful solution for any kind of problem I can think of, unless the Windows/Docker relationship has moved beyond first base.
“Microservices is a silver bullet, magic pill, instant fix, and can't-go-wrong solution to all of software's problems. In fact, as soon you implement even the basics of microservices all of your dreams come true; you will triple productivity, reach your ideal weight, land your dream job, win the lottery 10 times, and be able to fly, clearly.”
We are in the process of firmly establishing the Docker Registry from docker.com. We are developing low level software, so we mostly do system builds and testing - no orchestration or swarming, no microservices. The reason for using Docker is to keep control over the build tools: We must be able to pick up an old project and rebuild a delivery from two or three years ago, bit-for-bit identical. Using dockerized tools is one element in reaching this goal.
We did initial trials on Windows, just to learn what it is, but our IT guys want central servers on Linux (they seem to prefer CentOS, but I think the developers pressured for Ubuntu on the registry server). We already have a small handful of production lines and half a dozen repositories (i.e. image names) using the registry. It seems to be fairly stable.
But: The free version has no access control whatsoever: Any developer can push any self-built garbage image to the server. Young software developers are as rebellious as a teenage son, always trying to ignore rules and sneak around blocks. So we must either switch to the paid version (our budget guys prefer not to), or investigate the open-source Portus solution - our IT guys are evaluating Portus right now.
When deleting, only pointers are deleted. The garbage collector is sort of lazy; you have to wake him up manually (or by an alarm clock). He is rather careless, too: First he makes a round to mark what to dispose of, then a second round to pick it up. If someone pushes another image between the rounds, saying "But I would like to use that layer!", he may pick it up and dispose of the layer in his second round anyway. Their own words are "Stop the world GC". We will set the cron-ometer to Monday morning at 04:00, and all who are supposed to upload images will know to sleep tight Monday mornings, rather than pushing images.
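A toy model of that two-round behaviour may make the risk easier to see. This is only a sketch in Python, not the registry's actual code: the function names, the digests and the data layout are all made up for illustration.

```python
# Toy model of a two-pass ("mark, then sweep") garbage collector.
# NOT the registry's implementation; all names and digests are hypothetical.

def mark(manifests, blobs):
    """Round one: collect every blob that no manifest references."""
    referenced = set()
    for layers in manifests.values():
        referenced.update(layers)
    return set(blobs) - referenced


def sweep(blobs, to_delete):
    """Round two: delete what round one marked, without checking again."""
    for digest in to_delete:
        blobs.pop(digest, None)


def broken_images(manifests, blobs):
    """Images that reference at least one blob that no longer exists."""
    return sorted(name for name, layers in manifests.items()
                  if any(layer not in blobs for layer in layers))


blobs = {"sha256:aaa": b"ubuntu base", "sha256:bbb": b"old tools"}
manifests = {"build-env:1": ["sha256:aaa"]}   # nothing uses sha256:bbb

doomed = mark(manifests, blobs)               # round one marks sha256:bbb

# A push lands between the rounds and reuses the marked layer.
manifests["build-env:2"] = ["sha256:aaa", "sha256:bbb"]

sweep(blobs, doomed)                          # round two deletes it anyway
print(broken_images(manifests, blobs))
```

In this model the image pushed between the rounds ends up pointing at a layer that is gone, which is why keeping everyone from pushing during the collection window matters.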
The free version provides neither a web interface to the registry nor a stand-alone UI - not even a command-line version (but you would definitely want a GUI of sorts to get an overview of the registry). For now we are using curl for REST calls ... which is slightly above drawing the bit pattern to send out on the line, but not by much. You can find a number of free front ends on GitHub, but I haven't seen any ready-to-use binaries, and most certainly not for Windows. While I sure can retrieve the source code, set up Linux in a virtual machine, pick up all the build tools required and run the build, doing that for twelve alternatives is a little cumbersome. I haven't done that yet.
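As a small step up from typing curl commands, the two listing calls of the Registry HTTP API V2 (`/v2/_catalog` and `/v2/<name>/tags/list`) can be wrapped in a few lines of Python. This is a minimal sketch under assumptions: the host name is hypothetical, there is no authentication, and pagination of large catalogs is ignored.

```python
import json
import urllib.request

# Hypothetical address of a private registry; replace with your own.
REGISTRY = "http://registry.example.local:5000"


def catalog_url(registry):
    """URL that lists the repositories (image names) in the registry."""
    return f"{registry}/v2/_catalog"


def tags_url(registry, repository):
    """URL that lists the tags of one repository."""
    return f"{registry}/v2/{repository}/tags/list"


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def list_images(registry, fetch=fetch_json):
    """Return {repository: [tags]} for everything the registry reports."""
    repositories = fetch(catalog_url(registry)).get("repositories", [])
    result = {}
    for repository in repositories:
        # A repository without tags may report null instead of a list.
        tags = fetch(tags_url(registry, repository)).get("tags") or []
        result[repository] = sorted(tags)
    return result
```

Calling `list_images(REGISTRY)` against a reachable registry should give a dictionary that is easy to print or feed into a simple GUI; the `fetch` parameter is only there so the function can be tried without a server.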
One point that is independent of which registry solution you choose:
In the experimentation phase, images were built without any discipline and order, so except for the Ubuntu base layer, almost every image had its own set of layers. And they were huge - each version tended to be 4-5 gigabytes.
There are two reasons for this: We decided against "One image, one tool" and an "external" build script calling tools in turn. Rather, we put all the tools for a build step into a single container; this makes it much easier to keep track of consistent toolboxes where we know that the various tools' versions go together. The build step is controlled by a bash script running inside the container (it is located in the checked-out source tree, that is mounted in the container at startup). So, images tend to be large (but there are not as many of them).
Second: The experimenting developers seemed to be scared of layers, trying to reduce the number by loading as many tools, as many Python packages and whathaveyou, as possible, in one single build step for the image. So every layer was different, with nothing shared (except for the Ubuntu base), and disk space requirements were huge, when the tiniest little version update required a complete 5 GB image build from the bottom.
So: We are now establishing a tree structure of base layers: With Ubuntu 18.04 LTS at the bottom, we create an image with a stable set of basic build management tools, common to all build tasks and not expected to change, and we use this "ubuntutools" (rather than the raw Ubuntu) layer as a base to build on. Then we add a fairly stable gcc, and a set of C/C++ related tools to make a "gcc base layer" for the more specialized images to be based on. On the ubuntutools base we also build a Python branch with a fairly large set of pre-installed Python packages (we currently use around 150 of them) and a set of Python tools. Our developers frequently request new packages; then we lay a thin "veneer" layer on top of the common Python layer, adding to the large set already in the base.
The art is in determining which tools are super-stable, and can be put in the lower layers (like CMake and Ninja - they do come in new versions, but we rarely require the update), medium-stable tools (like gcc - we do not switch to a new release until the old one doesn't work for us), and volatile elements (like Python packages under development) that must be placed in the leaf nodes. When we have to update a low or intermediate level layer, the tree must grow a new branch, but we require a documented need for that update before we accept it - a developer's wish to always run "the latest and greatest version" is not sufficient. (In many cases, when the update requirement is for a single component, we can also provide a veneer layer that replaces the version in the lower layer.)
This structure has a number of benefits:
Using an already-built, complex image as a base reduces (leaf) image build time drastically.
Dockerfiles for the top layers are very simple.
A lot of disk space is saved, both in the registry and in the Docker engine.
The layer cache in the engine is used far more efficiently.
Network traffic to retrieve layers/images from the registry is significantly reduced.
When several containers run simultaneously, they will to a much larger degree share code segments in RAM, even if they run different images, when they are built on the same "high level" base image.
Startup times may be somewhat reduced: The probability of a (medium layer) image already being present in RAM increases.
The only significant disadvantage is that to see all the tool versions in your image, you have to nest backwards through multiple levels of base images. We are documenting the entire tree on our intranet, and you can click yourself backwards layer by layer, to get far more information than you could find in a huge "single level" Dockerfile, and certainly in a much more readable format!
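The "click backwards layer by layer" lookup can also be automated if the tree is kept as data. The sketch below assumes a hypothetical description of the tree, where each image names its base and the tools its own layer adds; the image names other than "ubuntutools" and all version numbers are invented for the example.

```python
# Hypothetical description of the image tree. Each entry names its base
# image and the tools added by its own layer. Versions are made up.
IMAGES = {
    "ubuntu:18.04":  {"base": None, "adds": {}},
    "ubuntutools":   {"base": "ubuntu:18.04",
                      "adds": {"cmake": "3.10", "ninja": "1.8"}},
    "gcc-base":      {"base": "ubuntutools", "adds": {"gcc": "7.4"}},
    "python-base":   {"base": "ubuntutools", "adds": {"python": "3.6"}},
    # A thin veneer layer: one new package, one replaced tool version.
    "python-veneer": {"base": "python-base",
                      "adds": {"newpkg": "0.1", "cmake": "3.14"}},
}


def base_chain(images, name):
    """Walk from an image down to the root, returning the names visited."""
    chain = []
    while name is not None:
        chain.append(name)
        name = images[name]["base"]
    return chain


def effective_tools(images, name):
    """Merge tool versions from the root upwards; upper layers win."""
    tools = {}
    for image in reversed(base_chain(images, name)):
        tools.update(images[image]["adds"])
    return tools


print(base_chain(IMAGES, "python-veneer"))
print(effective_tools(IMAGES, "python-veneer"))
```

Merging from the root upwards mirrors the veneer idea: a version set in a higher layer replaces the one from a lower layer, and a single call shows the full tool list for any leaf image.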
But most of all: Enforcing this tree structure helps us keep those unruly developers under control so they don't go wild with plethoras of incompatible tool versions (which kills the idea of reproducible builds!).
Thanks, useful information!
Seems my suspicion of the (free) Docker Registry is confirmed. On Slant someone commented:
Biggest CON there is that it cannot control deleting of images properly
Bottom line is this makes docker registry suck when your harddisk fills up at the wrong time and you cannot push out your builds!
Of course this has nothing to do with the "Enterprise Grade Private Docker Registry", which seems fine and reasonably priced too.
As far as I can see, even the free version CAN delete images properly, but you have to do a garbage collection (analogous to emptying your Recycle Bin in Windows) to actually free up the space.
Another detail: When you use the REST API "by hand", deletion requires a SHA that I haven't yet discovered how to read from the registry itself. (Maybe I am expected to locally calculate the SHA of the image manifest - I believe that is what it really is!) So I have to pull the image to the Docker engine, which can provide the SHA I need through "docker images --digests". I guess I will find a way where I don't have to pull a huge image across the network only because I want to delete it!
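One possible way around the pull, going by the Registry HTTP API V2 documentation: a HEAD (or GET) request for the manifest returns the digest in the `Docker-Content-Digest` response header, and the digest is the SHA-256 of the manifest bytes. The sketch below is untested against any particular registry version; the function names are made up, and it assumes a registry without authentication that has deletion enabled in its configuration.

```python
import hashlib
import urllib.request

# Media type to ask for, so the registry reports the digest of the
# schema 2 manifest rather than of an older manifest format.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def manifest_digest(manifest_bytes):
    """Digest of a manifest as the registry writes it: 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


def digest_request(registry, repository, tag):
    """HEAD request whose response should carry Docker-Content-Digest."""
    return urllib.request.Request(
        f"{registry}/v2/{repository}/manifests/{tag}",
        method="HEAD",
        headers={"Accept": MANIFEST_V2},
    )


def delete_request(registry, repository, digest):
    """DELETE request for a manifest, addressed by digest (not by tag)."""
    return urllib.request.Request(
        f"{registry}/v2/{repository}/manifests/{digest}",
        method="DELETE",
    )


def read_digest(request):
    """Send the HEAD request and return the digest header, if present."""
    with urllib.request.urlopen(request) as response:
        return response.headers["Docker-Content-Digest"]
```

A deletion would then be `read_digest(digest_request(...))` followed by sending `delete_request(...)` with the returned digest; the space itself is still only freed by the next garbage collection.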
Thanks - I wasn't aware of this library, and I will sure make use of it to build a simple Windows GUI. It is very useful to me.
However... It is the answer to another question. This interfaces to the Docker Engine - the "virtual machine" (I do not use this term to start another war, just to explain where we are at!) on which the containers are executed. The Engine comes with a command line interface, both in Windows and Linux distributions. Linux die-hards never want anything else, and can't understand that Windows people find it cumbersome. This library is great for building a more visual interface for those who want something else.
When you use the CLI (or a GUI based on this interface) to start a container from an image not currently available, the Docker engine will pull it from a remote database, a "registry". Unless you have set up your own, you (by default) use the "Docker Hub" registry, at docker.io. Your only involvement in the image retrieval is naming the image you want to run; you do not see the registry as such. You may also explicitly pull an image from a registry, or push an image you have built, yet you need to know just the full image name.
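How a full image name maps to a registry can be sketched in a few lines. This is a simplified approximation of the naming convention, not Docker's own parser: it ignores digest references ("name@sha256:..."), the function name is made up, and the host in the usage lines is hypothetical.

```python
def split_image_name(name):
    """Split 'host[:port]/repo[:tag]' into (registry, repository, tag).

    Simplified: digest references ('name@sha256:...') are not handled.
    """
    first, slash, rest = name.partition("/")
    # The first component counts as a registry host only if it looks
    # like one: it contains a dot or a port, or it is 'localhost'.
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", name

    # A tag is whatever follows a colon in the last path component.
    last_slash = remainder.rfind("/")
    colon = remainder.rfind(":")
    if colon > last_slash:
        repository, tag = remainder[:colon], remainder[colon + 1:]
    else:
        repository, tag = remainder, "latest"

    # Official Docker Hub images live under the 'library/' namespace.
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository
    return registry, repository, tag


print(split_image_name("ubuntu:18.04"))
print(split_image_name("registry.example.local:5000/tools/gcc-base:7"))
```

The point of the sketch is only that the registry is chosen by the name itself: a name without a host part goes to Docker Hub, while a name that starts with a host (and optional port) goes to that server.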
If you want to inspect the registry itself - the Docker Hub, one you have set up yourself or some other registry you have access to - you do not talk to the Docker Engine, but to the remote registry, at another location, and speaking a quite different language. Well, both are REST based, but I was really surprised to see how different two REST APIs can be, in particular considering that they both come out of the Docker community!
The biggest difference is in the description of the protocol; REST does limit your freedom somewhat. But even at the REST level, the choices made are so divergent that making a common UI for the Engine and the registry is like making two separate programs under a single surface. I can't easily see how I can make higher abstractions (/classes) suitable for both protocols. Of course it is "possible", but far from "elegant"; a lot of it will be of no interest to "the other" part.
Maybe I will try to make a "merged" GUI anyway. But we have a number of Linux die-hards here that will become rather grumpy when they are offered a nice GUI only if they jump out of their nice cosy Linux whirlpool and dirty their hands on using a Windows tool. I guess they will stick to their CLI interface to the Engine, and if they need to inspect the registry, they will use curl and type the REST URLs by hand.
If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.