Red Hat OpenShift bare metal installation protocol

OpenShift offers several installation options for various cloud providers; another option is installation on bare metal hardware. This option appears to be the basis for all the others: once you are familiar with it, you will manage any other installation.

The current version of OpenShift only supports online installation. An offline installation option is promised for a later release.

Preparing Infrastructure Server

The bare metal installation requires a pre-installed infrastructure server that performs DNS and load balancing tasks. The installation reference gives hints about the configuration of this infra server, but without much detail. On reflection, it also makes sense to make this server the default gateway for the internal network of OpenShift servers. The following configuration plan is the result:

  1. A separate VLAN for OpenShift servers has been created. The infra-server has a second network interface connected to it.
  2. Because this server is the default gateway for the servers behind it, it performs NAT for their outgoing requests.
  3. An authoritative DNS server for <cluster>.<base.domain>, mostly for internal purposes, that supports dynamic DNS updates.
  4. A DHCP server for the OpenShift servers that updates the DNS server with its leases.
  5. An NTP server for the OpenShift servers; it requires an external NTP reference.
  6. An HTTP server serving PXE installations, running on a non-default port.
  7. A TFTP server serving PXE installations.
  8. A load balancer (haproxy) with required configuration.

To meet all these requirements, I developed several Ansible scripts that build the necessary infrastructure server. They should work for any target of the Red Hat family, but they have been tested only on CentOS 7 and come without any support or guarantee. The scripts assume that the firewall and SELinux are disabled.

Edit roles/openshift.infra-server/defaults/main.yaml values to fit your needs, then run:

$ ansible-playbook -i "192.168.0.222," role-openshift.infra-server.yml
where the IP address is that of your future infra server. Of course, your server should meet the usual Ansible prerequisites (Python, SSH, user permissions).

Alternatively, you can use the hints from the installation guide and complete the full infra server setup manually.
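
If you go the manual way, it helps to know which DNS names the cluster will look for. The following is only a sketch: the function name and the "test.apps" probe host are my own invention, and the list reflects the records that the 4.3 installation guide asks for as far as I know (api, api-int, the *.apps wildcard, one etcd-N per master, plus SRV records). Keep in mind that with dynamic DNS the etcd-N names appear only after the servers get their leases.

```shell
#!/bin/sh
# Sketch: print the host names an OpenShift 4.3 bare metal cluster relies on.
# Arguments: cluster name, base domain, number of masters (default 3).
# "test.apps" is an arbitrary probe name used to exercise the *.apps wildcard.
required_dns_names() {
    cluster=$1 base=$2 masters=${3:-3}
    echo "api.$cluster.$base"
    echo "api-int.$cluster.$base"
    echo "test.apps.$cluster.$base"
    i=0
    while [ "$i" -lt "$masters" ]; do
        echo "etcd-$i.$cluster.$base"
        i=$((i + 1))
    done
}

# Usage on the infra server, once DNS is configured:
#   for n in $(required_dns_names osh example.com); do
#       printf '%-32s %s\n' "$n" "$(dig +short "$n" | head -1)"
#   done
#   dig +short SRV _etcd-server-ssl._tcp.osh.example.com
```

An empty right-hand column in the loop output points at a record that is still missing.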

The server is almost ready. We still need the RHCOS installation images, located where the PXE configuration expects them:

# mkdir /var/www/html/RHCOS
# wget -O /var/www/html/RHCOS/kernel \
  https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/latest/rhcos-4.3.8-x86_64-installer-kernel-x86_64
# wget -O /var/www/html/RHCOS/initramfs.img \
  https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/latest/rhcos-4.3.8-x86_64-installer-initramfs.x86_64.img
# wget -O /var/www/html/RHCOS/metal.raw.gz \
  https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/latest/rhcos-4.3.8-x86_64-metal.x86_64.raw.gz
# ls -la /var/www/html/RHCOS/
total 862088
drwxr-xr-x 2 root root        61 Apr  5 04:10 .
drwxr-xr-x 3 root root        19 Apr  5 04:05 ..
-rw-r--r-- 1 root root  71105367 Mar 31 13:37 initramfs.img
-rw-r--r-- 1 root root   8106848 Mar 31 13:37 kernel
-rw-r--r-- 1 root root 803561085 Mar 31 13:37 metal.raw.gz
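
A truncated download is easy to miss in the listing above. Here is a minimal sketch of a sanity check, assuming the three file names used in this article; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: verify that the three RHCOS images exist and are not empty.
# The directory defaults to the one used in this article.
check_rhcos_files() {
    dir=${1:-/var/www/html/RHCOS}
    rc=0
    for f in kernel initramfs.img metal.raw.gz; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   check_rhcos_files && gzip -t /var/www/html/RHCOS/metal.raw.gz
```

The extra gzip -t in the usage line tests the integrity of the compressed disk image, which is the file most likely to be cut short.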

Installation server

Any Linux workstation is suitable for generating the configuration files and using the OpenShift CLI. Since we already have such a server (the infra server), we will use it as the installation server. Create an unprivileged user on the infra server and log in as that user.

Use this link to create a "pull secret" and save it somewhere. Download the "latest" installer, unpack it, and move it to a directory in your PATH:

$ mkdir ~/bin
$ ( cd ~/bin && \
  curl -s https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-install-linux.tar.gz | \
  tar zxvf - )

Do the same for the OpenShift CLI. We will need it much later:

$ ( cd ~/bin && \
  curl -s https://mirror.openshift.com/pub/openshift-v4/clients/ocp/stable/openshift-client-linux.tar.gz | \
  tar zxvf - )
$ openshift-install version
openshift-install 4.3.8
built from commit f7a2f7cf9ec3201bb8c9ebb677c05d21c72e3cc5
release image quay.io/openshift-release-dev/ocp-release@sha256:a414f6308db72f88e9d2e95018f0cc4db71c6b12b2ec0f44587488f0a16efc42

Create an install-config.yaml file with sample content taken from the installation guide and modify it as you like. This typically includes adding a proxy definition and fixing the cluster name and base domain name. Please note that this information must match the DNS settings on the infra server.
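
For orientation, here is a sketch that writes a starting install-config.yaml. The field values follow the bare metal sample from the installation guide as I remember it (platform "none", zero worker replicas because workers are booted by hand, three masters); the function name and the two PASTE placeholders are mine, so compare the result with the guide for your version before using it.

```shell
#!/bin/sh
# Sketch: write a starting install-config.yaml for a bare metal cluster.
# Arguments: target directory, cluster name, base domain.
# The pullSecret and sshKey values are placeholders and must be replaced.
write_install_config() {
    dir=$1 cluster=$2 base=$3
    cat > "$dir/install-config.yaml" <<EOF
apiVersion: v1
baseDomain: $base
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
metadata:
  name: $cluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
pullSecret: 'PASTE-YOUR-PULL-SECRET-JSON-HERE'
sshKey: 'PASTE-YOUR-PUBLIC-SSH-KEY-HERE'
EOF
}

# Usage:
#   write_install_config ~ osh example.com
```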

Do not forget to update the "pull secret" data and your public SSH key. It is advisable to create a separate key for managing OpenShift, because most likely you will have to share the private key with other administrators. Keep a copy of the file, as the original will be deleted during processing.

"Pull secret" is what allows you to retrieve installation and update data from Red Hat. Make sure your subscription is active, at least in trial mode.

$ rm -rf ~/work && mkdir ~/work
$ cp ~/install-config.yaml ~/work/
$ openshift-install create manifests --dir ~/work
INFO Consuming Install Config from target directory
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings

Several Kubernetes manifests will appear in the working directory. According to the installation guide, mastersSchedulable must be set to false in the work/manifests/cluster-scheduler-02-config.yml file for a bare metal installation. Correct this and then convert the manifests to ignition files:

$ vi ~/work/manifests/cluster-scheduler-02-config.yml
$ openshift-install create ignition-configs --dir ~/work
INFO Consuming Master Machines from target directory
INFO Consuming Worker Machines from target directory
INFO Consuming Openshift Manifests from target directory
INFO Consuming Common Manifests from target directory
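
The manual edit with vi can also be done non-interactively, which is handy when the procedure has to be repeated. This is a sketch with a hypothetical function name; it assumes GNU sed and must run after "create manifests" and before "create ignition-configs".

```shell
#!/bin/sh
# Sketch: switch mastersSchedulable from true to false in the scheduler manifest.
# Argument: the installer working directory (e.g. ~/work).
# Returns non-zero if the manifest does not end up containing "false".
disable_master_scheduling() {
    manifest=$1/manifests/cluster-scheduler-02-config.yml
    sed -i 's/mastersSchedulable: true/mastersSchedulable: false/' "$manifest"
    grep -q 'mastersSchedulable: false' "$manifest"
}

# Usage:
#   disable_master_scheduling ~/work && \
#       openshift-install create ignition-configs --dir ~/work
```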

The previous contents of the working directory will disappear, and several *.ign files will appear instead. An ignition file plays the same role for RH CoreOS as a kickstart file does for RHEL. Copy them to your PXE server to serve them over HTTP. Make them readable by the apache service, as their default permissions are too restrictive:

$ chmod 644 ~/work/*ign
$ sudo cp -av ~/work/*ign /var/www/html/RHCOS/

NOTE: The generated ignition files include SSL certificates that the nodes of the new cluster use to authenticate each other. For security reasons, these certificates expire 24 hours after they are generated. You must complete the installation within this period; otherwise you have to repeat the entire procedure, starting with deleting the working directory and creating new ignition files.
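
To keep an eye on that 24 hour window, a rough estimate can be derived from the age of an ignition file. This is only a sketch under an assumption: it treats the file modification time as the moment the certificates were created, which holds if the file was not touched afterwards. It relies on GNU stat, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print roughly how many whole hours remain of the 24 hour window,
# counting from the modification time of the given ignition file.
# A negative number means the window has already passed.
ignition_hours_left() {
    created=$(stat -c %Y "$1") || return 1
    now=$(date +%s)
    echo $(( (created + 86400 - now) / 3600 ))
}

# Usage:
#   ignition_hours_left ~/work/bootstrap.ign
```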

Deploying cluster

Although the installation guide recommends that you initially deploy the bootstrap server and then all the others, my experience has shown that the order does not matter. It works even better when the master and worker servers are already deployed and waiting for information from the bootstrap server.

Update the /etc/dhcpd.conf.static file on the infra-server with the host name and MAC address of every server to be installed, for example:

..
host bootstrap {
	hardware ethernet <MAC-address-of-your-bootstrap-server-in-lower-case>;
	fixed-address XXX.XXX.XXX.XXX;
	option host-name "bootstrap.<cluster-name>.<base.domain>";
	ddns-hostname "bootstrap";
}

host etcd-0 {
	hardware
..

It is important to use fixed IP addresses for all components in the cluster, as some certificates are signed for a specific IP address and may stop working if the address changes after a lease renewal.
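
With more than a handful of servers, typing these host blocks by hand invites typos. The sketch below generates one block per call in the same layout as the example above; the function name and the sample MAC and IP addresses are made up for illustration.

```shell
#!/bin/sh
# Sketch: print a dhcpd "host" block for one server.
# Arguments: short host name, MAC address, fixed IP, <cluster-name>.<base.domain>.
# The MAC address is converted to lower case.
dhcp_host_entry() {
    name=$1 ip=$3 domain=$4
    mac=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    cat <<EOF
host $name {
    hardware ethernet $mac;
    fixed-address $ip;
    option host-name "$name.$domain";
    ddns-hostname "$name";
}
EOF
}

# Usage:
#   dhcp_host_entry etcd-0 52:54:00:AB:CD:EF 192.0.2.10 osh.example.com \
#       | sudo tee -a /etc/dhcpd.conf.static
```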

Once updated, restart the DHCP server to accept the changes:

# service dhcpd restart
Redirecting to /bin/systemctl restart dhcpd.service

Boot all designated worker servers using the network boot method, select "Worker" from the PXE menu, and start their installation.

Boot all designated master servers using the network boot method, select "Master" from the PXE menu, and start their installation.

Boot the bootstrap server using the network boot method, select "Bootstrap" from the PXE menu, and install it. This server will be released at the end of the installation, so it can be a temporary virtual server or a server designated to become a worker later.

Follow the TFTP and apache logs on the infra-server to make sure that the PXE installation was successful:

# tail -f /var/log/messages /var/log/httpd/access_log

Our haproxy service depends on DNS information about the existing servers. Our DNS is dynamically updated by DHCP, which means that until the "etcd-X" servers are deployed, the information about them is missing, and the haproxy service will refuse to start without it. Wait while the servers being installed reboot a couple of times, until the correct host name appears on their consoles. This indicates that the dynamic DNS should already be updated, if it is configured correctly. Check that the /etc/haproxy/haproxy.cfg file has entries for all the servers involved, and restart the haproxy service.

# vi /etc/haproxy/haproxy.cfg
# service haproxy restart
Redirecting to /bin/systemctl restart haproxy.service
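
Instead of watching the consoles, the wait for dynamic DNS can be scripted. This is a sketch that assumes the resolver of the infra server points at its own DNS service; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: succeed only if every given host name resolves through the
# system resolver; report the first name that does not.
all_resolve() {
    for host in "$@"; do
        getent hosts "$host" >/dev/null || {
            echo "not in DNS yet: $host" >&2
            return 1
        }
    done
}

# Usage:
#   until all_resolve etcd-0.osh.example.com etcd-1.osh.example.com \
#                     etcd-2.osh.example.com; do sleep 10; done
#   haproxy -c -f /etc/haproxy/haproxy.cfg && service haproxy restart
```

The haproxy -c call in the usage lines only validates the configuration file, so a typo is caught before the restart.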

Now you can log in to the bootstrap server and monitor its actions. The server is accessible only from our infra-server, as the user "core" with the pre-configured SSH key. This user has sudo privileges with the NOPASSWD option.

# ssh core@bootstrap
 ..
[core@bootstrap ~]$ journalctl -b -f -u bootkube.service
 ..
			etcdctl failed. Retrying in 5 seconds...

The last message means that the bootstrap server has finished starting its services and is waiting for the master servers to continue. To make sure that the system is ready to deploy the main servers, you can connect to port 6443 of the infra-server and observe an OpenShift certificate.

# echo -n | openssl s_client -connect infra-server:6443 2>/dev/null | openssl x509 -noout -text

Another good debugging resource is our haproxy statistics page. Point your browser to http://infra-server:8404/stats to see haproxy at work. The masters-config backend should be UP, so that workers and masters can download their configuration and continue deploying. The masters-api backend should be UP, so that the future etcd cluster can form. You may see servers going up and down, being attached to backends and removed again. Do not worry, just wait.

The rest of the process is fully automated and not documented at all. This step takes at least 20 minutes, while the servers update CoreOS and deploy the required software. If you are still watching the process on the bootstrap server, wait until the following appears:
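
The same statistics are available without a browser: haproxy serves them as CSV when ";csv" is appended to the stats URI. Below is a sketch of a filter that prints proxy, server and status for every server line. The function name is hypothetical, and I assume the stats page of this setup allows the CSV export.

```shell
#!/bin/sh
# Sketch: read haproxy stats in CSV format on stdin and print
# "proxy server status" for each server line.
# The status column is located by name in the CSV header line.
haproxy_status() {
    awk -F, '
        NR == 1 {
            sub(/^# /, "", $1)
            for (i = 1; i <= NF; i++) if ($i == "status") col = i
            next
        }
        col && $2 != "FRONTEND" && $2 != "BACKEND" { print $1, $2, $col }
    '
}

# Usage:
#   curl -s 'http://infra-server:8404/stats;csv' | haproxy_status
```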

[core@bootstrap ~]$ journalctl -b -f -u bootkube.service
 ..
			bootkube.service complete

Another option is to verify this from the installation server:

$ openshift-install --dir work wait-for bootstrap-complete --log-level=debug
DEBUG OpenShift Installer v4.3.0
DEBUG Built from commit 2055609f95b19322ee6cfdd0bea73399297c4a3e
INFO Waiting up to 30m0s for the Kubernetes API at https://api.osh.example.com:6443...
INFO API v1.16.2 up
INFO Waiting up to 30m0s for bootstrapping to complete...
DEBUG Bootstrap status: complete
INFO It is now safe to remove the bootstrap resources

Both methods report that the bootstrap process has completed and the bootstrap server can be removed. Shut down the bootstrap server.

Adding more worker servers to cluster

There are several basic cluster operators that run on worker servers, so you need to deploy at least one worker server from the beginning.

If your bootstrap server was designated to become a worker server later, update /etc/dhcpd.conf.static accordingly:

 ..
host c2 {
        hardware ethernet f2:c6:d1:c4:65:99;
	fixed-address XXX.XXX.XXX.XXX ;
        option host-name "c2.<cluster-name>.<base.domain>";
        ddns-hostname "c2";
 ..

Then restart the DHCP service:

# service dhcpd restart
Redirecting to /bin/systemctl restart dhcpd.service

Boot the worker server from the network and select "Worker" in the PXE menu. Wait until it reboots a couple of times. Again, update the ingress HTTP and HTTPS load balancer sections of the haproxy configuration, adding the new worker servers. Restart haproxy and check that it is running.

The next step, according to the installation guide, is connecting to the cluster. Export the cluster kubeconfig file, which holds the administrator credentials:

$ export KUBECONFIG=~/work/auth/kubeconfig
$ oc whoami
system:admin

The last message confirms that you can connect to the cluster with administrator privileges.

List the existing nodes:

$ oc get nodes
NAME                     STATUS   ROLES    AGE   VERSION
c0.osh.example.com       Ready    worker   28m   v1.16.2
c1.osh.example.com       Ready    worker   28m   v1.16.2
etcd-0.osh.example.com   Ready    master   28m   v1.16.2
etcd-1.osh.example.com   Ready    master   28m   v1.16.2
etcd-2.osh.example.com   Ready    master   28m   v1.16.2

The newly added c2 worker server is not listed. The next chapter of the installation guide explains why: recently added servers are awaiting administrator approval. Well, some security has to be provided. List the pending certificate signing requests:

$ oc get csr
NAME        AGE     REQUESTOR                                                                   CONDITION
 ..
csr-pf8bb   6m12s   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   Pending

If you do not see pending requests, the installation of the new worker server is most likely still in progress.

Then approve it:

$ oc adm certificate approve csr-pf8bb
certificatesigningrequest.certificates.k8s.io/csr-pf8bb approved

The result is a new request, this time from the server itself:

$ oc get csr
NAME        AGE     REQUESTOR                                                                   CONDITION
 ..
csr-kcnzl   43s    system:node:c2.osh.example.com                                               Pending

Approve it too. The new worker server is now added:

$ oc get nodes
NAME                     STATUS   ROLES    AGE     VERSION
c0.osh.example.com       Ready    worker   53m     v1.16.2
c1.osh.example.com       Ready    worker   53m     v1.16.2
c2.osh.example.com       Ready    worker   2m11s   v1.16.2
etcd-0.osh.example.com   Ready    master   53m     v1.16.2
etcd-1.osh.example.com   Ready    master   53m     v1.16.2
etcd-2.osh.example.com   Ready    master   53m     v1.16.2
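
When several workers are added at once, approving requests one by one gets tedious. The sketch below picks the names of all pending requests out of the "oc get csr" listing; the function name is hypothetical. Approve in bulk only when you recognize every server that is waiting, since this approval is the security check mentioned above.

```shell
#!/bin/sh
# Sketch: read "oc get csr" output on stdin and print the names of the
# requests whose last column (CONDITION) is "Pending".
pending_csrs() {
    awk '$NF == "Pending" { print $1 }'
}

# Usage (run it twice: each node files a second request after the first
# one is approved):
#   oc get csr | pending_csrs | xargs -r oc adm certificate approve
```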

Finishing and troubleshooting

Good, what next? Check that all cluster operators start:

$ oc get clusteroperators
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                                       Unknown     Unknown       True       23h
cloud-credential                           4.3.8     True        False         False      24h
cluster-autoscaler                         4.3.8     True        False         False      23h
console                                    4.3.8     False       True          False      23h
dns                                        4.3.8     True        False         False      23h
image-registry                             4.3.8     True        False         False      23h
ingress                                    4.3.8     True        False         False      20h
insights                                   4.3.8     True        False         False      24h
kube-apiserver                             4.3.8     True        False         False      23h
kube-controller-manager                    4.3.8     True        False         False      23h
kube-scheduler                             4.3.8     True        False         False      23h
machine-api                                4.3.8     True        False         False      23h
machine-config                             4.3.8     True        False         False      113m
marketplace                                4.3.8     True        False         False      20h
monitoring                                 4.3.8     True        False         False      112m
network                                    4.3.8     True        False         False      24h
node-tuning                                4.3.8     True        False         False      113m
openshift-apiserver                        4.3.8     True        False         False      6h59m
openshift-controller-manager               4.3.8     True        False         False      20h
openshift-samples                          4.3.8     True        False         False      20h
operator-lifecycle-manager                 4.3.8     True        False         False      23h
operator-lifecycle-manager-catalog         4.3.8     True        False         False      23h
operator-lifecycle-manager-packageserver   4.3.8     True        False         False      7h7m
service-ca                                 4.3.8     True        False         False      24h
service-catalog-apiserver                  4.3.8     True        False         False      23h
service-catalog-controller-manager         4.3.8     True        False         False      23h
storage                                    4.3.8     True        False         False      23h

Wait for all operators to become AVAILABLE. If this does not happen for a while, examine the failing operator specifically:

$ oc get pods --all-namespaces | grep console
openshift-console-operator         console-operator-644498f9db-hb99k        1/1     Running            5          27h
openshift-console                  console-669ffdcc9f-5fbzj                 0/1     CrashLoopBackOff   297        27h
openshift-console                  console-7cddd989d8-rjhwq                 0/1     Running            301        27h
openshift-console                  console-7cddd989d8-sp2bp                 0/1     Running            300        27h
openshift-console                  downloads-6f4898c5c9-9scbg               1/1     Running            1          27h
openshift-console                  downloads-6f4898c5c9-g2q7l               1/1     Running            3          27h
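
In a listing as long as the "oc get clusteroperators" one, the unhealthy entries are easy to overlook. Here is a sketch of a filter that prints only the operators whose AVAILABLE column is not True. The function name is hypothetical; the column is counted from the right because the VERSION column can be empty, as it is for the authentication operator.

```shell
#!/bin/sh
# Sketch: read "oc get clusteroperators" output on stdin and print the
# names of operators whose AVAILABLE column is anything but "True".
# AVAILABLE is the fourth column from the right.
unavailable_operators() {
    awk 'NR > 1 && NF >= 5 && $(NF-3) != "True" { print $1 }'
}

# Usage:
#   oc get clusteroperators | unavailable_operators
```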

Then check the logs of one of the failing console pods:

$ oc logs console-7cddd989d8-rjhwq -n openshift-console
 ..
2020/04/6 14:43:24 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.osh.example.com/oauth/token failed: Head https://oauth-openshift.apps.osh.example.com: Service Unavailable

It seems the console operator depends on the authentication operator, which is also not available. Let's check it:

$ oc get pods --all-namespaces | grep auth
openshift-authentication-operator     authentication-operator-5954c6c9d-cq72z        1/1     Running            6          27h
openshift-authentication              oauth-openshift-569dfc5dd7-5n9pj               1/1     Running            0          11h
openshift-authentication              oauth-openshift-569dfc5dd7-hsxx8               1/1     Running            1          11h
$ oc logs oauth-openshift-569dfc5dd7-5n9pj -n openshift-authentication
Copying system trust bundle
I0406 03:35:21.357327       1 secure_serving.go:64] Forcing use of http/1.1 only
I0406 03:35:21.357443       1 secure_serving.go:123] Serving securely on [::]:6443

The service looks up and is listening on port 6443. Let's keep digging:

$ oc logs authentication-operator-5954c6c9d-cq72z -n openshift-authentication-operator | grep "^E" | tail -1
E0406 14:46:39.344974       1 controller.go:129] {AuthenticationOperator2 AuthenticationOperator2} failed with: error checking current version: unable to check route health: failed to GET route: Service Unavailable

Something is wrong with the routes:

$ oc get route --all-namespaces | egrep "NAME|auth"
NAMESPACE                  NAME              HOST/PORT                             PATH   SERVICES            PORT    TERMINATION            WILDCARD
openshift-authentication   oauth-openshift   oauth-openshift.apps.osh.example.com         oauth-openshift     6443    passthrough/Redirect   None
$ oc get endpoints -n openshift-authentication
NAME              ENDPOINTS                           AGE
oauth-openshift   10.130.0.33:6443,10.131.0.22:6443   27h
$ oc get pods -n openshift-authentication -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE                     NOMINATED NODE   READINESS GATES
oauth-openshift-569dfc5dd7-5n9pj   1/1     Running   0          11h   10.131.0.22   etcd-2.osh.example.com   <none>           <none>
oauth-openshift-569dfc5dd7-hsxx8   1/1     Running   1          11h   10.130.0.33   etcd-0.osh.example.com   <none>           <none>

A very strange state. The authentication pods run on master nodes but are registered under *.apps.<cluster-name>.<base.domain>, which points to the load balancer for the workers according to the installation guide. That load balancer has no definitions for port 6443, only for 80 and 443 (according to the same guide).

In my configuration, the apps and api IP addresses are the same, so port 6443 is served by the API load balancer and the requests are forwarded to the API service rather than to the apps router.

In addition, the console operator was looking for the authentication service on the plain HTTPS port of oauth-openshift.apps.osh.example.com. It looks like the oauth-openshift route should be fixed and registered on port 443.

$ oc describe route -n openshift-authentication oauth-openshift
Name:                   oauth-openshift
Namespace:              openshift-authentication
Created:                20 hours ago
Labels:                 app=oauth-openshift
Annotations:            <none>
Requested Host:         oauth-openshift.apps.osh.example.com
                          exposed on router default (host apps.osh.example.com) 20 hours ago
Path:                   <none>
TLS Termination:        passthrough
Insecure Policy:        Redirect
Endpoint Port:          6443

Service:        oauth-openshift
Weight:         100 (100%)
Endpoints:      10.130.0.33:6443, 10.131.0.22:6443

We should change "Endpoint Port" to 443.

$ oc edit route -n openshift-authentication oauth-openshift
$ oc patch route -n openshift-authentication oauth-openshift -p '{"spec":{"port":{"targetPort":"443"}}}'

Nothing helps. Although both commands ran without errors, the resulting route remains the same. I am stuck here for today. To be continued...


Updated on Mon Apr 6 20:16:44 IDT 2020