GPFS recipes

Installing software

The software comes as a set of RPMs suitable for installation on RedHat 7 (the client can also be installed on RedHat 6). All files are packed into one self-extracting, Java-driven archive, which in turn is packed as a tar.gz. A few archive technologies are still left unused in this process, so there is room for improvement.

Phase one. Unpacking software

Unpack the TAR file, run the self-extracting file and accept the license restrictions. A number of RPMs will be unpacked into the /usr/lpp/mmfs/X.X.X.x directory.
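A minimal sketch of this phase; the archive and installer names below are assumptions, use the file you actually downloaded:

# tar -xzvf Spectrum_Scale_Standard-4.2.3.0-x86_64-Linux-install.tar.gz  # file name is an example
# ./Spectrum_Scale_Standard-4.2.3.0-x86_64-Linux-install                 # prints the license and asks to accept it

After accepting the license, the RPMs appear under /usr/lpp/mmfs/4.2.3.0.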

Phase two. Installing software

# cd /usr/lpp/mmfs/4.2.3.0/gpfs_rpms
# yum localinstall gpfs.msg.en_US*.rpm gpfs.gpl*.rpm gpfs.docs*.rpm \
	gpfs.gskit*.rpm gpfs.ext*.rpm gpfs.adv*.rpm gpfs.base*.rpm

These RPMs have some dependencies, so using yum is preferable.

A lot of binaries are installed into /usr/lpp/mmfs/bin; add it to PATH:

# echo 'export PATH=$PATH:/usr/lpp/mmfs/bin' > /etc/profile.d/mm.sh
# exit

Phase three. Compiling kernel modules

Run the command mmbuildgpl -v. The command will fail if development packages are missing on your system. Read the last line of the error message, which summarizes what is missing, then install the packages:

# yum install kernel-devel make cpp gcc gcc-c++ kernel-headers
 ..
# mmbuildgpl -v
 ..
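If the build succeeds, the kernel extension modules should end up under the extra directory of the running kernel; the path below is my assumption about where mmbuildgpl installs them:

# ls /lib/modules/$(uname -r)/extra  # location is an assumption; the GPFS kernel modules should be listed here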

DNS has to be configured properly; otherwise fill /etc/hosts with all known hosts. NTP (or chronyd) should be configured and the time has to be synchronized.
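For example, on this two-node lab setup /etc/hosts could look like this (the addresses are the ones used in the examples below):

# cat /etc/hosts
127.0.0.1        localhost localhost.localdomain
192.168.122.190  gpfs1.local  gpfs1
192.168.122.195  gpfs2.local  gpfs2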

Creating cluster

Exchange SSH keys so that user "root" can log in without a password, even to the same node. The GPFS tools use SSH to every node, even when a command runs on the local system.
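A minimal sketch, assuming no keys exist yet; repeat ssh-copy-id for every node, including the node you run it on:

# ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa   # empty passphrase
# ssh-copy-id root@gpfs1.local
# ssh-copy-id root@gpfs2.local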

[root@gpfs1 ~]# mmcrcluster -N gpfs1.local:manager-quorum -C yoda.local --ccr-enable -A
mmcrcluster: Performing preliminary node verification ...
mmcrcluster: Processing quorum and other critical nodes ...
The authenticity of host 'gpfs1.local (192.168.122.190)' can't be established.
ECDSA key fingerprint is 65:38:d9:b4:a1:05:d9:6c:b3:87:61:33:4c:40:62:85.
Are you sure you want to continue connecting (yes/no)? yes
mmcrcluster: Finalizing the cluster data structures ...
mmcrcluster: Command successfully completed
mmcrcluster: Warning: Not all nodes have proper GPFS license designations.
    Use the mmchlicense command to designate licenses as needed.

Here gpfs1.local is the hostname of the first cluster member, manager-quorum is the role designated for it, and yoda.local is a very original name for the new cluster. If the suggested cluster name does not contain dots, the FQDN of the first node will be appended to it. The result usually does not look nice, so use a name with dots.

[root@gpfs1 ~]# mmlscluster 

===============================================================================
| Warning:                                                                    |
|   This cluster contains nodes that do not have a proper GPFS license        |
|   designation.  This violates the terms of the GPFS licensing agreement.    |
|   Use the mmchlicense command and assign the appropriate GPFS licenses      |
|   to each of the nodes in the cluster.  For more information about GPFS     |
|   license designation, see the Concepts, Planning, and Installation Guide.  |
===============================================================================


GPFS cluster information
========================
  GPFS cluster name:         yoda.local
  GPFS cluster id:           3885883280435585769
  GPFS UID domain:           yoda.local
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address       Admin node name  Designation
-----------------------------------------------------------------------
   1   gpfs1.local       192.168.122.190  gpfs1.local      quorum-manager

Apply a license to the node:

[root@gpfs1 ~]# mmchlicense server --accept -N gpfs1.local

The following nodes will be designated as possessing server licenses:
        gpfs1.local
mmchlicense: Command successfully completed

The server keyword may be replaced with client on client systems.

Add another node (or client) to the cluster

[root@gpfs1 ~]# mmaddnode -N gpfs2.local:quorum-manager
Mon Apr 24 12:13:47 EDT 2017: mmaddnode: Processing node gpfs2.local
The authenticity of host 'gpfs2.local (192.168.122.195)' can't be established.
ECDSA key fingerprint is 65:38:d9:b4:a1:05:d9:6c:b3:87:61:33:4c:40:62:85.
Are you sure you want to continue connecting (yes/no)? yes
Failed to retrieve current node list (err 823)
mmremote: The CCR environment could not be initialized on node gpfs2.local.
mmaddnode: The CCR environment could not be initialized on node gpfs2.local.
mmaddnode: mmaddnode quitting.  None of the specified nodes are valid.
mmaddnode: Command failed. Examine previous error messages to determine cause.

The default RedHat installation has a firewall installed and enabled, which causes this error. Disable the firewall by removing the firewalld package from both nodes. Disable SELinux too, to eliminate surprises.
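For SELinux, a minimal sketch (run the same on both nodes; a reboot is needed for the config file change to take full effect):

# setenforce 0                                                   # switch to permissive right now
# sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # make it permanent

Removing firewalld and retrying the node addition then succeeds: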

[root@gpfs1 ~]# rpm -qa | grep firew
firewalld-0.3.9-14.el7.noarch
[root@gpfs1 ~]# rpm -e firewalld-0.3.9-14.el7.noarch
[root@gpfs1 ~]# ssh gpfs2 rpm -e firewalld-0.3.9-14.el7.noarch
[root@gpfs1 ~]# mmaddnode -N gpfs2.local:quorum-manager
Mon Apr 24 12:19:32 EDT 2017: mmaddnode: Processing node gpfs2.local
mmaddnode: Command successfully completed
mmaddnode: Warning: Not all nodes have proper GPFS license designations.
    Use the mmchlicense command to designate licenses as needed.
mmaddnode: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
[root@gpfs1 ~]# mmchlicense server --accept -N gpfs2.local

The following nodes will be designated as possessing server licenses:
        gpfs2.local
mmchlicense: Command successfully completed
mmchlicense: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.

Adding a GPFS client to the cluster is performed in exactly the same way. The first difference is that you do not need passwordless root SSH from the client to the cluster, only from the cluster to the client. The second is that the designated role should be client. The third difference is the license type, of course.
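A minimal sketch, assuming a hypothetical client host named client1.local:

[root@gpfs1 ~]# mmaddnode -N client1.local:client        # client1.local is a made-up hostname
[root@gpfs1 ~]# mmchlicense client --accept -N client1.local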

Remove node from cluster

Move shared resources to other nodes, or destroy them, before removing a node from the cluster.

[root@gpfs1 ~]# mmdelnode -N gpfs2.local
Verifying GPFS is stopped on all affected nodes ...
mmdelnode: Command successfully completed

Run this command on a cluster member other than the one being removed; otherwise the trusted connection between the nodes will be closed and the transaction may not complete. The -a option removes all nodes from the cluster and then removes the cluster data itself (the cluster will be destroyed).
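GPFS has to be stopped on the node before it is removed; a minimal sketch:

[root@gpfs1 ~]# mmshutdown -N gpfs2.local   # stop GPFS on the node being removed
[root@gpfs1 ~]# mmdelnode -N gpfs2.local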

Cluster status and startup

[root@gpfs1 ~]# mmlscluster 

GPFS cluster information
========================
  GPFS cluster name:         yoda.local
  GPFS cluster id:           3885883280435591273
  GPFS UID domain:           yoda.local
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address       Admin node name  Designation
-----------------------------------------------------------------------
   1   gpfs1.local       192.168.122.190  gpfs1.local      quorum-manager

[root@gpfs1 ~]# mmgetstate 

 Node number  Node name        GPFS state 
------------------------------------------
       1      gpfs1            down
[root@gpfs1 ~]# mmstartup -a
Mon Apr 24 13:59:31 EDT 2017: mmstartup: Starting GPFS ...
gpfs1.local:  /tmp/mmfs has been created successfully.
[root@gpfs1 ~]# mmgetstate 

 Node number  Node name        GPFS state 
------------------------------------------
       1      gpfs1            arbitrating
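The arbitrating state means the node is still trying to reach quorum; once quorum is achieved the state becomes active. Quorum details for all nodes can be checked with:

[root@gpfs1 ~]# mmgetstate -a -L   # -L adds quorum and node counters to the output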

Storage subsystem

Create NSD (Network Shared Disk)

Think of an NSD as a disk attached to an NSD server. If a LUN comes from a storage array and is connected to more than one NSD server, there is no need to configure redundancy. Otherwise, you should treat the attached disk as JBOD and configure two or three copies (replicas).
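For illustration, a stanza for a shared LUN visible from both servers might look like this; the device name, usage and failure group here are assumptions for the example:

# the values below are examples only
%nsd: device=/dev/mapper/lun0 nsd=nsd2 servers=gpfs1.local,gpfs2.local usage=dataAndMetadata failureGroup=1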

Create an NSD definition (stanza) file; you will use it a number of times. Give it a self-describing name and keep it for reference.

I have an unused partition /dev/vda3 on my disk to be used as an NSD disk:

[root@gpfs1 ~]# cat nsd1.stanza
%nsd: device=/dev/vda3 nsd=nsd1 servers=gpfs1.local 
[root@gpfs1 ~]# mmcrnsd -F nsd1.stanza 
mmcrnsd: Processing disk vda3
[root@gpfs1 ~]# mmlsnsd

 File system   Disk name    NSD servers                                    
---------------------------------------------------------------------------
 (free disk)   nsd1         gpfs1.local

Create file system

A file system is an aggregation of NSDs on which filesets are placed. It can also be used directly, without filesets. You can think of it as an aggregate in NetApp terms.

[root@gpfs1 ~]# mmcrfs fs0 -F nsd1.stanza -A yes -T /export

The following disks of fs0 will be formatted on node gpfs1.local:
    nsd1: size 13052 MB
Formatting file system ...
Disks up to size 129 GB can be added to storage pool system.
Creating Inode File
  33 % complete on Mon Apr 24 14:01:33 2017
  66 % complete on Mon Apr 24 14:01:38 2017
  97 % complete on Mon Apr 24 14:01:43 2017
 100 % complete on Mon Apr 24 14:01:44 2017
Creating Allocation Maps
Creating Log Files
  53 % complete on Mon Apr 24 14:01:49 2017
 100 % complete on Mon Apr 24 14:01:51 2017
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool system
Completed creation of file system /dev/fs0.
[root@gpfs1 ~]# mmmount /export
Tue Apr 25 17:01:52 EDT 2017: mmmount: Mounting file systems ...
[root@gpfs1 ~]# df -hP /export 
Filesystem      Size  Used Avail Use% Mounted on
fs0              13G  717M   13G   6% /export
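To double-check the new file system's attributes and where it is mounted:

[root@gpfs1 ~]# mmlsfs fs0         # list file system attributes
[root@gpfs1 ~]# mmlsmount fs0 -L   # show which nodes have it mounted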

Create fileset and mount

A fileset (in NetApp terms) is similar to a qtree with some features of volumes.

[root@gpfs1 ~]# mmcrfileset fs0 home --inode-space new 
Fileset home created with id 1 root inode 131075.
[root@gpfs1 ~]# ll /export/
total 0
[root@gpfs1 ~]# mmlinkfileset fs0 home -J /export/home
Fileset home linked at /export/home
[root@gpfs1 ~]# ll /export/
total 0
drwx------ 2 root root 4096 Apr 25 17:02 home

However, df does not show anything mounted on /export/home, so get into the habit of using this command:

[root@gpfs1 ~]# mmlsfileset fs0
Filesets in file system 'fs0':
Name                     Status    Path                                    
root                     Linked    /export                                 
home                     Linked    /export/home 
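The -L option of mmlsfileset shows more detail, including the independent inode space that was created with --inode-space new:

[root@gpfs1 ~]# mmlsfileset fs0 -L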

Creating a trust connection between clusters and a remote GPFS mount

Remote (cache) cluster

I've built a similar one-node cluster:

[root@gpfs2 ~]# mmlscluster 

GPFS cluster information
========================
  GPFS cluster name:         luk.local
  GPFS cluster id:           106684897476624261
  GPFS UID domain:           luk.local
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp
  Repository type:           CCR

 Node  Daemon node name  IP address       Admin node name  Designation
-----------------------------------------------------------------------
   1   gpfs2.local       192.168.122.195  gpfs2.local      quorum-manager
[root@gpfs2 ~]# mmlsnsd

 File system   Disk name    NSD servers                                    
---------------------------------------------------------------------------
 fs0           nsd1         gpfs2.local              

[root@gpfs2 ~]# mmlsfileset fs0
Filesets in file system 'fs0':
Name                     Status    Path                                    
root                     Linked    /export 

Exchange the clusters' public keys

Copy each cluster's public key to the other, as in this example:

[root@gpfs1 ~]# scp gpfs2:/var/mmfs/ssl/id_rsa.pub /var/mmfs/ssl/luk.local.pub
id_rsa.pub                                             100% 1773     1.7KB/s   00:00    
[root@gpfs1 ~]# scp /var/mmfs/ssl/id_rsa.pub gpfs2:/var/mmfs/ssl/yoda.local.pub
id_rsa.pub                                             100% 1779     1.7KB/s   00:00    

Authorize commands on master

Now authorize the remote cluster (luk.local) to mount the GPFS file system from yoda.local:

[root@gpfs1 ~]# mmauth add luk.local -k /var/mmfs/ssl/luk.local.pub 
mmauth: Command successfully completed
[root@gpfs1 ~]# mmauth grant luk.local -f fs0

mmauth: Granting cluster luk.local access to file system fs0:
        access type rw; root credentials will not be remapped.

mmauth: Command successfully completed
[root@gpfs1 ~]# mmauth show
Cluster name:        luk.local
Cipher list:         AUTHONLY
SHA digest:          67e927b2ce7b669b8297d8053b7fb4e7971c00df13dc9f9eef23d39b2be52cf1
File system access:  fs0       (rw, root allowed)

Cluster name:        yoda.local (this cluster)
Cipher list:         AUTHONLY
SHA digest:          3429410232c636a693429f4fbe516d2a59abcc67bd5b77775b5f95f33ea58066
File system access:  (all rw)

Configure master on remote

[root@gpfs2 ~]# mmremotecluster add yoda.local -k /var/mmfs/ssl/yoda.local.pub -n gpfs1.local
mmremotecluster: Command successfully completed
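The definition can be verified with:

[root@gpfs2 ~]# mmremotecluster show all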

Mount foreign GPFS

[root@gpfs2 ~]# mmremotefs add rfs0 -f fs0 -C yoda.local -A yes -T /yoda  
[root@gpfs2 ~]# mmremotefs show all
Local Name  Remote Name  Cluster name       Mount Point        Mount Options    Automount  Drive  Priority
rfs0        fs0          yoda.local         /yoda              rw               yes          -        0
[root@gpfs2 ras]# mmmount rfs0
Wed Apr 26 02:22:56 EDT 2017: mmmount: Mounting file systems ...
[root@gpfs2 ras]# df -hP /yoda 
Filesystem      Size  Used Avail Use% Mounted on
rfs0             13G  717M   13G   6% /yoda
[root@gpfs2 ras]# ls /yoda/
home

AFM connection

A remote file system should be mounted on the caching cluster. The gateway role should be assigned to the node that mounts the remote FS. It is the gpfs2 node in our example:

[root@gpfs2 ~]# mmchnode --gateway -N gpfs2
[root@gpfs2 ~]# mmlscluster
 ..
 Node  Daemon node name  IP address       Admin node name  Designation
-----------------------------------------------------------------------
   1   gpfs2.local       192.168.122.195  gpfs2.local      quorum-manager-gateway

Now mount the remote file system on it. Create a fileset that refers to that remote mount:

[root@gpfs2 ~]# mmcrfileset fs0 home -p afmtarget=gpfs:///yoda/home -p afmmode=iw --inode-space=new
Fileset home created with id 1 root inode 131075.
[root@gpfs2 ~]# mmlinkfileset fs0 home -J /export/home
Fileset home linked at /export/home
[root@gpfs2 ~]# mmafmctl fs0 getstate -j home
Fileset Name  Fileset Target     Cache State    Gateway Node  Queue Length  Queue numExec 
------------  --------------     -------------  ------------  ------------  ------------- 
home          gpfs:///yoda/home  Active         gpfs2         0             2486
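With the fileset in iw (independent-writer) mode, changes made in the cache fileset are queued and pushed to the home cluster asynchronously. A quick sanity check, using the paths from above (the file name is made up):

[root@gpfs2 ~]# touch /export/home/hello_from_cache    # hello_from_cache is just an example file
[root@gpfs2 ~]# mmafmctl fs0 getstate -j home          # the queue should grow and then drain

Once the queue drains, the file should also be visible under /export/home on the home cluster (gpfs1).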

Updated on Sun Apr 30 10:57:46 IDT 2017