Unfortunately, Novell NetWare is pretty much dead as an operating system. These pages will not be updated anymore, but will be retained for a while for the benefit of the faithful who continue to use this excellent operating system.
Novell did create the Open Enterprise Server, a SUSE Linux-based OS that runs most of the old NetWare server functions.

NetWare Clustered Servers

NetWare Clustering Concepts

Novell introduced server clustering in NetWare 5, then enhanced it in NetWare 6. This section discusses Novell Cluster Services 1.6 from a storage perspective.

A cluster is a group of file servers; in Novell documentation the servers are often called nodes. A NetWare 6 cluster contains between 2 and 32 servers. All servers in the cluster must be configured with IP and be on the same IP subnet. All servers in the cluster must be in the same NDS tree, and the NDS tree must be replicated on at least two, but not more than six, servers in the cluster. NetWare 5 and NetWare 6 clusters can coexist in the same NDS tree. Each server must have at least one local disk device for the SYS: volume; the data disks are normally connected to the cluster through a SAN.
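These membership rules are easy to sanity-check before a cluster is built. The sketch below is a small, illustrative Python check of the rules listed above (node count, a common IP subnet, and the number of servers holding an NDS replica); the function name, the data layout and the /24 subnet mask are assumptions made for the example, not anything Novell supplies.

import ipaddress

def check_cluster_config(nodes, replica_holders):
    # nodes:           dict of server name -> IP address string
    # replica_holders: set of server names holding an NDS replica
    errors = []

    # A NetWare 6 cluster contains between 2 and 32 servers.
    if not 2 <= len(nodes) <= 32:
        errors.append("cluster must contain between 2 and 32 servers")

    # All servers must be on the same IP subnet (a /24 mask is assumed for this example).
    subnets = {ipaddress.ip_network(ip + "/24", strict=False) for ip in nodes.values()}
    if len(subnets) > 1:
        errors.append("all servers must be on the same IP subnet")

    # The NDS tree must be replicated on at least two, but not more than six, cluster servers.
    holders = replica_holders & set(nodes)
    if not 2 <= len(holders) <= 6:
        errors.append("NDS replicas must be held by between two and six servers in the cluster")

    return errors

# Hypothetical three-node cluster; an empty list means the rules are satisfied.
print(check_cluster_config(
    {"NODE1": "10.1.1.11", "NODE2": "10.1.1.12", "NODE3": "10.1.1.13"},
    {"NODE1", "NODE2"}))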

Clustering allows services to survive the failure of a server. Any disks that were mounted on the failed server are switched to one of the other servers in the cluster, and any applications that were active on it, or users who were logged on to it, are switched to another server. This is called failover, and users typically regain access to their resources within seconds, with no loss of data and usually without having to log in again.
It is also possible to invoke a failover manually if you need to bring a server down for maintenance or a hardware upgrade.

Novell Cluster Services 1.6 consists of a number of management modules, or NLMs. The storage-related modules are -

  • The CLSTRLIB, or Cluster Configuration Library, stores the NDS cluster data. The first node activated in the cluster uses CLSTRLIB to access NDS eDirectory and becomes the master node for the cluster. CLSTRLIB then sends the NDS cluster data to all cluster nodes.
  • The Cluster Resource Manager (CRM) is responsible for failing resources over to another node after a failure. To do this, the CRM tracks all the cluster's resources and where they are running. The policies that govern how failover should happen are held in NDS.
  • The Cluster Volume Broker (CVB) keeps track of the NSS configuration of the storage pools and logical volumes for the cluster. If a change is made to NSS for one server, the CVB ensures that the change is replicated across all the nodes in the cluster. The CVB also looks after data integrity. It will veto conflicting operations and enforce the rule that only a single server can access a pool at a time.
  • The Cluster System Services (CSS) module looks after data integrity for cluster-aware applications that share distributed memory or locks. In practice this ensures that storage pools are only active on one node at a time.
  • The Split Brain Detector (SBD) really has nothing to do with storage, but it is far too good a name to ignore. Each server in the cluster sends out a regular heartbeat signal to say 'I'm alive'. If Server1 stops sending its heartbeat, the other servers in the cluster conclude that Server1 is dead and start to take over its resources. But what happens if Server1 has only lost its network connection? Server1 cannot hear the other servers and starts to take over their resources, while the other servers cannot hear Server1 and start to take over its resources. The whole thing would end up in a mess, except that the Split Brain Detector steps in, marks Server1 out of service, and lets the rest of the cluster take over. A rough sketch of this heartbeat and tie-break idea follows the list.
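The heartbeat and split-brain behaviour described in the last bullet can be pictured with a small simulation. The Python sketch below is a rough illustration, not Novell's implementation: each node remembers when it last heard from its peers and presumes silent peers dead, and a simple tie-break decides which side of a partitioned cluster stays in service. The tolerance value and the tie-break rule are assumptions made purely for the example.

import time

HEARTBEAT_TOLERANCE = 8.0   # seconds of silence before a peer is presumed dead (an assumed value)

class Node:
    def __init__(self, name):
        self.name = name
        self.last_heard = {}    # peer name -> time we last received its heartbeat

    def hear_heartbeat(self, peer_name):
        self.last_heard[peer_name] = time.time()

    def suspected_dead(self):
        # Peers we have not heard from within the tolerance period.
        now = time.time()
        return [p for p, t in self.last_heard.items() if now - t > HEARTBEAT_TOLERANCE]

def resolve_split_brain(partitions):
    # Stand-in for the SBD tie-break: the largest partition survives (ties broken by the
    # alphabetically first node name), and every other partition is marked out of service
    # so it cannot take over resources it can no longer see.
    survivor = sorted(partitions, key=lambda p: (-len(p), min(n.name for n in p)))[0]
    fenced = [n for p in partitions if p is not survivor for n in p]
    return survivor, fenced

node1, node2, node3 = Node("NODE1"), Node("NODE2"), Node("NODE3")
node2.hear_heartbeat("NODE3")
node2.last_heard["NODE1"] = time.time() - 10        # NODE1 went quiet 10 seconds ago
print(node2.suspected_dead())                       # ['NODE1']

# NODE1 has only lost its network link; NODE2 and NODE3 can still hear each other.
survivors, out_of_service = resolve_split_brain([[node1], [node2, node3]])
print([n.name for n in survivors], [n.name for n in out_of_service])   # ['NODE2', 'NODE3'] ['NODE1']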

There are six other cluster management NLMs, which are not discussed here.

Cluster commands

You manage the cluster with Cluster commands from the system console. You can see the full list of cluster commands by typing

HELP CLUSTER

at the console.

Some useful commands are

CLUSTER VIEW

Displays the current node and a list of the nodes (servers) in the cluster.

CLUSTER RESOURCES

Displays the list of resources managed by the cluster, and shows which node currently owns each resource.

You can force a resource to move to a different node with the command

CLUSTER MIGRATE resource-name node-name
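For example, to move a cluster resource named USERS_POOL_SERVER (a hypothetical name) onto NODE2 before shutting its current node down for maintenance, you would enter

CLUSTER MIGRATE USERS_POOL_SERVER NODE2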

NetWare provides a few console screens for monitoring cluster operations.
The Logger screen displays loaded NLMs and NSS operations, such as the enforcement of directory quotas.
The Cluster Resources screen displays volume mount and dismount messages.

Volumes and Pooling

Pools

Storage pools are containers for logical volumes. A Cluster Services pool is simply an area of storage space created from the available NetWare partitions. With NSS 3.0 these can be virtual partitions, and a pool can support a mixture of NSS and non-NSS volumes. A storage pool must be either all local or all shared.

A shared storage pool can only be in use by one cluster node at a time, to ensure data integrity. Data corruption would most likely occur if two or more nodes had access to the same shared storage pool simultaneously. This rule is enforced by the Cluster System Services NLM.
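The single-owner rule amounts to an ownership lock around pool activation. The sketch below is an illustrative Python model of that rule, not the actual CSS or CVB code: the pool records which node currently has it active, and an attempt to activate it on a second node is vetoed until the first node releases it. The pool and node names are invented.

class SharedPool:
    def __init__(self, name):
        self.name = name
        self.active_on = None   # the node that currently has the pool active, if any

    def activate(self, node):
        # Veto a conflicting activation to protect data integrity.
        if self.active_on is not None and self.active_on != node:
            raise RuntimeError(f"pool {self.name} is already active on {self.active_on}")
        self.active_on = node

    def deactivate(self, node):
        if self.active_on == node:
            self.active_on = None

pool = SharedPool("USERS_POOL")   # hypothetical pool name
pool.activate("NODE1")
pool.deactivate("NODE1")          # NODE1 must release the pool ...
pool.activate("NODE2")            # ... before NODE2 is allowed to activate it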

Failover in Cluster Services 1.6 is by storage pool, whereas NetWare 5 did failover by volume. If a shared storage pool is active on a node when that node fails, the cluster automatically migrates the pool to another node. The clustering software reactivates the pool and remounts the cluster-enabled logical volumes within that pool.

Cluster Volumes

Inside the storage pools are the logical volumes, which are only visible and accessible while the pool is active. Because logical volumes have no hard size limit, they can request more space from the storage pool as needed. They hold the files and folders for users and applications.

When you define a pool and its volumes, you have to cluster-enable all the volumes. This creates a virtual server for each cluster-enabled volume, with its own server name and IP address. Applications and users access the volume through the virtual server name and its IP address, so if the hosting server fails and the volume fails over to another server, clients are not affected and the IP address of the shared disk does not change. This is illustrated in the picture below.

[Figure: NetWare cluster with two disks]

In NetWare 5.1 the virtual server name was generated by the system, in the format NDStreename_diskname_server. DNS could not cope with the underscores in the name, so the IP addresses had to be hardcoded. NetWare 6 removes this restriction: you can override the default name with a name that DNS can resolve.
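The point of the virtual server is that clients only ever hold the virtual name and its IP address, never the name of the physical node currently behind it. The toy Python sketch below (all names and addresses are invented, and this is not how NCS is implemented) shows that indirection: failover changes which node answers for the virtual server, but not the name or address the clients use.

# A cluster-enabled volume fronted by a virtual server: the virtual name and
# IP address stay constant, only the hosting node changes on failover.
virtual_servers = {
    "CLUSTER_USERS_SERVER": {"ip": "10.1.1.50", "hosted_by": "NODE1"},
}

def client_connect(virtual_name):
    # Clients resolve and connect to the virtual name and IP only.
    entry = virtual_servers[virtual_name]
    return f"connected to {virtual_name} at {entry['ip']} (hosted by {entry['hosted_by']})"

def failover(virtual_name, new_node):
    # On failover the cluster re-binds the virtual server to a surviving node.
    virtual_servers[virtual_name]["hosted_by"] = new_node

print(client_connect("CLUSTER_USERS_SERVER"))   # before failover, served by NODE1
failover("CLUSTER_USERS_SERVER", "NODE2")
print(client_connect("CLUSTER_USERS_SERVER"))   # same name and IP, now served by NODE2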

Logical volumes have a new attribute called Flush On Close, which simply means that the cache is flushed to disk when a file is closed. When you close a file, you can therefore be confident that the data is safely stored on disk and is not sitting in cache; if a server fails, any data still resident in cache is lost. Flush On Close is switched on at the server, and will have some performance overhead.
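Flush On Close is easiest to picture as a forced write-to-disk performed as part of every close. The Python sketch below is a rough analogy, not NSS code: with the flag on, closing the file pushes the buffered data out to disk, so a server failure immediately afterwards does not lose it, at the cost of the extra disk write.

import os

def close_file(f, flush_on_close=True):
    # Rough analogy to the NSS Flush On Close attribute.
    if flush_on_close:
        f.flush()               # push the application buffer to the operating system
        os.fsync(f.fileno())    # force the OS cache out to disk - this is the performance cost
    f.close()

f = open("example.dat", "w")    # hypothetical file name
f.write("user data")
close_file(f, flush_on_close=True)
# With flush_on_close=False the data may still be sitting in cache after the close,
# and would be lost if the server failed at that moment.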

NDS, Trustee IDs and GUIDs

The NDS information which is used to identify, name and track all NetWare objects is stored by the CLSTRLIB NLM. NetWare 5 had a problem with file control on SAN systems, because some NDS information was not transferred when a volume was migrated between servers.
The issue was that the trustee IDs for each user object were different on each server. On failover it took several minutes to scan the entire file system and translate the trustee IDs for the new server, so the file trustee IDs were usually not translated at failover. The result was that disk space and directory space restrictions were not preserved.

In NetWare 6, server-linked trustee IDs are replaced with Globally Unique IDs (GUIDs), which are the same across all servers where the user has trustee rights of any kind. Volumes can now fail over in seconds, and all trustee rights are preserved.
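The difference between the two schemes can be shown with a small illustration. In the Python sketch below (the data and formats are invented, not the real NDS records), the NetWare 5 style gives the same user a different numeric trustee ID on each server, so rights recorded on one server have to be translated after the volume moves; the GUID style stores one identifier that is valid on every server, so nothing needs translating at failover.

# NetWare 5 style: the same user has a different numeric trustee ID on each server,
# so trustee assignments recorded on NODE1 mean nothing to NODE2 until translated.
trustee_ids = {
    "NODE1": {"jsmith": 0x0001000A},
    "NODE2": {"jsmith": 0x00020077},
}

def translate_rights(rights, from_node, to_node):
    # The kind of per-file translation NetWare 5 needed after a volume changed servers.
    id_to_user = {tid: user for user, tid in trustee_ids[from_node].items()}
    return {trustee_ids[to_node][id_to_user[tid]]: access for tid, access in rights.items()}

rights_on_node1 = {0x0001000A: "RWF"}                        # jsmith's rights, in NODE1's IDs
print(translate_rights(rights_on_node1, "NODE1", "NODE2"))   # the same rights, in NODE2's IDs

# NetWare 6 style: one GUID per user, identical on every server, so the same
# rights record is valid wherever the volume is mounted - no translation needed.
guids = {"jsmith": "5f2b9c44-9c1e-4a77-8d10-3f52a6b7c001"}
rights_by_guid = {guids["jsmith"]: "RWF"}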

Backups had a similar problem: a file had to be restored to the same server it was backed up from, or the trustee IDs would not match and the file could be corrupted. With NetWare 6 and NCS 1.6 any file can be backed up from any server and restored to any server without corruption. The GUID remains intact, along with the appropriate user restrictions, regardless of which physical server is used for the backup and restore operation.