The concept of full-stack was pronounced dead at the close of 2019.
2020 Stack is the new name for the new generation.

Administration layer

System stack

"If it works, don't touch it" only applies if you have a good system administrator. This is because computers are never static: log files fill up, limits are exceeded, and new stuff gets added without old stuff getting removed. Cruft happens.

Good computer administration requires a deep understanding of how the system behaves under normal conditions, so that troubleshooting under exceptional conditions can be done against a strong baseline. There is no magic to getting that baseline; it requires regular attention to all the small details. For example, know the typical profiles for internals like CPU usage, memory consumption, network traffic, disk I/O, and disk usage, and track externals like computer room temperature, air filtration, equipment age, cabling fidelity, and power integrity.
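One way to make the idea of a baseline concrete is to summarize past readings of a metric and compare new readings against that summary. The sketch below is illustrative only: the names `build_baseline` and `is_exceptional`, the sample values, and the three-standard-deviation threshold are all assumptions, not something prescribed here.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize readings taken under normal conditions."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to build a baseline")
    return {"mean": mean(samples), "stdev": stdev(samples)}


def is_exceptional(value, baseline, threshold=3.0):
    """Return True when a reading lies more than `threshold`
    standard deviations away from the baseline mean."""
    spread = baseline["stdev"]
    if spread == 0:
        # Every baseline sample was identical, so any change stands out.
        return value != baseline["mean"]
    return abs(value - baseline["mean"]) / spread > threshold


# Hypothetical CPU usage percentages gathered during normal operation.
cpu_samples = [12.0, 15.5, 11.0, 14.0, 13.5, 12.5]
cpu_baseline = build_baseline(cpu_samples)

print(is_exceptional(14.8, cpu_baseline))  # a reading close to normal
print(is_exceptional(95.0, cpu_baseline))  # a reading far from normal
```

The same comparison could be applied to memory, network, or disk readings, though a suitable threshold would probably differ for each metric.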

Essential tasks

[Figure: Five layers of system software for operations and management]

The four essential tasks delegated to the admin layer are real time monitoring, log inspection, backups, and baseline documentation.

Real time monitoring

Real time monitoring should be done at a frequency that matches the dynamic flux of the system. High traffic systems, and systems serving several disparate needs, require more frequent monitoring; low traffic systems, or systems dedicated to a single purpose (like a dedicated mail server, database server, or DNS server), need less frequent monitoring. Of course, the nature of the monitoring depends entirely on the jobs running on the system, as each will have its own monitoring tools.
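As a rough sketch of sampling at a chosen interval, the loop below calls a probe repeatedly and records each reading with a timestamp. The names `poll`, `probe`, and `interval_seconds` are hypothetical, and the disk-usage probe is only a stand-in for whatever job-specific tool applies. A busy multi-purpose host might warrant a short interval, and a dedicated DNS server a much longer one.

```python
import shutil
import time


def disk_used_fraction(path="/"):
    """Example probe: fraction of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def poll(probe, interval_seconds, samples, sleep=time.sleep):
    """Call `probe` `samples` times, waiting `interval_seconds` between calls.

    `sleep` is injectable so the loop can be exercised without waiting.
    """
    readings = []
    for i in range(samples):
        readings.append((time.time(), probe()))
        if i < samples - 1:
            sleep(interval_seconds)
    return readings


# Assumed values: a very short interval, used only to keep the demo quick.
for timestamp, value in poll(disk_used_fraction, interval_seconds=0.1, samples=3):
    print(f"{timestamp:.0f} disk used: {value:.1%}")
```

In practice a loop like this would usually be replaced by a scheduler or a dedicated monitoring tool; the point is only that the interval is an explicit, tunable setting.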

Log inspection

Log inspection can be done less frequently than real time monitoring. But just as you do with backups, choose a time period that will enable you to catch errors, trap security violations, adjust database indexes, and throttle network hogs before they impact your users.
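A periodic log inspection can be as simple as counting lines that match patterns of interest. The following is a minimal sketch under stated assumptions: the pattern names, the regular expressions, and the sample lines are invented for illustration, and real logs will need patterns suited to their own format.

```python
import re
from collections import Counter

# Hypothetical patterns; adjust them to the log format actually in use.
PATTERNS = {
    "error": re.compile(r"\berror\b", re.IGNORECASE),
    "auth_failure": re.compile(
        r"authentication failure|failed password", re.IGNORECASE
    ),
}


def summarize_log(lines):
    """Count how many lines match each named pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Made-up sample lines, not output from a real system.
sample = [
    "Jan 10 03:12:01 host sshd[811]: Failed password for root from 203.0.113.5",
    "Jan 10 03:12:09 host app[120]: ERROR could not open index file",
    "Jan 10 03:13:00 host cron[99]: job finished",
]
print(summarize_log(sample))
```

Running a summary like this on a fixed schedule, and comparing the counts with earlier runs, gives an early signal when errors or failed logins start to climb.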

Backups

Backups should be done with explicit goals in mind. Meeting these goals will determine both the frequency of backups and the backup methodology.

Media

Choose disk, tape or other media to match the size requirements of your backups, the cost of the media (consumable cost or electric power costs), and whether or not an attendant needs to be present to mount the media. Full and incremental backups are often the dividing line between tape and disk.

Protection goals

Choose the backup disk location and frequency based on which protection goal is sought. Use these definitions for the four backup disk locations described here:

  1. QuickSave. Backup from the active directory structure to a reserved backup location on the same disk.
  2. Near-line. Backup to another computer's disk.
  3. Off-line. Backup to removable media.
  4. Archive. Backup to permanent read-only media.
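To illustrate the first of these locations, here is a minimal sketch of a QuickSave-style copy from an active directory to a reserved location. The function name `quicksave`, the timestamped folder naming, and the demonstration paths are assumptions made for this example. It does not check that both paths are on the same disk.

```python
import shutil
import tempfile
import time
from pathlib import Path


def quicksave(source_dir, backup_root):
    """Copy `source_dir` into a new timestamped folder under `backup_root`."""
    source = Path(source_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{source.name}-{stamp}"
    # copytree creates `target` (and missing parents) and fails if it exists.
    shutil.copytree(source, target)
    return target


# Demonstration in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as work:
    live = Path(work) / "live"
    live.mkdir()
    (live / "notes.txt").write_text("hello")
    saved = quicksave(live, Path(work) / "quicksave")
    print(sorted(p.name for p in saved.iterdir()))
```

Because the folder name has one-second resolution, two calls within the same second would collide and raise an error. A real script would need a finer-grained name or a retry.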

Here are some backup goals to consider:

Frequency and rotation

Here is one example of a frequency and rotation policy that meets various goals.

Baseline documentation

In the course of good administration, exceptional events will occur. These should be recorded in the system administrator's notebook. This is especially useful when an extended period of time has been spent tracking down a problem and resolving it: document your solution using a wiki or similar tool so that you (or another administrator) can quickly recognize and solve the problem again in the future.

Troubleshooting

When problems occur, system administrators need to look to the four tools just outlined for help with troubleshooting. Problems are not solved by other people; they are solved by the system administrator. The task is always easier when the baseline system is well understood, the logging systems are operational, the backups are up-to-date, and the documentation is in place.