Troubleshoot Power and Disk Issues Flashcards
To diagnose no-power symptoms:
- Check that other equipment in the area is working. If it is not, there may be a fault in the power circuit or a wider power failure (a blackout).
- Try plugging another piece of known-good basic electrical equipment, such as a lamp, into the wall socket. If it does not work, the wall socket is faulty. Get an electrician to investigate the fault.
- Check that the PSU cabling is connected to the PC and the wall socket correctly and that all switches are in the “on” position.
- Try another power cable. There may be a problem with the plug or fuse. Check that all the wires are connected to the correct terminals in the plug. Test the fuse for continuity with a multimeter or swap it with a known-good fuse.
- Try disconnecting extra devices, such as a plug-in graphics card. If this solves the problem, either the PSU is underpowered and you need to fit one with a higher wattage rating, or one of the devices is faulty.
- If you can ensure a safe working environment, test the PSU using a multimeter or power supply tester.
You must take appropriate safety measures before testing a live power supply. PC power supplies are NOT user serviceable. Never remove the cover of a power supply.
If you still cannot identify the fault, then the problem is likely to be a faulty motherboard or power supply.
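The multimeter test and the "is the PSU underpowered?" question both come down to simple arithmetic. The sketch below is illustrative only. It compares measured rail voltages against the commonly cited ATX tolerances (±5% on +12 V, +5 V, +3.3 V and +5 VSB; ±10% on -12 V) and estimates whether the total component load, plus a safety margin, exceeds the PSU's wattage rating. The function names and the 20% headroom figure are assumptions, not values from this guide.

```python
# Nominal voltage and fractional tolerance for each ATX rail, as commonly
# cited from the ATX12V design guide. Verify against your PSU's documentation.
ATX_RAILS = {
    "+12V": (12.0, 0.05),
    "+5V": (5.0, 0.05),
    "+3.3V": (3.3, 0.05),
    "+5VSB": (5.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def out_of_tolerance(readings: dict) -> list:
    """Return the rails whose measured voltage lies outside tolerance."""
    failing = []
    for rail, measured in readings.items():
        nominal, tolerance = ATX_RAILS[rail]
        # Allowed deviation is a fraction of the nominal magnitude.
        if abs(measured - nominal) > abs(nominal) * tolerance:
            failing.append(rail)
    return failing


def psu_underpowered(psu_watts: float, component_watts: list, headroom: float = 0.2) -> bool:
    """True if the estimated load plus a safety margin exceeds the PSU rating.

    The 20% default headroom is a hypothetical rule of thumb, not a standard.
    """
    estimated_load = sum(component_watts)
    return estimated_load * (1 + headroom) > psu_watts


if __name__ == "__main__":
    print(out_of_tolerance({"+12V": 11.5, "+5V": 4.6, "+3.3V": 3.3}))  # ['+5V']
    print(psu_underpowered(300, [220, 95, 60]))  # True
```

Component wattages would come from vendor specifications; this only formalizes the comparison, and it does not replace the safety precautions for testing a live supply.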
Troubleshoot POST Issues
The power-on self-test (POST) is a diagnostic program implemented in the system firmware that checks the hardware to ensure the components required to boot the PC are present and functioning correctly.
- Ask what has changed: If the system firmware has been updated and the PC has not booted since, the update may have failed. Use the motherboard vendor’s firmware reset or recovery procedure.
- Check cabling and connections, especially if maintenance work has just been performed on the PC: An incorrectly oriented storage adapter cable or a badly seated adapter card can stop the POST from running. Correct any errors, reseat adapter cards, and then reboot the PC.
- Check for faulty interfaces and devices: A faulty adapter card or device may be halting the POST. Try removing one device at a time to see if this solves the problem (or remove all non-essential devices, then add them back one by one).
- Check the PSU: Even if the fans are receiving power, a PSU fault may prevent the power good signal from reaching the motherboard, which keeps the CPU held in reset and so prevents POST.
- Check for a faulty CPU or system firmware: If possible, replace the CPU with a known-good one or update the system firmware.
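Removing all non-essential devices and adding them back one by one is a linear search for the first addition that breaks POST. A minimal sketch follows. `posts_ok` is a hypothetical callback standing in for the manual step of installing a set of devices, powering on, and watching whether POST completes.

```python
from typing import Callable, List, Optional


def find_faulty_device(
    devices: List[str], posts_ok: Callable[[List[str]], bool]
) -> Optional[str]:
    """Add non-essential devices back one at a time.

    Returns the first device whose addition stops POST, the string
    "essential components" if POST fails with nothing extra installed,
    or None if POST completes with every device installed.
    """
    installed: List[str] = []
    # Baseline: only the components required to boot are present.
    if not posts_ok(list(installed)):
        return "essential components"
    for device in devices:
        installed.append(device)
        if not posts_ok(list(installed)):
            return device  # this addition broke POST
    return None


if __name__ == "__main__":
    # Simulated bench test: POST fails whenever the NIC is fitted.
    print(find_faulty_device(["GPU", "NIC", "Sound card"], lambda d: "NIC" not in d))  # NIC
```

This assumes a single faulty device. If the fault only appears with a particular combination of devices, the search blames whichever device completes that combination.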
Troubleshoot Boot Issues
Once the POST tests are complete, the firmware searches for devices as specified in the boot sequence. If the first device in the sequence is not found, the system attempts to boot from the next device.
If no boot device is found, the system displays an error message and halts the boot process.
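The boot sequence behaves like a first-match search over an ordered list. In this sketch the device names and the `bootable` set are hypothetical inputs; real firmware decides for itself what counts as bootable (for example, whether a valid boot sector or boot loader is present).

```python
from typing import Iterable, Set


def select_boot_device(boot_order: Iterable[str], bootable: Set[str]) -> str:
    """Return the first device in the boot sequence that can be booted."""
    for device in boot_order:
        if device in bootable:
            return device
        # Not found or not bootable: fall through to the next device.
    raise RuntimeError("No boot device found; boot halted")


if __name__ == "__main__":
    print(select_boot_device(["USB", "SSD", "Network"], {"SSD", "Network"}))  # SSD
```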
If a fixed disk is not detected at boot, check that it is powering up. Drive activity is usually indicated by an LED on the front panel of the system unit case.
Check that data cables are not damaged and that they are correctly connected to the drive.
Troubleshoot Boot Sector Issues
There are two schemes for storing partition and boot information: MBR and GPT.
- In the legacy master boot record (MBR) scheme, the MBR is the first sector of the disk. Partitions allow a single disk device to be divided into multiple logical drives. The MBR contains a table describing the partitions on the disk plus some bootstrap code that locates the boot sector of the active partition. Each partition’s boot sector is the first sector of that partition.
- With the modern globally unique ID (GUID) partition table (GPT) boot scheme, the boot information is not restricted to a single sector but still serves the same basic purpose of identifying partitions and OS boot loaders.
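To make the two layouts concrete, the sketch below reads the first two sectors of a disk image. It relies on well-known on-disk facts: an MBR sector ends with the signature bytes 0x55 0xAA at offsets 510 and 511, its partition table holds four 16-byte entries starting at offset 446, and a GPT disk carries a protective MBR (an entry of type 0xEE) followed by a GPT header beginning with the ASCII text "EFI PART". It assumes 512-byte sectors, and the function names are illustrative. A missing signature is one way a damaged MBR shows up.

```python
import struct
from typing import List

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def parse_mbr(sector0: bytes) -> List[dict]:
    """Return the non-empty entries from an MBR partition table."""
    if len(sector0) < SECTOR_SIZE or sector0[510:512] != b"\x55\xaa":
        raise ValueError("boot signature 0x55AA missing: MBR damaged or absent")
    entries = []
    for i in range(4):
        start = 446 + 16 * i
        entry = sector0[start:start + 16]
        status, part_type = entry[0], entry[4]
        # Bytes 8-11: first LBA; bytes 12-15: sector count (little-endian).
        first_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if part_type != 0:  # type 0 marks an unused slot
            entries.append({
                "slot": i,
                "active": status == 0x80,  # 0x80 marks the active (boot) partition
                "type": part_type,
                "first_lba": first_lba,
                "sectors": sector_count,
            })
    return entries


def partition_scheme(disk: bytes) -> str:
    """Classify a disk image as "GPT" or "MBR" from its first two sectors."""
    entries = parse_mbr(disk[:SECTOR_SIZE])
    protective = any(e["type"] == 0xEE for e in entries)
    gpt_header = disk[SECTOR_SIZE:SECTOR_SIZE + 8] == b"EFI PART"
    return "GPT" if protective and gpt_header else "MBR"
```

For example, with a hypothetical image file you might call `partition_scheme(f.read(2 * SECTOR_SIZE))` on a file opened in binary mode. Reading a physical disk directly needs administrator rights and OS-specific device paths, which are outside the scope of this sketch.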
Damage to these records results in boot errors such as “Boot device not found,” “OS not found,” or “Invalid drive specification.” If this problem has been caused by malware, the best way to resolve it is to use the recovery (boot) disk option in your antivirus software.
If you don’t have a recovery disk created by the antivirus software, you can try the repair options that come with the OS setup disk.
Troubleshoot OS Errors and Crash Screens
- If a boot device is located, the code from the boot sector on the selected device is loaded into memory and takes over from the system firmware.
The boot sector code loads the rest of the operating system files into system memory. Error messages received after this point can usually be attributed to software or device driver problems rather than physical issues with hardware devices.
- When a Windows system suffers a critical error, it displays a blue screen of death (BSOD). This typically indicates a system memory fault, a hardware device or driver fault, or corruption of operating system files.
The error should also be written to the System log with BugCheck as the source. The system also generates a memory dump that you can forward for analysis if you have a support contract.
- The blue screen is specific to Windows. A macOS system that suffers a catastrophic process failure shows a spinning pinwheel (of death), also called a spinning wait cursor. Linux displays a kernel panic or a “Something has gone wrong” message.
Troubleshoot Drive Availability
A hard disk drive (HDD) is most likely to fail due to mechanical problems either in the first few months of operation or after a few years.
A solid-state drive (SSD) is typically more reliable but also has a maximum expected lifetime. With any fixed disk, sudden loss of power can cause damage and/or file corruption, especially if power loss occurs in the middle of a write operation.
If you notice any drive failure symptoms, try to back up the data and replace the disk as soon as possible to minimize the risk of data loss.
Troubleshoot Drive Availability:
Unusual noise (HDD only)
- A healthy hard disk makes a quiet, low-level noise when accessing the platters. A loud or grinding noise, or any sort of clicking sound, is a sign of a mechanical problem.
Troubleshoot Drive Availability:
No LED status indicator activity
—If disk activity lights are not active, the whole system might not be receiving power, or the individual disk unit could be faulty.
Troubleshoot Drive Availability:
Constant LED activity
- Constant activity, often referred to as disk thrashing, can be a sign that there is not enough system RAM, so the disk is being used continually for paging (virtual memory). It could also be a sign of a faulty software process or a malware infection.
Troubleshoot Drive Availability:
Bootable device not found
- If the PC fails to boot from the fixed disk, either the disk is faulty or there is file corruption.
Troubleshoot Drive Availability:
Missing drives in OS
—If the system boots, but a second fixed disk or removable drive does not appear in tools such as File Explorer or cannot be accessed via the command-line, first check that it has been initialized and formatted with a partition structure and file system. If the disk is not detected by a configuration tool such as Windows Disk Management, suspect that it has a hardware or cable/connector fault.
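The checks for a missing drive follow a fixed order, which the hypothetical helper below encodes. The three boolean inputs are observations you would make yourself in a configuration tool such as Windows Disk Management; the function name and the returned wording are illustrative only.

```python
def triage_missing_drive(detected: bool, initialized: bool, formatted: bool) -> str:
    """Suggest the next step for a second drive that is missing in the OS.

    detected:    the disk appears in a tool such as Windows Disk Management
    initialized: the disk has a partition structure (MBR or GPT)
    formatted:   a partition on it holds a file system
    """
    if not detected:
        return "suspect a hardware or cable/connector fault"
    if not initialized:
        return "initialize the disk with a partition structure"
    if not formatted:
        return "create a partition and format it with a file system"
    return "disk appears prepared; investigate further"
```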
Troubleshoot Drive Availability:
Read/write failure
- This means that when you try to open or save a file, an error message such as “Cannot read from the source disk” is displayed. On an HDD, this is typically caused by bad sectors. A sector can be damaged through power failure or a mechanical fault. If you run a test utility, such as chkdsk, and more bad sectors are found each time the test is run, it is a sign that the disk is about to fail. On an SSD, the cause is one or more bad blocks. SSD flash memory degrades over the course of many write operations. An SSD is manufactured with “spare” blocks and uses wear leveling routines to compensate for this. If the spare blocks are all used up, the drive firmware can no longer compensate for blocks that have failed.
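Two ideas in this card lend themselves to a short sketch: treating a rising bad-sector count across repeated scans as a warning sign, and an SSD running out of spare blocks. Both are simplified models with hypothetical names. Real drive firmware manages its spare blocks internally and does not expose them this way.

```python
from typing import Dict, List


def bad_sectors_growing(counts: List[int]) -> bool:
    """True if each successive scan found more bad sectors than the last.

    counts: totals from repeated runs of a test utility such as chkdsk,
    oldest first. A rising trend suggests the disk is about to fail.
    """
    return len(counts) >= 2 and all(
        later > earlier for earlier, later in zip(counts, counts[1:])
    )


class SpareBlockPool:
    """Toy model of SSD bad-block replacement from a fixed pool of spares."""

    def __init__(self, spares: int) -> None:
        self.spares = spares
        self.remapped: Dict[int, int] = {}  # failed block -> spare index
        self._next_spare = 0

    def retire(self, block: int) -> bool:
        """Remap a failed block; returns False once no spares remain."""
        if self.spares == 0:
            return False  # firmware can no longer compensate
        self.remapped[block] = self._next_spare
        self._next_spare += 1
        self.spares -= 1
        return True
```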
Troubleshoot Drive Availability:
Blue screen of death (BSOD)
- A failing fixed disk or file corruption may cause a particularly severe read/write failure, resulting in a system stop error (a crash screen).
Troubleshoot Drive Reliability and Performance
- Most fixed disks have a self-diagnostic program called Self-Monitoring, Analysis, and Reporting Technology (SMART). SMART can alert the operating system if it detects signs that the drive is failing. If you suspect that a drive is failing or if you experience performance issues such as extended read/write times, run more advanced diagnostic tests on the drive. Most fixed disk vendors supply utilities for testing drives, or there may be a system diagnostics program supplied with the computer system.
- You can also use Windows utilities to query SMART and run manual tests. For performance, they can report statistics such as input/output operations per second (IOPS). If performance under test conditions falls well below the vendor’s baseline measurements, the device itself is likely to be faulty.
- Extended read/write times can also occur because particular sectors (HDDs) or blocks (SSDs) fail (go “bad”). Data loss/corruption means that files stored in these locations cannot be opened or simply disappear. When bad sectors or blocks are detected, the disk firmware marks them as unavailable for use.
- If there is file corruption on a hard disk and no backup, you can attempt to recover data from the device using a recovery utility.
File recovery from an SSD is not usually possible without highly specialized tools.
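The sketch below illustrates two of these checks under stated assumptions. It flags nonzero raw values for a few SMART attributes that diagnostic tools commonly highlight (IDs 5, 187, 197 and 198; exact meanings can vary by vendor). It also computes IOPS as operations divided by elapsed seconds and compares the result with a vendor baseline. Collecting the raw values and timings is left to a real diagnostic tool, and the 20% tolerance is an assumed margin.

```python
from typing import Dict, List

# Commonly watched SMART attribute IDs (meanings can vary by vendor).
WATCHED_SMART_IDS: Dict[int, str] = {
    5: "Reallocated sectors",
    187: "Reported uncorrectable errors",
    197: "Current pending sectors",
    198: "Offline uncorrectable sectors",
}


def smart_warnings(raw_values: Dict[int, int]) -> List[str]:
    """Name each watched attribute whose raw value is above zero."""
    return [
        name
        for attr_id, name in WATCHED_SMART_IDS.items()
        if raw_values.get(attr_id, 0) > 0
    ]


def iops(operations: int, seconds: float) -> float:
    """Input/output operations per second over a timed test run."""
    return operations / seconds


def below_baseline(measured_iops: float, vendor_iops: float, tolerance: float = 0.2) -> bool:
    """True if measured IOPS falls more than `tolerance` below the baseline."""
    return measured_iops < vendor_iops * (1 - tolerance)
```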
Troubleshoot RAID Failure
Redundant Array of Independent Disks (RAID) is usually configured as a means of protecting data against the risk of a single fixed disk failing. The data is either copied to a second drive (mirroring) or additional information is recorded across multiple drives so that the array can recover from a device failure (parity). RAID can be implemented using hardware controllers or features of the operating system.
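Parity is usually computed with the bitwise XOR operation, as in RAID 5. The parity block is the XOR of the data blocks in a stripe, so XORing the surviving blocks with the parity block regenerates a lost one. A minimal sketch, assuming equal-length blocks and a single missing block; the function names are illustrative.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_block(data_blocks: List[bytes]) -> bytes:
    """Parity is the XOR of all data blocks in a stripe."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """Recover the one missing data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    p = parity_block(stripe)
    print(rebuild_missing([stripe[0], stripe[2]], p) == stripe[1])  # True
```

Single XOR parity can only rebuild one missing block per stripe, which is why a second failure takes the whole array down.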
There are two main scenarios for RAID failure: failure of a device within the array and failure of the whole array or volume.
If one of the underlying devices fails, the volume will be listed as “degraded,” but the data on the volume will still be accessible and it should continue to function as a boot device if so configured.
If a volume is not available, either more than the tolerated number of disks has failed, or the controller has failed. If the boot volume is affected, then the operating system will not start. If too many disks have failed, you will have to turn to the latest backup or try to use file recovery solutions. If the issue is controller failure, then data on the volume should be recoverable, though there may be file corruption if a write operation was interrupted by the failure. Either install a new controller or import the disks into another system.
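Whether an array is "degraded" or "failed" depends on how many disk failures its RAID level tolerates. The sketch below hard-codes the standard tolerances for RAID 0, 1, 5 and 6. The status strings and function name are hypothetical; a real controller or OS utility reports status in its own terms.

```python
from typing import Callable, Dict

# Failures each level can survive for an array of n disks (standard levels
# only; nested levels such as RAID 10 depend on which disks fail).
TOLERATED: Dict[str, Callable[[int], int]] = {
    "RAID 0": lambda n: 0,      # striping, no redundancy
    "RAID 1": lambda n: n - 1,  # every disk holds a full mirror
    "RAID 5": lambda n: 1,      # single parity
    "RAID 6": lambda n: 2,      # double parity
}


def volume_status(level: str, disks: int, failed: int, controller_ok: bool = True) -> str:
    """Classify an array as optimal, degraded, or failed."""
    if not controller_ok:
        # Disks may be intact: replace the controller or import them elsewhere.
        return "failed (controller)"
    if failed == 0:
        return "optimal"
    if failed <= TOLERATED[level](disks):
        return "degraded"  # data still accessible
    return "failed"  # restore from backup or try file recovery
```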
If the failure affects the boot process, use the RAID configuration utility to verify its status. If you cannot access the configuration utility, then the controller itself is likely to have failed.