Hardware Troubleshooting Flashcards
How do you list the hardware seen by the computer?
- lspci (list PCI devices)
- lsblk (list block devices; RAM disks are not shown by default)
- lscpu (lists CPU info)
- lsscsi (lists SCSI info)
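The four tools above can be run in one pass. This is a minimal sketch, assuming a POSIX shell; the function name hw_inventory is hypothetical, and any tool that is not installed is reported and skipped.

```shell
# hw_inventory is a hypothetical helper name, not a standard command.
hw_inventory() {
    for tool in lspci lsblk lscpu lsscsi; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $tool =="
            # Keep going even if one tool fails (e.g. no PCI bus in a container)
            "$tool" || echo "($tool exited with an error)"
        else
            echo "== $tool: not installed =="
        fi
    done
}

hw_inventory
```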
How do you check the status of running daemons?
systemctl status
# systemctl start [name.service]
# systemctl stop [name.service]
# systemctl restart [name.service]
# systemctl reload [name.service]
$ systemctl status [name.service]
# systemctl is-active [name.service]
$ systemctl list-units --type service --
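To check several daemons at once, systemctl is-active can be wrapped in a loop. This is a sketch only: check_services is a hypothetical helper name, and the service names in the example call are placeholders to replace with your own units.

```shell
# check_services is a hypothetical helper name, not part of systemd.
check_services() {
    rc=0
    for svc in "$@"; do
        # is-active --quiet prints nothing and reports the state via its exit status
        if systemctl is-active --quiet "$svc"; then
            echo "$svc: active"
        else
            echo "$svc: NOT active"
            rc=1
        fi
    done
    # Non-zero if at least one service was not active
    return "$rc"
}

# Example call (service names are placeholders):
# check_services sshd.service crond.service
```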
Where do you look for error messages?
- dmesg (display message/driver message)
dmesg displays messages from the kernel ring buffer. You can view messages about specific devices like this:
# dmesg | grep -i memory
# dmesg | grep -i dma
# dmesg | grep -i usb
# dmesg | grep -i tty
The dmesg messages are grouped into categories called “facilities.” The list of facilities is:
kern: Kernel messages.
user: User-level messages.
mail: Mail system.
daemon: System daemons.
auth: Security/authorization messages.
syslog: Internal syslogd messages.
lpr: Line printer subsystem.
news: Network news subsystem.
We can ask dmesg to filter its output to only show messages in a specific facility. To do so, we must use the -f (facility) option:
sudo dmesg -f daemon
- other logs in /var/log
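The separate dmesg | grep commands above can be folded into a single search. A minimal sketch, assuming a POSIX shell and a grep that supports -E; grep_kmsg is a hypothetical helper name. It reads from standard input, so it works on live dmesg output as well as on a saved log file.

```shell
# grep_kmsg is a hypothetical helper: case-insensitive search for
# several keywords in one pass over whatever is piped in.
grep_kmsg() {
    # Join the arguments with "|" to build an extended regex, e.g. "memory|dma|usb"
    pattern=$(IFS='|'; echo "$*")
    grep -iE "$pattern"
}

# Example calls:
# dmesg | grep_kmsg memory dma usb tty
# grep_kmsg usb tty < /var/log/messages
```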
How do you find IP/network status?
- ifconfig
- ip addr show
- ip link
How do you look into network connectivity issues?
- ping
- traceroute
- nslookup
How do you test RAM?
- memtest86+ (can be installed using yum)
- memtester
How do you stress test the CPU/RAM?
- “stress” (yum install from the EPEL repo)
What is OpenBMC?
OpenBMC is an open-source, Linux-based firmware stack for BMCs. BMC stands for Baseboard Management Controller, a specialized controller embedded in servers. It often comes in the form of a system-on-chip (SoC) with its own CPU, memory, and storage, and lots of I/O. A BMC connects to sensors to read environmental conditions and to fans to control temperature. It also provides other system management functions, including remote power control, serial over LAN, and monitoring and error logging of the server host CPU and memory.
A utility to check memory controller error detection
- edac-util - The default edac-util report is generated when the program is run without any options. If there are no errors logged by EDAC, this report will display “No errors to report.” to stdout. Otherwise, error counts for each MC, csrow, channel combination with attributed errors are displayed, along with corresponding DIMM labels, if these labels have been registered in sysfs.
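The counters that edac-util reports come from the kernel's EDAC sysfs tree, so they can also be read directly. A rough sketch, assuming the EDAC driver is loaded and exposes per-controller ce_count and ue_count files under /sys/devices/system/edac/mc; edac_counts is a hypothetical helper name, and its optional argument only exists so the base directory can be overridden.

```shell
# edac_counts is a hypothetical helper; the optional argument overrides
# the sysfs base directory.
edac_counts() {
    base=${1:-/sys/devices/system/edac/mc}
    found=0
    for mc in "$base"/mc[0-9]*; do
        # Skip anything that is not a memory controller directory
        [ -r "$mc/ce_count" ] || continue
        found=1
        ce=$(cat "$mc/ce_count")                          # corrected errors
        ue=$(cat "$mc/ue_count" 2>/dev/null || echo "?")  # uncorrected errors
        echo "$(basename "$mc"): corrected=$ce uncorrected=$ue"
    done
    [ "$found" -eq 1 ] || echo "no EDAC memory controllers found under $base"
}

edac_counts
```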
What is IPMI?
Intelligent Platform Management Interface. It tends to work on top of the BMC built into motherboards.
How do you find the make/model of storage drives?
First, you need the device names of your disks; for this you can use df or cat /proc/partitions:
- hdparm -i (device_name) (e.g. /dev/sda)
- lshw -class disk
- smartctl -i /dev/sda
- hwinfo --disk
- ls /dev/disk/by-id
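When none of the tools above is installed, the model string can usually be read from sysfs. A minimal sketch, assuming the kernel exposes a device/model file under /sys/block for each disk (typical for SATA, SAS, and NVMe drives); disk_models is a hypothetical helper name, and its optional argument only overrides the base directory.

```shell
# disk_models is a hypothetical helper; the optional argument overrides
# the sysfs base directory.
disk_models() {
    base=${1:-/sys/block}
    for dev in "$base"/*; do
        # Loop and RAM devices have no device/model file, so they are skipped
        [ -r "$dev/device/model" ] || continue
        # The model string is often padded with trailing spaces; strip them
        model=$(sed 's/[[:space:]]*$//' "$dev/device/model")
        echo "/dev/$(basename "$dev"): $model"
    done
}

disk_models
```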
How do you check network card/device status?
- ifconfig
- ip link
- cat /proc/net/dev
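The raw /proc/net/dev table is hard to read at a glance. The sketch below, which assumes the usual layout (two header lines, then one line per interface with receive counters followed by transmit counters), pulls out the byte and error counts. net_dev_summary is a hypothetical helper name.

```shell
# net_dev_summary is a hypothetical helper; the optional argument lets
# you point it at a saved copy of /proc/net/dev.
net_dev_summary() {
    file=${1:-/proc/net/dev}
    awk 'NR > 2 {
        # Separate "eth0:" from the first counter, even when they are fused
        sub(/:/, " ")
        # Fields: 1=iface 2=rx bytes 4=rx errs 10=tx bytes 12=tx errs
        printf "%s rx_bytes=%s rx_errs=%s tx_bytes=%s tx_errs=%s\n", $1, $2, $4, $10, $12
    }' "$file"
}

if [ -r /proc/net/dev ]; then
    net_dev_summary
fi
```

Error counters that keep rising between two runs point at a cabling, driver, or NIC problem on that interface.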
How do you find the make/model of a NIC?
- lspci | egrep -i --color 'network|ethernet'
- lshw -class network
(other “class” options for lshw:
address
bridge
bus
communication
disk
display
generic
input
memory
multimedia
network
power
printer
processor
storage
system
tape
volume
)