5.1 How to Troubleshoot Flashcards
Change management
- Change control
- A formal process for managing change
- Avoid downtime, confusion, and mistakes
- Corporate policy and procedures
- Nothing changes without the process
- Plan for a change
- Estimate the risk associated with the change
- Have a recovery plan if the change doesn’t work
- Test before making the change
- Document all of this and get approval
- Make the change
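The "nothing changes without the process" rule can be sketched as a simple gate. This is a minimal illustration, not a real change-control system; the `ChangeRequest` class, its field names, and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChangeRequest:
    """Hypothetical record of one planned change (field names are illustrative)."""
    description: str
    risk_estimated: bool = False
    rollback_plan: bool = False
    tested: bool = False
    documented: bool = False
    approved: bool = False


def missing_steps(change):
    """Return the checklist items that are still open, in process order."""
    checks = [
        ("estimate the risk", change.risk_estimated),
        ("have a recovery plan", change.rollback_plan),
        ("test before making the change", change.tested),
        ("document the plan", change.documented),
        ("get approval", change.approved),
    ]
    return [name for name, done in checks if not done]


def can_make_change(change):
    """The change goes ahead only when every earlier step is complete."""
    return not missing_steps(change)


if __name__ == "__main__":
    request = ChangeRequest("Replace core switch", risk_estimated=True, rollback_plan=True)
    print(missing_steps(request))
    print(can_make_change(request))
```

Keeping the open steps as a list makes it easy to tell the requester exactly what is still outstanding.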
Identify the problem
- Information gathering
- Get as many details as possible
- Duplicate the issue, if possible
- Identify symptoms - May be more than a single symptom
- Question users - Your best source of details
- Determine if anything has changed
- Who’s in the wiring closet?
- Approach multiple problems individually
- Break problems into smaller pieces
- Back up everything
- You’re going to make some changes
- You should always have a rollback plan
- What else has changed?
- The user may not be aware
- Environmental changes
- Infrastructure changes
- There may be some clues - Check OS log files
- Applications may have log information
Establish a theory
- Start with the obvious
- Occam’s razor applies
- Consider everything
- Even the not-so-obvious
- Make a list of all possible causes
- Start with the easy theories
- And the least difficult to test
- Research the symptoms
- Internal knowledgebase
- Google searches
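One way to turn "make a list of all possible causes" and "start with the easy theories" into a working order is to score each theory and sort. The sketch below is only an illustration; the `Theory` class and its 1 to 5 scales are assumptions, not part of any standard method.

```python
from dataclasses import dataclass


@dataclass
class Theory:
    cause: str
    likelihood: int   # 1 (unlikely) to 5 (obvious); illustrative scale
    test_effort: int  # 1 (trivial to test) to 5 (hard to test); illustrative scale


def rank_theories(theories):
    """Most obvious theories first; among equals, the easiest to test first."""
    return sorted(theories, key=lambda t: (-t.likelihood, t.test_effort))


if __name__ == "__main__":
    candidates = [
        Theory("Failed motherboard", likelihood=2, test_effort=5),
        Theory("Loose power cable", likelihood=5, test_effort=1),
        Theory("Bad power supply", likelihood=3, test_effort=3),
    ]
    for theory in rank_theories(candidates):
        print(theory.cause)
```

The unlikely, hard-to-test theories stay on the list, so they are still considered if the obvious ones fail.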
Test the theory
- Confirm the theory
- Determine next steps to resolve problem
- Theory didn’t work?
- Re-establish new theory or escalate
- Call an expert
- The theory worked!
- Make a plan…
Create a plan of action
• Build the plan
• Correct the issue with a minimum of impact
• Some issues can’t be resolved during production hours
• Identify potential effects
• Every plan can go bad
• Have a plan B
• And a plan C
Implement the solution
- Fix the issue
- Implement during the change control window
- Escalate as necessary
- You may need help from a 3rd party
Verify full system functionality
- It’s not fixed until it’s really fixed
- The test should be part of your plan
- Have your customer confirm the fix
- Implement preventative measures
- Let’s avoid this issue in the future
Document findings
- It’s not over until you build the knowledge base
- Don’t lose valuable knowledge!
- What action did you take?
- What outcome did it have?
- Consider a formal database
- Help desk case notes
- Searchable database
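A searchable record of actions and outcomes can be very small. The following is a minimal sketch using Python's built-in `sqlite3` module; the `findings` table, its columns, and the function names are hypothetical and would differ in a real help desk system.

```python
import sqlite3


def open_knowledge_base(path=":memory:"):
    """Open (or create) the findings database; ':memory:' keeps it temporary."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS findings ("
        " id INTEGER PRIMARY KEY,"
        " problem TEXT NOT NULL,"
        " action TEXT NOT NULL,"
        " outcome TEXT NOT NULL)"
    )
    return conn


def record_finding(conn, problem, action, outcome):
    """Store what the problem was, what action was taken, and what happened."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO findings (problem, action, outcome) VALUES (?, ?, ?)",
            (problem, action, outcome),
        )


def search_findings(conn, keyword):
    """Return (problem, action, outcome) rows that mention the keyword."""
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT problem, action, outcome FROM findings"
        " WHERE problem LIKE ? OR action LIKE ? OR outcome LIKE ?"
        " ORDER BY id",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()


if __name__ == "__main__":
    kb = open_knowledge_base()
    record_finding(kb, "PC shuts down under load", "Replaced CPU fan", "No shutdowns after 48 hours")
    print(search_findings(kb, "fan"))
```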
Unexpected shutdowns
- No warning, black screen
- May have some details in your Event Viewer
- Heat-related issue
- High CPU or graphics, gaming
- Check all fans and heat sinks
- BIOS may show fan status and temperatures
- Failing hardware
- Has anything changed?
- Check Device Manager, run diagnostics
- Could be anything
- Eliminate what’s working
Lockups
- System completely stops
- Completely. Usually not much in the event log
- Similar to unexpected shutdowns
- Check for any activity
- Hard drive, status lights, try Ctrl-Alt-Del
- Update drivers and software patches
- Has this been done recently?
- Low resources
- RAM, storage
- Hardware diagnostics may be helpful
POST (Power On Self Test)
• Test major system components before booting the operating system
• Main systems (CPU, CMOS, etc.)
• Video
• Memory
• Failures are usually noted with beeps and/or codes
• BIOS versions can differ, check your documentation
• Don’t bother memorizing the beep codes
• They’re all different between manufacturers
• Know what to do when you hear them
POST and boot
• Blank screen on boot
• Bad video
• Listen for beeps
• BIOS configuration issue
• BIOS time and settings reset
• Maintained with the motherboard battery
• Replace the battery
• Attempts to boot to incorrect device
• Set boot order in BIOS configuration
• Confirm that the startup device has a valid operating system
• Check for media in a startup device
Continuous reboots
• How far does the boot go before rebooting?
• BIOS only? OS splash screen?
• Bad driver or configuration
• F8, “Last Known Good Configuration”
• Try F8, Safe Mode
• If system starts, disable automatic restarts in System Properties
• Bad hardware
• Try removing or replacing devices
• Check connections and reseat
No power
- No power
- No power at the source
- No power from the power supply
- Get out your multimeter
- Fans spin - no power to other devices
- Where is your fan power connected?
- No POST - bad motherboard?
- Case fans have lower voltage requirements
- Check the power supply output
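Once the multimeter readings are written down, checking the power supply output is simple arithmetic. The sketch below assumes the common ATX positive rails (+3.3 V, +5 V, +12 V) and a tolerance of plus or minus 5 percent; confirm the actual limits in the power supply's documentation. The names `NOMINAL_RAILS`, `TOLERANCE`, and `check_rails` are hypothetical.

```python
# Assumed nominal voltages for the main ATX positive rails.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}

# Assumed tolerance of 5 percent either side of nominal.
TOLERANCE = 0.05


def check_rails(measured):
    """Compare multimeter readings (in volts) with the assumed limits."""
    results = {}
    for rail, nominal in NOMINAL_RAILS.items():
        reading = measured.get(rail)
        if reading is None:
            results[rail] = "not measured"
        elif abs(reading - nominal) <= nominal * TOLERANCE:
            results[rail] = "ok"
        else:
            results[rail] = "out of range"
    return results


if __name__ == "__main__":
    readings = {"+3.3V": 3.31, "+5V": 5.02, "+12V": 11.1}
    for rail, status in check_rails(readings).items():
        print(rail, status)
```

With these assumed limits the +12 V rail may vary by 0.6 V, so a reading of 11.1 V is flagged while the other two rails pass.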
Overheating
- Heat-related issue
- High CPU or graphics, gaming
- Check all fans and heat sinks
- BIOS may show fan status and temperatures
Loud noises
- Computers should hum
- Not grind
- Rattling
- Loose components
- Scraping
- Hard drive issues
- Clicking
- Fan problems
- Pop
- Blown capacitor
Intermittent device failure
- Sometimes it works
- Sometimes it doesn’t
- Bad install
- Check and reseat
- Use all the screws
- Bad hardware
- Poor connection
- Heat and vibration
Indicator lights
- POST codes on the motherboard
- Power
- Link light
- Speed
- Activity
Smoke and burning smell
- Electrical problems
- The smoke makes everything work
- Always disconnect power
- There should never be a burned odor
- Locate bad components
- Even after the system has cooled down
- Replace all damaged components
Crash screens
- Windows Stop Error
- Blue Screen of Death - You don’t want this
- Contains important information
- Also written to event log
- Useful when tracking down problems
- Sometimes more useful for manufacturer support
The spinning ball of death
- The Mac OS X Spinning Wait Cursor
- Feedback that something is happening
- The spin starts, but it never stops
- You never get back control of your computer
- Many possible reasons
- Application bug, bad hardware, slow paging to disk
- Restart the computer
- There may be details in the console logs
Log entries
- Windows
- Event Viewer
- Boot logs
- System Configuration
- C:\Windows\ntbtlog.txt
- Linux
- Individual application logs - /var/log
- Mac OS X - Utilities / Console.app
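Reading through a directory of log files can be partly automated. The following is a minimal sketch that scans `*.log` files in one directory for a few keywords; the keyword list and the function name `find_suspect_lines` are assumptions, and real log names and formats vary by system and application.

```python
from pathlib import Path

# Illustrative keywords only; tune these to the problem being investigated.
KEYWORDS = ("error", "fail", "critical")


def find_suspect_lines(log_dir, keywords=KEYWORDS):
    """Return (file name, line number, text) for lines containing a keyword."""
    hits = []
    for path in sorted(Path(log_dir).glob("*.log")):
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file (for example, permission denied); skip it
        for number, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()
            if any(word in lowered for word in keywords):
                hits.append((path.name, number, line.strip()))
    return hits


if __name__ == "__main__":
    # /var/log is the Linux location named above; many files there need elevated rights.
    for name, number, text in find_suspect_lines("/var/log"):
        print(f"{name}:{number}: {text}")
```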
Error messages
• The details of an error message can make or break a troubleshooting session
• Write down everything
• Take a picture, make a video
• Train your users
• The error might not make sense
• Write it down anyway
• The Internet will tell you what it means
• Spend your time troubleshooting the right things