This guide is designed to help with troubleshooting and diagnosing common problems with the PR2. For more detailed information, check the PR2 manual or the wiki at ros.org.

If you are looking for an overview of PR2 click here.

If you are looking for a general (non-technical) FAQ on PR2 click here.

1. General

1.1. Runstop Lights

If the red light on your wireless runstop blinks, the runstop's batteries are running low. Replace them with four AA batteries. You'll have to open the runstop case with a small regular screwdriver.

2. Sensors

2.1. Prosilica Camera

Unlike almost every other driver on the PR2, the Prosilica driver isn't robust to a packet drop or lost link during operation. We're hoping we can port the camera to the driver_base framework before dturtle (ticket #3420).

If the camera node drops out, you can restart the node manually. You could also set respawn="true" for the camera node in the launch file that you start the robot with.

However, that's only going to work if you can ping the camera. Try to ping it at "10.68.0.20" and if that doesn't work, file a ticket at WG Support.
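
The reachability check above can be scripted. The following is a minimal sketch, assuming a Linux "ping" that accepts -c (packet count) and -W (reply timeout in seconds). The function names are hypothetical and not part of any PR2 package.

```python
import subprocess

CAMERA_IP = "10.68.0.20"  # Prosilica address given in this section


def ping_command(host, count=3, timeout_s=2):
    """Build a Linux ping command line: -c is the packet count, -W the reply timeout."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def is_reachable(host, run=subprocess.call):
    """Return True if ping exits with status 0, i.e. at least one reply arrived.

    `run` is injectable so the decision logic can be exercised without a network.
    """
    return run(ping_command(host)) == 0
```

On the robot, is_reachable(CAMERA_IP) returning False corresponds to the case where restarting the node will not help and a ticket should be filed.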

It may be helpful to check if the Prosilica camera lights are on. Press the runstop to disable the PR2's motors. Open the "top hatch". The Prosilica camera is on the far right side of the head. You should see two lights near the ethernet jack.

  • Orange - Power
  • Green - Ethernet connection

Report the color of both lights when filing a ticket at Willow Garage support.

3. Computers

3.1. High Load / SIRQ Problem

The PR2 computers can report high load averages and high CPU utilization during normal robot operation. When processes like "sirq-net-rx" report high CPU usage, we call this the "SIRQ Problem". The ticket #3920 contains some of the symptoms and debug information.

If you see this problem, try resetting the wge100_camera drivers on the PR2.

  • Open the dynamic_reconfigure GUI.
  • Select the "pr2_camera_synchronizer" node.
  • Click the "Reset" button at the bottom of the frame to reset the cameras.

3.2. Not responding / Kernel panic

The PR2 runs two computers on a real-time (RT) kernel, on the same network as 7 ethernet cameras and multiple other sensors and devices. Occasionally, the computers on the PR2 lock up, hang, or otherwise don't respond. When we've seen this problem, it has usually been precipitated by high load on one or both CPUs, starting/stopping processes, or lots of network traffic (from a filesystem or cameras).

To mitigate this problem:

  • Throttle down the rates of the cameras if you're not using them. Use the dynamic_reconfigure system to set the default frame rates. This will not only reduce network traffic, but will reduce CPU load from your perception system.
  • "Turn off" or "hibernate" processes when not in use. If your robot is stationary, don't update the costmap.

If this problem occurs, you can let us know by filing a ticket at pr2support.willowgarage.com. If you can, provide a detailed description of what happened before the computers locked up.

We're experimenting with different kernels for the robots, and we hope to find long-term fixes for these problems.

3.3. Turning Off Computers

PR2s running Ubuntu 9.04 (Jaunty) will turn off the red server lights after running "sudo pr2-shutdown". PR2s running Ubuntu 10.04 (Lucid) do not turn off the red lights on shutdown. In either case, you can safely power off the PR2 60 seconds after you hear it give the shutdown beeps.

4. Diagnostics / Console Messages

4.1. Computers: Incorrect number of CPU cores

PR2 computers can boot without all 8 CPU cores becoming enabled. Willow Garage is working on this problem (see #43). If this happens, you will see an error message in the diagnostics saying "Incorrect number of CPU cores". pr2-systemcheck will also show an error.

If your PR2 does not have all 8 CPU cores enabled, reboot your robot. If the problem continues, file a ticket with Willow Garage Support.

4.2. NTP Errors

Errors and warnings about NTP (Network Time Protocol) in the diagnostics may mean that the computers on your robot are out of sync. The PR2 uses chrony to keep the computer clocks in sync, and in sync with the basestation.

Each computer has two times: the time chrony thinks it is, and the system time. When they disagree, chrony slowly slews the system time until they match again. When you run "ntpdate -q <server>", you compare that host's chrony time with the local system time. Running "ntpdate -q <hostname>" with the computer's own hostname lets you verify that its chrony time and system time match.

If you see an error with the NTP diagnostics, run

sudo service chrony restart

on c1 and/or c2.

If you see an error with the NTP status for "10.68.255.1", this probably means that your robot cannot contact the basestation. That diagnostic status item monitors the clock between the basestation and the robot. If you do not have contact with your basestation, you can ignore this error.
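
To compare the offsets from the "ntpdate -q" check described in this section without reading them by eye, the output can be parsed. This is a rough sketch: it assumes the usual "server <address>, stratum <n>, offset <seconds>, delay <seconds>" lines that ntpdate prints, and the 10 ms tolerance is an arbitrary illustrative value, not a PR2 specification. The function names are hypothetical.

```python
import re

# Assumed ntpdate -q line format:
#   server 10.68.255.1, stratum 3, offset -0.000412, delay 0.02573
SERVER_LINE = re.compile(
    r"server\s+(\S+),\s+stratum\s+\d+,\s+offset\s+([-+]?\d+(?:\.\d+)?),\s+delay"
)


def parse_offsets(ntpdate_output):
    """Return {server: offset in seconds} for every 'server ...' line found."""
    return {m.group(1): float(m.group(2)) for m in SERVER_LINE.finditer(ntpdate_output)}


def clocks_agree(ntpdate_output, tolerance_s=0.010):
    """True if at least one server answered and every offset is within tolerance.

    tolerance_s is an illustrative threshold chosen for this sketch.
    """
    offsets = parse_offsets(ntpdate_output)
    return bool(offsets) and all(abs(v) <= tolerance_s for v in offsets.values())
```

An empty result from parse_offsets means no server line was recognized, which clocks_agree treats as "not verified" rather than as agreement.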

4.3. /dev/sequencer Warning

The warning below is a warning from the sound driver:

open /dev/sequencer or /dev/snd/seq: No such file or directory

The message is printed to stdout after launching /etc/ros/robot.launch. The warning looks benign, and does not cause other problems.

Ticket in ros-pkg: #3570.

4.4. Wifi Status (ddwrt): No Updates

If you see error messages from "Wifi Status" in the diagnostics, this is probably a configuration problem with the WRT610n router on the PR2.

In order for "Wifi Status" to be properly updated in the diagnostics, the router must be able to talk to the wifi_ddwrt node. Check the ROS console for any error messages.

If you have changed the password for the WRT610n router, you will need to update the password parameter for the wifi_ddwrt node for it to work properly. See the package documentation for wifi_ddwrt.

Try running sudo pr2-systemcheck to see if you have any error messages from your router.

5. Basestation

5.1. Unable to upgrade libsasl2-2

There is a known problem during some upgrades on Ubuntu 9.04 (Jaunty) that causes apt-get upgrades to fail.

$ sudo apt-get -f dist-upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  libsasl2-2 libsasl2-modules
2 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0B/284kB of archives.
After this operation, 0B of additional disk space will be used.
Do you want to continue [Y/n]?
E: Internal Error, Could not perform immediate configuration (2) on libsasl2-2

If you see this, add the following line to /etc/apt/apt.conf:

APT::Immediate-Configure "false";

More information here: https://bugs.launchpad.net/ubuntu/+source/cyrus-sasl2/+bug/194140 (comment #18)

5.2. Cannot Find libCg.so

The library libCg.so is part of the "nvidia-cg-toolkit" package. For some reason, this library can fail to install during routine upgrades. If rviz won't start because "libCg.so" is missing, run:

sudo apt-get install nvidia-cg-toolkit --reinstall

6. Errors from PR2 Systemcheck

The PR2 Systemcheck tool performs a self-test, or pre-startup check, of the PR2 and its components. It is designed to check that all devices are plugged in and configured properly, and that the computers are operational.

To run systemcheck, open a terminal on your robot and run:

sudo pr2-systemcheck

If you have a problem with pr2-systemcheck that is not listed here, file a ticket.

6.1. SKIPPED Checks

When the robot software is running, not all checks can be run. Some checks require a lock on a device, for example. Run "sudo robot stop" to stop all ROS processes before running sudo pr2-systemcheck.

If you do not run pr2-systemcheck as root, some checks will not run because they don't have the proper permissions.

6.2. Not all WGE100 cameras found

The script checkwge100.sh may report a missing camera error:

Running 'c1/checkwge100.sh'     OK
 Camera name://narrow_stereo_l is missing

This is a known problem with systemcheck. Occasionally, the camera check script will not wait long enough for the cameras to respond and report that a camera is missing.

Note that the checkwge100.sh script reports OK. If it reports OK, that means it has found all cameras, and you can ignore the missing camera.

If the camera does not respond and pr2-systemcheck reports FAIL, file a ticket.
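
The rule above (an OK status outranks a "missing" line, a FAIL does not) can be written down as a small parser. This is a sketch only, assuming the systemcheck output looks like the sample lines shown in this guide. The function name and the returned strings are hypothetical.

```python
def camera_check_action(systemcheck_output):
    """Return (action, missing_cameras) for the checkwge100.sh part of the output.

    Assumes the status word (OK or FAIL) appears on the line naming the script,
    and that missing cameras are listed as 'Camera <name> is missing'.
    """
    status = None
    missing = []
    for line in systemcheck_output.splitlines():
        words = line.split()
        if any("checkwge100.sh" in w for w in words):
            if "FAIL" in words:
                status = "FAIL"
            elif "OK" in words:
                status = "OK"
        elif len(words) >= 4 and words[0] == "Camera" and words[-2:] == ["is", "missing"]:
            missing.append(words[1])
    if status == "FAIL":
        return "file a ticket", missing
    if status == "OK" and missing:
        return "ignore", missing
    return "nothing to do", missing
```

With the sample output from this section, the function returns "ignore" together with the name of the camera that was slow to respond.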

6.3. Unable to contact c2

Running scripts for c2
Testing ssh connection to c2... FAIL
 ssh: connect to host c2 port 22: Connection refused

If c2 is not up, systemcheck will fail. This can be caused by several problems.

  • c2 does not have power. Check to make sure both lights are on.
  • c2 cannot contact c1. Try pinging c1.
  • c1's network interfaces are malfunctioning. Make sure all interfaces are up.

Other, more complicated, problems could be:

  • c2 did not netboot properly off of c1
  • c2 has a hardware malfunction.

If c2 does not come up, you can try the following method to reset it:

  • Shut down c1. Use "sudo shutdown now"
  • Toggle the main breaker of the robot to disable power
  • After 15 seconds, enable power to the robot

If c2 still does not come up, file a ticket.

6.4. Interface is not Up

 * Running 'c1/checkifacec1.sh'...                 FAIL (Return code: 2)
  lan1 is not up

Systemcheck makes sure that all network interfaces on both c1 and c2 are UP. If any interface is not up, enable it manually:

sudo ifdown lan1; sudo ifup lan1

If that does not work, file a ticket.
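
Before and after cycling an interface, its state can be read directly from the kernel. The sketch below is an illustration only: it reads the standard Linux sysfs file /sys/class/net/<interface>/operstate, which is not necessarily how checkifacec1.sh performs its check. Some drivers report "unknown" in that file even when the link works, so treat the result as a hint. The function names are hypothetical.

```python
import os


def interface_state(name, sysfs_root="/sys/class/net"):
    """Return the kernel's operstate for an interface, or 'missing' if it has none."""
    path = os.path.join(sysfs_root, name, "operstate")
    try:
        with open(path) as handle:
            return handle.read().strip()
    except (IOError, OSError):
        return "missing"


def interfaces_not_up(names, sysfs_root="/sys/class/net"):
    """Return the interfaces from `names` whose operstate is anything other than 'up'."""
    return [n for n in names if interface_state(n, sysfs_root) != "up"]
```

For example, interfaces_not_up(["lan1"]) returning an empty list after "sudo ifup lan1" suggests the interface came back.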

6.5. Check EtherCAT Failure

 * Running 'c1/checkethercat.sh'...                FAIL (Return code: 1)
  On torso_lift_motor :
    On port p1 :
      expected : Hub (head)
      found    : *nothing*

The checkethercat.sh script uses a tool called ec-diag to make sure the PR2's EtherCAT chain is functioning. It verifies that all MCBs (motor controller boards) are present and properly configured.

If you see a failure of checkethercat.sh:

  • Make sure your MCB power is on. All breakers must be "ON" or "Standby" (green or yellow)
  • Make sure all appendages to your robot are present and there is no custom hardware installed.
  • Record the error message so you can file a ticket.

To attempt to clear this error, you can try power cycling the MCBs. Using your pr2_dashboard, you can disable the power to all MCBs (red), and then re-enable the power. After you reset the power, you can stop all robot software using "sudo robot stop" and re-run pr2-systemcheck.

If you have a problem with checkethercat.sh, file a ticket.

6.6. Check Projector Firmware

The script checkprojectorfirmware.sh checks the projector board by querying the etherCAT chain on the PR2.

If the "checkprojectorfirmware.sh" script fails, follow the procedures above for the "Check EtherCAT Failure".

7. Motors Halted / Warnings

For safety purposes, the realtime loop (pr2_etherCAT) will disable all motors when almost any type of error is detected. It will also report warnings for certain anomalous conditions.

One or more EtherCAT Devices are reporting Safety Lockout. When an EtherCAT device detects certain internal error conditions, it will shut off its motor output and flag an error to higher level software. The motor can only be re-enabled by sending a special command (ResetMotors) once the condition causing the error has cleared.

The Safety Lockout is specific to the EtherCAT device that reports the condition. However, there are cases where multiple EtherCAT devices will go into Safety Lockout at the same time.

There are two fields that will be useful in determining the cause of a safety disable:

  • Safety Disable Status

  • Safety Disable Status Hold

Safety Disable Status provides an indicator of what conditions are currently occurring. Safety Disable Status Hold shows what conditions caused a safety disable. Safety Disable Status Hold will hold its value until the safety disable is manually reset (using ResetMotors).

Why both values? Safety Disable Status Hold is the best way to determine the cause of a safety disable that has already occurred. Safety Disable Status is useful for checking whether the cause of a safety disable is still occurring. If the cause of the safety disable is still present, using ResetMotors will not work because the device will immediately go back into safety disable.

In the following screenshot, Safety Disable Status Hold lists the cause of a safety disable as UNDERVOLTAGE. However, Safety Disable Status does not show UNDERVOLTAGE. This means that the device was UNDERVOLTAGE, but no longer is. Using ResetMotors will clear the safety disable.

MCB is undervoltage

In the following screenshot, both Safety Disable Status Hold and Safety Disable Status show UNDERVOLTAGE. This means the device is still UNDERVOLTAGE. ResetMotors will not work because the device will immediately go back into safety disable.

MCB is undervoltage

There are 5 different causes of a safety disable. To figure out the cause, look at the Safety Disable Status Hold of the EtherCAT device. After a safety disable has occurred, this field usually contains DISABLED plus one or more of the keywords listed below, which describe the cause.

  • UNDERVOLTAGE
  • OVER_CURRENT
  • BOARD_OVER_TEMP
  • HBRIDGE_OVER_TEMP
  • WATCHDOG
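
The comparison between the two fields can be expressed as a small decision function. This is a sketch under one assumption: that each field is available as a string of keywords separated by spaces or commas (the exact formatting in the diagnostics may differ). The function names are hypothetical.

```python
CAUSES = ("UNDERVOLTAGE", "OVER_CURRENT", "BOARD_OVER_TEMP", "HBRIDGE_OVER_TEMP", "WATCHDOG")


def causes_in(field):
    """Return the cause keywords found in a status field, in the order of CAUSES."""
    words = set(field.replace(",", " ").split())
    return [cause for cause in CAUSES if cause in words]


def reset_advice(status, status_hold):
    """Compare Safety Disable Status with Safety Disable Status Hold.

    Hold says what caused the disable; Status says what is happening right now.
    """
    held = causes_in(status_hold)
    active = causes_in(status)
    if not held:
        return "no safety disable cause recorded"
    if active:
        return "still active: %s; ResetMotors will not work yet" % ", ".join(active)
    return "cleared: %s; ResetMotors will clear the safety disable" % ", ".join(held)
```

For instance, reset_advice("DISABLED", "DISABLED, UNDERVOLTAGE") reports that the undervoltage has cleared, while passing "DISABLED, UNDERVOLTAGE" for both fields reports that it is still active.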

7.1. Undervoltage Lockout

This is caused by the high voltage supply dropping below 24V. The high voltage supply provides power to all of the EtherCAT MCBs (Motor Control Boards). The EtherCAT hubs are powered off a separate 12 volt supply and will not be affected by an undervoltage lockout. Please note that the undervoltage safety disable is not caused by the high voltage supply being turned off completely; that will cause a different set of errors.

It's possible to guess what caused the undervoltage event by looking at the devices that were affected by it.

7.1.1. Runstop

If all EtherCAT devices report an undervoltage safety disable, the likely culprit is the runstop.

Pushing the runstop button or wireless runstop causes all MCB supply voltages to drop to approximately 18 volts. This is used as a signal to the motor control boards to disable the motors. When the runstop is engaged, all motor control boards will report a safety disable caused by undervoltage. EtherCAT hubs run off a separate 12 volt supply, so they will not report a safety disable. In some cases the wireless runstop will run out of battery power; this has the same effect as engaging the runstop.

The pr2_dashboard provides information on whether the runstop button or wireless runstop is engaged.

7.1.2. Some Motors are Undervoltage

The high voltage power for the robot is split into 3 different parts.

  • The right arm
  • The left arm
  • The base, body, and spine.

The power for the MCBs can be controlled by commands from the computer. It's possible that the power was inadvertently shut off by one of these commands. It is also possible that the power board detected too much current draw (a short circuit) on a single supply and lowered the voltage to 18V.

7.1.3. One Motor Undervoltage

There is definitely something wrong with the hardware. First check for a loose power connector. If there is none, the EtherCAT MCB may be damaged.
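
The three undervoltage cases in this section (all devices, some devices, one device) can be summarized in a short triage function. This is only a sketch of the reasoning in this section. The function name is hypothetical, and the caller is assumed to supply the list of MCB device names, since hubs do not report undervoltage.

```python
def undervoltage_hint(undervoltage_devices, all_mcbs):
    """Suggest where to look first, based on how many MCBs report UNDERVOLTAGE."""
    all_mcbs = set(all_mcbs)
    affected = set(undervoltage_devices) & all_mcbs
    if not affected:
        return "no MCB reports undervoltage"
    if affected == all_mcbs:
        return "all MCBs: check the runstop button and wireless runstop batteries"
    if len(affected) == 1:
        return "one MCB: check for a loose power connector or a damaged MCB"
    return "some MCBs: one power circuit was switched off or tripped on over-current"
```

The "some MCBs" case does not check whether the affected devices actually share one of the three power circuits; a fuller version would need a device-to-circuit map, which this guide does not provide.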

7.2. Over Current

Each EtherCAT MCB measures its own supply current. If an MCB detects that it is pulling too much current, it will shut down the motors as a safety precaution. Either the MCB is damaged, or something is causing a short circuit on the MCB itself. Sometimes metal filings, a lost screw, or wire strands can cause a short circuit on the MCB. If a loose piece of metal is causing a short on the MCB, please replace the MCB and put a note of this in the inventory system. The short circuit can cause damage to the MCB electronics that leads to future failures (even if the MCB currently works).

Since the MCB output to the motors is current controlled, a short circuit across the motor inputs is unlikely to cause this type of problem.

7.3. Board over Temperature or Bridge over Temperature

There are two temperature sensors on the MCB. One is close to the power electronics (Bridge), and one is farther away (Board).

This condition could occur if the robot is being used in an extremely warm environment.

7.4. Watchdog

A WATCHDOG lockout means that the robot's computers lost contact with the motor controller boards for more than 0.1 s. This loss of contact can indicate a loose cable or connection between two MCBs, or it could mean that the robot's computers glitched or malfunctioned.

This condition can be fixed by using the "Reset Motors" button on the pr2_dashboard.
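
To make the 0.1 second condition concrete, the sketch below scans a list of timestamps at which contact with the MCBs succeeded and reports every gap longer than the limit. It only illustrates the timing rule; where and how the real watchdog is implemented is not covered here, and the function name is hypothetical.

```python
WATCHDOG_LIMIT_S = 0.1  # contact gap described in this section


def watchdog_gaps(contact_times, limit_s=WATCHDOG_LIMIT_S):
    """Return (start, end) pairs of consecutive contact times more than limit_s apart.

    contact_times is a list of timestamps in seconds, in increasing order.
    """
    pairs = zip(contact_times, contact_times[1:])
    return [(start, end) for start, end in pairs if end - start > limit_s]
```

A log in which this function finds gaps around the time of a lockout points toward the computers or the control loop rather than toward a single cable.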

7.4.1. Some EtherCAT Devices in Watchdog

This is likely a cable or connector problem. File a ticket.

7.4.2. All EtherCAT Devices in Watchdog

If all devices are in WATCHDOG, this usually means that the realtime control loop has glitched. Check the diagnostics and ROS console for any error messages. Heavily loaded CPUs can cause the realtime control loop to glitch.

If this problem persists, file a ticket.

7.5. Accelerometer Sampling Frequency

We've noticed that the gripper motor controller boards (MCBs) can report a "Bad Accelerometer Sampling Frequency" warning. This means that the accelerometer built into the MCB isn't reporting at 3 kHz.

The warning will go away once you power cycle the MCBs. If you're not using the accelerometers, then the problem won't affect any other system in your robot.

If you are using the accelerometers, there is more troubleshooting information available here.

7.6. Other Motor Problems

When pr2_etherCAT loses contact with the MCBs, it may print out a message like:

low_level_output: Cannot Send : Resource temporarily unavailable
low_level_output: Cannot Send : Resource temporarily unavailable
low_level_output: Cannot Send : Resource temporarily unavailable
low_level_output: Cannot Send : Resource temporarily unavailable

This generally means the connection from the ecat0 port on c1 to the MCBs is broken. If you see this message, file a ticket at Willow Garage Support.

8. Batteries

8.1. Error: Batteries report "No Good"

The "No Good" message means the batteries are in a lockout/error state. When a battery is drained too low, it can disable itself to prevent further damage, and enable the "No Good" state.

In order to reset this message, the ocean battery server needs to be completely power cycled.

If you see this message on a PR2, file a ticket.

8.2. Error: Battery Report "Stale updates" or "No updates"

The batteries periodically publish their status to the ocean battery driver. If they go too long before publishing status, the battery driver will report a warning or error. The timeouts for these warnings and errors can be adjusted with the ~lag_timeout and ~stale_timeout parameters to the ocean_battery_driver.

If a battery reports this message, and the timeout is sufficiently long (over 60 seconds, for most purposes), this probably means a connection problem with the battery.

At the end of the diagnostic message for each battery, they report "Time since update (s)". If this value is "N/A", this means the battery has not updated, and is probably disconnected.
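
The timeout logic can be sketched as follows. This is an illustration, not the ocean_battery_driver code: it assumes that exceeding ~lag_timeout produces the warning and exceeding ~stale_timeout produces the error, and the function name and returned strings are hypothetical.

```python
def battery_update_level(time_since_update, lag_timeout, stale_timeout):
    """Classify a battery's 'Time since update (s)' value.

    time_since_update is the reported value: a number of seconds, or "N/A".
    Assumption: lag_timeout maps to a warning and stale_timeout to an error.
    """
    if time_since_update == "N/A":
        return "error: never updated, battery is probably disconnected"
    age_s = float(time_since_update)
    if age_s > stale_timeout:
        return "error: no updates"
    if age_s > lag_timeout:
        return "warning: stale updates"
    return "ok"
```

With illustrative timeouts of 60 and 120 seconds, a battery last heard from 75 seconds ago would be classified as a warning.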

8.2.1. Fix/Mitigation

If your battery is not updating, file a support request. This condition is not user serviceable, and usually indicates a bad battery.

8.3. Battery Temperature

What happens if your batteries report "High Temperature"?

8.3.1. Background

The PR2 battery system reports a warning in the diagnostics if the temperature goes above 50C, which is the maximum recommended operational temperature of the batteries.

The batteries are exothermic when discharging, and tend to get hottest when they are near minimum charge. During a work day, when your PR2 is constantly charging and discharging, the temperature of the batteries can slowly climb. If your lab or workspace is warm (above 25C) you are much more likely to see this problem.

After the battery temperature goes above approximately 54C, the batteries will not charge until the battery temperature drops below 46C. This condition is reported in the diagnostics with the message "Charge Inhibited, High Temperature".
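
The charging behavior described here is a hysteresis: charging stops above one temperature and resumes only below a lower one. A minimal sketch, using the approximate thresholds quoted in this section (the function names are hypothetical, and the real battery firmware may behave differently near the thresholds):

```python
WARN_ABOVE_C = 50.0      # diagnostics warning threshold
INHIBIT_ABOVE_C = 54.0   # approximate temperature at which charging stops
RESUME_BELOW_C = 46.0    # charging resumes once the battery is cooler than this


def high_temperature_warning(temp_c):
    """True if the diagnostics would warn about battery temperature."""
    return temp_c > WARN_ABOVE_C


def charge_inhibited(previously_inhibited, temp_c):
    """Return the new charge-inhibit state given the previous state and temperature."""
    if temp_c > INHIBIT_ABOVE_C:
        return True
    if temp_c < RESUME_BELOW_C:
        return False
    # Between the two thresholds the previous state is kept (hysteresis).
    return previously_inhibited
```

This is why a battery at 50C may or may not be charging: it depends on whether it reached that temperature while heating up or while cooling down.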

8.3.2. Fix/Mitigation

If your batteries are reporting that they are too hot, then you need to reduce the load on the batteries. Generally, it is best to plug the robot in as soon as possible. It may take several hours for the batteries to cool down and charge.

The power board fan will go to 100% power after your batteries reach 46C to help cool the batteries down. Make sure your main fan filter is clean and free of debris.

PR2Wiki: FAQ (last edited 2011-03-07 16:39:06 by KevinWatts)