Eucalyptus cloud admins are encouraged to consult the Known Bugs page before diving into the investigation of unexpected behavior.
If an administrator ever needs to stop/start a Eucalyptus front-end because of a configuration change, or if the machine on which the front-end is running reboots unexpectedly, the administrator must terminate all running instances in the system before bringing Eucalyptus back online. (It is possible to restart the cloud controller using /etc/init.d/eucalyptus restart on the head-node without affecting the rest of the system, but then some of the configuration is not reloaded. Doing stop followed by start on the head-node will reload the configuration, but will also destroy the virtual network setup among the running VMs, making them inaccessible.)
If the restart is planned, the administrator can use the client tools to terminate all users' instances before stopping, reconfiguring, and starting Eucalyptus. If the restart was unplanned (the front-end machine crashed), the admin can either start Eucalyptus and immediately terminate all running instances, or manually stop all Eucalyptus components, destroy all running Xen instances using 'xm shutdown' or 'xm destroy' on the nodes, and then start all Eucalyptus components again.
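For the unplanned case, the per-node cleanup can be prepared with a small script. The following is only a sketch under stated assumptions: it assumes the usual column layout of 'xm list' output (a header row starting with 'Name', then one row per domain with the domain name in the first column), the helper names guest_domains and destroy_commands are hypothetical, and the instance name in the sample is made up. The script only builds the 'xm destroy' commands and does not run them. Review the list before executing anything, since a node may host Xen domains that were not started by Eucalyptus.

```python
def guest_domains(xm_list_output):
    """Return domain names from 'xm list' output, skipping the header and Domain-0."""
    names = []
    for line in xm_list_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the privileged control domain.
        if not fields or fields[0] in ("Name", "Domain-0"):
            continue
        names.append(fields[0])
    return names


def destroy_commands(xm_list_output):
    """Build one 'xm destroy <name>' command per guest domain (nothing is executed)."""
    return ["xm destroy " + name for name in guest_domains(xm_list_output)]


if __name__ == "__main__":
    # Hypothetical sample; on a node you would paste or capture real 'xm list' output.
    sample = (
        "Name          ID   Mem VCPUs  State   Time(s)\n"
        "Domain-0       0  1024     2  r-----    512.3\n"
        "i-3F2A0712     4   128     1  -b----     12.9\n"
    )
    for command in destroy_commands(sample):
        print(command)
```

Building the command list first, rather than calling 'xm destroy' directly, leaves the admin a chance to confirm that every listed domain really is a Eucalyptus instance.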
If something is not working right with your Eucalyptus installation, the best first step (after making sure that you have followed the installation/configuration/networking documents faithfully) is to make sure that your cloud is up and running, that all of the components are communicating properly, and that there are resources available to run instances. After you have set up and configured Eucalyptus, set up your environment properly with your admin credentials (source eucarc), and use the following command to see the 'status' of your cloud:
ec2-describe-availability-zones verbose
You should see output similar to the following:
AVAILABILITYZONE cluster <hostname of your front-end>
AVAILABILITYZONE |- vm types   free / max   cpu   ram  disk
AVAILABILITYZONE |- m1.small   0128 / 0128   1    128    10
AVAILABILITYZONE |- c1.medium  0128 / 0128   1    256    10
AVAILABILITYZONE |- m1.large   0064 / 0064   2    512    10
AVAILABILITYZONE |- m1.xlarge  0064 / 0064   2   1024    20
AVAILABILITYZONE |- c1.xlarge  0032 / 0032   4   2048    20
AVAILABILITYZONE |- <node-hostname-a> certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
AVAILABILITYZONE |- <node-hostname-b> certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
AVAILABILITYZONE |- <node-hostname-c> certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
AVAILABILITYZONE |- <node-hostname-d> certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
AVAILABILITYZONE |- <node-hostname-e> certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
AVAILABILITYZONE |- <node-hostname-f> certs[cc=true,nc=true] @ Sun Jan 04 15:13:30 PST 2009
...
If the output is empty, missing even the cluster, first make sure that you have added a cluster (consult the 'Configuration' tab of the admin Web interface). Then consult the Cloud Controller (CLC) logs and Cluster Controller (CC) logs, described below, to figure out why the CLC is not getting valid status information from the CC.
If the output of ec2-describe-availability-zones verbose has the cluster mentioned, but is missing some or all of the nodes, then you will need to figure out why the Cluster Controller (CC) is not getting valid status information from the Node Controller (NC) on that node. To do so, start with the CC logs, which may show that CC is trying to connect to an NC on the wrong host or port. If CC is invoking the right NC endpoint, then consult the NC logs.
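The checks described above can be sketched as a small script that reads saved output of ec2-describe-availability-zones verbose and suggests where to look next. This is an illustration only: the function names parse_zone_report, missing_nodes, and next_step are hypothetical, and the parsing assumes the line layout of the sample output shown in this document (node lines are recognized by their certs[...] field).

```python
def parse_zone_report(text):
    """Split the verbose availability-zone output into cluster names and node hostnames."""
    clusters, nodes = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "AVAILABILITYZONE":
            continue
        if fields[1] == "|-":
            # Node lines carry a certs[...] field; VM-type lines do not.
            if len(fields) > 2 and any(f.startswith("certs[") for f in fields):
                nodes.append(fields[2])
        else:
            clusters.append(fields[1])
    return clusters, nodes


def missing_nodes(text, expected_hostnames):
    """Return the expected node hostnames that the report does not mention."""
    _, nodes = parse_zone_report(text)
    return [host for host in expected_hostnames if host not in nodes]


def next_step(text):
    """Suggest which logs to read, following the troubleshooting order above."""
    clusters, nodes = parse_zone_report(text)
    if not clusters:
        return "no cluster: check the Configuration tab, then the CLC and CC logs"
    if not nodes:
        return "no nodes: check the CC logs, then the NC logs"
    return "cluster and %d node(s) reported" % len(nodes)
```

For example, saving the command output to a file and passing its contents to next_step gives a one-line hint, and missing_nodes with your own list of node hostnames covers the case where only some nodes are absent.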
On each machine running a Eucalyptus component, the log files are located in:
$EUCALYPTUS/var/log/eucalyptus/
On the front-end, the Cloud Controller (CLC) logs primarily to 'cloud-output.log' and 'cloud-debug.log'. Consult these files if your client tool (ec2 API tools) output contains exception messages, or if you suspect that none of your operations are ever being executed (never see Xen activity on the nodes, network configuration activity on the front-end, etc.).
The Cluster Controller (CC) also resides on the front-end, and logs to 'cc.log' and 'httpd-cc_error_log'. Consult these log files in general, but especially if you suspect there is a problem with networking. 'cc.log' contains log entries from the CC itself, and 'httpd-cc_error_log' contains the STDERR/STDOUT from any external commands that the CC executes at runtime.
A Node Controller (NC) will run on every machine in the system that you have configured to run VM instances. The NC logs to 'nc.log' and 'httpd-nc_error_log'. If these files do not exist, then CC is not invoking the NC on the node. Consult these files in general, but especially if you believe that there is a problem with VM instances actually running (i.e., it appears as if instances are trying to run - get submitted, go into 'pending' state, then go into 'terminated' directly - but fail to stay running).
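As a quick first check, the log files named above can be tested for existence on each machine. The sketch below is an assumption-laden illustration: the helper name missing_logs and the short component keys ('clc', 'cc', 'nc') are hypothetical, and falling back to '/' when the EUCALYPTUS environment variable is unset is a guess that may not match your installation.

```python
import os

# Log file names as listed in this document, grouped by component.
EXPECTED_LOGS = {
    "clc": ["cloud-output.log", "cloud-debug.log"],
    "cc": ["cc.log", "httpd-cc_error_log"],
    "nc": ["nc.log", "httpd-nc_error_log"],
}


def missing_logs(component, eucalyptus_root):
    """Return the expected log files for a component that do not exist on this machine."""
    log_dir = os.path.join(eucalyptus_root, "var", "log", "eucalyptus")
    return [
        name
        for name in EXPECTED_LOGS[component]
        if not os.path.isfile(os.path.join(log_dir, name))
    ]


if __name__ == "__main__":
    # Assumption: $EUCALYPTUS points at the installation root; '/' is only a fallback guess.
    root = os.environ.get("EUCALYPTUS", "/")
    for component in ("clc", "cc", "nc"):
        print(component, "missing:", missing_logs(component, root))
```

On a front-end the NC files are expected to be reported missing, and on a node the CLC and CC files are. Missing NC files on a node suggest that the CC has not yet invoked the NC there.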
If an attempt to run an instance fails with the error 'FinishedVerify PROBLEM: Not enough resources available. MSG-TYPE: RunInstancesType', then your Eucalyptus installation is running at maximum capacity. Most likely its capacity is actually 0 because of a misconfiguration. See Diagnostics above.
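To tell a full cloud apart from one whose capacity is 0, the free / max columns of ec2-describe-availability-zones verbose can be read programmatically. The sketch below is illustrative only: the function names vm_capacity and diagnose are hypothetical, and the parsing assumes VM-type rows of the form 'AVAILABILITYZONE |- m1.small 0128 / 0128 1 128 10' as in the sample output in this document.

```python
def vm_capacity(text):
    """Map each VM type to a (free, max) tuple taken from the verbose zone report."""
    capacity = {}
    for line in text.splitlines():
        fields = line.split()
        # VM-type rows: AVAILABILITYZONE |- <type> <free> / <max> <cpu> <ram> <disk>
        if (
            len(fields) >= 6
            and fields[:2] == ["AVAILABILITYZONE", "|-"]
            and fields[4] == "/"
            and fields[3].isdigit()
            and fields[5].isdigit()
        ):
            capacity[fields[2]] = (int(fields[3]), int(fields[5]))
    return capacity


def diagnose(capacity):
    """Classify the capacity table into the cases discussed above."""
    if not capacity or all(maximum == 0 for _, maximum in capacity.values()):
        return "max capacity is 0: likely a misconfiguration"
    if all(free == 0 for free, _ in capacity.values()):
        return "cloud is full: all slots are in use"
    return "capacity available"
```

Passing the saved command output through vm_capacity and then diagnose separates the two situations: a maximum of 0 for every VM type points back at the diagnostics steps, while a non-zero maximum with 0 free means running instances are occupying all slots.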