Also, because of the comparatively small plane wave cutoffs, you will see small but significant modulations of the density, especially in regions with little electron density. These lead to "strange" effects with gradient corrected functionals and can cause the optimization to fail. To avoid this, you can skip the calculation of the gradient correction in regions of low electron density by setting GC-CUTOFF to a value between 1.D-6 and 1.D-5 in the &DFT section.
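A minimal sketch of the corresponding input, assuming the usual convention that the value is read from the line following the keyword. The value 5.0D-6 is only an illustrative pick from the range given above, and the "..." stands for the rest of your &DFT section (for example the functional selection), which is not shown here.

```
&DFT
  ...
  GC-CUTOFF
    5.0D-6
&END
```

If the problems persist, it may be worth trying a value closer to the upper end of the range before changing anything else.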
In geometry optimizations, the accurate calculation of the forces due to the augmentation charges may also need a higher density cutoff and/or a tighter real space grid. This can be achieved either by using a higher plane wave cutoff or by increasing DUAL to 5.0 or even 6.0, and/or by setting the real space grid explicitly via the MESH keyword in the &SYSTEM section. For the same reason, these options may be needed to improve energy conservation during molecular dynamics runs. Use these options with care, as they increase the CPU time and memory requirements significantly and thus can easily take away one of the major advantages of ultra-soft pseudopotentials.
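As a rough illustration, and again assuming that each value is read from the line following its keyword, the relevant part of the &SYSTEM section could look like the sketch below. All numbers are placeholders, not recommendations: with a wavefunction cutoff of 25 Ry, a DUAL of 5.0 corresponds to a density cutoff of 5.0 x 25 Ry = 125 Ry, and the three MESH values (grid points along x, y and z) have to be chosen for your own cell. The "..." stands for the remaining keywords of the section.

```
&SYSTEM
  ...
  CUTOFF
    25.0
  DUAL
    5.0
  MESH
    128 128 128
&END
```

In many cases only one of DUAL or MESH will be needed. Changing one setting at a time and checking the forces or the energy conservation makes it easier to see which of them actually helps.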