Problems: Hepgrid5 and Hepgrid6 crashed with Dcache software problems and may need more RAM; both were rebooted. Hepgrid4 had a hard disk failure, which has been repaired. The cluster remains off the GRID until PT returns next week; he is at CERN for a GRID meeting this week.
Dcache and SE Status: Disk audit: 3 TB has been identified, with another 3 TB possible. The free space on the SE is still ~0.7 TB.
Internal network issues: FORCE10/DELL switch connectivity. It appears that several cabinets have the intermittent fault that causes the cluster to go offline for short periods. This problem is still under investigation. There is FORCE10 hardware stored in the room whose status is uncertain, and there appear to be no plans to use it. This needs to be clarified, as its resale value decreases with time. Action: RF/JB.
Plans (upgrades and modifications): Memory to repair nodes is being sourced and will be ordered together with the RAM needed to upgrade some desktops. Expected cost ~£1K total. PS needs a laptop that runs ProEngineer to take abroad.
ATLAS software: Awaiting the cluster coming back online to install V12.0.4.
Non-LCG cluster report
Number of racks in use: 3 racks are still in use at present. One node is causing problems (sucking in jobs) and needs fixing as soon as possible. ATLAS V12.0.4 still has problems that should be fixed in V12.0.5, which is due out this week and will be installed as soon as possible.
Status of users' software: Front end machines for Cockcroft and T2K. Some software has been installed on the T2K front end.
Statistics: no news
Trash/CDF and SAM
Linking SAM to LCG nodes. How to link SAM to LCG is now understood, and this will be tested as soon as the cluster is back. Another necessary step has been identified before Trash/CDF jobs can run; SF will help to sort this out.
Ok, no problems.
A few new faults have been fixed. The machine “Fermion” has a faulty RAID controller, and the 10-node interactive cluster is nearly ready. For the time being it runs SLinux3, but this will need to move to SLinux4 soon.
Documentation: The Twiki is up and in use. Users must register to write to it, but it has world-wide visibility.
Clean Room PC upgrades. The situation is being investigated. It may be possible to write or obtain USB drivers for some kit, which would allow ancient machines to be replaced with new ones.
PP Web page support. The web page is important, both for ongoing student recruitment and for the RAE later this year. PA will employ KS for some weeks to work on the web page and produce up-to-date material and a system that will be easier to maintain. It is important that the new website follows the latest University corporate identity and format. The page is hosted on the hep machine, but the new web page front end might migrate to CSD machines. Action: JV and SJM will arrange to visit CSD with KS to discuss the best way forward.
Any Other Business
Date of next meeting
Monday 05 Feb 2007, VC room.
This topic: Computing > Meetings > MeetingMinutes22Jan2007
Topic revision: 18 Oct 2012, JohnBland