Minutes of HEP Computing Users Meeting 28: 7th July 2008

Present: MAH, CG, JB. Apologies: TG, SJM, JB, CT, MK, TJVB, PA, CG, JNJ, N Mc, RF, DH, BTK

(1) LCG Cluster report:

(a) Cluster is running with 14 racks (481 nodes). We are now getting lots of LHCb jobs. Some outstanding gLite upgrades need installing. ATLAS jobs have run with 100% efficiency in the last 24 hours.

(b) The FORCE10 is running OK.

(c) Plans and news: A new CE, to which nodes in the NW Grid cluster will be attached, is on order, funded with GridPP3 money.

The GridPP DB has announced a new accounting period for sharing out the next tranche of Grid hardware money. For each Tier-2 site this will cover its best quarter in 2008 plus the first two quarters of 2009. Currently we are the best-performing UK Tier-2 site, delivering about 11% of the total CPU.

(d) dCache and SE status:

dCache needs some software upgrades, as does the firmware in the 3ware RAID cards. This will mean a reboot at some stage.

Testing of DPM as a possible replacement for dCache has started.

(2) Plans and news:

(a) Network within the OL: The special HEP computing meeting last week concluded that the University should be approached to fund an upgrade of the entire building network, as this needs specialist contractors to deal with asbestos in the building and cable ducts. The present HEP network is in a delicate state: the installed cables seem not to work with new switches and hubs, which will cause major problems if these fail.

(b) The interactive nodes as UIs need more testing with Ganga: action CG.

(c) The GridPP3 funding announcement has arrived: we got £72.7k, of which approximately £54k remains. Most of this will be spent on RAID.

(d) Interviews for the sysadmin post will take place this Thursday, 10th July; there are 7 candidates on the shortlist.

(3) ATLAS jobs: only a few ATLAS jobs ran in the last week, but they ran with high efficiency.

(4) Non-LCG cluster report:

(a) Currently 177 nodes are running flat out with T2K MC production. The data will be stored on T2K-FE and hepstore.

(b) Cockcroft: no news.

(5) CDF and SAM: The CDF disks have been powered off, and the CDF rack will soon be stripped out and moved, ready to be refurbished. The UPS, network switch, etc. for this work are on order.

(6) Network Issues: see above

(7) BATCH CLUSTER. The 40-node Dell SL4 batch cluster is running well, with hundreds of jobs in the queue. The new /scratch RAID will be reinstalled soon, as the firmware fix to the 3ware RAID card seems to work.

(8) Plans for:

(a)Documentation: no news

(b) Clean room PC upgrades: ongoing, but delayed due to DM being off ill.

(c) PP web page support: no further news.

(10) AOB:

(a) FM (Dave Dutton) promised once again to install the cable from the chiller units on the roof down to the cluster room so that the voltages can be monitored. This request was first made in Nov 2007.

(b) An FM rep attended the special HEP computer planning meeting last week and is trying to get a power audit of the whole building done as a matter of urgency. There is a need to understand the power implications of adding new multi-core nodes to the cluster room.

(c) Current ongoing tasks are:

(1) Finish the roll-out of the MON system with automatic warnings of node failure;
(2) Continue testing DPM;
(3) Update the 3ware cards in the RAIDs;
(4) Repair failed non-LCG system nodes;
(5) Continue clean room upgrades (a software upgrade to the DAGE machine was due this afternoon);
(6) Install the new CE for the NW-GRID CSD hardware;
(7) Look at the Lustre file system.

(11) Date of next meeting: Monday 21st July 2008.
Topic revision: r2 - 18 Oct 2012, JohnBland