Meeting Minutes, 19 February 2007

  • Present
    • PT, MAH, JB, RF, JV
  • Apologies
    • PA, BTK, TJVB, SF

LCG cluster report

  1. Status of infrastructure: DM arranged for the refrigerant leak to be fixed at no cost to us. The temperature monitoring hardware has been linked to a laptop via the serial port so that a visual check on the rack temperatures can be made periodically. Software to shut the racks down in a controlled way via a USB port connection is being developed (a rough sketch of such a monitoring loop follows this list). The 13 water-cooled racks remain off at present while the cooling is tested. When there is no load on the nodes in a rack the temperature is 20 °C; one air-cooled rack is running on the GRID.
  2. Problems: none.
  3. dCache and SE status: The SE is nearly full. PT has identified that dCache is crashing due to loss of connection between the SE dCache manager and the pool nodes (the six external write servers); this might be cured by the actions listed in (4) below (a rough connectivity check is sketched after this list). PT also proposes that the empty rack 12 be used to house ~30 repaired nodes as part of dCache. These could be upgraded with extra disk (if we decide to go down that route) without disturbing the rest of the LCG cluster, and if a spare node were needed urgently it could be removed from this rack without compromising the stored data. The meeting approved this idea.
  4. Internal network issues: FORCE10/DELL switch connectivity. PT has found that the DELL switches are running at full duplex and seem to lose control packets. He proposes that they be re-configured to run at half duplex. This will require reprogramming the EPROM on each node; software to do this might be available, and if not it will mean connecting to each node manually (a runtime alternative is sketched after this list). The task will require scheduled downtime for the cluster, which could coincide with any FORCE10 upgrades.
  5. Plans: (a) The memory has in fact only just been ordered and awaits delivery. (b) JB and RF have proposed that the FORCE10 switch be upgraded using the existing line cards to give a much faster and more robust internal network that would support future expansion; they have circulated a document detailing this. The only cost is ~£500 for a new patch panel. Upgrade software for the FORCE10 can be downloaded from the manufacturer's website (once registered). After the meeting, TJVB expressed his enthusiasm for this proposal.
  6. ATLAS software: installation of releases 12.0.5 and 12.0.6 is ongoing.
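
A minimal sketch of the kind of monitoring-and-shutdown loop described in item 1, assuming the temperature hardware reports plain-text readings over the serial port; the device path, baud rate, line format, threshold, and shutdown command are all assumptions for illustration, not details from the meeting.

```python
# Sketch only: poll the rack temperature probes over the serial line and
# trigger a controlled shutdown if any reading exceeds a threshold.
# The device path, baud rate, line format and shutdown command are assumed.
import subprocess
import serial  # pyserial

PORT = "/dev/ttyS0"   # serial link to the monitoring hardware (assumed)
THRESHOLD_C = 30.0    # shutdown trigger; 20 °C is the reported idle value

def read_temperatures(link):
    """Parse one line of readings, assumed to be comma-separated Celsius values."""
    line = link.readline().decode("ascii", errors="ignore").strip()
    return [float(v) for v in line.split(",") if v]

def controlled_shutdown():
    """Placeholder for the rack shutdown tool being developed (name assumed)."""
    subprocess.run(["/usr/local/sbin/rack-shutdown", "--all"], check=False)

with serial.Serial(PORT, 9600, timeout=5) as link:
    while True:
        temps = read_temperatures(link)
        if temps and max(temps) > THRESHOLD_C:
            controlled_shutdown()
            break
```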
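
Item 3 attributes the dCache crashes to lost connections between the SE dCache manager and the pool nodes. A rough reachability probe along those lines is sketched below; the host names and port are hypothetical, and a production check would go through dCache's own admin interface rather than a bare TCP connect.

```python
# Sketch only: periodically test TCP reachability of the dCache pool nodes
# from the SE head node.  Host names and the port number are hypothetical.
import socket

POOL_NODES = ["pool01", "pool02", "pool03", "pool04", "pool05", "pool06"]
POOL_PORT = 22125  # assumed; use the port the pools actually listen on

def reachable(host, port, timeout=5):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for node in POOL_NODES:
    status = "ok" if reachable(node, POOL_PORT) else "UNREACHABLE"
    print("%s:%d %s" % (node, POOL_PORT, status))
```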
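
For item 4, if the duplex setting can be changed at runtime rather than only in the NIC EPROM, looping over the nodes with ethtool would look roughly like this; the node names, interface name, and link speed are assumptions, and the permanent EPROM reprogramming discussed in the minutes would need the vendor's own utility instead.

```python
# Sketch only: force each worker node's NIC to half duplex via ssh + ethtool.
# Node names, interface name and speed are assumptions; a permanent fix via
# the NIC EPROM (as discussed in the minutes) needs the vendor utility.
import subprocess

NODES = ["node%03d" % n for n in range(1, 31)]  # hypothetical node list
IFACE = "eth0"

for node in NODES:
    cmd = "ethtool -s %s speed 100 duplex half autoneg off" % IFACE
    result = subprocess.run(["ssh", node, cmd], capture_output=True, text=True)
    if result.returncode != 0:
        print("failed on %s: %s" % (node, result.stderr.strip()))
```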

Non-LCG cluster report

  1. Number of racks in use: 3. It is hoped to add two more racks in the near future.
  2. Status of users' software: JV asked if the LCG cluster could support MPI (Message Passing Interface) so that Cockcroft jobs could use multiple nodes as one computer. The meeting requested that full details of this request be sent to the helpdesk@hepREMOVETHIS.ph.liv.ac.uk account, and that MPI be tested first on the BaBar cluster rather than the LCG cluster, as it might have unexpected consequences (a minimal test job is sketched after this list).
  3. Statistics: no news.
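
For item 2, a minimal MPI smoke test of the kind that could be run on the BaBar cluster first is sketched below; it assumes an MPI installation plus the mpi4py bindings are available there, which has not been confirmed.

```python
# Minimal MPI smoke test: each rank reports its host, rank 0 gathers the list.
# Assumes an MPI installation plus mpi4py on the test cluster; run with e.g.
#   mpirun -np 8 python mpi_hello.py
import socket
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
host = socket.gethostname()

hosts = comm.gather((rank, host), root=0)
if rank == 0:
    print("MPI world size:", comm.Get_size())
    for r, h in sorted(hosts):
        print("rank %d on %s" % (r, h))
```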

CDF and SAM

  1. No news.

Network Issues

  1. None.

Plans

  1. Documentation: the TWiki is up and in use.
  2. Clean room PC upgrades: JB/RF plan to visit the clean rooms soon to investigate what is needed.
  3. PP web page support: KS continues to work on this.
  4. No objections were raised to the proposed GRID Site Operations Agreement; see https://edms.cern.ch/file/726129/1/AgreementSiteOperations-20060808-0.9.doc.
  5. GRIDPP site visit: There will be a one-day site visit, before the end of April, by GRIDPP representatives to review the status of each GRID site before full production starts. A questionnaire will be supplied before the visit; more details later. We have suggested some time in the week starting 16th April for the Liverpool visit.

Any Other Business

  1. None.

Date of next meeting

Monday 5th March 2007 at 14:00.