Slow Controls Elders' Notes

This document is intended to contain some useful notes for SC elders providing on-call support. If it proves useful I will carry on with it.

If you update this file, please modify the HTML source file, DELPHI$ONLINE:[SLOW_CONTROL.NOTES]SC_ELDERS.HTML.

WWW HIPE giving out-of-date information

A problem has been seen in which the WWW version of HIPE can display out-of-date information. This arises because the HIPE_SERVER process is attached to an old copy of the EP global section.

This situation arises when the EP is restarted. Normally the EP (via an exit handler) can inform the HIPE_SERVER that it is stopping. The HIPE_SERVER then knows to let go of the global section, and it will pick up the new one. However, if the EP is brutally stopped (e.g. by a "STOP" command, rather than FORCEX or DELETE/ENTRY), the exit handler does not run, so the HIPE_SERVER is not informed and stays attached to the old global section. (Note: The job_control system does the correct thing when restarting jobs, and should not cause a problem.)

The solution is to restart the HIPE_SERVER process on the relevant node.

Gareth Smith, 15.8.94

How to Change the Ethernet Address on a G64/Ethernet card

When swapping a G64/ethernet interface it is usually easier (from the software point of view) to put the address of the old card into the replacement. (This cannot be done if swapping a double card for a single or vice-versa.)

Double-card interface:
The address is contained in the EPROMs labelled "CATS 8". This pair of chips, labelled "CATS 8 1 - 7" and "CATS 8 8 - 15", should be swapped between the old and new cards.

(In fact the address is only in one of the chips, but as I can never remember which of the chips it is in, please change both.)

Single-card interface:
The address is changed in a small chip that is underneath one of the memory boards (these are the strips of chips on long legs). The last 4 digits of the card's address are written on the chip (e.g. "106A"). These chips are in chip-holders (rather than being soldered directly onto the card, as many of the chips are). This chip should be swapped between the old and new cards.

Gareth Smith, 15.8.94

CALB database recovery (after 'no exclusive access' error)

  1. To be on the safe side, stop all EPs that access the DB.
  2. ASSIGN FOR050, FOR051, FOR052 to the appropriate DB files (see the EP logfile for these).
  3. $Run d$onl:[database.cargo]dedit
  4. +REC
  5. Restart the EPs.

André Augustinus, 9.2.95

Clearing Gas Alarms for which the Cancellation Message has been "lost"

Sometimes (especially when systems, e.g. AXDESC, are restarted) it is possible to get a gas alarm which has cleared (as indicated by the gas supervisor) but for which the message stating that the alarm has gone away has been lost.

The sequence is that the state of the gas system is injected into EMU - and also forwarded to GSS via the D_TO_G program.

A help text has been written by André and can be accessed by typing the command GAS_ALARM_CLR_HLP when logged in as SC_OPERATOR. You should read that as well as the rest of this note.

You can see the names of the outstanding alarms by looking in the file

DISK$USER:[SC_OPERATOR]D_TO_G.ALM
You can type this file using EZTYPE.

If there are outstanding alarms they will look like, for example:

HPCDISB4***01A

In this case the alarm is on the HPC distributor on B4, and it is alarm number 01 (or just 1). You can clear this alarm (of course, having first ensured that the alarm is really not there) using a program of André's. This program is run (on VAX only) by typing (under SC_OPERATOR) the command GAS_ALARM_CLR.
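As an illustration of how an entry such as HPCDISB4***01A breaks down, here is a minimal Python sketch. It is not part of GAS_ALARM_CLR or D_TO_G. The layout it assumes (a source name, a run of asterisks, a numeric alarm number, then an optional trailing suffix whose meaning is not described in this note) is inferred from the single example above, and the names GasAlarm and parse_alarm are hypothetical.

```python
import re
from typing import NamedTuple


class GasAlarm(NamedTuple):
    source: str  # e.g. "HPCDISB4" (HPC distributor on B4)
    number: int  # alarm number, e.g. 1 for "01"
    suffix: str  # trailing character(s); meaning not documented in this note


# ASSUMED layout, inferred from one example: <source>***<number><suffix>
_ALARM_RE = re.compile(r"^(?P<source>[A-Z0-9_]+)\*+(?P<number>\d+)(?P<suffix>\w*)$")


def parse_alarm(entry: str) -> GasAlarm:
    """Split one line of the alarm file into source, number and suffix."""
    match = _ALARM_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognised alarm entry: {entry!r}")
    return GasAlarm(match["source"], int(match["number"]), match["suffix"])


if __name__ == "__main__":
    print(parse_alarm("HPCDISB4***01A"))
```

If real entries in D_TO_G.ALM turn out to use a different layout, only the regular expression should need changing.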

The program asks several questions:

And it disappears as if by magic!

Note that this does NOT clear the alarm states in the gas SMI. To do this you should restart the GAS_ALARM process.

Please read André's notes in GAS_ALARM_CLR_HLP

Gareth Smith/André Augustinus, 17.8.94 (updated 24.9.99)

Restarting EMU on WSDESO

There is an updated text here.

The EMU connection between the Solenoid VAXstation (WSDESO) and the SC Maestro's VAXstation (WSDESC) has been known to break. This most often happens when WSDESC is rebooted. The connection can be tested using the procedure described in section 17.2 of the SC Maestros' Guide.

If the connection is broken, you should beep the Slow Controls expert. If there is no response from the expert, you can try fixing it yourself by restarting the EMU router on the Solenoid VAXstation, WSDESO. This is similar to the procedure for restarting EMU on WSDESC, described in section 6.4 of the SC Maestros' Guide. Log into the "SYSTEM_CP" account on WSDESO ("SET HOST WSDESO" -- the password is the same as on WSDESC). Select the "menu for EMU" and then the "Restart just the EMU router" option. When the operation is complete, wait a few minutes and test the connection again (you have to log in as SOLMON to do this).

Note that the most commonly encountered problem can be cured by restarting just the router. If this fails, you can try a full EMU restart. If the problem persists, contact the Slow Controls expert.

Whether the problem is solved or not, you should inform the Solenoid on-call (13*7035) of the situation, but during the day only.

(This section may soon be placed in the SC Maestros' Guide.)

Tim Adye

Switching the Solenoid NMR probes on and off

Rack D1512 contains the G64 controlling and reading out the NMR probes, and, immediately underneath it, a multiplexer. Both are switched on/off by a single switch at the back. The cables to each of the 5 probes are connected at the base of the multiplexer. There are 2 cables per probe, situated vertically above one another. To disconnect a single probe, pull out both the relevant cables.

The G64 control console is in the Solenoid barrack (key needed to enter). It is the upper of the 2 G64 keyboards situated in the rack at the far end of the room (clearly indicated by a label). The NMR control is on window 2 (CTRL F6) of this monitor. Obtain the Flex prompt (+++) and type

0.s_gpib.bin

Monitoring will start. Each probe is tried for up to 20s before passing to the next one. If no reading is obtained "XXXXXX" is written to the screen. Note that the NMR probes only give meaningful readings when the field is close to its nominal value of 1.2 Tesla.

R.S., 1/11/94

Restoration of "old" version of EP for ID and OD High Voltage objects

The ID and OD are running a new version of the EP for HV control, in which the HV is described by ".SC_RELATED" and ".RUN_RELATED" subobjects (see SC_NEWS). In case it is necessary to go back to the HV EPs for ID and OD which do not contain subobjects, I append the instructions supplied by André Augustinus and Mark Dönszelmann respectively. The Motif SMI displays for these detectors would also have to be changed back to the one showing only a single HV object. Instructions from Mark Dönszelmann are given to do this (though it may be necessary to have privileges to carry these out).

RS, 7/11/94

ID

As some of you have already experienced, we are now running a new version of the HV controls (it has been tested over the last few days). A special version of the EP has been created by Tim for this: Version 4.7. This splits each of the original HV objects (HV_JT and HV_FS) into two so-called subobjects: HV_xx becomes HV_xx.SC_RELATED and HV_xx.RUN_RELATED.

The SC_RELATED is exactly the same as the old HV_xx object with its states ON, OFF, ERROR, CHANGING, LOADING etc. The commands are given to this subobject. The RUN_RELATED subobject has (besides the 2 states DEAD and NO_CONTROL) only 2 states: READY and NOT_READY. It is READY when most of the channels are on, and NOT_READY when more than a certain fraction is off. The 'calculation' of the overall RUN_RELATED object (that drives Big Brother) is based on these HV_xx.RUN_RELATED objects.

To see all this the 'color' SMI display has been changed: the CMD button is now on the HV_xx.SC_RELATED object. You cannot give commands to the run-related subobjects.

For the moment I have put in some (rather) arbitrary fractions for the run_related subobjects:
for HV_JT 92% should be ON = 31 out of 34 (24 Jet + 10 Trigger) channels;
for HV_FS 81% should be ON = 20 out of 25 (24 Jet + Cylinder) channels.
(These are EP channels, so 4 channels off means 2 LeCroy channels off.) These numbers can easily be changed (just restart the EP with a new parameter).
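The READY/NOT_READY decision described above can be sketched as follows. This is only an illustration in Python, not the EP code. It assumes that a subobject is READY when at least the quoted number of EP channels is ON, and it ignores the DEAD and NO_CONTROL states. It compares channel counts rather than percentages because the quoted percentages do not map exactly onto the counts (31/34 is about 91.2% and 20/25 is 80%), and how the EP rounds its parameter is not stated here. The function and dictionary names are hypothetical.

```python
# Channel counts quoted in the note (EP channels, not LeCroy channels).
TOTAL_CHANNELS = {"HV_JT": 34, "HV_FS": 25}
# ASSUMPTION: "31 out of 34" means at least 31 channels ON gives READY.
MIN_CHANNELS_ON = {"HV_JT": 31, "HV_FS": 20}


def run_related_state(hv_object: str, channels_on: int) -> str:
    """Return "READY" or "NOT_READY" for one HV_xx.RUN_RELATED subobject."""
    total = TOTAL_CHANNELS[hv_object]
    if not 0 <= channels_on <= total:
        raise ValueError(f"{hv_object}: {channels_on} is outside 0..{total}")
    if channels_on >= MIN_CHANNELS_ON[hv_object]:
        return "READY"
    return "NOT_READY"


if __name__ == "__main__":
    for name, total in TOTAL_CHANNELS.items():
        needed = MIN_CHANNELS_ON[name]
        print(f"{name}: needs {needed}/{total} ON ({100 * needed / total:.1f}%)")
```

Under this assumption HV_JT tolerates up to 3 EP channels off and HV_FS up to 5.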

In case we want to swap back to the old version, I have created a 'nosub' version (no subobjects) of all relevant files. The modifications only concern the HV EP's and SMI. To restore the old version, don't hesitate to phone me, but if I'm not reachable try the following:

  1. in detector$specific:[slow_control.elementary_process] (go_ep): copy start_ep.com_nosub to start_ep.com (this file is used by the start of EP's from the menubar)
  2. in detector$specific:[slow_control.elementary_process.start] (go_ep, down start): copy ep_hv_jt.par_nosub to ep_hv_jt.par and copy ep_hv_fs.par_nosub to ep_hv_fs.par (these files are used by job_control (and thus sc_maestro) to start/stop EP's)
  3. in detector$specific:[slow_control.control.smi$id] (go_smi): copy id.exe_nosub to id.exe
  4. restart the processes: SMI_ID_SC, EP_ID_HV_JT, EP_ID_HV_FS
  5. The SMI_display will now show NO_LINK for the HV (sub)objects. The easiest way to have control now is to use the 'old' smi_display, with the command: smi_sc_display. (The SC_maestro will still get the 'no_link' version when it is popped up from his main display, but you can't have it all...)

I'll leave a hardcopy of these instructions in the logbook and/or bible.

André

OD Instructions to do with EPV4.7

As some of you may have noticed, we are now running a new version of the EPs (which has been tested over the last few days). This affects in particular the HV control. The original HVAN object is split into two so-called subobjects: HVAN.SC_RELATED and HVAN.RUN_RELATED. The SC_RELATED subobject is exactly the same as the old HVAN object with its states ON, OFF, ERROR, CHANGING, etc. The commands are given to this subobject. The RUN_RELATED subobject has, besides the 2 states DEAD and NO_CONTROL, only 2 states: READY and NOT_READY. It is READY when most of the channels are on, and NOT_READY when more than a certain fraction is off. The 'calculation' of the overall RUN_RELATED object (which drives Big Brother) is based on this HVAN.RUN_RELATED object.

To see all this the 'color' SMI display has been changed: the CMD button is now on the HVAN.SC_RELATED object. You cannot give commands to the RUN_RELATED subobject.

For the moment there is a (rather) arbitrary fraction for the RUN_RELATED subobject: 94% should be on = 25 out of 26 channels (23 planks plus 1 plank split into 3). This number can easily be changed, by just restarting the EP with a new parameter.

In case we need to swap back to the old version, there are new and old versions of all the relevant files (with and without subobjects). The modifications concern all EPs and SMI. To restore the old version, do not hesitate to phone me (phone numbers available online), but if I'm not reachable try the following:

Martin McCubbin, 7/11/94

Motif SMI display (DUI) for OD and ID SC

In directory d$onl:[motif.dui.misc] the file dui_misc.uid (latest version) contains subobjects; the one named dui_misc.uid_nosub contains no subobjects.

It is just a matter of copying...

DUNS

Tim Adye

What to look for if the central "Genoa" ramp down (panic) button does not work.

There are typically 2 causes for this button not to work, other than a hardware failure in one of the boxes.

Gareth Smith, 27.4.95

Possible problems switching on Fastbus power supplies

Sometimes a Fastbus power supply refuses to switch on, even when you are downstairs pressing the ON button. Note the following:

Gareth Smith, 26.7.95

How to redirect a Beep number to another beep.

If you have a failure of a beep, the calls to that beep can be redirected to another. This is done by telephoning the 'Standard' (111) and asking for the beep to be redirected.

Some comments on the CAEN systems.

During last winter (97/98) most of the 'old' CAEN HV systems (the SY127 crates) were upgraded to version 6.6, which is claimed to improve the 'NO_CONTROL' situation. It is important that both the main controller and the communications controller in the CAEN crate are at version 6.6. Upgrading is easy: there is just one EPROM to change in each controller. Spare EPROMs are in the top-left drawer of the desk next to the rack in André/Gareth's office in building 3000. The EPROMs are labelled:

  RINFRESCO 6.6   -  For the EPROMS for the main controller.
  COM 6.6         -  For the EPROMS for the communications controller.

One other cause of failure in the CAEN systems may be a failure of the EEPROM in the communications controller. This is used to store some settings, and may be reformatted from one of the menus (when a terminal is attached to the CAEN crate). If, for example, the reformatting provides only a temporary fix to problems, then maybe the EEPROM is failing. These EEPROMs are of type (as copied off an example):

  Japan 9220
  HN58C65P-25
  R0432550

There is a spare one of these in the same drawer as the EPROMs referred to above.

Experience in the FCB suggests that it is worth reformatting the EEPROM after a power failure, at the start of the year and at other convenient times in order to reduce the number of 'NO CONTROLs' that might otherwise occur.

Gareth Smith, 28.4.98

How to reset an Actis card.

The Actis card does not have a reset button. However, there is a pair of pins located just behind the pair of LEDs on the card. Shorting between these pins momentarily will cause the card to reset.

(There should be 3 LEDs on the card. An upper pair, and one lower down below the network ('UTP') connection.)

Gareth Smith, 13.8.98

Netscape Bookmarks for SC_OPERATOR.

The Netscape bookmarks are important for the SC Operator. A simple system has been put in place to make sure an SC Maestro cannot lose the bookmarks (for example by incorrectly editing them).

As part of the SC_OPERATOR login process the file

    [SC_OPERATOR.NETSCAPE]BOOKMARKS_GOOD.HTML     is copied to
    [SC_OPERATOR.NETSCAPE]BOOKMARKS.HTML

If it is necessary to change the bookmarks, either edit the BOOKMARKS_GOOD.HTML file, or use Netscape to update the bookmarks and then do the copy the other way around.
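The two copy directions can be sketched as below. This is a Python illustration only; the real copy is done by the SC_OPERATOR login process on VMS, whose commands are not reproduced in this note. The function names are hypothetical, and the directory is passed in as a parameter rather than assumed.

```python
import shutil
from pathlib import Path

GOOD_NAME = "BOOKMARKS_GOOD.HTML"  # protected reference copy
LIVE_NAME = "BOOKMARKS.HTML"       # copy that Netscape reads and edits


def restore_bookmarks(netscape_dir: Path) -> Path:
    """What happens at login: overwrite the live bookmarks with the good copy."""
    live = netscape_dir / LIVE_NAME
    shutil.copyfile(netscape_dir / GOOD_NAME, live)
    return live


def promote_bookmarks(netscape_dir: Path) -> Path:
    """After a deliberate update in Netscape: copy the other way around."""
    good = netscape_dir / GOOD_NAME
    shutil.copyfile(netscape_dir / LIVE_NAME, good)
    return good
```

Because the login always restores from the good copy, any change made in Netscape that is not promoted is lost at the next login.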

Gareth Smith, 23.9.98

Recovering Corrupt Central Database Files

These notes were provided by André Augustinus. The main database files can become corrupt if (for example) there is a crash of one of the main VMS servers (AXDES1 or AXDES2).

Symptoms:

Problems writing to the database, either by SC (e.g. TPC, ID) or by LEP processes (e.g. the lcp_dbase process stays in 'reconnect'). In addition, one of the processes lm_server_600 or db_server_600 gives error messages in its logfile (in delphi$joblog:[cp]) of the type: CRNKA700 001106.1303 KXXINP: Can not read record 141137 on unit 51 = IOS Wait and try again...

Solution:

The recovery is in principle automatic, within the lm_server_600 or db_server_600 job. What it does is:

  1. The database in DISK$DATABASE:[DATABASE.DDB.AXDES1] is updated from the offline (as is done for t4 and delpit). {In fact this DB is one of the DBs used by T4, so it is usually already up to date.}
  2. The files of the corrupted DB are saved as *.dat_sv.
  3. The DB files are copied from DISK$DATABASE:[DATABASE.DDB.AXDES1] to DISK$DATABASE:[DATABASE.ODB600].
  4. The two database servers must be restarted by hand (as must the jobs that crashed because of these problems).

Likely problems:

  1. The people on shift do not let the recovery finish and kill lm_ or db_server_600; the procedure of copying all the files takes quite long (roughly an hour...).
  2. Since the procedure first makes a backup of the corrupted files, there is a strong probability that there is not enough disk space to copy the new files (so some files have to be deleted).

What I have therefore had to do, on two occasions, is to finish copying the DB files by hand, because the automatic recovery procedure had not finished correctly.
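Since running out of disk space is the likely failure, a check of the following kind before finishing the copy by hand may help. This is a Python illustration of the idea only, not a VMS procedure and not part of the automatic recovery. The function names are hypothetical, and the "*.dat" pattern is an assumption based on the corrupted files being saved as *.dat_sv.

```python
import shutil
from pathlib import Path


def bytes_needed(source_dir: Path, pattern: str = "*.dat") -> int:
    """Total size of the files that would be copied from source_dir."""
    return sum(p.stat().st_size for p in source_dir.glob(pattern) if p.is_file())


def can_copy(source_dir: Path, dest_dir: Path, pattern: str = "*.dat") -> bool:
    """True if dest_dir's disk has enough free space for all matching files."""
    free = shutil.disk_usage(dest_dir).free
    return bytes_needed(source_dir, pattern) <= free
```

The check only covers the new files; the space taken by the saved copies of the corrupted files has to be accounted for separately.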

Gareth Smith, 10.6.00