All of this data must be read out of the VME crate in a timely fashion and
packed together to be handed off to the next program in the chain. The
tasks of the collector are to get the data out of the VME crate as
quickly as possible, then to pass that
batch of data along so that the collector is free to read some more.
1.2 VME Bus Addressing
The VME bus has some special features that make addressing VME modules
slightly different from addressing on other buses. Under the VME specification
[4], the 16, 24, and 32 bit address spaces are actually
independent, which avoids having smaller addresses repeat themselves up
through the address space of modules which use longer addresses. The
address space is selected by use of an address modifier code which is
placed on 6 additional address modifier lines on the VME bus. Accessing a VME
address is a two-step process: first the correct address modifier code is
written to the bus, then the address may be accessed.
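As a rough illustration of the independent address spaces, the following Python sketch models a toy bus in which each address modifier code selects its own memory map. The class `ToyVmeBus` and its methods are hypothetical and are not part of the collector or of any Bit 3 library; the three modifier values are the standard VME codes for non-privileged data access.

```python
# Standard VME address modifier codes (non-privileged data access).
AM_A16 = 0x29
AM_A24 = 0x39
AM_A32 = 0x09


class ToyVmeBus:
    """Toy model only: each address modifier selects a separate address space."""

    def __init__(self):
        self.am = None
        self.spaces = {AM_A16: {}, AM_A24: {}, AM_A32: {}}

    def set_address_modifier(self, am):
        # Step 1: choose the address space.
        if am not in self.spaces:
            raise ValueError("unknown address modifier")
        self.am = am

    def _space(self):
        if self.am is None:
            raise RuntimeError("address modifier not set")
        return self.spaces[self.am]

    def write(self, address, value):
        # Step 2: access the address within the selected space.
        self._space()[address] = value

    def read(self, address):
        return self._space().get(address, 0)


bus = ToyVmeBus()
bus.set_address_modifier(AM_A24)
bus.write(0xF10000, 0x1234)

bus.set_address_modifier(AM_A32)
a32_value = bus.read(0xF10000)   # same numeric address, different space

bus.set_address_modifier(AM_A24)
a24_value = bus.read(0xF10000)
```

The same numeric address read through the 32 bit space does not see the value written through the 24 bit space, which is the point of keeping the spaces independent.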
The data transfer may also be configured. This is usually done by a
hardware jumper setting on the VME module. The data width may be
configured to 8, 16, or 32 bit words. A burst mode is also allowed under
the VME specification.
1.3 Collector Program Flow
In order to read out all the data from the front end electronics modules in
a timely manner, the collector program must follow a strict procedure. A
flow chart showing the general program flow is shown
here.
The main parts of the program are initialization, run control, V533
reading loop, DPM reading, NOVA memory management, and data quality
monitoring.
Following the Bit 3 Adaptor initialization, the network shared memory is set up. This first initializes the message server that handles messages sent from the collector to the main host computer. Then a region of shared memory is allocated and structured according to definitions known to all networked online workstations.
The VME-SG GPS board is also initialized at this point. The board must be instructed to look at the DC-shifted IRIG-B input signal. Putting this initialization here allows the board to phase-lock to the input signal before the run is actually activated. This avoids the problem of the first several hundred events of a run carrying a false GPS time while the unit attempts to lock to the reference signal. The GPS board is described further in Chapter 2.
The SETUP stage is for initializing front end hardware. At the end of the SETUP stage the entire detector must be completely ready to take data. For the anticounter, this involves reinitializing the DC2 modules which write into the Dual Port Memories, opening run log files, and initializing the VME modules. The FSCC modules have already been rebooted. The V533 modules must be programmed to the correct pipeline length and transfer size (see Chapter 3). The GPS module must be programmed to use the DC-shifted IRIG-B pulse and to generate an interrupt on the workstation when it is triggered (see Chapter 2).
It turns out to be important that the DC2 modules are initialized after the FSCCs are completely rebooted. This is because when the DC2s are initialized first, the FSCC reboot can cause a fake ``event'' to be latched into the DPMs. This appears as a corruption of the data in the first NOVA buffer to be sorted.
Only after all workstations report SETUP_OK via their shared memory flags will the main host computer allow the shift worker to start the actual run. When he or she presses the start button, an ACTIVATE flag is flashed to all workstations. The trigger host begins delivering triggers and all workstations are expected to begin gathering data and feeding it to the main host event builder.
Two other shared memory flags are commonly used for run control. The STOP flag instructs each process to end normal data taking and revert to the INIT state in preparation for the start of a new run. The ABORT flag is a special cue that causes each online process to die abruptly.
Processes connect to the NOVA daemon using the nova_open() call. This registers the process with a priority. The nova_get() and nova_put() calls are used to request access to a shared memory block and then to release control of it when the process is finished. The first time a block is requested, a large default size is allocated. When the first process is finished, it may snip the buffer to the actual length used. Subsequent processes receive only the smaller block. Processes may always shorten the length of the buffer they put out, but can never expand the buffer size once it has been snipped. The unused memory after a snip is returned to the shared memory pool to be reused in the next available buffer.
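The snipping behavior can be pictured with a small toy model, given here in Python. This is not the NOVA API: `ToyBufferPool`, `get`, and `snip` are hypothetical names, and the real daemon's bookkeeping is not described in this text. The sketch only shows a large default allocation, a one-way shrink, and the return of the unused tail to the pool.

```python
class ToyBufferPool:
    """Toy model of a shared memory pool with default-size blocks and snipping."""

    def __init__(self, pool_size, default_size):
        self.free = pool_size
        self.default_size = default_size

    def get(self):
        # A freshly requested block is always allocated at the default size.
        if self.free < self.default_size:
            raise MemoryError("not enough free shared memory for a default block")
        self.free -= self.default_size
        return {"size": self.default_size}

    def snip(self, buf, used):
        # A buffer may be shortened but never expanded.
        if used > buf["size"]:
            raise ValueError("a buffer can only shrink")
        self.free += buf["size"] - used   # unused tail goes back to the pool
        buf["size"] = used


pool = ToyBufferPool(pool_size=1000, default_size=400)
buf = pool.get()
free_after_get = pool.free        # 600 left in the pool
pool.snip(buf, 100)
free_after_snip = pool.free       # 300 unused words returned
```

In this model a large default size ties up most of the pool until the first process snips, which is one way to picture why the choice of default size matters.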
The choice of default size for the allocated blocks must be made carefully. In the OD software, it was found that the collector was unable to read out the front end electronics for periods on the order of tens of seconds because it had to wait excessively long for too large a default block of memory.
Processes are assigned priorities, and buffers are given a significance. The priority determines the order in which a buffer is delivered to various processes. The significance determines whether a buffer is delivered to all processes. If the priority plus the significance is greater than a given threshold, the buffer is always delivered to the process. If not, the buffer may skip the process.
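The delivery rule can be written down in a few lines. The Python sketch below is illustrative only: the function names are hypothetical, and the assumption that a numerically higher priority is served first is mine, since the direction of the ordering is not stated here.

```python
def must_deliver(priority, significance, threshold):
    """True when the buffer is guaranteed to be delivered to the process."""
    return priority + significance > threshold


def delivery_order(processes):
    """Order process names by priority.

    Assumption: a numerically higher priority is served first.
    `processes` maps a process name to its priority.
    """
    return sorted(processes, key=lambda name: processes[name], reverse=True)


order = delivery_order({"sorter": 5, "monitor": 1, "recorder": 3})
```

With a threshold of 7, a process of priority 3 is guaranteed a buffer of significance 5 but may be skipped for a buffer of significance 4.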
The FIFO reading loop is the heart of the collector program. It actually consists of two nested loops, creatively called ``inner'' and ``outer''. The outer loop continues until the number of events read exceeds 256. Data is actually read in the inner loop, which exits only when there is no more to be read from the FIFO Multievent Buffers. This structure causes the fifo data to be read in bursts. Events pile up in the fifos, then the inner loop reads continuously until they are empty. This allows the collector to spend less cpu time polling the status of the fifos, and ensures that when it switches to other tasks, such as reading out the dual port memory modules, the fifo buffers are completely empty and available for storing new events.
While the number of events has not yet exceeded 256, collector uses periods during which the fifo buffers are empty to either read the gps board if a new timestamp has been signaled, or to simply sleep, allowing other running programs to share the cpu.
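The shape of the two nested loops can be sketched as follows, in Python and against a simulated fifo. The names are hypothetical, the fifo is a plain `deque`, and the idle branch (reading the gps board or sleeping) is reduced to a counter; the real collector polls hardware registers instead.

```python
from collections import deque


def fifo_reading_loop(fifo, arrivals, max_events=256):
    """Drain `fifo` in bursts until more than `max_events` events are read.

    `arrivals` is an iterator of event lists; each item stands for the
    triggers that pile up during one idle period (simulation only).
    """
    events = []
    idle_periods = 0
    while len(events) <= max_events:          # outer loop
        while fifo:                           # inner loop: read until empty
            events.append(fifo.popleft())
        if len(events) > max_events:
            break
        # Fifo is empty: the collector would read the gps board or sleep here.
        idle_periods += 1
        batch = next(arrivals, None)
        if batch is None:                     # simulation ran out of triggers
            break
        fifo.extend(batch)
    return events, idle_periods


arrivals = iter([list(range(100)), list(range(100, 300))])
events, idle_periods = fifo_reading_loop(deque(), arrivals)
```

Because the inner loop always runs until the fifo is empty, the burst that crosses the 256-event mark is read completely, so the buffer can end up holding more than 257 events.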
Only fifo 1 is polled for data availability. Since all 3 modules are triggered together, this is enough. When data is available in the Multievent buffers, collector reads 3 words from each module. This corresponds to the minimum number of 32-bit words copied from the pipeline into the Multievent buffer at each trigger. All 3 modules are programmed to use this minimum number. The middle word is the one that will eventually be put into the data stream. This is determined by programming the fifo pipeline length described in Chapter 3.
Several data quality checks are made at this point. Fifo modules 1 and 3 are latching the local time clock. The 3 words latched by each module must be consecutive values of the 32-bit local time. Further, the same time should be latched by both modules. It turns out that this is not precisely true. Even though both modules receive the same input and the same trigger, the internal delays of the two modules are slightly different and the time value can vary by up to 1 bin, but no farther. The 50MHz Local Time Clock is both the input and the clock signal driving the latch and the chips on the V533s are not quite fast enough to guarantee a response within the 20ns bin of the LTC. The actual condition used in the check is that the two latched times must be within 1 bin of each other.
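A minimal Python sketch of the two local time checks is given below. The function names are hypothetical; the arithmetic is done modulo 2**32 so that the checks still hold across a rollover of the 32-bit local time clock, which is an assumption about how the real code treats that case.

```python
MASK32 = 0xFFFFFFFF


def words_consecutive(words):
    """True if each latched word is exactly one LTC bin after the previous one."""
    return all(((b - a) & MASK32) == 1 for a, b in zip(words, words[1:]))


def within_one_bin(ltc_fifo1, ltc_fifo3):
    """True if the two latched times differ by at most one bin (mod 2**32)."""
    diff = (ltc_fifo1 - ltc_fifo3) & MASK32
    return diff <= 1 or diff == MASK32   # MASK32 is a difference of -1


rollover_ok = words_consecutive([0xFFFFFFFE, 0xFFFFFFFF, 0x00000000])
```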
Another check is that one and only one of the BIP and GLB trigger flags latched by fifo 2 must be set for every event. This can be violated in the case where a global trigger just happens to arrive at the same time as a trigger from the end-of-bip strobe. A warning is generated, but no other action is taken. Also, the lower 16 bits of the data latched by fifo 2 must be identical in all 3 of the words for each event as these bits are the event number and should not change until more than 200ns after the trigger. This can be violated for the same reason as the previous check.
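The two fifo 2 checks might look like the Python sketch below. `BIP_FLAG` and `EVENT_MASK` follow the bit usage table in Chapter 3; `GLB_FLAG` is a placeholder value chosen for illustration, since the mask of the global trigger flag is not given in this text. Testing the flags on the middle word is also an assumption.

```python
BIP_FLAG = 0x00020000     # from the bit usage table
GLB_FLAG = 0x00010000     # hypothetical: the real GLB flag bit is not given here
EVENT_MASK = 0x0000FFFF   # lower 16 bits hold the event number


def check_fifo2(words):
    """Return a list of warnings for the 3 words latched by fifo 2."""
    warnings = []
    middle = words[1]
    bip = bool(middle & BIP_FLAG)
    glb = bool(middle & GLB_FLAG)
    if bip == glb:                       # both set, or neither set
        warnings.append("trigger flags")
    if len({w & EVENT_MASK for w in words}) != 1:
        warnings.append("event number")
    return warnings


clean = check_fifo2([0x00010042, 0x00010042, 0x00010042])
```

As in the text, a failed check produces only a warning; nothing in the sketch discards the event.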
Once the data is read out of all 3 modules, the next step depends on whether the BIP or GLB trigger flag is set for the event. These flags are latched into the upper bits of fifo 2. If the event is a global (GLB) trigger, the LTC time, the event number, the GPS time, LTC time at last GPS update, and the status bits from the upper bits of fifo 2 are all copied into the appropriate locations of the header structure for the current event. An array of header structures is maintained and simply copied into the NOVA buffer after the fifo reading loop. The count of events read is updated for every global trigger.
If the event was a BIP trigger, the LTC recorded in Fifo 3 is copied into the ltcbip location in the header structure corresponding to the last event which caused a common stop in the TDCs. The idea is that if there are multiple triggers in the window, each trigger does not contribute additional deadtime. The deadtime information should only be placed in the header for the event which caused the common stop. The trick is deciding which event caused the common stop. Since all fifo modules are triggered together, the following rules can be followed:
The Fifo reading loop in the separated BIP trigger case is largely the same as when the BIP and GLB triggers are combined.
The main difference in the separated case is that all 3 modules can no longer be read out together because fifo 3 is triggered exclusively by the BIP trigger and may or may not have data available when the two fifos triggered by the global trigger are read out. Fifo 3 must be read out as data is available. This means that events are no longer guaranteed to be sequential. It is possible that we could read out several global triggers from fifos 1 and 2, followed by several BIP trigger events from fifo 3. The task is to insert the deadtime measurements into the proper headers without relying on the order in which the data is read. The main differences will be the assumptions about reading data from fifo 3, the loop exit conditions, and the method for determining which event caused a TDC common stop.
The structure of the outer and inner fifo loops stays the same in this case. The inner loop is entered if data is reported available in fifo 1. In the inner loop, 1 event is read from fifos 1 and 2. The GPS status bits (which give the quality of the reference lock) are OR'd into the V533 status bits and copied with the LTC, Event number, GPS time, and LTC at last GPS update, into the header structure for the current event. The number of events is updated. Fifo 3 is then checked for data available and read out if available. The LTCBIP that is read out is copied into the header structure for the last event designated as having caused a common stop of the TDCs. The loop exits through the outer loop when no global trigger data is available in fifos 1 and 2. This prevents multiple triggers from being split across NOVA buffers. Also, the number of events must be greater than 256 and the last read LTCBIP must be later than any previous LTCTRG. This assures that the last event has its deadtime properly accounted for.
The rules for determining which event caused a common stop are almost identical to the previous case with one important difference. Since the order of readout cannot be relied on, collector must use the latched local time clock for each type of trigger. An event is considered to have caused a common stop of the TDCs if it is the first global trigger (in time) to be latched after the time of a bip trigger. The effect is the same as before, but now the events must be sorted on the local time clock value in order to accomplish it. Special attention has to be paid to events near the rollover of the local time clock.
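One way to picture the time-ordered rule is the Python sketch below. It is not the collector's code: the function names are hypothetical, the first global trigger in the buffer is assumed to be a common-stop event, and all times are assumed to lie within half the clock range of the first global trigger so that the rollover can be undone with a signed 32-bit difference.

```python
MASK32 = 0xFFFFFFFF


def unwrap(ltc, reference):
    """Place a 32-bit LTC value on a continuous axis near `reference`."""
    diff = (ltc - reference) & MASK32
    if diff >= 0x80000000:            # interpret as a negative offset
        diff -= 0x100000000
    return reference + diff


def assign_ltcbip(glb_times, bip_times):
    """Map the index of each common-stop event to its LTCBIP value."""
    reference = glb_times[0]
    timeline = [(unwrap(t, reference), "GLB", i) for i, t in enumerate(glb_times)]
    timeline += [(unwrap(t, reference), "BIP", None) for t in bip_times]
    timeline.sort(key=lambda entry: entry[0])   # sort on local time only

    ltcbip = {}
    stop_event = None
    expecting_stop = True             # assumption: first GLB causes a common stop
    for t, kind, index in timeline:
        if kind == "GLB":
            if expecting_stop:        # first global trigger after a bip trigger
                stop_event = index
                expecting_stop = False
        elif stop_event is not None:
            ltcbip[stop_event] = t & MASK32
            expecting_stop = True
    return ltcbip


# Two global triggers share one TDC window just before the clock rolls over.
result = assign_ltcbip([0xFFFFFFF0, 0xFFFFFFF8, 0x10], [0x4, 0x20])
```

In the example the second global trigger falls inside the window opened by the first, so it receives no deadtime entry of its own.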
Reading out the dual port memory modules involves telling the DC2s to finish writing the current event and to switch to the second DPM of the pair for each quadrant.
The data is simply copied out of the modules into the NOVA buffer in block copy mode. Collector never sees anything related to the individual events in this data.
A quick check is then made for a flag file. If the file exists, collector will do another reboot of the FSCCs and reinitialize the DC2s. This is the so-called ``manual'' or ``pushbutton'' reboot that allows the shift worker to cause a reboot on demand using a single command from the main host.
Various diagnostics are printed out about the buffer. Several more checks are made of the header data at this point. GPS times are tested to see if they are in range, and TDC deadtimes are calculated and checked for sanity, as are other parts of the header.
When all this is complete, the NOVA buffer is ``put'' out to the sorter and a new buffer is retrieved so the entire process can start all over again.
The XL-DC receiver module outputs a time code on a fiber optic cable. This cable runs approximately 2km down the Atotsu mine tunnel and terminates in the central hut anticounter VME crate. There it connects to a simple VME board which receives the laser signal on the fiber optic input connector and converts it to RS-485. The converted signal is passed through a connector on the back side of the VME backplane to a TrueTime VME-SG Timing Card [5].
The VME-SG is designed to receive a time signal from a satellite antenna and lock on to that signal so that the internal time stays accurate even through short periods of a missing or noisy signal. The antenna is physically connected via a fiber optic cable which connects to the fiber optic board in the VME crate. On this board, the laser signal is converted to RS-485 and fed into the GPS board on a homemade connector in the *back* of the VME crate, on the VSB side. This explains why you don't see a connection on the front.
The signal arriving from the antenna is a standard serial communications format known in the GPS business as IRIG-B. It comes in two flavors in this device: AM (amplitude modulated) or DC-shifted. The AM version of the signal has a 1kHz carrier whose amplitude fluctuates up or down to carry the data information, just like your radio. The DC-shifted version uses a different voltage level, and has square pulses whose *width* carries the information. The choice is based on how the signal is sent, i.e., it is easy to turn a laser on or off at given times, harder to make the intensity wiggle. Once the bits are plucked out of the carrier method, the format of the data is what is known as IRIG-B.
The fiber optic board delivers the DC-shifted flavor of IRIG-B. This is
taken from the connector in the back of the VME crate. However, the gps
board assumes as a default that the AM flavor will be used, taken from
the front panel BNC. For some reason it can't autodetect where the
signal is coming from so the program is required to set a bit in a
configuration register to tell the board to use the DC-shifted signal.
Once this is done, the signal is detected and the board proceeds to phase
lock to the IRIG-B input. When that is accomplished, absolute time accurate
to 1µs is available.
UTC Readout
So how do we read it out? The GPS board can be programmed to generate
an interrupt from several sources. The one of interest to us is called
the External Event input. This means that somebody outside sticks a
pulse in, causing the current time to be "frozen" in a set of registers,
and generates an interrupt on the VME bus which is carried to the Sun
via the Bit3 cable, and handled there.
The signal we use for an external event is one of the upper bits of the local time clock. There is a set of jumpers on the ltc which allow one to select one of the upper bits to be carried through a driver chip and onto an external line. I selected bit 29, counting from 0 to 31. It is the 3rd most significant bit. The clock pulse width is 20ns, so 2**29 * 20ns is a little over 10 seconds. This is the width of a 1 or 0 for that bit; the next rising edge actually occurs after a 1 AND a 0, so every 21 seconds or so. This signal comes out of the back of the VME crate and goes on a twisted pair to the external event input in the back of the gps board. There is an ext evt input on a BNC connector on the gps board front panel, but the cabling was simpler using the back of the crate.
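The arithmetic for the rising-edge interval can be checked with a few lines of Python. The function names are only for illustration; the 20ns tick and the choice of bit 29 are taken from the text above.

```python
LTC_TICK_NS = 20   # one bin of the 50MHz local time clock


def bit_level_width_s(bit):
    """Seconds that a given LTC bit stays at 1 (or at 0)."""
    return (2 ** bit) * LTC_TICK_NS * 1e-9


def rising_edge_interval_s(bit):
    """Seconds between rising edges: one full 1-then-0 cycle of the bit."""
    return 2 * bit_level_width_s(bit)


width = bit_level_width_s(29)           # about 10.7 s
interval = rising_edge_interval_s(29)   # about 21.5 s
```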
So what happens?
About every 21 seconds, the gps board gets a rising edge on the ext event input. This freezes the current time in a set of "freeze" registers. Now, if one has configured an interrupt on external event, and if the master interrupt enable has been set, then the gps board will generate a VME interrupt on the VME backplane (see setup_gps() in collector).
The bit3 has its jumpers set to pass through IRQ1--4 on its cable. This lets the VME interrupt generate an interrupt on the Sbus in the Sun. An interrupt handler on the Sun (see the gps routines I added to bts_stub.c in /usr/local/bit3/944/v3.2/src) recognizes the interrupt, tells the Bit3 adaptor on the VME side to start a VME interrupt acknowledge cycle, and sends a UNIX signal (now SIGUSR2) to any process which has registered for it. Collector has done this in setup_gps(), so it receives the signal SIGUSR2 as notification that the gps board has received an external event.
confused? so where are we...
The ltc clock upper bit rolled over and "froze" the gps time register. This generated an interrupt whose end result was to send a signal to collector as notification. Since the time is "frozen" in the register, we actually have about 21 seconds to read it out before we might miss another ltc signal, so time is not critical. When the signal arrives at collector, a flag is set and at the next convenient opportunity (i.e., end of the buffer), the gps is read out. The ltc at the time of the signal arrival is stored. We know which bit rolled over anyway, so the exact time we store the ltc is irrelevant. The ltc signaled the gps at (in binary) XX100000000000000000000000000000 or whatever.
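The "set a flag now, read later" pattern looks roughly like the Python sketch below (UNIX only, since it uses SIGUSR2). It is an illustration, not the contents of setup_gps(): the handler and function names are hypothetical, and the process signals itself to stand in for the interrupt handler on the Sun.

```python
import os
import signal

gps_pending = False


def on_gps_event(signum, frame):
    """Signal handler: do as little as possible, just note the event."""
    global gps_pending
    gps_pending = True


signal.signal(signal.SIGUSR2, on_gps_event)


def read_gps_if_pending(read_freeze_registers):
    """Call at a convenient point, e.g. at the end of a buffer."""
    global gps_pending
    if gps_pending:
        gps_pending = False
        return read_freeze_registers()
    return None


# Stand-in for the driver notifying the process of an external event.
os.kill(os.getpid(), signal.SIGUSR2)

first = read_gps_if_pending(lambda: "frozen time")
second = read_gps_if_pending(lambda: "frozen time")
```

The second call returns nothing because the flag was cleared by the first readout; a new timestamp is only read after another signal arrives.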
When the collector reads out the "freeze" registers, it converts the time into a timeval structure containing seconds and microseconds. Note that GPS does not supply the year; the information is simply missing. So at that point, the time is seconds since 12:00am Jan 1 of the current year. I attempted to help fix this by adding in an offset to the start of UNIX time (UNIX's "birthday") so that the time structure could be read naturally with the usual UNIX time routines, at least the date through seconds part. The microseconds are handled separately anyway. If this offset is not added, then a naive use of the UNIX time calls reveals that the data was taken sometime in early January of 1970. Not a good thing.
When the offset is calculated, it goes to the start of the current year as determined by the system clock on sukant. This offset does not depend on whether sukant's time is correct, only the year. UNIX handles leap years, etc., for all years prior to the current year in this offset. The UTC from the GPS receiver is only from the start of the current year, but leap years, etc., are handled by the satellites as well, so all this is handled for the current year and years past.
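The offset calculation can be sketched in Python with the standard `calendar` and `time` modules. The function names are hypothetical, and the sketch assumes the year is passed in by the caller, where the collector takes it from the system clock.

```python
import calendar
import time


def year_start_epoch(year):
    """UNIX time (UTC) of 00:00:00 on Jan 1 of `year`."""
    return calendar.timegm((year, 1, 1, 0, 0, 0, 0, 0, 0))


def gps_to_unix(seconds_into_year, year):
    """Add the start-of-year offset to the seconds reported by the gps board."""
    return year_start_epoch(year) + seconds_into_year


# 31 days and 1 hour into 1997 should land on Feb 1 at 01:00 UTC.
stamp = gps_to_unix(31 * 86400 + 3600, 1997)
as_utc = time.gmtime(stamp)
```

Without the offset, the same seconds value would be interpreted as a date early in 1970, which is the symptom described above.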
The ltc clock is not "disciplined" at all, at least in the sense that there is no feedback whatever to the ltc clock. All we do is monitor that clock and watch it drift around compared to the gps clock. As long as this is monitored, it is ok for the clock to be free running.
The CAEN V533 modules contain a 32 bit differential TTL pipeline clocked by the external 50MHz signal from the Local Time Clock. The pipeline is a circular buffer whose length can be programmed up to 255 words. At an external trigger signal, a certain number of words can be removed from the circular buffer and stored in a separate fifo without stopping the acquisition of data into the pipeline. Programming the pipeline length allows the user to select how far back in the pipeline to look for the desired words when the trigger arrives. Effectively, it allows one to look back in time from the trigger signal [2].
As used in the SuperKamiokande experiment, the V533 modules are set up to latch three 32 bit words around the trigger signal, the minimum allowed. This means that data from up to 85 triggers may be stored in the V533 module without overflowing it and stopping the acquisition. At present, the triggers are an OR of the TRG global trigger signal from the inner detector TRG module, and a strobe signal formed from the end of the BIP from the FSCCs. This means that the V533s trigger nearly twice per event, once from the initial global trigger and once from the strobe signalling the end of the busy period for that event. This second trigger allows latching of the Local Time Clock and a calculation of the dead time for that event. The combined trigger is not exactly twice the global TRG rate because for some events, there are multiple TRG signals recorded in a single TDC window, with a single dead time.
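A minimal Python sketch of a dead time calculation from the two latched clock values follows. It assumes the dead time is simply the difference between the local time latched at the end-of-BIP strobe (LTCBIP) and the one latched at the global trigger (LTCTRG), taken modulo 2**32 to survive a clock rollover; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF
LTC_TICK_NS = 20   # one bin of the 50MHz local time clock


def deadtime_ns(ltctrg, ltcbip):
    """Dead time in ns between the trigger and the end of the busy period."""
    return ((ltcbip - ltctrg) & MASK32) * LTC_TICK_NS


normal = deadtime_ns(100, 150)             # 50 bins
across_rollover = deadtime_ns(0xFFFFFFFF, 4)   # 5 bins, clock wrapped
```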
The VME base addresses of the 3 V533 Modules are:
| Module | Address | Use |
|---|---|---|
| 1 | 0xf10000 | Local Time Clock |
| 2 | 0xf20000 | Event number, Bip flag |
| 3 | 0xf30000 | Status Flags, Busy bits |
The bit usage of the 3 modules is:
| Module | Mask Used | For |
|---|---|---|
| 1 | 0xffffffff | LTCTRG |
| 2 | 0x00020000 | Bip Flag |
| 2 | 0x0000ffff | Event Number |
| 3 | 0x0000000f | LTCBIP |
The modules are initialized by a sequence of reads and writes to onboard registers. The steps to initialize the module are as follows:
1. Disable the acquisition
2. Set the LOAD line to 1
3. RESET the Module
4. Perform a CLOCK operation
5. Write the value (255 - Desired Pipeline Length) to the Writing Register
6. Perform 2 CLOCK operations
7. Write the value (255 - Desired Pipeline Length) to the Writing Register
8. Perform a CLOCK operation
9. Write the value 0 to the Writing Register
10. Perform a CLOCK operation
11. Set the LOAD line to 0
12. Set the Configuration register: enable external busy and transfer 3 words from the pipeline each trigger
13. Perform 2 readouts of the FIFO data registers
These steps are reproduced here because the description in the V533
manual is incorrect [2]. CAEN is aware of the error,
but an updated manual has not been published.
In the SuperKamiokande experiment, the Desired Pipeline Length is 0. This means that the words latched by the module are those directly around the trigger as opposed to looking back up the pipeline for them. This sequence of operations will ensure that the module is properly set up and ready to use. The only remaining operation is to re-enable the external clock and trigger signal (enable the acquisition) at an appropriate time.
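The initialization steps listed above can be written as a short routine. The Python sketch below issues the operations through a caller-supplied callback instead of touching hardware; the operation names are hypothetical labels, and no register offsets or bit values are implied, since those are not given in this text.

```python
def init_v533(do, pipeline_length=0):
    """Issue the V533 initialization sequence through `do(op, value)`."""
    value = 255 - pipeline_length
    do("disable_acquisition", None)
    do("set_load", 1)
    do("reset", None)
    do("clock", None)
    do("write_register", value)
    do("clock", None)
    do("clock", None)
    do("write_register", value)
    do("clock", None)
    do("write_register", 0)
    do("clock", None)
    do("set_load", 0)
    # External busy enabled, 3 words transferred from the pipeline per trigger.
    do("set_config", "external busy, 3 words")
    do("read_fifo", None)
    do("read_fifo", None)


log = []
init_v533(lambda op, value: log.append((op, value)))
clock_count = sum(1 for op, _ in log if op == "clock")
```

Recording the operations in a list, as done here, gives a simple way to compare the sequence actually issued against the listed steps before running it on a real module.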
Readout of the V533 modules is done by polling an onboard status register. Since the boards are all triggered together, it is sufficient to poll only one board's register. The status register contains a bit which is set if the data memory is not empty. This can be used to tell whether to read out the 3 modules' multievent buffers or not. If so, they are read continuously until the data-memory-not-empty bit is unset, meaning there is no data left to be read.
If the modules are not read out quickly enough, events pile up in the multievent buffer. If more than 85 triggers (about 42 global triggers with their associated bip triggers) arrive before the multievent buffer is read out, a bit is set in the status register and the acquisition is halted until the buffer is read out. This case is an error and should never happen. An error in collector that forced it to wait indefinitely for a new NOVA buffer did allow this error to happen. After correcting this problem the busy error is extremely rare. If or when it ever happens, it is likely related to activity on the workstation that slows down or pauses the collector.
It is planned to re-arrange the usage of the V533 modules so that one module will be used to latch the local time clock at the bip trigger while the other modules trigger only on the global trigger. This will nearly double the possible rate of the global trigger because no module will have to trigger on both the global trigger and the bip trigger.