Success Case Setup

This document describes a simplified process of the initial steps for data analysis, converting
acquired raw data to energy spectra, and performing a fit for the converted spectrum using
simulated particle-detector data. You will be given three data sets in pickled (Python)
format, and a snippet of code to illustrate how to load the data. You are expected to
analyze the data with your own Python code, and write your process and results in a report.
The report needs to take the form of a self-contained scientific paper, including a brief
introduction, a description of the experimental setup and the data collected (using the
information provided by this document), what you have done and the rationale behind it, and
the results and conclusions.
The expected audience is your fellow students with no knowledge of this project. All results
should be presented in the form of graphs or tables, clearly labelled with proper captions, and
referenced in the main text when appropriate. Appropriate references and citations
are mandatory. This applies to quoting information from this document as well; you are
expected to properly rephrase the information you quote from here, and cite this document.
Note: you are especially required to cite this document for information that you
cannot find elsewhere. For example, this document states that the data for each
pulse begin with 1000 quiet data points before the pulse is detected. This specific piece
of information cannot be found anywhere else, so it needs to be cited. Since this document
is not formally published, referencing it is a bit awkward, but the citation itself matters
more than its format, since we know how to find this document. An adequate reference is
“PHY324 Data Analysis Project Documentation”.
This is an individual assignment. While you can get help from anyone you wish,
you must do the analysis yourself, make your own graphs, and write your own
report. You are encouraged to ask TAs or professors during class time about any part of
this assignment.
The ultimate goal of this project is to discover what particles are found in the
signal data and to report what energies they have. Your report is meant to mimic a
scientific journal publication.
Your report needs to include the following:
- One graph showing a sample of data showing what a pulse looks like. Either a single
  good example or multiple examples on top of each other (probably in different colours).
- Two graphs showing your best energy estimator both before and after calibration (i.e.
  once in V or mV, and once in keV).
- One table summarizing all your energy estimators. See Table 2.
- All your non-best energy estimator graphs in the appendix in the same before and
  after calibration format as your best estimator graph. You should have between 6 and
  8 pairs of graphs in total for your calibration step.
- One graph of your energy spectrum of the ‘real’ data (as opposed to the calibration
  data) with a best fit line. This should be ONLY the post-calibration graph (i.e. in
  keV). Do not include a pre-calibration version of this graph.
- An introduction and a methodology section explaining what the data is simulating and
  how you analyzed it.
- A results section for both the calibration process and the analysis of the ‘real’ data.
- Clear, quantitative conclusions about how good your calibration process was and how
  good your analysis of the ‘real’ data was. This must include goodness of fit results and
  commentary.
- A conclusion summarizing your report.
The starter code you are given does not make A+ graphs. Nor are any of the graphs in this
document A+. If you copy the style from either document, do not expect an A+. Please
try to improve the graphs.
Expect this assignment to take a long time, more than 20 hours total (possibly
more than 30 hours) even if things go relatively smoothly for you. At the risk of repeating
ourselves, please ask for help at any stage of this project.
Brief overview of the project: use the calibration data, where the detected particles have a
known energy, to work out how to convert a pulse into an estimate of the particle’s energy. Then
apply this calibrated estimation algorithm to the signal data to discover something new.
2 Experiment and data
This experiment is conducted in a particle detection scenario. With a typical particle
detector, the sensor converts the energy of each incident particle into a measurable electrical
signal, in the form of an excursion from the quiescent voltage. We recognize these excursions as
“pulses”. A typical pulse looks like Fig. 2. The shape of the pulse is often dictated by the
characteristics of the detector and its readout electronics, as well as the type of
energy deposition. In this scenario, we assume a single species of energy deposition, namely
electron recoils in the detector material caused by high-energy photons, either through
photoabsorption or through Compton scattering. With a specific detector setup, this
leads to a fixed pulse shape. The detector and its readout circuit are characterized to give a
20 µs rise time (τrise) and an 80 µs fall time (τfall) to the pulses, following the functional form
$$y = A \cdot C \cdot \left(e^{-t/\tau_{\mathrm{rise}}} - e^{-t/\tau_{\mathrm{fall}}}\right), \tag{1}$$
where
$$C = \left(\frac{\tau_{\mathrm{fall}}}{\tau_{\mathrm{rise}}}\right)^{-\frac{\tau_{\mathrm{rise}}}{\tau_{\mathrm{fall}}-\tau_{\mathrm{rise}}}} \cdot \left(\frac{\tau_{\mathrm{rise}}-\tau_{\mathrm{fall}}}{\tau_{\mathrm{fall}}}\right) \tag{2}$$
is a normalization factor so that the term without the scalable amplitude A has an amplitude
of unity. The amplitude of the pulse, A, varies with the energy deposited
in the detector. Here we have a detector with a perfectly linear response, meaning A is
proportional to the energy the detector senses. An idealized pulse shape with no noise is shown
in Fig. 1. Note that the pulse starts at 1 ms. We generally refer to the region before the pulse
onset, from 0 ms to 1 ms in this case, as the “pre-pulse” region.
Figure 1: An idealized pulse with no noise. Each pulse is stored as 4096 voltage samples
read out by a 1 MHz data acquisition system. The pulse onset is 1 ms (1000 samples) from
the beginning.
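As an illustration of Eq. (1), the following is a minimal sketch of how an idealized pulse could be generated with NumPy. It is not the course starter code. The function name `pulse_template` and its arguments are assumptions made for this example. The two exponentials are ordered so that the pulse comes out positive, and the shape is normalized numerically by its maximum, which enforces the unit-amplitude property directly instead of relying on the closed form of C.

```python
import numpy as np

def pulse_template(n_samples=4096, onset=1000, tau_rise=20.0, tau_fall=80.0):
    """Unit-amplitude pulse shape sampled at 1 MHz (one sample per microsecond)."""
    t = np.arange(n_samples, dtype=float) - onset    # time since onset, in microseconds
    shape = np.zeros(n_samples)                      # pre-pulse region stays at zero
    after = t >= 0
    # Difference of exponentials, ordered so that the pulse is positive.
    shape[after] = np.exp(-t[after] / tau_fall) - np.exp(-t[after] / tau_rise)
    return shape / shape.max()                       # numerical normalization to unity

template = pulse_template()
pulse = 0.5 * template   # a noiseless pulse with amplitude A = 0.5 (arbitrary units)
```

Scaling the template by A then gives a noiseless pulse of that amplitude, which is convenient for testing energy estimators before applying them to real traces.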
A realistic pulse, as shown in Fig. 2, is taken with an acquisition system that measures the
voltage output of the detector at a rate of 1 MHz. For every incident particle, the system
stores 4096 samples of the voltage measurements as a “trace”, thanks to a trigger system.
The idealized trigger system senses each pulse onset and positions the start of the pulse at
sample 1000 of the corresponding trace. The system also adjusts the quiescent voltage
to around 0 V, though due to the presence of low-frequency noise, the quiescent voltage
does fluctuate a bit. As usual, noise is present in the voltage readout. This noise can
be attributed to sources intrinsic and extrinsic to the detector and its readout circuit. The
intrinsic noise sources include the random motion of electrons, instability caused
by temperature fluctuations, noise induced by the readout circuit, etc. The extrinsic noise
sources include instability in the power supply of the system and its corresponding ground,
pickup of environmental electromagnetic waves, etc. To quantify the effect of the
noise, we acquired a set of data with only noise.
The noise is superimposed on the pulses induced by energy deposited by the incident
particles, making it hard to reconstruct the size of the pulse and thus infer the energy
sensed by the detector. In some scenarios, it also makes detecting tiny pulses impossible,
though such a scenario is beyond the scope of this analysis, thanks to the idealized trigger
system we employed. Said differently, the trigger efficiency of the system we constructed
here is 100%, irrespective of the energy deposition.
Figure 2: An example pulse from the detector used in this project. Each pulse is stored as
4096 voltage samples read out by a 1 MHz data acquisition system. The pulse onset is 1 ms
(1000 samples) from the beginning. The quiescent voltage is adjusted to be around 0 V,
though it fluctuates due to low-frequency noise present in the system. A typical pulse has a
20 µs rise time and an 80 µs fall time. The pulse is superimposed on noise caused by
assorted physical phenomena, including the random motion of electrons, pickup of environmental
electromagnetic waves, etc.
To quantify the detector response to energy deposition, we also took a set of “calibration”
data. This set of data is acquired by exposing the detector to a known calibration source
emitting 10 keV photons, with the knowledge that the photons will interact with the detector
through photoabsorption, thus depositing all of their energy. We note that although
this is supposed to be calibration data, background events persist, and are indistinguishable
from the calibration events on an event-by-event basis. Background events are often
caused by radioactive isotopes in the environment or in the detector itself. In our data, they
present themselves as a group of events with energies distributed randomly in our energy Region
of Interest (ROI) of 0-20 keV. Thus, the deposited energy spectrum of this calibration data is
a narrow Gaussian peak at 10 keV from the calibration source on top of a uniform distribution
caused by backgrounds. However, noise will broaden the Gaussian peak, and the size of
the broadening depends heavily on the “energy estimator” we use to estimate the size of the
pulse. Finding an optimized energy estimator is often a critical step in data analysis. We
will need to explore a few energy estimators, calibrate each individually, assess the energy
resolution of each, and use the best one we can find.
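One way to extract the peak position and width from a spectrum of this shape is a chi-squared fit of a Gaussian plus a constant to the histogram. The following is a minimal sketch, assuming SciPy is available. The function names, the bin count, the starting guesses and the toy data are all assumptions made for illustration; they are not taken from the course data or starter code. Bin uncertainties use the square root of the counts, i.e. the Gaussian approximation to Poisson statistics, so empty bins are dropped.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chi2

def gauss_plus_const(x, amp, mean, width, base):
    """Gaussian peak on top of a flat background."""
    return amp * np.exp(-0.5 * ((x - mean) / width) ** 2) + base

def fit_spectrum(values, bins=50, value_range=None, p0=None):
    """Chi-squared fit of a histogram; returns parameters, errors and fit quality."""
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    centres = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0                     # sqrt(N) errors are undefined for empty bins
    x, y = centres[keep], counts[keep]
    sigma = np.sqrt(y)                    # Gaussian approximation to Poisson errors
    if p0 is None:                        # crude starting guesses (an assumption)
        p0 = [y.max(), x[np.argmax(y)], 0.1 * (x.max() - x.min()), np.median(y)]
    popt, pcov = curve_fit(gauss_plus_const, x, y, p0=p0,
                           sigma=sigma, absolute_sigma=True)
    chi_sq = np.sum(((y - gauss_plus_const(x, *popt)) / sigma) ** 2)
    dof = len(x) - len(popt)
    return popt, np.sqrt(np.diag(pcov)), chi_sq, dof, chi2.sf(chi_sq, dof)

# Toy data for illustration only: a peak at 0.30 V on a flat background.
rng = np.random.default_rng(1)
toy = np.concatenate([rng.normal(0.30, 0.02, 3000), rng.uniform(0.0, 0.6, 2000)])
popt, perr, chi_sq, dof, p_value = fit_spectrum(toy, bins=60, value_range=(0.0, 0.6))
scale = 10.0 / popt[1]                  # keV per volt: the peak mean corresponds to 10 keV
resolution_kev = abs(popt[2]) * scale   # Gaussian width expressed in keV
```

The returned chi-squared probability is what quantifies the fit quality, and the width in keV is the number to compare between energy estimators.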
After calibrating the detector and the energy estimator, we can then expose the detector to
the “signal source” we want to measure, and extract the energy spectrum. We did so and took
another set of data. For this set of data, we also limited the ROI to 0-20 keV. We note that
background is also present while we measure the signal source and remains indistinguishable
on an event-by-event basis. Luckily, it remains a uniform distribution across our ROI. The
ultimate goal of this measurement is to reconstruct the energy spectrum measured from this
signal source, and attempt to fit it with a functional form. Typically a follow-up analysis
would use this fitted functional form to extract physical information about the signal source.
Such a follow-up analysis is beyond the scope of this project.
3 Tasks breakdown
3.1 Brief Version
There are 6 basic energy estimators you must implement. None of them are amazing, and
some are terrible. Ideally, you can figure out a 7th energy estimator that combines various
properties of the 6 basic estimators and yields a much better energy estimate. Note
that a good energy estimator gives a small standard deviation as measured in keV. The 6
basic energy estimators are:
1. Maximum value minus minimum value.
2. Maximum value minus baseline average.
3. Sum of all values.
4. Sum of all (values minus baseline average). That is, first subtract the baseline average
from all the data values, then sum the result.
5. Sum of just the values in the pulse.
6. Fit of equation (1).
The baseline is the pre-pulse region: the roughly 1000 data points recorded before the pulse
begins. For each detected particle, that’s the best estimate of “nothing happening” you can
get. You should notice that “nothing happening” rarely looks like nothing actually happening,
due to the noise of the detector.
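The first five estimators in the list above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not a reference implementation. The function name, the dictionary keys and the integration window `PULSE_WINDOW` are hypothetical. In particular, the window limits and the choice to subtract the baseline inside the window for estimator 5 are illustrative choices that you should tune and justify on your own data.

```python
import numpy as np

N_BASELINE = 1000            # pre-pulse samples, as stated in this document
PULSE_WINDOW = (1000, 1400)  # hypothetical integration window for estimator 5

def simple_estimators(trace):
    """Return the five non-fit energy estimators for one trace (1-D array, in volts)."""
    trace = np.asarray(trace, dtype=float)
    baseline = trace[:N_BASELINE].mean()        # average of the pre-pulse region
    lo, hi = PULSE_WINDOW
    return {
        "max_minus_min": trace.max() - trace.min(),
        "max_minus_baseline": trace.max() - baseline,
        "sum_all": trace.sum(),
        "sum_baseline_subtracted": (trace - baseline).sum(),
        # Assumption: the baseline is also subtracted inside the pulse window.
        "sum_pulse_window": (trace[lo:hi] - baseline).sum(),
    }
```

Applying `simple_estimators` to every trace in a data set and histogramming each entry gives one uncalibrated spectrum per estimator.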
You should apply your best estimator to the noise data set, but use the same calibration
factor determined from the calibration data. The noise data should not have any particles,
so anything your estimator “finds” in the noise is just noise. This should give you an idea
of the range of sensitivity of your detector.
Finally, you should apply your best estimator to the signal data set. Again, use the same
calibration factor you found during the calibration step. You calibrate it once (using the
calibration data), then apply the result two more times (noise and signal data).
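The "calibrate once, apply twice" idea can be sketched as follows. The function names and all numerical values here are hypothetical, chosen only to illustrate the arithmetic; the fitted peak position must come from your own fit to the calibration histogram.

```python
import numpy as np

CALIBRATION_ENERGY_KEV = 10.0   # energy of the calibration photons

def calibration_factor(fitted_peak_mean):
    """keV per raw unit, from the Gaussian mean fitted to the calibration histogram."""
    return CALIBRATION_ENERGY_KEV / fitted_peak_mean

def to_energy(raw_estimates, factor):
    """Convert raw estimator values (e.g. in V) to energies in keV."""
    return np.asarray(raw_estimates, dtype=float) * factor

# Hypothetical numbers: suppose the calibration peak was fitted at 0.25 V.
factor = calibration_factor(0.25)                     # 40 keV/V
noise_kev = to_energy([0.01, -0.02, 0.005], factor)   # same factor for the noise data
signal_kev = to_energy([0.10, 0.30], factor)          # and for the signal data
```

The factor is determined only from the calibration data and is never refitted on the noise or signal data.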
3.2 Longer Version
While the previous section contains all the information about this project, here are a few
steps we suggest you follow to achieve the final goal of fitting the spectrum from the signal
source with a functional form.
1. It is always good practice to have a well-established starting point for any project.
As a first step, make sure you can successfully read the data sets provided. A snippet
of code is provided in Appendix A of this document; a longer and more useful Python
code can be found on the course website. Reproducing Fig. 2 will establish that you
can successfully load the data. One hint is that Fig. 2 was produced with the event
labelled as “evt 2” in the calibration data set.
Figure 3: Example histogram of the energy estimator reconstructed from the calibration
data. Note a flat background component and a Gaussian peak corresponding to the 10 keV
energy depositions from the calibration source.
2. The next step would be to establish an energy calibration of the detector. An energy
calibration is a relation between the size of the pulse and the energy deposited in the
detector. This can be achieved by reconstructing an energy estimator from each pulse in
the calibration data, making a histogram of them, and identifying the structure in such
a histogram – in this case a Gaussian peak due to the 10 keV calibration source. Fig. 3
shows an example of this histogram. Then you can perform a fit on the histogram,
with a Gaussian function on top of a flat background (i.e. a Gaussian function plus a
constant). The fit can be a chi-squared fit, with each bin of the histogram following a
Poisson distribution, approximated by a Gaussian distribution. For each chi-squared
fit, we need to establish the fit quality by evaluating the chi-squared probability from
the calculated chi-squared value and the number of degrees of freedom. The mean of the Gaussian
function corresponds to 10 keV in energy. The whole spectrum can thus be converted to energy
by multiplying the original data by the appropriate scale factor with units keV/V or
keV/mV. The width of the Gaussian function, after converting to energy, is
the resolution of this detector using this energy estimator.
3. Commonly used energy estimators are often either the amplitude of the pulse or its
integral. How to estimate the amplitude or the integral can be tricky though.
(a) The simplest amplitude estimate can be max-min (the maximum value minus
the minimum value), or max-baseline, where the baseline can be estimated as the
average of the pre-pulse region. Give both estimators a try, and show their
performance in your report.
(b) A simple integral can be the sum across the whole trace. A baseline subtraction
might enhance the performance a bit, just as for the amplitude. Furthermore,
limiting the range of the integral can also improve the performance. Give
all three options a go, and show their performance in your report.
(c) A more sophisticated way to estimate the amplitude of a pulse can involve a
chi-squared fit of the pulse to a known shape. In this case a pulse shape with
a 20 µs rise time and an 80 µs fall time as in Eqn. (1) is well justified. The
uncertainty of each voltage measurement can be estimated from the noise data
as the average of the standard deviations of the noise traces. However, this is not ideal
either, as the underlying assumption for a chi-squared fit is not strictly satisfied
in this scenario, due to correlations in the noise.
(d) The above six methods are mandatory. You can do better than them by
mixing and matching some of the ideas with each other. This is optional
but highly recommended.
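For the fit-based estimator in step (c), note that when the pulse shape and onset are held fixed and every sample has the same uncertainty, the amplitude is the only free parameter and the chi-squared minimum has a closed form. The following is a minimal sketch under those assumptions. All function names are hypothetical, the toy noise is white (unlike the correlated noise described above), and a general fitter would be needed if the onset, baseline or time constants were also left free.

```python
import numpy as np

def pulse_template(n_samples=4096, onset=1000, tau_rise=20.0, tau_fall=80.0):
    """Unit-amplitude pulse shape of Eq. (1), normalized numerically."""
    t = np.arange(n_samples, dtype=float) - onset
    shape = np.zeros(n_samples)
    after = t >= 0
    shape[after] = np.exp(-t[after] / tau_fall) - np.exp(-t[after] / tau_rise)
    return shape / shape.max()

def noise_sigma(noise_traces):
    """Average of the per-trace standard deviations of noise-only traces."""
    return float(np.mean([np.std(tr) for tr in noise_traces]))

def fit_amplitude(trace, template, sigma, n_baseline=1000):
    """Closed-form chi-squared fit of A in (trace - baseline) = A * template."""
    y = np.asarray(trace, dtype=float)
    y = y - y[:n_baseline].mean()                    # remove the quiescent offset
    norm = np.dot(template, template)
    amp = np.dot(y, template) / norm                 # minimizes the chi-squared
    amp_err = sigma / np.sqrt(norm)                  # valid for uncorrelated noise only
    chi_sq = np.sum(((y - amp * template) / sigma) ** 2)
    return amp, amp_err, chi_sq

# Toy check with white noise (illustration only, not the course data).
rng = np.random.default_rng(0)
template = pulse_template()
sigma = noise_sigma(rng.normal(0.0, 0.01, size=(20, 4096)))
toy_trace = 0.3 * template + 0.05 + rng.normal(0.0, 0.01, 4096)
amp, amp_err, chi_sq = fit_amplitude(toy_trace, template, sigma)
```

Because real noise is correlated, the quoted `amp_err` and chi-squared should be read as indicative rather than exact, as the text in step (c) warns.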