1. Introduction
Ultrasound (US) is an interactive, non-invasive imaging technique that provides quantitative information on anatomical regions through the propagation of ultrasound waves in soft tissues. Its major advantages over other imaging techniques, e.g., computed tomography or magnetic resonance, are ease of use, real-time imaging, cost-effectiveness, portability, and patient safety [
1,
2]. In the last decades, active research in the US field has led to advancements in transducer technology and digital electronics with a consequent improvement of diagnostic information content [
2,
3]. Therefore, the US technique is applied by clinicians from different medical fields to provide diagnosis and treatment [
4,
5,
6,
7,
8,
9]. As a consequence, the use of US devices has increased in recent years, and the worldwide market for medical ultrasound is projected to reach USD 8.4 billion in 2023, with an average annual growth rate of roughly 5.9% [
10].
Color Doppler (CD) imaging, developed in the 1980s, allows the 2D real-time representation of blood flow superimposed on the anatomical image [
1,
2,
11,
12,
13]. A color map codes and quantifies the velocity of blood flow inside a region of interest (or color box) adjusted by the operator on the B-mode grayscale ultrasound image as a function of the clinical requirements. Currently, CD is among the most widely used techniques in the medical field [
1,
2] since it is a powerful tool that allows hemodynamic monitoring and the visualization of flow patterns in blood vessels. However, the scientific community still debates whether the CD technique provides qualitative diagnostic data (non-repeatable and subjective estimations) rather than quantitative information (repeatable and objective measurements) on flow velocity. This disagreement may be explained by the high CD measurement uncertainties, which can reach up to 50% [
14]. Moreover, it is worth pointing out that a commonly accepted worldwide standard for Doppler ultrasound equipment testing has not been developed yet [
15,
16,
17]. Attempts to define theoretical and experimental methods for medical US equipment Quality Assessment (QA) were made by several national and international organizations [
16] over the years, with the consequent investigation of suitable tests for B-mode imaging, as well-documented in the literature [
18,
19,
20,
21,
22,
23]. These professional organizations include the American Institute of Ultrasound in Medicine (AIUM), the American Association of Physicists in Medicine (AAPM), the American College of Radiology (ACR), the European Federation of Societies for Ultrasound in Medicine and Biology (EFSUMB), and the Institute of Physics and Engineering in Medicine (IPEM).
Although the demand for proper QA protocols has increased in recent years [
16,
24,
25,
26,
27,
28], performance evaluation of Doppler systems is still an open research issue. This is mostly due to the lack of consensus among the professional bodies about the US system configuration settings, as well as about which and how many quality parameters should be processed and included in a Quality Control (QC) program for Doppler testing [
16]. In this regard, the wide range of Doppler performance parameters proposed in the literature [
16,
25] often represents a considerable burden and calls for an approach that summarizes all their contributions in a few meaningful quantities that the technician can interpret easily and quickly. This need is common in several scientific fields where an effective representation of multivariate data is required, and it is often met by means of a Kiviat diagram (or Kiviat plot, spider plot) [
29,
30]. This type of plot is characterized by a series of spokes projecting from a center point, with each spoke representing a different variable axis. The value of each variable is encoded as a distance along its spoke, and the plotted values are connected to form a polygon. The shape of the Kiviat diagram makes it easy to visualize and compare different variables in a single graphical plot, especially when a reference or gold standard polygon is included. It is currently considered a useful comparative tool for outcome metrics since it both conveys a large amount of information and provides a standardized overview of different indicators [
29,
31]. In this regard, the Kiviat diagram could also be a promising tool for the assessment of Doppler system performance by integrating the outcomes of multiple meaningful test parameters.
Kiviat diagrams were introduced in the 1980s as a means for monitoring computer system hardware performance. To date, they are commonly used in fields such as social sciences, economics, engineering, computing, and information technology, mostly as a tool for comparing performance metrics [
29]. Although the Kiviat plot is not widely used in the health-related literature, some examples are worth mentioning. For instance, Kiviat plots have been used to present data related to performance benchmarking at the patient and hospital levels for orthopedic surgery [
31] or diagnostic performance of ultrasonography in patients with pneumonia [
32].
Based on the above considerations, the present study aims to propose and investigate a first approach, based on Kiviat diagrams, to the effective combination of five parameters to be included in a novel QA protocol for Color Doppler diagnostic systems. The proposed approach contributes to the field by quantifying the overall Doppler performance of US systems for a given probe-setting pair. Performance data could be used both to compare US systems manufactured by different companies and to monitor Doppler system degradation over time. The latter usually occurs as a slow and progressive worsening of the image quality that could negatively affect the accuracy and efficacy of clinical diagnosis [
33,
34].
Three brand-new ultrasound systems, each equipped with a phased array probe and a convex array probe, were tested in two configuration settings. In this first comparative study, CD performance was evaluated in terms of blind angle [
35], registration error [
36], average maximum velocity sensitivity [
37], velocity measurements accuracy, and temporal resolution. These performance parameters, derived from QC tests already proposed in the literature [
16] and recommended by international organizations [
38,
39], allow for quantifying Color Doppler functionality. They were obtained from the post-processing of Color Doppler data by means of automatic and objective image analysis procedures, whose measurement uncertainty contribution was estimated through the implementation of Monte Carlo Simulations (MCSs). One of the main advantages of the proposed methods is the possibility of overcoming the intrinsic limits of visually assessed performance tests, since several test parameters recommended by the abovementioned professional organizations are qualitatively defined and suffer from operator-related errors [
10,
28,
38,
39].
The remainder of the paper is organized as follows:
Section 2 deals with the experimental setup adopted, the QA test parameters definition and description, as well as the normalization procedure proposed to combine and compare the outcomes retrieved. In
Section 3, the measurement uncertainty analysis of the implemented image analysis-based methods through MCSs is carried out. In
Section 4, experimental results are presented. In
Section 5, the obtained outcomes are discussed, and future research directions are highlighted. Finally, the conclusions are outlined in
Section 6.
3. Monte Carlo Simulation
The measurement uncertainty contribution due to the image analysis-based methods was estimated through Monte Carlo Simulation [
47], a robust tool already applied in previous studies [
43,
48,
49]. An MCS with 10⁴ iterations was run for each combination of test parameters, US systems, and probes, as well as configuration and phantom settings. The standard deviation (SD) from each MCS was then estimated and combined with the corresponding repeatability SD retrieved in
Section 2.2.
Uniform distributions, expressed as mean ± SD, were assigned to the variables influencing the assessment of the QC parameters investigated in this study (
Table 8). In the MCSs involving Color Doppler video processing, both the number of averaged images N and the number of averaged frames M were kept constant throughout the iterations, while the frames to be averaged were randomly selected at each cycle, without repetition, from all the frames acquired in 3 s.
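The frame randomization described above can be illustrated with a minimal sketch. It reflects one possible reading of the procedure (N groups of M frames, with no frame index reused within an iteration); the function name and the numerical values (60 frames, N = 4, M = 5) are hypothetical and are not taken from the study.

```python
import random


def draw_frame_groups(n_frames_total, n_images, m_frames, rng):
    """Pick n_images groups of m_frames frame indices with no index reused.

    Assumption: each averaged image is built from its own set of M frames,
    and no frame is shared between groups within one MCS iteration.
    """
    needed = n_images * m_frames
    if needed > n_frames_total:
        raise ValueError("not enough frames to sample without repetition")
    # random.Random.sample draws without replacement
    picked = rng.sample(range(n_frames_total), needed)
    return [picked[i * m_frames:(i + 1) * m_frames] for i in range(n_images)]


# Hypothetical values: 60 frames acquired in 3 s, N = 4 images, M = 5 frames
rng = random.Random(0)
groups = draw_frame_groups(n_frames_total=60, n_images=4, m_frames=5, rng=rng)
```

Calling `draw_frame_groups` once per iteration keeps N and M fixed while changing which frames are averaged.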
The distributions for the blind angle assessment were assigned in an analogous way to [
35], while those for the registration error assessment also included an input distribution associated with the brightness filter threshold, whose standard deviation σ_b was set to 6% of the mean value μ_b. The same distributions were used for the assessment of both the AMVS and VeMeA parameters. Finally, for the temporal resolution assessment, uniform distributions were assigned to the quantities in Equation (9), assuming for both A_color and A_tot a standard deviation equal to 3% of the corresponding mean value.
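As an illustration of the simulation scheme described in this section, the following minimal sketch draws the inputs from uniform distributions specified as mean ± SD, runs 10⁴ iterations, and combines the resulting SD with a repeatability SD. Several elements are assumptions made only for illustration: the ratio `area_ratio` is a hypothetical stand-in for Equation (9), which is not reproduced here; the mean values and the repeatability SD are invented; and the root-sum-of-squares combination is assumed, since the combination rule is not specified in this section.

```python
import math
import random
import statistics


def uniform_from_mean_sd(mean, sd, rng):
    """Draw from a uniform distribution with the given mean and SD.

    A uniform distribution of half-width a has SD a / sqrt(3),
    so the half-width is sqrt(3) * SD.
    """
    half_width = math.sqrt(3.0) * sd
    return rng.uniform(mean - half_width, mean + half_width)


def mcs_sd(model, inputs, n_iter=10_000, seed=0):
    """Return the SD of the model output over n_iter Monte Carlo iterations.

    inputs maps each argument name of `model` to a (mean, sd) pair.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_iter):
        sample = {name: uniform_from_mean_sd(m, s, rng)
                  for name, (m, s) in inputs.items()}
        outputs.append(model(**sample))
    return statistics.stdev(outputs)


def combine_sd(sd_mcs, sd_repeatability):
    """Combine two SDs in quadrature (assumed: independent contributions)."""
    return math.hypot(sd_mcs, sd_repeatability)


def area_ratio(a_color, a_tot):
    """Hypothetical stand-in for Equation (9): ratio of two areas."""
    return a_color / a_tot


# Hypothetical means; SD set to 3% of each mean, as stated in the text
inputs = {"a_color": (400.0, 0.03 * 400.0), "a_tot": (1000.0, 0.03 * 1000.0)}
sd_mcs = mcs_sd(area_ratio, inputs)
sd_total = combine_sd(sd_mcs, sd_repeatability=0.01)  # hypothetical repeatability SD
```

The same `mcs_sd` helper can be reused for the other test parameters by passing a different model function and the corresponding input distributions.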
4. Results
Experimental outcomes for each combination of test parameters, US systems, and probes, as well as configuration and phantom settings, are reported as mean ± SD in
Table 9,
Table 10,
Table 11,
Table 12 and
Table 13. Standard deviations were computed by combining the σ_BA, σ_RE%, σ_AMVS, and σ_VeMeA values with the corresponding ones estimated from MCSs. As regards the TR parameter, standard deviations were retrieved directly from the data distributions.
From blind angle outcomes (
Table 9), it can be noticed that the tested phased array probes were globally compatible between the two configuration settings at the same flow regime. Such compatibility did not hold for the convex array probes, for which higher BA results were retrieved in configuration A than in B. Moreover, the results obtained for both probes of system one showed, as expected, a decreasing trend for increasing flow rates, while the opposite trend was found for the convex array probe of system three in configuration B. As for system two in configuration A, the mean value retrieved at the medium flow regime Q_M was higher than the one at the high flow regime Q_H for both probes, and the same behavior was also found for the phased array probe of system three. Finally, for the convex probe of system three in configuration A, blind angle results were compatible and did not show a specific trend.
As regards the percentage registration error, the outcomes obtained (
Table 10) for the phased array probes globally showed an increasing trend for increasing flow rates, while a well-defined behavior could not be inferred for the convex array probes. Furthermore, RE% results for system one were the closest to 0% among all three phased probe-system pairs in both configurations. On the other hand, system three, equipped with the convex array probe, showed results closer to the optimal value in configuration A only, probably due to the higher wall filter setting (
Table 2) included in its clinical preset.
AMVS outcomes (
Table 11) were retrieved among velocities belonging to the same flow regime, maintaining a fixed flow step of 1.5 mL·s⁻¹, as listed in
Table 3. They show a similar behavior between the two configurations independently of the US system for both probe models. The lowest sensitivity values, which significantly deviate from one, were obtained with the phased array probe of system two at the high flow rate regime Q_H.
By focusing on VeMeA outcomes (
Table 12), an increasing trend for increasing flow rates was found for all the convex array probes, while a distinct behavior could not be inferred for the phased array ones. The results were generally compatible between configurations A and B, and for system one, the results obtained for the convex probe were always lower than the corresponding ones for the phased probe. On the other hand, independently of the probe model, system two more often showed the results closest to the optimal value.
Finally, temporal resolution results (
Table 13) obtained for both probe models of all US systems showed, as expected, a decreasing trend for increasing Color Doppler line density setting. Moreover, when each outcome in configuration A was compared with the corresponding one in configuration B, TR values were always higher in the latter. This is probably due to the reduction of both pre- and post-processing settings. The best outcomes (closest to 0.5) were found for system one and system two with the phased and convex array probes, respectively.
Experimental results were normalized according to the normalization steps described in
Section 2.3 to allow the combination of the five test parameters retrieved for each probe at the same phantom and system settings. This allowed their representation on Kiviat diagrams and the direct comparison with the gold standard, for which all the normalized QA test parameters were set to one. The area of each polygon was then computed and used as an index to quantify the overall Doppler performance of the US systems for each probe-configuration pair: the greater the polygon area, the higher the Doppler system performance. For ease of interpretation, the areas of the diagrams were normalized with respect to the total area of the gold standard, so that the closer the normalized area is to one, the better the performance.
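The area-based index can be sketched as follows. The sketch assumes equally spaced spokes, so that the polygon is the sum of the triangles formed by adjacent spokes; the function names and the example values are hypothetical, and the actual normalization steps of Section 2.3 are not reproduced.

```python
import math


def kiviat_area(values):
    """Area of a Kiviat polygon with equally spaced spokes (assumed layout).

    Each pair of adjacent spokes with lengths r_i and r_(i+1) spans a
    triangle of area 0.5 * r_i * r_(i+1) * sin(2*pi/n).
    """
    n = len(values)
    if n < 3:
        raise ValueError("a polygon needs at least three spokes")
    wedge = 0.5 * math.sin(2.0 * math.pi / n)
    return wedge * sum(values[i] * values[(i + 1) % n] for i in range(n))


def normalized_kiviat_area(values):
    """Polygon area divided by the gold standard area (all parameters = 1)."""
    gold_standard = [1.0] * len(values)
    return kiviat_area(values) / kiviat_area(gold_standard)


# Hypothetical normalized values for the five test parameters
example = [0.8, 0.6, 0.7, 0.5, 0.9]
s_star = normalized_kiviat_area(example)
```

Under this construction the area depends on the order in which the parameters are assigned to the spokes, so the same ordering should be kept when comparing systems.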
Kiviat diagrams for systems one, two, and three equipped with phased and convex array probes in configurations A and B are shown in
Figure 10 and
Figure 11. In particular, the QA parameters retrieved at a high flow rate
QH (
Table 14) were used to plot the diagrams of the phased array probes since this model, preferred for echocardiography, is designed to detect high blood velocities [
1]. On the other hand, the QA parameters at medium flow rate
QM (
Table 15) were used to plot the diagrams of the convex array probes since this model is typically designed for abdominal imaging [
1]. As regards the TR parameter, results obtained at the medium CD line density setting LD_M were considered for both probe models. Alongside each Kiviat diagram, the normalized mean area S* and the corresponding standard deviation σ_S* were computed (
Figure 10 and
Figure 11). The latter was estimated through the error propagation law.
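A first-order version of this propagation can be sketched as follows, assuming equally spaced spokes and uncorrelated parameters. Under these assumptions the normalized area is S* = (1/n) Σ r_i r_(i+1), so the sensitivity to r_i is (r_(i−1) + r_(i+1))/n. The function name and the example values are hypothetical, and the exact expression used in the study may differ.

```python
import math


def normalized_area_with_sd(values, sds):
    """Normalized Kiviat area and its SD from first-order error propagation.

    Assumptions: equally spaced spokes, gold standard with all values = 1,
    and uncorrelated input parameters.
    """
    n = len(values)
    if n < 3 or len(sds) != n:
        raise ValueError("need at least three values and one SD per value")
    area = sum(values[i] * values[(i + 1) % n] for i in range(n)) / n
    variance = 0.0
    for i in range(n):
        # Partial derivative of the normalized area with respect to values[i];
        # values[i - 1] wraps around to the last spoke when i == 0
        partial = (values[i - 1] + values[(i + 1) % n]) / n
        variance += (partial * sds[i]) ** 2
    return area, math.sqrt(variance)


# Hypothetical normalized parameters and their standard deviations
s_star, sigma_s_star = normalized_area_with_sd(
    [0.8, 0.6, 0.7, 0.5, 0.9], [0.05, 0.04, 0.06, 0.03, 0.05]
)
```

If the parameters were correlated, covariance terms would have to be added to the variance sum.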
Finally, as regards the normalized areas (
Table 14 and
Table 15), compatible performance was found between the two configurations for both probe models of systems one and three. As regards system two, a higher area was obtained in configurations A and B for the phased and convex array probes, respectively. By focusing on the phased array probes, system one showed the highest diagram area independently of the configuration setting (0.41 ± 0.07 and 0.45 ± 0.07 in A and B, respectively), while the lowest one was found for system two in configuration B (0.23 ± 0.03). On the other hand, the highest and lowest areas for the convex array probes were found for system two in configuration B (0.45 ± 0.06) and A (0.25 ± 0.05), respectively.
5. Discussion
The present study is proposed as a first approach to the combination of five Doppler test parameters based on the Kiviat diagram to quantify the performance of the US systems according to a probe-setting pair. As a first attempt, the diagram area normalized with respect to the gold standard was assumed as an index of the overall Color Doppler system performance. The assessed parameters were the blind angle, registration error, average maximum velocity sensitivity, velocity measurements accuracy, and temporal resolution. They were objectively assessed through custom-written image analysis-based methods and procedures (
Figure 1,
Figure 3,
Figure 5,
Figure 8 and
Figure 9) and then normalized to the same range for the graphical representation. Three brand-new ultrasound systems, each equipped with a phased array probe and a convex array probe, were tested in two configuration settings at different flow rate regimes set on a Doppler reference device (
Table 2 and
Table 3).
As regards the results obtained for each single test parameter (
Table 9,
Table 10,
Table 11,
Table 12 and
Table 13), it should be noted that, independently of the US system tested, BA outcomes retrieved for the phased array probes were the closest to the optimal value. Comparing the US systems, better results (closest to 0) were found for both probe models of system one, independently of the configuration setting. As regards the percentage registration error, the phased array probes globally showed better results (closest to 0) than the convex array ones for both systems one and two in configuration A. Moreover, independently of the configuration, RE% results for system one were the closest to the optimal value among all three phased probe-system pairs. On the other hand, AMVS results obtained for the probes of the three US systems were globally compatible with one another in both configurations. However, it should also be noted that the sensitivity index showed the highest SD values among the proposed QA test parameters. As regards the VeMeA parameter, the results were generally compatible between the two configurations and independent of the probe model. Lastly, temporal resolution results for both probe models of all US systems always showed, as expected, higher TR values in configuration B, probably due to the reduction of pre- and post-processing settings. The best outcomes (closest to 0.5) were found for system one and system two with the phased and convex array probes, respectively. Moreover, SD values were almost constant for all the tested phased probes in both configurations, while a limited increase was found for some convex probes.
The use of Kiviat diagrams allowed combining the quality parameters (
Figure 10 and
Figure 11) and estimating a single index (normalized diagram area) that provided a more immediate assessment of the CD system quality. QA parameters assessed at high and medium flow rates were used to plot the diagrams of the phased and the convex array probes, respectively. Conversely, temporal resolution results at a medium number of CD scan lines were considered for both probe models. For these cases, the outcomes (
Table 14 and
Table 15) confirmed that a higher polygon area was found for the probe-system pair showing higher values of the test parameters discussed above (e.g., phased array probe of system one in both configurations). Moreover, diagrams with comparable areas corresponded to US systems whose test parameters showed compatibility. These aspects suggest that the Kiviat diagram may be a useful tool for US system assessment since it seems to be directly related to the system performance. As expected, the normalized areas did not globally show significant discrepancies, since the US systems tested in this study were all brand-new systems at the same technology level. As a last remark, it should be noted that the normalized area of the diagram, together with its shape, has the advantage of preserving the relationship among the test parameters, compared with other mathematical operators such as the arithmetic or geometric mean of the test parameters. Moreover, the Kiviat plot could provide the technician with a quick overview of the values of the single parameters, highlighting both the weaknesses and strengths of the Doppler system under test and allowing US system performance to be monitored over time.
Despite the promising results, the present study is a first attempt at applying the Kiviat diagram to QCs of Doppler equipment. Therefore, further investigations should be performed to assess the sensitivity and the specificity of the proposed approach. In particular, studies will be carried out to estimate how much the variation (due to US system deterioration) of one or more of the quality parameters affects the diagram area. In addition, US systems that have been used in the clinical setting for a few years should be tested, and the areas of their Kiviat diagrams should be compared with the ones retrieved for brand-new systems at the same technology level. This could help to understand whether the proposed approach is able to detect significant discrepancies among the areas of the diagrams due to an objectively evident state of deterioration. As a last remark, further investigations may include a deeper analysis of the relationship among the QA parameters and of how it could affect the shape of the Kiviat diagram.