This function applies one of two disclosure control methods to generate non-disclosive coordinates. The coordinates are returned to the client-side function, which uses them to draw the non-disclosive scatter plot.

scatterPlotDS(x, y, method.indicator, k, noise)

Arguments

x

the name of a numeric vector, the x-variable.

y

the name of a numeric vector, the y-variable.

method.indicator

an integer, either 1 or 2. The client-side function sets method.indicator to 1 if the user selects the deterministic method and to 2 if the user selects the probabilistic method.

k

the number of nearest neighbours whose centroid is calculated if the deterministic method is selected.

noise

the percentage of the initial variance that is used as the variance of the embedded noise if the probabilistic method is selected.

Value

a list with the x and y coordinates of the data to be plotted.

Details

If the user chooses the deterministic approach, the function finds the k-1 nearest neighbours of each data point in a 2-dimensional space. The nearest neighbours are the data points with the smallest Euclidean distances from the point of interest. Each point of interest and its k-1 nearest neighbours are then used to calculate the coordinates of the centroid of those k points. The centroid here is the centre of mass, i.e. the x-coordinate of the centroid is the average of the x-coordinates of those k points and the y-coordinate of the centroid is the average of their y-coordinates.

If the user chooses the probabilistic approach, the function adds random noise to x and y separately. Each random noise follows a normal distribution with zero mean and variance equal to a percentage of the initial variance of that variable, as given by noise (for example 10%). To protect against disclosure, the seed of the random number generator is fixed to a value determined by the input variables, so the function always returns the same noisy data for a given pair of variables.
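The two approaches described above can be illustrated with a minimal sketch in base R. This is not the DataSHIELD implementation: the helper names knn_centroids and noisy_coordinates are hypothetical, noise is assumed to be passed as a fraction (0.1 for 10%), and the seed is taken as an explicit argument because the way it is derived from the input variables is not described here.

```r
# Hypothetical helper, not part of DataSHIELD: replaces each point by the
# centroid of itself and its k - 1 nearest neighbours (Euclidean distance).
knn_centroids <- function(x, y, k) {
  stopifnot(length(x) == length(y), k >= 1, k <= length(x))
  d <- as.matrix(dist(cbind(x, y)))  # pairwise Euclidean distances
  cx <- numeric(length(x))
  cy <- numeric(length(y))
  for (i in seq_along(x)) {
    # Each point has distance 0 to itself, so it (or an identical point)
    # is always among the k closest.
    idx <- order(d[i, ])[seq_len(k)]
    cx[i] <- mean(x[idx])
    cy[i] <- mean(y[idx])
  }
  list(x = cx, y = cy)
}

# Hypothetical helper, not part of DataSHIELD: adds zero-mean normal noise
# to x and y separately. Assumption: noise is a fraction of each variable's
# variance and seed is supplied by the caller.
noisy_coordinates <- function(x, y, noise, seed) {
  set.seed(seed)  # fixed seed, so repeated calls return the same noisy data
  list(
    x = x + rnorm(length(x), mean = 0, sd = sqrt(noise * var(x))),
    y = y + rnorm(length(y), mean = 0, sd = sqrt(noise * var(y)))
  )
}

# Usage with made-up data
x <- c(0, 1, 3, 10)
y <- c(0, 0, 0, 0)
centroids <- knn_centroids(x, y, k = 2)
noisy <- noisy_coordinates(x, y, noise = 0.1, seed = 1)
```

Because the seed is fixed, calling noisy_coordinates twice with the same arguments gives identical output, which is why averaging repeated requests does not recover the original values. Note that set.seed changes the global random number generator state of the R session.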

Author

Demetris Avraam for DataSHIELD Development Team