Diabetes is a group of metabolic disorders characterized by a high blood sugar level over a prolonged period of time. Symptoms often include frequent urination, increased thirst, and increased appetite.
In this project I use association rules to find associations between diabetes and the symptoms that may be related to it, identifying the most common and significant ones. The dataset contains 520 cases and 17 features, collected through direct questionnaires with patients of Sylhet Diabetes Hospital in Sylhet (Bangladesh) and approved by a doctor.
It is available from the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/Early+stage+diabetes+risk+prediction+dataset.
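The seventeen features, shown as column headers in the preview below, are: Age, Gender, Polyuria, Polydipsia, sudden weight loss, weakness, Polyphagia, Genital thrush, visual blurring, Itching, Irritability, delayed healing, partial paresis, muscle stiffness, Alopecia, Obesity and class (the diagnosis, Positive or Negative).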
library(arules)
library(arulesViz)
library(dplyr)
dataframe <- read.csv("diabetes_data_upload.csv")
head(dataframe, 10)
| Age | Gender | Polyuria | Polydipsia | sudden.weight.loss | weakness | Polyphagia | Genital.thrush | visual.blurring | Itching | Irritability | delayed.healing | partial.paresis | muscle.stiffness | Alopecia | Obesity | class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40 | Male | No | Yes | No | Yes | No | No | No | Yes | No | Yes | No | Yes | Yes | Yes | Positive |
| 58 | Male | No | No | No | Yes | No | No | Yes | No | No | No | Yes | No | Yes | No | Positive |
| 41 | Male | Yes | No | No | Yes | Yes | No | No | Yes | No | Yes | No | Yes | Yes | No | Positive |
| 45 | Male | No | No | Yes | Yes | Yes | Yes | No | Yes | No | Yes | No | No | No | No | Positive |
| 60 | Male | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Positive |
| 55 | Male | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | No | Yes | Yes | Yes | Positive |
| 57 | Male | Yes | Yes | No | Yes | Yes | Yes | No | No | No | Yes | Yes | No | No | No | Positive |
| 66 | Male | Yes | Yes | Yes | Yes | No | No | Yes | Yes | Yes | No | Yes | Yes | No | No | Positive |
| 67 | Male | Yes | Yes | No | Yes | Yes | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes | Positive |
| 70 | Male | No | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No | No | No | Yes | No | Positive |
The dataset contains records from patients of different ages. To extend the association analysis to age groups and to turn age into a categorical (factor) variable, I will create four groups of similar size (in terms of number of records).
The ages present in the sample are:
sort(unique(dataframe$Age))
## [1] 16 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
## [26] 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 72 79 85
## [51] 90
I split the patients into four age groups (A: under 40, B: 40 to 49, C: 50 to 59, D: 60 and over) and count the records in each group, together with the number of men and women:
groups <- cbind(
count(filter(dataframe, Age<40)),
count(filter(dataframe, Age>=40, Age<50)),
count(filter(dataframe, Age>=50, Age<60)),
count(filter(dataframe, Age>=60)),
count(filter(dataframe, Gender=="Male")),
count(filter(dataframe, Gender=="Female"))
)
colnames(groups) <- c("A", "B", "C", "D", "Male", "Female")
groups
## A B C D Male Female
## 1 144 151 130 95 328 192
The groups are of broadly similar size (144, 151, 130 and 95 records), and the dataset contains 328 men and 192 women.
dataframe$Age = ifelse(
dataframe$Age < 40, "A", ifelse(
dataframe$Age < 50, "B", ifelse(
dataframe$Age < 60, "C", "D"
)))
for (col in colnames(dataframe)){
dataframe[col] = lapply(dataframe[col], factor)
}
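The nested `ifelse()` above and the four `count(filter(...))` calls can also be written with base R's `cut()` and `table()`. The sketch below assumes the same breakpoints (40, 50, 60) are intended and uses `right = FALSE` so each interval includes its lower bound, matching `Age < 40`, `Age >= 40 & Age < 50`, and so on. The ages are a small illustrative vector, not the dataset.

```r
# Illustrative ages (not taken from the dataset)
ages <- c(16, 39, 40, 49, 50, 59, 60, 90)

# Original approach: nested ifelse()
groups_ifelse <- ifelse(ages < 40, "A",
                 ifelse(ages < 50, "B",
                 ifelse(ages < 60, "C", "D")))

# Alternative: cut() with left-closed intervals [-Inf,40), [40,50), [50,60), [60,Inf)
groups_cut <- cut(ages,
                  breaks = c(-Inf, 40, 50, 60, Inf),
                  right = FALSE,
                  labels = c("A", "B", "C", "D"))

# Both approaches assign the same labels
print(identical(as.character(groups_cut), groups_ifelse))

# table() returns all group sizes in one call, like the count(filter(...)) columns
print(table(groups_cut))
```

Similarly, the loop that converts every column to a factor can be written in one line as `dataframe[] <- lapply(dataframe, factor)`.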
Apriori is an algorithm for frequent itemset mining and association rule learning over relational databases. It first identifies the frequent individual items in the database and then extends them to larger and larger itemsets, as long as those itemsets appear sufficiently often. The frequent itemsets found by Apriori can then be used to derive association rules, which highlight general trends in the dataset.
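The level-wise search described above can be sketched in base R. This is a simplified illustration, not how arules implements Apriori internally: the hypothetical helpers `itemset_support()` and `apriori_sketch()` only mine frequent itemsets (rule generation is left out), and the join step merges any two frequent itemsets that together have one more item, instead of the prefix-based join of the original algorithm. The four transactions are made up for illustration.

```r
# Simplified Apriori sketch for frequent itemsets (base R only).
# Each transaction is a character vector of items such as "Polyuria=Yes".

# Support of an itemset: fraction of transactions containing all its items
itemset_support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

apriori_sketch <- function(transactions, min_support) {
  is_frequent <- function(s) itemset_support(s, transactions) >= min_support
  # Level 1: frequent individual items
  current <- Filter(is_frequent, as.list(sort(unique(unlist(transactions)))))
  frequent <- current
  k <- 2
  while (length(current) > 1) {
    # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates
    candidates <- list()
    for (i in seq_len(length(current) - 1)) {
      for (j in (i + 1):length(current)) {
        cand <- sort(union(current[[i]], current[[j]]))
        if (length(cand) == k) candidates[[length(candidates) + 1]] <- cand
      }
    }
    candidates <- unique(candidates)
    # Prune: drop candidates that have an infrequent (k-1)-subset
    keys <- vapply(current, paste, character(1), collapse = ",")
    candidates <- Filter(function(cand) {
      all(vapply(seq_along(cand),
                 function(d) paste(cand[-d], collapse = ",") %in% keys,
                 logical(1)))
    }, candidates)
    # Count: keep candidates that reach the minimum support
    current <- Filter(is_frequent, candidates)
    frequent <- c(frequent, current)
    k <- k + 1
  }
  frequent
}

# Hypothetical transactions
tx <- list(
  c("Polyuria=Yes", "Polydipsia=Yes", "Obesity=No"),
  c("Polyuria=Yes", "Polydipsia=Yes"),
  c("Polyuria=Yes", "Obesity=No"),
  c("Polydipsia=No", "Obesity=No")
)
freq <- apriori_sketch(tx, min_support = 0.5)
for (s in freq) {
  cat("{", paste(s, collapse = ", "), "} support =", itemset_support(s, tx), "\n")
}
```

With `min_support = 0.5` the sketch keeps three single items and two pairs; the triple {Obesity=No, Polydipsia=Yes, Polyuria=Yes} is pruned because its subset {Obesity=No, Polydipsia=Yes} appears in only one of the four transactions.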
I can now convert the dataset into the transactional format required to mine association rules with the Apriori algorithm.
Before that, I filter for class == "Positive" so that only patients who tested positive for diabetes are considered, and look for associations between diabetes and the common symptoms.
diabetes.positive = filter(dataframe, class=="Positive")
diabetes.positive = as(diabetes.positive, "transactions")
inspect(head(diabetes.positive, 2))
## items transactionID
## [1] {Age=B,
## Gender=Male,
## Polyuria=No,
## Polydipsia=Yes,
## sudden.weight.loss=No,
## weakness=Yes,
## Polyphagia=No,
## Genital.thrush=No,
## visual.blurring=No,
## Itching=Yes,
## Irritability=No,
## delayed.healing=Yes,
## partial.paresis=No,
## muscle.stiffness=Yes,
## Alopecia=Yes,
## Obesity=Yes,
## class=Positive} 1
## [2] {Age=C,
## Gender=Male,
## Polyuria=No,
## Polydipsia=No,
## sudden.weight.loss=No,
## weakness=Yes,
## Polyphagia=No,
## Genital.thrush=No,
## visual.blurring=Yes,
## Itching=No,
## Irritability=No,
## delayed.healing=No,
## partial.paresis=Yes,
## muscle.stiffness=No,
## Alopecia=Yes,
## Obesity=No,
## class=Positive} 2
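Conceptually, `as(..., "transactions")` turns each row of a data frame of factors into one transaction whose items are `column=level` pairs, which is what the output above shows. The base R sketch below mimics that mapping on two hypothetical rows with a hypothetical helper `to_items()`; it only illustrates the idea and is not how arules stores transactions (arules keeps them in a sparse binary item matrix).

```r
# Two hypothetical rows with factor columns (values chosen for illustration)
df <- data.frame(
  Age      = factor(c("B", "C")),
  Polyuria = factor(c("No", "Yes")),
  class    = factor(c("Positive", "Positive"))
)

# Map each row to a character vector of "column=level" items
to_items <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    levels_in_row <- vapply(df[i, , drop = FALSE], as.character, character(1))
    unname(paste0(names(df), "=", levels_in_row))
  })
}

transactions <- to_items(df)
print(transactions)
```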
Once the transactional data is generated, a frequency plot shows the 20 most common items across the transactions:
itemFrequencyPlot(diabetes.positive,topN=20,type="absolute")
Apart from the first item (class=Positive), which appears in every transaction because it is the filtering criterion, the plot shows that the most common symptoms in the sample are Polyuria and Polydipsia, while most patients do not suffer from obesity.
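`itemFrequencyPlot(..., type = "absolute")` plots how many transactions contain each item; with `type = "relative"` the same counts are divided by the number of transactions, which is the support of each single item. The base R sketch below computes both on three hypothetical transactions, and shows why `class=Positive` tops the chart once the data is filtered to positive patients.

```r
# Three hypothetical transactions, all filtered to class=Positive
tx <- list(
  c("Polyuria=Yes", "Polydipsia=Yes", "Obesity=No",  "class=Positive"),
  c("Polyuria=Yes", "Polydipsia=No",  "Obesity=No",  "class=Positive"),
  c("Polyuria=No",  "Polydipsia=Yes", "Obesity=Yes", "class=Positive")
)

# Absolute frequency: number of transactions containing each item
# (an item appears at most once per transaction, so counting occurrences works)
absolute <- table(unlist(tx))
counts <- setNames(as.vector(absolute), names(absolute))

# Relative frequency, i.e. the support of each single item
relative <- counts / length(tx)

# Top-N items by frequency, analogous to topN in itemFrequencyPlot()
top_n <- head(sort(counts, decreasing = TRUE), 3)
print(top_n)
print(round(relative, 3))
```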
Next, I run the Apriori algorithm to find associations between symptoms and diabetes, with a minimum support of 0.0025 and a minimum confidence of 0.3:
rules_diabetes.positive<-apriori(data=diabetes.positive, parameter=list(supp=0.0025,conf = 0.3),
appearance=list(default="lhs", rhs="class=Positive"), control=list(verbose=F))
rules_diabetes.positive = sort(rules_diabetes.positive, by='support')
inspect(head(rules_diabetes.positive, 25))
## lhs rhs support confidence
## [1] {} => {class=Positive} 1.000000 1
## [2] {Obesity=No} => {class=Positive} 0.809375 1
## [3] {Polyuria=Yes} => {class=Positive} 0.759375 1
## [4] {Alopecia=No} => {class=Positive} 0.756250 1
## [5] {Genital.thrush=No} => {class=Positive} 0.740625 1
## [6] {Polydipsia=Yes} => {class=Positive} 0.703125 1
## [7] {weakness=Yes} => {class=Positive} 0.681250 1
## [8] {Irritability=No} => {class=Positive} 0.656250 1
## [9] {Alopecia=No,Obesity=No} => {class=Positive} 0.625000 1
## [10] {Genital.thrush=No,Obesity=No} => {class=Positive} 0.612500 1
## [11] {Genital.thrush=No,Alopecia=No} => {class=Positive} 0.609375 1
## [12] {Polyuria=Yes,Obesity=No} => {class=Positive} 0.609375 1
## [13] {Polyuria=Yes,Polydipsia=Yes} => {class=Positive} 0.603125 1
## [14] {partial.paresis=Yes} => {class=Positive} 0.600000 1
## [15] {Polyphagia=Yes} => {class=Positive} 0.590625 1
## [16] {sudden.weight.loss=Yes} => {class=Positive} 0.587500 1
## [17] {Polyuria=Yes,Alopecia=No} => {class=Positive} 0.584375 1
## [18] {muscle.stiffness=No} => {class=Positive} 0.578125 1
## [19] {Polydipsia=Yes,Alopecia=No} => {class=Positive} 0.575000 1
## [20] {Polyuria=Yes,Genital.thrush=No} => {class=Positive} 0.562500 1
## [21] {Polyuria=Yes,weakness=Yes} => {class=Positive} 0.556250 1
## [22] {Polydipsia=Yes,Obesity=No} => {class=Positive} 0.553125 1
## [23] {visual.blurring=Yes} => {class=Positive} 0.546875 1
## [24] {Polydipsia=Yes,weakness=Yes} => {class=Positive} 0.546875 1
## [25] {Gender=Female} => {class=Positive} 0.540625 1
## coverage lift count
## [1] 1.000000 1 320
## [2] 0.809375 1 259
## [3] 0.759375 1 243
## [4] 0.756250 1 242
## [5] 0.740625 1 237
## [6] 0.703125 1 225
## [7] 0.681250 1 218
## [8] 0.656250 1 210
## [9] 0.625000 1 200
## [10] 0.612500 1 196
## [11] 0.609375 1 195
## [12] 0.609375 1 195
## [13] 0.603125 1 193
## [14] 0.600000 1 192
## [15] 0.590625 1 189
## [16] 0.587500 1 188
## [17] 0.584375 1 187
## [18] 0.578125 1 185
## [19] 0.575000 1 184
## [20] 0.562500 1 180
## [21] 0.556250 1 178
## [22] 0.553125 1 177
## [23] 0.546875 1 175
## [24] 0.546875 1 175
## [25] 0.540625 1 173
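Every rule above has confidence 1 and lift 1. This follows from the standard definitions: support(X => Y) = supp(X and Y), coverage = supp(X), confidence = supp(X and Y) / supp(X) and lift = confidence / supp(Y). After filtering to positive patients, the right-hand side `class=Positive` is contained in every transaction, so supp(Y) = 1 and supp(X and Y) = supp(X); the ranking is therefore driven by support alone. The base R sketch below, with a hypothetical helper `rule_measures()` and made-up transactions, computes these measures first on positives only and then with two negative records added, where lift becomes informative.

```r
# Rule measures for lhs => rhs over a list of transactions (character vectors)
rule_measures <- function(lhs, rhs, transactions) {
  contains <- function(items) {
    vapply(transactions, function(t) all(items %in% t), logical(1))
  }
  supp_lhs  <- mean(contains(lhs))          # coverage = supp(X)
  supp_rule <- mean(contains(c(lhs, rhs)))  # support = supp(X and Y)
  supp_rhs  <- mean(contains(rhs))          # supp(Y)
  confidence <- supp_rule / supp_lhs
  c(support = supp_rule, confidence = confidence,
    coverage = supp_lhs, lift = confidence / supp_rhs)
}

# Hypothetical transactions filtered to positive patients only
positive_only <- list(
  c("Polyuria=Yes", "class=Positive"),
  c("Polyuria=Yes", "class=Positive"),
  c("Polyuria=No",  "class=Positive"),
  c("Polyuria=Yes", "class=Positive")
)
print(rule_measures("Polyuria=Yes", "class=Positive", positive_only))
# support 0.75, confidence 1, coverage 0.75, lift 1

# Same transactions plus two hypothetical negative patients without Polyuria
with_negatives <- c(positive_only, list(
  c("Polyuria=No", "class=Negative"),
  c("Polyuria=No", "class=Negative")
))
print(rule_measures("Polyuria=Yes", "class=Positive", with_negatives))
# support 0.5, confidence 1, coverage 0.5, lift 1.5
```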
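The rules show that, among the patients who tested positive for diabetes, 80.9% are not obese, 75.9% report Polyuria and 70.3% Polydipsia (60.3% report both), 68.1% report weakness and 60.0% report partial paresis. The top rules by support can be visualised as a parallel-coordinates plot and as a graph: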
plot(head(rules_diabetes.positive, 101), method="paracoord", control=list(reorder=TRUE))
plot(head(rules_diabetes.positive, 100), method="graph",control = list(cex=0.7))
Having found the most frequent associations between symptoms and diabetes, I now check which symptoms are most common among patients who tested negative, to see whether those symptoms are actually distinctive of diabetes.
diabetes.negative = filter(dataframe, class=="Negative")
diabetes.negative = as(diabetes.negative, "transactions")
rules_diabetes.negative = apriori(data=diabetes.negative, parameter=list(supp=0.0025,conf = 0.3),
appearance=list(default="lhs", rhs="class=Negative"), control=list(verbose=F))
# Sort the rules (not the transactions) so the plots below also use the sorted rules
rules_diabetes.negative = sort(rules_diabetes.negative, by='support')
inspect(head(rules_diabetes.negative, 25))
## lhs rhs support confidence coverage lift count
## [1] {} => {class=Negative} 1.000 1 1.000 1 200
## [2] {Polydipsia=No} => {class=Negative} 0.960 1 0.960 1 192
## [3] {Polyuria=No} => {class=Negative} 0.925 1 0.925 1 185
## [4] {Irritability=No} => {class=Negative} 0.920 1 0.920 1 184
## [5] {Gender=Male} => {class=Negative} 0.905 1 0.905 1 181
## [6] {Polyuria=No,
## Polydipsia=No} => {class=Negative} 0.885 1 0.885 1 177
## [7] {Polydipsia=No,
## Irritability=No} => {class=Negative} 0.880 1 0.880 1 176
## [8] {Obesity=No} => {class=Negative} 0.865 1 0.865 1 173
## [9] {Gender=Male,
## Polydipsia=No} => {class=Negative} 0.865 1 0.865 1 173
## [10] {Polyuria=No,
## Irritability=No} => {class=Negative} 0.865 1 0.865 1 173
## [11] {sudden.weight.loss=No} => {class=Negative} 0.855 1 0.855 1 171
## [12] {partial.paresis=No} => {class=Negative} 0.840 1 0.840 1 168
## [13] {Genital.thrush=No} => {class=Negative} 0.835 1 0.835 1 167
## [14] {Polydipsia=No,
## sudden.weight.loss=No} => {class=Negative} 0.830 1 0.830 1 166
## [15] {Polyuria=No,
## Obesity=No} => {class=Negative} 0.830 1 0.830 1 166
## [16] {Polydipsia=No,
## Obesity=No} => {class=Negative} 0.830 1 0.830 1 166
## [17] {Gender=Male,
## Irritability=No} => {class=Negative} 0.830 1 0.830 1 166
## [18] {Gender=Male,
## Polyuria=No} => {class=Negative} 0.830 1 0.830 1 166
## [19] {Irritability=No,
## Obesity=No} => {class=Negative} 0.825 1 0.825 1 165
## [20] {Polyuria=No,
## Polydipsia=No,
## Irritability=No} => {class=Negative} 0.825 1 0.825 1 165
## [21] {Polydipsia=No,
## partial.paresis=No} => {class=Negative} 0.820 1 0.820 1 164
## [22] {Polyuria=No,
## partial.paresis=No} => {class=Negative} 0.800 1 0.800 1 160
## [23] {Polyuria=No,
## sudden.weight.loss=No} => {class=Negative} 0.800 1 0.800 1 160
## [24] {Polydipsia=No,
## Genital.thrush=No} => {class=Negative} 0.795 1 0.795 1 159
## [25] {Polyuria=No,
## Polydipsia=No,
## Obesity=No} => {class=Negative} 0.795 1 0.795 1 159
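Among patients who tested negative for diabetes, 96.0% do not report Polydipsia and 92.5% do not report Polyuria (88.5% report neither), 92.0% report no irritability, and 90.5% are men: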
plot(head(rules_diabetes.negative, 101), method="paracoord", control=list(reorder=TRUE))
plot(head(rules_diabetes.negative, 100), method="graph", control = list(cex=0.7))
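Combining the counts from the two runs gives a rough idea of how distinctive Polyuria and Polydipsia are across all 520 patients. This is a back-of-the-envelope calculation, not output of the analysis: from the tables above, 243 of 320 positive patients report Polyuria, while 185 of 200 negative patients do not (so 15 do); for Polydipsia the counts are 225 of 320 and 200 - 192 = 8. The sketch applies the usual confidence and lift definitions to these counts for the rule {symptom=Yes} => {class=Positive}.

```r
# Counts read from the rule tables above
n_pos <- 320; n_neg <- 200; n <- n_pos + n_neg
symptom_pos <- c(Polyuria = 243, Polydipsia = 225)             # symptom = Yes, class = Positive
symptom_neg <- c(Polyuria = 200 - 185, Polydipsia = 200 - 192)  # symptom = Yes, class = Negative

# Rule {symptom=Yes} => {class=Positive} over all 520 patients
supp_rule  <- symptom_pos / n
coverage   <- (symptom_pos + symptom_neg) / n
confidence <- supp_rule / coverage
lift       <- confidence / (n_pos / n)

print(round(rbind(support = supp_rule, confidence = confidence, lift = lift), 3))
```

Both rules reach a confidence of roughly 0.94 (Polyuria) and 0.97 (Polydipsia) with a lift of about 1.5, i.e. a positive diagnosis is about 1.5 times more frequent among patients with the symptom than in the dataset as a whole.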
Having identified the most frequent symptoms of diabetes, I extend the same association analysis to the age groups defined during data preparation (A, B, C, D):
for (age_group in c("A", "B", "C", "D")){
  # Positive patients in the current age group, as transactions
  X <- filter(dataframe, class=="Positive", Age==age_group)
  X <- as(X, "transactions")
  # Rules with class=Positive as the only right-hand side
  X <- apriori(data=X, parameter=list(supp=0.0025,conf = 0.3),
               appearance=list(default="lhs", rhs="class=Positive"), control=list(verbose=F))
  X <- sort(X, by='support')
  print(paste("Association Rules for group", age_group))
  # Show the first 25 of the rules at odd positions (every other rule)
  inspect(head(X[seq(1, length(X), by=2)], 25))
}
## [1] "Association Rules for group A"
## lhs rhs support confidence coverage lift count
## [1] {} => {class=Positive} 1.0000000 1 1.0000000 1 85
## [2] {Obesity=No} => {class=Positive} 0.8941176 1 0.8941176 1 76
## [3] {Alopecia=No} => {class=Positive} 0.8000000 1 0.8000000 1 68
## [4] {Genital.thrush=No} => {class=Positive} 0.7764706 1 0.7764706 1 66
## [5] {Genital.thrush=No,
## Obesity=No} => {class=Positive} 0.7411765 1 0.7411765 1 63
## [6] {Polyuria=Yes} => {class=Positive} 0.7176471 1 0.7176471 1 61
## [7] {Age=A,
## Polyuria=Yes} => {class=Positive} 0.7176471 1 0.7176471 1 61
## [8] {Alopecia=No,
## Obesity=No} => {class=Positive} 0.6941176 1 0.6941176 1 59
## [9] {Gender=Female} => {class=Positive} 0.6588235 1 0.6588235 1 56
## [10] {Genital.thrush=No,
## Alopecia=No} => {class=Positive} 0.6588235 1 0.6588235 1 56
## [11] {Polyuria=Yes,
## Obesity=No} => {class=Positive} 0.6470588 1 0.6470588 1 55
## [12] {Age=A,
## Polyuria=Yes,
## Obesity=No} => {class=Positive} 0.6470588 1 0.6470588 1 55
## [13] {visual.blurring=No} => {class=Positive} 0.6352941 1 0.6352941 1 54
## [14] {Gender=Female,
## Genital.thrush=No} => {class=Positive} 0.6352941 1 0.6352941 1 54
## [15] {muscle.stiffness=No} => {class=Positive} 0.6235294 1 0.6235294 1 53
## [16] {Age=A,
## muscle.stiffness=No} => {class=Positive} 0.6235294 1 0.6235294 1 53
## [17] {Gender=Female,
## Alopecia=No} => {class=Positive} 0.6235294 1 0.6235294 1 53
## [18] {Age=A,
## Gender=Female,
## Alopecia=No} => {class=Positive} 0.6235294 1 0.6235294 1 53
## [19] {Genital.thrush=No,
## Alopecia=No,
## Obesity=No} => {class=Positive} 0.6235294 1 0.6235294 1 53
## [20] {Polyuria=Yes,
## Alopecia=No} => {class=Positive} 0.6000000 1 0.6000000 1 51
## [21] {Gender=Female,
## Genital.thrush=No,
## Obesity=No} => {class=Positive} 0.6000000 1 0.6000000 1 51
## [22] {Age=A,
## Gender=Female,
## Genital.thrush=No,
## Alopecia=No} => {class=Positive} 0.6000000 1 0.6000000 1 51
## [23] {muscle.stiffness=No,
## Obesity=No} => {class=Positive} 0.5882353 1 0.5882353 1 50
## [24] {Gender=Female,
## Alopecia=No,
## Obesity=No} => {class=Positive} 0.5882353 1 0.5882353 1 50
## [25] {Polydipsia=Yes,
## Obesity=No} => {class=Positive} 0.5764706 1 0.5764706 1 49
## [1] "Association Rules for group B"
## lhs rhs support confidence coverage lift count
## [1] {} => {class=Positive} 1.0000000 1 1.0000000 1 88
## [2] {Alopecia=No} => {class=Positive} 0.8750000 1 0.8750000 1 77
## [3] {Obesity=No} => {class=Positive} 0.8181818 1 0.8181818 1 72
## [4] {weakness=Yes} => {class=Positive} 0.7727273 1 0.7727273 1 68
## [5] {Genital.thrush=No} => {class=Positive} 0.7613636 1 0.7613636 1 67
## [6] {Irritability=No} => {class=Positive} 0.7386364 1 0.7386364 1 65
## [7] {Alopecia=No,
## Obesity=No} => {class=Positive} 0.7386364 1 0.7386364 1 65
## [8] {Polyuria=Yes} => {class=Positive} 0.7159091 1 0.7159091 1 63
## [9] {Age=B,
## Polyuria=Yes} => {class=Positive} 0.7159091 1 0.7159091 1 63
## [10] {Genital.thrush=No,
## Alopecia=No} => {class=Positive} 0.7045455 1 0.7045455 1 62
## [11] {weakness=Yes,
## Alopecia=No} => {class=Positive} 0.6931818 1 0.6931818 1 61
## [12] {Irritability=No,
## Alopecia=No} => {class=Positive} 0.6590909 1 0.6590909 1 58
## [13] {sudden.weight.loss=Yes} => {class=Positive} 0.6477273 1 0.6477273 1 57
## [14] {Polyuria=Yes,
## Polydipsia=Yes} => {class=Positive} 0.6477273 1 0.6477273 1 57
## [15] {Age=B,
## Polyuria=Yes,
## Polydipsia=Yes} => {class=Positive} 0.6477273 1 0.6477273 1 57
## [16] {weakness=Yes,
## Obesity=No} => {class=Positive} 0.6363636 1 0.6363636 1 56
## [17] {Irritability=No,
## Obesity=No} => {class=Positive} 0.6250000 1 0.6250000 1 55
## [18] {sudden.weight.loss=Yes,
## Alopecia=No} => {class=Positive} 0.6136364 1 0.6136364 1 54
## [19] {Polydipsia=Yes,
## Alopecia=No} => {class=Positive} 0.6136364 1 0.6136364 1 54
## [20] {Age=B,
## sudden.weight.loss=Yes,
## Alopecia=No} => {class=Positive} 0.6136364 1 0.6136364 1 54
## [21] {Age=B,
## Polydipsia=Yes,
## Alopecia=No} => {class=Positive} 0.6136364 1 0.6136364 1 54
## [22] {Genital.thrush=No,
## Alopecia=No,
## Obesity=No} => {class=Positive} 0.6136364 1 0.6136364 1 54
## [23] {Genital.thrush=No,
## Irritability=No} => {class=Positive} 0.6022727 1 0.6022727 1 53
## [24] {delayed.healing=No} => {class=Positive} 0.5909091 1 0.5909091 1 52
## [25] {Polyuria=Yes,
## Obesity=No} => {class=Positive} 0.5909091 1 0.5909091 1 52
## [1] "Association Rules for group C"
## lhs rhs support confidence coverage lift count
## [1] {} => {class=Positive} 1.0000000 1 1.0000000 1 81
## [2] {Polydipsia=Yes} => {class=Positive} 0.8024691 1 0.8024691 1 65
## [3] {Polyuria=Yes} => {class=Positive} 0.7901235 1 0.7901235 1 64
## [4] {sudden.weight.loss=Yes} => {class=Positive} 0.7654321 1 0.7654321 1 62
## [5] {Obesity=No} => {class=Positive} 0.7654321 1 0.7654321 1 62
## [6] {Age=C,
## Alopecia=No} => {class=Positive} 0.7654321 1 0.7654321 1 62
## [7] {Genital.thrush=No} => {class=Positive} 0.7530864 1 0.7530864 1 61
## [8] {partial.paresis=Yes} => {class=Positive} 0.7407407 1 0.7407407 1 60
## [9] {weakness=Yes} => {class=Positive} 0.7037037 1 0.7037037 1 57
## [10] {Polyuria=Yes,
## sudden.weight.loss=Yes} => {class=Positive} 0.6790123 1 0.6790123 1 55
## [11] {Polydipsia=Yes,
## partial.paresis=Yes} => {class=Positive} 0.6666667 1 0.6666667 1 54
## [12] {Age=C,
## Polydipsia=Yes,
## partial.paresis=Yes} => {class=Positive} 0.6666667 1 0.6666667 1 54
## [13] {Irritability=No} => {class=Positive} 0.6543210 1 0.6543210 1 53
## [14] {Genital.thrush=No,
## Alopecia=No} => {class=Positive} 0.6543210 1 0.6543210 1 53
## [15] {muscle.stiffness=No} => {class=Positive} 0.6419753 1 0.6419753 1 52
## [16] {Polydipsia=Yes,
## sudden.weight.loss=Yes} => {class=Positive} 0.6419753 1 0.6419753 1 52
## [17] {Age=C,
## Polydipsia=Yes,
## sudden.weight.loss=Yes} => {class=Positive} 0.6419753 1 0.6419753 1 52
## [18] {Polyphagia=Yes} => {class=Positive} 0.6296296 1 0.6296296 1 51
## [19] {Polydipsia=Yes,
## weakness=Yes} => {class=Positive} 0.6296296 1 0.6296296 1 51
## [20] {partial.paresis=Yes,
## Alopecia=No} => {class=Positive} 0.6296296 1 0.6296296 1 51
## [21] {Polyuria=Yes,
## Alopecia=No} => {class=Positive} 0.6296296 1 0.6296296 1 51
## [22] {Age=C,
## Genital.thrush=No,
## partial.paresis=Yes} => {class=Positive} 0.6296296 1 0.6296296 1 51
## [23] {Age=C,
## Polyuria=Yes,
## partial.paresis=Yes} => {class=Positive} 0.6296296 1 0.6296296 1 51
## [24] {Polyuria=Yes,
## Genital.thrush=No} => {class=Positive} 0.6172840 1 0.6172840 1 50
## [25] {Polydipsia=Yes,
## Obesity=No} => {class=Positive} 0.6172840 1 0.6172840 1 50
## [1] "Association Rules for group D"
## lhs rhs support confidence coverage lift count
## [1] {} => {class=Positive} 1.0000000 1 1.0000000 1 66
## [2] {Polyuria=Yes} => {class=Positive} 0.8333333 1 0.8333333 1 55
## [3] {visual.blurring=Yes} => {class=Positive} 0.7727273 1 0.7727273 1 51
## [4] {Obesity=No} => {class=Positive} 0.7424242 1 0.7424242 1 49
## [5] {weakness=Yes} => {class=Positive} 0.7272727 1 0.7272727 1 48
## [6] {Polyphagia=Yes} => {class=Positive} 0.7121212 1 0.7121212 1 47
## [7] {partial.paresis=Yes} => {class=Positive} 0.6969697 1 0.6969697 1 46
## [8] {Polydipsia=Yes} => {class=Positive} 0.6666667 1 0.6666667 1 44
## [9] {Genital.thrush=No} => {class=Positive} 0.6515152 1 0.6515152 1 43
## [10] {Polyuria=Yes,
## visual.blurring=Yes} => {class=Positive} 0.6515152 1 0.6515152 1 43
## [11] {Polyuria=Yes,
## weakness=Yes} => {class=Positive} 0.6363636 1 0.6363636 1 42
## [12] {Polyuria=Yes,
## Polyphagia=Yes} => {class=Positive} 0.6212121 1 0.6212121 1 41
## [13] {Polyuria=Yes,
## partial.paresis=Yes} => {class=Positive} 0.6060606 1 0.6060606 1 40
## [14] {Polyuria=Yes,
## Obesity=No} => {class=Positive} 0.6060606 1 0.6060606 1 40
## [15] {Age=D,
## weakness=Yes,
## visual.blurring=Yes} => {class=Positive} 0.6060606 1 0.6060606 1 40
## [16] {Polydipsia=Yes,
## visual.blurring=Yes} => {class=Positive} 0.5909091 1 0.5909091 1 39
## [17] {Gender=Male} => {class=Positive} 0.5757576 1 0.5757576 1 38
## [18] {Polyuria=Yes,
## Genital.thrush=No} => {class=Positive} 0.5757576 1 0.5757576 1 38
## [19] {visual.blurring=Yes,
## partial.paresis=Yes} => {class=Positive} 0.5757576 1 0.5757576 1 38
## [20] {Age=D,
## weakness=Yes,
## partial.paresis=Yes} => {class=Positive} 0.5757576 1 0.5757576 1 38
## [21] {Itching=Yes} => {class=Positive} 0.5606061 1 0.5606061 1 37
## [22] {weakness=Yes,
## Polyphagia=Yes} => {class=Positive} 0.5606061 1 0.5606061 1 37
## [23] {Age=D,
## weakness=Yes,
## Polyphagia=Yes} => {class=Positive} 0.5606061 1 0.5606061 1 37
## [24] {sudden.weight.loss=No} => {class=Positive} 0.5454545 1 0.5454545 1 36
## [25] {Polydipsia=Yes,
## weakness=Yes} => {class=Positive} 0.5454545 1 0.5454545 1 36
Using the Apriori algorithm, I found that Polyuria and Polydipsia are the symptoms most clearly associated with a diabetes diagnosis: they are the two most frequent symptoms among patients who tested positive (75.9% and 70.3%), while 92.5% and 96.0% of patients who tested negative do not report them. This conclusion rests on the association analysis of both positive and negative patients (records).
The claim is also supported by the scientific research listed in the reference section below.
Furthermore, looking at the four age groups, Polyuria and Polydipsia are common at every age, but the oldest group (patients aged 60 and over) also frequently reports other symptoms such as blurred vision (77%), general weakness (73%) and Polyphagia (71%). Also noteworthy is the sudden weight loss reported by 77% of patients between 50 and 59 years old.