Introduction

Diabetes is a group of metabolic disorders characterized by a high blood sugar level over a prolonged period of time. Symptoms often include frequent urination, increased thirst, and increased appetite.

In this project I will use association rules to explore the relationship between diabetes and the symptoms that can be related to it, and to identify the most common ones. The dataset contains 520 cases and 17 features, collected through direct questionnaires from patients of the Sylhet Diabetes Hospital in Sylhet (Bangladesh) and approved by a doctor.

It is available in the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/Early+stage+diabetes+risk+prediction+dataset.

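The seventeen features are the patient's age and gender, fourteen Yes/No symptom indicators, and the class label (Positive or Negative). They appear as the columns of the data frame loaded below: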

library(arules)
library(arulesViz)
library(dplyr)

dataframe <- read.csv("diabetes_data_upload.csv")
head(dataframe, 10)
Age Gender Polyuria Polydipsia sudden.weight.loss weakness Polyphagia Genital.thrush visual.blurring Itching Irritability delayed.healing partial.paresis muscle.stiffness Alopecia Obesity class
40 Male No Yes No Yes No No No Yes No Yes No Yes Yes Yes Positive
58 Male No No No Yes No No Yes No No No Yes No Yes No Positive
41 Male Yes No No Yes Yes No No Yes No Yes No Yes Yes No Positive
45 Male No No Yes Yes Yes Yes No Yes No Yes No No No No Positive
60 Male Yes Yes Yes Yes Yes No Yes Yes Yes Yes Yes Yes Yes Yes Positive
55 Male Yes Yes No Yes Yes No Yes Yes No Yes No Yes Yes Yes Positive
57 Male Yes Yes No Yes Yes Yes No No No Yes Yes No No No Positive
66 Male Yes Yes Yes Yes No No Yes Yes Yes No Yes Yes No No Positive
67 Male Yes Yes No Yes Yes Yes No Yes Yes No Yes Yes No Yes Positive
70 Male No Yes Yes Yes Yes No Yes Yes Yes No No No Yes No Positive

Data Preparation

The dataset contains records from patients of different ages. To extend the association analysis to age groups and to turn age into a categorical variable, I will create four groups of similar size (in terms of number of records):

  • GROUP A: Less than 40 years old
  • GROUP B: Between 40 and 50 years old (50 not included)
  • GROUP C: Between 50 and 60 years old (60 not included)
  • GROUP D: More than 60 years old


The ages in the sample range from 16 to 90:

sort(unique(dataframe$Age))
##  [1] 16 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
## [26] 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 72 79 85
## [51] 90

Before converting the patients' age into a factor, I will count how many records fall into each of the four groups, together with the number of male and female patients:

groups <- cbind(
    count(filter(dataframe, Age<40)),
    count(filter(dataframe, Age>=40, Age<50)),
    count(filter(dataframe, Age>=50, Age<60)),
    count(filter(dataframe, Age>=60)),
    count(filter(dataframe, Gender=="Male")),
    count(filter(dataframe, Gender=="Female"))
)
colnames(groups) <- c("A", "B", "C", "D", "Male", "Female")
groups
##     A   B   C  D Male Female
## 1 144 151 130 95  328    192

The groups are of broadly comparable size, with group D the smallest (95 records), and the dataset contains 328 male and 192 female patients.

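# Replace the numeric age with the label of its age group.
dataframe$Age <- ifelse(
    dataframe$Age < 40, "A", ifelse(
    dataframe$Age < 50, "B", ifelse(
    dataframe$Age < 60, "C", "D"
)))
# Convert every column to a factor before the conversion to transactions.
for (col in colnames(dataframe)){
    dataframe[col] <- lapply(dataframe[col], factor)
}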
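
The nested ifelse() calls above can also be expressed with base R's cut(), which assigns each value to a half-open interval. The following is only a sketch on a few hypothetical ages (the `ages` vector is invented for illustration and is not taken from the dataset). With right = FALSE the intervals are closed on the left, so 40 falls in group B and 60 in group D, in line with the group definitions.

```r
# Hypothetical ages, chosen to hit every boundary of the four groups.
ages <- c(16, 39, 40, 49, 50, 59, 60, 90)

# right = FALSE builds the intervals [-Inf,40), [40,50), [50,60), [60,Inf).
age_group <- cut(
    ages,
    breaks = c(-Inf, 40, 50, 60, Inf),
    labels = c("A", "B", "C", "D"),
    right  = FALSE
)

# The same grouping written with nested ifelse(), as in the report.
age_group_ifelse <- ifelse(ages < 40, "A",
                    ifelse(ages < 50, "B",
                    ifelse(ages < 60, "C", "D")))

# Show both labelings side by side, then the group sizes.
print(data.frame(ages, age_group, age_group_ifelse))
print(table(age_group))
```

cut() returns a factor directly, so the separate conversion of the Age column would not be needed with this variant.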

Apriori Algorithm for Association Rules

Apriori is an algorithm for frequent item set mining and association rule learning over relational databases/dataset. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to determine association rules which highlight general trends in the database/dataset.
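
The level-wise search described above can be sketched in a few lines of base R. This is a simplified illustration on hypothetical toy transactions, not the implementation used by the arules package: the names `transactions`, `min_support` and `support` are assumptions made for the example, and the code favours readability over speed.

```r
# Toy transactions (hypothetical): each element is the set of items in one record.
transactions <- list(
    c("Polyuria", "Polydipsia", "weakness"),
    c("Polyuria", "Polydipsia"),
    c("Polyuria", "weakness"),
    c("Polydipsia"),
    c("Polyuria", "Polydipsia", "weakness")
)
min_support <- 0.5

# Support of an itemset: share of transactions that contain all of its items.
support <- function(itemset) {
    mean(sapply(transactions, function(t) all(itemset %in% t)))
}

# TRUE when itemset s is one of the itemsets in the list `sets`.
contains_set <- function(s, sets) {
    any(sapply(sets, function(f) setequal(f, s)))
}

# Level 1: frequent single items.
items <- sort(unique(unlist(transactions)))
frequent <- Filter(function(s) support(s) >= min_support, as.list(items))
all_frequent <- frequent

# Level k+1: join frequent k-itemsets, prune, then count support.
while (length(frequent) > 1) {
    k <- length(frequent[[1]])
    candidates <- list()
    for (i in seq_along(frequent)) {
        for (j in seq_along(frequent)) {
            if (i >= j) next
            cand <- sort(union(frequent[[i]], frequent[[j]]))
            if (length(cand) != k + 1) next
            # Apriori pruning: every k-item subset must itself be frequent.
            subsets_ok <- all(sapply(seq_along(cand), function(d) {
                contains_set(cand[-d], frequent)
            }))
            if (subsets_ok) candidates[[paste(cand, collapse = ",")]] <- cand
        }
    }
    frequent <- unname(Filter(function(s) support(s) >= min_support, candidates))
    all_frequent <- c(all_frequent, frequent)
}

# Print every frequent itemset with its support.
for (s in all_frequent) {
    cat(sprintf("{%s} support = %.1f\n", paste(s, collapse = ", "), support(s)))
}
```

In this toy example the pair {Polydipsia, weakness} falls below the threshold, so the three-item candidate is pruned without its support ever being counted. That shortcut is what keeps the search tractable on larger item sets.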


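Rules of the form {X -> Y} are evaluated with three measures:

  • SUPPORT: how popular an itemset is, measured by the proportion of transactions in which the itemset appears.
  • CONFIDENCE: how likely Y is to be present when X is present. This is measured by the proportion of transactions containing X in which Y also appears.
  • LIFT: how likely Y is to be present when X is present, while controlling for how popular Y is. A lift of 1 means that X and Y are independent; a lift above 1 means that they occur together more often than expected.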
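
The three measures can be computed by hand on a small example. The sketch below uses a hypothetical table of ten records (the `records` data frame and its columns X and Y are invented for illustration) and evaluates the rule {X -> Y} in base R.

```r
# Toy data (hypothetical): one row per record, TRUE when the item is present.
records <- data.frame(
    X = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
    Y = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)
)

# Support: proportion of records containing the itemset.
support_x  <- mean(records$X)              # X appears in 4 of 10 records
support_y  <- mean(records$Y)              # Y appears in 4 of 10 records
support_xy <- mean(records$X & records$Y)  # X and Y together in 3 of 10 records

# Confidence of {X -> Y}: proportion of records with X that also contain Y.
confidence <- support_xy / support_x

# Lift of {X -> Y}: confidence divided by the overall frequency of Y.
lift <- confidence / support_y

print(c(support = support_xy, confidence = confidence, lift = lift))
```

Here Y appears in 3 of the 4 records that contain X but in only 4 of the 10 records overall, so the lift is above 1.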

Data Analysis

Now I can transform the dataset into the transactional format required to mine association rules with the Apriori algorithm.

Before that, I will filter for class=Positive, so that only patients who tested positive for diabetes are considered, and look for associations with the common symptoms.

diabetes.positive = filter(dataframe, class=="Positive")
diabetes.positive = as(diabetes.positive, "transactions")
inspect(head(diabetes.positive, 2))
##     items                   transactionID
## [1] {Age=B,
##      Gender=Male,
##      Polyuria=No,
##      Polydipsia=Yes,
##      sudden.weight.loss=No,
##      weakness=Yes,
##      Polyphagia=No,
##      Genital.thrush=No,
##      visual.blurring=No,
##      Itching=Yes,
##      Irritability=No,
##      delayed.healing=Yes,
##      partial.paresis=No,
##      muscle.stiffness=Yes,
##      Alopecia=Yes,
##      Obesity=Yes,
##      class=Positive}                    1
## [2] {Age=C,
##      Gender=Male,
##      Polyuria=No,
##      Polydipsia=No,
##      sudden.weight.loss=No,
##      weakness=Yes,
##      Polyphagia=No,
##      Genital.thrush=No,
##      visual.blurring=Yes,
##      Itching=No,
##      Irritability=No,
##      delayed.healing=No,
##      partial.paresis=Yes,
##      muscle.stiffness=No,
##      Alopecia=Yes,
##      Obesity=No,
##      class=Positive}                    2

Having generated the transactional data, I can use a frequency plot to show the 20 most common items in the transactions:

itemFrequencyPlot(diabetes.positive,topN=20,type="absolute")

Apart from the first item, which is the most common only because it is the filtering criterion, the plot shows that the most common symptoms in the sample are Polyuria and Polydipsia, while most of the patients do not suffer from obesity.

Now I will run the Apriori algorithm to find associations between symptoms and diabetes, using a minimum support of 0.0025 and a minimum confidence of 0.3:

# Mine rules whose right-hand side is class=Positive.
rules_diabetes.positive <- apriori(data=diabetes.positive, parameter=list(supp=0.0025, conf=0.3),
    appearance=list(default="lhs", rhs="class=Positive"), control=list(verbose=FALSE))
# Sort by support and show the 25 most frequent rules.
rules_diabetes.positive <- sort(rules_diabetes.positive, by='support')
inspect(head(rules_diabetes.positive, 25))
##      lhs                                 rhs              support  confidence
## [1]  {}                               => {class=Positive} 1.000000 1
## [2]  {Obesity=No}                     => {class=Positive} 0.809375 1
## [3]  {Polyuria=Yes}                   => {class=Positive} 0.759375 1
## [4]  {Alopecia=No}                    => {class=Positive} 0.756250 1
## [5]  {Genital.thrush=No}              => {class=Positive} 0.740625 1
## [6]  {Polydipsia=Yes}                 => {class=Positive} 0.703125 1
## [7]  {weakness=Yes}                   => {class=Positive} 0.681250 1
## [8]  {Irritability=No}                => {class=Positive} 0.656250 1
## [9]  {Alopecia=No,Obesity=No}         => {class=Positive} 0.625000 1
## [10] {Genital.thrush=No,Obesity=No}   => {class=Positive} 0.612500 1
## [11] {Genital.thrush=No,Alopecia=No}  => {class=Positive} 0.609375 1
## [12] {Polyuria=Yes,Obesity=No}        => {class=Positive} 0.609375 1
## [13] {Polyuria=Yes,Polydipsia=Yes}    => {class=Positive} 0.603125 1
## [14] {partial.paresis=Yes}            => {class=Positive} 0.600000 1
## [15] {Polyphagia=Yes}                 => {class=Positive} 0.590625 1
## [16] {sudden.weight.loss=Yes}         => {class=Positive} 0.587500 1
## [17] {Polyuria=Yes,Alopecia=No}       => {class=Positive} 0.584375 1
## [18] {muscle.stiffness=No}            => {class=Positive} 0.578125 1
## [19] {Polydipsia=Yes,Alopecia=No}     => {class=Positive} 0.575000 1
## [20] {Polyuria=Yes,Genital.thrush=No} => {class=Positive} 0.562500 1
## [21] {Polyuria=Yes,weakness=Yes}      => {class=Positive} 0.556250 1
## [22] {Polydipsia=Yes,Obesity=No}      => {class=Positive} 0.553125 1
## [23] {visual.blurring=Yes}            => {class=Positive} 0.546875 1
## [24] {Polydipsia=Yes,weakness=Yes}    => {class=Positive} 0.546875 1
## [25] {Gender=Female}                  => {class=Positive} 0.540625 1
##      coverage lift count
## [1]  1.000000 1    320
## [2]  0.809375 1    259
## [3]  0.759375 1    243
## [4]  0.756250 1    242
## [5]  0.740625 1    237
## [6]  0.703125 1    225
## [7]  0.681250 1    218
## [8]  0.656250 1    210
## [9]  0.625000 1    200
## [10] 0.612500 1    196
## [11] 0.609375 1    195
## [12] 0.609375 1    195
## [13] 0.603125 1    193
## [14] 0.600000 1    192
## [15] 0.590625 1    189
## [16] 0.587500 1    188
## [17] 0.584375 1    187
## [18] 0.578125 1    185
## [19] 0.575000 1    184
## [20] 0.562500 1    180
## [21] 0.556250 1    178
## [22] 0.553125 1    177
## [23] 0.546875 1    175
## [24] 0.546875 1    175
## [25] 0.540625 1    173

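Because every transaction in this subset contains class=Positive, confidence and lift are equal to 1 for every rule, so the ranking by support shows how common each item is among positive patients. The rules give evidence that, among the patients who tested positive for diabetes:

  • 76% report Polyuria and 70% report Polydipsia, and 60% report both.
  • 68% report weakness, 60% partial paresis, 59% Polyphagia and 59% sudden weight loss.
  • 81% are not obese.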

plot(head(rules_diabetes.positive, 101), method="paracoord", control=list(reorder=TRUE))
plot(head(rules_diabetes.positive, 100), method="graph",control = list(cex=0.7))

Now that I have found the most frequent associations between symptoms and diabetes, I will look at the most common items among patients who tested negative for diabetes, to check whether those symptoms are really relevant.

diabetes.negative = filter(dataframe, class=="Negative")
diabetes.negative = as(diabetes.negative, "transactions")
rules_diabetes.negative = apriori(data=diabetes.negative, parameter=list(supp=0.0025,conf = 0.3),
appearance=list(default="lhs", rhs="class=Negative"), control=list(verbose=F))
# Store the sorted rules under their own name so that the transactions are not overwritten.
rules_diabetes.negative = sort(rules_diabetes.negative, by='support')
inspect(head(rules_diabetes.negative, 25))
##      lhs                        rhs              support confidence coverage lift count
## [1]  {}                      => {class=Negative}   1.000          1    1.000    1   200
## [2]  {Polydipsia=No}         => {class=Negative}   0.960          1    0.960    1   192
## [3]  {Polyuria=No}           => {class=Negative}   0.925          1    0.925    1   185
## [4]  {Irritability=No}       => {class=Negative}   0.920          1    0.920    1   184
## [5]  {Gender=Male}           => {class=Negative}   0.905          1    0.905    1   181
## [6]  {Polyuria=No,
##       Polydipsia=No}         => {class=Negative}   0.885          1    0.885    1   177
## [7]  {Polydipsia=No,
##       Irritability=No}       => {class=Negative}   0.880          1    0.880    1   176
## [8]  {Obesity=No}            => {class=Negative}   0.865          1    0.865    1   173
## [9]  {Gender=Male,
##       Polydipsia=No}         => {class=Negative}   0.865          1    0.865    1   173
## [10] {Polyuria=No,
##       Irritability=No}       => {class=Negative}   0.865          1    0.865    1   173
## [11] {sudden.weight.loss=No} => {class=Negative}   0.855          1    0.855    1   171
## [12] {partial.paresis=No}    => {class=Negative}   0.840          1    0.840    1   168
## [13] {Genital.thrush=No}     => {class=Negative}   0.835          1    0.835    1   167
## [14] {Polydipsia=No,
##       sudden.weight.loss=No} => {class=Negative}   0.830          1    0.830    1   166
## [15] {Polyuria=No,
##       Obesity=No}            => {class=Negative}   0.830          1    0.830    1   166
## [16] {Polydipsia=No,
##       Obesity=No}            => {class=Negative}   0.830          1    0.830    1   166
## [17] {Gender=Male,
##       Irritability=No}       => {class=Negative}   0.830          1    0.830    1   166
## [18] {Gender=Male,
##       Polyuria=No}           => {class=Negative}   0.830          1    0.830    1   166
## [19] {Irritability=No,
##       Obesity=No}            => {class=Negative}   0.825          1    0.825    1   165
## [20] {Polyuria=No,
##       Polydipsia=No,
##       Irritability=No}       => {class=Negative}   0.825          1    0.825    1   165
## [21] {Polydipsia=No,
##       partial.paresis=No}    => {class=Negative}   0.820          1    0.820    1   164
## [22] {Polyuria=No,
##       partial.paresis=No}    => {class=Negative}   0.800          1    0.800    1   160
## [23] {Polyuria=No,
##       sudden.weight.loss=No} => {class=Negative}   0.800          1    0.800    1   160
## [24] {Polydipsia=No,
##       Genital.thrush=No}     => {class=Negative}   0.795          1    0.795    1   159
## [25] {Polyuria=No,
##       Polydipsia=No,
##       Obesity=No}            => {class=Negative}   0.795          1    0.795    1   159

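Among the patients who tested negative for diabetes:

  • 96% do not report Polydipsia and 92.5% do not report Polyuria, and 88.5% report neither.
  • 92% do not report Irritability.
  • 90.5% are male and 86.5% are not obese.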

plot(head(rules_diabetes.negative, 101), method="paracoord", control=list(reorder=TRUE))
plot(head(rules_diabetes.negative, 100), method="graph", control = list(cex=0.7))
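
The two subsets can also be compared directly by combining the counts from the two rule tables into rules over the whole dataset, where confidence and lift are no longer fixed at 1. The sketch below does this for Polyuria and Polydipsia in base R. The counts are copied from the outputs in this report. The only assumption is that the symptom columns have no missing values, so that the number of negative patients with a symptom is 200 minus the count of the corresponding "No" item.

```r
# Counts taken from the rule tables in this report.
n_positive <- 320
n_negative <- 200
yes_in_positive <- c(Polyuria = 243, Polydipsia = 225)  # e.g. {Polyuria=Yes} among positives
no_in_negative  <- c(Polyuria = 185, Polydipsia = 192)  # e.g. {Polyuria=No} among negatives

# Assumption: no missing values, so "Yes" and "No" add up to the subset size.
yes_in_negative <- n_negative - no_in_negative
n_total <- n_positive + n_negative

# Measures for the rule {symptom=Yes} => {class=Positive} on the full dataset.
support    <- yes_in_positive / n_total
confidence <- yes_in_positive / (yes_in_positive + yes_in_negative)
lift       <- confidence / (n_positive / n_total)

print(round(data.frame(support, confidence, lift), 3))
```

Under that assumption both rules have a confidence above 0.9 and a lift of roughly 1.5, which agrees with the picture given by the two separate analyses.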

Analysis by age groups

Having identified the most frequent symptoms among patients with diabetes, I will repeat the same association analysis for each of the age groups defined during the data preparation step (A, B, C, D):

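for (age_group in c("A", "B", "C", "D")){
    # Keep only the positive patients of the current age group.
    X <- filter(dataframe, class=="Positive", Age==age_group)
    X <- as(X, "transactions")
    X <- apriori(data=X, parameter=list(supp=0.0025, conf=0.3),
                 appearance=list(default="lhs", rhs="class=Positive"),
                 control=list(verbose=FALSE))
    X <- sort(X, by='support')
    print(paste("Association Rules for group", age_group))
    # Age is constant within a group, so rules tend to appear both with and without Age=<group>.
    # Showing every second rule thins out these near-duplicates (some still remain in the output).
    inspect(head(X[seq(1, length(X), by=2)], 25))
}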
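
Within one age group the Age column takes a single value, so the item Age=<group> can be added to any rule without changing its support. One possible way to avoid these redundant rules is to drop constant columns before the conversion to transactions. The sketch below shows the idea in base R. The helper `drop_constant_columns` and the toy data frame `subset_a` are hypothetical and are not part of the original analysis, and the `class` column is kept explicitly because the rules need it on the right-hand side.

```r
# Toy subset (hypothetical): Age and class are constant, as inside one age group.
subset_a <- data.frame(
    Age      = factor(c("A", "A", "A")),
    Polyuria = factor(c("Yes", "No", "Yes")),
    Obesity  = factor(c("No", "No", "Yes")),
    class    = factor(c("Positive", "Positive", "Positive"))
)

# Drop every column with a single distinct value, except those listed in `keep`.
drop_constant_columns <- function(df, keep = character(0)) {
    is_constant <- sapply(df, function(col) length(unique(col)) <= 1)
    df[, (!is_constant) | (names(df) %in% keep), drop = FALSE]
}

cleaned <- drop_constant_columns(subset_a, keep = "class")
print(names(cleaned))
```

Applied inside the loop before as(X, "transactions"), a step like this would make the selection of every second rule unnecessary. The outputs shown in this report were produced without it.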

GROUP A

## [1] "Association Rules for group A"
##      lhs                      rhs                support confidence  coverage lift count
## [1]  {}                    => {class=Positive} 1.0000000          1 1.0000000    1    85
## [2]  {Obesity=No}          => {class=Positive} 0.8941176          1 0.8941176    1    76
## [3]  {Alopecia=No}         => {class=Positive} 0.8000000          1 0.8000000    1    68
## [4]  {Genital.thrush=No}   => {class=Positive} 0.7764706          1 0.7764706    1    66
## [5]  {Genital.thrush=No,
##       Obesity=No}          => {class=Positive} 0.7411765          1 0.7411765    1    63
## [6]  {Polyuria=Yes}        => {class=Positive} 0.7176471          1 0.7176471    1    61
## [7]  {Age=A,
##       Polyuria=Yes}        => {class=Positive} 0.7176471          1 0.7176471    1    61
## [8]  {Alopecia=No,
##       Obesity=No}          => {class=Positive} 0.6941176          1 0.6941176    1    59
## [9]  {Gender=Female}       => {class=Positive} 0.6588235          1 0.6588235    1    56
## [10] {Genital.thrush=No,
##       Alopecia=No}         => {class=Positive} 0.6588235          1 0.6588235    1    56
## [11] {Polyuria=Yes,
##       Obesity=No}          => {class=Positive} 0.6470588          1 0.6470588    1    55
## [12] {Age=A,
##       Polyuria=Yes,
##       Obesity=No}          => {class=Positive} 0.6470588          1 0.6470588    1    55
## [13] {visual.blurring=No}  => {class=Positive} 0.6352941          1 0.6352941    1    54
## [14] {Gender=Female,
##       Genital.thrush=No}   => {class=Positive} 0.6352941          1 0.6352941    1    54
## [15] {muscle.stiffness=No} => {class=Positive} 0.6235294          1 0.6235294    1    53
## [16] {Age=A,
##       muscle.stiffness=No} => {class=Positive} 0.6235294          1 0.6235294    1    53
## [17] {Gender=Female,
##       Alopecia=No}         => {class=Positive} 0.6235294          1 0.6235294    1    53
## [18] {Age=A,
##       Gender=Female,
##       Alopecia=No}         => {class=Positive} 0.6235294          1 0.6235294    1    53
## [19] {Genital.thrush=No,
##       Alopecia=No,
##       Obesity=No}          => {class=Positive} 0.6235294          1 0.6235294    1    53
## [20] {Polyuria=Yes,
##       Alopecia=No}         => {class=Positive} 0.6000000          1 0.6000000    1    51
## [21] {Gender=Female,
##       Genital.thrush=No,
##       Obesity=No}          => {class=Positive} 0.6000000          1 0.6000000    1    51
## [22] {Age=A,
##       Gender=Female,
##       Genital.thrush=No,
##       Alopecia=No}         => {class=Positive} 0.6000000          1 0.6000000    1    51
## [23] {muscle.stiffness=No,
##       Obesity=No}          => {class=Positive} 0.5882353          1 0.5882353    1    50
## [24] {Gender=Female,
##       Alopecia=No,
##       Obesity=No}          => {class=Positive} 0.5882353          1 0.5882353    1    50
## [25] {Polydipsia=Yes,
##       Obesity=No}          => {class=Positive} 0.5764706          1 0.5764706    1    49

GROUP B

## [1] "Association Rules for group B"
##      lhs                         rhs                support confidence  coverage lift count
## [1]  {}                       => {class=Positive} 1.0000000          1 1.0000000    1    88
## [2]  {Alopecia=No}            => {class=Positive} 0.8750000          1 0.8750000    1    77
## [3]  {Obesity=No}             => {class=Positive} 0.8181818          1 0.8181818    1    72
## [4]  {weakness=Yes}           => {class=Positive} 0.7727273          1 0.7727273    1    68
## [5]  {Genital.thrush=No}      => {class=Positive} 0.7613636          1 0.7613636    1    67
## [6]  {Irritability=No}        => {class=Positive} 0.7386364          1 0.7386364    1    65
## [7]  {Alopecia=No,
##       Obesity=No}             => {class=Positive} 0.7386364          1 0.7386364    1    65
## [8]  {Polyuria=Yes}           => {class=Positive} 0.7159091          1 0.7159091    1    63
## [9]  {Age=B,
##       Polyuria=Yes}           => {class=Positive} 0.7159091          1 0.7159091    1    63
## [10] {Genital.thrush=No,
##       Alopecia=No}            => {class=Positive} 0.7045455          1 0.7045455    1    62
## [11] {weakness=Yes,
##       Alopecia=No}            => {class=Positive} 0.6931818          1 0.6931818    1    61
## [12] {Irritability=No,
##       Alopecia=No}            => {class=Positive} 0.6590909          1 0.6590909    1    58
## [13] {sudden.weight.loss=Yes} => {class=Positive} 0.6477273          1 0.6477273    1    57
## [14] {Polyuria=Yes,
##       Polydipsia=Yes}         => {class=Positive} 0.6477273          1 0.6477273    1    57
## [15] {Age=B,
##       Polyuria=Yes,
##       Polydipsia=Yes}         => {class=Positive} 0.6477273          1 0.6477273    1    57
## [16] {weakness=Yes,
##       Obesity=No}             => {class=Positive} 0.6363636          1 0.6363636    1    56
## [17] {Irritability=No,
##       Obesity=No}             => {class=Positive} 0.6250000          1 0.6250000    1    55
## [18] {sudden.weight.loss=Yes,
##       Alopecia=No}            => {class=Positive} 0.6136364          1 0.6136364    1    54
## [19] {Polydipsia=Yes,
##       Alopecia=No}            => {class=Positive} 0.6136364          1 0.6136364    1    54
## [20] {Age=B,
##       sudden.weight.loss=Yes,
##       Alopecia=No}            => {class=Positive} 0.6136364          1 0.6136364    1    54
## [21] {Age=B,
##       Polydipsia=Yes,
##       Alopecia=No}            => {class=Positive} 0.6136364          1 0.6136364    1    54
## [22] {Genital.thrush=No,
##       Alopecia=No,
##       Obesity=No}             => {class=Positive} 0.6136364          1 0.6136364    1    54
## [23] {Genital.thrush=No,
##       Irritability=No}        => {class=Positive} 0.6022727          1 0.6022727    1    53
## [24] {delayed.healing=No}     => {class=Positive} 0.5909091          1 0.5909091    1    52
## [25] {Polyuria=Yes,
##       Obesity=No}             => {class=Positive} 0.5909091          1 0.5909091    1    52

GROUP C

## [1] "Association Rules for group C"
##      lhs                         rhs                support confidence  coverage lift count
## [1]  {}                       => {class=Positive} 1.0000000          1 1.0000000    1    81
## [2]  {Polydipsia=Yes}         => {class=Positive} 0.8024691          1 0.8024691    1    65
## [3]  {Polyuria=Yes}           => {class=Positive} 0.7901235          1 0.7901235    1    64
## [4]  {sudden.weight.loss=Yes} => {class=Positive} 0.7654321          1 0.7654321    1    62
## [5]  {Obesity=No}             => {class=Positive} 0.7654321          1 0.7654321    1    62
## [6]  {Age=C,
##       Alopecia=No}            => {class=Positive} 0.7654321          1 0.7654321    1    62
## [7]  {Genital.thrush=No}      => {class=Positive} 0.7530864          1 0.7530864    1    61
## [8]  {partial.paresis=Yes}    => {class=Positive} 0.7407407          1 0.7407407    1    60
## [9]  {weakness=Yes}           => {class=Positive} 0.7037037          1 0.7037037    1    57
## [10] {Polyuria=Yes,
##       sudden.weight.loss=Yes} => {class=Positive} 0.6790123          1 0.6790123    1    55
## [11] {Polydipsia=Yes,
##       partial.paresis=Yes}    => {class=Positive} 0.6666667          1 0.6666667    1    54
## [12] {Age=C,
##       Polydipsia=Yes,
##       partial.paresis=Yes}    => {class=Positive} 0.6666667          1 0.6666667    1    54
## [13] {Irritability=No}        => {class=Positive} 0.6543210          1 0.6543210    1    53
## [14] {Genital.thrush=No,
##       Alopecia=No}            => {class=Positive} 0.6543210          1 0.6543210    1    53
## [15] {muscle.stiffness=No}    => {class=Positive} 0.6419753          1 0.6419753    1    52
## [16] {Polydipsia=Yes,
##       sudden.weight.loss=Yes} => {class=Positive} 0.6419753          1 0.6419753    1    52
## [17] {Age=C,
##       Polydipsia=Yes,
##       sudden.weight.loss=Yes} => {class=Positive} 0.6419753          1 0.6419753    1    52
## [18] {Polyphagia=Yes}         => {class=Positive} 0.6296296          1 0.6296296    1    51
## [19] {Polydipsia=Yes,
##       weakness=Yes}           => {class=Positive} 0.6296296          1 0.6296296    1    51
## [20] {partial.paresis=Yes,
##       Alopecia=No}            => {class=Positive} 0.6296296          1 0.6296296    1    51
## [21] {Polyuria=Yes,
##       Alopecia=No}            => {class=Positive} 0.6296296          1 0.6296296    1    51
## [22] {Age=C,
##       Genital.thrush=No,
##       partial.paresis=Yes}    => {class=Positive} 0.6296296          1 0.6296296    1    51
## [23] {Age=C,
##       Polyuria=Yes,
##       partial.paresis=Yes}    => {class=Positive} 0.6296296          1 0.6296296    1    51
## [24] {Polyuria=Yes,
##       Genital.thrush=No}      => {class=Positive} 0.6172840          1 0.6172840    1    50
## [25] {Polydipsia=Yes,
##       Obesity=No}             => {class=Positive} 0.6172840          1 0.6172840    1    50

GROUP D

## [1] "Association Rules for group D"
##      lhs                        rhs                support confidence  coverage lift count
## [1]  {}                      => {class=Positive} 1.0000000          1 1.0000000    1    66
## [2]  {Polyuria=Yes}          => {class=Positive} 0.8333333          1 0.8333333    1    55
## [3]  {visual.blurring=Yes}   => {class=Positive} 0.7727273          1 0.7727273    1    51
## [4]  {Obesity=No}            => {class=Positive} 0.7424242          1 0.7424242    1    49
## [5]  {weakness=Yes}          => {class=Positive} 0.7272727          1 0.7272727    1    48
## [6]  {Polyphagia=Yes}        => {class=Positive} 0.7121212          1 0.7121212    1    47
## [7]  {partial.paresis=Yes}   => {class=Positive} 0.6969697          1 0.6969697    1    46
## [8]  {Polydipsia=Yes}        => {class=Positive} 0.6666667          1 0.6666667    1    44
## [9]  {Genital.thrush=No}     => {class=Positive} 0.6515152          1 0.6515152    1    43
## [10] {Polyuria=Yes,
##       visual.blurring=Yes}   => {class=Positive} 0.6515152          1 0.6515152    1    43
## [11] {Polyuria=Yes,
##       weakness=Yes}          => {class=Positive} 0.6363636          1 0.6363636    1    42
## [12] {Polyuria=Yes,
##       Polyphagia=Yes}        => {class=Positive} 0.6212121          1 0.6212121    1    41
## [13] {Polyuria=Yes,
##       partial.paresis=Yes}   => {class=Positive} 0.6060606          1 0.6060606    1    40
## [14] {Polyuria=Yes,
##       Obesity=No}            => {class=Positive} 0.6060606          1 0.6060606    1    40
## [15] {Age=D,
##       weakness=Yes,
##       visual.blurring=Yes}   => {class=Positive} 0.6060606          1 0.6060606    1    40
## [16] {Polydipsia=Yes,
##       visual.blurring=Yes}   => {class=Positive} 0.5909091          1 0.5909091    1    39
## [17] {Gender=Male}           => {class=Positive} 0.5757576          1 0.5757576    1    38
## [18] {Polyuria=Yes,
##       Genital.thrush=No}     => {class=Positive} 0.5757576          1 0.5757576    1    38
## [19] {visual.blurring=Yes,
##       partial.paresis=Yes}   => {class=Positive} 0.5757576          1 0.5757576    1    38
## [20] {Age=D,
##       weakness=Yes,
##       partial.paresis=Yes}   => {class=Positive} 0.5757576          1 0.5757576    1    38
## [21] {Itching=Yes}           => {class=Positive} 0.5606061          1 0.5606061    1    37
## [22] {weakness=Yes,
##       Polyphagia=Yes}        => {class=Positive} 0.5606061          1 0.5606061    1    37
## [23] {Age=D,
##       weakness=Yes,
##       Polyphagia=Yes}        => {class=Positive} 0.5606061          1 0.5606061    1    37
## [24] {sudden.weight.loss=No} => {class=Positive} 0.5454545          1 0.5454545    1    36
## [25] {Polydipsia=Yes,
##       weakness=Yes}          => {class=Positive} 0.5454545          1 0.5454545    1    36

Conclusion

Using the Apriori algorithm, I have evidence that Polyuria and Polydipsia are the symptoms most closely associated with a diabetes diagnosis in terms of support. The evidence comes from the association analysis of both positive and negative records: most positive patients report these symptoms, while most negative patients do not.

The claim is also supported by the sources listed in the References section below.

Furthermore, the four age groups show that Polyuria and Polydipsia are common among positive patients regardless of age, while the oldest group (patients aged 60 or more) also frequently reports other symptoms such as blurred vision (77%), general weakness (73%) and Polyphagia (71%). Also noteworthy is the sudden weight loss reported by 77% of positive patients between 50 and 60 years old.

References

https://www.healthline.com/health/diabetes/3-ps-of-diabetes
https://www.jdrf.org/t1d-resources/about/symptoms/extreme-thirst/
https://www.beatoapp.com/blog/what-is-polydipsia-is-it-a-symptom-of-diabetes/