diff --git a/charts.html b/charts.html index 3b58146..60a65ea 100644 --- a/charts.html +++ b/charts.html @@ -1,85 +1,52 @@ - - - - - - - - Koodiklinikan Palkkakysely - - - - - - - - - - - - - + + Koodiklinikan Palkkakysely + + - - - - - - - - -
- - - - - - - + - + }, 10, root) + } + })(window); + }); + }; + if (document.readyState != "loading") fn(); + else document.addEventListener("DOMContentLoaded", fn); + })(); + - \ No newline at end of file diff --git a/data.xlsx b/data.xlsx index e9cd4f2..9dd9784 100644 Binary files a/data.xlsx and b/data.xlsx differ diff --git a/index.html b/index.html index 19037e7..5275c10 100644 --- a/index.html +++ b/index.html @@ -43,7 +43,7 @@
  • Responses in raw form (Google Sheets) \ No newline at end of file diff --git a/profiling_report.html b/profiling_report.html index 3ee0afc..7013b47 100644 --- a/profiling_report.html +++ b/profiling_report.html @@ -1,4 +1,4 @@ -Pandas Profiling Report

    Overview

    Dataset statistics

    Number of variables15
    Number of observations500
    Missing cells1018
    Missing cells (%)13.6%
    Duplicate rows0
    Duplicate rows (%)0.0%
    Total size in memory46.9 KiB
    Average record size in memory96.1 B

    Variable types

    DateTime1
    Categorical8
    Numeric5
    Boolean1

    Warnings

    Rooli has a high cardinality: 261 distinct values High cardinality
    Työpaikka has a high cardinality: 73 distinct values High cardinality
    Vuositulot is highly correlated with Kk-tulotHigh correlation
    Kk-tulot is highly correlated with VuositulotHigh correlation
    Työpaikka is highly correlated with Vapaa sanaHigh correlation
    Kilpailukykyinen is highly correlated with Vapaa sanaHigh correlation
    Vapaa sana is highly correlated with Työpaikka and 1 other fieldsHigh correlation
    Sukupuoli has 35 (7.0%) missing values Missing
    Työaika has 19 (3.8%) missing values Missing
    Rooli has 13 (2.6%) missing values Missing
    Kuukausipalkka has 44 (8.8%) missing values Missing
    Vuositulot has 13 (2.6%) missing values Missing
    Kilpailukykyinen has 15 (3.0%) missing values Missing
    Työpaikka has 387 (77.4%) missing values Missing
    Vapaa sana has 462 (92.4%) missing values Missing
    Kk-tulot has 13 (2.6%) missing values Missing
    Vapaa sana is uniformly distributed Uniform
    Timestamp has unique values Unique

    Reproduction

    Analysis started2021-05-25 12:52:35.165247
    Analysis finished2021-05-25 12:52:41.362197
    Duration6.2 seconds
    Software versionpandas-profiling v2.11.0
    Download configurationconfig.yaml

    Variables

    Timestamp
    Date

    UNIQUE

    Distinct500
    Distinct (%)100.0%
    Missing0
    Missing (%)0.0%
    Memory size4.0 KiB
    Minimum2021-02-15 11:57:08.316000
    Maximum2021-02-27 17:49:24.789000

    Overview

    Dataset statistics

    Number of variables15
    Number of observations500
    Missing cells1018
    Missing cells (%)13.6%
    Duplicate rows0
    Duplicate rows (%)0.0%
    Total size in memory46.9 KiB
    Average record size in memory96.1 B

    Variable types

    DateTime1
    Categorical8
    Numeric5
    Boolean1

    Alerts

    Rooli has a high cardinality: 261 distinct values High cardinality
    Työpaikka has a high cardinality: 73 distinct values High cardinality
    Työkokemus is highly correlated with Kuukausipalkka and 2 other fieldsHigh correlation
    Kuukausipalkka is highly correlated with Työkokemus and 2 other fieldsHigh correlation
    Vuositulot is highly correlated with Työkokemus and 2 other fieldsHigh correlation
    Kk-tulot is highly correlated with Työkokemus and 2 other fieldsHigh correlation
    Työkokemus is highly correlated with KuukausipalkkaHigh correlation
    Kuukausipalkka is highly correlated with Työkokemus and 2 other fieldsHigh correlation
    Vuositulot is highly correlated with Kuukausipalkka and 1 other fieldsHigh correlation
    Kk-tulot is highly correlated with Kuukausipalkka and 1 other fieldsHigh correlation
    Kuukausipalkka is highly correlated with Vuositulot and 1 other fieldsHigh correlation
    Vuositulot is highly correlated with Kuukausipalkka and 1 other fieldsHigh correlation
    Kk-tulot is highly correlated with Kuukausipalkka and 1 other fieldsHigh correlation
    Työpaikka is highly correlated with Vapaa sana and 2 other fieldsHigh correlation
    Kilpailukykyinen is highly correlated with Vapaa sanaHigh correlation
    Vapaa sana is highly correlated with Työpaikka and 1 other fieldsHigh correlation
    Kaupunki is highly correlated with TyöpaikkaHigh correlation
    Työsuhteen luonne is highly correlated with TyöpaikkaHigh correlation
    Kaupunki is highly correlated with Työsuhteen luonne and 5 other fieldsHigh correlation
    Ikä is highly correlated with Työkokemus and 2 other fieldsHigh correlation
    Sukupuoli is highly correlated with Vapaa sanaHigh correlation
    Työkokemus is highly correlated with Ikä and 5 other fieldsHigh correlation
    Työsuhteen luonne is highly correlated with Kaupunki and 4 other fieldsHigh correlation
    Työaika is highly correlated with Työpaikka and 1 other fieldsHigh correlation
    Etä is highly correlated with Vapaa sanaHigh correlation
    Kuukausipalkka is highly correlated with Kaupunki and 6 other fieldsHigh correlation
    Vuositulot is highly correlated with Kaupunki and 6 other fieldsHigh correlation
    Kilpailukykyinen is highly correlated with Kuukausipalkka and 1 other fieldsHigh correlation
    Työpaikka is highly correlated with Kaupunki and 8 other fieldsHigh correlation
    Vapaa sana is highly correlated with Kaupunki and 11 other fieldsHigh correlation
    Kk-tulot is highly correlated with Kaupunki and 6 other fieldsHigh correlation
    Sukupuoli has 35 (7.0%) missing values Missing
    Työaika has 19 (3.8%) missing values Missing
    Rooli has 13 (2.6%) missing values Missing
    Kuukausipalkka has 44 (8.8%) missing values Missing
    Vuositulot has 13 (2.6%) missing values Missing
    Kilpailukykyinen has 15 (3.0%) missing values Missing
    Työpaikka has 387 (77.4%) missing values Missing
    Vapaa sana has 462 (92.4%) missing values Missing
    Kk-tulot has 13 (2.6%) missing values Missing
    Vapaa sana is uniformly distributed Uniform
    Timestamp has unique values Unique

    Reproduction

    Analysis started2022-08-31 12:32:30.428158
    Analysis finished2022-08-31 12:32:36.919968
    Duration6.49 seconds
    Software versionpandas-profiling v3.2.0
    Download configurationconfig.json

    Variables

    Timestamp
    Date

    UNIQUE

    Distinct500
    Distinct (%)100.0%
    Missing0
    Missing (%)0.0%
    Memory size4.0 KiB
    Minimum2021-02-15 11:57:08.316000
    Maximum2021-02-27 17:49:24.789000
    Histogram with fixed size bins (bins=50)

    Kaupunki
    Categorical

    Distinct28
    Distinct (%)5.7%
    Missing5
    Missing (%)1.0%
    Memory size1.9 KiB
    PK-Seutu
    250 
    Tampere
    117 
    Turku
    47 
    Oulu
    26 
    Jyväskylä
     
    18
    Other values (23)
    37 

    Length

    Max length15
    Median length8
    Mean length7.234343434
    Min length2

    Characters and Unicode

    Total characters3581
    Distinct characters40
    Distinct categories5 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique14 ?
    Unique (%)2.8%

    Sample

    1st rowPK-Seutu
    2nd rowTurku
    3rd rowPK-Seutu
    4th rowTampere
    5th rowPK-Seutu
    ValueCountFrequency (%)
    PK-Seutu250
    50.0%
    Tampere117
    23.4%
    Turku47
     
    9.4%
    Oulu26
     
    5.2%
    Jyväskylä18
     
    3.6%
    Kuopio7
     
    1.4%
    Lontoo2
     
    0.4%
    Vaasa2
     
    0.4%
    Tallinna2
     
    0.4%
    Pori2
     
    0.4%
    Other values (18)22
     
    4.4%
    (Missing)5
     
    1.0%
    Histogram with fixed size bins (bins=50)

    Kaupunki
    Categorical

    HIGH CORRELATION
    HIGH CORRELATION

    Distinct28
    Distinct (%)5.7%
    Missing5
    Missing (%)1.0%
    Memory size1.9 KiB
    PK-Seutu
    250 
    Tampere
    117 
    Turku
    47 
    Oulu
    26 
    Jyväskylä
     
    18
    Other values (23)
    37 

    Length

    Max length15
    Median length8
    Mean length7.234343434
    Min length2

    Characters and Unicode

    Total characters3581
    Distinct characters40
    Distinct categories5 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
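As a rough illustration of the sentence above, the sketch below groups the characters of a few strings by Unicode general category using Python's standard `unicodedata` module. The helper name `category_counts` and the `CATEGORY_NAMES` mapping are assumptions made for this example and are not taken from pandas-profiling; the input is the five Kaupunki sample rows listed in this report.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that show up in this report.
# (Illustrative mapping; any code not listed is kept as its two-letter code.)
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Pd": "Dash Punctuation",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five sample rows shown for the Kaupunki variable.
sample = ["PK-Seutu", "Turku", "PK-Seutu", "Tampere", "PK-Seutu"]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(name, count)
```

Run over the full column instead of the five sample rows, the same kind of tally would be expected to give per-category totals like those in the "Most occurring categories" table, although the report's own implementation may differ in detail.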

    Unique

    Unique14 ?
    Unique (%)2.8%

    Sample

    1st rowPK-Seutu
    2nd rowTurku
    3rd rowPK-Seutu
    4th rowTampere
    5th rowPK-Seutu

    Common Values

    ValueCountFrequency (%)
    PK-Seutu250
    50.0%
    Tampere117
    23.4%
    Turku47
     
    9.4%
    Oulu26
     
    5.2%
    Jyväskylä18
     
    3.6%
    Kuopio7
     
    1.4%
    Lontoo2
     
    0.4%
    Vaasa2
     
    0.4%
    Tallinna2
     
    0.4%
    Pori2
     
    0.4%
    Other values (18)22
     
    4.4%
    (Missing)5
     
    1.0%

    Length

    Histogram of lengths of the category
    ValueCountFrequency (%)
    pk-seutu250
    50.1%
    tampere117
    23.4%
    turku47
     
    9.4%
    oulu26
     
    5.2%
    jyväskylä18
     
    3.6%
    kuopio7
     
    1.4%
    tallinna2
     
    0.4%
    lontoo2
     
    0.4%
    pori2
     
    0.4%
    hämeenlinna2
     
    0.4%
    Other values (22)26
     
    5.2%

    Most occurring characters

    ValueCountFrequency (%)
    u661
    18.5%
    e496
    13.9%
    K261
     
    7.3%
    t257
     
    7.2%
    P253
     
    7.1%
    -252
     
    7.0%
    S252
     
    7.0%
    r170
     
    4.7%
    T166
     
    4.6%
    a145
     
    4.0%
    Other values (30)668
    18.7%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter2323
    64.9%
    Uppercase Letter1001
    28.0%
    Dash Punctuation252
     
    7.0%
    Space Separator4
     
    0.1%
    Other Punctuation1
     
    < 0.1%

    Most frequent character per category

    ValueCountFrequency (%)
    u661
    28.5%
    e496
    21.4%
    t257
     
    11.1%
    r170
     
    7.3%
    a145
     
    6.2%
    p125
     
    5.4%
    m123
     
    5.3%
    k70
     
    3.0%
    l58
     
    2.5%
    ä44
     
    1.9%
    Other values (10)174
     
    7.5%
    ValueCountFrequency (%)
    K261
    26.1%
    P253
    25.3%
    S252
    25.2%
    T166
    16.6%
    O26
     
    2.6%
    J19
     
    1.9%
    L5
     
    0.5%
    E4
     
    0.4%
    V3
     
    0.3%
    H3
     
    0.3%
    Other values (7)9
     
    0.9%
    ValueCountFrequency (%)
    -252
    100.0%
    ValueCountFrequency (%)
    4
    100.0%
    ValueCountFrequency (%)
    ,1
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin3324
    92.8%
    Common257
     
    7.2%

    Most frequent character per script

    ValueCountFrequency (%)
    u661
    19.9%
    e496
    14.9%
    K261
     
    7.9%
    t257
     
    7.7%
    P253
     
    7.6%
    S252
     
    7.6%
    r170
     
    5.1%
    T166
     
    5.0%
    a145
     
    4.4%
    p125
     
    3.8%
    Other values (27)538
    16.2%
    ValueCountFrequency (%)
    -252
    98.1%
    4
     
    1.6%
    ,1
     
    0.4%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII3537
    98.8%
    None44
     
    1.2%

    Most frequent character per block

    ValueCountFrequency (%)
    u661
    18.7%
    e496
    14.0%
    K261
     
    7.4%
    t257
     
    7.3%
    P253
     
    7.2%
    -252
     
    7.1%
    S252
     
    7.1%
    r170
     
    4.8%
    T166
     
    4.7%
    a145
     
    4.1%
    Other values (29)624
    17.6%
    ValueCountFrequency (%)
    ä44
    100.0%

    Ikä
    Real number (ℝ≥0)

    Distinct7
    Distinct (%)1.4%
    Missing3
    Missing (%)0.6%
    Infinite0
    Infinite (%)0.0%
    Mean33.77464789
    Minimum23
    Maximum53
    Zeros0
    Zeros (%)0.0%
    Memory size4.0 KiB
    Histogram of lengths of the category
    ValueCountFrequency (%)
    pk-seutu250
    50.1%
    tampere117
    23.4%
    turku47
     
    9.4%
    oulu26
     
    5.2%
    jyväskylä18
     
    3.6%
    kuopio7
     
    1.4%
    eu2
     
    0.4%
    hämeenlinna2
     
    0.4%
    kouvola2
     
    0.4%
    lahti2
     
    0.4%
    Other values (22)26
     
    5.2%

    Most occurring characters

    ValueCountFrequency (%)
    u661
    18.5%
    e496
    13.9%
    K261
     
    7.3%
    t257
     
    7.2%
    P253
     
    7.1%
    -252
     
    7.0%
    S252
     
    7.0%
    r170
     
    4.7%
    T166
     
    4.6%
    a145
     
    4.0%
    Other values (30)668
    18.7%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter2323
    64.9%
    Uppercase Letter1001
    28.0%
    Dash Punctuation252
     
    7.0%
    Space Separator4
     
    0.1%
    Other Punctuation1
     
    < 0.1%

    Most frequent character per category

    Lowercase Letter
    ValueCountFrequency (%)
    u661
    28.5%
    e496
    21.4%
    t257
     
    11.1%
    r170
     
    7.3%
    a145
     
    6.2%
    p125
     
    5.4%
    m123
     
    5.3%
    k70
     
    3.0%
    l58
     
    2.5%
    ä44
     
    1.9%
    Other values (10)174
     
    7.5%
    Uppercase Letter
    ValueCountFrequency (%)
    K261
    26.1%
    P253
    25.3%
    S252
    25.2%
    T166
    16.6%
    O26
     
    2.6%
    J19
     
    1.9%
    L5
     
    0.5%
    E4
     
    0.4%
    H3
     
    0.3%
    V3
     
    0.3%
    Other values (7)9
     
    0.9%
    Dash Punctuation
    ValueCountFrequency (%)
    -252
    100.0%
    Space Separator
    ValueCountFrequency (%)
    4
    100.0%
    Other Punctuation
    ValueCountFrequency (%)
    ,1
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin3324
    92.8%
    Common257
     
    7.2%

    Most frequent character per script

    Latin
    ValueCountFrequency (%)
    u661
    19.9%
    e496
    14.9%
    K261
     
    7.9%
    t257
     
    7.7%
    P253
     
    7.6%
    S252
     
    7.6%
    r170
     
    5.1%
    T166
     
    5.0%
    a145
     
    4.4%
    p125
     
    3.8%
    Other values (27)538
    16.2%
    Common
    ValueCountFrequency (%)
    -252
    98.1%
    4
     
    1.6%
    ,1
     
    0.4%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII3537
    98.8%
    None44
     
    1.2%

    Most frequent character per block

    ASCII
    ValueCountFrequency (%)
    u661
    18.7%
    e496
    14.0%
    K261
     
    7.4%
    t257
     
    7.3%
    P253
     
    7.2%
    -252
     
    7.1%
    S252
     
    7.1%
    r170
     
    4.8%
    T166
     
    4.7%
    a145
     
    4.1%
    Other values (29)624
    17.6%
    None
    ValueCountFrequency (%)
    ä44
    100.0%

    Ikä
    Real number (ℝ≥0)

    HIGH CORRELATION

    Distinct7
    Distinct (%)1.4%
    Missing3
    Missing (%)0.6%
    Infinite0
    Infinite (%)0.0%
    Mean33.77464789
    Minimum23
    Maximum53
    Zeros0
    Zeros (%)0.0%
    Negative0
    Negative (%)0.0%
    Memory size4.0 KiB

    Quantile statistics

    Minimum: 23
    5-th percentile: 23
    Q1: 28
    median: 33
    Q3: 38
    95-th percentile: 43
    Maximum: 53
    Range: 30
    Interquartile range (IQR): 10

    Descriptive statistics

    Standard deviation6.053651351
    Coefficient of variation (CV)0.1792365496
    Kurtosis0.2306290239
    Mean33.77464789
    Median Absolute Deviation (MAD)5
    Skewness0.480434113
    Sum16786
    Variance36.64669468
    MonotocityNot monotonic

    Quantile statistics

    Minimum: 23
    5-th percentile: 23
    Q1: 28
    median: 33
    Q3: 38
    95-th percentile: 43
    Maximum: 53
    Range: 30
    Interquartile range (IQR): 10

    Descriptive statistics

    Standard deviation6.053651351
    Coefficient of variation (CV)0.1792365496
    Kurtosis0.2306290239
    Mean33.77464789
    Median Absolute Deviation (MAD)5
    Skewness0.480434113
    Sum16786
    Variance36.64669468
    MonotonicityNot monotonic
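The quantile and descriptive figures reported for Ikä are tied together by simple arithmetic. The sketch below is only a consistency check on the numbers printed in this report, not a description of how pandas-profiling computes them; all variable names are illustrative.

```python
import math

# Figures copied from the Ikä summary in this report.
observations = 500          # Number of observations
missing = 3                 # Missing values for Ikä
minimum, maximum = 23, 53
q1, q3 = 28, 38
mean = 33.77464789
std = 6.053651351
total = 16786               # Sum

value_range = maximum - minimum              # report lists Range 30
iqr = q3 - q1                                # report lists IQR 10
cv = std / mean                              # report lists CV 0.1792365496
variance = std ** 2                          # report lists Variance 36.64669468
mean_from_sum = total / (observations - missing)

print(value_range, iqr)
print(round(cv, 6), round(variance, 4), round(mean_from_sum, 6))
print(math.isclose(mean_from_sum, mean, rel_tol=1e-6))
```

The same relations (range, IQR, CV as standard deviation over mean, variance as squared standard deviation) can be checked against the Työkokemus block further down.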
    Histogram with fixed size bins (bins=7)
    Value      Count  Frequency (%)
    33         170    34.0%
    28         121    24.2%
    38         106    21.2%
    43         54     10.8%
    23         32     6.4%
    48         8      1.6%
    53         6      1.2%
    (Missing)  3      0.6%
    Value      Count  Frequency (%)
    23         32     6.4%
    28         121    24.2%
    33         170    34.0%
    38         106    21.2%
    43         54     10.8%
    Value      Count  Frequency (%)
    53         6      1.2%
    48         8      1.6%
    43         54     10.8%
    38         106    21.2%
    33         170    34.0%

    Sukupuoli
    Categorical

    MISSING

    Distinct3
    Distinct (%)0.6%
    Missing35
    Missing (%)7.0%
    Memory size760.0 B
    mies
    419 
    nainen
     
    37
    muu
     
    9

    Length

    Max length6
    Median length4
    Mean length4.139784946
    Min length3

    Characters and Unicode

    Total characters1925
    Distinct characters7
    Distinct categories1 ?
    Distinct scripts1 ?
    Distinct blocks1 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique0 ?
    Unique (%)0.0%

    Sample

    1st rowmies
    2nd rowmies
    3rd rowmies
    4th rowmies
    5th rowmies
    ValueCountFrequency (%)
    mies419
    83.8%
    nainen37
     
    7.4%
    muu9
     
    1.8%
    (Missing)35
     
    7.0%
    Histogram with fixed size bins (bins=7)
    Value      Count  Frequency (%)
    33         170    34.0%
    28         121    24.2%
    38         106    21.2%
    43         54     10.8%
    23         32     6.4%
    48         8      1.6%
    53         6      1.2%
    (Missing)  3      0.6%
    Value      Count  Frequency (%)
    23         32     6.4%
    28         121    24.2%
    33         170    34.0%
    38         106    21.2%
    43         54     10.8%
    48         8      1.6%
    53         6      1.2%
    Value      Count  Frequency (%)
    53         6      1.2%
    48         8      1.6%
    43         54     10.8%
    38         106    21.2%
    33         170    34.0%
    28         121    24.2%
    23         32     6.4%

    Sukupuoli
    Categorical

    HIGH CORRELATION
    MISSING

    Distinct3
    Distinct (%)0.6%
    Missing35
    Missing (%)7.0%
    Memory size760.0 B
    mies
    419 
    nainen
     
    37
    muu
     
    9

    Length

    Max length6
    Median length4
    Mean length4.139784946
    Min length3

    Characters and Unicode

    Total characters1925
    Distinct characters7
    Distinct categories1 ?
    Distinct scripts1 ?
    Distinct blocks1 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique0 ?
    Unique (%)0.0%

    Sample

    1st rowmies
    2nd rowmies
    3rd rowmies
    4th rowmies
    5th rowmies

    Common Values

    ValueCountFrequency (%)
    mies419
    83.8%
    nainen37
     
    7.4%
    muu9
     
    1.8%
    (Missing)35
     
    7.0%

    Length

    Histogram of lengths of the category
    Histogram of lengths of the category

    Category Frequency Plot

    ValueCountFrequency (%)
    mies419
    90.1%
    nainen37
     
    8.0%
    muu9
     
    1.9%

    Most occurring characters

    ValueCountFrequency (%)
    i456
    23.7%
    e456
    23.7%
    m428
    22.2%
    s419
    21.8%
    n111
     
    5.8%
    a37
     
    1.9%
    u18
     
    0.9%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter1925
    100.0%

    Most frequent character per category

    ValueCountFrequency (%)
    i456
    23.7%
    e456
    23.7%
    m428
    22.2%
    s419
    21.8%
    n111
     
    5.8%
    a37
     
    1.9%
    u18
     
    0.9%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin1925
    100.0%

    Most frequent character per script

    ValueCountFrequency (%)
    i456
    23.7%
    e456
    23.7%
    m428
    22.2%
    s419
    21.8%
    n111
     
    5.8%
    a37
     
    1.9%
    u18
     
    0.9%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII1925
    100.0%

    Most frequent character per block

    ValueCountFrequency (%)
    i456
    23.7%
    e456
    23.7%
    m428
    22.2%
    s419
    21.8%
    n111
     
    5.8%
    a37
     
    1.9%
    u18
     
    0.9%

    Työkokemus
    Real number (ℝ≥0)

    Distinct27
    Distinct (%)5.5%
    Missing5
    Missing (%)1.0%
    Infinite0
    Infinite (%)0.0%
    Mean9.523232323
    Minimum0
    Maximum30
    Zeros4
    Zeros (%)0.8%
    Memory size4.0 KiB
    ValueCountFrequency (%)
    mies419
    90.1%
    nainen37
     
    8.0%
    muu9
     
    1.9%

    Most occurring characters

    ValueCountFrequency (%)
    i456
    23.7%
    e456
    23.7%
    m428
    22.2%
    s419
    21.8%
    n111
     
    5.8%
    a37
     
    1.9%
    u18
     
    0.9%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter1925
    100.0%

    Most frequent character per category

    Lowercase Letter
    ValueCountFrequency (%)
    i456
    23.7%
    e456
    23.7%
    m428
    22.2%
    s419
    21.8%
    n111
     
    5.8%
    a37
     
    1.9%
    u18
     
    0.9%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin1925
    100.0%

    Most frequent character per script

    Latin
    ValueCountFrequency (%)
    i456
    23.7%
    e456
    23.7%
    m428
    22.2%
    s419
    21.8%
    n111
     
    5.8%
    a37
     
    1.9%
    u18
     
    0.9%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII1925
    100.0%

    Most frequent character per block

    ASCII
    ValueCountFrequency (%)
    i456
    23.7%
    e456
    23.7%
    m428
    22.2%
    s419
    21.8%
    n111
     
    5.8%
    a37
     
    1.9%
    u18
     
    0.9%

    Työkokemus
    Real number (ℝ≥0)

    HIGH CORRELATION
    HIGH CORRELATION
    HIGH CORRELATION

    Distinct27
    Distinct (%)5.5%
    Missing5
    Missing (%)1.0%
    Infinite0
    Infinite (%)0.0%
    Mean9.523232323
    Minimum0
    Maximum30
    Zeros4
    Zeros (%)0.8%
    Negative0
    Negative (%)0.0%
    Memory size4.0 KiB

    Quantile statistics

    Minimum: 0
    5-th percentile: 2
    Q1: 5
    median: 9
    Q3: 13
    95-th percentile: 21
    Maximum: 30
    Range: 30
    Interquartile range (IQR): 8

    Descriptive statistics

    Standard deviation6.053319568
    Coefficient of variation (CV)0.6356370781
    Kurtosis-0.03938790912
    Mean9.523232323
    Median Absolute Deviation (MAD)4
    Skewness0.7271444909
    Sum4714
    Variance36.64267779
    MonotocityNot monotonic

    Quantile statistics

    Minimum: 0
    5-th percentile: 2
    Q1: 5
    median: 9
    Q3: 13
    95-th percentile: 21
    Maximum: 30
    Range: 30
    Interquartile range (IQR): 8

    Descriptive statistics

    Standard deviation6.053319568
    Coefficient of variation (CV)0.6356370781
    Kurtosis-0.03938790912
    Mean9.523232323
    Median Absolute Deviation (MAD)4
    Skewness0.7271444909
    Sum4714
    Variance36.64267779
    MonotonicityNot monotonic
    Histogram with fixed size bins (bins=27)
    Value              Count  Frequency (%)
    5                  54     10.8%
    10                 40     8.0%
    4                  31     6.2%
    7                  30     6.0%
    2                  29     5.8%
    15                 29     5.8%
    3                  28     5.6%
    20                 28     5.6%
    6                  27     5.4%
    8                  25     5.0%
    Other values (17)  174    34.8%
    Value              Count  Frequency (%)
    0                  4      0.8%
    1                  17     3.4%
    2                  29     5.8%
    3                  28     5.6%
    4                  31     6.2%
    Value              Count  Frequency (%)
    30                 2      0.4%
    25                 6      1.2%
    24                 3      0.6%
    23                 4      0.8%
    22                 5      1.0%
    Distinct3
    Distinct (%)0.6%
    Missing1
    Missing (%)0.2%
    Memory size4.0 KiB
    Työntekijä / palkollinen
    446 
    Freelancer
     
    27
    Yrittäjä
     
    26

    Length

    Max length24
    Median length24
    Mean length22.40881764
    Min length8

    Characters and Unicode

    Total characters11182
    Distinct characters20
    Distinct categories4 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique0 ?
    Unique (%)0.0%

    Sample

    1st rowTyöntekijä / palkollinen
    2nd rowTyöntekijä / palkollinen
    3rd rowTyöntekijä / palkollinen
    4th rowYrittäjä
    5th rowTyöntekijä / palkollinen
    ValueCountFrequency (%)
    Työntekijä / palkollinen446
    89.2%
    Freelancer27
     
    5.4%
    Yrittäjä26
     
    5.2%
    (Missing)1
     
    0.2%
    Histogram with fixed size bins (bins=27)
    Value              Count  Frequency (%)
    5                  54     10.8%
    10                 40     8.0%
    4                  31     6.2%
    7                  30     6.0%
    15                 29     5.8%
    2                  29     5.8%
    20                 28     5.6%
    3                  28     5.6%
    6                  27     5.4%
    13                 25     5.0%
    Other values (17)  174    34.8%
    Value              Count  Frequency (%)
    0                  4      0.8%
    1                  17     3.4%
    2                  29     5.8%
    3                  28     5.6%
    4                  31     6.2%
    5                  54     10.8%
    6                  27     5.4%
    7                  30     6.0%
    8                  25     5.0%
    9                  22     4.4%
    Value              Count  Frequency (%)
    30                 2      0.4%
    25                 6      1.2%
    24                 3      0.6%
    23                 4      0.8%
    22                 5      1.0%
    21                 7      1.4%
    20                 28     5.6%
    19                 1      0.2%
    18                 2      0.4%
    17                 3      0.6%

    Työsuhteen luonne
    Categorical

    HIGH CORRELATION
    HIGH CORRELATION

    Distinct3
    Distinct (%)0.6%
    Missing1
    Missing (%)0.2%
    Memory size4.0 KiB
    Työntekijä / palkollinen
    446 
    Freelancer
     
    27
    Yrittäjä
     
    26

    Length

    Max length24
    Median length24
    Mean length22.40881764
    Min length8

    Characters and Unicode

    Total characters11182
    Distinct characters20
    Distinct categories4 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique0 ?
    Unique (%)0.0%

    Sample

    1st rowTyöntekijä / palkollinen
    2nd rowTyöntekijä / palkollinen
    3rd rowTyöntekijä / palkollinen
    4th rowYrittäjä
    5th rowTyöntekijä / palkollinen

    Common Values

    ValueCountFrequency (%)
    Työntekijä / palkollinen446
    89.2%
    Freelancer27
     
    5.4%
    Yrittäjä26
     
    5.2%
    (Missing)1
     
    0.2%

    Length

    Histogram of lengths of the category
    Histogram of lengths of the category

    Category Frequency Plot

    ValueCountFrequency (%)
    446
    32.1%
    palkollinen446
    32.1%
    työntekijä446
    32.1%
    freelancer27
     
    1.9%
    yrittäjä26
     
    1.9%

    Most occurring characters

    ValueCountFrequency (%)
    n1365
    12.2%
    l1365
    12.2%
    e973
     
    8.7%
    i918
     
    8.2%
    k892
     
    8.0%
    892
     
    8.0%
    t498
     
    4.5%
    ä498
     
    4.5%
    a473
     
    4.2%
    j472
     
    4.2%
    Other values (10)2836
    25.4%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter9345
    83.6%
    Space Separator892
     
    8.0%
    Uppercase Letter499
     
    4.5%
    Other Punctuation446
     
    4.0%

    Most frequent character per category

    ValueCountFrequency (%)
    n1365
    14.6%
    l1365
    14.6%
    e973
    10.4%
    i918
    9.8%
    k892
    9.5%
    t498
     
    5.3%
    ä498
     
    5.3%
    a473
     
    5.1%
    j472
     
    5.1%
    y446
     
    4.8%
    Other values (5)1445
    15.5%
    ValueCountFrequency (%)
    T446
    89.4%
    F27
     
    5.4%
    Y26
     
    5.2%
    ValueCountFrequency (%)
    892
    100.0%
    ValueCountFrequency (%)
    /446
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin9844
    88.0%
    Common1338
     
    12.0%

    Most frequent character per script

    ValueCountFrequency (%)
    n1365
    13.9%
    l1365
    13.9%
    e973
    9.9%
    i918
    9.3%
    k892
    9.1%
    t498
     
    5.1%
    ä498
     
    5.1%
    a473
     
    4.8%
    j472
     
    4.8%
    T446
     
    4.5%
    Other values (8)1944
    19.7%
    ValueCountFrequency (%)
    892
    66.7%
    /446
    33.3%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII10238
    91.6%
    None944
     
    8.4%

    Most frequent character per block

    ValueCountFrequency (%)
    n1365
    13.3%
    l1365
    13.3%
    e973
    9.5%
    i918
    9.0%
    k892
    8.7%
    892
    8.7%
    t498
     
    4.9%
    a473
     
    4.6%
    j472
     
    4.6%
    T446
     
    4.4%
    Other values (8)1944
    19.0%
    ValueCountFrequency (%)
    ä498
    52.8%
    ö446
    47.2%

    Työaika
    Categorical

    MISSING

    Distinct5
    Distinct (%)1.0%
    Missing19
    Missing (%)3.8%
    Memory size4.0 KiB
    1.0
    452 
    0.8
     
    23
    0.5
     
    4
    0.7
     
    1
    0.6
     
    1

    Length

    Max length3
    Median length3
    Mean length3
    Min length3

    Characters and Unicode

    Total characters1443
    Distinct characters7
    Distinct categories2 ?
    Distinct scripts1 ?
    Distinct blocks1 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique2 ?
    Unique (%)0.4%

    Sample

    1st row1.0
    2nd row1.0
    3rd row1.0
    4th row1.0
    5th row1.0
    Value      Count  Frequency (%)
    1.0        452    90.4%
    0.8        23     4.6%
    0.5        4      0.8%
    0.7        1      0.2%
    0.6        1      0.2%
    (Missing)  19     3.8%
    ValueCountFrequency (%)
    työntekijä446
    32.1%
    446
    32.1%
    palkollinen446
    32.1%
    freelancer27
     
    1.9%
    yrittäjä26
     
    1.9%

    Most occurring characters

    ValueCountFrequency (%)
    n1365
    12.2%
    l1365
    12.2%
    e973
     
    8.7%
    i918
     
    8.2%
    892
     
    8.0%
    k892
     
    8.0%
    t498
     
    4.5%
    ä498
     
    4.5%
    a473
     
    4.2%
    j472
     
    4.2%
    Other values (10)2836
    25.4%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter9345
    83.6%
    Space Separator892
     
    8.0%
    Uppercase Letter499
     
    4.5%
    Other Punctuation446
     
    4.0%

    Most frequent character per category

    Lowercase Letter
    ValueCountFrequency (%)
    n1365
    14.6%
    l1365
    14.6%
    e973
    10.4%
    i918
    9.8%
    k892
    9.5%
    t498
     
    5.3%
    ä498
     
    5.3%
    a473
     
    5.1%
    j472
     
    5.1%
    p446
     
    4.8%
    Other values (5)1445
    15.5%
    Uppercase Letter
    ValueCountFrequency (%)
    T446
    89.4%
    F27
     
    5.4%
    Y26
     
    5.2%
    Space Separator
    ValueCountFrequency (%)
    892
    100.0%
    Other Punctuation
    ValueCountFrequency (%)
    /446
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin9844
    88.0%
    Common1338
     
    12.0%

    Most frequent character per script

    Latin
    ValueCountFrequency (%)
    n1365
    13.9%
    l1365
    13.9%
    e973
    9.9%
    i918
    9.3%
    k892
    9.1%
    t498
     
    5.1%
    ä498
     
    5.1%
    a473
     
    4.8%
    j472
     
    4.8%
    p446
     
    4.5%
    Other values (8)1944
    19.7%
    Common
    ValueCountFrequency (%)
    892
    66.7%
    /446
    33.3%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII10238
    91.6%
    None944
     
    8.4%

    Most frequent character per block

    ASCII
    ValueCountFrequency (%)
    n1365
    13.3%
    l1365
    13.3%
    e973
    9.5%
    i918
    9.0%
    892
    8.7%
    k892
    8.7%
    t498
     
    4.9%
    a473
     
    4.6%
    j472
     
    4.6%
    p446
     
    4.4%
    Other values (8)1944
    19.0%
    None
    ValueCountFrequency (%)
    ä498
    52.8%
    ö446
    47.2%

    Työaika
    Categorical

    HIGH CORRELATION
    MISSING

    Distinct5
    Distinct (%)1.0%
    Missing19
    Missing (%)3.8%
    Memory size4.0 KiB
    1.0
    452 
    0.8
     
    23
    0.5
     
    4
    0.7
     
    1
    0.6
     
    1

    Length

    Max length3
    Median length3
    Mean length3
    Min length3

    Characters and Unicode

    Total characters1443
    Distinct characters7
    Distinct categories2 ?
    Distinct scripts1 ?
    Distinct blocks1 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique2 ?
    Unique (%)0.4%

    Sample

    1st row1.0
    2nd row1.0
    3rd row1.0
    4th row1.0
    5th row1.0

    Common Values

    Value      Count  Frequency (%)
    1.0        452    90.4%
    0.8        23     4.6%
    0.5        4      0.8%
    0.7        1      0.2%
    0.6        1      0.2%
    (Missing)  19     3.8%

    Length

    Histogram of lengths of the category

    Category Frequency Plot

    ValueCountFrequency (%)
    1.0452
    94.0%
    0.823
     
    4.8%
    0.54
     
    0.8%
    0.71
     
    0.2%
    0.61
     
    0.2%

    Most occurring characters

    ValueCountFrequency (%)
    .481
    33.3%
    0481
    33.3%
    1452
    31.3%
    823
     
    1.6%
    54
     
    0.3%
    71
     
    0.1%
    61
     
    0.1%

    Most occurring categories

    ValueCountFrequency (%)
    Decimal Number962
    66.7%
    Other Punctuation481
    33.3%

    Most frequent character per category

    ValueCountFrequency (%)
    0481
    50.0%
    1452
    47.0%
    823
     
    2.4%
    54
     
    0.4%
    71
     
    0.1%
    61
     
    0.1%
    ValueCountFrequency (%)
    .481
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Common1443
    100.0%

    Most frequent character per script

    ValueCountFrequency (%)
    .481
    33.3%
    0481
    33.3%
    1452
    31.3%
    823
     
    1.6%
    54
     
    0.3%
    71
     
    0.1%
    61
     
    0.1%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII1443
    100.0%

    Most frequent character per block

    ValueCountFrequency (%)
    .481
    33.3%
    0481
    33.3%
    1452
    31.3%
    823
     
    1.6%
    54
     
    0.3%
    71
     
    0.1%
    61
     
    0.1%

    Rooli
    Categorical

    HIGH CARDINALITY
    MISSING

    Distinct261
    Distinct (%)53.6%
    Missing13
    Missing (%)2.6%
    Memory size4.0 KiB
    Ohjelmistokehittäjä
    42 
    full-stack
    36 
    Full-stack
     
    25
    ohjelmistokehittäjä
     
    17
    Arkkitehti
     
    16
    Other values (256)
    351 

    Length

    Max length67
    Median length18
    Mean length19.23408624
    Min length2

    Characters and Unicode

    Total characters9367
    Distinct characters58
    Distinct categories9 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique213 ?
    Unique (%)43.7%

    Sample

    1st rowArkkitehti
    2nd rowfull-stack
    3rd rowFull-stack ohjelmistokehittäjä
    4th rowweb-arkkitehti
    5th rowOhjelmistokehittäjä
    ValueCountFrequency (%)
    Ohjelmistokehittäjä42
     
    8.4%
    full-stack36
     
    7.2%
    Full-stack25
     
    5.0%
    ohjelmistokehittäjä17
     
    3.4%
    Arkkitehti16
     
    3.2%
    Full-stack ohjelmistokehittäjä8
     
    1.6%
    full-stack ohjelmistokehittäjä7
     
    1.4%
    arkkitehti6
     
    1.2%
    Frontend6
     
    1.2%
    frontend6
     
    1.2%
    Other values (251)318
    63.6%
    (Missing)13
     
    2.6%
    ValueCountFrequency (%)
    1.0452
    94.0%
    0.823
     
    4.8%
    0.54
     
    0.8%
    0.71
     
    0.2%
    0.61
     
    0.2%

    Most occurring characters

    ValueCountFrequency (%)
    .481
    33.3%
    0481
    33.3%
    1452
    31.3%
    823
     
    1.6%
    54
     
    0.3%
    71
     
    0.1%
    61
     
    0.1%

    Most occurring categories

    ValueCountFrequency (%)
    Decimal Number962
    66.7%
    Other Punctuation481
    33.3%

    Most frequent character per category

    Decimal Number
    ValueCountFrequency (%)
    0481
    50.0%
    1452
    47.0%
    823
     
    2.4%
    54
     
    0.4%
    71
     
    0.1%
    61
     
    0.1%
    Other Punctuation
    ValueCountFrequency (%)
    .481
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Common1443
    100.0%

    Most frequent character per script

    Common
    ValueCountFrequency (%)
    .481
    33.3%
    0481
    33.3%
    1452
    31.3%
    823
     
    1.6%
    54
     
    0.3%
    71
     
    0.1%
    61
     
    0.1%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII1443
    100.0%

    Most frequent character per block

    ASCII
    ValueCountFrequency (%)
    .481
    33.3%
    0481
    33.3%
    1452
    31.3%
    823
     
    1.6%
    54
     
    0.3%
    71
     
    0.1%
    61
     
    0.1%
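
    The Unicode character and category counts reported for Työaika can be re-derived from the value counts alone. The following is a minimal sketch, not the profiler's own implementation: it assumes the five value counts shown in the Työaika Common Values table and uses Python's standard `unicodedata` module, whose short codes `Nd` and `Po` correspond to the report's "Decimal Number" and "Other Punctuation".

```python
import unicodedata
from collections import Counter

# Value counts for the Työaika column, copied from the report
# (missing values are excluded, as in the report's character statistics).
value_counts = {"1.0": 452, "0.8": 23, "0.5": 4, "0.7": 1, "0.6": 1}

# Count every character, weighted by how often its value occurs.
char_counts = Counter()
for value, n in value_counts.items():
    for ch in value:
        char_counts[ch] += n

# Aggregate the character counts by Unicode general category.
category_counts = Counter()
for ch, n in char_counts.items():
    category_counts[unicodedata.category(ch)] += n

print(char_counts.most_common())
print(dict(category_counts))
```

    Under these assumptions the totals agree with the report: 1443 characters in all, 962 decimal digits and 481 full stops.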

    Rooli
    Categorical

    HIGH CARDINALITY
    MISSING

    Distinct261
    Distinct (%)53.6%
    Missing13
    Missing (%)2.6%
    Memory size4.0 KiB
    Ohjelmistokehittäjä
    42 
    full-stack
    36 
    Full-stack
     
    25
    ohjelmistokehittäjä
     
    17
    Arkkitehti
     
    16
    Other values (256)
    351 

    Length

    Max length67
    Median length52
    Mean length19.23408624
    Min length2

    Characters and Unicode

    Total characters9367
    Distinct characters58
    Distinct categories9 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique213 ?
    Unique (%)43.7%

    Sample

    1st rowArkkitehti
    2nd rowfull-stack
    3rd rowFull-stack ohjelmistokehittäjä
    4th rowweb-arkkitehti
    5th rowOhjelmistokehittäjä

    Common Values

    Value                           Count  Frequency (%)
    Ohjelmistokehittäjä             42     8.4%
    full-stack                      36     7.2%
    Full-stack                      25     5.0%
    ohjelmistokehittäjä             17     3.4%
    Arkkitehti                      16     3.2%
    Full-stack ohjelmistokehittäjä  8      1.6%
    full-stack ohjelmistokehittäjä  7      1.4%
    arkkitehti                      6      1.2%
    Frontend                        6      1.2%
    frontend                        6      1.2%
    Other values (251)              318    63.6%
    (Missing)                       13     2.6%

    Length

    Histogram of lengths of the category
    ValueCountFrequency (%)
    full-stack145
     
    16.1%
    ohjelmistokehittäjä115
     
    12.8%
    developer61
     
    6.8%
    arkkitehti36
     
    4.0%
    35
     
    3.9%
    lead33
     
    3.7%
    frontend28
     
    3.1%
    senior21
     
    2.3%
    backend17
     
    1.9%
    kehittäjä16
     
    1.8%
    Other values (196)393
    43.7%

    Most occurring characters

    ValueCountFrequency (%)
    t975
     
    10.4%
    e862
     
    9.2%
    l683
     
    7.3%
    i679
     
    7.2%
    k517
     
    5.5%
    o489
     
    5.2%
    s449
     
    4.8%
    a448
     
    4.8%
    419
     
    4.5%
    h374
     
    4.0%
    Other values (48)3472
    37.1%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter8134
    86.8%
    Uppercase Letter474
     
    5.1%
    Space Separator420
     
    4.5%
    Dash Punctuation177
     
    1.9%
    Other Punctuation99
     
    1.1%
    Open Punctuation27
     
    0.3%
    Close Punctuation27
     
    0.3%
    Math Symbol8
     
    0.1%
    Decimal Number1
     
    < 0.1%

    Most frequent character per category

    ValueCountFrequency (%)
    t975
    12.0%
    e862
     
    10.6%
    l683
     
    8.4%
    i679
     
    8.3%
    k517
     
    6.4%
    o489
     
    6.0%
    s449
     
    5.5%
    a448
     
    5.5%
    h374
     
    4.6%
    j355
     
    4.4%
    Other values (16)2303
    28.3%
    ValueCountFrequency (%)
    F107
    22.6%
    O99
    20.9%
    S52
    11.0%
    D42
     
    8.9%
    A28
     
    5.9%
    T28
     
    5.9%
    L21
     
    4.4%
    C18
     
    3.8%
    E12
     
    2.5%
    P11
     
    2.3%
    Other values (11)56
    11.8%
    ValueCountFrequency (%)
    ,53
    53.5%
    /42
    42.4%
    &3
     
    3.0%
    .1
     
    1.0%
    ValueCountFrequency (%)
    419
    99.8%
     1
     
    0.2%
    ValueCountFrequency (%)
    -177
    100.0%
    ValueCountFrequency (%)
    (27
    100.0%
    ValueCountFrequency (%)
    )27
    100.0%
    ValueCountFrequency (%)
    +8
    100.0%
    ValueCountFrequency (%)
    11
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin8608
    91.9%
    Common759
     
    8.1%

    Most frequent character per script

    ValueCountFrequency (%)
    t975
     
    11.3%
    e862
     
    10.0%
    l683
     
    7.9%
    i679
     
    7.9%
    k517
     
    6.0%
    o489
     
    5.7%
    s449
     
    5.2%
    a448
     
    5.2%
    h374
     
    4.3%
    j355
     
    4.1%
    Other values (37)2777
    32.3%
    ValueCountFrequency (%)
    419
    55.2%
    -177
    23.3%
    ,53
     
    7.0%
    /42
     
    5.5%
    (27
     
    3.6%
    )27
     
    3.6%
    +8
     
    1.1%
    &3
     
    0.4%
    .1
     
    0.1%
     1
     
    0.1%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII9013
    96.2%
    None354
     
    3.8%

    Most frequent character per block

    ValueCountFrequency (%)
    t975
     
    10.8%
    e862
     
    9.6%
    l683
     
    7.6%
    i679
     
    7.5%
    k517
     
    5.7%
    o489
     
    5.4%
    s449
     
    5.0%
    a448
     
    5.0%
    419
     
    4.6%
    h374
     
    4.1%
    Other values (45)3118
    34.6%
    ValueCountFrequency (%)
    ä337
    95.2%
    ö16
     
    4.5%
     1
     
    0.3%

    Etä
    Categorical

    Distinct3
    Distinct (%)0.6%
    Missing3
    Missing (%)0.6%
    Memory size760.0 B
    Etä
    208 
    Toimisto
    173 
    50/50
    116 

    Length

    Max length8
    Median length5
    Mean length5.207243461
    Min length3

    Characters and Unicode

    Total characters2588
    Distinct characters11
    Distinct categories4 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique0 ?
    Unique (%)0.0%

    Sample

    1st row50/50
    2nd rowEtä
    3rd rowEtä
    4th rowEtä
    5th rowEtä
    ValueCountFrequency (%)
    Etä208
    41.6%
    Toimisto173
    34.6%
    50/50116
    23.2%
    (Missing)3
     
    0.6%
    Histogram of lengths of the category
    ValueCountFrequency (%)
    full-stack145
     
    16.1%
    ohjelmistokehittäjä115
     
    12.8%
    developer61
     
    6.8%
    arkkitehti36
     
    4.0%
    35
     
    3.9%
    lead33
     
    3.7%
    frontend28
     
    3.1%
    senior21
     
    2.3%
    backend17
     
    1.9%
    kehittäjä16
     
    1.8%
    Other values (196)393
    43.7%

    Most occurring characters

    ValueCountFrequency (%)
    t975
     
    10.4%
    e862
     
    9.2%
    l683
     
    7.3%
    i679
     
    7.2%
    k517
     
    5.5%
    o489
     
    5.2%
    s449
     
    4.8%
    a448
     
    4.8%
    419
     
    4.5%
    h374
     
    4.0%
    Other values (48)3472
    37.1%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter8134
    86.8%
    Uppercase Letter474
     
    5.1%
    Space Separator420
     
    4.5%
    Dash Punctuation177
     
    1.9%
    Other Punctuation99
     
    1.1%
    Open Punctuation27
     
    0.3%
    Close Punctuation27
     
    0.3%
    Math Symbol8
     
    0.1%
    Decimal Number1
     
    < 0.1%

    Most frequent character per category

    Lowercase Letter
    ValueCountFrequency (%)
    t975
    12.0%
    e862
     
    10.6%
    l683
     
    8.4%
    i679
     
    8.3%
    k517
     
    6.4%
    o489
     
    6.0%
    s449
     
    5.5%
    a448
     
    5.5%
    h374
     
    4.6%
    j355
     
    4.4%
    Other values (16)2303
    28.3%
    Uppercase Letter
    ValueCountFrequency (%)
    F107
    22.6%
    O99
    20.9%
    S52
    11.0%
    D42
     
    8.9%
    T28
     
    5.9%
    A28
     
    5.9%
    L21
     
    4.4%
    C18
     
    3.8%
    E12
     
    2.5%
    P11
     
    2.3%
    Other values (11)56
    11.8%
    Other Punctuation
    ValueCountFrequency (%)
    ,53
    53.5%
    /42
    42.4%
    &3
     
    3.0%
    .1
     
    1.0%
    Space Separator
    ValueCountFrequency (%)
    419
    99.8%
     1
     
    0.2%
    Dash Punctuation
    ValueCountFrequency (%)
    -177
    100.0%
    Open Punctuation
    ValueCountFrequency (%)
    (27
    100.0%
    Close Punctuation
    ValueCountFrequency (%)
    )27
    100.0%
    Math Symbol
    ValueCountFrequency (%)
    +8
    100.0%
    Decimal Number
    ValueCountFrequency (%)
    11
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin8608
    91.9%
    Common759
     
    8.1%

    Most frequent character per script

    Latin
    ValueCountFrequency (%)
    t975
     
    11.3%
    e862
     
    10.0%
    l683
     
    7.9%
    i679
     
    7.9%
    k517
     
    6.0%
    o489
     
    5.7%
    s449
     
    5.2%
    a448
     
    5.2%
    h374
     
    4.3%
    j355
     
    4.1%
    Other values (37)2777
    32.3%
    Common
    ValueCountFrequency (%)
    419
    55.2%
    -177
    23.3%
    ,53
     
    7.0%
    /42
     
    5.5%
    (27
     
    3.6%
    )27
     
    3.6%
    +8
     
    1.1%
    &3
     
    0.4%
    11
     
    0.1%
     1
     
    0.1%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII9013
    96.2%
    None354
     
    3.8%

    Most frequent character per block

    ASCII
    ValueCountFrequency (%)
    t975
     
    10.8%
    e862
     
    9.6%
    l683
     
    7.6%
    i679
     
    7.5%
    k517
     
    5.7%
    o489
     
    5.4%
    s449
     
    5.0%
    a448
     
    5.0%
    419
     
    4.6%
    h374
     
    4.1%
    Other values (45)3118
    34.6%
    None
    ValueCountFrequency (%)
    ä337
    95.2%
    ö16
     
    4.5%
     1
     
    0.3%

    Etä
    Categorical

    HIGH CORRELATION

    Distinct3
    Distinct (%)0.6%
    Missing3
    Missing (%)0.6%
    Memory size760.0 B
    Etä
    208 
    Toimisto
    173 
    50/50
    116 

    Length

    Max length8
    Median length5
    Mean length5.207243461
    Min length3

    Characters and Unicode

    Total characters2588
    Distinct characters11
    Distinct categories4 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique0 ?
    Unique (%)0.0%

    Sample

    1st row50/50
    2nd rowEtä
    3rd rowEtä
    4th rowEtä
    5th rowEtä

    Common Values

    Value      Count  Frequency (%)
    Etä        208    41.6%
    Toimisto   173    34.6%
    50/50      116    23.2%
    (Missing)  3      0.6%

    Length

    Histogram of lengths of the category

    Category Frequency Plot

    ValueCountFrequency (%)
    etä208
    41.9%
    toimisto173
    34.8%
    50/50116
    23.3%

    Most occurring characters

    ValueCountFrequency (%)
    t381
    14.7%
    o346
    13.4%
    i346
    13.4%
    5232
    9.0%
    0232
    9.0%
    E208
    8.0%
    ä208
    8.0%
    T173
    6.7%
    m173
    6.7%
    s173
    6.7%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter1627
    62.9%
    Decimal Number464
     
    17.9%
    Uppercase Letter381
     
    14.7%
    Other Punctuation116
     
    4.5%

    Most frequent character per category

    ValueCountFrequency (%)
    t381
    23.4%
    o346
    21.3%
    i346
    21.3%
    ä208
    12.8%
    m173
    10.6%
    s173
    10.6%
    ValueCountFrequency (%)
    5232
    50.0%
    0232
    50.0%
    ValueCountFrequency (%)
    E208
    54.6%
    T173
    45.4%
    ValueCountFrequency (%)
    /116
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin2008
    77.6%
    Common580
     
    22.4%

    Most frequent character per script

    ValueCountFrequency (%)
    t381
    19.0%
    o346
    17.2%
    i346
    17.2%
    E208
    10.4%
    ä208
    10.4%
    T173
    8.6%
    m173
    8.6%
    s173
    8.6%
    ValueCountFrequency (%)
    5232
    40.0%
    0232
    40.0%
    /116
    20.0%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII2380
    92.0%
    None208
     
    8.0%

    Most frequent character per block

    ValueCountFrequency (%)
    t381
    16.0%
    o346
    14.5%
    i346
    14.5%
    5232
    9.7%
    0232
    9.7%
    E208
    8.7%
    T173
    7.3%
    m173
    7.3%
    s173
    7.3%
    /116
     
    4.9%
    ValueCountFrequency (%)
    ä208
    100.0%

    Kuukausipalkka
    Real number (ℝ≥0)

    MISSING

    Distinct130
    Distinct (%)28.5%
    Missing44
    Missing (%)8.8%
    Infinite0
    Infinite (%)0.0%
    Mean4671.388158
    Minimum1081
    Maximum15000
    Zeros0
    Zeros (%)0.0%
    Memory size4.0 KiB
    ValueCountFrequency (%)
    etä208
    41.9%
    toimisto173
    34.8%
    50/50116
    23.3%

    Most occurring characters

    ValueCountFrequency (%)
    t381
    14.7%
    o346
    13.4%
    i346
    13.4%
    5232
    9.0%
    0232
    9.0%
    E208
    8.0%
    ä208
    8.0%
    T173
    6.7%
    m173
    6.7%
    s173
    6.7%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter1627
    62.9%
    Decimal Number464
     
    17.9%
    Uppercase Letter381
     
    14.7%
    Other Punctuation116
     
    4.5%

    Most frequent character per category

    Lowercase Letter
    ValueCountFrequency (%)
    t381
    23.4%
    o346
    21.3%
    i346
    21.3%
    ä208
    12.8%
    m173
    10.6%
    s173
    10.6%
    Decimal Number
    ValueCountFrequency (%)
    5232
    50.0%
    0232
    50.0%
    Uppercase Letter
    ValueCountFrequency (%)
    E208
    54.6%
    T173
    45.4%
    Other Punctuation
    ValueCountFrequency (%)
    /116
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin2008
    77.6%
    Common580
     
    22.4%

    Most frequent character per script

    Latin
    ValueCountFrequency (%)
    t381
    19.0%
    o346
    17.2%
    i346
    17.2%
    E208
    10.4%
    ä208
    10.4%
    T173
    8.6%
    m173
    8.6%
    s173
    8.6%
    Common
    ValueCountFrequency (%)
    5232
    40.0%
    0232
    40.0%
    /116
    20.0%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII2380
    92.0%
    None208
     
    8.0%

    Most frequent character per block

    ASCII
    ValueCountFrequency (%)
    t381
    16.0%
    o346
    14.5%
    i346
    14.5%
    5232
    9.7%
    0232
    9.7%
    E208
    8.7%
    T173
    7.3%
    m173
    7.3%
    s173
    7.3%
    /116
     
    4.9%
    None
    ValueCountFrequency (%)
    ä208
    100.0%

    Kuukausipalkka
    Real number (ℝ≥0)

    HIGH CORRELATION
    HIGH CORRELATION
    HIGH CORRELATION
    HIGH CORRELATION
    MISSING

    Distinct130
    Distinct (%)28.5%
    Missing44
    Missing (%)8.8%
    Infinite0
    Infinite (%)0.0%
    Mean4671.388158
    Minimum1081
    Maximum15000
    Zeros0
    Zeros (%)0.0%
    Negative0
    Negative (%)0.0%
    Memory size4.0 KiB

    Quantile statistics

    Minimum: 1081
    5-th percentile: 2792.5
    Q1: 3800
    median: 4500
    Q3: 5477.5
    95-th percentile: 7000
    Maximum: 15000
    Range: 13919
    Interquartile range (IQR): 1677.5

    Descriptive statistics

    Standard deviation: 1443.054453
    Coefficient of variation (CV): 0.3089134117
    Kurtosis: 7.900697718
    Mean: 4671.388158
    Median Absolute Deviation (MAD): 765.5
    Skewness: 1.62359699
    Sum: 2130153
    Variance: 2082406.154
    Monotonicity: Not monotonic

    Quantile statistics

    Minimum: 1081
    5-th percentile: 2792.5
    Q1: 3800
    median: 4500
    Q3: 5477.5
    95-th percentile: 7000
    Maximum: 15000
    Range: 13919
    Interquartile range (IQR): 1677.5

    Descriptive statistics

    Standard deviation: 1443.054453
    Coefficient of variation (CV): 0.3089134117
    Kurtosis: 7.900697718
    Mean: 4671.388158
    Median Absolute Deviation (MAD): 765.5
    Skewness: 1.62359699
    Sum: 2130153
    Variance: 2082406.154
    Monotonicity: Not monotonic
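
    Several of the Kuukausipalkka statistics above are derived from the others, which gives a quick consistency check on the report. The sketch below is illustrative only: the variable names are hypothetical, and it works from the summary figures printed in the report rather than from the underlying survey data.

```python
import math

# Summary figures for Kuukausipalkka, copied from the report.
minimum, maximum = 1081, 15000
q1, q3 = 3800, 5477.5
mean, std = 4671.388158, 1443.054453
total, variance = 2130153, 2082406.154

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
n = round(total / mean)           # Number of non-missing observations

print(value_range, iqr, round(cv, 4), n)
print(math.isclose(std ** 2, variance, rel_tol=1e-6))
```

    The recovered count of non-missing observations is 456, which matches the 500 observations minus the 44 missing values reported for this column.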
    Histogram with fixed size bins (bins=50)
    ValueCountFrequency (%)
    400026
     
    5.2%
    450024
     
    4.8%
    500018
     
    3.6%
    550017
     
    3.4%
    600017
     
    3.4%
    480013
     
    2.6%
    430013
     
    2.6%
    420012
     
    2.4%
    380012
     
    2.4%
    300012
     
    2.4%
    Other values (120)292
    58.4%
    (Missing)44
     
    8.8%
    ValueCountFrequency (%)
    10811
    0.2%
    11001
    0.2%
    16661
    0.2%
    17001
    0.2%
    18001
    0.2%
    ValueCountFrequency (%)
    150001
    0.2%
    120002
    0.4%
    93001
    0.2%
    85002
    0.4%
    82001
    0.2%

    Vuositulot
    Real number (ℝ≥0)

    HIGH CORRELATION
    MISSING

    Distinct185
    Distinct (%)38.0%
    Missing13
    Missing (%)2.6%
    Infinite0
    Infinite (%)0.0%
    Mean65593.46304
    Minimum0
    Maximum300000
    Zeros2
    Zeros (%)0.4%
    Memory size4.0 KiB
    Histogram with fixed size bins (bins=50)
    Value               Count  Frequency (%)
    4000                26     5.2%
    4500                24     4.8%
    5000                18     3.6%
    6000                17     3.4%
    5500                17     3.4%
    4800                13     2.6%
    4300                13     2.6%
    3000                12     2.4%
    4200                12     2.4%
    3800                12     2.4%
    Other values (120)  292    58.4%
    (Missing)           44     8.8%

    Value  Count  Frequency (%)
    1081   1      0.2%
    1100   1      0.2%
    1666   1      0.2%
    1700   1      0.2%
    1800   1      0.2%
    2100   1      0.2%
    2200   1      0.2%
    2275   1      0.2%
    2300   1      0.2%
    2400   3      0.6%

    Value  Count  Frequency (%)
    15000  1      0.2%
    12000  2      0.4%
    9300   1      0.2%
    8500   2      0.4%
    8200   1      0.2%
    8000   6      1.2%
    7500   3      0.6%
    7200   1      0.2%
    7000   11     2.2%
    6956   1      0.2%

    Vuositulot
    Real number (ℝ≥0)

    HIGH CORRELATION
    HIGH CORRELATION
    HIGH CORRELATION
    HIGH CORRELATION
    MISSING

    Distinct185
    Distinct (%)38.0%
    Missing13
    Missing (%)2.6%
    Infinite0
    Infinite (%)0.0%
    Mean65593.46304
    Minimum0
    Maximum300000
    Zeros2
    Zeros (%)0.4%
    Negative0
    Negative (%)0.0%
    Memory size4.0 KiB

    Quantile statistics

    Minimum: 0
    5-th percentile: 34020
    Q1: 49562.5
    median: 58750
    Q3: 75000
    95-th percentile: 123500
    Maximum: 300000
    Range: 300000
    Interquartile range (IQR): 25437.5

    Descriptive statistics

    Standard deviation: 31817.79458
    Coefficient of variation (CV): 0.4850756937
    Kurtosis: 11.75121598
    Mean: 65593.46304
    Median Absolute Deviation (MAD): 11750
    Skewness: 2.645875828
    Sum: 31944016.5
    Variance: 1012372052
    Monotonicity: Not monotonic

    Quantile statistics

    Minimum: 0
    5-th percentile: 34020
    Q1: 49562.5
    median: 58750
    Q3: 75000
    95-th percentile: 123500
    Maximum: 300000
    Range: 300000
    Interquartile range (IQR): 25437.5

    Descriptive statistics

    Standard deviation: 31817.79458
    Coefficient of variation (CV): 0.4850756937
    Kurtosis: 11.75121598
    Mean: 65593.46304
    Median Absolute Deviation (MAD): 11750
    Skewness: 2.645875828
    Sum: 31944016.5
    Variance: 1012372052
    Monotonicity: Not monotonic
    Histogram with fixed size bins (bins=50)
    ValueCountFrequency (%)
    5500018
     
    3.6%
    5000018
     
    3.6%
    7500017
     
    3.4%
    6000014
     
    2.8%
    7000011
     
    2.2%
    8500011
     
    2.2%
    6250010
     
    2.0%
    3750010
     
    2.0%
    6500010
     
    2.0%
    5400010
     
    2.0%
    Other values (175)358
    71.6%
    (Missing)13
     
    2.6%
    ValueCountFrequency (%)
    02
    0.4%
    40001
    0.2%
    61001
    0.2%
    75001
    0.2%
    137501
    0.2%
    ValueCountFrequency (%)
    3000001
     
    0.2%
    2500001
     
    0.2%
    2200001
     
    0.2%
    2000004
    0.8%
    1900001
     
    0.2%

    Kilpailukykyinen
    Boolean

    HIGH CORRELATION
    MISSING

    Distinct2
    Distinct (%)0.4%
    Missing15
    Missing (%)3.0%
    Memory size4.0 KiB
    True
    329 
    False
    156 
    (Missing)
     
    15
    ValueCountFrequency (%)
    True329
    65.8%
    False156
    31.2%
    (Missing)15
     
    3.0%
    Histogram with fixed size bins (bins=50)
    Value               Count  Frequency (%)
    55000               18     3.6%
    50000               18     3.6%
    75000               17     3.4%
    60000               14     2.8%
    70000               11     2.2%
    85000               11     2.2%
    65000               10     2.0%
    62500               10     2.0%
    54000               10     2.0%
    37500               10     2.0%
    Other values (175)  358    71.6%
    (Missing)           13     2.6%

    Value  Count  Frequency (%)
    0      2      0.4%
    4000   1      0.2%
    6100   1      0.2%
    7500   1      0.2%
    13750  1      0.2%
    14000  1      0.2%
    20000  1      0.2%
    22000  1      0.2%
    22500  1      0.2%
    25000  1      0.2%

    Value   Count  Frequency (%)
    300000  1      0.2%
    250000  1      0.2%
    220000  1      0.2%
    200000  4      0.8%
    190000  1      0.2%
    180000  1      0.2%
    165000  1      0.2%
    157300  1      0.2%
    155000  1      0.2%
    150000  1      0.2%

    Kilpailukykyinen
    Boolean

    HIGH CORRELATION
    HIGH CORRELATION
    MISSING

    Distinct2
    Distinct (%)0.4%
    Missing15
    Missing (%)3.0%
    Memory size4.0 KiB
    True
    329 
    False
    156 
    (Missing)
     
    15
    Value      Count  Frequency (%)
    True       329    65.8%
    False      156    31.2%
    (Missing)  15     3.0%

    Työpaikka
    Categorical

    HIGH CARDINALITY
    HIGH CORRELATION
    MISSING

    Distinct73
    Distinct (%)64.6%
    Missing387
    Missing (%)77.4%
    Memory size4.0 KiB
    Gofore
    12 
    Vincit
     
    8
    Futurice
     
    5
    Mavericks
     
    4
    Fraktio
     
    4
    Other values (68)
    80 

    Length

    Max length132
    Median length7
    Mean length10.15044248
    Min length2

    Characters and Unicode

    Total characters1147
    Distinct characters54
    Distinct categories5 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique59 ?
    Unique (%)52.2%

    Sample

    1st rowQuestrade
    2nd rowDigiaj
    3rd rowGofore
    4th rowOura Health
    5th rowWirepas
    ValueCountFrequency (%)
    Gofore12
     
    2.4%
    Vincit8
     
    1.6%
    Futurice5
     
    1.0%
    Mavericks4
     
    0.8%
    Fraktio4
     
    0.8%
    Pankki3
     
    0.6%
    Arado3
     
    0.6%
    Siili3
     
    0.6%
    Compile2
     
    0.4%
    If2
     
    0.4%
    Other values (63)67
     
    13.4%
    (Missing)387
    77.4%

    Työpaikka
    Categorical

    HIGH CARDINALITY
    HIGH CORRELATION
    HIGH CORRELATION
    MISSING

    Distinct73
    Distinct (%)64.6%
    Missing387
    Missing (%)77.4%
    Memory size4.0 KiB
    Gofore
    12 
    Vincit
     
    8
    Futurice
     
    5
    Fraktio
     
    4
    Mavericks
     
    4
    Other values (68)
    80 

    Length

    Max length132
    Median length28
    Mean length10.15044248
    Min length2

    Characters and Unicode

    Total characters1147
    Distinct characters54
    Distinct categories5 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique59 ?
    Unique (%)52.2%

    Sample

    1st rowQuestrade
    2nd rowDigiaj
    3rd rowGofore
    4th rowOura Health
    5th rowWirepas

    Common Values

    ValueCountFrequency (%)
    Gofore12
     
    2.4%
    Vincit8
     
    1.6%
    Futurice5
     
    1.0%
    Fraktio4
     
    0.8%
    Mavericks4
     
    0.8%
    Pankki3
     
    0.6%
    Siili3
     
    0.6%
    Arado3
     
    0.6%
    Qvik2
     
    0.4%
    KVTES-alainen kunnan omistama 2
     
    0.4%
    Other values (63)67
     
    13.4%
    (Missing)387
    77.4%

    Length

    Histogram of lengths of the category
    ValueCountFrequency (%)
    gofore12
     
    7.5%
    vincit8
     
    5.0%
    mavericks6
     
    3.7%
    siili5
     
    3.1%
    futurice5
     
    3.1%
    fraktio4
     
    2.5%
    if3
     
    1.9%
    omistama3
     
    1.9%
    konsulttitalo3
     
    1.9%
    pankki3
     
    1.9%
    Other values (96)109
    67.7%

    Most occurring characters

    ValueCountFrequency (%)
    i128
     
    11.2%
    a89
     
    7.8%
    o89
     
    7.8%
    e86
     
    7.5%
    t82
     
    7.1%
    r63
     
    5.5%
    n59
     
    5.1%
    51
     
    4.4%
    k49
     
    4.3%
    l47
     
    4.1%
    Other values (44)404
    35.2%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter955
    83.3%
    Uppercase Letter135
     
    11.8%
    Space Separator51
     
    4.4%
    Other Punctuation3
     
    0.3%
    Dash Punctuation3
     
    0.3%

    Most frequent character per category

    ValueCountFrequency (%)
    i128
    13.4%
    a89
    9.3%
    o89
    9.3%
    e86
     
    9.0%
    t82
     
    8.6%
    r63
     
    6.6%
    n59
     
    6.2%
    k49
     
    5.1%
    l47
     
    4.9%
    u45
     
    4.7%
    Other values (16)218
    22.8%
    ValueCountFrequency (%)
    G15
     
    11.1%
    S15
     
    11.1%
    V14
     
    10.4%
    F10
     
    7.4%
    K8
     
    5.9%
    A7
     
    5.2%
    M7
     
    5.2%
    C6
     
    4.4%
    P6
     
    4.4%
    T6
     
    4.4%
    Other values (15)41
    30.4%
    ValueCountFrequency (%)
    51
    100.0%
    ValueCountFrequency (%)
    .3
    100.0%
    ValueCountFrequency (%)
    -3
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin1090
    95.0%
    Common57
     
    5.0%

    Most frequent character per script

    ValueCountFrequency (%)
    i128
     
    11.7%
    a89
     
    8.2%
    o89
     
    8.2%
    e86
     
    7.9%
    t82
     
    7.5%
    r63
     
    5.8%
    n59
     
    5.4%
    k49
     
    4.5%
    l47
     
    4.3%
    u45
     
    4.1%
    Other values (41)353
    32.4%
    ValueCountFrequency (%)
    51
    89.5%
    .3
     
    5.3%
    -3
     
    5.3%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII1135
    99.0%
    None12
     
    1.0%

    Most frequent character per block

    ValueCountFrequency (%)
    i128
     
    11.3%
    a89
     
    7.8%
    o89
     
    7.8%
    e86
     
    7.6%
    t82
     
    7.2%
    r63
     
    5.6%
    n59
     
    5.2%
    51
     
    4.5%
    k49
     
    4.3%
    l47
     
    4.1%
    Other values (42)392
    34.5%
    ValueCountFrequency (%)
    ä11
    91.7%
    ö1
     
    8.3%

    Vapaa sana
    Categorical

    HIGH CORRELATION
    MISSING
    UNIFORM

    Distinct37
    Distinct (%)97.4%
    Missing462
    Missing (%)92.4%
    Memory size4.0 KiB
    palkan lisänä lounas- ja virkistysetu
     
    2
    Teen 80% työaikaa jotta ehtisin harrastaa kaikenlaista työnteon lisäksi
     
    1
    Rahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu
     
    1
    Vaikea vastata henkilönä joka tekee yrityksen kautta yhdelle ulkomaalaiselle yritykselle töitä (jolla ei ole entiteettiä suomessa). Vastasin nyt ikään kuin olisin yrittäjä vaikka käytännössä tämä on sama kuin olisin palkkaduunissa.
     
    1
    olen sekä päivätyöläinen että friikku. jospa nyt kuitenki vois valita monta?
     
    1
    Other values (32)
    32 

    Length

    Max length286
    Median length73
    Mean length95.57894737
    Min length7

    Characters and Unicode

    Total characters3632
    Distinct characters56
    Distinct categories9 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique36 ?
    Unique (%)94.7%

    Sample

    1st rowKuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi.
    2nd rowTyöskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa.
    3rd rowpalkan lisäksi kompensaatioon kuuluu varsin runsas ja suomen it-alalla uniikki etupaketti. pelkkä palkka ei välttämättä ole kilpailukykyinen, mutta koko kompensaatio yleisesti työstäni on ehdottomasti kilpailukykyinen.
    4th rowRahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu
    5th rowOsittain laskutukseen perustuva palkka joten vaihtelee.
    ValueCountFrequency (%)
    palkan lisänä lounas- ja virkistysetu2
     
    0.4%
    Teen 80% työaikaa jotta ehtisin harrastaa kaikenlaista työnteon lisäksi1
     
    0.2%
    Rahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu1
     
    0.2%
    Vaikea vastata henkilönä joka tekee yrityksen kautta yhdelle ulkomaalaiselle yritykselle töitä (jolla ei ole entiteettiä suomessa). Vastasin nyt ikään kuin olisin yrittäjä vaikka käytännössä tämä on sama kuin olisin palkkaduunissa.1
     
    0.2%
    olen sekä päivätyöläinen että friikku. jospa nyt kuitenki vois valita monta?1
     
    0.2%
    Vuositulot pitää sisällään myös sivutoimisena tehtyä pientä laskutusta.1
     
    0.2%
    Ihan OK. Edut myös kovat.1
     
    0.2%
    + merkittävä optiopaketti1
     
    0.2%
    Pakettiin kuuluu reilu määrä optioita ja palkka nousee (ja laskee) firman liikevaihdon myötä.1
     
    0.2%
    Osittain laskutukseen perustuva palkka joten vaihtelee.1
     
    0.2%
    Other values (27)27
     
    5.4%
    (Missing)462
    92.4%
    Histogram of lengths of the category
    ValueCountFrequency (%)
    gofore12
     
    7.5%
    vincit8
     
    5.0%
    mavericks6
     
    3.7%
    futurice5
     
    3.1%
    siili5
     
    3.1%
    fraktio4
     
    2.5%
    if3
     
    1.9%
    pankki3
     
    1.9%
    arado3
     
    1.9%
    konsulttitalo3
     
    1.9%
    Other values (96)109
    67.7%

    Most occurring characters

    ValueCountFrequency (%)
    i128
     
    11.2%
    a89
     
    7.8%
    o89
     
    7.8%
    e86
     
    7.5%
    t82
     
    7.1%
    r63
     
    5.5%
    n59
     
    5.1%
    51
     
    4.4%
    k49
     
    4.3%
    l47
     
    4.1%
    Other values (44)404
    35.2%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter955
    83.3%
    Uppercase Letter135
     
    11.8%
    Space Separator51
     
    4.4%
    Dash Punctuation3
     
    0.3%
    Other Punctuation3
     
    0.3%

    Most frequent character per category

    Lowercase Letter
    ValueCountFrequency (%)
    i128
    13.4%
    a89
    9.3%
    o89
    9.3%
    e86
     
    9.0%
    t82
     
    8.6%
    r63
     
    6.6%
    n59
     
    6.2%
    k49
     
    5.1%
    l47
     
    4.9%
    u45
     
    4.7%
    Other values (16)218
    22.8%
    Uppercase Letter
    ValueCountFrequency (%)
    S15
     
    11.1%
    G15
     
    11.1%
    V14
     
    10.4%
    F10
     
    7.4%
    K8
     
    5.9%
    A7
     
    5.2%
    M7
     
    5.2%
    P6
     
    4.4%
    T6
     
    4.4%
    C6
     
    4.4%
    Other values (15)41
    30.4%
    Space Separator
    ValueCountFrequency (%)
    51
    100.0%
    Dash Punctuation
    ValueCountFrequency (%)
    -3
    100.0%
    Other Punctuation
    ValueCountFrequency (%)
    .3
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin1090
    95.0%
    Common57
     
    5.0%

    Most frequent character per script

    Latin
    ValueCountFrequency (%)
    i128
     
    11.7%
    a89
     
    8.2%
    o89
     
    8.2%
    e86
     
    7.9%
    t82
     
    7.5%
    r63
     
    5.8%
    n59
     
    5.4%
    k49
     
    4.5%
    l47
     
    4.3%
    u45
     
    4.1%
    Other values (41)353
    32.4%
    Common
    ValueCountFrequency (%)
    51
    89.5%
    -3
     
    5.3%
    .3
     
    5.3%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII1135
    99.0%
    None12
     
    1.0%

    Most frequent character per block

    ASCII
    ValueCountFrequency (%)
    i128
     
    11.3%
    a89
     
    7.8%
    o89
     
    7.8%
    e86
     
    7.6%
    t82
     
    7.2%
    r63
     
    5.6%
    n59
     
    5.2%
    51
     
    4.5%
    k49
     
    4.3%
    l47
     
    4.1%
    Other values (42)392
    34.5%
    None
    ValueCountFrequency (%)
    ä11
    91.7%
    ö1
     
    8.3%

    Vapaa sana
    Categorical

    HIGH CORRELATION
    HIGH CORRELATION
    MISSING
    UNIFORM

    Distinct37
    Distinct (%)97.4%
    Missing462
    Missing (%)92.4%
    Memory size4.0 KiB
    palkan lisänä lounas- ja virkistysetu
     
    2
    it-ala 10+v koodaus 6v
     
    1
    Opiskelija
     
    1
    Teen 80% työaikaa jotta ehtisin harrastaa kaikenlaista työnteon lisäksi
     
    1
    Halpaa freelancer laskutusta oman tuotekehityksen sivussa
     
    1
    Other values (32)
    32 

    Length

    Max length286
    Median length104.5
    Mean length95.57894737
    Min length7

    Characters and Unicode

    Total characters3632
    Distinct characters56
    Distinct categories9 ?
    Distinct scripts2 ?
    Distinct blocks2 ?
    The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

    Unique

    Unique36 ?
    Unique (%)94.7%

    Sample

    1st rowKuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi.
    2nd rowTyöskentelen toimistolla, koska täällä ei ole ketään muita. Työnantajan puolesta voisin työskennellä myös kotoa.
    3rd rowpalkan lisäksi kompensaatioon kuuluu varsin runsas ja suomen it-alalla uniikki etupaketti. pelkkä palkka ei välttämättä ole kilpailukykyinen, mutta koko kompensaatio yleisesti työstäni on ehdottomasti kilpailukykyinen.
    4th rowRahapalkan päälle tulee vielä kohtuullinen optiopotti, mutta se toki on lähinnä arpalippu
    5th rowOsittain laskutukseen perustuva palkka joten vaihtelee.

    Common Values

    ValueCountFrequency (%)
    palkan lisänä lounas- ja virkistysetu2
     
    0.4%
    it-ala 10+v koodaus 6v1
     
    0.2%
    Opiskelija1
     
    0.2%
    Teen 80% työaikaa jotta ehtisin harrastaa kaikenlaista työnteon lisäksi1
     
    0.2%
    Halpaa freelancer laskutusta oman tuotekehityksen sivussa1
     
    0.2%
    Palkka riippuu osittain firman tuloksesta, joten vaikea sanoa tarkkaan.1
     
    0.2%
    Vaikea vastata henkilönä joka tekee yrityksen kautta yhdelle ulkomaalaiselle yritykselle töitä (jolla ei ole entiteettiä suomessa). Vastasin nyt ikään kuin olisin yrittäjä vaikka käytännössä tämä on sama kuin olisin palkkaduunissa.1
     
    0.2%
    Pakettiin kuuluu reilu määrä optioita ja palkka nousee (ja laskee) firman liikevaihdon myötä.1
     
    0.2%
    Vaikka merkitsin, että palkkani ei ole mielestäni kilpailukykyinen, se ei tarkoita ettenkö olisi siihen tyytyväinen. Tilanne yrittäjillä ei yleensä vastaa samaa kuin palkansaajilla, joten palkka ei ole yrittäjille monestikaan niin mustavalkoinen asia vaan kysymys on isommasta kuviosta.1
     
    0.2%
    Kuukausipalkkaan tulossa ihan juuri firman laajuinen pieni (muistaakseni 50 e) yleiskorotus + palkka nousee ainakin 2800 e/kk, kunhan valmistuisi.1
     
    0.2%
    Other values (27)27
     
    5.4%
    (Missing)462
    92.4%

    Length

    Histogram of lengths of the category
    ValueCountFrequency (%)
    ei11
     
    2.4%
    ja11
     
    2.4%
    palkka10
     
    2.2%
    on10
     
    2.2%
    mutta9
     
    2.0%
    ole6
     
    1.3%
    nyt5
     
    1.1%
    firman4
     
    0.9%
    ihan4
     
    0.9%
    olen4
     
    0.9%
    Other values (321)383
    83.8%

    Most occurring characters

    ValueCountFrequency (%)
    422
    11.6%
    a383
     
    10.5%
    i311
     
    8.6%
    t284
     
    7.8%
    n245
     
    6.7%
    s237
     
    6.5%
    e228
     
    6.3%
    k206
     
    5.7%
    l183
     
    5.0%
    o169
     
    4.7%
    Other values (46)964
    26.5%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter3025
    83.3%
    Space Separator422
     
    11.6%
    Other Punctuation85
     
    2.3%
    Uppercase Letter53
     
    1.5%
    Decimal Number28
     
    0.8%
    Dash Punctuation8
     
    0.2%
    Open Punctuation4
     
    0.1%
    Close Punctuation4
     
    0.1%
    Math Symbol3
     
    0.1%

    Most frequent character per category

    ValueCountFrequency (%)
    a383
    12.7%
    i311
    10.3%
    t284
    9.4%
    n245
     
    8.1%
    s237
     
    7.8%
    e228
     
    7.5%
    k206
     
    6.8%
    l183
     
    6.0%
    o169
     
    5.6%
    u140
     
    4.6%
    Other values (14)639
    21.1%
    ValueCountFrequency (%)
    P9
    17.0%
    T7
    13.2%
    O7
    13.2%
    E6
    11.3%
    V6
    11.3%
    K5
    9.4%
    S4
    7.5%
    I2
     
    3.8%
    J2
     
    3.8%
    H2
     
    3.8%
    Other values (3)3
     
    5.7%
    ValueCountFrequency (%)
    015
    53.6%
    13
     
    10.7%
    52
     
    7.1%
    22
     
    7.1%
    82
     
    7.1%
    62
     
    7.1%
    31
     
    3.6%
    71
     
    3.6%
    ValueCountFrequency (%)
    .44
    51.8%
    ,28
    32.9%
    /5
     
    5.9%
    %4
     
    4.7%
    "2
     
    2.4%
    ?2
     
    2.4%
    ValueCountFrequency (%)
    422
    100.0%
    ValueCountFrequency (%)
    (4
    100.0%
    ValueCountFrequency (%)
    )4
    100.0%
    ValueCountFrequency (%)
    +3
    100.0%
    ValueCountFrequency (%)
    -8
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin3078
    84.7%
    Common554
     
    15.3%

    Most frequent character per script

    ValueCountFrequency (%)
    a383
    12.4%
    i311
    10.1%
    t284
    9.2%
    n245
     
    8.0%
    s237
     
    7.7%
    e228
     
    7.4%
    k206
     
    6.7%
    l183
     
    5.9%
    o169
     
    5.5%
    u140
     
    4.5%
    Other values (27)692
    22.5%
    ValueCountFrequency (%)
    422
    76.2%
    .44
     
    7.9%
    ,28
     
    5.1%
    015
     
    2.7%
    -8
     
    1.4%
    /5
     
    0.9%
    (4
     
    0.7%
    )4
     
    0.7%
    %4
     
    0.7%
    +3
     
    0.5%
    Other values (9)17
     
    3.1%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII3479
    95.8%
    None153
     
    4.2%

    Most frequent character per block

    ValueCountFrequency (%)
    422
    12.1%
    a383
    11.0%
    i311
     
    8.9%
    t284
     
    8.2%
    n245
     
    7.0%
    s237
     
    6.8%
    e228
     
    6.6%
    k206
     
    5.9%
    l183
     
    5.3%
    o169
     
    4.9%
    Other values (44)811
    23.3%
    ValueCountFrequency (%)
    ä126
    82.4%
    ö27
     
    17.6%

    Kk-tulot
    Real number (ℝ≥0)

    HIGH CORRELATION
    MISSING

    Distinct185
    Distinct (%)38.0%
    Missing13
    Missing (%)2.6%
    Infinite0
    Infinite (%)0.0%
    Mean5466.12192
    Minimum0
    Maximum25000
    Zeros2
    Zeros (%)0.4%
    Memory size4.0 KiB
    Histogram of lengths of the category
    ValueCountFrequency (%)
    ja11
     
    2.4%
    ei11
     
    2.4%
    palkka10
     
    2.2%
    on10
     
    2.2%
    mutta9
     
    2.0%
    ole6
     
    1.3%
    nyt5
     
    1.1%
    palkan4
     
    0.9%
    ihan4
     
    0.9%
    joten4
     
    0.9%
    Other values (321)383
    83.8%

    Most occurring characters

    ValueCountFrequency (%)
    422
    11.6%
    a383
     
    10.5%
    i311
     
    8.6%
    t284
     
    7.8%
    n245
     
    6.7%
    s237
     
    6.5%
    e228
     
    6.3%
    k206
     
    5.7%
    l183
     
    5.0%
    o169
     
    4.7%
    Other values (46)964
    26.5%

    Most occurring categories

    ValueCountFrequency (%)
    Lowercase Letter3025
    83.3%
    Space Separator422
     
    11.6%
    Other Punctuation85
     
    2.3%
    Uppercase Letter53
     
    1.5%
    Decimal Number28
     
    0.8%
    Dash Punctuation8
     
    0.2%
    Close Punctuation4
     
    0.1%
    Open Punctuation4
     
    0.1%
    Math Symbol3
     
    0.1%

    Most frequent character per category

    Lowercase Letter
    ValueCountFrequency (%)
    a383
    12.7%
    i311
    10.3%
    t284
    9.4%
    n245
     
    8.1%
    s237
     
    7.8%
    e228
     
    7.5%
    k206
     
    6.8%
    l183
     
    6.0%
    o169
     
    5.6%
    u140
     
    4.6%
    Other values (14)639
    21.1%
    Uppercase Letter
    ValueCountFrequency (%)
    P9
    17.0%
    O7
    13.2%
    T7
    13.2%
    E6
    11.3%
    V6
    11.3%
    K5
    9.4%
    S4
    7.5%
    H2
     
    3.8%
    J2
     
    3.8%
    I2
     
    3.8%
    Other values (3)3
     
    5.7%
    Decimal Number
    ValueCountFrequency (%)
    015
    53.6%
    13
     
    10.7%
    52
     
    7.1%
    22
     
    7.1%
    82
     
    7.1%
    62
     
    7.1%
    31
     
    3.6%
    71
     
    3.6%
    Other Punctuation
    ValueCountFrequency (%)
    .44
    51.8%
    ,28
    32.9%
    /5
     
    5.9%
    %4
     
    4.7%
    "2
     
    2.4%
    ?2
     
    2.4%
    Space Separator
    ValueCountFrequency (%)
    422
    100.0%
    Dash Punctuation
    ValueCountFrequency (%)
    -8
    100.0%
    Close Punctuation
    ValueCountFrequency (%)
    )4
    100.0%
    Open Punctuation
    ValueCountFrequency (%)
    (4
    100.0%
    Math Symbol
    ValueCountFrequency (%)
    +3
    100.0%

    Most occurring scripts

    ValueCountFrequency (%)
    Latin3078
    84.7%
    Common554
     
    15.3%

    Most frequent character per script

    Latin
    ValueCountFrequency (%)
    a383
    12.4%
    i311
    10.1%
    t284
    9.2%
    n245
     
    8.0%
    s237
     
    7.7%
    e228
     
    7.4%
    k206
     
    6.7%
    l183
     
    5.9%
    o169
     
    5.5%
    u140
     
    4.5%
    Other values (27)692
    22.5%
    Common
    ValueCountFrequency (%)
    422
    76.2%
    .44
     
    7.9%
    ,28
     
    5.1%
    015
     
    2.7%
    -8
     
    1.4%
    /5
     
    0.9%
    )4
     
    0.7%
    (4
     
    0.7%
    %4
     
    0.7%
    13
     
    0.5%
    Other values (9)17
     
    3.1%

    Most occurring blocks

    ValueCountFrequency (%)
    ASCII3479
    95.8%
    None153
     
    4.2%

    Most frequent character per block

    ASCII
    ValueCountFrequency (%)
    422
    12.1%
    a383
    11.0%
    i311
     
    8.9%
    t284
     
    8.2%
    n245
     
    7.0%
    s237
     
    6.8%
    e228
     
    6.6%
    k206
     
    5.9%
    l183
     
    5.3%
    o169
     
    4.9%
    Other values (44)811
    23.3%
    None
    ValueCountFrequency (%)
    ä126
    82.4%
    ö27
     
    17.6%

    Kk-tulot
    Real number (ℝ≥0)

    HIGH CORRELATION
    HIGH CORRELATION
    HIGH CORRELATION
    HIGH CORRELATION
    MISSING

    Distinct: 185
    Distinct (%): 38.0%
    Missing: 13
    Missing (%): 2.6%
    Infinite: 0
    Infinite (%): 0.0%
    Mean: 5466.12192
    Minimum: 0
    Maximum: 25000
    Zeros: 2
    Zeros (%): 0.4%
    Negative: 0
    Negative (%): 0.0%
    Memory size: 4.0 KiB

    Quantile statistics

    Minimum: 0
    5-th percentile: 2835
    Q1: 4130.208333
    Median: 4895.833333
    Q3: 6250
    95-th percentile: 10291.66667
    Maximum: 25000
    Range: 25000
    Interquartile range (IQR): 2119.791667

    Descriptive statistics

    Standard deviation: 2651.482882
    Coefficient of variation (CV): 0.4850756937
    Kurtosis: 11.75121598
    Mean: 5466.12192
    Median Absolute Deviation (MAD): 979.1666667
    Skewness: 2.645875828
    Sum: 2662001.375
    Variance: 7030361.474
    Monotonicity: Not monotonic
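
Several of the Kk-tulot figures above follow from one another. The sketch below recomputes them from the reported values as a sanity check. It assumes the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, variance = standard deviation squared, sum = mean × number of non-missing values). The 500 observations come from the report overview; the variable names are illustrative only.

```python
# Values copied from the Kk-tulot statistics in the report.
mean = 5466.12192
std = 2651.482882
q1 = 4130.208333
q3 = 6250.0
minimum, maximum = 0.0, 25000.0
n_observations = 500   # from the report overview
n_missing = 13         # from the Kk-tulot summary

# Derived quantities, under the standard definitions (an assumption here).
cv = std / mean                      # coefficient of variation
iqr = q3 - q1                        # interquartile range
value_range = maximum - minimum      # range
variance = std ** 2                  # variance
total = mean * (n_observations - n_missing)  # sum over non-missing values

print(f"CV={cv:.10f} IQR={iqr:.6f} range={value_range:.0f}")
print(f"variance={variance:.3f} sum={total:.3f}")
```

The recomputed values agree with the reported ones up to rounding of the inputs.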
    Histogram with fixed size bins (bins=50)
    ValueCountFrequency (%)
    4583.33333318
     
    3.6%
    4166.66666718
     
    3.6%
    625017
     
    3.4%
    500014
     
    2.8%
    7083.33333311
     
    2.2%
    5833.33333311
     
    2.2%
    450010
     
    2.0%
    5416.66666710
     
    2.0%
    5208.33333310
     
    2.0%
    312510
     
    2.0%
    Other values (175)358
    71.6%
    (Missing)13
     
    2.6%
    ValueCountFrequency (%)
    02
    0.4%
    333.33333331
    0.2%
    508.33333331
    0.2%
    6251
    0.2%
    1145.8333331
    0.2%
    ValueCountFrequency (%)
    250001
     
    0.2%
    20833.333331
     
    0.2%
    18333.333331
     
    0.2%
    16666.666674
    0.8%
    15833.333331
     
    0.2%

    Interactions

    Histogram with fixed size bins (bins=50)
    ValueCountFrequency (%)
    4583.33333318
     
    3.6%
    4166.66666718
     
    3.6%
    625017
     
    3.4%
    500014
     
    2.8%
    5833.33333311
     
    2.2%
    7083.33333311
     
    2.2%
    5416.66666710
     
    2.0%
    5208.33333310
     
    2.0%
    450010
     
    2.0%
    312510
     
    2.0%
    Other values (175)358
    71.6%
    (Missing)13
     
    2.6%
    ValueCountFrequency (%)
    02
    0.4%
    333.33333331
    0.2%
    508.33333331
    0.2%
    6251
    0.2%
    1145.8333331
    0.2%
    1166.6666671
    0.2%
    1666.6666671
    0.2%
    1833.3333331
    0.2%
    18751
    0.2%
    2083.3333331
    0.2%
    ValueCountFrequency (%)
    250001
     
    0.2%
    20833.333331
     
    0.2%
    18333.333331
     
    0.2%
    16666.666674
    0.8%
    15833.333331
     
    0.2%
    150001
     
    0.2%
    137501
     
    0.2%
    13108.333331
     
    0.2%
    12916.666671
     
    0.2%
    125001
     
    0.2%

    Interactions


    Correlations

  • Pearson's r
  • Kendall's τ
  • Cramér's V (φc)
  • Phik (φk)

    Pearson's r

    Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

    To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
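
As a rough illustration of that calculation, here is a minimal pure-Python sketch. It is not the code the report generator uses. The function name `pearson_r` is hypothetical, and the input values are the Kuukausipalkka and Vuositulot figures from the first four sample rows of this report.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors of covariance and both standard deviations cancel out,
    # so plain sums of products and squares are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

monthly = [6500.0, 5000.0, 2475.0, 4300.0]      # Kuukausipalkka, first four rows
yearly = [83000.0, 62500.0, 30000.0, 100000.0]  # Vuositulot, first four rows

r = pearson_r(monthly, yearly)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r(monthly, [y / 12 + 100 for y in yearly])
print(r, r_rescaled)
```

The second call shows the invariance described above: dividing the yearly figures by 12 and adding a constant does not change r.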

    Spearman's ρ

    Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

    To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
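
The rank-based calculation can be sketched as follows. This is a minimal pure-Python illustration, not the report generator's implementation. The helper names are hypothetical, and tied values are given their average rank, which is a common convention assumed here.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

xs = [1, 2, 3, 4]
cubes = [x ** 3 for x in xs]  # monotonic but nonlinear
print(spearman_rho(xs, cubes), pearson_r(xs, cubes))
```

For the cubic relationship ρ reaches 1 while r stays below 1, which is the sense in which ρ catches nonlinear monotonic correlations.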

    Kendall's τ

    Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

    To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
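
The pair counting can be sketched directly. This minimal version implements the formula exactly as stated (often called tau-a). Statistical libraries commonly use a tie-adjusted variant instead, so results may differ when the data contain ties. The function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables order it the same way.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 is a tie: counted in the total but in neither group.
    return (concordant - discordant) / len(pairs)

print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the example only the middle pair is out of order, so 5 of the 6 pairs are concordant and 1 is discordant.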

    Cramér's V (φc)

    Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
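
A minimal sketch of a bias-corrected Cramér's V is given below. It follows the commonly cited form of Bergsma's correction, and it is an assumption that the report generator computes it the same way. The function name and the toy data are hypothetical.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two nominal variables given as equal-length lists."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    r, k = len(row), len(col)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected bias and shrink the table dimensions accordingly.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0

perfect = cramers_v_corrected(["a", "b"] * 10, ["x", "y"] * 10)
independent = cramers_v_corrected(["a", "a", "b", "b"] * 5, ["x", "y", "x", "y"] * 5)
print(perfect, independent)
```

The two toy inputs cover the extremes described above: a perfectly associated pair of variables and a pair whose contingency table is exactly uniform.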


    Phik (φk)

    Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

    Missing values

    A simple visualization of nullity by column.
    The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
    The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
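One common way to compute a nullity correlation is to turn each column into a present/missing indicator and take the Pearson correlation of the two indicators. The sketch below assumes that approach; the helper names `is_missing` and `nullity_correlation` are hypothetical, and missing values are assumed to be `None` or a float NaN.

```python
from math import isnan, sqrt


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and isnan(value))


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the presence indicators of two columns.

    Returns None when either column is fully present or fully missing,
    because the correlation is undefined in that case.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("need two non-empty columns of equal length")

    a = [0 if is_missing(v) else 1 for v in col_a]
    b = [0 if is_missing(v) else 1 for v in col_b]
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n

    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


if __name__ == "__main__":
    # Made-up columns: one is filled in exactly where the other is missing.
    monthly = [4300.0, None, 6000.0, None]
    yearly = [None, 157300.0, None, 200000.0]
    print(nullity_correlation(monthly, yearly))
```

A value near -1 means one field tends to be filled in exactly when the other is left blank, a value near +1 means the two are missing together, and 0 means their missingness is unrelated.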
    The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

    Sample

    First rows

    | | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana | Kk-tulot |
    | 0 | 2021-02-15 11:57:08.316 | PK-Seutu | 33 | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | 50/50 | 6500.0 | 83000.0 | True | NaN | NaN | 6916.666667 |
    | 1 | 2021-02-15 11:57:19.676 | Turku | 33 | mies | 14.0 | Työntekijä / palkollinen | 1.0 | full-stack | Etä | 5000.0 | 62500.0 | True | NaN | NaN | 5208.333333 |
    | 2 | 2021-02-15 11:58:03.592 | PK-Seutu | 28 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjelmistokehittäjä | Etä | 2475.0 | 30000.0 | False | NaN | NaN | 2500.000000 |
    | 3 | 2021-02-15 11:58:15.261 | Tampere | 33 | mies | 22.0 | Yrittäjä | 1.0 | web-arkkitehti | Etä | 4300.0 | 100000.0 | True | NaN | NaN | 8333.333333 |
    | 4 | 2021-02-15 11:58:16.983 | PK-Seutu | 28 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Etä | 3000.0 | 37500.0 | False | NaN | NaN | 3125.000000 |
    | 5 | 2021-02-15 11:58:49.454 | PK-Seutu | 43 | mies | 23.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Toimisto | 8000.0 | 100000.0 | True | NaN | NaN | 8333.333333 |
    | 6 | 2021-02-15 12:00:03.771 | PK-Seutu | 33 | mies | 10.0 | Freelancer | 1.0 | Ohjelmistokehittäjä | Etä | 6000.0 | 140000.0 | True | NaN | NaN | 11666.666667 |
    | 7 | 2021-02-15 12:00:04.655 | Tampere | 33 | NaN | 10.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | Toimisto | 4250.0 | 54000.0 | True | NaN | NaN | 4500.000000 |
    | 8 | 2021-02-15 12:01:00.769 | Tampere | 33 | mies | 6.0 | Työntekijä / palkollinen | 1.0 | Lead developer | Toimisto | 4000.0 | 50000.0 | False | NaN | NaN | 4166.666667 |
    | 9 | 2021-02-15 12:02:03.577 | Tallinna | 33 | mies | 12.0 | Freelancer | 1.0 | NaN | Etä | NaN | 200000.0 | True | Questrade | NaN | 16666.666667 |

    Last rows

    | | Timestamp | Kaupunki | Ikä | Sukupuoli | Työkokemus | Työsuhteen luonne | Työaika | Rooli | Etä | Kuukausipalkka | Vuositulot | Kilpailukykyinen | Työpaikka | Vapaa sana | Kk-tulot |
    | 490 | 2021-02-25 21:17:36.323 | PK-Seutu | 33 | mies | 10.0 | Työntekijä / palkollinen | 1.0 | Full-stack ohjemistokehittäjä | Toimisto | 4600.0 | 58000.0 | True | NaN | NaN | 4833.333333 |
    | 491 | 2021-02-26 09:32:59.778 | Oulu | 48 | mies | 21.0 | Työntekijä / palkollinen | 1.0 | Backend-koodari | Etä | 5000.0 | 70000.0 | True | Nokia | NaN | 5833.333333 |
    | 492 | 2021-02-26 12:16:19.696 | Tampere | 38 | mies | 15.0 | Työntekijä / palkollinen | 1.0 | Ohjelmistosuunnittelija | Toimisto | 4300.0 | 53750.0 | False | Gofore | NaN | 4479.166667 |
    | 493 | 2021-02-26 12:21:52.296 | Tampere | 33 | mies | 11.0 | Freelancer | 1.0 | frontend | Etä | NaN | 157300.0 | True | NaN | NaN | 13108.333333 |
    | 494 | 2021-02-26 12:46:37.404 | PK-Seutu | 33 | mies | 11.0 | Työntekijä / palkollinen | 1.0 | Arkkitehti | Toimisto | 6500.0 | 81250.0 | True | Siili | NaN | 6770.833333 |
    | 495 | 2021-02-26 12:47:26.116 | PK-Seutu | 33 | nainen | 3.0 | Työntekijä / palkollinen | 1.0 | Full-stack | 50/50 | 3800.0 | NaN | False | NaN | NaN | NaN |
    | 496 | 2021-02-26 13:24:35.647 | PK-Seutu | 33 | mies | NaN | Työntekijä / palkollinen | 1.0 | Ohjelmistokehittäjä | 50/50 | NaN | 75000.0 | True | Vincit | NaN | 6250.000000 |
    | 497 | 2021-02-26 16:28:30.010 | Tampere | 43 | mies | 20.0 | Työntekijä / palkollinen | 1.0 | full-stack | Toimisto | 4800.0 | 61000.0 | True | NaN | NaN | 5083.333333 |
    | 498 | 2021-02-27 12:38:00.760 | Tampere | 33 | mies | 9.0 | Työntekijä / palkollinen | 1.0 | backend ja devops | Etä | 4270.0 | 54000.0 | False | NaN | NaN | 4500.000000 |
    | 499 | 2021-02-27 17:49:24.789 | Kouvola | 33 | mies | 2.0 | Työntekijä / palkollinen | 1.0 | Full-stack Ohjelmistosuunnittelija | Etä | 2800.0 | 35000.0 | False | NaN | NaN | 2916.666667 |