Overview

Dataset statistics

Number of variables: 15
Number of observations: 48842
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 49
Duplicate rows (%): 0.1%
Total size in memory: 5.6 MiB
Average record size in memory: 120.0 B
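
The memory figures above are consistent with each of the 15 columns costing 8 bytes per row. That per-value size is an assumption (for example 64-bit integers or object pointers), not something the report states. A minimal sketch of the arithmetic:

```python
# Figures copied from the dataset statistics above.
n_obs = 48842
n_vars = 15
bytes_per_value = 8  # assumption: 64-bit values or pointers in every column

# Bytes per record, then total size converted to MiB (1 MiB = 2**20 bytes).
record_bytes = n_vars * bytes_per_value
total_mib = n_obs * record_bytes / 2**20

print(f"record size: {record_bytes} B, total: {total_mib:.1f} MiB")
```

Under that assumption the record size comes out at 120 B and the total rounds to 5.6 MiB, which agrees with the reported values.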

Variable types

Numeric: 6
Categorical: 9

Alerts

Dataset has 49 (0.1%) duplicate rows [Duplicates]
educational-num is highly overall correlated with education [High correlation]
education is highly overall correlated with educational-num [High correlation]
relationship is highly overall correlated with gender [High correlation]
gender is highly overall correlated with relationship [High correlation]
race is highly imbalanced (65.8%) [Imbalance]
native-country is highly imbalanced (82.7%) [Imbalance]
capital-gain has 44807 (91.7%) zeros [Zeros]
capital-loss has 46560 (95.3%) zeros [Zeros]
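
The percentages in the duplicate and zero alerts can be recomputed from the counts and the 48842 observations. The helper name `pct` below is only illustrative; it is not part of the report or of any library.

```python
N_OBS = 48842  # number of observations reported in the overview


def pct(count, total=N_OBS):
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


# Counts copied from the alerts above.
alerts = {
    "duplicate rows": 49,
    "capital-gain zeros": 44807,
    "capital-loss zeros": 46560,
}

for name, count in alerts.items():
    print(f"{name}: {count} ({pct(count)}%)")
```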

Reproduction

Analysis started: 2024-06-27 05:58:30.334017
Analysis finished: 2024-06-27 05:58:47.506944
Duration: 17.17 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
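
As a quick check, the reported duration follows from the two timestamps. This sketch only uses the Python standard library and the values shown above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the Reproduction section.
started = datetime.strptime("2024-06-27 05:58:30.334017", FMT)
finished = datetime.strptime("2024-06-27 05:58:47.506944", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```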

Variables

age
Real number (ℝ)

Distinct: 74
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38.643585
Minimum: 17
Maximum: 90
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 381.7 KiB

Quantile statistics

Minimum: 17
5-th percentile: 19
Q1: 28
Median: 37
Q3: 48
95-th percentile: 63
Maximum: 90
Range: 73
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 13.71051
Coefficient of variation (CV): 0.35479394
Kurtosis: -0.18426874
Mean: 38.643585
Median Absolute Deviation (MAD): 10
Skewness: 0.55758032
Sum: 1887430
Variance: 187.97808
Monotonicity: Not monotonic
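
Several of the age statistics are derived from others in the same summary: the mean is the sum divided by the number of observations, the range is maximum minus minimum, the IQR is Q3 minus Q1, the variance is the squared standard deviation, and the CV is the standard deviation divided by the mean. A short sketch that rechecks them from the rounded figures above:

```python
# Figures copied from the age summary.
n_obs = 48842
total = 1887430            # Sum
minimum, maximum = 17, 90
q1, q3 = 28, 48
std = 13.71051             # Standard deviation (already rounded in the report)

mean = total / n_obs       # arithmetic mean
value_range = maximum - minimum
iqr = q3 - q1              # interquartile range
variance = std ** 2
cv = std / mean            # coefficient of variation

print(f"mean={mean:.6f} range={value_range} iqr={iqr}")
print(f"variance={variance:.5f} cv={cv:.8f}")
```

Because the inputs are rounded, the recomputed values match the report only up to rounding error.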
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
36 | 1348 | 2.8%
35 | 1337 | 2.7%
33 | 1335 | 2.7%
23 | 1329 | 2.7%
31 | 1325 | 2.7%
34 | 1303 | 2.7%
37 | 1280 | 2.6%
28 | 1280 | 2.6%
30 | 1278 | 2.6%
38 | 1264 | 2.6%
Other values (64) | 35763 | 73.2%

Minimum 10 values
Value | Count | Frequency (%)
17 | 595 | 1.2%
18 | 862 | 1.8%
19 | 1053 | 2.2%
20 | 1113 | 2.3%
21 | 1096 | 2.2%
22 | 1178 | 2.4%
23 | 1329 | 2.7%
24 | 1206 | 2.5%
25 | 1195 | 2.4%
26 | 1153 | 2.4%

Maximum 10 values
Value | Count | Frequency (%)
90 | 55 | 0.1%
89 | 2 | < 0.1%
88 | 6 | < 0.1%
87 | 3 | < 0.1%
86 | 1 | < 0.1%
85 | 5 | < 0.1%
84 | 13 | < 0.1%
83 | 11 | < 0.1%
82 | 15 | < 0.1%
81 | 37 | 0.1%

workclass
Categorical

Distinct: 9
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 381.7 KiB

Value | Count
Private | 33906
Self-emp-not-inc | 3862
Local-gov | 3136
? | 2799
State-gov | 1981
Other values (4) | 3158

Length

Max length: 16
Median length: 7
Mean length: 7.8708693
Min length: 1
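
The length statistics can be reproduced from the workclass value counts listed under Common Values. This is a sketch using only the standard library; the dictionary simply restates those counts, and it assumes the lengths are measured on the raw category strings.

```python
from statistics import median

# Value counts copied from the workclass Common Values table.
counts = {
    "Private": 33906,
    "Self-emp-not-inc": 3862,
    "Local-gov": 3136,
    "?": 2799,
    "State-gov": 1981,
    "Self-emp-inc": 1695,
    "Federal-gov": 1432,
    "Without-pay": 21,
    "Never-worked": 10,
}

# One length per observation, so frequent categories weigh more.
lengths = [len(value) for value, n in counts.items() for _ in range(n)]

total_chars = sum(lengths)
mean_length = total_chars / len(lengths)

print(max(lengths), median(lengths), round(mean_length, 7), min(lengths))
print("total characters:", total_chars)
```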

Characters and Unicode

Total characters: 384429
Distinct characters: 28
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
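
To illustrate how such character properties are used, the sketch below classifies every character of the workclass values by its Unicode general category with Python's `unicodedata` module. The counts dictionary restates the Common Values table; the two-letter codes are the standard abbreviations (Ll lowercase letter, Lu uppercase letter, Pd dash punctuation, Po other punctuation). It is an independent recomputation, not the profiler's own code.

```python
import unicodedata
from collections import Counter

# Value counts copied from the workclass Common Values table.
counts = {
    "Private": 33906,
    "Self-emp-not-inc": 3862,
    "Local-gov": 3136,
    "?": 2799,
    "State-gov": 1981,
    "Self-emp-inc": 1695,
    "Federal-gov": 1432,
    "Without-pay": 21,
    "Never-worked": 10,
}

# Tally the general category of each character, weighted by how often
# the containing value occurs in the column.
category_counts = Counter()
for value, n in counts.items():
    for ch in value:
        category_counts[unicodedata.category(ch)] += n

for code, n in category_counts.most_common():
    print(code, n)
```

The four resulting totals agree with the Most occurring categories table for workclass.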

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Private
2nd row: Private
3rd row: Local-gov
4th row: Private
5th row: ?

Common Values

Value | Count | Frequency (%)
Private | 33906 | 69.4%
Self-emp-not-inc | 3862 | 7.9%
Local-gov | 3136 | 6.4%
? | 2799 | 5.7%
State-gov | 1981 | 4.1%
Self-emp-inc | 1695 | 3.5%
Federal-gov | 1432 | 2.9%
Without-pay | 21 | < 0.1%
Never-worked | 10 | < 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
private | 33906 | 69.4%
self-emp-not-inc | 3862 | 7.9%
local-gov | 3136 | 6.4%
(empty) | 2799 | 5.7%
state-gov | 1981 | 4.1%
self-emp-inc | 1695 | 3.5%
federal-gov | 1432 | 2.9%
without-pay | 21 | < 0.1%
never-worked | 10 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
e | 49895 | 13.0%
t | 41772 | 10.9%
a | 40476 | 10.5%
v | 40465 | 10.5%
i | 39484 | 10.3%
r | 35358 | 9.2%
P | 33906 | 8.8%
- | 21556 | 5.6%
o | 13578 | 3.5%
l | 10125 | 2.6%
Other values (18) | 57814 | 15.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 314031 | 81.7%
Uppercase Letter | 46043 | 12.0%
Dash Punctuation | 21556 | 5.6%
Other Punctuation | 2799 | 0.7%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 49895 | 15.9%
t | 41772 | 13.3%
a | 40476 | 12.9%
v | 40465 | 12.9%
i | 39484 | 12.6%
r | 35358 | 11.3%
o | 13578 | 4.3%
l | 10125 | 3.2%
n | 9419 | 3.0%
c | 8693 | 2.8%
Other values (10) | 24766 | 7.9%

Uppercase Letter
Value | Count | Frequency (%)
P | 33906 | 73.6%
S | 7538 | 16.4%
L | 3136 | 6.8%
F | 1432 | 3.1%
W | 21 | < 0.1%
N | 10 | < 0.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 21556 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
? | 2799 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 360074 | 93.7%
Common | 24355 | 6.3%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 49895 | 13.9%
t | 41772 | 11.6%
a | 40476 | 11.2%
v | 40465 | 11.2%
i | 39484 | 11.0%
r | 35358 | 9.8%
P | 33906 | 9.4%
o | 13578 | 3.8%
l | 10125 | 2.8%
n | 9419 | 2.6%
Other values (16) | 45596 | 12.7%

Common
Value | Count | Frequency (%)
- | 21556 | 88.5%
? | 2799 | 11.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 384429 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 49895 | 13.0%
t | 41772 | 10.9%
a | 40476 | 10.5%
v | 40465 | 10.5%
i | 39484 | 10.3%
r | 35358 | 9.2%
P | 33906 | 8.8%
- | 21556 | 5.6%
o | 13578 | 3.5%
l | 10125 | 2.6%
Other values (18) | 57814 | 15.0%

fnlwgt
Real number (ℝ)

Distinct: 28523
Distinct (%): 58.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 189664.13
Minimum: 12285
Maximum: 1490400
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 381.7 KiB

Quantile statistics

Minimum: 12285
5-th percentile: 39615.4
Q1: 117550.5
Median: 178144.5
Q3: 237642
95-th percentile: 379481.65
Maximum: 1490400
Range: 1478115
Interquartile range (IQR): 120091.5

Descriptive statistics

Standard deviation: 105604.03
Coefficient of variation (CV): 0.55679491
Kurtosis: 6.0578482
Mean: 189664.13
Median Absolute Deviation (MAD): 60295.5
Skewness: 1.4388919
Sum: 9.2635757 × 10^9
Variance: 1.115221 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
203488 | 21 | < 0.1%
190290 | 19 | < 0.1%
120277 | 19 | < 0.1%
125892 | 18 | < 0.1%
126569 | 18 | < 0.1%
99185 | 17 | < 0.1%
126675 | 17 | < 0.1%
113364 | 17 | < 0.1%
186934 | 16 | < 0.1%
111567 | 16 | < 0.1%
Other values (28513) | 48664 | 99.6%

Minimum 10 values
Value | Count | Frequency (%)
12285 | 1 | < 0.1%
13492 | 1 | < 0.1%
13769 | 3 | < 0.1%
13862 | 1 | < 0.1%
14878 | 1 | < 0.1%
18827 | 1 | < 0.1%
19214 | 1 | < 0.1%
19302 | 6 | < 0.1%
19395 | 2 | < 0.1%
19410 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
1490400 | 1 | < 0.1%
1484705 | 1 | < 0.1%
1455435 | 1 | < 0.1%
1366120 | 1 | < 0.1%
1268339 | 1 | < 0.1%
1226583 | 1 | < 0.1%
1210504 | 1 | < 0.1%
1184622 | 1 | < 0.1%
1161363 | 1 | < 0.1%
1125613 | 1 | < 0.1%

education
Categorical

HIGH CORRELATION 

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 381.7 KiB

Value | Count
HS-grad | 15784
Some-college | 10878
Bachelors | 8025
Masters | 2657
Assoc-voc | 2061
Other values (11) | 9437

Length

Max length: 12
Median length: 11
Mean length: 8.4220753
Min length: 3

Characters and Unicode

Total characters: 411351
Distinct characters: 31
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 11th
2nd row: HS-grad
3rd row: Assoc-acdm
4th row: Some-college
5th row: Some-college

Common Values

Value | Count | Frequency (%)
HS-grad | 15784 | 32.3%
Some-college | 10878 | 22.3%
Bachelors | 8025 | 16.4%
Masters | 2657 | 5.4%
Assoc-voc | 2061 | 4.2%
11th | 1812 | 3.7%
Assoc-acdm | 1601 | 3.3%
10th | 1389 | 2.8%
7th-8th | 955 | 2.0%
Prof-school | 834 | 1.7%
Other values (6) | 2846 | 5.8%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
hs-grad | 15784 | 32.3%
some-college | 10878 | 22.3%
bachelors | 8025 | 16.4%
masters | 2657 | 5.4%
assoc-voc | 2061 | 4.2%
11th | 1812 | 3.7%
assoc-acdm | 1601 | 3.3%
10th | 1389 | 2.8%
7th-8th | 955 | 2.0%
prof-school | 834 | 1.7%
Other values (6) | 2846 | 5.8%

Most occurring characters

Value | Count | Frequency (%)
e | 43993 | 10.7%
o | 39360 | 9.6%
- | 32869 | 8.0%
l | 30698 | 7.5%
a | 28661 | 7.0%
r | 27977 | 6.8%
c | 27738 | 6.7%
g | 26662 | 6.5%
S | 26662 | 6.5%
s | 21827 | 5.3%
Other values (21) | 104904 | 25.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 308287 | 74.9%
Uppercase Letter | 58301 | 14.2%
Dash Punctuation | 32869 | 8.0%
Decimal Number | 11894 | 2.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 43993 | 14.3%
o | 39360 | 12.8%
l | 30698 | 10.0%
a | 28661 | 9.3%
r | 27977 | 9.1%
c | 27738 | 9.0%
g | 26662 | 8.6%
s | 21827 | 7.1%
d | 17385 | 5.6%
h | 16731 | 5.4%
Other values (4) | 27255 | 8.8%

Decimal Number
Value | Count | Frequency (%)
1 | 5917 | 49.7%
0 | 1389 | 11.7%
7 | 955 | 8.0%
8 | 955 | 8.0%
9 | 756 | 6.4%
2 | 657 | 5.5%
5 | 509 | 4.3%
6 | 509 | 4.3%
4 | 247 | 2.1%

Uppercase Letter
Value | Count | Frequency (%)
S | 26662 | 45.7%
H | 15784 | 27.1%
B | 8025 | 13.8%
A | 3662 | 6.3%
M | 2657 | 4.6%
P | 917 | 1.6%
D | 594 | 1.0%

Dash Punctuation
Value | Count | Frequency (%)
- | 32869 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 366588 | 89.1%
Common | 44763 | 10.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 43993 | 12.0%
o | 39360 | 10.7%
l | 30698 | 8.4%
a | 28661 | 7.8%
r | 27977 | 7.6%
c | 27738 | 7.6%
g | 26662 | 7.3%
S | 26662 | 7.3%
s | 21827 | 6.0%
d | 17385 | 4.7%
Other values (11) | 75625 | 20.6%

Common
Value | Count | Frequency (%)
- | 32869 | 73.4%
1 | 5917 | 13.2%
0 | 1389 | 3.1%
7 | 955 | 2.1%
8 | 955 | 2.1%
9 | 756 | 1.7%
2 | 657 | 1.5%
5 | 509 | 1.1%
6 | 509 | 1.1%
4 | 247 | 0.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 411351 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 43993 | 10.7%
o | 39360 | 9.6%
- | 32869 | 8.0%
l | 30698 | 7.5%
a | 28661 | 7.0%
r | 27977 | 6.8%
c | 27738 | 6.7%
g | 26662 | 6.5%
S | 26662 | 6.5%
s | 21827 | 5.3%
Other values (21) | 104904 | 25.5%

educational-num
Real number (ℝ)

HIGH CORRELATION 

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.078089
Minimum: 1
Maximum: 16
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 381.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 9
Median: 10
Q3: 12
95-th percentile: 14
Maximum: 16
Range: 15
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.5709728
Coefficient of variation (CV): 0.2551052
Kurtosis: 0.62574527
Mean: 10.078089
Median Absolute Deviation (MAD): 1
Skewness: -0.31652486
Sum: 492234
Variance: 6.6099009
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)

Common values
Value | Count | Frequency (%)
9 | 15784 | 32.3%
10 | 10878 | 22.3%
13 | 8025 | 16.4%
14 | 2657 | 5.4%
11 | 2061 | 4.2%
7 | 1812 | 3.7%
12 | 1601 | 3.3%
6 | 1389 | 2.8%
4 | 955 | 2.0%
15 | 834 | 1.7%
Other values (6) | 2846 | 5.8%
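
The high-correlation alert between education and educational-num is easier to read once the two frequency tables are put side by side: every listed education label has a count that also appears exactly once among the educational-num codes. The sketch below pairs them by count. Equal counts are consistent with a one-to-one encoding but do not prove it; confirming it would require the row-level data, which this report does not include.

```python
# Top-10 counts copied from the education Common Values table.
education = {
    "HS-grad": 15784, "Some-college": 10878, "Bachelors": 8025,
    "Masters": 2657, "Assoc-voc": 2061, "11th": 1812,
    "Assoc-acdm": 1601, "10th": 1389, "7th-8th": 955, "Prof-school": 834,
}

# Top-10 counts copied from the educational-num frequency table.
educational_num = {
    9: 15784, 10: 10878, 13: 8025, 14: 2657, 11: 2061,
    7: 1812, 12: 1601, 6: 1389, 4: 955, 15: 834,
}

# Index the numeric codes by count (all counts here are distinct),
# then look up each label's count to find its candidate code.
code_by_count = {count: code for code, count in educational_num.items()}
pairs = {label: code_by_count.get(count) for label, count in education.items()}

for label, code in sorted(pairs.items(), key=lambda item: item[1]):
    print(f"{code:>2}  {label}")
```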

Minimum 10 values
Value | Count | Frequency (%)
1 | 83 | 0.2%
2 | 247 | 0.5%
3 | 509 | 1.0%
4 | 955 | 2.0%
5 | 756 | 1.5%
6 | 1389 | 2.8%
7 | 1812 | 3.7%
8 | 657 | 1.3%
9 | 15784 | 32.3%
10 | 10878 | 22.3%

Maximum 10 values
Value | Count | Frequency (%)
16 | 594 | 1.2%
15 | 834 | 1.7%
14 | 2657 | 5.4%
13 | 8025 | 16.4%
12 | 1601 | 3.3%
11 | 2061 | 4.2%
10 | 10878 | 22.3%
9 | 15784 | 32.3%
8 | 657 | 1.3%
7 | 1812 | 3.7%

marital-status
Categorical

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 381.7 KiB

Value | Count
Married-civ-spouse | 22379
Never-married | 16117
Divorced | 6633
Separated | 1530
Widowed | 1518
Other values (2) | 665

Length

Max length: 21
Median length: 18
Mean length: 14.406044
Min length: 7

Characters and Unicode

Total characters: 703620
Distinct characters: 24
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Never-married
2nd row: Married-civ-spouse
3rd row: Married-civ-spouse
4th row: Married-civ-spouse
5th row: Never-married

Common Values

Value | Count | Frequency (%)
Married-civ-spouse | 22379 | 45.8%
Never-married | 16117 | 33.0%
Divorced | 6633 | 13.6%
Separated | 1530 | 3.1%
Widowed | 1518 | 3.1%
Married-spouse-absent | 628 | 1.3%
Married-AF-spouse | 37 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
married-civ-spouse | 22379 | 45.8%
never-married | 16117 | 33.0%
divorced | 6633 | 13.6%
separated | 1530 | 3.1%
widowed | 1518 | 3.1%
married-spouse-absent | 628 | 1.3%
married-af-spouse | 37 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
e | 106278 | 15.1%
r | 102602 | 14.6%
i | 69691 | 9.9%
- | 62205 | 8.8%
d | 50360 | 7.2%
s | 46716 | 6.6%
v | 45129 | 6.4%
a | 42849 | 6.1%
o | 31195 | 4.4%
c | 29012 | 4.1%
Other values (14) | 117583 | 16.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 592499 | 84.2%
Dash Punctuation | 62205 | 8.8%
Uppercase Letter | 48916 | 7.0%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 106278 | 17.9%
r | 102602 | 17.3%
i | 69691 | 11.8%
d | 50360 | 8.5%
s | 46716 | 7.9%
v | 45129 | 7.6%
a | 42849 | 7.2%
o | 31195 | 5.3%
c | 29012 | 4.9%
p | 24574 | 4.1%
Other values (6) | 44093 | 7.4%

Uppercase Letter
Value | Count | Frequency (%)
M | 23044 | 47.1%
N | 16117 | 32.9%
D | 6633 | 13.6%
S | 1530 | 3.1%
W | 1518 | 3.1%
A | 37 | 0.1%
F | 37 | 0.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 62205 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 641415 | 91.2%
Common | 62205 | 8.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 106278 | 16.6%
r | 102602 | 16.0%
i | 69691 | 10.9%
d | 50360 | 7.9%
s | 46716 | 7.3%
v | 45129 | 7.0%
a | 42849 | 6.7%
o | 31195 | 4.9%
c | 29012 | 4.5%
p | 24574 | 3.8%
Other values (13) | 93009 | 14.5%

Common
Value | Count | Frequency (%)
- | 62205 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 703620 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 106278 | 15.1%
r | 102602 | 14.6%
i | 69691 | 9.9%
- | 62205 | 8.8%
d | 50360 | 7.2%
s | 46716 | 6.6%
v | 45129 | 6.4%
a | 42849 | 6.1%
o | 31195 | 4.4%
c | 29012 | 4.1%
Other values (14) | 117583 | 16.7%

occupation
Categorical

Distinct: 15
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 381.7 KiB

Value | Count
Prof-specialty | 6172
Craft-repair | 6112
Exec-managerial | 6086
Adm-clerical | 5611
Sales | 5504
Other values (10) | 19357

Length

Max length: 17
Median length: 15
Mean length: 12.186991
Min length: 1

Characters and Unicode

Total characters: 595237
Distinct characters: 32
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Machine-op-inspct
2nd row: Farming-fishing
3rd row: Protective-serv
4th row: Machine-op-inspct
5th row: ?

Common Values

Value | Count | Frequency (%)
Prof-specialty | 6172 | 12.6%
Craft-repair | 6112 | 12.5%
Exec-managerial | 6086 | 12.5%
Adm-clerical | 5611 | 11.5%
Sales | 5504 | 11.3%
Other-service | 4923 | 10.1%
Machine-op-inspct | 3022 | 6.2%
? | 2809 | 5.8%
Transport-moving | 2355 | 4.8%
Handlers-cleaners | 2072 | 4.2%
Other values (5) | 4176 | 8.6%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
prof-specialty | 6172 | 12.6%
craft-repair | 6112 | 12.5%
exec-managerial | 6086 | 12.5%
adm-clerical | 5611 | 11.5%
sales | 5504 | 11.3%
other-service | 4923 | 10.1%
machine-op-inspct | 3022 | 6.2%
(empty) | 2809 | 5.8%
transport-moving | 2355 | 4.8%
handlers-cleaners | 2072 | 4.2%
Other values (5) | 4176 | 8.6%

Most occurring characters

Value | Count | Frequency (%)
e | 64487 | 10.8%
r | 60321 | 10.1%
a | 58780 | 9.9%
- | 43793 | 7.4%
i | 42998 | 7.2%
c | 38963 | 6.5%
l | 33128 | 5.6%
s | 30538 | 5.1%
t | 25996 | 4.4%
n | 23964 | 4.0%
Other values (22) | 172269 | 28.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 502587 | 84.4%
Uppercase Letter | 46048 | 7.7%
Dash Punctuation | 43793 | 7.4%
Other Punctuation | 2809 | 0.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 64487 | 12.8%
r | 60321 | 12.0%
a | 58780 | 11.7%
i | 42998 | 8.6%
c | 38963 | 7.8%
l | 33128 | 6.6%
s | 30538 | 6.1%
t | 25996 | 5.2%
n | 23964 | 4.8%
p | 23575 | 4.7%
Other values (10) | 99837 | 19.9%

Uppercase Letter
Value | Count | Frequency (%)
P | 7397 | 16.1%
C | 6112 | 13.3%
E | 6086 | 13.2%
A | 5626 | 12.2%
S | 5504 | 12.0%
O | 4923 | 10.7%
T | 3801 | 8.3%
M | 3022 | 6.6%
H | 2072 | 4.5%
F | 1505 | 3.3%

Dash Punctuation
Value | Count | Frequency (%)
- | 43793 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
? | 2809 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 548635 | 92.2%
Common | 46602 | 7.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 64487 | 11.8%
r | 60321 | 11.0%
a | 58780 | 10.7%
i | 42998 | 7.8%
c | 38963 | 7.1%
l | 33128 | 6.0%
s | 30538 | 5.6%
t | 25996 | 4.7%
n | 23964 | 4.4%
p | 23575 | 4.3%
Other values (20) | 145885 | 26.6%

Common
Value | Count | Frequency (%)
- | 43793 | 94.0%
? | 2809 | 6.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 595237 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 64487 | 10.8%
r | 60321 | 10.1%
a | 58780 | 9.9%
- | 43793 | 7.4%
i | 42998 | 7.2%
c | 38963 | 6.5%
l | 33128 | 5.6%
s | 30538 | 5.1%
t | 25996 | 4.4%
n | 23964 | 4.0%
Other values (22) | 172269 | 28.9%

relationship
Categorical

HIGH CORRELATION 

Distinct: 6
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 381.7 KiB

Value | Count
Husband | 19716
Not-in-family | 12583
Own-child | 7581
Unmarried | 5125
Wife | 2331

Length

Max length: 14
Median length: 13
Mean length: 9.1387126
Min length: 4

Characters and Unicode

Total characters: 446353
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Own-child
2nd row: Husband
3rd row: Husband
4th row: Husband
5th row: Own-child

Common Values

Value | Count | Frequency (%)
Husband | 19716 | 40.4%
Not-in-family | 12583 | 25.8%
Own-child | 7581 | 15.5%
Unmarried | 5125 | 10.5%
Wife | 2331 | 4.8%
Other-relative | 1506 | 3.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
husband | 19716 | 40.4%
not-in-family | 12583 | 25.8%
own-child | 7581 | 15.5%
unmarried | 5125 | 10.5%
wife | 2331 | 4.8%
other-relative | 1506 | 3.1%

Most occurring characters

Value | Count | Frequency (%)
n | 45005 | 10.1%
i | 41709 | 9.3%
a | 38930 | 8.7%
- | 34253 | 7.7%
d | 32422 | 7.3%
l | 21670 | 4.9%
H | 19716 | 4.4%
u | 19716 | 4.4%
s | 19716 | 4.4%
b | 19716 | 4.4%
Other values (15) | 153500 | 34.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 363258 | 81.4%
Uppercase Letter | 48842 | 10.9%
Dash Punctuation | 34253 | 7.7%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
n | 45005 | 12.4%
i | 41709 | 11.5%
a | 38930 | 10.7%
d | 32422 | 8.9%
l | 21670 | 6.0%
u | 19716 | 5.4%
s | 19716 | 5.4%
b | 19716 | 5.4%
m | 17708 | 4.9%
t | 15595 | 4.3%
Other values (9) | 91071 | 25.1%

Uppercase Letter
Value | Count | Frequency (%)
H | 19716 | 40.4%
N | 12583 | 25.8%
O | 9087 | 18.6%
U | 5125 | 10.5%
W | 2331 | 4.8%

Dash Punctuation
Value | Count | Frequency (%)
- | 34253 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 412100 | 92.3%
Common | 34253 | 7.7%

Most frequent character per script

Latin
Value | Count | Frequency (%)
n | 45005 | 10.9%
i | 41709 | 10.1%
a | 38930 | 9.4%
d | 32422 | 7.9%
l | 21670 | 5.3%
H | 19716 | 4.8%
u | 19716 | 4.8%
s | 19716 | 4.8%
b | 19716 | 4.8%
m | 17708 | 4.3%
Other values (14) | 135792 | 33.0%

Common
Value | Count | Frequency (%)
- | 34253 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 446353 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
n | 45005 | 10.1%
i | 41709 | 9.3%
a | 38930 | 8.7%
- | 34253 | 7.7%
d | 32422 | 7.3%
l | 21670 | 4.9%
H | 19716 | 4.4%
u | 19716 | 4.4%
s | 19716 | 4.4%
b | 19716 | 4.4%
Other values (15) | 153500 | 34.4%

race
Categorical

IMBALANCE 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 381.7 KiB

Value | Count
White | 41762
Black | 4685
Asian-Pac-Islander | 1519
Amer-Indian-Eskimo | 470
Other | 406

Length

Max length: 18
Median length: 5
Mean length: 5.5294009
Min length: 5

Characters and Unicode

Total characters: 270067
Distinct characters: 22
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Black
2nd row: White
3rd row: White
4th row: Black
5th row: White

Common Values

Value | Count | Frequency (%)
White | 41762 | 85.5%
Black | 4685 | 9.6%
Asian-Pac-Islander | 1519 | 3.1%
Amer-Indian-Eskimo | 470 | 1.0%
Other | 406 | 0.8%
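
The IMBALANCE alert reports a score of 65.8% for race. One formula that reproduces this figure from the counts above is one minus the normalised Shannon entropy, 1 - H / log2(k), where k is the number of classes. The sketch below assumes that this is how the score was derived; the function name `imbalance_score` is illustrative and the report itself does not state the formula.

```python
import math


def imbalance_score(counts):
    """Return 1 - H/log2(k): 0 for a uniform split, near 1 for one dominant class."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(counts))


# Counts copied from the race Common Values table.
race_counts = [41762, 4685, 1519, 470, 406]
race_score = imbalance_score(race_counts)

print(f"race imbalance: {race_score:.1%}")
```

A perfectly balanced variable scores 0 under this definition, so values close to 1 mean that a single class dominates.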

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
white | 41762 | 85.5%
black | 4685 | 9.6%
asian-pac-islander | 1519 | 3.1%
amer-indian-eskimo | 470 | 1.0%
other | 406 | 0.8%

Most occurring characters

Value | Count | Frequency (%)
i | 44221 | 16.4%
e | 44157 | 16.4%
t | 42168 | 15.6%
h | 42168 | 15.6%
W | 41762 | 15.5%
a | 9712 | 3.6%
l | 6204 | 2.3%
c | 6204 | 2.3%
k | 5155 | 1.9%
B | 4685 | 1.7%
Other values (12) | 23631 | 8.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 213269 | 79.0%
Uppercase Letter | 52820 | 19.6%
Dash Punctuation | 3978 | 1.5%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
i | 44221 | 20.7%
e | 44157 | 20.7%
t | 42168 | 19.8%
h | 42168 | 19.8%
a | 9712 | 4.6%
l | 6204 | 2.9%
c | 6204 | 2.9%
k | 5155 | 2.4%
n | 3978 | 1.9%
s | 3508 | 1.6%
Other values (4) | 5794 | 2.7%

Uppercase Letter
Value | Count | Frequency (%)
W | 41762 | 79.1%
B | 4685 | 8.9%
A | 1989 | 3.8%
I | 1989 | 3.8%
P | 1519 | 2.9%
E | 470 | 0.9%
O | 406 | 0.8%

Dash Punctuation
Value | Count | Frequency (%)
- | 3978 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 266089 | 98.5%
Common | 3978 | 1.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
i | 44221 | 16.6%
e | 44157 | 16.6%
t | 42168 | 15.8%
h | 42168 | 15.8%
W | 41762 | 15.7%
a | 9712 | 3.6%
l | 6204 | 2.3%
c | 6204 | 2.3%
k | 5155 | 1.9%
B | 4685 | 1.8%
Other values (11) | 19653 | 7.4%

Common
Value | Count | Frequency (%)
- | 3978 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 270067 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
i | 44221 | 16.4%
e | 44157 | 16.4%
t | 42168 | 15.6%
h | 42168 | 15.6%
W | 41762 | 15.5%
a | 9712 | 3.6%
l | 6204 | 2.3%
c | 6204 | 2.3%
k | 5155 | 1.9%
B | 4685 | 1.7%
Other values (12) | 23631 | 8.8%

gender
Categorical

HIGH CORRELATION 

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 381.7 KiB

Value | Count
Male | 32650
Female | 16192

Length

Max length: 6
Median length: 4
Mean length: 4.6630359
Min length: 4

Characters and Unicode

Total characters: 227752
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Male
4th row: Male
5th row: Female

Common Values

Value | Count | Frequency (%)
Male | 32650 | 66.8%
Female | 16192 | 33.2%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
male | 32650 | 66.8%
female | 16192 | 33.2%

Most occurring characters

Value | Count | Frequency (%)
e | 65034 | 28.6%
a | 48842 | 21.4%
l | 48842 | 21.4%
M | 32650 | 14.3%
F | 16192 | 7.1%
m | 16192 | 7.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 178910 | 78.6%
Uppercase Letter | 48842 | 21.4%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 65034 | 36.4%
a | 48842 | 27.3%
l | 48842 | 27.3%
m | 16192 | 9.1%

Uppercase Letter
Value | Count | Frequency (%)
M | 32650 | 66.8%
F | 16192 | 33.2%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 227752 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 65034 | 28.6%
a | 48842 | 21.4%
l | 48842 | 21.4%
M | 32650 | 14.3%
F | 16192 | 7.1%
m | 16192 | 7.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 227752 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 65034 | 28.6%
a | 48842 | 21.4%
l | 48842 | 21.4%
M | 32650 | 14.3%
F | 16192 | 7.1%
m | 16192 | 7.1%

capital-gain
Real number (ℝ)

ZEROS 

Distinct: 123
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1079.0676
Minimum: 0
Maximum: 99999
Zeros: 44807
Zeros (%): 91.7%
Negative: 0
Negative (%): 0.0%
Memory size: 381.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 5013
Maximum: 99999
Range: 99999
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 7452.0191
Coefficient of variation (CV): 6.9059796
Kurtosis: 152.6931
Mean: 1079.0676
Median Absolute Deviation (MAD): 0
Skewness: 11.894659
Sum: 52703821
Variance: 55532588
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 44807 | 91.7%
15024 | 513 | 1.1%
7688 | 410 | 0.8%
7298 | 364 | 0.7%
99999 | 244 | 0.5%
3103 | 152 | 0.3%
5178 | 146 | 0.3%
5013 | 117 | 0.2%
4386 | 108 | 0.2%
8614 | 82 | 0.2%
Other values (113) | 1899 | 3.9%

Minimum 10 values
Value | Count | Frequency (%)
0 | 44807 | 91.7%
114 | 8 | < 0.1%
401 | 5 | < 0.1%
594 | 52 | 0.1%
914 | 10 | < 0.1%
991 | 6 | < 0.1%
1055 | 37 | 0.1%
1086 | 8 | < 0.1%
1111 | 1 | < 0.1%
1151 | 13 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
99999 | 244 | 0.5%
41310 | 3 | < 0.1%
34095 | 6 | < 0.1%
27828 | 58 | 0.1%
25236 | 14 | < 0.1%
25124 | 6 | < 0.1%
22040 | 1 | < 0.1%
20051 | 49 | 0.1%
18481 | 2 | < 0.1%
15831 | 8 | < 0.1%

capital-loss
Real number (ℝ)

ZEROS 

Distinct: 99
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 87.502314
Minimum: 0
Maximum: 4356
Zeros: 46560
Zeros (%): 95.3%
Negative: 0
Negative (%): 0.0%
Memory size: 381.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 4356
Range: 4356
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 403.00455
Coefficient of variation (CV): 4.6056445
Kurtosis: 20.014346
Mean: 87.502314
Median Absolute Deviation (MAD): 0
Skewness: 4.5698089
Sum: 4273788
Variance: 162412.67
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
0 | 46560 | 95.3%
1902 | 304 | 0.6%
1977 | 253 | 0.5%
1887 | 233 | 0.5%
2415 | 72 | 0.1%
1485 | 71 | 0.1%
1848 | 67 | 0.1%
1590 | 62 | 0.1%
1602 | 62 | 0.1%
1876 | 59 | 0.1%
Other values (89) | 1099 | 2.3%

Minimum 10 values
Value | Count | Frequency (%)
0 | 46560 | 95.3%
155 | 1 | < 0.1%
213 | 5 | < 0.1%
323 | 5 | < 0.1%
419 | 3 | < 0.1%
625 | 17 | < 0.1%
653 | 4 | < 0.1%
810 | 2 | < 0.1%
880 | 6 | < 0.1%
974 | 2 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
4356 | 3 | < 0.1%
3900 | 2 | < 0.1%
3770 | 4 | < 0.1%
3683 | 2 | < 0.1%
3175 | 2 | < 0.1%
3004 | 5 | < 0.1%
2824 | 14 | < 0.1%
2754 | 2 | < 0.1%
2603 | 7 | < 0.1%
2559 | 17 | < 0.1%

hours-per-week
Real number (ℝ)

Distinct: 96
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.422382
Minimum: 1
Maximum: 99
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 381.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 17.05
Q1: 40
Median: 40
Q3: 45
95-th percentile: 60
Maximum: 99
Range: 98
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 12.391444
Coefficient of variation (CV): 0.30654908
Kurtosis: 2.9510591
Mean: 40.422382
Median Absolute Deviation (MAD): 3
Skewness: 0.23874966
Sum: 1974310
Variance: 153.54789
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value | Count | Frequency (%)
40 | 22803 | 46.7%
50 | 4246 | 8.7%
45 | 2717 | 5.6%
60 | 2177 | 4.5%
35 | 1937 | 4.0%
20 | 1862 | 3.8%
30 | 1700 | 3.5%
55 | 1051 | 2.2%
25 | 958 | 2.0%
48 | 770 | 1.6%
Other values (86) | 8621 | 17.7%

Minimum 10 values
Value | Count | Frequency (%)
1 | 27 | 0.1%
2 | 53 | 0.1%
3 | 59 | 0.1%
4 | 84 | 0.2%
5 | 95 | 0.2%
6 | 92 | 0.2%
7 | 45 | 0.1%
8 | 218 | 0.4%
9 | 27 | 0.1%
10 | 425 | 0.9%

Maximum 10 values
Value | Count | Frequency (%)
99 | 137 | 0.3%
98 | 14 | < 0.1%
97 | 2 | < 0.1%
96 | 9 | < 0.1%
95 | 2 | < 0.1%
94 | 1 | < 0.1%
92 | 3 | < 0.1%
91 | 3 | < 0.1%
90 | 42 | 0.1%
89 | 3 | < 0.1%

native-country
Categorical

IMBALANCE 

Distinct: 42
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 381.7 KiB

Value | Count
United-States | 43832
Mexico | 951
? | 857
Philippines | 295
Germany | 206
Other values (37) | 2701

Length

Max length: 26
Median length: 13
Mean length: 12.306847
Min length: 1

Characters and Unicode

Total characters: 601091
Distinct characters: 45
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: United-States
2nd row: United-States
3rd row: United-States
4th row: United-States
5th row: United-States

Common Values

Value | Count | Frequency (%)
United-States | 43832 | 89.7%
Mexico | 951 | 1.9%
? | 857 | 1.8%
Philippines | 295 | 0.6%
Germany | 206 | 0.4%
Puerto-Rico | 184 | 0.4%
Canada | 182 | 0.4%
El-Salvador | 155 | 0.3%
India | 151 | 0.3%
Cuba | 138 | 0.3%
Other values (32) | 1891 | 3.9%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
united-states | 43832 | 89.7%
mexico | 951 | 1.9%
(empty) | 857 | 1.8%
philippines | 295 | 0.6%
germany | 206 | 0.4%
puerto-rico | 184 | 0.4%
canada | 182 | 0.4%
el-salvador | 155 | 0.3%
india | 151 | 0.3%
cuba | 138 | 0.3%
Other values (32) | 1891 | 3.9%

Most occurring characters

Value | Count | Frequency (%)
t | 132284 | 22.0%
e | 89870 | 15.0%
a | 47613 | 7.9%
i | 47106 | 7.8%
n | 45884 | 7.6%
d | 44771 | 7.4%
- | 44344 | 7.4%
s | 44194 | 7.4%
S | 44169 | 7.3%
U | 43878 | 7.3%
Other values (35) | 16978 | 2.8%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 463369 | 77.1%
Uppercase Letter | 92448 | 15.4%
Dash Punctuation | 44344 | 7.4%
Other Punctuation | 884 | 0.1%
Open Punctuation | 23 | < 0.1%
Close Punctuation | 23 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
t | 132284 | 28.5%
e | 89870 | 19.4%
a | 47613 | 10.3%
i | 47106 | 10.2%
n | 45884 | 9.9%
d | 44771 | 9.7%
s | 44194 | 9.5%
o | 2176 | 0.5%
c | 1672 | 0.4%
l | 1403 | 0.3%
Other values (11) | 6396 | 1.4%

Uppercase Letter
Value | Count | Frequency (%)
S | 44169 | 47.8%
U | 43878 | 47.5%
M | 951 | 1.0%
P | 679 | 0.7%
C | 555 | 0.6%
I | 375 | 0.4%
G | 366 | 0.4%
E | 327 | 0.4%
R | 287 | 0.3%
J | 198 | 0.2%
Other values (9) | 663 | 0.7%

Other Punctuation
Value | Count | Frequency (%)
? | 857 | 96.9%
& | 27 | 3.1%

Dash Punctuation
Value | Count | Frequency (%)
- | 44344 | 100.0%

Open Punctuation
Value | Count | Frequency (%)
( | 23 | 100.0%

Close Punctuation
Value | Count | Frequency (%)
) | 23 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 555817 | 92.5%
Common | 45274 | 7.5%

Most frequent character per script

Latin
Value | Count | Frequency (%)
t | 132284 | 23.8%
e | 89870 | 16.2%
a | 47613 | 8.6%
i | 47106 | 8.5%
n | 45884 | 8.3%
d | 44771 | 8.1%
s | 44194 | 8.0%
S | 44169 | 7.9%
U | 43878 | 7.9%
o | 2176 | 0.4%
Other values (30) | 13872 | 2.5%

Common
Value | Count | Frequency (%)
- | 44344 | 97.9%
? | 857 | 1.9%
& | 27 | 0.1%
( | 23 | 0.1%
) | 23 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 601091 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
t | 132284 | 22.0%
e | 89870 | 15.0%
a | 47613 | 7.9%
i | 47106 | 7.8%
n | 45884 | 7.6%
d | 44771 | 7.4%
- | 44344 | 7.4%
s | 44194 | 7.4%
S | 44169 | 7.3%
U | 43878 | 7.3%
Other values (35) | 16978 | 2.8%

income
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 381.7 KiB

Value | Count
<=50K | 37155
>50K | 11687

Length

Max length: 5
Median length: 5
Mean length: 4.7607182
Min length: 4

Characters and Unicode

Total characters: 232523
Distinct characters: 6
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <=50K
2nd row: <=50K
3rd row: >50K
4th row: >50K
5th row: <=50K

Common Values

Value | Count | Frequency (%)
<=50K | 37155 | 76.1%
>50K | 11687 | 23.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

Value | Count | Frequency (%)
50k | 48842 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
5 | 48842 | 21.0%
0 | 48842 | 21.0%
K | 48842 | 21.0%
< | 37155 | 16.0%
= | 37155 | 16.0%
> | 11687 | 5.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 97684 | 42.0%
Math Symbol | 85997 | 37.0%
Uppercase Letter | 48842 | 21.0%

Most frequent character per category

Math Symbol
Value | Count | Frequency (%)
< | 37155 | 43.2%
= | 37155 | 43.2%
> | 11687 | 13.6%

Decimal Number
Value | Count | Frequency (%)
5 | 48842 | 50.0%
0 | 48842 | 50.0%

Uppercase Letter
Value | Count | Frequency (%)
K | 48842 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 183681 | 79.0%
Latin | 48842 | 21.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
5 | 48842 | 26.6%
0 | 48842 | 26.6%
< | 37155 | 20.2%
= | 37155 | 20.2%
> | 11687 | 6.4%

Latin
Value | Count | Frequency (%)
K | 48842 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 232523 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
5 | 48842 | 21.0%
0 | 48842 | 21.0%
K | 48842 | 21.0%
< | 37155 | 16.0%
= | 37155 | 16.0%
> | 11687 | 5.0%

Interactions

[Figure: interaction plots (36 panels)]

Correlations

Variable | age | fnlwgt | educational-num | capital-gain | capital-loss | hours-per-week | workclass | education | marital-status | occupation | relationship | race | gender | native-country | income
age | 1.000 | -0.078 | 0.063 | 0.124 | 0.058 | 0.147 | 0.115 | 0.113 | 0.284 | 0.119 | 0.273 | 0.027 | 0.125 | 0.032 | 0.316
fnlwgt | -0.078 | 1.000 | -0.030 | -0.009 | -0.001 | -0.016 | 0.020 | 0.018 | 0.022 | 0.017 | 0.017 | 0.070 | 0.028 | 0.052 | 0.010
educational-num | 0.063 | -0.030 | 1.000 | 0.119 | 0.077 | 0.164 | 0.091 | 1.000 | 0.076 | 0.221 | 0.109 | 0.067 | 0.073 | 0.142 | 0.360
capital-gain | 0.124 | -0.009 | 0.119 | 1.000 | -0.066 | 0.092 | 0.053 | 0.107 | 0.038 | 0.067 | 0.043 | 0.013 | 0.049 | 0.018 | 0.271
capital-loss | 0.058 | -0.001 | 0.077 | -0.066 | 1.000 | 0.060 | 0.022 | 0.038 | 0.057 | 0.035 | 0.064 | 0.012 | 0.064 | 0.022 | 0.197
hours-per-week | 0.147 | -0.016 | 0.164 | 0.092 | 0.060 | 1.000 | 0.115 | 0.089 | 0.118 | 0.143 | 0.162 | 0.058 | 0.240 | 0.030 | 0.269
workclass | 0.115 | 0.020 | 0.091 | 0.053 | 0.022 | 0.115 | 1.000 | 0.098 | 0.085 | 0.400 | 0.100 | 0.057 | 0.151 | 0.032 | 0.181
education | 0.113 | 0.018 | 1.000 | 0.107 | 0.038 | 0.089 | 0.098 | 1.000 | 0.089 | 0.186 | 0.121 | 0.071 | 0.092 | 0.127 | 0.365
marital-status | 0.284 | 0.022 | 0.076 | 0.038 | 0.057 | 0.118 | 0.085 | 0.089 | 1.000 | 0.131 | 0.488 | 0.082 | 0.459 | 0.059 | 0.448
occupation | 0.119 | 0.017 | 0.221 | 0.067 | 0.035 | 0.143 | 0.400 | 0.186 | 0.131 | 1.000 | 0.177 | 0.077 | 0.424 | 0.062 | 0.350
relationship | 0.273 | 0.017 | 0.109 | 0.043 | 0.064 | 0.162 | 0.100 | 0.121 | 0.488 | 0.177 | 1.000 | 0.097 | 0.646 | 0.074 | 0.454
race | 0.027 | 0.070 | 0.067 | 0.013 | 0.012 | 0.058 | 0.057 | 0.071 | 0.082 | 0.077 | 0.097 | 1.000 | 0.114 | 0.401 | 0.099
gender | 0.125 | 0.028 | 0.073 | 0.049 | 0.064 | 0.240 | 0.151 | 0.092 | 0.459 | 0.424 | 0.646 | 0.114 | 1.000 | 0.054 | 0.215
native-country | 0.032 | 0.052 | 0.142 | 0.018 | 0.022 | 0.030 | 0.032 | 0.127 | 0.059 | 0.062 | 0.074 | 0.401 | 0.054 | 1.000 | 0.092
income | 0.316 | 0.010 | 0.360 | 0.271 | 0.197 | 0.269 | 0.181 | 0.365 | 0.448 | 0.350 | 0.454 | 0.099 | 0.215 | 0.092 | 1.000
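
One way to read a matrix like this is to pick out the pairs whose coefficient is large in absolute value. The sketch below does that for a handful of coefficients copied from the matrix. The 0.5 cut-off and the helper name `strong_pairs` are assumptions made for illustration; the report does not state which threshold drives its "High correlation" alerts.

```python
# A few coefficients copied from the correlation matrix in this report
# (one entry per unordered pair).
correlations = {
    ("educational-num", "education"): 1.000,
    ("relationship", "gender"): 0.646,
    ("marital-status", "relationship"): 0.488,
    ("marital-status", "gender"): 0.459,
    ("relationship", "income"): 0.454,
    ("age", "fnlwgt"): -0.078,
}

THRESHOLD = 0.5  # assumed cut-off for illustration, not taken from the report


def strong_pairs(pairs, threshold):
    """Return (a, b, r) for pairs with |r| >= threshold, strongest first."""
    hits = [(a, b, r) for (a, b), r in pairs.items() if abs(r) >= threshold]
    return sorted(hits, key=lambda item: abs(item[2]), reverse=True)


flagged = strong_pairs(correlations, THRESHOLD)
for a, b, r in flagged:
    print(f"{a} <-> {b}: {r:.3f}")
```

With this assumed threshold the two pairs selected are education/educational-num and relationship/gender, the same pairs named in the report's high-correlation alerts.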

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Row | age | workclass | fnlwgt | education | educational-num | marital-status | occupation | relationship | race | gender | capital-gain | capital-loss | hours-per-week | native-country | income
0 | 25 | Private | 226802 | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States | <=50K
1 | 38 | Private | 89814 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States | <=50K
2 | 28 | Local-gov | 336951 | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States | >50K
3 | 44 | Private | 160323 | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States | >50K
4 | 18 | ? | 103497 | Some-college | 10 | Never-married | ? | Own-child | White | Female | 0 | 0 | 30 | United-States | <=50K
5 | 34 | Private | 198693 | 10th | 6 | Never-married | Other-service | Not-in-family | White | Male | 0 | 0 | 30 | United-States | <=50K
6 | 29 | ? | 227026 | HS-grad | 9 | Never-married | ? | Unmarried | Black | Male | 0 | 0 | 40 | United-States | <=50K
7 | 63 | Self-emp-not-inc | 104626 | Prof-school | 15 | Married-civ-spouse | Prof-specialty | Husband | White | Male | 3103 | 0 | 32 | United-States | >50K
8 | 24 | Private | 369667 | Some-college | 10 | Never-married | Other-service | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K
9 | 55 | Private | 104996 | 7th-8th | 4 | Married-civ-spouse | Craft-repair | Husband | White | Male | 0 | 0 | 10 | United-States | <=50K

Row | age | workclass | fnlwgt | education | educational-num | marital-status | occupation | relationship | race | gender | capital-gain | capital-loss | hours-per-week | native-country | income
48832 | 32 | Private | 34066 | 10th | 6 | Married-civ-spouse | Handlers-cleaners | Husband | Amer-Indian-Eskimo | Male | 0 | 0 | 40 | United-States | <=50K
48833 | 43 | Private | 84661 | Assoc-voc | 11 | Married-civ-spouse | Sales | Husband | White | Male | 0 | 0 | 45 | United-States | <=50K
48834 | 32 | Private | 116138 | Masters | 14 | Never-married | Tech-support | Not-in-family | Asian-Pac-Islander | Male | 0 | 0 | 11 | Taiwan | <=50K
48835 | 53 | Private | 321865 | Masters | 14 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0 | 0 | 40 | United-States | >50K
48836 | 22 | Private | 310152 | Some-college | 10 | Never-married | Protective-serv | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K
48837 | 27 | Private | 257302 | Assoc-acdm | 12 | Married-civ-spouse | Tech-support | Wife | White | Female | 0 | 0 | 38 | United-States | <=50K
48838 | 40 | Private | 154374 | HS-grad | 9 | Married-civ-spouse | Machine-op-inspct | Husband | White | Male | 0 | 0 | 40 | United-States | >50K
48839 | 58 | Private | 151910 | HS-grad | 9 | Widowed | Adm-clerical | Unmarried | White | Female | 0 | 0 | 40 | United-States | <=50K
48840 | 22 | Private | 201490 | HS-grad | 9 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 20 | United-States | <=50K
48841 | 52 | Self-emp-inc | 287927 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 15024 | 0 | 40 | United-States | >50K

Duplicate rows

Most frequently occurring

Row | age | workclass | fnlwgt | education | educational-num | marital-status | occupation | relationship | race | gender | capital-gain | capital-loss | hours-per-week | native-country | income | # duplicates
12 | 21 | Private | 243368 | Preschool | 1 | Never-married | Farming-fishing | Not-in-family | White | Male | 0 | 0 | 50 | Mexico | <=50K | 3
23 | 25 | Private | 195994 | 1st-4th | 2 | Never-married | Priv-house-serv | Not-in-family | White | Female | 0 | 0 | 40 | Guatemala | <=50K | 3
24 | 25 | Private | 308144 | Bachelors | 13 | Never-married | Craft-repair | Not-in-family | White | Male | 0 | 0 | 40 | Mexico | <=50K | 3
0 | 17 | Private | 153021 | 12th | 8 | Never-married | Sales | Own-child | White | Female | 0 | 0 | 20 | United-States | <=50K | 2
1 | 18 | Self-emp-inc | 378036 | 12th | 8 | Never-married | Farming-fishing | Own-child | White | Male | 0 | 0 | 10 | United-States | <=50K | 2
2 | 19 | ? | 167428 | Some-college | 10 | Never-married | ? | Own-child | White | Male | 0 | 0 | 40 | United-States | <=50K | 2
3 | 19 | Private | 97261 | HS-grad | 9 | Never-married | Farming-fishing | Not-in-family | White | Male | 0 | 0 | 40 | United-States | <=50K | 2
4 | 19 | Private | 130431 | 5th-6th | 3 | Never-married | Farming-fishing | Not-in-family | White | Male | 0 | 0 | 36 | Mexico | <=50K | 2
5 | 19 | Private | 138153 | Some-college | 10 | Never-married | Adm-clerical | Own-child | White | Female | 0 | 0 | 10 | United-States | <=50K | 2
6 | 19 | Private | 139466 | Some-college | 10 | Never-married | Sales | Own-child | White | Female | 0 | 0 | 25 | United-States | <=50K | 2
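
A listing like this can be approximated by counting identical rows and keeping those that occur more than once. The sketch below uses only the Python standard library on a tiny, abbreviated sample: the tuples keep just a few columns (age, workclass, fnlwgt, education, native-country) from rows shown in this report, and the repetition counts for the first two are taken from the listing. The helper name `duplicate_groups` is hypothetical, and this is not necessarily how ydata-profiling computes its duplicate table.

```python
from collections import Counter

# Abbreviated rows (age, workclass, fnlwgt, education, native-country),
# repeated as many times as the report says they occur.
rows = [
    (21, "Private", 243368, "Preschool", "Mexico"),
    (17, "Private", 153021, "12th", "United-States"),
    (21, "Private", 243368, "Preschool", "Mexico"),
    (25, "Private", 226802, "11th", "United-States"),  # occurs once in this sample
    (17, "Private", 153021, "12th", "United-States"),
    (21, "Private", 243368, "Preschool", "Mexico"),
]


def duplicate_groups(records):
    """Return (row, count) for rows seen more than once, most frequent first."""
    counts = Counter(records)
    repeated = [(row, n) for row, n in counts.items() if n > 1]
    return sorted(repeated, key=lambda item: item[1], reverse=True)


groups = duplicate_groups(rows)
for row, n in groups:
    print(n, row)
```

Rows must be hashable (tuples here) for `Counter` to group them; with a pandas DataFrame the equivalent grouping would be done on all columns at once.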