Dataset statistics
Number of variables | 10 |
---|---|
Number of observations | 1048575 |
Missing cells | 1510399 |
Missing cells (%) | 14.4% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 80.0 MiB |
Average record size in memory | 80.0 B |
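The derived figures in this table can be cross-checked from the raw counts. The following is a minimal sketch that uses only numbers reported in this document; the variable names are illustrative, and the per-column missing counts are taken from the alerts list.

```python
# Figures copied from the report; variable names are illustrative.
n_variables = 10
n_observations = 1_048_575
missing_cells = 1_510_399
total_bytes = 80.0 * 1024 ** 2  # "80.0 MiB" as reported (a rounded figure)

# Every cell is one (row, column) pair.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
bytes_per_record = total_bytes / n_observations

# The only columns the alerts report as having missing values.
missing_by_column = {"device_id": 461_824, "deleted_date": 1_048_575}

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: {bytes_per_record:.1f} B")
print("Per-column counts add up:", sum(missing_by_column.values()) == missing_cells)
```

The two columns with missing values account for all missing cells, so the remaining eight columns are complete.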
Variable types
Numeric | 5 |
---|---|
DateTime | 1 |
Categorical | 3 |
Unsupported | 1 |
deleted has constant value "0" | Constant |
source_id is highly correlated with source_type and 1 other fields | High correlation |
source_type is highly correlated with source_id | High correlation |
value is highly correlated with device_id | High correlation |
device_id is highly correlated with source_id and 1 other fields | High correlation |
device_id is highly correlated with source_id | High correlation |
deleted is highly correlated with source_type and 1 other fields | High correlation |
source_type is highly correlated with deleted | High correlation |
type is highly correlated with deleted | High correlation |
type is highly correlated with source_id and 4 other fields | High correlation |
source_id is highly correlated with type and 3 other fields | High correlation |
source_type is highly correlated with type and 1 other fields | High correlation |
value is highly correlated with type and 1 other fields | High correlation |
device_id is highly correlated with type and 2 other fields | High correlation |
zone_id is highly correlated with type and 3 other fields | High correlation |
device_id has 461824 (44.0%) missing values | Missing |
deleted_date has 1048575 (100.0%) missing values | Missing |
id is uniformly distributed | Uniform |
id has unique values | Unique |
deleted_date is an unsupported type, check if it needs cleaning or further analysis | Unsupported |
source_id has 24939 (2.4%) zeros | Zeros |
zone_id has 61451 (5.9%) zeros | Zeros |
Reproduction
Analysis started | 2022-08-19 09:25:35.274419 |
---|---|
Analysis finished | 2022-08-19 09:26:18.386490 |
Duration | 43.11 seconds |
Software version | pandas-profiling v3.2.0 |
Download configuration | config.json |
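A report of this kind can be regenerated with the `ProfileReport` class from pandas-profiling. The sketch below is an assumption about how the run might look, not the configuration actually used here: the function name, the file paths, the report title, and the choice to parse `created` as a date are all hypothetical.

```python
def build_report(csv_path, html_path="report.html"):
    """Profile a CSV file and write an HTML report (illustrative sketch)."""
    # Imported lazily so this module loads even where the packages are absent.
    import pandas as pd
    from pandas_profiling import ProfileReport

    # Assumption: the data lives in a CSV file with a "created" timestamp column.
    df = pd.read_csv(csv_path, parse_dates=["created"])

    # Build the profile with default settings and export it as HTML.
    profile = ProfileReport(df, title="Dataset profile")
    profile.to_file(html_path)
    return profile
```

Calling `build_report("events.csv")` (a hypothetical file name) would write `report.html` next to the script. Run time depends on the data size and on which correlation measures are enabled.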
id
Real number (ℝ≥0)
Distinct | 1048575 |
---|---|
Distinct (%) | 100.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 524866 |
Minimum | 579 |
---|---|
Maximum | 1049153 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 8.0 MiB |
Quantile statistics
Minimum | 579 |
---|---|
5-th percentile | 53007.7 |
Q1 | 262722.5 |
median | 524866 |
Q3 | 787009.5 |
95-th percentile | 996724.3 |
Maximum | 1049153 |
Range | 1048574 |
Interquartile range (IQR) | 524287 |
Descriptive statistics
Standard deviation | 302697.6736 |
---|---|
Coefficient of variation (CV) | 0.5767141968 |
Kurtosis | -1.2 |
Mean | 524866 |
Median Absolute Deviation (MAD) | 262144 |
Skewness | 0 |
Sum | 5.50361366 × 10¹¹ |
Variance | 9.16258816 × 10¹⁰ |
Monotonicity | Strictly increasing |
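The figures for this variable are consistent with a run of consecutive integers: 1048575 distinct values, a range of 1048574, and strict monotonicity. Under that assumption (an inference from the statistics, not something the report states), the summary statistics have closed forms. The sketch below uses the sample variance with `ddof=1`, which reproduces the reported value; the function name is illustrative.

```python
import math

def consecutive_integer_stats(lo, hi):
    """Summary statistics of the integers lo, lo + 1, ..., hi in closed form."""
    n = hi - lo + 1
    mean = (lo + hi) / 2
    total = n * (lo + hi) // 2   # arithmetic series; the product is always even
    variance = n * (n + 1) / 12  # sample variance (ddof=1) of n consecutive integers
    std = math.sqrt(variance)
    return {"n": n, "mean": mean, "sum": total,
            "variance": variance, "std": std, "cv": std / mean}

# Minimum and maximum as reported for this variable.
stats = consecutive_integer_stats(579, 1_049_153)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The closed-form results agree with the reported count, mean, sum, variance, standard deviation, and coefficient of variation to the precision shown in the tables.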
Value | Count | Frequency (%) |
579 | 1 | < 0.1% |
699634 | 1 | < 0.1% |
699621 | 1 | < 0.1% |
699622 | 1 | < 0.1% |
699623 | 1 | < 0.1% |
699624 | 1 | < 0.1% |
699625 | 1 | < 0.1% |
699626 | 1 | < 0.1% |
699627 | 1 | < 0.1% |
699628 | 1 | < 0.1% |
Other values (1048565) | 1048565 |
Value | Count | Frequency (%) |
579 | 1 | |
580 | 1 | |
581 | 1 | |
582 | 1 | |
583 | 1 | |
584 | 1 | |
585 | 1 | |
586 | 1 | |
587 | 1 | |
588 | 1 |
Value | Count | Frequency (%) |
1049153 | 1 | |
1049152 | 1 | |
1049151 | 1 | |
1049150 | 1 | |
1049149 | 1 | |
1049148 | 1 | |
1049147 | 1 | |
1049146 | 1 | |
1049145 | 1 | |
1049144 | 1 |
created
Date
Distinct | 101890 |
---|---|
Distinct (%) | 9.7% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 8.0 MiB |
Minimum | 2016-08-02 12:32:18 |
---|---|
Maximum | 2016-08-03 23:58:22 |
type
Categorical
Distinct | 4 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 8.0 MiB |
Common Values
Value | Count | Frequency (%) |
2 | 419535 | |
11 | 224480 | |
8 | 211030 | |
4 | 193530 |
Length
Category Frequency Plot
Value | Count | Frequency (%) |
2 | 419535 | |
11 | 224480 | |
8 | 211030 | |
4 | 193530 |
Most occurring characters
Value | Count | Frequency (%) |
1 | 448960 | |
2 | 419535 | |
8 | 211030 | |
4 | 193530 |
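The character counts differ from the category counts because the profiler treats each category label as a string. A minimal sketch of that tally, using the category counts reported above (the variable names are illustrative):

```python
from collections import Counter

# Category counts copied from the "Common Values" table.
category_counts = {"2": 419_535, "11": 224_480, "8": 211_030, "4": 193_530}

# Tally every character of every label, weighted by how often the label occurs.
char_counts = Counter()
for label, count in category_counts.items():
    for ch in label:  # the label "11" contributes two "1" characters per row
        char_counts[ch] += count

print(char_counts.most_common())
print("Total characters:", sum(char_counts.values()))
```

The label `11` is why the character `1` appears 448960 times, twice its category count, and why the total of 1273055 characters exceeds the number of rows.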
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 1273055 |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
1 | 448960 | |
2 | 419535 | |
8 | 211030 | |
4 | 193530 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 1273055 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
1 | 448960 | |
2 | 419535 | |
8 | 211030 | |
4 | 193530 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1273055 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
1 | 448960 | |
2 | 419535 | |
8 | 211030 | |
4 | 193530 |
source_id
Real number (ℝ≥0)
HIGH CORRELATION
ZEROS
Distinct | 44 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 17.40758172 |
Minimum | 0 |
---|---|
Maximum | 58 |
Zeros | 24939 |
Zeros (%) | 2.4% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 8.0 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 3 |
median | 5 |
Q3 | 35 |
95-th percentile | 51 |
Maximum | 58 |
Range | 58 |
Interquartile range (IQR) | 32 |
Descriptive statistics
Standard deviation | 18.07786195 |
---|---|
Coefficient of variation (CV) | 1.038505075 |
Kurtosis | -0.9170016318 |
Mean | 17.40758172 |
Median Absolute Deviation (MAD) | 4 |
Skewness | 0.7817356947 |
Sum | 18253155 |
Variance | 326.8090928 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
3 | 265848 | 25.4% |
2 | 93832 | 8.9% |
5 | 77736 | 7.4% |
1 | 47026 | 4.5% |
4 | 39408 | 3.8% |
0 | 24939 | 2.4% |
35 | 24264 | 2.3% |
36 | 24128 | 2.3% |
18 | 23638 | 2.3% |
17 | 23638 | 2.3% |
Other values (34) | 404118 | 38.5% |
Value | Count | Frequency (%) |
0 | 24939 | 2.4% |
1 | 47026 | 4.5% |
2 | 93832 | 8.9% |
3 | 265848 | 25.4% |
4 | 39408 | 3.8% |
5 | 77736 | 7.4% |
7 | 23620 | 2.3% |
9 | 5328 | 0.5% |
10 | 5327 | 0.5% |
11 | 2347 | 0.2% |
Value | Count | Frequency (%) |
58 | 6130 | 0.6% |
57 | 11568 | 1.1% |
54 | 11568 | 1.1% |
53 | 3085 | 0.3% |
52 | 12345 | 1.2% |
51 | 13816 | 1.3% |
50 | 13852 | 1.3% |
49 | 13849 | 1.3% |
48 | 13852 | 1.3% |
47 | 13852 | 1.3% |
source_type
Categorical
HIGH CORRELATION
Distinct | 2 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 8.0 MiB |
Common Values
Value | Count | Frequency (%) |
0 | 586751 | |
1 | 461824 |
Length
Category Frequency Plot
Value | Count | Frequency (%) |
0 | 586751 | |
1 | 461824 |
Most occurring characters
Value | Count | Frequency (%) |
0 | 586751 | |
1 | 461824 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 1048575 |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 586751 | |
1 | 461824 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 1048575 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 586751 | |
1 | 461824 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1048575 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 586751 | |
1 | 461824 |
value
Real number (ℝ≥0)
Distinct | 8848 |
---|---|
Distinct (%) | 0.8% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 187.4722094 |
Minimum | 0.1 |
---|---|
Maximum | 2044 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 8.0 MiB |
Quantile statistics
Minimum | 0.1 |
---|---|
5-th percentile | 0.14 |
Q1 | 14.2 |
median | 73.49 |
Q3 | 179 |
95-th percentile | 864 |
Maximum | 2044 |
Range | 2043.9 |
Interquartile range (IQR) | 164.8 |
Descriptive statistics
Standard deviation | 287.0248363 |
---|---|
Coefficient of variation (CV) | 1.53102605 |
Kurtosis | 4.507448884 |
Mean | 187.4722094 |
Median Absolute Deviation (MAD) | 69.49 |
Skewness | 2.174587786 |
Sum | 196578672 |
Variance | 82383.25666 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
0.13 | 22393 | 2.1% |
82.74 | 22259 | 2.1% |
84.3 | 20785 | 2.0% |
81.45 | 15820 | 1.5% |
0.24 | 15053 | 1.4% |
2 | 14647 | 1.4% |
0.12 | 14394 | 1.4% |
1.5 | 13237 | 1.3% |
80.16 | 12303 | 1.2% |
0.14 | 10053 | 1.0% |
Other values (8838) | 887631 | 84.7% |
Value | Count | Frequency (%) |
0.1 | 7115 | 0.7% |
0.11 | 524 | < 0.1% |
0.12 | 14394 | 1.4% |
0.13 | 22393 | 2.1% |
0.14 | 10053 | 1.0% |
0.15 | 4445 | 0.4% |
0.16 | 1713 | 0.2% |
0.17 | 1875 | 0.2% |
0.18 | 5835 | 0.6% |
0.19 | 7131 | 0.7% |
Value | Count | Frequency (%) |
2044 | 3 | |
1984 | 3 | |
1974 | 3 | |
1934 | 3 | |
1924 | 3 | |
1915 | 3 | |
1901 | 3 | |
1886 | 3 | |
1870 | 3 | |
1868 | 3 |
device_id
Real number (ℝ≥0)
HIGH CORRELATION
MISSING
Distinct | 10 |
---|---|
Distinct (%) | < 0.1% |
Missing | 461824 |
Missing (%) | 44.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 8.109107611 |
Minimum | 1 |
---|---|
Maximum | 15 |
Zeros | 0 |
Zeros (%) | 0.0% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 8.0 MiB |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 4 |
median | 8 |
Q3 | 15 |
95-th percentile | 15 |
Maximum | 15 |
Range | 14 |
Interquartile range (IQR) | 11 |
Descriptive statistics
Standard deviation | 4.80401161 |
---|---|
Coefficient of variation (CV) | 0.5924217362 |
Kurtosis | -1.270267306 |
Mean | 8.109107611 |
Median Absolute Deviation (MAD) | 4 |
Skewness | 0.2614423216 |
Sum | 4758027 |
Variance | 23.07852755 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
15 | 150473 | 14.4% |
8 | 73150 | 7.0% |
6 | 64254 | 6.1% |
3 | 63345 | 6.0% |
5 | 60755 | 5.8% |
10 | 53770 | 5.1% |
1 | 47240 | 4.5% |
4 | 34704 | 3.3% |
11 | 26058 | 2.5% |
2 | 13002 | 1.2% |
(Missing) | 461824 | 44.0% |
Value | Count | Frequency (%) |
1 | 47240 | 4.5% |
2 | 13002 | 1.2% |
3 | 63345 | 6.0% |
4 | 34704 | 3.3% |
5 | 60755 | 5.8% |
6 | 64254 | 6.1% |
8 | 73150 | 7.0% |
10 | 53770 | 5.1% |
11 | 26058 | 2.5% |
15 | 150473 | 14.4% |
Value | Count | Frequency (%) |
15 | 150473 | 14.4% |
11 | 26058 | 2.5% |
10 | 53770 | 5.1% |
8 | 73150 | 7.0% |
6 | 64254 | 6.1% |
5 | 60755 | 5.8% |
4 | 34704 | 3.3% |
3 | 63345 | 6.0% |
2 | 13002 | 1.2% |
1 | 47240 | 4.5% |
zone_id
Real number (ℝ≥0)
Distinct | 6 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 2.9105157 |
Minimum | 0 |
---|---|
Maximum | 5 |
Zeros | 61451 |
Zeros (%) | 5.9% |
Negative | 0 |
Negative (%) | 0.0% |
Memory size | 8.0 MiB |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 2 |
median | 3 |
Q3 | 3 |
95-th percentile | 5 |
Maximum | 5 |
Range | 5 |
Interquartile range (IQR) | 1 |
Descriptive statistics
Standard deviation | 1.238552958 |
---|---|
Coefficient of variation (CV) | 0.4255441598 |
Kurtosis | 0.2941330708 |
Mean | 2.9105157 |
Median Absolute Deviation (MAD) | 1 |
Skewness | -0.291233769 |
Sum | 3051894 |
Variance | 1.53401343 |
Monotonicity | Not monotonic |
Value | Count | Frequency (%) |
3 | 516231 | 49.2% |
2 | 184004 | 17.5% |
5 | 145152 | 13.8% |
4 | 89232 | 8.5% |
0 | 61451 | 5.9% |
1 | 52505 | 5.0% |
Value | Count | Frequency (%) |
0 | 61451 | 5.9% |
1 | 52505 | 5.0% |
2 | 184004 | 17.5% |
3 | 516231 | 49.2% |
4 | 89232 | 8.5% |
5 | 145152 | 13.8% |
Value | Count | Frequency (%) |
5 | 145152 | 13.8% |
4 | 89232 | 8.5% |
3 | 516231 | 49.2% |
2 | 184004 | 17.5% |
1 | 52505 | 5.0% |
0 | 61451 | 5.9% |
deleted
Categorical
Distinct | 1 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 8.0 MiB |
Common Values
Value | Count | Frequency (%) |
0 | 1048575 |
Length
Category Frequency Plot
Value | Count | Frequency (%) |
0 | 1048575 |
Most occurring characters
Value | Count | Frequency (%) |
0 | 1048575 |
Most occurring categories
Value | Count | Frequency (%) |
Decimal Number | 1048575 |
Most frequent character per category
Decimal Number
Value | Count | Frequency (%) |
0 | 1048575 |
Most occurring scripts
Value | Count | Frequency (%) |
Common | 1048575 |
Most frequent character per script
Common
Value | Count | Frequency (%) |
0 | 1048575 |
Most occurring blocks
Value | Count | Frequency (%) |
ASCII | 1048575 |
Most frequent character per block
ASCII
Value | Count | Frequency (%) |
0 | 1048575 |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
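The calculation described above can be sketched in plain Python: rank both variables, then apply the Pearson formula to the ranks. This is an illustration only, not the profiler's implementation; the function names and the sample data are hypothetical, and tied values receive their average rank.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print("Spearman:", spearman(x, y))
print("Pearson: ", pearson(x, y))
```

For this monotonic sample the rank correlation is at its maximum while Pearson's r stays below 1, which is the difference the paragraph describes.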
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the slope (the angle to the x-axis) does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
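As a minimal sketch of this definition, the function below divides the sample covariance by the product of the sample standard deviations, then checks the invariance under shifting and positive rescaling. The function name and the sample data are illustrative.

```python
import math

def pearson_r(x, y):
    """Sample covariance divided by the product of sample standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [10 * a + 3 for a in x]
y_shifted = [0.5 * b - 7 for b in y]

print(pearson_r(x, y))
print(pearson_r(x_shifted, y_shifted))
```

Both calls return the same value up to floating-point error. A negative scale factor on one variable would flip the sign of r without changing its magnitude.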
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
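The pair-counting formula in this paragraph corresponds to the variant usually called tau-a. The sketch below implements exactly that formula in quadratic time. Tied pairs count as neither concordant nor discordant, so tie-corrected variants such as tau-b can give different results on data with ties. The function name and the sample data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs, with no correction for ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:    # both variables move the same way
            concordant += 1
        elif direction < 0:  # the variables move in opposite ways
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# One swapped pair out of six: five concordant pairs, one discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The loop visits every pair of observations, which is fine for illustration but too slow for a column with about a million rows.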
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.
First rows
 | id | created | type | source_id | source_type | value | device_id | zone_id | deleted | deleted_date |
---|---|---|---|---|---|---|---|---|---|---|
0 | 579 | 2016-08-02 12:32:18 | 2 | 28 | 0 | 84.30 | 8.0 | 2 | 0 | NaN |
1 | 580 | 2016-08-02 12:32:18 | 8 | 42 | 0 | 0.24 | 15.0 | 3 | 0 | NaN |
2 | 581 | 2016-08-02 12:32:18 | 8 | 43 | 0 | 0.15 | 15.0 | 3 | 0 | NaN |
3 | 582 | 2016-08-02 12:32:18 | 2 | 28 | 0 | 71.58 | 8.0 | 2 | 0 | NaN |
4 | 583 | 2016-08-02 12:32:18 | 8 | 44 | 0 | 6.39 | 15.0 | 3 | 0 | NaN |
5 | 584 | 2016-08-02 12:32:18 | 2 | 29 | 0 | 84.30 | 8.0 | 2 | 0 | NaN |
6 | 585 | 2016-08-02 12:32:18 | 2 | 17 | 0 | 84.30 | 5.0 | 4 | 0 | NaN |
7 | 586 | 2016-08-02 12:32:18 | 2 | 29 | 0 | 71.22 | 8.0 | 2 | 0 | NaN |
8 | 587 | 2016-08-02 12:32:18 | 8 | 45 | 0 | 0.50 | 15.0 | 3 | 0 | NaN |
9 | 588 | 2016-08-02 12:32:18 | 2 | 17 | 0 | 69.32 | 5.0 | 4 | 0 | NaN |
Last rows
 | id | created | type | source_id | source_type | value | device_id | zone_id | deleted | deleted_date |
---|---|---|---|---|---|---|---|---|---|---|
1048565 | 1049144 | 2016-08-03 23:58:22 | 2 | 36 | 0 | 75.72 | 10.0 | 5 | 0 | NaN |
1048566 | 1049145 | 2016-08-03 23:58:22 | 2 | 18 | 0 | 81.45 | 5.0 | 4 | 0 | NaN |
1048567 | 1049146 | 2016-08-03 23:58:22 | 4 | 4 | 0 | 571.00 | 3.0 | 1 | 0 | NaN |
1048568 | 1049147 | 2016-08-03 23:58:22 | 2 | 29 | 0 | 80.16 | 8.0 | 2 | 0 | NaN |
1048569 | 1049148 | 2016-08-03 23:58:22 | 2 | 18 | 0 | 72.70 | 5.0 | 4 | 0 | NaN |
1048570 | 1049149 | 2016-08-03 23:58:22 | 4 | 41 | 0 | 444.00 | 10.0 | 5 | 0 | NaN |
1048571 | 1049150 | 2016-08-03 23:58:22 | 2 | 10 | 0 | 55.67 | 2.0 | 1 | 0 | NaN |
1048572 | 1049151 | 2016-08-03 23:58:22 | 4 | 22 | 0 | 1064.00 | 6.0 | 2 | 0 | NaN |
1048573 | 1049152 | 2016-08-03 23:58:22 | 2 | 29 | 0 | 73.49 | 8.0 | 2 | 0 | NaN |
1048574 | 1049153 | 2016-08-03 23:58:22 | 2 | 3 | 1 | 77.58 | NaN | 3 | 0 | NaN |