The dataset is from 2020, but it is still enough to demonstrate the analysis.
Let's import the modules
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
print('Modules are imported.')
Importing the COVID-19 dataset
corona_dataset_csv = pd.read_csv("Datasets/Covid19_Confirmed_dataset.csv")
corona_dataset_csv.head(10)
[Output: first 10 rows of corona_dataset_csv]
Delete the columns that are not needed (Lat and Long) and show the first 10 rows again
corona_dataset_csv.drop(["Lat", "Long"], axis=1, inplace=True)
corona_dataset_csv.head(10)
Aggregating the rows by country
corona_dataset_aggregated = corona_dataset_csv.groupby("Country/Region").sum()
corona_dataset_aggregated.head()
corona_dataset_aggregated.shape
(187, 101)
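To make the aggregation step concrete, here is a minimal sketch on a toy table. The rows, values, and the names `toy`, `date_columns`, and `toy_aggregated` are invented for illustration and are not taken from the real dataset. Selecting the date columns before summing is an assumption on my part: it keeps the text column "Province/State" out of the sum, which may or may not match how your pandas version handles the notebook's call.

```python
import pandas as pd

# Toy stand-in for the confirmed-cases table (illustrative values only):
# two province rows for one country and a single row for another.
toy = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing", None],
    "Country/Region": ["China", "China", "Italy"],
    "1/22/20": [444, 14, 0],
    "1/23/20": [444, 22, 0],
})

# Sum only the date columns, so the text column "Province/State"
# is not part of the aggregation.
date_columns = ["1/22/20", "1/23/20"]
toy_aggregated = toy.groupby("Country/Region")[date_columns].sum()

print(toy_aggregated)
print(toy_aggregated.shape)  # one row per country, one column per date
```

After grouping, the country name becomes the index, which is why the later cells can select a country with `.loc['China']`.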
Visualizing the data for a few countries, for example China, Italy and Spain: a plot always helps in understanding the data.
corona_dataset_aggregated.loc['China'].plot()
corona_dataset_aggregated.loc['Italy'].plot()
corona_dataset_aggregated.loc['Spain'].plot()
plt.legend()
Calculating a good measure: we need a single number that describes the spread of the virus in a country.
corona_dataset_aggregated.loc['China'].plot()
plt.title("Spread of Virus in China")
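One candidate for such a number is the largest one-day increase in confirmed cases. The sketch below illustrates the idea on a made-up cumulative series; the values and the names `cumulative`, `daily_new`, and `max_rate` are hypothetical and only serve to show what `diff()` and `max()` do.

```python
import pandas as pd

# Made-up cumulative confirmed counts for one hypothetical country.
cumulative = pd.Series(
    [0, 2, 5, 11, 20, 26, 29],
    index=["d1", "d2", "d3", "d4", "d5", "d6", "d7"],
)

# diff() subtracts the previous value from each value, turning the
# cumulative curve into daily new cases. The first entry has no
# predecessor, so it is NaN.
daily_new = cumulative.diff()

# max() skips NaN by default, so this is the largest one-day increase.
max_rate = daily_new.max()

print(daily_new)
print(max_rate)
```

The cumulative curve only ever grows, so its final value mostly reflects how long the outbreak has lasted; the peak of the daily differences says how fast it spread at its worst.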
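Find the maximum infection rate for all of the countries. Here the maximum infection rate is the largest one-day increase in confirmed cases, i.e. the maximum of the first difference of each country's cumulative series.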
countries = list(corona_dataset_aggregated.index)
max_infection_rates = []
for c in countries:
    max_infection_rates.append(corona_dataset_aggregated.loc[c].diff().max())
corona_dataset_aggregated["max_infection_rate"] = max_infection_rates
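The per-country loop can also be written as a single vectorized expression. The sketch below is an alternative, not the notebook's own method, and it runs on a small invented table (`toy_aggregated`, countries "A" and "B") so it can be checked by hand. It assumes every column is a date column; if a `max_infection_rate` column has already been added, it would need to be excluded first so that it is not differenced along with the dates.

```python
import pandas as pd

# Invented cumulative counts: one row per country, one column per date.
toy_aggregated = pd.DataFrame(
    {"1/22/20": [1, 0], "1/23/20": [4, 0], "1/24/20": [6, 3], "1/25/20": [13, 4]},
    index=pd.Index(["A", "B"], name="Country/Region"),
)

# Loop version, mirroring the cell above: one diff().max() per country.
loop_result = [toy_aggregated.loc[c].diff().max() for c in toy_aggregated.index]

# Vectorized version: difference along the columns (axis=1), then take
# the row-wise maximum. The result is a Series indexed by country.
vectorized = toy_aggregated.diff(axis=1).max(axis=1)

print(loop_result)
print(vectorized)
```

Both versions give the same numbers; the vectorized form avoids building a Python list and keeps the country index attached to the result.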
Create a new dataframe with only the needed column
corona_data = pd.DataFrame(corona_dataset_aggregated["max_infection_rate"])
corona_data.head()