With a Linux server you can set it up as a ssh server and use sshfs or directly use ssh or sftp, from terminal or with a third party client. 1. level 2. Manawqt. Op · 1m. You can set up linux in a VM…
RMS Titanic was an Olympic-class transatlantic cruise ship owned by White Star Line. Manufactured at the Harland and Wolff (Belfast, Ireland) shipyards. On the night of April 15, 1912, it hit an iceberg on its first voyage and sank into the icy waters of the North Atlantic in about two hours and forty minutes. When its construction was completed in 1912, it was the largest steam passenger ship in the world. Its sinking resulted in the deaths of 1,514 people and went down in history as one of the greatest maritime disasters.
The great loss of life caused by the sinking of the Titanic was attributed to many reasons, but the fact that stood out over time was that the ship did not carry enough lifeboats for everyone. Although the full capacity of the Titanic was 3,547 people, the total capacity of the lifeboats on board was 1,178 people. Also, the number of men who died in total was very inproportionate because they prioritized women and children during the accident.
PassengerId is a continuous/numeric variable but acts like an index.
Let’s Examine Categorical Variables
Let’s Review Continuous/Numerical Variables
In Dataset
let’s analyze.
Age
Observation units between 0 and 1 year old
Observation units with float data type in the 0–1 age range express the age of a newborn baby.
Number of days since birthday / 365
Fare
The condition that the minimum value of the ticket price is zero
As a result
If the price paid for the ticket is zero
It cannot be ruled out that there are passengers with the same ticket number and all passengers embark from the same port.
It can be commented that, with the possibility that a certain number of promotional tickets were distributed according to the ticket grade, the age of the passengers included in the ticket class number two is not known, creating a false identity and ignoring the ship security.
Let’s Examine the Target/Dependent/Output Variable
Target Variable by Category Variables
Hedef degisken → by Survived
The number of missing observations in the Age variable is approximately one-fifth of the dataset. The number of missing observations in the Cabine variable is approximately four-fifth of the dataset. The number of missing observations in the imbarked variable is approximately two-thousandths of the dataset.
if
Cabin
Because Cabin, which is a categorical and Cardinal variable, has a large number of unique classes and the ratio of missing observation units to total observation units is high, it is a logical decision to remove/delete/drop Cabin Variable from the data set.
If a significant variable is obtained for the target variable after the Cabin variable is converted to a bool categorical variable, it should be deleted after the Feature Engineering phase is completed.
When the Cabin variable is converted to a binary variable, it becomes meaningful for the target variable, so it should be deleted after it is converted to a new variable in the Feature Engineering stage.
Age
It would be correct to fill the observation units of the variable Age, which is a continuous variable, with a statistical metric.
Standard deviation is 14 and Age is a high ratio since it has values between 0 and 80. Therefore, filling with median should be preferred instead of filling with mean.
Embarked
The ratio of missing observation units to the total observation units is two per thousand.
Outlier Observation Units
Name
The names of the passengers can give us information about their ethnic origin and financial situation.
Adjective
Labels that individuals have can also be an important factor in terms of survival.
Cabin
Ticket
There are many passengers with the same ticket tag. Tickets are cut after a single transaction and their tags are unique, so a ticket represents a group and is charged according to the number of people.
If the number of people in the group is in the range of [1–5], there is a positive correlation between the unique class and the target variable, but if the number of people is in the range of [6–7], there is a negative correlation between the unique class and the target variable.
The higher the number of people in the group, the higher the survival rate.
Ticket
Area
Age
Fare
Ticket numbers are unique, a ticket can belong to a group or to an individual.
Examining the existence of unique rare classes of Categorical Variables
If you are not carrying out a project in an industry (e.g. : health) where even small odds are of great value, it is necessary to delete rare classes.
After One Hot Encoding, a new variable is created for each class, but since the correlation between the variable created from rare classes and the target variable is at a very low level, residue occurs in the data set.
If the area under the curve increases (AUC →Area Under Curve), the accuracy score increases.
My LinkedIn Address :
My Medium Address :
My Github Address :
My Kaggle Address :
Data Science has been hailed as the transformative trend that is set to re-wire the industries and re-invent the ways people do things. Products and applications are being developed in agriculture…
The PubChem Bioassay Database is the largest open-source repository of bioactivity data. With over 290 million data points from well-described biological experiments, it is an invaluable source of…
Before I could submit this article for publication, it appears Black Panther star Letitia Wright has deleted her Twitter and Instagram accounts following online backlash for a video she shared. I suppose that’s a frustratingly apt metaphor for what I’m about to discuss.