Interactive mode: click a code block or Show Plot button to reveal/hide its corresponding plot.

Script Walkthrough


* -----------------------------------------------------
* Basic Stata Commands and Knowledge for New Users
* -----------------------------------------------------

* Clear the current environment (important when starting fresh)
clear all

* Stop Stata from pausing when output is long (e.g., large tables)
set more off

* -----------------------------------------------------
* Loading Built-in Data
* -----------------------------------------------------

* Load a built-in dataset provided by Stata (e.g., auto dataset)
sysuse auto

* Check the first few rows of the dataset
list in 1/10

* Show the structure of the dataset, including variable names, storage types, and labels
describe

* View the names and types of the variables, along with the total number of observations
codebook

* -----------------------------------------------------
* Understanding Variables and Data Structure
* -----------------------------------------------------

* Get a summary of all variables in the dataset (mean, std dev, min, max)
summarize

* Get summary statistics for specific variables (price, weight, mpg)
summarize price weight mpg

* -----------------------------------------------------
* Variable Labels and Value Labels
* -----------------------------------------------------

* Display the variable labels in the dataset
label list

* Display labels associated with variables (e.g., foreign cars)
label list origin

* List the unique values and their frequency for a categorical variable (foreign)
tabulate foreign

* -----------------------------------------------------
* Data Inspection: Simple and Structured Data Views
* -----------------------------------------------------

* List specific variables for the first 10 observations (make, price, mpg)
list make price mpg in 1/10

* Find summary statistics broken down by a category (e.g., summarize by foreign)
bysort foreign: summarize price mpg

* -----------------------------------------------------
* Sorting and Organizing Data
* -----------------------------------------------------

* Sort the data by the price variable in ascending order
sort price

* Display the first 10 observations sorted by price
list make price mpg in 1/10

* Sort the data by price in descending order (using gsort)
gsort -price

* Display the first 10 observations after sorting in descending order
list make price mpg in 1/10

* -----------------------------------------------------
* Basic Graphs and Data Visualization
* -----------------------------------------------------

* Create a scatter plot of price vs. weight
scatter price weight

* Create a histogram of mpg (miles per gallon)
histogram mpg

* -----------------------------------------------------
* Saving and Exporting Data
* -----------------------------------------------------

* Save the dataset in Stata format with a new name
save auto_copy.dta, replace

* Export the dataset to a CSV file
export delimited using "auto_data.csv", replace

* -----------------------------------------------------
* Basic File Management
* -----------------------------------------------------

* Check the current working directory
pwd

* Change the working directory (use your desired folder path)
cd "C:/Your/Desired/Directory"

* List all the files in the current working directory
dir

* -----------------------------------------------------
* Help and Resources
* -----------------------------------------------------

* Get detailed help on any Stata command (e.g., summarize)
help summarize

* Search for commands or functions related to specific tasks (e.g., regression)
search regression

* -----------------------------------------------------
* Conclusion of Basic Stata Commands
* -----------------------------------------------------

* This session covered essential Stata operations:
* - Loading data (`sysuse`)
* - Inspecting datasets (`describe`, `codebook`, `summarize`)
* - Sorting data (`sort`, `gsort`)
* - Simple graphs (`scatter`, `histogram`)
* - Managing files and directories (`save`, `export`, `pwd`, `cd`)
* - Using Stata's help system (`help`, `search`)
* -----------------------------------------------------






* -----------------------------------------------------
* Introduction to Data Manipulation in Stata
* -----------------------------------------------------

clear all
set more off

* Load the built-in dataset
sysuse auto

* Inspect the data
list in 1/10

* -----------------------------------------------------
* Part 1: Renaming Variables
* -----------------------------------------------------

* Rename mpg to Miles_Per_Gallon
rename mpg Miles_Per_Gallon

* Inspect the first few rows to check the new variable name
list make Miles_Per_Gallon in 1/10

* -----------------------------------------------------
* Part 2: Filtering Data (using `keep` and `drop`)
* -----------------------------------------------------

* Keep cars with more than 3 cylinders (equivalent to filter() in R)
keep if rep78 > 3

* List the first few rows after filtering
list make rep78 in 1/10

* drop variables that you won't use

drop foreign gear_ratio


* -----------------------------------------------------
* Part 3: Creating and Modifying Variables (equivalent to mutate() in R)
* -----------------------------------------------------

* Create a new variable: price per weight (price/weight)
gen price_per_weight = price / weight

* List the first few rows to check the new variable
list make price weight price_per_weight in 1/10

* -----------------------------------------------------
* Creating Multiple Variables
* -----------------------------------------------------

* Create two new variables:
* 1. price per weight (price/weight)
* 2. mpg_class: classify as "Efficient" if Miles_Per_Gallon > 20, otherwise "Non-efficient"
gen price_per_weight = price / weight
gen mpg_class = cond(Miles_Per_Gallon > 20, "Efficient", "Non-efficient")

* List the first few rows to check both variables
list make price_per_weight mpg_class in 1/10

* -----------------------------------------------------
* Conditional Mutate with `gen` and `cond()` (Equivalent to `case_when()` in R)
* -----------------------------------------------------

* Classify cars based on their weight using cond():
* Light (<2500), Medium (2500-3500), Heavy (>3500)
gen weight_class = cond(weight < 2500, "Light", cond(weight >= 2500 & weight < 3500, "Medium", "Heavy"))

* List the first few rows to check the new classifications
list make weight weight_class in 1/10

* -----------------------------------------------------
* Modifying Existing Variables
* -----------------------------------------------------

* Modify the price variable by creating a new categorical variable
* Classify price into categories "Low", "Medium", "High"
gen price_class = cond(price > 10000, "High", cond(price > 5000 & price <= 10000, "Medium", "Low"))

* List the first few rows to check the new variable
list make price price_class in 1/10

* -----------------------------------------------------
* Working with Multiple Variables
* -----------------------------------------------------

* Apply the `egen` function to calculate the row-wise mean of mpg and weight

* The term row-wise mean refers to calculating the mean (average) for each row of 
* specific columns in a dataset, rather than calculating the mean across the entire 
* column (which would be a column-wise mean). In other words, instead of computing a single
* average for all values in a variable, you compute the mean for a group of variables 
* within each observation (row).

egen mean_mpg_weight = rowmean(Miles_Per_Gallon weight)

* List the first few rows to check the new variable
list make Miles_Per_Gallon weight mean_mpg_weight in 1/10


* This variable mean_mpg_weight is not meaningful here, but just immagine that you want to calculate
* someone's average working hours within 3 days 
* Each row of data is one person, and you have variables Day1 Day2 and Day3 that records how many hours
* each person works in one day
* the code here will be:

egen mean_work_hours = rowmean(Day1 Day2 Day3)


* -----------------------------------------------------
* Conclusion: Key Stata Functions Demonstrated
* - rename: to rename variables
* - keep: to filter rows based on conditions
* - gen: to create new variables
* - egen: to perform calculations across variables
* - cond(): to apply conditional logic
* - drop: to remove variables
* -----------------------------------------------------

Output Preview: auto_data.csv

df <- read.csv("auto_data.csv")
knitr::kable(head(df, 20))
make price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign
Cad. Seville 15906 21 3 3.0 13 4290 204 45 350 2.24 Domestic
Cad. Eldorado 14500 14 2 3.5 16 3900 204 43 350 2.19 Domestic
Linc. Mark V 13594 12 3 2.5 18 4720 230 48 400 2.47 Domestic
Linc. Versailles 13466 14 3 3.5 15 3830 201 41 302 2.47 Domestic
Peugeot 604 12990 14 NA 3.5 14 3420 192 38 163 3.58 Foreign
Volvo 260 11995 17 5 2.5 14 3170 193 37 163 2.98 Foreign
Linc. Continental 11497 12 3 3.5 22 4840 233 51 400 2.47 Domestic
Cad. Deville 11385 14 3 4.0 20 4330 221 44 425 2.28 Domestic
Buick Riviera 10372 16 3 3.5 17 3880 207 43 231 2.93 Domestic
Olds Toronado 10371 16 3 3.5 17 4030 206 43 350 2.41 Domestic
BMW 320i 9735 25 4 2.5 12 2650 177 34 121 3.64 Foreign
Audi 5000 9690 17 5 3.0 15 2830 189 37 131 3.20 Foreign
Olds 98 8814 21 4 4.0 20 4060 220 43 350 2.41 Domestic
Datsun 810 8129 21 4 2.5 8 2750 184 38 146 3.55 Foreign
Buick Electra 7827 15 4 4.0 20 4080 222 43 350 2.41 Domestic
VW Dasher 7140 23 4 2.5 12 2160 172 36 97 3.74 Foreign
VW Scirocco 6850 25 4 2.0 16 1990 156 36 97 3.78 Foreign
Plym. Sapporo 6486 26 NA 1.5 8 2520 182 38 119 3.54 Domestic
Dodge St. Regis 6342 17 2 4.5 21 3740 220 46 225 2.94 Domestic
Merc. XR-7 6303 14 4 3.0 16 4130 217 45 302 2.75 Domestic