Example output
You can create a macro in Excel using VBA to automate the data cleaning process described. Below is the code for the macro, along with instructions on how to add it to your Excel workbook:
### Steps to Add the Macro:
1. Press `Alt + F11` to open the VBA editor in Excel.
2. In the editor, go to `Insert` > `Module` to create a new module.
3. Copy and paste the VBA code provided below into the module.
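4. With the cursor inside the macro, press `F5` to run it, or assign it to a button in Excel. Save the workbook as a macro-enabled file (`.xlsm`) so the macro is kept.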
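
Changes made by a macro cannot be reversed with Undo, so it may be worth copying the worksheet before running the cleanup. The following is a minimal sketch; `BackupWorksheet` is a hypothetical helper name, not part of the macro below, and it assumes the sheet lives in the workbook that contains the code.

```vba
' Hypothetical helper: copies a worksheet to the end of the workbook as a backup.
Sub BackupWorksheet(ByVal sheetName As String)
    Dim source As Worksheet

    ' Raises a run-time error if no worksheet with this name exists
    Set source = ThisWorkbook.Worksheets(sheetName)

    ' The copy is placed after the last sheet and becomes the active sheet
    source.Copy After:=ThisWorkbook.Sheets(ThisWorkbook.Sheets.Count)
End Sub
```

It could be called from the Immediate window, for example `BackupWorksheet "Customers"`, where `Customers` is a placeholder sheet name.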
### VBA Code for Data Cleaning:
```vba
Sub CleanCustomerData()
    Dim ws As Worksheet
    Dim dataRange As Range
    Dim cell As Range
    Dim selectedSheet As String
    Dim trimmed As String
    Dim lastRow As Long
    Dim lastCol As Long
    Dim rowsBefore As Long
    Dim duplicatesRemoved As Long
    Dim missingValuesReplaced As Long
    Dim nameChanges As Long
    Dim report As String

    ' Prompt user to select the worksheet
    selectedSheet = InputBox("Enter the name of the worksheet to clean:")
    If Len(selectedSheet) = 0 Then Exit Sub ' Cancelled or left blank

    ' Check if worksheet exists (ws stays Nothing when the lookup fails)
    On Error Resume Next
    Set ws = ThisWorkbook.Worksheets(selectedSheet)
    On Error GoTo 0
    If ws Is Nothing Then
        MsgBox "Worksheet not found. Please try again.", vbExclamation
        Exit Sub
    End If

    ' Find the last row (from column A) and the last column (from the header row)
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    lastCol = ws.Cells(1, ws.Columns.Count).End(xlToLeft).Column
    If lastCol < 3 Then lastCol = 3 ' Columns A to C are assumed below
    If lastRow < 2 Then
        MsgBox "No data rows found below the header.", vbExclamation
        Exit Sub
    End If
    Set dataRange = ws.Range(ws.Cells(1, 1), ws.Cells(lastRow, lastCol))

    ' 1. Remove duplicates based on the Email column (assuming column "B").
    '    Passing the full data range removes whole rows, so columns stay aligned.
    '    RemoveDuplicates does not return a count, so compare row counts instead.
    rowsBefore = lastRow
    dataRange.RemoveDuplicates Columns:=2, Header:=xlYes
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    duplicatesRemoved = rowsBefore - lastRow

    ' 2. Trim leading and trailing spaces in the dataset
    '    (formulas, numbers, dates, and error values are left untouched)
    For Each cell In ws.UsedRange
        If VarType(cell.Value) = vbString And Not cell.HasFormula Then
            trimmed = Trim$(cell.Value)
            If trimmed <> cell.Value Then cell.Value = trimmed
        End If
    Next cell

    ' 3. Convert all text in the "Names" column to uppercase (assuming column "A")
    For Each cell In ws.Range("A2:A" & lastRow)
        If VarType(cell.Value) = vbString Then
            If cell.Value <> UCase$(cell.Value) Then
                cell.Value = UCase$(cell.Value)
                nameChanges = nameChanges + 1
            End If
        End If
    Next cell

    ' 4. Replace missing values in the "Phone Number" column with "N/A" (assuming column "C")
    For Each cell In ws.Range("C2:C" & lastRow)
        If IsEmpty(cell.Value) Then
            cell.Value = "N/A"
            missingValuesReplaced = missingValuesReplaced + 1
        End If
    Next cell

    ' Generate summary report
    report = "Data Cleaning Summary:" & vbCrLf & vbCrLf
    report = report & "1. Duplicates removed: " & duplicatesRemoved & vbCrLf
    report = report & "2. Missing values replaced in 'Phone Number' column: " & missingValuesReplaced & vbCrLf
    report = report & "3. Text formatting changes in 'Names' column: " & nameChanges & vbCrLf
    MsgBox report, vbInformation, "Cleaning Complete"
End Sub
```
### Explanation of the Code:
1. **Worksheet Selection**: The macro prompts the user to input the name of the worksheet on which the macro should run.
2. **Remove Duplicates**: It removes duplicates based on the "Email" column (column B is assumed to be the Email column).
3. **Trim Spaces**: It trims all leading and trailing spaces in the entire dataset.
4. **Convert Names to Uppercase**: It converts all text in the "Names" column (column A) to uppercase.
5. **Replace Missing Phone Numbers**: It replaces any missing values in the "Phone Number" column (column C) with "N/A".
6. **Summary Report**: At the end, the macro generates a report summarizing:
   - The number of duplicates removed.
   - The number of missing values replaced.
   - The number of changes made to the "Names" column.
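
The duplicate count in the summary can be cross-checked before anything is deleted. The sketch below is a hypothetical helper, `CountDuplicateValues`, that is not part of the macro. It counts repeated values in an array, such as the one returned by `Range.Value`, without changing the sheet. It assumes case-insensitive matching, which is how `Collection` keys behave, so treat the result as an estimate and not as an exact prediction of what Excel will remove.

```vba
' Hypothetical helper: counts values that repeat an earlier value.
' Accepts any array, e.g. ws.Range("B2:B100").Value or Array("a", "b").
Function CountDuplicateValues(ByVal values As Variant) As Long
    Dim seen As New Collection
    Dim v As Variant
    Dim duplicates As Long

    If Not IsArray(values) Then Exit Function ' A single value has no duplicates

    For Each v In values
        If Not IsError(v) Then
            On Error Resume Next
            seen.Add True, "k" & CStr(v) ' Fails when the key was already added
            If Err.Number <> 0 Then
                duplicates = duplicates + 1
                Err.Clear
            End If
            On Error GoTo 0
        End If
    Next v

    CountDuplicateValues = duplicates
End Function
```

For example, `Debug.Print CountDuplicateValues(Worksheets("Customers").Range("B2:B100").Value)` prints the estimate to the Immediate window; the sheet name and range are placeholders.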
You can modify the column references (A, B, C) in the code to match the actual structure of your dataset if it is different.
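
One way to make that adjustment easier is to keep the column letters in one place. The following is a minimal sketch: the constant names and the `DataColumnAddress` helper are hypothetical and do not appear in the macro above. The constants would need to sit at the top of the module, before any `Sub` or `Function`.

```vba
' Hypothetical constants: change the letters to match your sheet layout
Private Const NAME_COL As String = "A"
Private Const EMAIL_COL As String = "B"
Private Const PHONE_COL As String = "C"

' Builds an address such as "C2:C150" covering the data rows of one column
' (row 1 is assumed to hold the header)
Function DataColumnAddress(ByVal colLetter As String, ByVal lastRow As Long) As String
    DataColumnAddress = colLetter & "2:" & colLetter & lastRow
End Function
```

Inside the macro, a range could then be set with `ws.Range(DataColumnAddress(PHONE_COL, lastRow))` in place of a hard-coded `"C2:C" & lastRow`.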