Pandoc

The document provides a comprehensive analysis of data distribution across various table sizes, highlighting key observations on transactional, lookup, and hybrid tables. It includes proposed MongoDB schema designs that consolidate relational tables into collections while maintaining necessary relationships. Additionally, it outlines schema design considerations, including document embedding, denormalization strategies, and indexing recommendations for optimized performance.

Uploaded by Shivangi Pandey
Copyright © All Rights Reserved

Data Distribution Analysis
  Large Tables (>10GB)
  Medium Tables (1GB - 10GB)
  Small but Active Tables (100MB - 1GB)
  Recently Created Tables (2022-2023)
  Data Distribution Patterns:
  Transactional Tables
  Lookup Tables
  Hybrid Tables (Mixed Usage)
  Key Observations:
Relationship Analysis and MongoDB Schema Design
  Inferred Relationships
  Proposed MongoDB Schema
  Mapping of Relational Tables to MongoDB Collections
  Schema Design Considerations
Migration Design Considerations


Data Distribution Analysis
Large Tables (>10GB)
1. RwStop (127.3GB used / 137.9GB total)
• Highest data volume in the system (92.3% space utilization)
• Last modified January 2020, suggesting it now serves mainly as a historical data store
• Likely contains transaction stop events or process termination records
2. RwPushLogs (60.4GB used / 61.0GB total)
• 99.0% space utilization (nearly full)
• Last modified October 2020
• Probably stores notification or message push activity logs
3. RwSession (34.4GB used / 52.4GB total)
• 65.6% space utilization
• Last modified September 2020
• Stores user session data with significant unused space (potential for growth or cleanup)
4. RwHistory (44.9GB used / 51.7GB total)
• 86.8% space utilization
• Most recently modified of the large tables (June 2021)
• Likely contains comprehensive historical records or audit logs
5. RwSite (11.0GB used / 19.2GB total)
• 57.5% space utilization
• Last modified September 2020
• Likely stores site configuration or traffic data
6. RwUser (7.1GB used / 10.9GB total)
• 65.1% space utilization
• Last modified October 2020
• Contains user profiles or user-related data
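Each utilization figure above is used space divided by total allocated space. A minimal sketch of that arithmetic, in the JavaScript used by the schema sketches later in this document, assuming both sizes are given in the same unit. Because the listed GB values are themselves rounded, recomputing from them can differ slightly from the quoted percentages.

```javascript
// Percentage of allocated space actually in use, rounded to one decimal place.
// Both arguments must use the same unit (e.g. GB).
function utilizationPct(used, total) {
  if (total <= 0) throw new RangeError("total must be positive");
  return Math.round((used / total) * 1000) / 10;
}

// Figures taken from the list above: [table, used GB, total GB].
const largeTables = [
  ["RwStop", 127.3, 137.9],
  ["RwPushLogs", 60.4, 61.0],
  ["RwSession", 34.4, 52.4],
  ["RwHistory", 44.9, 51.7],
];

for (const [name, used, total] of largeTables) {
  console.log(`${name}: ${utilizationPct(used, total)}%`);
}
```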

Medium Tables (1GB - 10GB)


• RwLog (5.6GB used / 6.7GB total) - 83.5% utilization
• RwPurchases (4.9GB used / 5.5GB total) - 89.4% utilization
• RwInstall (2.7GB used / 4.6GB total) - 59.7% utilization
• RwRoute (2.3GB used / 3.8GB total) - 61.6% utilization
• RwBlobInfo (2.8GB used / 2.8GB total) - 99.6% utilization (unusually high)
• RwJob (2.2GB used / 2.4GB total) - 89.3% utilization
• RwSubStatus (1.6GB used / 1.6GB total) - 99.2% utilization (almost full)

Small but Active Tables (100MB - 1GB)


• RwPurchaseLog (967MB used / 1.4GB total) - Last modified December 2016
• RwMatrix (246MB used / 1.2GB total) - Very low utilization (20.3%)
• RwMail (971MB used / 979MB total) - 99.2% utilization
• RwRefreshTokens (789MB used / 793MB total) - 99.5% utilization; created recently (January 2023)
• RwAccount (524MB used / 566MB total) - 92.6% utilization; recently modified (June 2021)
• RwSchedule (427MB used / 546MB total) - 78.2% utilization
• RwMember (503MB used / 525MB total) - 95.8% utilization
• RwStats (481MB used / 495MB total) - 97.2% utilization

Recently Created Tables (2022-2023)


• RwRefreshTokens (created January 2023)
• RwStopAddOns (created May 2023)
• RwPurchaseAddOns (created April 2023)
• sysdiagrams (created April 2022)

Data Distribution Patterns:


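1. High Utilization Tables (>90% space used):
• RwStop, RwPushLogs, RwBlobInfo, RwSubStatus, RwMail, RwRefreshTokens, RwAccount, RwMember, RwStats
• These tables have little free space left in their current allocation and will need to grow, or be cleaned up, as data continues to arrive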
2. Low Utilization Tables (<50% space used):
• RwMatrix (20.3%)
• Several small tables with low utilization
• RwTruck (59.6%) is also lightly used, though above the 50% threshold
3. Recently Modified Tables (2021):
• RwHistory, RwSubStatus, RwInstall, RwAccount, RwFedExAuth, RwWorkerStatus
• These tables show ongoing system activity
4. Static Tables (not modified for years):
• RwJob, RwMatrix (last modified December 2014)
• RwSub, sysdiagrams (created and modified on the same day)
• These appear to be reference tables or inactive functionality
5. Largest Data Consumers:
• The three largest tables by allocated size (RwStop, RwPushLogs, RwSession) account for 222GB of used space
• This represents approximately 73% of the total used storage
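The utilization thresholds (>90% and <50%) and the top-N totals can be checked mechanically. In this sketch "largest" is ranked by allocated (total) size, which is what reproduces the 222GB figure; ranked by used space alone, RwHistory (44.9GB) would place third ahead of RwSession (34.4GB). The sample rows come from this section, with RwMatrix converted to GB for illustration.

```javascript
// Flag tables against the thresholds used above: >90% used is "high", <50% is "low".
function flagUtilization(tables, highPct = 90, lowPct = 50) {
  const high = [];
  const low = [];
  for (const t of tables) {
    const pct = (t.usedGB / t.totalGB) * 100;
    if (pct > highPct) high.push(t.name);
    else if (pct < lowPct) low.push(t.name);
  }
  return { high, low };
}

// Sum the used space of the n largest tables, ranked by allocated (total) size.
function topNUsedGB(tables, n) {
  const ranked = [...tables].sort((a, b) => b.totalGB - a.totalGB);
  const sum = ranked.slice(0, n).reduce((acc, t) => acc + t.usedGB, 0);
  return Math.round(sum * 10) / 10; // round away floating-point noise
}

const tables = [
  { name: "RwStop", usedGB: 127.3, totalGB: 137.9 },
  { name: "RwPushLogs", usedGB: 60.4, totalGB: 61.0 },
  { name: "RwSession", usedGB: 34.4, totalGB: 52.4 },
  { name: "RwHistory", usedGB: 44.9, totalGB: 51.7 },
  { name: "RwMatrix", usedGB: 0.246, totalGB: 1.2 },
];

console.log(flagUtilization(tables));
console.log(topNUsedGB(tables, 3)); // 222.1
console.log(topNUsedGB(tables, 4)); // 267
```

The four-table total matches the ~267GB cited under Key Observations.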

Transactional Tables
These tables store business events, changing data, and have high write activity:
1. RwStop (127.3GB) - Very large transaction log for stop/termination events
2. RwPushLogs (60.4GB) - High-volume notification/message delivery logs
3. RwSession (34.4GB) - User session data with frequent writes
4. RwHistory (44.9GB) - Historical transaction records with recent modifications
5. RwLog (5.6GB) - System activity logs
6. RwPurchases (4.9GB) - Customer purchase transactions
7. RwPurchaseLog (967MB) - Purchase event logging
8. RwRefreshTokens (789MB) - Authentication token records, created January 2023
9. RwRoute (2.3GB) - Route/navigation transaction records
10. RwMail (971MB) - Email transaction records with high utilization
11. RwAccount (524MB) - User account transactions with recent modifications
12. RwSchedule (427MB) - Scheduling transactions
13. RwPurchaseAddOns (336KB) - Recently created add-on purchase transactions
14. RwStopAddOns (55MB) - Recently created add-on stop records
15. RwWorkerStatus (524KB) - Worker activity status with recent modifications
16. RwWorkerOps (254KB) - Worker operations logs

Lookup Tables
These tables store relatively static reference data with less frequent updates:
1. RwSite (11.0GB) - Site configuration data (large but infrequently modified)
2. RwUser (7.1GB) - User reference data (larger lookup table)
3. RwInstall (2.7GB) - Installation reference information
4. RwBlobInfo (2.8GB) - Binary object reference data
5. RwJob (2.2GB) - Job definition data (not modified since 2014)
6. RwSubStatus (1.6GB) - Subscription status reference
7. RwMatrix (246MB) - Reference matrix with very low utilization (20.3%)
8. RwMember (503MB) - Membership reference data
9. RwStats (481MB) - Statistical reference data
10. RwIdentity (13.8MB) - Identity reference information
11. RwZone (459KB) - Zone definition data
12. RwZoneSite (418KB) - Zone-to-site mapping
13. RwConfig (16KB) - System configuration settings
14. RwSub (41KB) - Subscription reference data
15. RwZone2 (16KB) - Secondary zone definitions
16. sysdiagrams (73KB) - System diagram references

Hybrid Tables (Mixed Usage)


These tables show characteristics of both transactional and lookup data:
1. RwOnTracSesh (350MB) - Session tracking with periodic updates
2. RwFedExAuth (65MB) - FedEx authentication with regular updates
3. RwFedExAccess (4.8MB) - FedEx access reference
4. RwApiAuth (3.1MB) - API authentication reference/logs
5. RwUserNote (16KB) - User annotation reference/transactions
6. RwOrg (270KB) - Organization data with recent updates

Key Observations:
1. Heavy Transactional Activity:
• The largest tables (RwStop, RwPushLogs, RwSession, RwHistory) are all transactional
• These four tables alone account for ~267GB (over 85% of total data)
2. Lookup Reference Pattern:
• Lookup tables generally show:
‣ Lower update frequency
‣ Creation and modification dates often close together
‣ Lower space utilization in many cases
‣ Smaller size (with exceptions like RwSite and RwUser)
3. Recent Development Focus:
• New transactional tables for purchases and add-ons
• Authentication-related transactions (RwRefreshTokens)
• Worker status monitoring

Relationship Analysis and MongoDB Schema Design


Inferred Relationships
1. User-centric Relationships:
• RwUser → RwSession (users have multiple sessions)
• RwUser → RwAccount (users have accounts)
• RwUser → RwUserNote (users have notes)
• RwUser → RwMember (users may be members)
• RwUser → RwIdentity (users have identity information)
• RwUser → RwUsageLog (tracks user activities)
• RwUser → RwUserPref (stores user preferences)
2. Purchase-related Relationships:
• RwPurchases → RwPurchaseLog (purchases generate logs)
• RwPurchases → RwPurchaseAddOns (purchases have add-ons)
• RwUser → RwPurchases (users make purchases)
3. Site Hierarchy Relationships:
• RwSite → RwZoneSite (sites belong to zones)
• RwZone → RwZoneSite (zones contain sites)
• RwZone2 → RwZoneSite (secondary zone relationships)
4. Worker Management Relationships:
• RwWorkerStatus → RwWorkerOps (worker status and operations)
5. Logging Relationships:
• RwStop → RwStopAddOns (stop events have add-ons)
• Multiple entities → RwLog, RwHistory (system-wide logging)
6. Authentication Flow:
• RwUser → RwRefreshTokens (user authentication tokens)
• RwApiAuth (API authentication)
• RwFedExAuth, RwFedExAccess (FedEx carrier API authentication)
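Writing the inferred relationships down as data makes the embed-or-reference decisions easier to review. A minimal sketch: the edge list is transcribed from the list above (only entries with a named parent), and the "more than one parent" check is an illustrative heuristic, not part of the original analysis.

```javascript
// Inferred parent -> child relationships from the list above, as [parent, child] pairs.
const relationships = [
  ["RwUser", "RwSession"],
  ["RwUser", "RwAccount"],
  ["RwUser", "RwUserNote"],
  ["RwUser", "RwMember"],
  ["RwUser", "RwIdentity"],
  ["RwUser", "RwUsageLog"],
  ["RwUser", "RwUserPref"],
  ["RwUser", "RwPurchases"],
  ["RwUser", "RwRefreshTokens"],
  ["RwPurchases", "RwPurchaseLog"],
  ["RwPurchases", "RwPurchaseAddOns"],
  ["RwSite", "RwZoneSite"],
  ["RwZone", "RwZoneSite"],
  ["RwZone2", "RwZoneSite"],
  ["RwWorkerStatus", "RwWorkerOps"],
  ["RwStop", "RwStopAddOns"],
];

// Group child tables under each parent table.
function childrenByParent(edges) {
  const map = new Map();
  for (const [parent, child] of edges) {
    if (!map.has(parent)) map.set(parent, []);
    map.get(parent).push(child);
  }
  return map;
}

// Tables with more than one parent (e.g. join tables) cannot be embedded
// under a single parent document without duplication.
function multiParentTables(edges) {
  const counts = new Map();
  for (const [, child] of edges) counts.set(child, (counts.get(child) || 0) + 1);
  return [...counts].filter(([, c]) => c > 1).map(([name]) => name);
}

console.log(childrenByParent(relationships).get("RwPurchases"));
console.log(multiParentTables(relationships));
```

RwZoneSite is the only table with several parents, which is consistent with the proposed schema keeping site and zone links as reference arrays rather than embedding one inside the other.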

Proposed MongoDB Schema


// Shape notation: each value names the field's expected type
// (a schema sketch, not insert-ready documents).
// User Collection
db.users = {
  _id: ObjectId,
  username: String,
  email: String,
  password: String, // hashed
  firstName: String,
  lastName: String,
  createdAt: Date,
  updatedAt: Date,
  lastLogin: Date,
  status: String,
  preferences: {
    // Embedded document (from RwUserPref)
    theme: String,
    notifications: Boolean,
    language: String
  },
  identityInfo: {
    // Embedded document (from RwIdentity)
    verificationLevel: String,
    identityType: String,
    verifiedAt: Date
  },
  // References
  accountId: ObjectId // Reference to accounts collection
}
// Accounts Collection
db.accounts = {
  _id: ObjectId,
  accountNumber: String,
  type: String,
  status: String,
  createdAt: Date,
  updatedAt: Date,
  balance: Number,
  // Embedded stats
  stats: {
    totalPurchases: Number,
    totalSpent: Number,
    lastActivity: Date
  }
}
// Sessions Collection
db.sessions = {
  _id: ObjectId,
  userId: ObjectId, // Reference to users collection
  token: String,
  ipAddress: String,
  userAgent: String,
  device: String,
  startTime: Date,
  endTime: Date,
  active: Boolean,
  // For tracking on-site activity
  trackedActivity: {
    pageViews: Number,
    lastPage: String,
    duration: Number
  }
}
// Purchases Collection
db.purchases = {
  _id: ObjectId,
  userId: ObjectId, // Reference to users collection
  orderNumber: String,
  amount: Number,
  status: String,
  createdAt: Date,
  updatedAt: Date,
  paymentMethod: String,
  // Embedded items array
  items: [
    {
      itemId: String,
      name: String,
      quantity: Number,
      price: Number
    }
  ],
  // Embedded add-ons (from RwPurchaseAddOns)
  addOns: [
    {
      addOnId: String,
      name: String,
      price: Number,
      quantity: Number
    }
  ],
  // Shipping info
  shipping: {
    address: String,
    city: String,
    state: String,
    zip: String,
    carrier: String,
    trackingNumber: String
  }
}
// Sites Collection
db.sites = {
  _id: ObjectId,
  name: String,
  url: String,
  status: String,
  createdAt: Date,
  updatedAt: Date,
  // Embedded config
  config: {
    theme: String,
    features: [String],
    maxUsers: Number
  },
  // Zones relationship
  zones: [ObjectId] // References to zones collection
}
// Zones Collection
db.zones = {
  _id: ObjectId,
  name: String,
  description: String,
  createdAt: Date,
  updatedAt: Date,
  status: String,
  // Sites in this zone (references, mirroring sites.zones)
  siteIds: [ObjectId] // References to sites collection
}
// Logs Collection (consolidated from multiple log tables)
db.logs = {
  _id: ObjectId,
  timestamp: Date,
  logType: String, // "system", "push", "purchase", "user", "stop", etc.
  level: String, // "info", "warning", "error"
  message: String,
  details: Object, // Flexible schema for different log types
  userId: ObjectId, // Optional reference to users
  sessionId: ObjectId, // Optional reference to sessions
  metadata: {
    ipAddress: String,
    userAgent: String,
    source: String
  }
}
// Workers Collection
db.workers = {
  _id: ObjectId,
  workerId: String,
  name: String,
  status: String,
  createdAt: Date,
  updatedAt: Date,
  lastActive: Date,
  // Embedded operational data
  operations: {
    currentTasks: Number,
    completedTasks: Number,
    failedTasks: Number,
    uptime: Number
  },
  // Historical status changes
  statusHistory: [
    {
      status: String,
      timestamp: Date,
      reason: String
    }
  ]
}
// Authentication Collection (consolidating auth-related tables)
db.authentication = {
  _id: ObjectId,
  userId: ObjectId, // Reference to users collection
  type: String, // "refresh", "api", "fedex", etc.
  token: String,
  issuedAt: Date,
  expiresAt: Date,
  lastUsed: Date,
  scope: [String],
  device: String,
  revoked: Boolean,
  revokedReason: String
}
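To show how the embedded user fields could be populated during migration, here is a minimal sketch that folds one-to-one rows from RwUser, RwUserPref and RwIdentity into a single document. All source column names (UserId, UserName, Theme, VerificationLevel, and so on) and the legacyUserId field are hypothetical; the actual table definitions are not given in this document.

```javascript
// Build one users document from a user row plus optional one-to-one rows.
// Column names on the input rows are hypothetical.
function toUserDocument(userRow, prefRow, identityRow) {
  const doc = {
    legacyUserId: userRow.UserId, // hypothetical field: keeps the SQL key for re-linking references
    username: userRow.UserName,
    email: userRow.Email,
    createdAt: new Date(userRow.Created),
    updatedAt: new Date(userRow.Modified),
    status: userRow.Status,
  };
  // One-to-one rows become embedded sub-documents; missing rows are simply omitted.
  if (prefRow) {
    doc.preferences = {
      theme: prefRow.Theme,
      notifications: Boolean(prefRow.Notifications), // SQL bit (0/1) -> boolean
      language: prefRow.Language,
    };
  }
  if (identityRow) {
    doc.identityInfo = {
      verificationLevel: identityRow.VerificationLevel,
      identityType: identityRow.IdentityType,
      verifiedAt: new Date(identityRow.VerifiedAt),
    };
  }
  return doc;
}

const doc = toUserDocument(
  { UserId: 42, UserName: "jdoe", Email: "jdoe@example.com", Created: "2019-05-01", Modified: "2020-10-01", Status: "active" },
  { Theme: "dark", Notifications: 1, Language: "en" },
  null // no RwIdentity row for this user
);
console.log(doc.preferences.theme, "identityInfo" in doc); // dark false
```

_id is left out so that the driver assigns an ObjectId when the document is inserted.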

Mapping of Relational Tables to MongoDB Collections


Below is a table showing which original relational tables were consolidated into each MongoDB collection:

MongoDB Collection   Source Relational Tables
users                RwUser, RwUserPref, RwIdentity, RwUserNote, RwMember
accounts             RwAccount, RwSubStatus, RwSub
sessions             RwSession, RwOnTracSesh
purchases            RwPurchases, RwPurchaseLog, RwPurchaseAddOns, RwPayRecovery
sites                RwSite, RwPlace, RwBlobInfo
zones                RwZone, RwZone2, RwZoneSite
logs                 RwLog, RwHistory, RwPushLogs, RwUsageLog, RwStop, RwStopAddOns, RwMailBox, RwMail
workers              RwWorkerStatus, RwWorkerOps, RwJob
authentication       RwRefreshTokens, RwApiAuth, RwFedExAuth, RwFedExAccess, RwFedExStop

The schema design consolidates related tables to reduce redundancy and take advantage of
MongoDB's document model, while preserving the necessary relationships between entities.
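Consolidating a one-to-many table such as RwPurchaseAddOns into purchases means grouping child rows under their parent before writing. A minimal sketch under the same caveat as before: the column names (PurchaseId, AddOnId, Name, Price, Qty, Amount, Status), the legacyPurchaseId field and the sample rows are hypothetical.

```javascript
// Embed add-on rows into their parent purchase documents.
function embedAddOns(purchaseRows, addOnRows) {
  // Index add-on rows by parent purchase id in a single pass.
  const byPurchase = new Map();
  for (const a of addOnRows) {
    if (!byPurchase.has(a.PurchaseId)) byPurchase.set(a.PurchaseId, []);
    byPurchase.get(a.PurchaseId).push({
      addOnId: String(a.AddOnId), // schema sketch types addOnId as String
      name: a.Name,
      price: a.Price,
      quantity: a.Qty,
    });
  }
  // One document per purchase; purchases without add-ons get an empty array.
  return purchaseRows.map((p) => ({
    legacyPurchaseId: p.PurchaseId,
    amount: p.Amount,
    status: p.Status,
    addOns: byPurchase.get(p.PurchaseId) || [],
  }));
}

const purchases = embedAddOns(
  [
    { PurchaseId: 1, Amount: 20, Status: "paid" },
    { PurchaseId: 2, Amount: 35, Status: "paid" },
  ],
  [
    { PurchaseId: 1, AddOnId: 10, Name: "Gift wrap", Price: 3, Qty: 1 },
    { PurchaseId: 1, AddOnId: 11, Name: "Insurance", Price: 5, Qty: 1 },
  ]
);
console.log(purchases.map((p) => p.addOns.length)); // [ 2, 0 ]
```

Building the lookup map first keeps the transformation to one pass over each table instead of one query per purchase.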

Schema Design Considerations


1. Document Embedding vs. References:
• Embedded user preferences and identity info within user documents for frequently accessed data
• Used references for accounts, purchases, and other entities that may grow independently
• Consolidated similar log tables into a single collection with a flexible schema
2. Denormalization Strategy:
• Added summary stats in the accounts collection to reduce aggregation queries
• Embedded purchase items and add-ons directly in purchase documents
• Kept session tracking data with session documents
3. Indexing Recommendations:
• Create indexes on userId fields in sessions, purchases, and authentication collections
• Index the timestamp field in logs collection
• Create compound indexes on status and updatedAt fields for frequently queried collections
4. Performance Optimizations:
• Consolidated multiple log-type tables into a single flexible logs collection
• Embedded frequently accessed related data
• Used array fields for one-to-many relationships with limited cardinality
• Maintained separate collections for entities with high cardinality relationships.
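The indexing recommendations translate into MongoDB key documents as in this sketch. Which collections count as "frequently queried" for the status/updatedAt compound index is not specified, so the three chosen here are an assumption. applyIndexes uses the official Node.js driver's Db.collection() and Collection.createIndex().

```javascript
// Index specifications following the recommendations above.
// Key documents use MongoDB's { field: 1 } (ascending) / { field: -1 } (descending) form.
const indexSpecs = [
  { collection: "sessions", key: { userId: 1 } },
  { collection: "purchases", key: { userId: 1 } },
  { collection: "authentication", key: { userId: 1 } },
  { collection: "logs", key: { timestamp: -1 } },
  // Compound status + updatedAt indexes; this choice of collections is an assumption.
  { collection: "accounts", key: { status: 1, updatedAt: -1 } },
  { collection: "purchases", key: { status: 1, updatedAt: -1 } },
  { collection: "sites", key: { status: 1, updatedAt: -1 } },
];

// Create the indexes; `db` is assumed to be a connected Db instance from the
// Node.js driver. createIndex resolves to the name of the created index.
async function applyIndexes(db, specs = indexSpecs) {
  const names = [];
  for (const { collection, key } of specs) {
    names.push(await db.collection(collection).createIndex(key));
  }
  return names;
}

console.log(indexSpecs.map((s) => `${s.collection}: ${JSON.stringify(s.key)}`).join("\n"));
```

Placing the equality field (status) before the sort field (updatedAt) follows MongoDB's general guidance for compound indexes. For the single-field timestamp index, direction does not matter because MongoDB can traverse it either way.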

Migration Design Considerations


1. Source Data Analysis
Large Tables Assessment
• RwStop (137.9GB allocated) - Primary operational data
• RwPushLogs (61.0GB allocated) - Historical notification data
• RwSession (52.4GB allocated) - User session tracking
• RwHistory (51.7GB allocated) - Historical activity data
• RwSite (19.2GB allocated) - Location information
Data Relationship Complexity
• User-related tables (RwUser, RwUserPref, RwMember)
• Stop-related tables (RwStop, RwSite, RwZone)
• Transaction tables (RwPurchases, RwPurchaseLog)
• Authentication chains (RwFedExAuth, RwUserAuth)
Data Quality Analysis
• Historical data integrity in RwHistory
• Session state consistency
• Purchase transaction completeness
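Purchase transaction completeness can be checked before migration with simple set-based tests. A minimal sketch, assuming hypothetical column names (PurchaseId, UserId, Amount, AddOnId) and treating a purchase as incomplete when any required field is null or missing:

```javascript
// Report incomplete purchases and add-ons that reference a missing purchase.
function checkPurchases(purchaseRows, addOnRows, requiredFields = ["PurchaseId", "UserId", "Amount"]) {
  // A purchase is incomplete if any required field is null or undefined.
  const incomplete = purchaseRows
    .filter((p) => requiredFields.some((f) => p[f] === null || p[f] === undefined))
    .map((p) => p.PurchaseId);

  // Orphaned add-ons would have no parent document to be embedded in.
  const purchaseIds = new Set(purchaseRows.map((p) => p.PurchaseId));
  const orphanAddOns = addOnRows
    .filter((a) => !purchaseIds.has(a.PurchaseId))
    .map((a) => a.AddOnId);

  return { incomplete, orphanAddOns };
}

const report = checkPurchases(
  [
    { PurchaseId: 1, UserId: 7, Amount: 20 },
    { PurchaseId: 2, UserId: null, Amount: 35 },
  ],
  [
    { AddOnId: 10, PurchaseId: 1 },
    { AddOnId: 11, PurchaseId: 99 },
  ]
);
console.log(report); // { incomplete: [ 2 ], orphanAddOns: [ 11 ] }
```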
2. Target MongoDB Design
Core Collections Structure
• Users collection (consolidating user-related tables)
• Stops collection (location and site information)
• Sessions collection (active user sessions)
• Transactions collection (purchase history)
• Operational collection (system metrics)
Data Distribution Strategy
• Sharding for the collection holding RwStop data
• Time-series implementation for RwHistory
• Session data partitioning
• Geographic distribution of site data
• User data segmentation
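Time-series collections (available since MongoDB 5.0) and hashed shard keys are standard options for the first two items. The sketch below only builds the option documents; the field names timestamp, meta and stopId, the "hours" granularity, and the collection and database names in the comments are assumptions.

```javascript
// Options for creating a time-series collection (MongoDB 5.0+).
// Field names and granularity are assumptions about the migrated RwHistory data.
function historyTimeSeriesOptions({ timeField = "timestamp", metaField = "meta", granularity = "hours" } = {}) {
  const allowed = ["seconds", "minutes", "hours"];
  if (!allowed.includes(granularity)) {
    throw new RangeError(`unsupported granularity: ${granularity}`);
  }
  // timeField must hold a BSON date; metaField labels which series a document belongs to.
  return { timeseries: { timeField, metaField, granularity } };
}

// Hashed shard key for the stop data; "stopId" is a hypothetical field name.
function stopShardKey(field = "stopId") {
  return { [field]: "hashed" };
}

// Example use with a connected Node.js driver Db instance:
//   await db.createCollection("history", historyTimeSeriesOptions());
// Equivalent mongosh sharding command ("mydb" is a placeholder):
//   sh.shardCollection("mydb.stops", { stopId: "hashed" })
console.log(JSON.stringify(historyTimeSeriesOptions()));
console.log(JSON.stringify(stopShardKey()));
```

A hashed key spreads inserts evenly across shards but turns range queries on that key into scatter-gather queries; a ranged key on a field such as a date makes the opposite trade-off.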
3. Migration Strategy
Phased Migration Approach
• Core user data migration
• Historical data transfer
• Session state transition
• Location data migration
• Transaction history transfer
Performance Optimization
• Batch processing for large tables
• Parallel migration streams
• Index strategy during migration
• Memory allocation for large operations
• Network bandwidth optimization
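For batch processing of the largest tables, keyset (seek) pagination reads each batch starting after the last key of the previous one, so the cost per batch stays flat instead of growing as it does with OFFSET paging. A minimal synchronous sketch: fetchPage is a hypothetical callback, the in-memory source stands in for a SQL Server query, and the Id column name is assumed.

```javascript
// Yield batches of rows ordered by `id`, each starting after the previous batch's last key.
// fetchPage(afterKey, limit) is a hypothetical callback returning rows sorted by id.
function* keysetBatches(fetchPage, batchSize = 1000) {
  let afterKey = null;
  while (true) {
    const rows = fetchPage(afterKey, batchSize);
    if (rows.length === 0) return;
    yield rows;
    afterKey = rows[rows.length - 1].id;
    if (rows.length < batchSize) return; // a short page means the table is exhausted
  }
}

// In-memory stand-in for a query such as
//   SELECT TOP (@limit) * FROM RwStop WHERE Id > @afterKey ORDER BY Id
function makeFakeSource(rows) {
  const sorted = [...rows].sort((a, b) => a.id - b.id);
  return (afterKey, limit) =>
    sorted.filter((r) => afterKey === null || r.id > afterKey).slice(0, limit);
}

// Each batch would then be written with one bulk call,
// e.g. collection.insertMany(batch, { ordered: false }) in the Node.js driver.
const source = makeFakeSource(Array.from({ length: 2500 }, (_, i) => ({ id: i + 1 })));
const sizes = [];
for (const batch of keysetBatches(source, 1000)) sizes.push(batch.length);
console.log(sizes); // [ 1000, 1000, 500 ]
```

A real source would be asynchronous (an async function* consumed with for await), and independent key ranges could be handed to separate workers to provide the parallel migration streams listed above.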
