One-line summary: In deals with data partners or vendors, settle ownership of enriched/derived data up front (who owns the ML features, embeddings, labels, and model outputs built from your raw data), because it directly affects valuation, IP risk, and exit.
Why investors care
When your product moat is “data + models,” unclear ownership of enriched/derived data can unsettle buyers and lenders. If a partner claims co-ownership of features, labels, or embeddings generated from your datasets, it clouds who can exploit, sell, or license those assets on exit. It also raises continuity risk (can you keep using the pipelines if the partnership ends?) and pricing pressure (royalties or most-favored-nation obligations attached to the very data that powers your model).
Practical translation: If rights aren’t explicit, you’ll end up negotiating them during diligence—when you have the least leverage.
The scenario (typical)
- You share raw customer or platform data with a partner (analytics house, labeling vendor, cloud ML provider, or channel partner).
- They run transformations: feature extraction, deduping, normalization, embeddings, labeling, knowledge graphs, or model training.
- Later, the partner says: “We created derived data (features/embeddings/labels). We own or co-own them; we can reuse them with others.”
- You realize your core advantage may have walked out the door.
The Turkish legal hooks (plain English)
1) KVKK (Turkish data protection law) — roles and purpose limitation
- If the data contains personal data, you (usually) are the data controller; your partner should be a data processor, acting only on your documented instructions.
- Purpose limitation and data minimization under KVKK support contractual fences: the processor cannot expand use into its own commercial purposes unless explicitly permitted (and, where personal data is involved, supported by a lawful basis).
- Takeaway: Your KVKK roles and instructions should align with your IP position: controller’s purposes prevail; processor’s use is narrow and time-bound.
2) TBK (Turkish Code of Obligations) — contract is king
- Ownership and licensing of derived data in B2B settings are primarily contractual. If you don’t write it, you don’t own it.
- Good TBK drafting resolves who owns: (i) background IP/data each party brings, (ii) foreground IP/data created during the engagement, and (iii) improvements to the other party’s tools.
3) Trade secrets & unfair competition (TTK framework)
- Properly protected non-public features, labels, or recipes for feature engineering can be trade secrets.
- NDAs, technical and organizational measures, and access controls are essential to keep secret status. If you disclose without fences, you dilute your own protection.
4) Copyright/database angles (FSEK practice)
- Certain databases, annotations, and documentation can be protected if they reflect intellectual creation (selection/arrangement).
- Turkey does not mirror the EU sui generis database right verbatim, so rely on contract first, then copyright where applicable.
Bottom line: In Turkey, your cleanest route is contractual clarity reinforced by KVKK role alignment and trade secret hygiene.
What counts as “enriched/derived data”?
Spell it out in the contract. Typical categories:
- Features & embeddings: Numeric vectors, feature tables, time-series transforms, PCA/UMAP spaces.
- Labels & annotations: Human or model-assisted labeling, entity/relationship tags, quality scores.
- Aggregations & statistics: Segment stats, benchmarks, fraud rates, performance baselines.
- Model artifacts: Training/validation datasets, model weights, hyperparameters, prompts, and model cards.
- Pseudonymized/hashed variants: Still derived from your raw data, often highly valuable.
If it originates from or is computed from your source data, treat it as Derived Data unless expressly carved out.
Evidence you should maintain (to defend ownership later)
- Data lineage: End-to-end graphs showing raw → transformations → features → models (with commit IDs and timestamps).
- Feature store logs: Who wrote, read, or exported which features, when, and under what job.
- Model cards & experiment registry: Datasets, versions, metrics, and responsible team/partner for each run.
- Access controls: RBAC/ABAC snapshots, NDA acknowledgments, and DLP alerts.
- Hash proofs: Cryptographic fingerprints for dataset versions to establish provenance.
This evidence is gold in disputes, diligence, and renewals.
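The hash proofs and lineage records listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a full provenance system; the function names (`fingerprint_rows`, `lineage_record`), the dataset names, and the sample rows are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint_rows(rows):
    """Return a SHA-256 hex digest over rows, independent of row order."""
    # Serialize each row canonically (sorted keys) so equal rows hash equally,
    # then sort the serialized rows so row order does not change the digest.
    encoded = sorted(
        json.dumps(row, sort_keys=True, separators=(",", ":")) for row in rows
    )
    digest = hashlib.sha256()
    for line in encoded:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def lineage_record(dataset_name, version, rows, parent_fingerprint=None):
    """Build a provenance entry linking a dataset version to its parent."""
    return {
        "dataset": dataset_name,
        "version": version,
        "sha256": fingerprint_rows(rows),
        "parent_sha256": parent_fingerprint,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical sample data: raw rows and a feature table derived from them.
raw = [{"customer_id": 1, "amount": 120.0}, {"customer_id": 2, "amount": 75.5}]
features = [{"customer_id": 1, "amount_z": 1.0}, {"customer_id": 2, "amount_z": -1.0}]

raw_entry = lineage_record("raw_transactions", "v1", raw)
feature_entry = lineage_record("amount_features", "v1", features, raw_entry["sha256"])
```

Because each derived entry stores its parent's fingerprint, the chain raw → features can be re-verified later by recomputing the digests. In practice these records would be written to an append-only store alongside commit IDs; that part is left out here.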
Contract playbook (plain clauses you can adapt)
A) Definitions that matter
- Background Data/IP: Pre-existing data, models, code, and tools owned by a party before the Effective Date (or developed independently without using the other party’s Confidential Information).
- Derived Data: Any data, label, feature, embedding, vector, graph, or metadata created, generated, or derived from Disclosing Party’s Data, including transformations and annotations.
- Aggregated Statistics: De-identified, anonymized metrics that cannot reasonably identify Disclosing Party, its customers, or data subjects (e.g., average latency, generic fraud rates).
B) Ownership of Derived Data (make it unambiguous)
Derived Data Ownership. As between the Parties, all Derived Data created from Company Data is the exclusive property of Company. Service Provider assigns, and shall cause its personnel and permitted subcontractors to assign, all right, title, and interest in the Derived Data to Company.
C) Limited carve-out for Aggregated Stats (to keep vendors comfortable)
Aggregated Statistics. Service Provider may create and use Aggregated Statistics solely to operate and improve its services, provided that (i) such statistics are irreversibly de-identified, (ii) do not include features/embeddings or labels traceable to Company Data, (iii) are not used to build or fine-tune models offered to Company’s competitors in a way that replicates Company-specific insights, and (iv) no Confidential Information is disclosed.
D) Processor role & purpose limitation (KVKK alignment)
Data Protection Roles. For any personal data within Company Data, Company is the data controller and Service Provider is the data processor acting only on Company’s documented instructions. Service Provider shall not process Company Data for its own purposes (including training or improving models) without Company’s prior written consent and a lawful basis under applicable data protection laws.
E) No training without consent (the “ML guardrail”)
Model Training Restriction. Service Provider shall not use Company Data or Derived Data to train, fine-tune, evaluate, or otherwise improve any model except models used exclusively to provide services to Company, and only during the Term. Any broader use requires a separate, express license.
F) Improvements and tool IP (avoid creeping claims)
Improvements. Each Party retains all right, title, and interest in its Background IP/tools. Improvements to Service Provider’s pre-existing tools that are general and do not embed or disclose Company Data belong to Service Provider. Any improvement that incorporates or is trained on Company Data/Derived Data in a way that conveys Company-specific patterns is Company-Owned or licensed to Company on a perpetual, royalty-free basis sufficient for continued operations.
G) Residuals (tight and safe, or exclude entirely)
If you allow residuals:
Residuals (Limited). Individuals may use general ideas retained in their unaided memory, provided the underlying information is not Confidential Information. This does not permit use or disclosure of Company Data, Derived Data, or any specific feature engineering methods learned from Company.
Or simply: No Residuals.
H) Exit-proofing: license-back & escrow
License-Back on Termination. Upon termination, Service Provider grants Company a perpetual, irrevocable, royalty-free license to any tooling necessary to read or use Derived Data in standard, documented formats.
Escrow & Portability. Service Provider shall deposit interface specs, export scripts, and mapping tables in escrow; updates are deposited quarterly. On request or termination, Service Provider exports all Derived Data in Parquet or CSV format, together with a schema and feature dictionaries, at no additional charge (or at a capped cost).
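To make the portability obligation concrete, here is a rough sketch of what a documented export could look like: a CSV file plus a JSON sidecar carrying the schema and feature dictionary. It assumes CSV only (Parquet would need a third-party library), and the function name `export_derived_data`, the file names, and the sample columns are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path


def export_derived_data(rows, feature_dictionary, out_dir):
    """Write rows to CSV plus a JSON schema/feature-dictionary sidecar."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    columns = list(feature_dictionary)  # column order follows the dictionary

    # Refuse to export rows that lack a documented column.
    missing = {c for row in rows for c in columns if c not in row}
    if missing:
        raise ValueError(f"rows lack documented columns: {sorted(missing)}")

    data_path = out / "derived_data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        # extrasaction="raise" rejects undocumented columns in a row.
        writer = csv.DictWriter(fh, fieldnames=columns, extrasaction="raise")
        writer.writeheader()
        writer.writerows(rows)

    schema_path = out / "derived_data.schema.json"
    schema = {"format": "csv", "row_count": len(rows), "columns": feature_dictionary}
    schema_path.write_text(json.dumps(schema, indent=2), encoding="utf-8")
    return data_path, schema_path


# Hypothetical feature dictionary and rows.
feature_dictionary = {
    "customer_id": {"type": "integer", "description": "Pseudonymous customer key"},
    "amount_z": {"type": "number", "description": "Z-scored transaction amount"},
}
rows = [{"customer_id": 1, "amount_z": 1.0}, {"customer_id": 2, "amount_z": -1.0}]

with tempfile.TemporaryDirectory() as tmp:
    data_path, schema_path = export_derived_data(rows, feature_dictionary, tmp)
    exported_header = data_path.read_text(encoding="utf-8").splitlines()[0]
    exported_schema = json.loads(schema_path.read_text(encoding="utf-8"))
```

The design point is that the export fails loudly when data and documentation disagree, so the escrowed feature dictionary cannot silently drift from what is actually delivered.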
I) Non-compete on look-alike models (narrow and defensible)
No Look-Alike Models. Service Provider shall not provide to third parties any model or feature set that is trained on, or materially reproduces, patterns specific to Company Data/Derived Data for [12–24] months after termination.
Governance that makes the contract real
- Data Access Approvals: Joint change-control for any new data fields or purposes; DPIA as needed.
- Feature Registry: Shared registry listing all features/labels created from Company Data; ownership stamped as Company by default.
- Third-Party Flow-Down: Vendor must push these same obligations to any subcontractor (no weaker terms).
- Audit & Certification: Right to audit processing environments; require ISO/IEC 27001 certification or equivalent, plus model governance policies; annual certificate of destruction for any non-portable caches.
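The feature registry idea from the list above can be sketched as a small data structure in which ownership defaults to Company unless an entry is expressly carved out. This is an illustrative sketch only; the class names (`FeatureEntry`, `FeatureRegistry`), the feature names, and the dates are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class FeatureEntry:
    name: str
    source_dataset: str
    created_by: str
    created_on: date
    owner: str = "Company"  # default mirrors the Derived Data ownership clause


@dataclass
class FeatureRegistry:
    entries: dict = field(default_factory=dict)

    def register(self, entry):
        """Add an entry; duplicates are rejected so history is not overwritten."""
        if entry.name in self.entries:
            raise ValueError(f"feature already registered: {entry.name}")
        self.entries[entry.name] = entry

    def not_owned_by(self, owner="Company"):
        """List feature names whose recorded owner differs from `owner`."""
        return [e.name for e in self.entries.values() if e.owner != owner]


registry = FeatureRegistry()
# Built by the vendor from Company Data: ownership defaults to Company.
registry.register(
    FeatureEntry("amount_z", "raw_transactions", "Service Provider", date(2024, 1, 15))
)
# Expressly carved out: built from the vendor's own telemetry.
registry.register(
    FeatureEntry(
        "generic_latency_baseline",
        "provider_telemetry",
        "Service Provider",
        date(2024, 1, 20),
        owner="Service Provider",
    )
)
carve_outs = registry.not_owned_by("Company")
```

Listing the carve-outs on demand gives both parties a short, reviewable exception list rather than an argument about defaults.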
Red flags (pause the deal if you see these)
- “Provider may use data to improve our services” without a tight Aggregated Statistics carve-out and training limits.
- No assignment of Derived Data or vague co-ownership language.
- Broad “residuals” that swallow confidentiality.
- No KVKK role statement or purpose limitation; processor wants controller-like freedoms.
- No export formats or costs defined; portability becomes leverage against you.
- Subcontractors excluded from obligations.
Diligence checklist (investor-ready)
- The contract expressly assigns Derived Data to you.
- Aggregated Stats carve-out is narrow, de-identified, and non-repurposable.
- KVKK roles: you’re controller, partner is processor; no self-use without consent/lawful basis.
- Training clause: no training on your data except for your service, during the term.
- Evidence exists: lineage, feature store logs, model cards, and access controls.
- Exit plan: escrow, export formats, license-back.
- Flow-down to subs; audit rights; deletion timelines.
If any item is missing, price the risk—or fix it before signing.
Conclusion
Ownership of enriched/derived data is not a “nice to have”; it’s your moat. Align KVKK roles with your IP position, keep trade secret hygiene, and write TBK-grade clauses that (i) assign Derived Data to you, (ii) confine Aggregated Statistics to safe, non-competitive use, (iii) block unauthorized model training, and (iv) guarantee portability on exit. That clarity preserves valuation today and prevents last-minute surprises at exit.